Talk:Storage virtualization

Article Improvements

  • Suggested structure.... deleted as it's now in place in the article ... Notice the complete lack of vendor and product references. Some of the storage virtualization technologies are still emerging and others are waning, so it would be hard to give a fair treatment to all. If anyone wants to take that on, be my guest! Plowden 20:57, 19 August 2006 (UTC)
I'll happily start drafting some content as outlined above. It may need some grammatical checking, but content-wise it's not a problem (I've been working in storage virtualization for 5+ years). I agree that leaving vendor specifics and product references out is a good idea. An unbiased view of the implementation approaches is also needed, as each vendor tends to sell theirs as the 'correct' approach. Baz whyte 11:38, 25 February 2007 (UTC)
Sure, go ahead. It looks quite good. I may also chip in sometimes. :) --soumসৌমোyasch 02:42, 26 February 2007 (UTC)
  • OK, so I've made a stab at this. Hopefully it's a vast improvement on what was there, but it still needs a few bits filled in, and it most definitely needs grammatical checking / wikifying! Baz whyte 20:37, 26 February 2007 (UTC)
  • I have undone the edits that changed "data" to a plural noun. In the general case "data is" is almost always used rather than the strange-looking "data are". While I understand that a single piece of data is a datum, this is almost never used. "Data" as both singular and plural is the more common usage in the IT industry, and as such "data are" does not read well. Baz whyte 18:52, 29 July 2007 (UTC)
  • A subjective statement such as '"data are" does not read well' doesn't stand up well against objective facts of grammar. In fact, to many, "data is" is just plain grating and distracting. Then again, I also tend to use predicate nominatives correctly, which sounds odd to some folks. In any case, most technical journals require that "data" be used as a plural noun (see the Wikipedia article on Data), and this is a technical article. Rknasc 20:08, 30 July 2007 (UTC)
  • OK, so I bow down. I've corrected all meta-data instances to use the correct grammar and replaced the plain "data is" with "information is" - hopefully everyone is happy :) Baz whyte 20:44, 3 August 2007 (UTC)
  • I see some issues with this article. I will try to classify them:

1. Section "Implementation Approaches" -> "Storage-device based" -> "Cons": "Storage utilisation optimised only across the connected controllers" - That's a pretty obvious fact and it cannot be counted as a disadvantage. If you do not connect a controller to the hosts by some means (FC SAN, iSCSI LAN, proprietary connection etc.), it will not be available anyway. I would say instead: "Depending on the specific storage virtualization product, it is possible that some controllers cannot be virtualized because of interface incompatibility or design limitations". Most cases in which a storage controller cannot be virtualized fall into the following categories:

- The controller has a proprietary / obsolete interface not supported by the storage virtualization product (i.e. something other than Fibre Channel or Ethernet)
- The controller requires host-based drivers / management software for fail-over or other purposes
- The controller is "locked" by design to interoperate only with a specific (usually same-vendor) host (e.g. HP NonStop Modular I/O System, IOAM / FCDM)

"Replication and data migration only possible across the connected controllers and same vendors device for long distance support" - Absolutely not true. Most storage virtualization solutions allow all kinds of remote replication services in a purely heterogeneous environment. I would say this is one of their greatest benefits. To name just some of them: DataCore SANMelody / SANSymphony, Hitachi Data Systems USP V Enterprise Storage Controllers, EMC Invista, IBM SVC etc.

"Downstream controller attachment limited to vendors support matrix" - That's a valid statement, but it should be part of the story about proprietary and non-industry-standard controllers. Also, the phrase "downstream controller" is a bit confusing, because the underlying controllers service both downstream and upstream data flows ;-). Storage virtualization vendors usually use the term "back-end storage".

"I/O Latency, non cache hits require the primary storage controller to issue a secondary downstream I/O request" - This is generally true, but the same happens in every storage controller on a cache miss: the I/O is requested from the slow hard drives. The main reason for the added latency is that there are more "hops" from the host to the "downstream" (back-end) storage controller. The I/O request goes from the host through the SAN switch to the virtualization appliance / server / controller. If the cache misses there, a secondary I/O request is generated that goes through the SAN switch to the "downstream" (back-end) storage controller, which checks its own cache, and if that misses as well, the I/O request is serviced by the hard drives. The way back is again through the storage virtualization appliance / server / controller. On the other hand, the added latency is negligible in 99% of cases (a few microseconds per I/O cache miss) compared with the typical I/O response time, which is in the range of milliseconds.
It is important to note another fact: the DataCore software runs on any x86 server and uses up to 80% of the available RAM for block-level caching. The latest 32-bit SANSymphony supports 32GB of cache, and this cache is an order of magnitude cheaper than the storage controllers' cache memory (10-20 times lower price in USD per gigabyte) and also an order of magnitude faster than the controllers' cache (e.g. DDR2 1066MHz vs. SDRAM 266MHz). So, in a real-world scenario virtualization usually improves overall performance considerably, in terms of response time, IOPS and throughput (MB/s).

2. Section "Implementation Approaches" -> "Storage-device based" -> "Specific Examples"

- IBM SVC is not exactly "storage-device based", because it is an appliance that can be added to any SAN and can utilise a wide range of 3rd party storage controllers
- DataCore software products are just software products that can run on any industry-standard x86 server with the Windows 2003 Server OS. They can utilise an extremely wide range of 3rd party storage controllers AND appliances, including any internal storage present in the x86 server they are running on. The supported interfaces to the back-end storage controllers are Fibre Channel (1/2/4/8Gb), iSCSI (10/100/1000/10G Ethernet) and InfiniBand 4x. The storage compatibility matrix is limited only to storage controllers capable of presenting LUNs to Windows 2003 Server through any of the previously mentioned interfaces, which practically means almost every storage controller on the market I can think of.

3. Other suggestions for improvement

- A more detailed explanation of the Thin Provisioning / Dynamic Provisioning features provided by most storage virtualization products (more than 10 vendors currently). Thin provisioning is cute, but it has drawbacks as well: deleting files from a virtual volume (VLUN etc.) DOES NOT reclaim / free up data blocks on the back-end storage. It can't by design, because all solutions on the market work below the host OS level and THEY DO NOT understand what's going on in the file system of a particular host. If you delete a file, only the file system of the host knows which blocks are logically free. The fact that some file systems or 3rd party utilities can zero out the blocks once occupied by deleted files does not help. A block full of zeroes is still a block with data. Unfortunately, there is no "NULL" or "EMPTY" value in mathematics, and no file system can write a "NULL" or "EMPTY" value to a storage block. This is done by the various RDBMS...
- Addition to the "Capabilities" section - Interface bridging: almost all available solutions can bridge Fibre Channel SANs to iSCSI LANs regardless of the hosts and the back-end storage controllers.
- Addition to the "Capabilities" section - Vendor / Storage Array Generation bridging: try to implement replication of volumes between IBM DS400 and EMC CLARiiON storage controllers, for example. No way, unless you virtualize them first...
- Addition to the "Capabilities" section - Policy Based Quality of Service: Hitachi and DataCore support it in their latest products. Pooling is one thing; setting IOPS and throughput QoS policies per VLUN / storage pool is a much bigger thing.
- Addition to the "Capabilities" section - Continuous Data Protection / Storage Transaction Journaling: currently this feature is truly supported only by the DataCore Traveller Add-On. This allows in-band synchronous replication through Fibre Channel to a CDP server, which logs all block-level transactions in order to a LOG database (just like Oracle or MS SQL RDBMS do). This gives you the possibility to select any past moment in time (even years ago, to within a minute) and then generate a 100% consistent virtual representation of a CDP-protected VLUN containing the data as they looked at the selected past moment. To achieve consistency, the system looks for the last successful commit flag. This works for any VLUN, regardless of the file system or the host that uses the LUN...
- There are even more things to include, but I am running out of time now.
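The thin provisioning drawback in item 3 above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ThinVolume` and `HostFileSystem` are hypothetical names, the file system never reuses freed blocks (a simplification), and nothing here reflects how any particular product is implemented.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


class ThinVolume:
    """Toy thin-provisioned volume: a back-end block is allocated on first write."""

    def __init__(self, virtual_blocks):
        self.virtual_blocks = virtual_blocks
        self.backend = {}  # virtual block number -> stored data

    def write(self, block_no, data):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block outside the virtual volume")
        # The volume sees only block writes. It cannot tell whether the
        # data is a live file, a deleted file, or a run of zeroes.
        self.backend[block_no] = data

    def allocated_blocks(self):
        return len(self.backend)


class HostFileSystem:
    """Toy host file system: only it knows which blocks belong to which file."""

    def __init__(self, volume):
        self.volume = volume
        self.files = {}     # file name -> list of block numbers
        self.next_free = 0  # simplification: freed blocks are never reused

    def create(self, name, n_blocks):
        blocks = list(range(self.next_free, self.next_free + n_blocks))
        self.next_free += n_blocks
        for block_no in blocks:
            self.volume.write(block_no, b"x" * BLOCK_SIZE)
        self.files[name] = blocks

    def delete(self, name, zero_fill=False):
        # Deleting is a metadata change on the host; the volume is not told.
        blocks = self.files.pop(name)
        if zero_fill:
            # Zeroing is just another write, so the blocks stay allocated.
            for block_no in blocks:
                self.volume.write(block_no, bytes(BLOCK_SIZE))


vol = ThinVolume(virtual_blocks=1000)
fs = HostFileSystem(vol)

fs.create("a.dat", 10)
after_create = vol.allocated_blocks()

fs.delete("a.dat")
after_delete = vol.allocated_blocks()

fs.create("b.dat", 5)
fs.delete("b.dat", zero_fill=True)
after_zero_fill = vol.allocated_blocks()

print(after_create, after_delete, after_zero_fill)
```

In this model the allocated block count never goes down: neither a plain delete nor a zero-filling delete gives the volume any way to learn that the blocks are logically free.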

Please provide some feedback, and I could make some edits to the article.

10th of May 2008 - Vasko —Preceding unsigned comment added by 213.240.236.130 (talk) 21:53, 9 May 2008 (UTC)

Needing work

Host based - need some more detail and contents on the various styles of host based storage virtualization
Block vs NAS - needs completing