Talk:RAID/archives/2007-10-19
From Wikipedia, the free encyclopedia
Removed the link to the http://www.wiebetech.com whitepaper
Tried to calculate the failure probability. But this is no scientific paper; the numbers are made up (experience from reality). It's useless and more of an advertorial anyway.
Apu Nahasapeemapetilon
I would change it, but I don't know who the original guy is... or is Apu Nahasapeemapetilon right?
questionable links
Is anyone curious whether or not some of these links meet WP guidelines on external links? Some of these are blogs, and others are of dubious origin. Comments? Does anyone think some of them should go? // 3R1C 22:49, 22 December 2006 (UTC)
still way long
The article is rather unsightly =\ Suggestions? 3R1C 15:41, 15 November 2006 (UTC)
- Maybe a separate page for each of the RAID levels? Poweroid 17:18, 15 November 2006 (UTC)
- Sounds fair to me, but I've put nothing into this article, and I would not feel comfortable just subverting someone else's work =P 3R1C 14:59, 18 November 2006 (UTC)
- I agree. A separate article describing the different types of RAID would be the best solution. A brief explanation of the most common types, 0 and 1, should be all that's left in this article, along with a link to the 'Types of RAID' page for more information. Walther Atkinson 21:44, 15 December 2006 (UTC)
- I quite like the article as it is. It may be long, just looking at the word count. On the other hand, just cutting it into different articles makes it even less clearly arranged and concise. Leave it as it is. Johannes121 17:01, 30 November 2006 (UTC)
I was thinking. In fact, all that needs to be done is to create a simple table under a section called "RAID Levels". Each RAID configuration (RAID 5, etc.) would be listed there, hyperlinked to the corresponding article. (And yes, the current RAID level summaries would be moved.) -GUEST on 12/1/06
I think it could be simply split into these separate pages:
- RAID
- contains: History, RAID Implementations, Hardware vs. Software, etc... in other words, the current sections 1,2,6,7,8
- what the other Guest said about a table, containing each configuration name and diagram/summary.
- (perhaps even make separate pages that go into more detail over each RAID config, but keeping the original "umbrella section" pages)
- Standard RAID Levels
- Nested RAID Levels
- Proprietary RAID Levels
Also, it would not be difficult to split this article up, as all that is necessary is creating more article pages and cutting and pasting sections from this original article to their new home. Basically, the wiki community just needs to decide how to split it. --DEMONIIIK 07 December 2006
- then everyone should grab a section and split it.
I'm going to go ahead and do one of 'em now. Standard, Nested, and Proprietary sections split. // 3R1C 00:56, 19 December 2006 (UTC)
removed link
Removed link; it's outright wrong, ad-infested, and hints at a corporate slant. I've studied this and worked with RAID 0s and 1s for ages, and none of that article rang true.
Confusing Passage
The paragraph below doesn't make sense somehow. I can't tell if it means there are performance enhancements to be gained from RAID 1 and 2 or not:
The advantages of using RAID1 or RAID2 are that if a single disk fails you can use the other disk and no data is lost, and, because two disks are being used at the same time, the throughput is doubled - resulting in a highly noticeable decrease in access times for the disks which greatly adds to performance of the system. Therefore, each disk in the system is independent and there is no increase in performance. In this configuration "independent" or "inexpensive" depends primarily on perspective and either use of the word is correct.
Also, the Raid0 section is poorly written.
reliability
Possible error
With RAID 1, MTBF ought to be the square of that of the individual disks, not double.
Secondly, theoretically RAID throughput can be doubled. But in practice, RAID 3++ would slow down performance, especially with large files such as databases. I'm not talking about Oracle databases only, but also MySQL and other kinds of files with large partition sizes, for example email and image storage.
- --din 19:16, 13 April 2006 (UTC)
- Product of the MTBFs, actually; the disks aren't required to be an exact match.
--Baylink 17:47, 5 September 2006 (UTC)
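A rough way to check the "square" versus "product" claims, assuming the textbook model of independent, exponentially distributed disk failures and a fixed repair (replace and rebuild) time. Under that model the mean time to data loss (MTTDL) of a two-disk mirror is approximately MTTF1 x MTTF2 / (2 x MTTR): the product does appear, but it is divided by the repair window. MTTDL, MTTR and the function name below are introduced for illustration, and the 500,000-hour and 24-hour figures are made-up placeholders, not vendor data.

```python
def mirror_mttdl(mttf_a_hours: float, mttf_b_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a two-disk RAID 1 mirror.

    Assumes independent, exponentially distributed disk failures and a repair
    time much shorter than either MTTF. Data is lost only if the surviving disk
    fails while the failed one is being replaced and rebuilt.
    """
    rate_a = 1.0 / mttf_a_hours
    rate_b = 1.0 / mttf_b_hours
    # Either disk can fail first; the other must then fail inside the repair window.
    loss_rate = rate_a * (rate_b * mttr_hours) + rate_b * (rate_a * mttr_hours)
    return 1.0 / loss_rate


# Hypothetical figures: 500,000-hour MTTF disks, 24 hours to replace and rebuild.
single = 500_000.0
mirrored = mirror_mttdl(single, single, 24.0)
print(f"single disk MTTF : {single:,.0f} h")
print(f"mirror MTTDL     : {mirrored:,.0f} h")
```

Correlated failures (shared power supply, controller, drives from the same batch) break the independence assumption, so real-world figures can be far worse than this idealized estimate.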
Reliability scaling.
Guys, isn't the probability of two simultaneous independent events the product of their individual probabilities? In this light I don't think that the reliability of RAID 1 increases linearly with the number of disks.
- Agreed. And in the previous sentence, assuming reliability is the opposite of failure, the probability of non-failure is 1 - P(all n of n drives fail) = 1 - f^n, where f = P(single disk failure). In the given example, going from one to two drives, reliability goes from 1 - f to 1 - f^2, instead of "increasing by a factor of two." In general I think this article leaves a lot to be desired as far as the mathematics of RAID performance and reliability go. Psyno 23:05, 9 July 2005 (UTC)
- It's important to make the distinction between "the odds that one drive in an array will fail", which go up with the number of drives, and "the odds that the array itself will fail", which, indeed, go *down* as the number of drives increases. The mathematical portions of the piece do indeed need to be cleaned up.
--Baylink 17:49, 5 September 2006 (UTC)
Reliability
I wonder if we should include a section on the misunderstandings people have about RAID reliability. I would hope most people who set up RAID arrays for specialist purposes know what they're doing, but a lot of people who do it for fun (or whatever) don't appear to comprehend what RAID (especially 1 and 5) does and doesn't do. Surprisingly, a lot of people seem to think it is a good 'alternative' to a backup, and fail to understand why two more or less simultaneous hard disk failures are not that unlikely (power supply failures, etc.) and that RAID also doesn't help in the event of natural disasters, viruses, etc. (of course backups may not help either, depending on where you store them). Also, when it comes to the performance of RAID 0, perhaps we should elaborate on the performance advantages (e.g. discussing why two disks do not mean double performance in all cases, due to the importance of caches, seek times, etc. for non-burst reading) and describe why RAID 0 is not always the best solution for performance (independent disks are probably the best option for most general users). Nil Einne 21:09, 29 March 2006 (UTC)
- I agree. As mentioned in my response to RAID 6, it might be useful to create a Code Theory section that goes into the advantages / disadvantages for the different types of RAIDs and why certain mechanisms were created.
- Pyth007 19:31, 13 April 2006 (UTC)
Yes, if we *assume* that disk failures are independent, then (in theory) the probability of both drives failing in RAID 1 is much less than the probability of just one drive failing. However, in practice, "those with RAID 1 are marginally *more* likely to lose data than those without any RAID at all." -- http://www.bestpricecomputers.co.uk/reviews/home-pc-raid/ (2005)
I'd much rather this article clearly state what *actually* happens, rather than give some theoretical formula that has nothing to do with the real world. -- User:DavidCary --70.189.73.224 03:31, 30 May 2006 (UTC)
- Yes, people who use RAID 1 as a substitute for, rather than in addition to, backup are likely to get burnt (and including a warning about that would be a good thing), and many people use RAID 0 even though it brings them no real benefit (what that article admits it *IS* good for is applications like video, where you need to move shitloads of contiguous data around). But the fundamental premise being pushed by that article, that RAID itself sucks, is just BS. And a RAID controller failure shouldn't be an issue for a RAID 1 setup (it could well be for RAID 0 or parity-based RAID, though, and it's something to research carefully). Plugwash 01:10, 31 May 2006 (UTC)
This article needs work
Everyone here should read PCguide.com's section on RAID [1] -- it explains it much better than the user who tried really hard to write the article on wikipedia did - sorry dude. Note that ACNC's website's description of RAID 1 is really RAID 1+0 -- RAID 1 must be exactly 2 disks.
- RAID 1 is mirroring and doesn't necessarily limit the number of mirrors, though 2 is by far the most common number and certainly what I normally think of. Still, at one point there was consideration of using a RAID setup with 3 or more disks to help the seek rate of Wikipedia. Jamesday 21:33, 19 Jul 2004 (UTC)
Reworked entirely!
OK, someone else started the new stub, and I added lots more info. Then I went back and diffed the original (infringing) article and put in all the text since then wherever I felt it belonged. So the new Temp stub is ready to roll! --Sfoskett 21:38, 12 Jul 2004 (UTC)
- The possibly infringed source was [2], to give a more direct link. I'll also review the article sometime in the next week or so. Jamesday 21:33, 19 Jul 2004 (UTC)
Rework Diagrams?
Does anyone mind if I rework the diagrams from ascii into something more readable, such as a couple of PNGs? The only problem I can see is that a PNG would be harder to repair if I copied a mistake from these ascii diagrams. If anyone who knows what they are doing can confirm that these diagrams are okay, it'd be much appreciated --huwr 08:25, 29 Jul 2004 (UTC)
- Considering that a year later we still have the same ASCII diagrams, you never got around to making new diagrams. Since your comment, you and others have fixed the diagrams, so I figure they're probably right by now. Taking into account the fact that I have nothing better to do, I'll make them and post a link when I'm done for peer approval.--Kryptknight 01:57, July 23, 2005 (UTC)
I added a diagram and expanded a couple of others. --Avernar 04:42, 23 July 2005 (UTC)
- Raid-6 is wrong. You can't be blamed, our article was vastly incorrect for more than a year. I've corrected the article. --Gmaxwell 05:44, 22 September 2005 (UTC)
--216.232.71.212 04:42, 6 October 2005 (UTC)
- The main article is currently using ASCII diagrams, and the multi-coloration of the above images detracts from communicating the concepts... I would be willing to spend a few minutes and make some JPEGs in M$paint. Optimally, we should use the diagrams from http://www.acnc.com/raid.html, but they are undoubtedly copyrighted. A recreation of them (in different colors, with a different style of arrows, perhaps?) would be what I would aim for. The7thone1188 21:56, 3 June 2006 (UTC)
- The ASCII diagrams are incomplete in that they fail to convey parity source, while the colors do this quite admirably. I think it would be a great addition. Dr1819 13:51, 10 June 2006 (UTC)
- Also, the AC&NC depiction of [RAID 10] is incorrect (and yes, they're undoubtedly copyrighted). And I disagree with their recommendations and comments as well, particularly on RAID 6's dual parity. They claim it's slow, yet with controllers that use one chip to generate the XOR parity and another to generate the RS code, the only additional performance hit comes from writing the parity twice instead of once. Dr1819 14:35, 10 June 2006 (UTC)
Vandalism
What is up with all the vandalism of this particular item? I have 1,000 articles in my watchlist, and none is vandalized as often as this one. And the vandalism is just stupid - bold text, added leQtters, inserted single POOP words. Is this used as a sample page somewhere? Is it some newbie ICE-CREAM (original: initiation) course? I hope we don't have to lock it down... --SFoskett 18:02, Sep 16, 2004 (UTC)
- I too have seen quite a bit of vandalism on this item. It is beginning to get to the stage where more work is done undoing vandalism than improving the article. --huwr 23:46, 17 Sep 2004 (UTC)
- Not sure if it's still going on occasionally, but if this is a problem, a tip is to use this Google query: link:http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks. This will perform a Google search on pages linking to this article. I actually found a bunch when just trying it out, but I couldn't see any high-traffic sites at first sight, though I could be wrong. -- Jugalator 00:24, July 25, 2005 (UTC)
- Now, sure, it doesn't happen NEAR as often as it used to (I'm guessing...), but... perhaps it is time to just semi-lock it. After all, if someone wants to actually make an intelligent change, it's not exactly impossible to either request a change or just sign up, edit an unlocked page successfully, and wait a week. Let's do it! (By the way, I decided to make SFoskett's post more exciting by adding the other three examples of vandalism to his comment... teehee...) --DEMONIIIK 05:34, 22 February 2007 (UTC)
- Why all the vandalism? What do people have against RAID arrays? Royallywasted
- More vandalism in the form of "Nates gay". Vandals has bad grammar. -- Gnomeza 18:34, 13 September 2007 (UTC)
Irrelevant
The wikilink irrelevant in the sentence "RAID-0 is useful for setups such as large read-only NFS servers where mounting many disks is time-consuming or impossible and redundancy is irrelevant." redirects to Renormalization group which is something not related to the meaning of irrelevant here. --ReiVaX 19:43, 14 Oct 2004 (UTC)
I'm pretty sure that the RAID-6 discussed in the "RAID-6 section" is very different from the RAID-6 discussed in the "double parity" section. I don't really know which is right. 204.0.197.190 20:59, 4 Feb 2005 (UTC)
You need to add more RAID levels
You guys need to add more RAID levels to "Nested RAID": RAID 50, RAID 60, and RAID 61.
External links deleted?
Why exactly were the external links deleted in this revision? --Damian Yerrick 23:52, 21 July 2005 (UTC)
- Looks like someone removing useful information - I noticed that both of the other removals by that IP had already been undone. I've added back the external links and categories. Jamesday 13:31, 23 July 2005 (UTC)
I'd like to know more about RAID 1 writing
This is what I saw the article had to say about RAID 1 writing:
- When writing the array acts like a single disk as all writes must be written to all disks.
It's seriously not too much ;-), and I was particularly wondering if RAID 1 writes were done in parallel or sequentially. I.e. is the writing time of complexity O(1) or O(n) where n is the number of drives? Does it differ between RAID controllers? This may apply to other RAID modes; I haven't checked that far as I was mostly interested in RAID 1 for a possible future computer for now. -- Jugalator 00:28, July 25, 2005 (UTC)
- Yeah, I should expand on that. Ideally the writes are done in parallel. There are several things that can force the transfer of data to the drives to occur sequentially: a poor controller design will do it, and so will putting both drives on the same IDE cable. For SCSI the transfer over the bus is done sequentially, but since this is so much faster than the drives can write to the platters, the writes are overlapped:
Disk1: TTWWWWWWWWWWWWWWW
Disk2:   TTWWWWWWWWWWWWWWW
Where T = SCSI bus transfer and W = disk write
- This applies to reads as well and the other RAID modes. When I add this to the article I'll probably put it up in the common section. - Avernar 04:36, 25 July 2005 (UTC)
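Avernar's timeline can be turned into a back-of-the-envelope model of Jugalator's O(1) versus O(n) question. Assume each disk in an n-way mirror needs a bus transfer time t and a media write time w. If nothing overlaps, one logical write takes n(t + w), which is O(n); if only the bus transfers are serialized, as in the SCSI case, it takes nt + w, which stays close to O(1) while t is much smaller than w; with a separate channel per disk it takes t + w. The function name and the tick values (2 ticks of transfer, 15 of writing, roughly as in the diagram) are hypothetical.

```python
def mirrored_write_time(n_disks: int, transfer: float, write: float, mode: str) -> float:
    """Time for one logical write to complete on every disk of an n-way mirror.

    mode:
      "parallel"   - every disk has its own channel; transfers and writes overlap fully
      "overlapped" - a shared bus (e.g. SCSI) sends the data to each disk in turn,
                     but each disk starts writing as soon as its own transfer ends
      "sequential" - nothing overlaps (poor controller, or drives sharing an IDE cable)
    """
    if mode == "parallel":
        return transfer + write
    if mode == "overlapped":
        # Disk k finishes its transfer at (k + 1) * transfer, then writes;
        # the last disk (k = n - 1) finishes at n * transfer + write.
        return n_disks * transfer + write
    if mode == "sequential":
        return n_disks * (transfer + write)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical units: 2 ticks of bus transfer, 15 ticks of writing.
for mode in ("parallel", "overlapped", "sequential"):
    print(mode, [mirrored_write_time(n, 2, 15, mode) for n in (1, 2, 3, 4)])
```

With these placeholder numbers a two-disk mirror finishes a write in 17, 19 or 34 ticks respectively.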
Raid 6 vs Double Parity
The current RAID6 section describes double parity. It describes a system in which the following blocks exist (but are not shuffled in this fashion):
A1 A2 A3 PA
B1 B2 B3 PB
C1 C2 C3 PC
P1 P2 P3 --
where:
PA = A1 XOR A2 XOR A3
PB = B1 XOR B2 XOR B3
PC = C1 XOR C2 XOR C3
P1 = A1 XOR B1 XOR C1
P2 = A2 XOR B2 XOR C2
P3 = A3 XOR B3 XOR C3
(in reality other parity system may be used, but XOR works perfectly well)
Anyway that is taking parity in two different directions which is what the Double Parity section claims makes it different from RAID-6. That section claims Raid-6 has parity in only one direction, but with two copies. I suspect that is correct and the author of the RAID-6 section was just confused.
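The row-and-column XOR scheme listed above can be checked on toy data. This sketch uses arbitrary one-byte values for A1..C3 and only demonstrates the parity arithmetic (a lost block can be rebuilt either from its row or from its column); it does not model how the blocks would be placed on physical disks, which is what determines how many disk failures the layout survives.

```python
from functools import reduce
from operator import xor

# Arbitrary one-byte "blocks" laid out as rows A, B, C and columns 1..3.
grid = [
    [0x12, 0x34, 0x56],  # A1 A2 A3
    [0x78, 0x9A, 0xBC],  # B1 B2 B3
    [0xDE, 0xF0, 0x0F],  # C1 C2 C3
]

row_parity = [reduce(xor, row) for row in grid]        # PA, PB, PC
col_parity = [reduce(xor, col) for col in zip(*grid)]  # P1, P2, P3

# Lose B2, then rebuild it from the rest of its row plus the row parity PB.
lost_row, lost_col = 1, 1
survivors = [v for c, v in enumerate(grid[lost_row]) if c != lost_col]
rebuilt = reduce(xor, survivors, row_parity[lost_row])

# The same block can also be rebuilt from its column plus the column parity P2.
col_survivors = [grid[r][lost_col] for r in range(3) if r != lost_row]
rebuilt_from_column = reduce(xor, col_survivors, col_parity[lost_col])

print(hex(rebuilt), hex(rebuilt_from_column))
```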
- UGH! No, this is completely and utterly wrong. RAID 6 is an implementation of Reed-Solomon erasure codes. It turns out that XOR is a perfectly valid syndrome for an RS code, so RAID 6 just adds another non-overlapping syndrome. Sadly the extra syndrome is much more expensive to use than XOR. RAID 6 is striped exactly like RAID 5, except there are two parity blocks to distribute. The system proposed would not handle a two-disk failure; RAID 6 does. And RS codes can be generalized to handle any number of redundant disks: as long as the failures are detectable, as many disks can die as there are redundant ones. --Gmaxwell 04:11, 22 September 2005 (UTC)
- and here, I found a citation [3]. --Gmaxwell 04:14, 22 September 2005 (UTC)
- I've corrected the nonsense in the page. --Gmaxwell 05:42, 22 September 2005 (UTC)
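To make Gmaxwell's description concrete, here is a minimal sketch of a P+Q scheme: P is the ordinary XOR parity (as in RAID 5) and Q is a second Reed-Solomon syndrome, Q = sum of g^i * D_i over GF(2^8). It assumes the field polynomial 0x11D with generator g = 2, a common choice that is not necessarily what any particular controller uses. It works byte by byte, only covers the case where two data blocks are lost, and all names in it are hypothetical.

```python
# GF(2^8) arithmetic with reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D)
# and generator g = 2 (assumed for illustration).
GF_POLY = 0x11D
EXP = [0] * 512  # EXP[i] = g**i, duplicated so LOG[a] + LOG[b] needs no modulo
LOG = [0] * 256  # LOG[g**i] = i (LOG[0] is unused)

value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= GF_POLY
for power in range(255, 512):
    EXP[power] = EXP[power - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all data blocks; Q = sum of g**i * D_i. Needs fewer than 255 data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coef = EXP[i]
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two_data_blocks(data, x, y, p, q):
    """Rebuild data blocks x and y (x != y) from the surviving blocks plus P and Q."""
    a, b = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        for j, byte in enumerate(block):
            a[j] ^= byte                  # ends as D_x ^ D_y
            b[j] ^= gf_mul(EXP[i], byte)  # ends as g**x * D_x ^ g**y * D_y
    gx, gy = EXP[x], EXP[y]
    # b ^ g**y * a = (g**x ^ g**y) * D_x, and g**x != g**y because x != y.
    dx = bytes(gf_div(b[j] ^ gf_mul(gy, a[j]), gx ^ gy) for j in range(len(p)))
    dy = bytes(a[j] ^ dx[j] for j in range(len(p)))
    return dx, dy


blocks = [b"RAID", b"six!", b"P+Q.", b"demo"]
p, q = compute_pq(blocks)
damaged = [None if i in (1, 3) else blk for i, blk in enumerate(blocks)]
print(recover_two_data_blocks(damaged, 1, 3, p, q))
```

Losing P or Q together with one data block falls back to the simpler single-syndrome paths, which are not shown. The byte-at-a-time loops are deliberately naive; the extra cost of Q that Gmaxwell mentions comes from these GF(2^8) multiplications.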
Meaning of RAID
On a slightly less technical issue, there seems to be some confusion over the possible meanings of the acronym RAID. At the start of the article it says that "Redundant Array of Inexpensive Disks" is simply incorrect. Later it becomes clear that it is the original meaning, and later still it appears that it may be correct after all. (As usual, I suspect this is a side-effect of multiple editors working on the page.) Could someone sort this out, please? Jon Rob 08:33, 17 November 2005 (UTC)
- The use of the word "incorrect" is incorrect, since the original paper was "A Case for Redundant Arrays of Inexpensive Disks (RAID)". Shawnc 14:37, 25 November 2005 (UTC)
- I also learnt that it was "Inexpensive" from work/manuals/the MSCE course and I was rather shocked to see it linked from google as independent (why I followed the link) so I'd like to see "Independent" removed or it changed to state that RAID is commonly called "Redundant Array of Inexpensive Discs." (Note that Array is not plural either - is this correct?) eps 13:58, 2 February 2006 (UTC)
Also, I don't understand (and thus it is a good point to add to the article) how RAID can actually be called "Independent", considering it's a backup system and the disks do anything but operate independently. Also, the term "Inexpensive" is the original term used by the guy who invented it. Saying that the term inexpensive is not used because hard drives are cheaper is moronic. The idea came about because SLED (Single Large Expensive Disk) was indeed expensive (and obviously still is) and offered no redundancy.[4]--TheWickerMan 02:42, 18 November 2005 (UTC)
The disks are considered independent in that they are more independent than would be several platters on the same spindle. That said, I'm not defending "independent" over "inexpensive" -- I'd prefer the original nomenclature as well. Just like a DVD is NOT a "Digital Versatile Disc". Ugh. Drives me nuts. --Stevestrange 22:56, 16 December 2005 (UTC)
RAID 100 diagram
There is an apparent controversy over the RAID 100 diagram. Here's what I think it should be, as per this revision:
                              RAID 0
               /-----------------------------------\
               |                                   |
             RAID 0                              RAID 0
      /-----------------\                 /-----------------\
      |                 |                 |                 |
    RAID 1            RAID 1            RAID 1            RAID 1
  /--------\        /--------\        /--------\        /--------\
  |        |        |        |        |        |        |        |
120 GB   120 GB   120 GB   120 GB   120 GB   120 GB   120 GB   120 GB
  A1       A1       A2       A2       A3       A3       A4       A4
  A5       A5       A6       A6       A7       A7       A8       A8
  B1       B1       B2       B2       B3       B3       B4       B4
  B5       B5       B6       B6       B7       B7       B8       B8
It has since been altered in subsequent revisions. I claim this revision is incomplete because a RAID 100 is definitely a stripe of a stripe of mirrors; that is, a stripe of RAID 10s, rather than a simple RAID 10. It was then (and is currently) deleted altogether. I think the diagram should be included for clarity and completeness, but I'd like not to get into an edit war over it. mAtt 21:07, 22 December 2005 (UTC)
- No additional input in five days; I'm reinserting the above diagram. mAtt 20:27, 27 December 2005 (UTC)
Raid 1.5
Raid 1.5 is not described properly. Raid 1.5 has mirroring, striping, and parity for two disks.
- Marketing gimmick. You can't mirror and stripe with only two disks. If you stripe half the disk and then mirror each block to the other disk, you just end up with mirroring with a different on-disk block order. As for parity with two disks, once you mirror you don't need to also add parity. The DFI documentation does not mention parity, only "stripping and mirroring simultaneously using two drives". Please show me a diagram of the on-disk block order showing how this is done that's not just RAID 1 with a different block order (the free disk space available with RAID 1.5 is the same as with RAID 1). It's only available on two DFI motherboards that are being phased out. BTW, nowhere on HighPoint's site can I find any mention of RAID 1.5.
- Avernar 18:43, 9 April 2006 (UTC)
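Avernar's argument can be checked with a toy layout. This sketch assumes one hypothetical way of "striping and mirroring" on two disks: the first half of each disk holds a RAID 0 style stripe, and the second half holds a copy of the blocks striped onto the other disk. It is an illustration only, not DFI's actual on-disk format; whatever the order, each disk ends up holding every block, so the usable capacity is one disk, as with RAID 1.

```python
def striped_mirror_layout(num_blocks: int) -> list[list[int]]:
    """Hypothetical two-disk "stripe + mirror" layout (illustration only).

    First half of each disk: a RAID 0 style stripe (even blocks on disk 0,
    odd blocks on disk 1). Second half: a mirror copy of the other disk's stripe.
    """
    disks: list[list[int]] = [[], []]
    for block in range(num_blocks):
        disks[block % 2].append(block)      # stripe
    for block in range(num_blocks):
        disks[1 - block % 2].append(block)  # mirror copy on the other disk
    return disks


layout = striped_mirror_layout(8)
for n, contents in enumerate(layout):
    print(f"disk {n}: {contents}")

# Every disk holds every block, just in a different order than plain RAID 1.
print(all(sorted(contents) == list(range(8)) for contents in layout))
```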
Plagiarism? RAID 1
The RAID 1 section has the same content as the RAID 1 section on http://www.aaa-datarecovery.com/raid_tutorial.htm. I'm not sure who copied from whom. --Westonmr 20:31, 16 January 2006 (UTC)
- I'm guessing the external site copied from Wikipedia. The WP page got to that particular revision slowly; if it had been copied and pasted there would have been an edit changing the entire RAID 1 section. There's a similar situation with the external page's RAID 0, JBOD, and RAID 5 sections, and arguably some others as well. I don't think there's a reason to change anything about WP's current revision, and I claim that aaa-datarecovery.com is being Bad. !mAtt™ 21:07, 16 January 2006 (UTC)
More types of RAID
I would like to know if 'spanned volumes' are the same as JBOD. --Ed Fraser
new Raid!!
- There is a new kind of RAID we've got to add!! It's RAD150!! Also known as 1+5+0! My ThunderRaid controller's onboard SATA does it! Its special feature is multi-nested levels: data is striped over distributed-parity arrays over mirrors! Very clever feature. Also popular with Linux experts!!!
- Why not add it then? The entire point of wiki is to allow new information to be added quickly by anyone, yes? There's already a section on nested RAID levels and one for Proprietary levels (not sure which this is). Although I'd advise you avoid the excessive exclamation marks on the actual wiki node or it'll just get deleted ;-) Darksidex 12:14, 26 July 2005 (UTC)
Listen. You need to update this RAID info, because I have a tech who says he knows it all from your website. But there are new types of RAID, like onboard RAID where the controller can use one hard drive, such as the MSI MS 6380 Promise 20265r onboard RAID. This guy keeps saying a RAID controller can only run 2 hard drives, and I told him mine runs one. It's called a single-drive striped array. It's for people like me whose 4 IDE cables are full but who have one more hard drive. So please update your info. I can run one, two, three, or four hard drives. I know this is not the 15 hard drives you're talking about, but the onboard RAID a motherboard would have.
- You can have an array of any number of volumes you like on a single drive, but from any practical point of view you like to name, it is utterly pointless. It reduces performance by a massive amount (because of dramatically increased access times), and also decreases reliability (because RAID volumes are always more fragile than a single volume). In fact, the only thing it increases is the cost. Tannin 09:38 20 Jun 2003 (UTC)
On another topic, the first sentence of the article reads, "The goal of a redundant array of independent disks (originally known as a redundant array of inexpensive disks) -- or RAID -- is to provide large reliable virtual disks that can be much larger than commonly available disk drives."
After "much larger", I would add the words, "and more reliable".
--Gil Dawson
Discussing the characteristics of RAID 1, "One write or two reads possible per mirrored pair. Twice the read transaction rate of single disks, same write transaction rate as single disks."
I wonder if the terms "read" and "write" might not be backwards? Seems that one read would be sufficient, while two writes would be required.
--Gil Dawson
No, it's correct that reading is twice as fast and writing is about the same speed. The reason is that the drives are capable of working simultaneously. Thus the same write can happen on both drives at the same time. Reading, though, gets a speed boost because it can be distributed, allowing each drive to read different parts of a file simultaneously and combining the parts in memory.
--Rev. Johnny Healey
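A toy model of this point, assuming an n-way mirror in which each disk services one request per time slot: a read can go to any copy, so it is sent to the least busy disk, while every write must be performed on all disks. The function name and request format are hypothetical, and real controllers schedule reads more cleverly than this.

```python
def slots_needed(requests: list[str], n_mirrors: int = 2) -> int:
    """Count time slots needed to serve a list of requests on an n-way mirror.

    Each disk handles one request per slot. A "read" goes to whichever disk has
    the least queued work (any copy will do); a "write" must hit every disk.
    """
    busy = [0] * n_mirrors  # slots of work queued on each disk
    for req in requests:
        if req == "read":
            idx = busy.index(min(busy))
            busy[idx] += 1
        elif req == "write":
            busy = [b + 1 for b in busy]
        else:
            raise ValueError(f"unknown request type: {req!r}")
    return max(busy)


print("100 reads :", slots_needed(["read"] * 100))   # 50: reads are split across both disks
print("100 writes:", slots_needed(["write"] * 100))  # 100: same as a single disk
```

This ignores seeks and bus transfers; it only shows why "twice the read transaction rate, same write transaction rate" falls out of mirroring.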
Would RAID-53 count as one of the RAID levels? Not too sure... the searchstorage RAID definition includes RAID-53. Applegoddess 04:42, 9 Jun 2004 (UTC)
Style
The various levels of RAID would be easier to understand if the main article used 1+5+0 et cetera rather than 150. 150 could easily be confused with something else that uses the number 150, and wouldn't imply the nesting/adding of levels. Last Avenue 01:02, 27 January 2006 (UTC)
Am I the only person who thinks the jump into the rather terse discussion of Reed-Solomon code in the section for RAID 6 is jarring? It doesn't seem consistent with the other sections. Surely it's useful information, but perhaps not at the beginning of the section.
- For the numbering style of nested arrays, I believe that the heading for RAID 50 was designed best. It provided the more common single number as the main term, but kept the 5+0 style for ease in understanding the semantics.
- As for the Reed-Solomon discussion, I agree that article went from a fairly basic discussion of RAID technologies and suddenly plunged into code and number theory. Perhaps moving these two paragraphs to the end of the RAID 6 section would feel less jarring. Alternatively, a Coding Theory section or moving the Reed-Solomon paragraphs into the History section may be more suitable. Have the History go through disk-mirroring and pre-RAID ideas, then move into discussing classical RAID technologies with parity and striping, and finish the History section with "modern" approaches like RAID 6 and nested arrays, using the coding / number theory to show reasons for these new arrangements. (I actually like this best, especially considering that the Reed-Solomon paragraphs begin by considering the RAID 5 paradigm and extend it to RAID 6)
- Pyth007 18:31, 13 April 2006 (UTC)
JBOD
JBOD -- this is, as far as I know, just a bunch of disks, not necessarily a concatenation. There are many RAID controllers available whose vendors claim they are able to do "RAID 0, RAID 1, RAID 5 and JBOD" or so, and they mean that the RAID controller can act as a non-RAID hard disk controller, giving access to each of the disks individually. There are many wrong definitions of this term out there, and JBOD may be used for concatenation, too, but I think the term is much more unclear than stated in the article. Right? Kruemelmo 15:17, 30 January 2006 (UTC)
- No, JBOD is concatenation. Running a RAID controller in non-RAID mode as a normal controller doesn't have a special term.
- Avernar 19:02, 9 April 2006 (UTC)
- Citation, please? My experience, too, is that JBOD is most commonly used to apply to a bunch of drives hanging in a RAID-style rack shelf, but which are just on a controller channel and being used for non-RAID purposes. The configuration the article currently mentions is really a longitudinal stripe, and therefore probably a variant of RAID 0.
--Baylink 17:42, 5 September 2006 (UTC)
- Trying to find a good citation on the net about JBOD is a pain in the butt. The best info seems to be in the newsgroups (usenet). The info I have found basically says that when people talk about a raid controller card and what modes it supports JBOD means spanning. When people talk about storage array boxes like SANs it can also mean non-RAID/non array disks. And the high end raid card manufacturers use the second meaning (since they also do DAS/NAS/SANs). No wonder there's so much confusion about what JBOD means: it has two meanings.
- We can change the section title to Spanning/Concatenation and put in the text that it's usually called JBOD and make another section for JBOD as well. Or we can put both meanings in the same section. Which one sounds less confusing?
- Technically there's nothing wrong with our description as a single disk is just a span of 1 disc. Looks like some storage boxes use this definition as well as you can have individual disks or concatenation in their JBOD mode. Interesting piece of information I found: someone mentioned that when he was using JBOD mode on his controller the drives showed up as RAID arrays of 1 disk each, but when he turned RAID off they showed up as normal SATA drives.
- Avernar 09:18, 28 October 2006 (UTC)
The article is giving people too much hope when it comes to recovery after a failed disk in JBOD. The data on the other disks will be there, but the file system will most likely be a mess, so it may be very difficult to recover. KNaranek
Regardless of which definition is correct, there should be some definition included, or the redirect corrected, as currently JBOD redirects to this article, and yet the article makes no mention of the term.--Tmetro 17:58, 25 December 2006 (UTC)
Who introduced the incorrect information about JBOD?!
There is now an entire section of this article which is incorrect.
JBOD is not a RAID mode -- it is the opposite. It literally means the disks are not acting as a set -- not even a linear concatenation.
Also, concatenation is a RAID mode -- it is just a special case of striping where the width is the same as the volume.
Is there some debate on these points or can it be changed immediately?
- When talking about RAID modes JBOD usually means concatenation (spanning). Yes, it's not an official raid mode. It's not redundant but neither is RAID 0.
- When talking about storage in general, like SANs for instance, JBOD can refer to the discs as not being part of an array. A lot of people who deal with big data storage use the term this way.
- Concatenation is not a RAID mode. It is not a special case of striping. You can't set up a RAID controller in RAID 0 mode with a block size that big. Even if you could, it wouldn't be the same on-disk layout as JBOD with different size disks. The size of the smallest disk would be your first set of stripes, and then the remaining space would be striped next if the controller can do it; otherwise it would be wasted. JBOD doesn't waste space with disks of different sizes.
- I don't believe that the section should be replaced. Perhaps we should rename the section to Concatenation/Spanning and put in the text that this is called JBOD when people talk about raid modes.
- Avernar 06:33, 28 October 2006 (UTC)
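Avernar's capacity argument about concatenation versus striping on mismatched disks can be sketched as two address-mapping functions. Assumptions: sizes are counted in blocks, the stripe unit is one block, and the RAID 0 variant simply ignores space beyond the smallest disk (controllers differ here). The function names and disk sizes are hypothetical.

```python
def concat_map(sizes: list[int], lba: int) -> tuple[int, int]:
    """Concatenation/spanning: fill disk 0, then disk 1, and so on. Returns (disk, offset)."""
    for disk, size in enumerate(sizes):
        if lba < size:
            return disk, lba
        lba -= size
    raise IndexError("logical block beyond end of span")


def stripe_map(sizes: list[int], lba: int) -> tuple[int, int]:
    """RAID 0 with a one-block stripe unit, using only the smallest disk's size on each disk."""
    n = len(sizes)
    if lba >= n * min(sizes):
        raise IndexError("logical block beyond end of stripe set")
    return lba % n, lba // n


sizes = [100, 250, 400]  # hypothetical disk sizes in blocks
print("span capacity  :", sum(sizes))
print("stripe capacity:", len(sizes) * min(sizes))
print([concat_map(sizes, lba) for lba in (0, 99, 100, 349, 350)])
print([stripe_map(sizes, lba) for lba in (0, 1, 2, 3, 299)])
```

With these sizes the span offers 750 blocks and the stripe set 300; the difference is the wasted space described above (a controller that stripes the leftover space separately, which this sketch does not model, would recover some of it).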
- JBOD does not mean spanning or concatenation. JBOD means 'just a bunch of disks'. Any industrial RAID controller I have used (and I work with them every day in R&D) will simply provide a single physical disk if you assign it as a JBOD.
- Baz whyte 19:57, 26 February 2007 (UTC)
Redundant Array of Independent DIMMs
How about a new page (with appropriate disambiguation) for memory RAID? Not much information out there on it, but it is available and fairly easy to configure on, for instance, a Dell PowerEdge:
http://support.dell.com/support/edocs/systems/pe6850/en/it/h2147c60.htm#wp1081442
(Scroll down to Memory RAID Support.)
The question being: is R.A.I.D. (D now standing for DIMM, just to make matters confusing) the correct acronym?
--64.139.53.163 19:11, 2 March 2006 (UTC)
- Is this technique limited to DIMMs specifically, or can it be implemented with memory modules in general? I propose RAIMM. Then you could have a RAIMM for your RIMM RAM, or a Redundant Array of Inexpensive RAM. It being RAIR to use inexpensive RIMM, let's just call it RAR. Marketing will be on that like a tiger. Is there room for ROM? That wouldn't be a ROAR, but a ROAM. I've obviously neglected REM for too long; kindly leave the room. --Markzero 15:31, 27 April 2006 (UTC)
- "Seeing as how the VP is such a VIP, shouldn't we keep the PC on the QT? Because if it leaks to the VC, you could end up an MIA, and then we'd all be put on KP."
--Baylink 17:46, 5 September 2006 (UTC)
[edit] Matrix Raid
I don't get this sentence:
[quote] "Currently, most (all?) of the other cheap RAID BIOS products only allow one disk to participate in a single array." [/quote]
Surely that should be the other way round, no? i.e. allow a disk to participate in only a single array?
- It depends on the intention of the author as to what the word "only" describes (as well as which term gets emphasized by its singularity). As quoted, the sentence means that there are several different allowances (i.e. RAID cards have different ways of utilizing disks inside of arrays), but that the RAID controllers allow only one of these methods to be used. In the alternate phrasing (i.e. "... allow a disk to participate in only a single array"), the "only" describes the array, and so its semantics are geared more toward the idea of the RAID controllers allowing only one array, as opposed to a disk being shared among multiple arrays. In fact, the sentence could be changed to "... RAID BIOS products only allow one disk to participate in only a single array", which gets across both meanings; however, the repeated word would actually detract from readability. Since the semantics of the sentence vary only slightly with the placement of "only", I'd say "don't fix what ain't broke". (The only real difference the placement of "only" would make is if it were used to describe the disk, as in "... allow only a disk to participate in a single array". Here it sounds as though the controllers are allowing a disk (as opposed to other forms of media) to participate in an array; clearly this would be a case where the sentence's meaning differs from what the author intended.)
- Pyth007 19:22, 13 April 2006 (UTC)
[edit] Not restriping disks?
If I have disks A, B, C, where A and B are for data and C is for parity (something like RAID 4), and A, B and C are full, and I add new disks D and E but do not wish to "resize" A and B:
Can I not simply start writing to D and E and modify the parity on C so that C = parity(A, B, D, E)? If parity is XOR, then simply XORing C with the new data on D and E should suffice (you may have to maintain a pointer recording how much of C is old parity and how much is new as data arrives on D and E).
In other words, one may not need to "restripe" everything? Is this done somewhere? It would imply treating the "disks added" as a bunch across which there is a stripe.
-alokdube@hotpop.com
>>>>> With present CPU speeds, software RAID can be faster than hardware RAID, though at the cost of using CPU power which might be best used for other tasks
This is totally unsubstantiated. I am unaware of a hardware raid solution that doesn't operate at the maximum hardware I/O speeds. Provide a reference to substantiate this claim.
See: http://www.redhat.com/docs/manuals/linux/RHL-9-Manual/custom-guide/s1-raid-approaches.html
"With today's fast CPUs, Software RAID performance can excel against Hardware RAID."
>>>>>
--> Wouldn't matter, would it? As long as one can "plug off the BOD and put it elsewhere" and not worry, whether it is software or hardware should not matter. However, in both cases "not restriping" is a benefit.
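A minimal sketch of the XOR idea above, assuming the new disks D and E are the same size as A and B and are zero-filled before use, so the existing parity on C already covers them. The helper names `parity` and `write_byte` are hypothetical; with disks that are not zeroed, something like the pointer suggested above would be needed to track which parity regions already include D and E.

```python
from functools import reduce
from operator import xor


def parity(*disks):
    """Byte-wise XOR parity across equally sized disks."""
    return [reduce(xor, column) for column in zip(*disks)]


def write_byte(data_disk, parity_disk, index, new_value):
    """Write one byte and patch parity without reading the other data disks.

    XOR parity allows an in-place update:
    new_parity = old_parity ^ old_data ^ new_data
    """
    parity_disk[index] ^= data_disk[index] ^ new_value
    data_disk[index] = new_value


A = [0x11, 0x22, 0x33, 0x44]
B = [0xA0, 0xB0, 0xC0, 0xD0]
C = parity(A, B)            # existing RAID 4 style parity disk

D = [0] * len(A)            # new disks, assumed zero-filled
E = [0] * len(A)
assert C == parity(A, B, D, E)   # zeros do not change XOR parity

write_byte(D, C, 0, 0x5A)   # new data lands on D and E only
write_byte(E, C, 3, 0x7F)
assert C == parity(A, B, D, E)   # C now protects A, B, D and E
```

Any single lost disk can then be rebuilt by XORing the survivors, e.g. `parity(A, C, D, E)` gives back B.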
[edit] X-RAID
This writing is definitely NOT up to the quality required to post to the article (I am extremely tired right now, which does not help my writing ability); could someone please proofread and post to the main article (or at least provide some suggestions on reworking it when I'm more awake)? Thanks! 70.92.160.23 05:24, 24 March 2006 (UTC)
Infrant Technologies' X-RAID is a RAID method that allows for the dynamic expansion of existing disk volumes without losing any data already present on the volume. It is currently limited to a maximum of 4 drives per RAID volume, but certain OEM products utilising custom Infrant boards have room for up to 7 drives (although the X-RAID mode will still only work with a maximum of 4 drives per volume, work is being done to increase the maximum number of drives per volume). X-RAID utilises a proprietary extension to regular Linux volume management and runs using the Ext3 filesystem. This means that X-RAID volumes can be mounted by a standard Linux installation, even when connected to a non-Infrant device. When two drives are installed in an X-RAID device, the X-RAID system runs in a redundant RAID 1 mode; with three or more drives, the device runs in a mode similar to RAID 5. X-RAID allows for the automatic expansion of volumes by replacing all disks in a volume with larger-sized disks. Once all disks are replaced, the volume automatically expands to fill the newly available space. It also allows for starting with only one disk and adding disks on-the-fly without losing any of the data already existing on the volume. Technical details about the X-RAID system are currently unavailable due to a pending U.S. patent filing.
[edit] Merge proposal
Since spanned volumes are AFAIK basically Windows NT's built-in software RAID concatenation/JBOD, I don't see any reason to have a dedicated article, any more than we have dedicated articles for striped volumes and mirrored volumes. Of course JBOD/concatenation is not RAID, but it's discussed here, as it probably should be, because this is the most suitable article. Having said that, we probably should add a bit more about Windows NT's built-in software RAID/dynamic disks since it's not really discussed here... Nil Einne 20:16, 29 March 2006 (UTC)
- I've had another thought. Alternatively, we could very briefly mention Windows NT dynamic disks in the article and perhaps under RAID 0, RAID 1 and JBOD and make a dynamic disks article to discuss them in detail (since this article is getting rather big) Nil Einne 20:20, 29 March 2006 (UTC)
I don't think it should be merged. This article is already pretty big. Definitely make it a prominent link, (or a section, with a "main article" link, and brief descriptions here), but there is too much information in the raid levels article to jam it all in here. Kaldosh 08:39, 18 March 2007 (UTC)
[edit] This article should not be merged with spanned volume.
This article should not be merged with spanned volume. While some forms of non-fault-tolerant RAID might relate to spanned volume, spanned volume is not fault tolerant. Do not merge the articles.
- I agree. It's also a popular search term (RAID <whatever>), and trying to dig info about RAID specs out of another article would be annoying.
- ~ender 2006-04-10 02:32:AM MST
[edit] RAID 0 typo?
"for example, if a 120 GB disk is striped together with a 100 GB disk, the size of the array will be 200 GB"
Should that be 220 GB? Or is there some kinda storage space loss that wasn't posted? 10% loss seems like a lot of space to lose. Ghostalker 23:37, 3 April 2006 (UTC)
- No, it's correct. The reason for the storage space loss was posted; the sentence before the example reads: "A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk". It's just a design decision for performance. The last 20 GB would not be striped. Almost all hardware RAID implementations do this. The Linux software RAID 0 driver will use the remaining space (with three disks you'd end up with striping across three disks for the size of the smallest disk, then striping across two disks up to the size of the middle disk, and then just normal storage up to the end of the largest disk). Not sure what other Unix-like OSes do.
- Avernar 18:55, 9 April 2006 (UTC)
It is correct that a 120GB drive and 100GB drive in RAID 0 will yield 200GB. @Avernar - I am not aware of any Linux software RAID drivers being able to use the remaining space of the larger drive in RAID 0. Please document or link to the correct product and version/implementation. Thanks. As far as I know, Windows XP cannot use the extra space on the larger drive in any way. Freedomlinux 20:21, 25 August 2006 (UTC)
- It's not done automagically by the Linux software RAID drivers in the kernel, but you can configure this manually. Like I have done: 120+120+200: 3*100MiB /boot RAID1 + 3*1GiB swap + 3*119GiB / RAID5 (238GiB) + remaining 80GiB windoze. Amedee 12:43, 18 October 2006 (UTC)
- It's in the raid0.c file. Took me a little bit to find an article about it. Go to RAID0 Implementation Under Linux. Search for "stripe zones". Avernar 08:24, 28 October 2006 (UTC)
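The arithmetic behind this thread, as a small Python sketch (sizes in GB). The "zones" function is a simplified illustration of the behaviour Avernar describes, not a copy of raid0.c, and it ignores chunk-size rounding; the function names are hypothetical.

```python
def smallest_disk_capacity(sizes):
    """Capacity when each disk contributes only as much as the smallest one."""
    return min(sizes) * len(sizes)


def stripe_zones(sizes):
    """Split unequal disks into zones, in the spirit of Linux md raid0.

    Returns (disks_in_zone, size_per_disk) pairs: the first zone stripes
    across all disks up to the smallest size, the next across the disks that
    are still larger, and so on. A zone of width 1 is just plain storage.
    """
    zones = []
    covered = 0
    ordered = sorted(sizes)
    for i, size in enumerate(ordered):
        if size > covered:
            zones.append((len(ordered) - i, size - covered))
            covered = size
    return zones


def zoned_capacity(sizes):
    return sum(width * depth for width, depth in stripe_zones(sizes))


print(smallest_disk_capacity([120, 100]), zoned_capacity([120, 100]))  # 200 220
print(stripe_zones([100, 120, 200]))  # [(3, 100), (2, 20), (1, 80)]
```

So the 200 GB in the article matches the "smallest disk" rule most hardware controllers follow, while the zoned approach recovers the last 20 GB, but only as unstriped storage.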
[edit] Independent vs Inexpensive: Beginning of article, gross error
"In Raid 0 there is no fault tolerance; therefore, each disk in the system is independent and there is no increase in performance."
Excuse me? There is no increase in performance? And clearly this is not a typo. People editing should at least know what they're talking about; this is pollution!
Also, this whole paragraph about Independent vs Inexpensive is not consistent at all. It talks about data being independent (loosely defining the meaning of the usage) and claiming RAID 0,1,2 fit the definition while 4,5 (where the author says "stripping and stripping + parity", like RAID 0 wasn't striped!) would not fit the "independence" definition.
Contrast the logic: The first part of the paragraph loosely says RAID 1 and 2 are fault tolerant, therefore called independent, and later we have, "In Raid 0 there is no fault tolerance; therefore, each disk in the system is independent".
This paragraph seems like a mess to me, I would flush it.
[edit] Independant?!
Sorry but WHAT THE HELL ARE YOU ON ABOUT???!
RAID arrays are entirely dependent on every disk in them whilst they are functioning - no matter what RAID level is being used. If the array were to fail, then the disks would be independent, but they are certainly NOT when the array is functioning. The correct definition of the term RAID is most definitely Redundant Array of INEXPENSIVE Disks. Period.
- I'm no expert (I came to this page to learn about it), but a quick internet search shows that 'independent' is a very common interpretation of the 'I'. Also, that section in the article made sense to me: in RAID 1 and 2, multiple independent copies of data are stored: one can be lost without affecting the other; they are independent. The other raids treat the hard drives as one continuous medium, and so cannot logically be separated. Then again, I really have no clue about RAID... BlankAxolotl 01:22, 26 April 2006 (UTC)
- The drives are independent in the sense that each drive has its own motor, its own low level controller and is generally separate from the rest of the drives (unfortunately they do often share a common power supply though). Yes, all drives are used by the raid controller, but they are not dependent on each other. Plugwash 15:46, 26 April 2006 (UTC)
- Then how does this line from the article fit in: "This is because these RAID configurations use two or more disks acting as one; there is nothing independent about RAID levels 0, 4 and 5." (edit of original post:) Actually, here is where I think the confusion is: we can talk about independence for either access time or for redundancy of data. If we are talking about access time, we mean that different disks can be accessed independently, at the same time. If we are talking about redundancy, we mean that the data on one disk is independent of the data on the others: one disk can fail, and the data on the other disks still makes sense. (In raid 0, 4, 5 losing one disk makes the data on the other disks no longer make sense: we might have only portions of certain files (as I understand from the article).) The article seems to alternately use these two definitions. I think that whole section is confusing and needs to be reworked, but it is not my place to do that. Maybe just remove it and just say 'some call it independent, others inexpensive'. BlankAxolotl 18:51, 26 April 2006 (UTC)
- This section of the talk page is definitely redundant :-) (I'm not the one who wrote it, but I wrote the one just above ("independent blabla gross error").) Read this. The quick version is this: the section I'm referring to in the article is crap; like I said above, I would flush it.
- Ya, at first I thought it made sense, but then I realized that it had only confused me, so I had to correct myself. Anyway, since the people who actually know about raid aren't doing anything, I went ahead and deleted that section. (You seem to know about raid. You should change things you see are wrong! People can always change it back.) It wasn't an interesting section anyway (I skipped it when I first read the article). BlankAxolotl 16:21, 27 April 2006 (UTC)
- I'm not sure how the opening of this thread can get away with this, nor how the article can continue to report the acronym as "inexpensive" disks. Does that mean I can only "RAID" inexpensive disks? What if I want to "RAID" costly brand new 750GB drives? Or what if I want to RAID (in software, for example) very costly 2TB hardware RAIDs? "Independent" is correct, and the article should be changed to reflect this. "Independent" refers to the disk normally being recognized by the system on its own, but some controller is now going to use that otherwise independent drive in an array. Thus, Redundant Array of Independent Disks. It's an array... of disks... which would otherwise be independent. So it's correct.
- The history of the term RAID goes back to Berkeley (where it was created); they intended it to be "inexpensive" (even in the acronym), but modern usage has changed RAID so that it's overwhelmingly "independent". The wiki opening should probably reflect this change.
- Correct: in the original article (1988) RAID drives were inexpensive, as compared to SLEDs (Single Large Expensive Drives). Even your costly brand new 750GiB drives are inexpensive compared to a SLED. Amedee 12:53, 18 October 2006 (UTC)
- A real argument could be that RAID 0 isn't redundant, but we all understand that at some point RAID became a verb and a noun, so we can accept RAID 0 as a RAID level.
- jclauzet - Jun. 30, 18:51:13 UTC
- The opening sentence reports both meanings and identifies which came first. A few paragraphs below there's some more on the topic. RAID 0 isn't redundant, which is noted at the beginning of the RAID 0 section. Aluvus 19:23, 30 June 2006 (UTC)
- Sigh, I got more caught up in the thread than the facts. You're right. The thread poster was so passionate I got caught up. It's not hard to read about RAID's Berkeley roots on the web. I'm not the right person to add it to the history section, but it seems a good idea.
[edit] Benchmarks
Can anyone find benchmarks to support the claims that RAID 0 is faster? Some authors say it's not.
- I can't supply benchmarks, but the basic principle of RAID 0 allows for dramatically increased performance (up to N * 100%, where N is the number of disks in the array) both reading and writing, because each disk in the striped set can do its own read and write operations simultaneously. So if you have 3 disks striped together, you can theoretically read 3 times the data in one time segment as a single disk. It's up to the controller to sequence the data correctly so that it's consistent and correct to the system performing the I/O. — KieferSkunk (talk) — 23:01, 2 March 2007 (UTC)
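An idealized model of the "up to N times" claim above, as a Python sketch. It assumes every disk streams at the same rate in parallel with no seek, bus or controller overhead, so it only gives the theoretical upper bound, not a benchmark; the 60 MB/s rate and 64 KiB chunk size are arbitrary assumptions, and `raid0_read_time` is a hypothetical name.

```python
def raid0_read_time(request_bytes, disks, chunk_bytes=64 * 1024, disk_mb_per_s=60):
    """Idealized seconds to read one contiguous request from a RAID 0 set.

    Chunks are dealt round-robin to the disks, every disk streams at
    disk_mb_per_s in parallel, and there is no seek, bus or controller
    overhead, so the most heavily loaded disk sets the finish time.
    """
    chunks = -(-request_bytes // chunk_bytes)  # ceiling division
    per_disk = [0] * disks
    for c in range(chunks):
        size = min(chunk_bytes, request_bytes - c * chunk_bytes)
        per_disk[c % disks] += size
    return max(per_disk) / (disk_mb_per_s * 1_000_000)


one = raid0_read_time(300_000_000, disks=1)
three = raid0_read_time(300_000_000, disks=3)
print(f"1 disk: {one:.2f} s, 3 disks: {three:.2f} s, speed-up {one / three:.2f}x")
```

Under the same model, a request smaller than one chunk lands on a single disk, so small random reads see no bandwidth gain from striping; that is one reason benchmarks of RAID 0 can disagree depending on the workload.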
[edit] New diagrams are Wrong
Sorry, Fadookie, but I think your nice new diagrams are slightly wrong/incomplete. (note I am no expert on raid, I came here to learn about it). The raid 0 and 1 diagrams seem good.
However, the raid 3 diagram is wrong because it needs to have the 'B' blocks, like in the old text diagrams. So A10 A11 A12 become B1 B2 B3. Additionally it would be nice if the 'P' secions would somehow indicate which parity data they hold (as they do in the old text diagrams). I would also change the wording of the accompanying text to "A simultaneous request for any B block would have to wait." from what it is. (It's better to wait 'til the diagrams are there, in case they change the numbering)
The raid 4 diagrams are wrong because they do not emphasize how the parity block for the 'A' blocks is on a separate disk from the 'A' blocks (and the same for the other letters). Also, I do not know about raid, but the old text diagrams also seem to indicate that the parity A block has parity covering all three other disks. If the A blocks are meant to be continuous, it also implies that even continuous data is split up among the disks. As I say, I don't know how it works; safest is to do it just like in the original text diagrams (where A is split up).
The raid 5 and 6 diagrams again don't show which parity block is for which data, and don't split up the A blocks (and other letters) like the text diagrams do. I would again assume the text diagrams are right, and do just like them.
I haven't really looked at the rest of your diagrams. Raid 1.5 also looks wrong, but I didn't look carefully. Raid 1+0 and 10 seem OK, but better double-check. I have no clue for the double parity one.
Although the diagrams are a little bit wrong, they do look very nice!
BlankAxolotl 00:36, 26 April 2006 (UTC)
- I didn't make the diagrams, I just grabbed them off wikimedia commons. I think they were made by someone on the German Wikipedia.
- How about the ones here? Talk:RAID#Rework_Diagrams.3F
- -Fadookie Talk 11:34, 27 May 2006 (UTC)
- I quite like the look of the diagrams currently on the page. I can probably produce a set of diagrams matching that visual style but conforming to the information in the ASCII art versions, if we're settled that the ASCII renderings are correct. I'm not familiar enough with the ins and outs of some of the RAID levels to make corrections. Aluvus 10:01, 28 May 2006 (UTC)
- Done for some. Still need a little retouching, but I think they still communicate things a lot more clearly than the ASCII versions. I have added them in, revert if you object. Aluvus 04:20, 8 June 2006 (UTC)
- They look really good (better than the original!), and are now correct I think. Maybe the thumbs are a little too small, but I don't think you have control over that. Great! BlankAxolotl 02:03, 16 June 2006 (UTC)
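For checking diagrams like these, a small Python sketch that prints a RAID 4 or RAID 5 layout with each parity block labelled by the stripe it protects ("Ap" covers A1, A2, A3, and so on). RAID 4 keeps all parity on the last disk; for RAID 5 it assumes one simple rotation in which parity moves one disk to the left on each stripe. Real implementations use several different rotation orders, and `raid_layout` is a hypothetical helper.

```python
import string


def raid_layout(level, disks, stripes):
    """Return one row of block labels per stripe for a RAID 4 or RAID 5 array."""
    rows = []
    for s in range(stripes):
        letter = string.ascii_uppercase[s]
        if level == 4:
            parity_disk = disks - 1                 # dedicated parity disk
        elif level == 5:
            parity_disk = disks - 1 - (s % disks)   # parity rotates leftwards
        else:
            raise ValueError("only RAID 4 and RAID 5 are modelled")
        row, n = [], 1
        for d in range(disks):
            if d == parity_disk:
                row.append(letter + "p")            # parity for this stripe
            else:
                row.append(f"{letter}{n}")          # data block n of this stripe
                n += 1
        rows.append(row)
    return rows


for row in raid_layout(5, 4, 4):
    print("  ".join(f"{block:>3}" for block in row))
```

In both layouts each stripe's data blocks sit on different disks from that stripe's parity block; the only difference is whether the parity stays on one disk (RAID 4) or moves around (RAID 5).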
[edit] Introduction: Too Long
I think the introduction paragraphs need to be trimmed down, moving the extra information to appropriate sections. The first two paragraphs alone are enough to introduce the topic, the rest would flow better in appropriate sections rather than prominently at the top. What do you think? DanDanRevolutiontalk 15:13, 4 May 2006 (UTC)
[edit] Confusion about confusion
From the 0+1 section,
"A RAID 0+1 (also called RAID 01, though it shouldn't be confused with RAID 10)"
The likely confusion is with RAID 1, not RAID 10, surely?
- This is probably the case, though I'm not the author. -DanDanRevolution 19:05, 23 May 2006 (UTC)
While it is possible that users could mistake RAID 01 for the longhand version of RAID 1, RAID 01 is also VERY easily confused with RAID 10.
Example: assuming you have 10 hard drives, RAID 01 means making 2 RAID 0 arrays, then mirroring the RAID 0 arrays with RAID 1.
You end up with 2 RAID 0 arrays of 5 disks each, then mirror the two sets of 5 together with RAID 1 to get a RAID 01 volume.
RAID 10 means making 5 RAID 1 arrays, then striping them together with RAID 0.
You end up with 5 RAID 1 arrays, each containing 2 drives. Then, the 5 RAID 1 arrays are striped with RAID 0 to yield a single RAID 10 volume with 10 disks.
I am sorry if this is not exactly that clear, but I am tired and the concept is also slightly confusing. See http://www.pcguide.com/ref/hdd/perf/raid/levels/mult.htm
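One practical reason the distinction matters, sketched in Python for the 10-drive example above: count which of the 45 possible two-disk failures each layout survives. It assumes that in RAID 0+1 a single failed disk takes its whole RAID 0 half offline, which is the usual behaviour, though some implementations may be smarter; the helper names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, pairs):
    """RAID 1+0: data is lost only if both disks of some mirror pair fail."""
    return not any(set(pair) <= failed for pair in pairs)


def raid01_survives(failed, halves):
    """RAID 0+1: any failure takes its whole RAID 0 half offline, so data
    survives only while at least one half has no failed disk."""
    return any(not (set(half) & failed) for half in halves)


disks = range(10)
pairs = [(2 * i, 2 * i + 1) for i in range(5)]        # five 2-disk mirrors, striped
halves = [tuple(range(0, 5)), tuple(range(5, 10))]    # two 5-disk stripes, mirrored

two_disk_failures = [set(c) for c in combinations(disks, 2)]
ok10 = sum(raid10_survives(f, pairs) for f in two_disk_failures)
ok01 = sum(raid01_survives(f, halves) for f in two_disk_failures)
print(f"RAID 10 survives {ok10}/45, RAID 01 survives {ok01}/45 two-disk failures")
```

Both layouts use the same 10 disks and give the same capacity, but RAID 10 loses data only when both members of one mirror pair fail (5 of the 45 cases), whereas RAID 01 loses data whenever the two failures fall in different halves (25 of the 45 cases).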
[edit] Good Article nomination has failed
The Good article nomination for RAID/archives/2007-10-19 has failed, for the following reason:
- It violates the "well written" requirement outlined in WP:WIAGA. It's currently disorganized. The introduction is too long, the images have conceptual errors, and there is an outstanding {{cleanup}} tag. DanDanRevolution 05:19, 22 May 2006 (UTC)
[edit] What is "Advanced Data Guarding" ?
ADG links to Advanced Data Guarding which redirects to RAID. Does Wikipedia:Redirect require us to mention it at the beginning of the RAID article?
http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1027532,00.html claims that RAID 6, and Advanced Data Guarding (RAID_ADG), and diagonal-parity RAID are the same thing.
But I know that RAID 6 and diagonal-parity RAID, while very, very similar, are not exactly the same. RAID-6 calculates the Q block from the same row of blocks that it calculates the parity P block (but with a different function). Diagonal-parity calculates its 2nd parity per row using the same function (XOR) that the row-parity P block is calculated with, but using a "diagonal" set of blocks.
So ... what is Advanced Data Guarding? Is it exactly the same as RAID 6, but with a catchier name? If not, how is it different? -- User:DavidCary --70.189.73.224 03:14, 30 May 2006 (UTC)
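For reference, a Python sketch of the row-based P+Q scheme described above, in the form used by Linux md RAID 6: P is plain XOR of the row, and Q weights data block i by g^i, with generator g = 2 in GF(2^8) and reduction polynomial 0x11D. One byte per disk is used for brevity. Whether HP's ADG uses exactly this maths is not something the sketch can answer, and diagonal-parity schemes are not modelled.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return product


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse by brute force (fine for a 256-element field)."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def row_pq(row):
    """P = XOR of the row; Q = XOR of g**i * D_i with generator g = 2."""
    p = q = 0
    for i, d in enumerate(row):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_from_q(row, missing, q):
    """Rebuild data byte `missing` from Q alone (e.g. if P is also lost)."""
    partial = q
    for i, d in enumerate(row):
        if i != missing:
            partial ^= gf_mul(gf_pow(2, i), d)
    return gf_mul(partial, gf_inv(gf_pow(2, missing)))


row = [0x12, 0x34, 0x56, 0x78]      # one byte from each of four data disks
p, q = row_pq(row)
damaged = [0x12, 0x34, None, 0x78]  # disk 2 lost
print(hex(recover_from_q(damaged, 2, q)))  # 0x56
```

The point of the different function is that Q gives a second, independent equation over the same row, so any two lost blocks can be solved for; a diagonal scheme gets its second equation by XORing a different set of blocks instead.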
[edit] assorted comments:
-- i think it would be helpful to establish up front that some raid terminology has entered common usage in ways that may not be strictly correct and to then use this idea as a structural element throughout the rest of the article. that is, for each topic make a point of treating both the correct and pop/marketing meaning of each term. that opens up the question of what is "correct." for starters, let me suggest "as specified in the standards documents" or as "empirically demonstrable".
an example of something "empirically demonstrable" would be the idea that not all "hardware raid" implementations are implemented in hardware to the same degree. some simple mirroring or striping controllers implement all logic in hardware, some sophisticated controllers run embedded raid software on a general purpose cpu, sometimes with a coprocessor for parity. there may not be standards defining this stuff but many raid controllers have diagnostics that will tell you what's inside so you can check this.
-- some empirical data regarding performance and reliability of various raid levels might be of interest as well. calculations are very handy things, but the proof is in the pudding.
-- my original understanding of the term JBOD was that it meant just that. Just a Bunch of Drives, working independently. no concatenation, no nuttin! at that time folks were referring to concatenations as "cats" and jbods as jbods. since then i've seen the term jbod used to refer to cats, which i consider incorrect and confusing. this is another area where discussing the strict and popular usage of a term would be helpful.
-- contrary to the article and in agreement with a previous comment, AFAIK raid 0 does not provide redundancy but does provide a performance benefit (all else being equal), which is the advantage it has over a concatenation. performance and reliability considerations are sometimes independent of each other.
-- as one would expect, the article focused mostly on theory. it's impossible to explore all the implementation considerations and all the permutations in a concise article (if at all). however, i think it would be worth noting that generally implementation specifics have a lot to do with actual performance and that in real life "all else" is rarely equal. (go ahead and stripe all the disks you want. put a slow interconnect between the disks and your hosts and you won't see the benefit, etc.)
-- there is enough confusion about terminology in the field that we need to be very careful about checking definitions when communicating, especially when starting work with a new team, or kicking off a new client/vendor relationship.
-- i'd like to see some attention to the trade-offs involved in selecting raid configurations and how changing hardware economics change those trade-offs over time.
for example, in some cases application requirements that might have called for raid 5 on a dedicated controller a few years ago can now be satisfied by software raid 1 because disks are bigger and cheaper and cpu cycles are cheaper too. there are also considerations of how the app behaves. offloading raid work from the cpu(s) makes a lot of sense when the cpu(s) have things they can do in the meantime. if you have a host that's dedicated to a single i/o bound application that is just going to sit and wait for disk reads to complete before getting on with it, you might as well use software raid, as the cycles it uses are effectively free. (yes, i know, that was very broad, but i'm just trying to illustrate the type of subject matter i had in mind.)
-- i'm curious about the write algorithms used in various raid 1 implementations. theoretically, raid 1 should be fast on reads (if a round-robin or race algorithm is used) and slow on writes, as writes on both disks need to complete before the transaction is complete. one can imagine optimizations to the write, but they ain't free. i wonder what's actually being used, especially in controllers that provide hardware assistance. i've been asked about this a few times and don't know the answer. maybe someone reading this does?
- ef
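A toy Python model of the raid 1 read/write argument above: a mirrored write is only done when the slowest copy is done, while a "race" read takes whichever mirror answers first. The per-disk service times are made-up uniform random numbers and are assumed independent between disks; this says nothing about what any real controller does.

```python
import random


def mirrored_write_latency(per_disk_ms):
    """A RAID 1 write completes only when every mirror has finished."""
    return max(per_disk_ms)


def race_read_latency(per_disk_ms):
    """A 'race' read goes to all mirrors; the first answer wins."""
    return min(per_disk_ms)


def round_robin_disk(request_no, mirrors):
    """Round-robin reads: alternate successive requests between mirrors."""
    return request_no % mirrors


def simulate(trials=10_000, seed=1):
    """Average latencies with made-up, uniformly random per-disk service times."""
    rng = random.Random(seed)
    single = writes = reads = 0.0
    for _ in range(trials):
        lat = [rng.uniform(2.0, 14.0), rng.uniform(2.0, 14.0)]  # hypothetical ms
        single += lat[0]
        writes += mirrored_write_latency(lat)
        reads += race_read_latency(lat)
    return single / trials, writes / trials, reads / trials


single, write, read = simulate()
print(f"single disk {single:.1f} ms, mirrored write {write:.1f} ms, race read {read:.1f} ms")
```

Note that a race read keeps every mirror busy for one request, so under heavy load round-robin (or nearest-head) scheduling, which lets each mirror serve a different request, may give more total throughput.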
[edit] RAID parity blocks
I have placed tags to merge in the section RAID parity blocks from the article Parity bit. I think this should be included on the same page as RAID, as it pertains only to RAID, and should only be referenced from the Parity bit article for further reading. This is my username; I decided to make one. The7thone1188 21:28, 3 June 2006 (UTC)
- I agree. --DanDanRevolution 18:50, 4 June 2006 (UTC)
- I concur. It's an implementation of parity specific to RAID and would greatly increase users' comprehension of how RAID's implementation of parity works. Dr1819 11:56, 10 June 2006 (UTC)
OK then; it's a week later, and I have just moved the section, removed the banners, and removed all the old links to the parity bit page. I put it at the beginning of the RAID Levels section, because that seemed logical. I've done my part, so I will unwatch the page. Please move the section if you think there is a better or more logical place for it. Thanks! The7thone1188 03:40, 11 June 2006 (UTC)
[edit] Z RAID
Conceptually, it's called Z RAID, as it's the end all, be all, of RAIDs. Ideally, Z RAID would be able to use a nearly unlimited number of drives of varying manufacturers, speeds, and interface connections, whether locally grouped or distributed, and would include automated detection and addition to the volume, as well as automated load balancing and throughput optimization while using a triple-parity mechanism similar to the dual-parity mechanism used by RAID 6.
This section of the Talk page is intended to foster serious consideration about the principal shortcomings of nearly all current RAID designs, and stimulate industry interest and innovation in coming up with a solution that can work for all applications.
Criteria:
1. 32-bit addressing would be required to support a virtually unlimited number of drives.
2. Storage database (similar to NTFS' MFT) required to support drives being from varying manufacturers, of differing sizes, and throughput speeds and access latencies.
3. Associated software management required to work in conjunction with the HD controller to analyze size, throughput, and latencies to provide for automated load balancing and throughput optimization.
4. It would be nice if you could use any network shared resource, including excess server or workstation space on the network.
Future benefits - This approach, if perfected, could mean the end of operating systems residing on either the workstation or the server. Instead, a single, multi-user OS could reside on all the computers, with enough fault tolerance such that you could remove a third of the computers without loss of data or program/OS code - and the OS would automatically rebuild its self-image to re-establish the requisite redundancy on the smaller footprint, prompting the administrator to "Add storage - capacity currently too small to support maximum level of parity."
Comments, anyone? Dr1819 15:02, 10 June 2006 (UTC)
- Sounds cool, but I think it sounds like it's a bit too hard to implement... A hardware RAID controller needs disks that are all physically the same, because it's most efficient. Something like Google's cache server filesystem/network was built from the ground up with a need for high fault-tolerance and multi-read/multi-write of large files. A web-based OS would be best designed around something like that, assuming it's big and bloated (because of cool features), instead of some Damn Small Linux release, or similar. Basically, it's a cool idea, but there're already better solutions to the problem. =D The7thone1188 03:48, 11 June 2006 (UTC)
- better solutions to the problem -- please don't taunt me. Tell me what these solutions are.
- a bit too hard to implement -- the first law of engineering: If someone does it, it must be possible.
- It sounds like you want a fault-tolerant distributed file system, aka Distributed fault tolerant file systems and Distributed parallel fault tolerant file systems. Examples include MogileFS or Gfarm file system or Hadoop Distributed File System. Coincidentally (?), your suggestion of the term "Z RAID" sounds very similar to one of those systems, "zFS".
- --70.189.77.59 03:30, 26 October 2006 (UTC)
[edit] RAID1 read speed is more like a single disk read speed
Let's assume that data is something like ABDCEF... spread across two disks. If stripe size is less then one disk cylinder size, the first disk reads A while the second one reads B. After that, the first one should read C and the second one should read D, but the first disk has to wait while data B passes under it's reading head (data is ABC... and not ACB..., remember) before head reaches data C. The same thing happens with the second disk - one of them will always read even and other one odd stripes, but they are not able to 'skip' data that other disk has already read. If stripe size is bigger then one disk cylinder size, after parallel reading of A and B, the first disk should issue a seek to 'jump over' the next one/few cylinders where data B is stored to reach cylinder where data C begins. So, here we have non-sequential reads with a lot of seek + disk rotation latency penalty. Here's one example: 2 WD Raptors (sda and sdb) form md0 software RAID1:
manchester ~ # hdparm -t /dev/md0
/dev/md0:
 Timing buffered disk reads: 178 MB in 3.00 seconds = 59.26 MB/sec
manchester ~ # hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 176 MB in 3.03 seconds = 58.09 MB/sec
manchester ~ # hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads: 184 MB in 3.01 seconds = 59.14 MB/sec
manchester ~ #
As you can see, the streaming speed of the RAID1 array is the same as that of a single disk.
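A rough Python analogue of the `hdparm -t` measurement above, for anyone who wants to repeat it without hdparm. This is a sketch under assumptions: the function name `timed_sequential_read` is hypothetical, "MB" here means 10^6 bytes, and unlike hdparm it does not flush the page cache first, so a repeated run on the same device may measure RAM rather than the disks. Reading a raw device such as /dev/md0 normally needs root.

```python
import os
import time

def timed_sequential_read(path, seconds=3.0, chunk=2 * 1024 * 1024):
    """Read `path` sequentially for about `seconds`; return (MB, elapsed s, MB/s).

    Assumption: no cache flush is done, so results can be inflated by the
    OS page cache (hdparm -t takes care of this itself).
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:   # unbuffered raw reads
        while time.perf_counter() - start < seconds:
            data = f.read(chunk)
            if not data:                       # reached end of file/device
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    mb = total / 1_000_000
    return mb, elapsed, (mb / elapsed if elapsed > 0 else 0.0)

if __name__ == "__main__":
    # Hypothetical usage mirroring the hdparm run above (needs root).
    for dev in ("/dev/md0", "/dev/sda", "/dev/sdb"):
        if os.path.exists(dev):
            print(dev, timed_sequential_read(dev))
```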
- I looked at the linux raid 1 code and it looks like it's optimized for random access and for a sequential read per disk. The read balancing algorithm basically first tries to select the disk in which the next sequential read sector is the requested sector. If none is found then the closest disk is chosen. So for a single sequential read, one disk will most likely do all the work. Too bad hdparm doesn't take a starting sector number as you'd be able to start two copies at different spots in the array and each would give you 60 MB/sec.
- The RAID article discusses theoretical maximums. Each RAID implementation (Linux, Windows, hardware) will be different. Some will handle sequential access better and some will handle random access better. How close an implementation comes to the theoretical maximum shows how good it is. As for "one of them will always read even and other one odd stripes", that's just one example implementation (although you should call them blocks in RAID 1). Nothing stops the software/firmware from starting the other disk on another head/cylinder as you suggest.
- As an example: the Linux RAID 1 driver will handle two sequential reads at full speed. The nForce RAID controller in my Windows box doesn't, and seeks like crazy.
- Avernar 05:23, 11 July 2006 (UTC)
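A simplified Python model of the read-balancing rule described above: prefer the mirror whose last read ended exactly where this read starts, otherwise pick the mirror whose head is closest. It is not the real md driver code; the `Mirror` class and the `choose_mirror`/`read` helpers are hypothetical, and a head position is modelled as a single sector number.

```python
from dataclasses import dataclass

@dataclass
class Mirror:
    head_pos: int   # sector the head is currently over
    next_seq: int   # sector that would continue this disk's last read

def choose_mirror(mirrors, sector):
    """Pick the mirror that serves a read starting at `sector`."""
    # Rule 1: keep a sequential stream on the disk that is already doing it.
    for i, m in enumerate(mirrors):
        if m.next_seq == sector:
            return i
    # Rule 2: otherwise use the disk whose head is nearest (shortest seek).
    return min(range(len(mirrors)), key=lambda i: abs(mirrors[i].head_pos - sector))

def read(mirrors, sector, length):
    """Issue a read and update the chosen mirror's head state; return its index."""
    i = choose_mirror(mirrors, sector)
    mirrors[i].head_pos = sector + length
    mirrors[i].next_seq = sector + length
    return i
```

In this model a single sequential reader sends every request to disk 0, consistent with the RAID1 array streaming no faster than one disk, while two interleaved streams far apart quickly settle into one stream per disk.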
- If you wanted to optimise for the total time to read in a large file, you would split the file in half and read one half from each drive. The trouble is that the OS doesn't know in advance how much data the app will ask for, so it can't really do that; reading in big blocks would approximate that goal fairly well, though.
- From your description I suspect the linux software raid was optimised for server use, servers generally do a lot of random access as different clients are requesting different things at the same time so it makes more sense to give different work to the drives than to try and split the same job between them. Plugwash 18:50, 16 April 2007 (UTC)
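A toy timing model of the two read strategies discussed in this section, in Python. Time is counted in "block times" (the time for one block to pass under a head), the seek cost is a made-up parameter, and the helper names are hypothetical. In the alternating scheme each head still has to pass over the blocks the other disk is reading, so two mirrors finish no sooner than one disk; splitting the request into halves lets both disks stream in parallel at the price of one extra seek.

```python
def single_disk_time(n_blocks):
    # One head streams every block once.
    return n_blocks

def alternating_time(n_blocks):
    # Disk 0 reads the even blocks, disk 1 the odd ones, but each head still
    # travels over the blocks in between: time = span from first to last block.
    def span(blocks):
        return blocks[-1] - blocks[0] + 1 if blocks else 0
    even = list(range(0, n_blocks, 2))
    odd = list(range(1, n_blocks, 2))
    return max(span(even), span(odd))

def split_halves_time(n_blocks, seek_blocks):
    # Disk 0 streams the first half; disk 1 first seeks to the midpoint
    # (cost `seek_blocks`), then streams the second half. Both run in parallel.
    half = n_blocks // 2
    return max(half, seek_blocks + (n_blocks - half))
```

With 1000 blocks and a 5-block seek the model gives 1000 for one disk, 999 for alternating blocks and 505 for split halves. The catch, as noted above, is that splitting requires knowing the total length of the read up front.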
"There are even some single-disk implementations of the RAID concept."
This claim is made in the last paragraph of the introduction, then never mentioned again as far as I can see. Is this true? How can you have a single-disk implementation of the RAID concept? Can someone more knowledgeable than me elaborate on this, or remove it? --Stormie 00:02, 14 July 2006 (UTC)
Yes, it is possible to run RAID on a single drive. I have no idea, though, why anyone would want to do it. The point of RAID is to offer redundancy and/or a performance increase. While it is possible to RAID on 1 drive, it is not recommended because of decreased storage space, no redundancy, decreased performance, and no fault tolerance. As with many things, just because it can be done does not mean it should be. Because it is very infrequently done, pointless, and inconvenient, not to mention that it would likely require its own section, I hesitate to expand on the topic. However, because the information is correct and slightly informative, I do not believe it should be removed. Freedomlinux 20:38, 25 August 2006 (UTC)
I expanded on it a while back (yeah, that's me, the IP'd guest... anyway), but now that I look at it, it really is in a random spot. The section talks about the history of RAID, and randomly, it pops up with, "There are single-disk versions! Just so you know," and makes even less sense, since it goes from talking about the history to, "There are EVEN single-disk..." (Hey, I just expanded on it, I didn't read around it...). I propose moving it somewhere, but not giving it its own section unless we absolutely have to... does anyone have any suggestions on where to move it? -- DEMONIIIK 06:32, 16 February 2007 (UTC)
Wrong image with RAID 1.5
The article writes "RAID 1.5 is a proprietary RAID by HighPoint and is sometimes incorrectly called RAID 15." However, the image associated with this claims to be a diagram of RAID 1.5, while RAID 15 is illustrated.
I have never edited any wikis before, so I figured I'd best put this up for discussion instead.
Probable mistake in RAID 10 section
There's an error, or at least an unclear section, in the Linux RAID 10 section which reads:
"This md driver should not be confused with the dm driver, which is for IDE/ATA chipset based software raid (ie. fakeraid)."
There are a lot of RAID drivers in Linux, and three subsystems: MD, LVM (now LVM2), and DM. The MD driver supports various raid levels and is entirely software oriented -- nothing to do with IDE specifically. The DM driver (device mapper) is a subsystem which supports various RAID modes and again has nothing to do with IDE specifically.
128.123.64.215 02:46, 25 July 2006 (UTC)
Vinum volume manager
I've created an article about the Vinum volume manager (software RAID). See: Vinum_volume_manager
I've added a link to See Also
Also, this site: [Logical Volume Manager Performance Measurement] might prove useful for the hardware vs. software RAID section. I added it to the external links section.
I leave it up to other people to properly integrate this into the article, since I only have experience with software RAID and have never used hardware RAID...—Preceding unsigned comment added by Carpetsmoker (talk • contribs) 00:26, July 29, 2006
Hardware RAID compared to Software RAID.
In my opinion this should be split into a new article, for obvious reasons...—Preceding unsigned comment added by Carpetsmoker (talk • contribs) 01:56, July 29, 2006
I disagree. It belongs with the body of the RAID article, or it will get overlooked. - Chris. Jan 1, 07
- On the same topic, I've made an attempt at cleaning up the hardware vs software RAID section; there were a lot of meta-data comments in the source, and the prose was even worse than mine! I've tried to wikify as best I can, but am still not 100 percent with it.
- I'm thinking of a general discussion of the 3 types followed by pros/cons for each, i.e.:
- Hardware RAID
- Pros
- Cons
- Software RAID
- Pros
- Cons
- Hybrid RAID
- Pros
- Cons
- Any thoughts ? Baz whyte 20:04, 26 February 2007 (UTC)
RAID0 seek times
I removed this sentence:
"If the accessed sectors are spread evenly among the disks then the apparent seek time would be reduced by half for two disks, by two-thirds for three disks, etc., assuming identical disks. For normal data access patterns the apparent seek time of the array would be between these two extremes"
Because it's complete nonsense. Let's take two disks as an example... the I/O can start as soon as the first bit of the data is found.
Case 1: If this first bit is on the disk with the faster seek time, the I/O can start as soon as this disk has accessed the sector. But if the slower disk hasn't found its first block of the data by the time the faster disk has reached the end of its first block, the faster disk has to wait for the slower one. For a block size of 128K and a transfer rate of 50MB/s, each disk needs only about 2.5ms to read a block.
Case 2: If this first sector is on the disk with the longer seek time, the faster disk has to wait.
For Case 1, the seek time can be below the one of a single disk, but the maximum of the reduction is defined by the block size and the transfer rate. For Case 2, the seek time will be higher, only limited by the maximum seek time of the slower drive.
When using synchronized spindles, both disks have to perform the same operation, so the seek time will be close to that of a single drive.
Even if the two disks could read/write completely independently, there would still be no reduction by one half for two drives or by two-thirds for three drives. Anyone with some knowledge of probability theory will know that the reduction of seek times in this case depends on the distribution of the seek times and the distribution of the relative head positions.
Just an easy example: two identical drives, with seek times of 2ms, 5ms or 8ms, each with a probability of 1/3.
→ Average seek time for a single drive: 5ms
Probabilities for two drives (the resulting seek time is that of the faster drive):
First 2ms, second 2ms → p = 1/9, resulting seek time: 2ms
First 2ms, second 5ms → p = 1/9, resulting seek time: 2ms
First 2ms, second 8ms → p = 1/9, resulting seek time: 2ms
First 5ms, second 2ms → p = 1/9, resulting seek time: 2ms
First 5ms, second 5ms → p = 1/9, resulting seek time: 5ms
First 5ms, second 8ms → p = 1/9, resulting seek time: 5ms
First 8ms, second 2ms → p = 1/9, resulting seek time: 2ms
First 8ms, second 5ms → p = 1/9, resulting seek time: 5ms
First 8ms, second 8ms → p = 1/9, resulting seek time: 8ms
Combined:
2ms, p = 5/9
5ms, p = 3/9
8ms, p = 1/9
Average seek time = 5/9 * 2ms + 3/9 * 5ms + 1/9 * 8ms = 3.67ms
Or another example: If all seek times are in a range between 3ms and 20ms with an average of 11.5ms, how could they go down to 2.3ms for 5 disks?
--JogyB 10:39, 31 July 2006 (UTC)
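The table above can be checked by brute force. This Python sketch (the function name is hypothetical) enumerates every combination of drive seek times and assumes, as the table does, that the drives are independent and that 2, 5 and 8 ms are equally likely. `combine=min` is the table's model, where the faster drive serves the request; `combine=max` models a controller that has to wait for the slower drive.

```python
from fractions import Fraction
from itertools import product

def expected_seek(seek_times, n_drives, combine):
    """Exact expected seek time over all equally likely drive outcomes.

    Each drive independently takes one value from `seek_times`;
    `combine` (min or max) merges the drives' times for one request.
    """
    outcomes = list(product(seek_times, repeat=n_drives))
    total = sum(Fraction(combine(o)) for o in outcomes)
    return total / len(outcomes)

times = [2, 5, 8]  # ms, each with probability 1/3
print(expected_seek(times, 1, min))  # 5
print(expected_seek(times, 2, min))  # 11/3 (about 3.67 ms)
print(expected_seek(times, 2, max))  # 19/3 (about 6.33 ms)
```

Under these assumptions the "faster drive wins" model lands between the single-drive average and the best drive, while waiting for both drives is slower than a single drive.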
- Isn't this true for RAID1 as well?
195.159.43.66 14:55, 25 September 2006 (UTC)
This affects all RAID levels as soon as data is read from or written to two or more disks simultaneously, at least AFAIK. --JogyB 07:48, 26 September 2006 (UTC)
Correction: The situation is different for RAID1, as both disks contain all of the data... in this case they are able to read completely independently. --JogyB 21:28, 30 October 2006 (UTC)
- It's not completely nonsense or even partially nonsense. Might not be explained the best way but not nonsense.
- You've made the bad assumption that the disk access must be synchronized. One disk does NOT have to wait for the other to finish what it's doing. If I issue a read for a block on one disk and it takes forever (bad disk, whatever) and then issue reads for 10 blocks on disk 2 (you'd have to know the block size to do this right), the 10 blocks will be read regardless of what's going on on the first disk.
- Second, synchronised spindles means exactly what it says. The spindles are synchronised, not the heads. This is most useful in RAID 1 during a write operation where all disks must be written at once.
- Your probability table is for a brain-dead controller in RAID 1 mode which doesn't know which head is closer to the requested block. Your table is saying that both drives would seek and the first drive reaching the block would stop the other drive's seek, which is also incorrect. The other drive would still have to complete its seek (you can't abort the read command), so each result would be the maximum number, not the minimum. Your average would be closer to 8ms, which would be slower than a single drive, which is why I call it a brain-dead controller. The table could also be for one single sector access in RAID 1. But you can't measure average seek with one sector.
- In RAID 0 each disk has a different set of data. When you ask for a block (or just one sector in the block, doesn't matter) only one disk in the array is physically capable of retrieving that data. The other disk doesn't have that block so even if it seeked to the same cylinder as the other drive and made it there faster, it would accomplish squat.
- In my example you'd be retrieving one block from each disk in the RAID 0 array. For a two-drive array that means two blocks. A single drive (no RAID) would also have to grab two blocks to do the same work. Each disk in the RAID array would seek to its block in 5ms (average) at the same time, for a total of 5ms. The single disk would need to do two 5ms seeks one after the other, for a total of 10ms. 5ms/10ms = 1/2 = half.
- Take your last example with a 11.5ms average seek. A seek benchmark program lets say averages the seek of 1000 random sectors for a single disk and gets 11.5s. 11.5s / 1000 = 11.5ms. Now what happens on your 5 disk RAID 0 array? Each disk does 200 seeks in 2.3s and since they're all doing this simultaneously (and independently) it takes 2.3s total. Now the benchmark calculates 2.3s / 1000 = 2.3ms. That's why I used the word apparent. It looks like the array is performing as a single 2.3ms drive. On RAID 0 the 200 sectors would have to be distributed evenly across the discs to get exactly 2.3ms. If all the sectors happened to fall on one disc then you'd get 11.5ms. Normal access would be somewhere in between. For RAID 1 it's not a problem since each disk has the same data so each drive would take exactly 200 sectors each.
- I'm reverting that edit due to your bad assumptions and that my original text is still correct.
- Avernar 07:46, 28 October 2006 (UTC)
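The 1000-seek arithmetic above can be written out as a small Python model. It assumes (hypothetically) that every single-block read costs exactly the average seek time and that the disks work fully in parallel, so a batch finishes when the busiest disk finishes. The function name and the request lists are illustrative only.

```python
import random

def apparent_seek_ms(assignments, n_disks, avg_seek_ms):
    """Apparent per-request seek time for a batch of single-block reads.

    `assignments[i]` is the disk that holds request i. The disks run in
    parallel, so the batch takes as long as the busiest disk needs.
    """
    per_disk = [0] * n_disks
    for disk in assignments:
        per_disk[disk] += 1
    batch_ms = max(per_disk) * avg_seek_ms
    return batch_ms / len(assignments)

n, requests, avg = 5, 1000, 11.5
best = [i % n for i in range(requests)]             # perfectly even spread
worst = [0] * requests                              # everything on one disk
rnd = [random.randrange(n) for _ in range(requests)]  # random spread
print(apparent_seek_ms(best, n, avg))   # 2.3
print(apparent_seek_ms(worst, n, avg))  # 11.5
print(apparent_seek_ms(rnd, n, avg))    # somewhere in between
```

An even spread reproduces the 2.3ms figure, piling every request onto one disk gives back the single-drive 11.5ms, and a random spread lands in between.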
- You're talking about bad assumptions?
- Your assumption means that a disk never reads linearly and each block has to be sought separately, or at least that the degree of fragmentation is n times lower (n = number of disks) for the RAID set... that's nonsense. In reality, the degree of disk fragmentation will (on average) be exactly the same for a single disk and a RAID set, as the RAID set appears as a single disk to the operating system. So if the single disk has to perform a seek, the RAID set also has to and then we're back at my assumptions. I've looked at a lot of benchmark results and tested several systems myself, and I never found the effect you're describing.
- When you're benchmarking seek times you DON'T want to read linearly. You want to seek as much as possible with as little data transfer as possible. Usually that means issuing a verify command for a sector so no data will be transferred, but that may not always be possible, so you have to read at least one sector and subtract out the time it takes for the read+transfer.
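A minimal Python sketch of the synthetic seek test described above: random single-sector reads with as little data transfer as possible. Assumptions: a Unix-like system (`os.pread`), 512-byte sectors, and no O_DIRECT, so the OS cache is not bypassed; on a real device you would have to defeat caching for the numbers to mean anything. It also leaves the one-sector read+transfer time in rather than subtracting it. `random_sector_reads` is a hypothetical name.

```python
import os
import random
import time

SECTOR = 512  # assumed sector size in bytes

def random_sector_reads(path, n_reads=1000, seed=0):
    """Average milliseconds per random single-sector read from `path`."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # size of the file or device
        n_sectors = size // SECTOR
        if n_sectors == 0:
            raise ValueError("target is smaller than one sector")
        start = time.perf_counter()
        for _ in range(n_reads):
            # Jump to a random sector and read only that sector.
            os.pread(fd, SECTOR, rng.randrange(n_sectors) * SECTOR)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed * 1000 / n_reads
```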
- On this point I disagree. You want to perform as many seeks as possible, but it should still have something to do with "real" applications. So a seek test should still be performed with a little data transfer.
- JogyB 10:47, 13 November 2006 (UTC)
- Depends on the benchmark. There's the synthetic kind that measures just one attribute like I describe and there's the "real world" benchmark that you describe. Depends on what information people want. A bunch of synthetic benchmarks measuring different things (seek times, throughput, etc) lets people make an educated decision if a real world benchmark doesn't exist that simulates the access pattern that their particular application does.
- Avernar 05:01, 14 November 2006 (UTC)
- The only assumption I made was thinking that you were talking about low level block I/O. You're talking about file I/O. File fragmentation and how multiple applications access the files can't be predicted. It's the file system that sees the array as a single disk. For software raid the OS's raid driver sees the disks as independent, for a raid add-in card the card's driver knows about the discs. Only for an external RAID enclosure would the OS not have a clue about the drives. The Linux software RAID 1 driver knows exactly where the heads are. For RAID 0 the driver or hardware controller doesn't have a choice. It's the access pattern from the file system and block cache levels that determine which disk to go to. Do the requests favor one disk over the other or are things pretty even?
- "So if the single disk has to perform a seek, the RAID set also has to and then we're back at my assumptions." Now here's why I said you made a bad assumption (no offense meant, BTW). Both discs in the RAID 0 set DO NOT seek for a single read request, only the one that has the data. This is a fact. I'm not wrong about this. If a RAID controller can't do this, it's badly designed, as it would have to issue a read for the same LBA on both disks even though on one disc it would be the wrong data in RAID 0. Note: there is no Seek command; it's done as part of the Read command. Forgive me for beating this point to death, but if you disagree on this point the rest of what I say is moot. So let me know and we'll discuss this point first.
- There's no discussion needed... I think our problem really was that we were talking about different levels. But the seek still has to be performed on both disks when the file is larger than the block size of the RAID. So you have two seeks where the single drive only has one, and this leads to the same apparent seek time for a single disk and RAID0 (when the disks are synchronized; it's impossible to tell what happens when they are not). And this is exactly the situation you're referring to in the article.
- JogyB 10:47, 13 November 2006 (UTC)
- No, I was talking about RAID 0 but with a specific access pattern. I do agree with you that we do need the long transfer situation in there as well that you describe above (sequential file reads). But we should also keep what I wrote about the short transfers (random file reads). I'll add that to the article in the next few days or put it in here first so we can look it over before putting it in the main article.
- Disc synchronization is not a problem, as the buffer cache will take care of things for regular file reads. In Windows, for example, you can pass a flag on the open to optimize for sequential reads or random reads. So for a sequential file read, the data from one disk will go into the buffer and the disk can go off to service other applications' read requests. When the other disk gets around to reading its data, the file operation for that application will finally complete. If you're using I/O completion ports, the buffers you supply can fill out of order, so it's no problem there. And for an application load the OS uses memory-mapped files, so the blocks can complete out of order as well. With more than one active application using the discs, a sequential read at the application layer may not cause a sequential read at the block layer. That's why I like describing what happens at the block level, as only the reader (of the article) knows what access pattern his applications/system is likely to produce. Like I mentioned elsewhere, it's probably a good idea to make a section that gives examples of what block/disc access patterns different applications and different machines (desktop/video edit/server/database) generate.
- Avernar 05:01, 14 November 2006 (UTC)
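The point that buffers can fill out of order can be illustrated in Python with threads and `os.pread`. This is a stand-in for the idea, not Windows I/O completion ports: chunks are requested concurrently, finish in whatever order the system delivers them, and each lands at its own offset in a preallocated buffer, so the completion order does not matter. `parallel_read` and its parameters are hypothetical; `os.pread` is Unix-only.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_read(path, chunk=64 * 1024, workers=4):
    """Read a whole file with several concurrent pread() calls."""
    size = os.path.getsize(path)
    buf = bytearray(size)                  # preallocated destination buffer
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_chunk(offset):
            return offset, os.pread(fd, min(chunk, size - offset), offset)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(read_chunk, off) for off in range(0, size, chunk)]
            for fut in as_completed(futures):   # arbitrary completion order
                offset, data = fut.result()
                buf[offset:offset + len(data)] = data
    finally:
        os.close(fd)
    return bytes(buf)
```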
- Now if you do two reads in a row in parallel on random parts of the disk (remember, block level, not file level), then two things can happen: 1) both blocks are on the same disk and the array has to do them serially like a single non-RAID disk, or 2) each block is on its own disk, so the array can do them in parallel at the exact same time.
- Since we can't predict what's happening at the application and file system layer I presented two possible and diametrically opposed extremes at the block level. The first extreme is that all blocks requested are all odd blocks or all even blocks (worst case). One disc gets all the requests and therefore the array performs like a single non raid disk. The second extreme (best case) is that the blocks are requested in a perfect alternating odd/even pattern. Both disks are seeking and reading in parallel and you get half the apparent seek time as compared to a single non raid disk (you'd also get twice the throughput too).
- This is only true if single blocks are read randomly. As soon as there is linear reading (of at least as many blocks as there are disks in the RAID), this is not true anymore. Let me try an example:
- Single Disk: A0 A1 B0 B1 C0 C1 D0 D1 E0 E1 F0 F1 G0 G1 H0 H1...
- RAID Disk 1: A0 B0 C0 D0 E0 F0 G0 H0...
- RAID Disk 2: A1 B1 C1 D1 E1 F1 G1 H1...
- Now if you request A0 C1 E0 F1 then you are absolutely right.
- But if you request A0 A1 B0 B1 and F0 F1 G0 G1 then the single disk has to perform two seeks and the RAID has to perform four seeks distributed on two disks - exactly the same situation for RAID and Non-RAID.
- You're focusing a little bit too much on the block level... in most cases it will be a sequence of blocks rather than a single block that is read.
- JogyB 09:56, 13 November 2006 (UTC)
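The layout in this example can be turned into a small Python model that counts seeks per disk. Assumptions: unit k of the array lives on disk k mod n at position k div n (plain RAID 0 striping), a seek is counted whenever a disk's next requested unit is not the one right after its previous one, and the helper names (`locate`, `seeks_per_disk`, `unit`) are hypothetical. Because the disks seek in parallel, the RAID figure that matters is the largest per-disk count.

```python
def locate(unit_no, n_disks):
    """RAID 0 mapping: logical stripe unit -> (disk, position on that disk)."""
    return unit_no % n_disks, unit_no // n_disks

def seeks_per_disk(units, n_disks):
    """Count head repositionings per disk for a sequence of requested units."""
    last = [None] * n_disks
    seeks = [0] * n_disks
    for u in units:
        disk, pos = locate(u, n_disks)
        # First access, or not directly after the previous one: a seek.
        if last[disk] is None or pos != last[disk] + 1:
            seeks[disk] += 1
        last[disk] = pos
    return seeks

def unit(name):
    """'A0' -> 0, 'A1' -> 1, 'B0' -> 2 ... matching the two-disk layout."""
    return (ord(name[0]) - ord("A")) * 2 + int(name[1])

for pattern in ("A0 A1 B0 B1 F0 F1 G0 G1", "A0 C1 E0 F1"):
    units = [unit(x) for x in pattern.split()]
    single = seeks_per_disk(units, 1)[0]      # one disk does every seek
    raid = max(seeks_per_disk(units, 2))      # two disks seek in parallel
    print(f"{pattern}: single disk {single} seeks, RAID 0 {raid} seek times")
```

For the linear pattern both layouts come out at 2 seek times, as JogyB says; for A0 C1 E0 F1 the single disk needs 4 seeks while each RAID 0 disk needs only 2, which is the halving Avernar describes.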
- The block (stripe size) is typically around 64K by default. So in your example the read of four blocks would be reading 256K worth of data. Throughput would be the bigger factor here rather than seek time. This access pattern is typical of application loads, image loads, audio and video presentation. As long as the filesystem is not too badly fragmented then seek times are not a big issue. Now for a database there will be a lot of small reads typically 4K in size all over the disk. Now here the F0 F1 G0 G1 pattern is more likely and only 6.25% of each block is read. The 4 x 4K = 16k worth of data will just fly from the platters and through the buses and now seek times are the bigger factor.
- Yes I'm focusing a lot on the block level as it's what's between the filesystem and the array. If you don't know what's going on at the block level you have an incomplete picture of what's going on. You even resorted to a block level example above to prove your point so you can see why it's important. Now it's also important to link what type of access at the file level generates what type of block request pattern at the block level. In your case you didn't consider a database access pattern. Could be an idea for a new section, what at the filesystem level causes what at the block level. I saw a question in the newsgroups where someone was asking what raid level should they use for a database.
- Avernar 05:01, 14 November 2006 (UTC)
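To put rough numbers on the throughput-versus-seek point, here is a Python back-of-the-envelope calculation for a single disk. The 5ms average seek and 50MB/s transfer rate are assumptions borrowed from the examples earlier on this page, 64K and 4K are the stripe and page sizes mentioned above, and `read_ms` is a hypothetical helper.

```python
def read_ms(n_seeks, total_bytes, seek_ms=5.0, mb_per_s=50.0):
    """Return (seek time, transfer time) in ms for a read pattern on one disk."""
    seek = n_seeks * seek_ms
    transfer = total_bytes / (mb_per_s * 1_000_000) * 1000
    return seek, transfer

# Sequential: four contiguous 64K stripe units, one seek to get there.
seq_seek, seq_xfer = read_ms(1, 4 * 64 * 1024)
# Database-style: four scattered 4K pages, one seek for each.
db_seek, db_xfer = read_ms(4, 4 * 4 * 1024)

print(f"sequential 256K: {seq_seek:.2f} ms seeking, {seq_xfer:.2f} ms transferring")
print(f"scattered 4x4K:  {db_seek:.2f} ms seeking, {db_xfer:.2f} ms transferring")
```

With these numbers the contiguous 256K read spends about 5.2ms transferring against 5ms seeking, while the four scattered 4K pages spend 20ms seeking and only about 0.33ms transferring, so seek time dominates the database-style pattern.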
- Now the problem is that if the parallelism is broken anywhere along the request chain from the application to the discs then you're going to always get the worst case. Just because your benchmarks didn't show it doesn't mean it's not possible especially if it was written to test a single drive and is not multi-threaded. Here's all the things that have to happen to get that extreme:
- 1) The benchmark has to be multi-threaded (or use I/O completion ports on Windows), with at least one thread for each disk in the array.
- 2) Bypass the file system and block cache and talk to the block layer directly.
- 3) Each thread issues a read (or verify, if it can do it) for the data on one disc only. Split the threads evenly. You need to know the stripe size to do this properly.
- 4) Request only 1 block; we're measuring seek times here, not throughput. Spread the requests all over the disks randomly.
- 5) The low-level IDE/SCSI device drivers must not serialize the requests.
- 6) The bus must not serialize the requests. For SCSI, TCQ (Tagged Command Queuing) must be enabled. For IDE, each disk must be on its own controller channel.
- 7) The RAID chip/controller/driver must not serialize the requests.
- If all that happens then you WILL get half the apparent seek time of single disk. Now if you change #3 and have each thread randomly pick a block then you'll get more realistic results instead of the extreme best case but it shouldn't fall close to or at the worse case. If you get the performance of a single drive (worst case) then one of the steps other than #3 is causing a problem. Note: for RAID 1 the change to #3 would not change anything since each disc has the exact same data.
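Why serialization in any of the driver, bus or controller steps drags the result to the worst case can be shown with a tiny Python model. It assumes each disk works through its own queue independently unless some layer forces the requests to run one at a time; the function name and the 100 x 5ms batch are hypothetical.

```python
def batch_time_ms(per_disk_seeks_ms, serialized):
    """Time to finish one batch of reads spread over several disks.

    per_disk_seeks_ms[d] lists the seek times of the reads sent to disk d.
    If any layer serializes the requests, the disks take turns and the
    times add up; otherwise each disk drains its own queue in parallel.
    """
    per_disk_totals = [sum(seeks) for seeks in per_disk_seeks_ms]
    return sum(per_disk_totals) if serialized else max(per_disk_totals)

batch = [[5.0] * 100, [5.0] * 100]   # 100 reads of 5 ms on each of two disks
print(batch_time_ms(batch, serialized=True))   # 1000.0, like a single disk
print(batch_time_ms(batch, serialized=False))  # 500.0, half the apparent seek time
```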
- Again: this is absolutely correct for randomly reading single blocks. However, this will nearly never happen, so you're describing a best case with no reference to practical application. And I don't think that this is really interesting to people reading this article. It could be mentioned, but the way the text is written now, readers may think that they will always get half the seek time when using RAID 0. But, as already mentioned, this is not what benchmarks of RAID0 systems show. Sure, these are synthetic benchmarks, but they are not as far away from "real" applications as the one you are suggesting.
- JogyB 10:47, 13 November 2006 (UTC)
- "However, this will nearly never happen, so you're describing a best case with no reference to practical application" Nope, database. :) See the answer to your question below for a reason for it. But I see your point where readers might think they will always get that seek time. Let me know what you think after I clarify those sections.
- Avernar 05:04, 14 November 2006 (UTC)
- And to that: "One disk does NOT have to wait for the other to finish what it's doing. If I issue a read for a block on one disk and it takes forever (bad disk, whatever) and then issue reads for 10 blocks on disk 2 (you'd have to know the block size to do this right) the 10 blocks while be read regardless of what's going on on the first disk."
- Read once more what I've written... "as all disks need to access their part of the data before the I/O can be completed". Look at the last word: completed. If the second disk needs forever to read its block, the first drive can read 10 billion blocks, but the transfer will never be completed. Even if the two disks worked completely independently, the slower of the two disks would still define the end of the transfer. Just think about starting an application... it won't work with every second block missing.
- Again, you're talking file I/O and I was talking block I/O. I agree that the completion for a read for a single file would not complete if one of the discs were taking forever. But 10 other files being read by 10 other applications might succeed if they lucked out and the data they wanted happened to be on the other disk or already in the block cache. But reading a single file is not usually a seek intensive operation. A database on a busy server on the other hand would put a lot of seek stress on the raid system.
- If the data is completely in the cache or on the other disk, then you are right. But if even a single block had to be read from the disk with the bad seek time, this will affect the whole transfer. Also see my example above.
- JogyB 09:56, 13 November 2006 (UTC)
- Not for a multi-threaded database application. See my reply above. :)
- Avernar 05:01, 14 November 2006 (UTC)
- I'm reverting your edit because your original text was incorrect and still is incorrect. Better think about your assumptions.
- JogyB 20:49, 30 October 2006 (UTC)
- Now you're being rude. You changed it the first time but that's OK since you didn't know I was still around. It would have been more polite to discuss a request for a change first since it's not something that's obviously right or wrong. So I changed it back with the implied hint "I don't agree with you, everyone else thought it was OK, let's discuss if you think I'm wrong and it needs to be changed.". I'm changing it back since at the moment you're the only one who thinks it's wrong.
- No, I'm not. Look at the "What RAID cannot do" section. And this is what I heard of several people using RAID and read in several articles. In fact, I never heard or read about the reduction of seek times for RAID0. ;)
- Maybe you can show me an article or website supporting your statement (already looked for one, but most websites are referring to Wikipedia).
- I'll leave your text in the article, let's discuss this first.
- JogyB 09:56, 13 November 2006 (UTC)
- I assume you're talking about point #3. Yes, for a desktop system you're not going to get much out of RAID 0 unless you're doing a lot of video editing; the doubled read AND write throughput does wonders there. The person who wrote that only focused on desktops and not on other things like file or database servers. Like I've mentioned above, I've seen server admins reading this article to get information as well. And he's wrong about there being no seek performance improvement. Heck, even you agreed above that the database access pattern that I keep describing will improve seeking. Not sure what he means by buffer performance...
- Avernar 05:01, 14 November 2006 (UTC)
- If you're still not convinced let's discuss it further. Convince me I'm wrong and I'll even correct it myself. If anyone else has questions or an opinion join right in. I want this article to be accurate as well. I'll keep checking the discussion page daily as it's not emailing me when someone adds to the discussion...
- Avernar 03:32, 13 November 2006 (UTC)
- Just one question: Is it clear to you, that we're talking about RAID0 and not about RAID1? For RAID1, it is nearly correct what you're writing (in other words: you're describing the best case), however the situation is different for RAID0, as each disk contains only half of the data.
- JogyB 21:28, 30 October 2006 (UTC)
- Yes I'm talking about RAID 0. And you're this close to understanding what I'm talking about. You say that I described the best case for RAID 1. Now here's the core of my argument: The best case for RAID 0 is the same as the best case for RAID 1 as long as one condition is met, that the requests for blocks are spread evenly so that half the requests are for data on disc A while the other half is for data on disc B.
- And here's the other part of my argument: The real world results are going to be between the best case and the worst case. That's what I said in the article and I don't think anyone could find fault with that one. Doesn't matter if 99% of people are closer to the worst case and 1% are closer to the best case. I'm still right.
- You're right concerning the best case, yes. But as a general statement (as it is written now) it is wrong... in such a case it's always better to talk about the average (if only a single value is to be provided). As an example: if the income of people in the US is between $5,000 and $1,000,000 a year and only 1% are close to the maximum, is it correct to write that people in the US earn $1,000,000 a year? I don't think so.
- JogyB 10:47, 13 November 2006 (UTC)
- There's three reasons why I put those "models" of the best and worst cases in there. First it helps people quickly compare the different raid levels on a more academic kind of level without having to worry about too many details. Second is that you can use those models to figure out the performance characteristics of the hybrid raid levels (1+0, 10, 150, and ones that we don't know about) without having to benchmark them. Third it lets people figure out what performance they'd get if the average number does not apply to their situation. An average or expected number should also be provided but you do have to specify under what circumstances this occurs as different applications and different machine roles have different access patterns. I'd LOVE to see benchmark numbers for all the real world situations but it would be a lot of work. Hopefully we can add all those numbers some day.
- Avernar 05:01, 14 November 2006 (UTC)
- Now if you're getting the worst case then that's statistically highly improbable and I'd suspect there's something wrong with your system or the test you're doing.
- Your example says that the best case is statistically highly improbable ;)... and in my opinion this is true for RAID0, or give me an example where huge amounts of single-block reads will be done. However, all benchmarks (my own, in forums, websites, computer magazines) I've seen until now show a slightly increased seek time for RAID0 (OK, nearly all; in a few the seek time was reduced by 0.1ms).
- JogyB 10:47, 13 November 2006 (UTC)
- Avernar 03:32, 13 November 2006 (UTC)
- Just one question: Do you really think that random access of single blocks is the main application when operating a RAID0 (or RAID1)? JogyB 23:08, 13 November 2006 (UTC)
- YES!!! For a database server. For any kind of performance on a database server, the indexes for the most frequently used tables need to be cached in RAM. The cache is usually primed on startup. From then on, most of the access (unless you're running some kind of report) is for single rows out of the database or for rows scattered across the table based on some search criteria. Think of the DMV or VISA as an example: thousands of requests for individual records all over the disk. I believe that SQL Server uses a 4K or 8K page size, and I've heard of one that goes as low as 512 bytes but don't remember which one.
- RAID 0 for a database, maybe (if you need the space, can't afford a lot of disks, and the machine is one of several identical ones in a cluster). RAID 1 with more than just two disks, yes. Hybrid levels that use RAID 0 as a subcomponent, YES. This is why I think that this information is important.
- Avernar 05:01, 14 November 2006 (UTC)
- And I just thought of a desktop example. P2P applications, especially BitTorrent, do a LOT of random reading and writing of small blocks all over one or more files simultaneously. RAID 0 would be perfect, as RAID 1 would slow down because of the writes.
- Avernar 05:21, 14 November 2006 (UTC)
- I'm writing it as one answer, as I think we have much the same opinion now. The database also came to my mind when I went to bed last night, but I was too tired to get up again ;) (my last post was at 00:08 local time). OK, so I think we agree that it depends on the access pattern... on a desktop system you might get about the same seek time as for a single disk, while a database (or applications with a similar access pattern) will profit from reduced seek times (halved for two disks in the best case). I'll leave it to you to add this to the article (and please also to RAID 1: you will get a reduced seek time in any case, but the reduction described now is again only the best case), as I think you're a native speaker and I'm not.
- JogyB 08:20, 14 November 2006 (UTC)
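The best/worst-case "models" argued over in this thread can be written down in a few lines. This is a hypothetical simplification, not taken from the article: it assumes every member disk serves independent random single-block reads one at a time with the same average access time, that the best case spreads requests evenly over the disks, and that the worst case sends every request to the same disk. The function name and parameters are illustrative only.

```python
import math


def random_read_time_ms(requests: int, disks: int, access_ms: float, best_case: bool) -> float:
    """Total time for `requests` independent single-block reads on a RAID 0 array.

    Hypothetical model: each disk handles one request at a time, every request
    costs `access_ms`, and the disks work in parallel.
    """
    if disks < 1 or requests < 0:
        raise ValueError("need at least one disk and a non-negative request count")
    if best_case:
        # Requests are spread evenly, so the busiest disk serves ceil(requests / disks).
        per_disk = math.ceil(requests / disks)
    else:
        # Worst case: every request happens to land on the same member disk.
        per_disk = requests
    return per_disk * access_ms


# 100 random reads, 8 ms each, on a two-disk RAID 0
print(random_read_time_ms(100, 2, 8.0, best_case=True))   # 400.0
print(random_read_time_ms(100, 2, 8.0, best_case=False))  # 800.0, same as a single disk
```

Under this model the best case halves the time for two disks and the worst case is no better than a single disk; real workloads land somewhere in between depending on the access pattern, which is exactly the point of disagreement above.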
[edit] Clarification on "proprietary".
Just my opinion, but perhaps a better term would be "non-standard"? Proprietary typically implies that the specification isn't readily available, but several of the mentioned alternatives are openly available. -Matt 20:23, 5 August 2006 (UTC)
[edit] Moved some sections, removed disclaimers
I especially think that a section which explicitly "assumes knowledge of RAID configurations" and directs to another site for the "basic explanations" is contrary to the purpose of a general-knowledge encyclopedia. I removed that disclaimer and moved that section down below the explanations of RAID configurations to hopefully give newbies some background, so they can understand Wikipedia content w/o having to reference other sites. Hope this edit didn't offend anyone, but we can certainly discuss it here and I won't be offended if someone thinks a revert is in order. Icewolf34 19:46, 8 August 2006 (UTC)
[edit] Something missing in RAID 5
Text from the RAID 5 article:
"The parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a cyclic redundancy check (CRC) error."
I understand that a CRC error is when the parity information computed from the data blocks is different from the parity block. Am I wrong?
If I am not wrong, how is it possible to know that a CRC error has occurred without reading the parity block?
Pau
- modern hard drives have inbuilt CRC and other data integrity checks and on a read are able to either give back the data or report an error. The chances of a modern drive giving back 'wrong' data (i.e. different from that written) are very small. --Ali@gwc.org.uk 21:03, 24 September 2006 (UTC)
- While I certainly hope Ali is correct, I think this is an important enough fact that it ought to be mentioned in this article or perhaps in the hard drive article. Can you give a reference? --76.209.28.72 18:11, 30 June 2007 (UTC)
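A minimal sketch of why the parity block only needs to be read after an error, assuming RAID 5's usual XOR parity and a drive that reports an unreadable sector itself rather than returning wrong data (as Ali describes). The function names are illustrative only; real controllers work on sectors and whole stripes and also have to update parity on writes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_data_block(stripe, index):
    """Read data block `index` from a stripe of data blocks followed by one parity block.

    A failed read is modelled as None, i.e. the drive itself reported an error.
    """
    data = stripe[index]
    if data is not None:
        return data  # normal case: the parity block is not touched
    # Error case: XOR of the surviving data blocks and the parity block
    # rebuilds the missing block.
    survivors = [b for i, b in enumerate(stripe) if i != index]
    return xor_blocks(survivors)


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
parity = xor_blocks([d0, d1, d2])            # b"\x1e\x2d"
stripe = [d0, None, d2, parity]              # the drive holding d1 reports a read error
print(read_data_block(stripe, 1) == d1)      # True
```

The error detection here comes from the drive, not from the parity; parity is only used to reconstruct once the drive has said the read failed.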
[edit] Windows LVM/RAID issues
The section about JBOD implies that Windows lacks an "LVM" mechanism and only supports JBOD. However, Windows XP supports software RAID 0 and 1 through a logical-volume-based abstraction. Windows 2000 Server and Windows Server 2003 additionally support software-based RAID 5. All versions support any kind of hardware-implemented RAID.
- Dynamic vs. Basic Storage in Windows 2000
- How To Use Disk Management to Configure Dynamic Disks in Windows XP
- How To Establish a Striped Volume with Parity (RAID-5) in Windows Server 2003
SSG 06:09, 29 September 2006 (UTC)
- LVM in Unix/Linux means being able to span disks (Physical Volumes) together to create a pool of blocks called a Volume Group. Logical Volumes (partitions) can be dynamically created, destroyed, expanded and shrunk on the fly within the Volume Group. The pieces of a Logical Volume may not be contiguous on the disk. You can think of it like a "file system" for logical volumes. The logical volumes can be fragmented. The link to the LVM page explains this further.
- On Windows the logical volumes (partitions) must be all in one piece. They can span disks but can't have other partitions in the middle. Even though Windows may have something called a "Logical Volume" it's not the same as LVM.
- Avernar 06:51, 28 October 2006 (UTC)
- But do they really support RAID-5 or is it actually RAID-4? The Help in Windows XP Professional describes its software RAID-5 as using one drive for parity- which is NOT RAID-5, it's RAID-4.
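The difference asked about here is where the parity lives. A sketch, assuming N member disks numbered from 0: RAID 4 keeps parity for every stripe on one dedicated disk, while RAID 5 rotates it so every disk holds some parity. The rotation used below (starting on the last disk and moving one disk left per stripe) is just one common choice; implementations use several different layouts.

```python
def parity_disk(stripe: int, disks: int, level: int) -> int:
    """Return the member disk holding parity for `stripe`."""
    if disks < 3:
        raise ValueError("RAID 4 and RAID 5 need at least three disks")
    if level == 4:
        # RAID 4: one dedicated parity disk (here the last one) for every stripe.
        return disks - 1
    if level == 5:
        # RAID 5: parity moves one disk to the left on each stripe, wrapping around.
        return (disks - 1 - stripe) % disks
    raise ValueError("only RAID 4 and RAID 5 are modelled")


print([parity_disk(s, 4, 4) for s in range(5)])  # [3, 3, 3, 3, 3]
print([parity_disk(s, 4, 5) for s in range(5)])  # [3, 2, 1, 0, 3]
```

A description of "one drive used for parity" matches the RAID 4 branch; with RAID 5 the parity writes are shared by all drives instead of all hitting one.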
[edit] cleanup request
This article used to be much shorter and better. I suggest we limit the introduction to 3 or 4 sentences, and that descriptions of the various RAID types on this page be changed to summaries with a link to separate wikipedia pages for each type, where detailed analysis of algorithms and things can occur. There is lots of unexplained jargon and formulas here which are completely useless for someone looking up RAID in general. Also, nonstandard and vendor-specific RAID implementations need to get their own pages or be excised. $0.02 Perle 23:20, 12 October 2006 (UTC)
- Agreed. Super Jedi Droid 09:49, 8 November 2006 (UTC)
- I have two SATA controllers which can do RAID. Windows is implying that I can set up an array. I came here to find the answer, but this page appears to be written for people who already know what RAID is. How can this be a candidate for a "great page" or whatever honor?
[edit] The section on RAID-5E and RAID-6E has been rewritten to introduce bogus information
RAID-5E and RAID-6E refer to RAID-5 and -6 with a spare, where the spare is an active part of the block-rotation scheme. This spreads the I/O load over all drives including the spare drive, and such a scheme is faster than RAID-5 or -6. Whoever wrote that not only is it not faster, but that it is in fact impossible to make it faster, introduced an active falsehood into the article. Hpa Wed Oct 25 00:43:46 UTC 2006
[edit] About Atomic Write Failure
On the subject of atomic write failure: Although it is rarely handled on the same level as RAID, it need not be. Modern OSes that use journalling have solved this problem already at the file system level. NTFS, EXT3, and ReiserFS all use journalling to ensure that writes are atomic. Implementing it on the same level as RAID would be redundant.
CobraA1 07:49, 19 November 2006 (UTC)
[edit] RAID 2 Bit or Byte??
Someone has changed the word bit to byte in the RAID 2 section. Other sites that haven't copied from here have it as bit. Can someone confirm this as I don't have any experience with RAID 2. Avernar 01:20, 23 November 2006 (UTC)
- The original (for the last year or so at least) was bit, it was changed without comment by an anon recently to byte; and then flipped back to bit by another anon - it would appear to have been accidentally reverted back to byte so I've restored bit, which would be correct as my memory and the few references I have on hand support. Let me know if we need to cite. Kuru talk 05:24, 23 November 2006 (UTC)
- I'm the one who changed it to byte on the 22nd of November believing I was fully reverting an otherwise odd edit. I'm leaving it as "bit" Poweroid 14:28, 23 November 2006 (UTC)
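For context on why "bit" is right: RAID 2 stripes data at the bit level, one bit per disk, and adds disks holding a Hamming error-correcting code. The classic textbook illustration uses Hamming(7,4), 4 data disks plus 3 ECC disks. The sketch below is that textbook code, not any product's implementation; index i of a codeword stands for the bit stored on disk i.

```python
def encode(data):
    """Hamming(7,4): 4 data bits -> 7 bits, one per disk."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    # Hamming positions 1..7 map to list indices 0..6.
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct any single flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = encode([1, 0, 1, 1])   # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                  # the disk at index 4 returns a wrong bit
print(decode(word))           # [1, 0, 1, 1]
```

The ECC overhead (3 extra disks for 4 data disks in this example) is visible directly in the codeword length.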
Can you change a RAID level? If so, how?
Mrmojorisinca 13:59, 6 December 2006 (UTC)mrmojorisinca
[edit] RAID 10 on two disks
I've removed the paragraph about Linux RAID 10 on two disks from the nested RAID levels section. Running RAID 10 on two disks is exactly the same as running RAID 1 on two disks except that the blocks are in a different order. This just adds more CPU overhead (RAID 1 vs RAID 10 processing) and will increase drive seeks if you have a couple of linear reads going on. And if one disk fails, the other would have to seek like crazy, which would reduce its life expectancy. Not a good idea.
Also, the Linux RAID 10 driver is covered in detail in the Proprietary RAID levels article and briefly in the Nested RAID levels article (when it's configured like standard RAID 10), so it doesn't need to be mentioned in the general article.
Avernar 10:27, 2 January 2007 (UTC)
[edit] Copyright violation
Ok, User:Matt0401 added a copy of the guide found here (and added to WP here) and asked for permission to use it on Wikipedia here (I assume the forum user Matt Welde is User:Matt0401). I haven't found the permission response from the author, but I have left a note on Matt's talk page asking him to provide it, with a request to reply here. I look forward to the response. Cburnett 18:01, 24 January 2007 (UTC)
- I see where someone asked for permission, but did they ever actually get permission, and did the original poster specifically say the text is under a free license? Those are the important things. Even being told that "you can use this content on Wikipedia" (which it doesn't look like he was told so) is not enough to allow it here. — BRIAN0918 • 2007-01-24 18:54Z
- Why is permission from the author insufficient? Cburnett 20:03, 24 January 2007 (UTC)
On a separate side note, I would definitely not be opposed for a rewrite with sources... :) Cburnett 20:07, 24 January 2007 (UTC)
[edit] IDE RAID -> ATA RAID
I just moved the contents of IDE RAID to ATA RAID, because IDE is an incorrect term to refer to ATA. However, I don't really think the ATA RAID article needs to exist at all, since this information is pretty much contained in this article. Unless anyone has a good reason it should stay, I'm going to change the ATA RAID page back to a redirect and integrate any relevant info into this article. Timbatron 07:16, 21 February 2007 (UTC)
[edit] Merge Discussion ("Standard RAID levels" into "RAID")
[edit] RAID and Controller Failures
With typical RAID systems (those set up for increased reliability, such as RAID 1, rather than for throughput), controller failures are significantly more often the cause of failure than disk failures.
So, for example, two people set up RAID systems. The first guy's RAID controller fails after 6 months; the other guy experiences a RAID controller failure after 3 years. Who is the lucky guy? The first one, obviously, since he'll be able to run to the nearest computer store and buy an identical controller as a replacement. ;-)
The problem is that different RAID controllers may organize the data differently on the disk in the array. Data which was written to a disk array with one controller may not be readable by a different controller.
Note that this may apply to software RAID as well: I once experienced a failure of a Windows NT 4 server which used an external SCSI box with a set of disks via software RAID. As the main server box failed (not the external SCSI box), the SCSI box was simply moved over to the next NT 4 server machine. The RAID setup was not recognized there, resulting in data loss. (In theory, the data about disk configurations should be exported to an "emergency disk" right after any configuration change, so it can be reused after a failure. Unfortunately, Microsoft required the use of a 1.44 MB floppy disk as "safe" storage for this data, which makes generating such an "emergency disk" impossible when this data exceeds 1.44 MB. Go figure.) Fortunately, the storage space on the external SCSI box was only used for temporary files at the time. =8-O
Anyway, the above-mentioned issue is a significant (and often overlooked!) problem with RAID systems. It should be addressed in the article as well... otherwise, I might begin to argue that it's not NPOV ;-)
Just joking. About the NPOV thing, the controller failure issue is real.
--Klaws 08:27, 6 March 2007 (UTC)
Unfortunately, I have to agree - I myself had a controller failure and had to use special software to recover my data. I thought RAID was going to be great, but now I'm just going to use a second harddrive as a backup and not use RAID at all. It just isn't worth it if the controller easily fails. CobraA1 09:47, 8 March 2007 (UTC)
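One reason an array written by one controller may be unreadable on another: the mapping from logical block to physical disk and block depends on parameters each controller chooses and records in its own metadata format. A minimal RAID 0 sketch; the stripe sizes, disk orders and reserved-metadata offsets below are hypothetical values chosen only to show that the same logical block ends up in different places.

```python
def locate(lba, stripe_blocks, disk_order, data_offset=0):
    """Map a logical block address to (physical disk, block on that disk) for RAID 0.

    stripe_blocks: blocks per stripe unit (chunk)
    disk_order:    physical disk for each position in the array
    data_offset:   blocks reserved at the start of each disk (e.g. for metadata)
    """
    disks = len(disk_order)
    chunk = lba // stripe_blocks      # which chunk the block falls in
    within = lba % stripe_blocks      # offset inside that chunk
    member = chunk % disks            # chunks are dealt round-robin across members
    row = chunk // disks              # how many full rows of chunks precede it
    return disk_order[member], data_offset + row * stripe_blocks + within


# Same two disks, same logical block, two hypothetical controllers
print(locate(1000, stripe_blocks=128, disk_order=(0, 1)))                   # (1, 488)
print(locate(1000, stripe_blocks=32, disk_order=(1, 0), data_offset=2048))  # (0, 2536)
```

Unless the replacement controller reads back exactly the same parameters, it looks for data in the wrong places, which is why an identical (or explicitly compatible) controller is usually needed.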
[edit] Restore to version of Sept 2006?
It was suggested that I read the Wikipedia article on RAID. The person who suggested it was thinking of an earlier version of the article, from September 2006. For example, this is a good page: http://en.wikipedia.org/w/index.php?title=RAID&oldid=76647398
What we really like about that page is that it contains descriptions of RAID0+1, RAID1+0, etc., that are apparently very useful.
Maybe if I have time I will get around to actually reading both of them and editing them. —The preceding unsigned comment was added by Neurogeek (talk • contribs) 21:36, 15 March 2007 (UTC).
[edit] Why do they have batteries?
Will someone in the know please post a description of the role, size, pros/cons of batteries in hw raid controllers? MrZaiustalk 21:40, 17 March 2007 (UTC)
- I'm pretty sure it's to keep the cache from being lost which could yield corrupt/inconsistent data on the drives if lost. But I wouldn't say I'm "in the know." Cburnett 23:31, 17 March 2007 (UTC)
- I've been outta the game for a while, but here's my take. Disk controllers are often equipped with write-back cache (WBC). WBC is a chunk of memory that stores 'writes' to the disk. When an OS sends a write, it hits the cache, and the controller then signals 'write complete'. The benefit is that memory works in nanoseconds while disk drives work in milliseconds. The cache is later 'flushed' down to the disk. The battery does exactly what Cburnett says: it keeps the memory 'alive' should the power fail. Once the power is restored, the cache is flushed to the disks, and everyone is happy.
- This is true. By the way the cache is a popular but optional add-on to RAID controllers. Generally cache has nothing to do with basic idea of RAID. --Kubanczyk 06:06, 25 June 2007 (UTC)
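A toy model of the write-back behaviour described above, assuming a dict stands in for the drives and that a power failure erases the cache unless it is battery-backed. The class and method names are hypothetical; it ignores cache size limits, write ordering, and how long a real battery can hold the memory.

```python
class WriteBackCache:
    """Toy RAID-controller write-back cache (hypothetical model)."""

    def __init__(self, battery_backed):
        self.disk = {}      # stands in for the drives: block number -> data
        self.dirty = {}     # acknowledged writes not yet on disk
        self.battery_backed = battery_backed

    def write(self, block, data):
        self.dirty[block] = data
        return "write complete"  # acknowledged before the data reaches the disk

    def read(self, block):
        return self.dirty.get(block, self.disk.get(block))

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()

    def power_failure(self):
        if not self.battery_backed:
            self.dirty.clear()  # cache memory loses power: unflushed writes are gone

    def power_restored(self):
        self.flush()  # whatever survived in the cache is written out


for battery in (False, True):
    ctl = WriteBackCache(battery_backed=battery)
    ctl.write(7, b"payroll")
    ctl.power_failure()
    ctl.power_restored()
    print(battery, ctl.disk.get(7))
# False None
# True b'payroll'
```

Without the battery, the OS was told "write complete" for data that never reached the disks, which is the corrupt or inconsistent state Cburnett mentions.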
[edit] Explaining RAID to Mgmt
Explaining anything to management is a chore. Trying to get them to understand the cost/benes of different Raid levels is like pulling teeth! I learned a nice gimmick to do this. RAID is a 'triangle'. The three sides are: 1) COST, 2) PERFORMANCE and 3) AVAILABILITY. You can get any two, but never three. RAID-0 is good cost and performance but bad availability. RAID-1 is good performance and availability but bad cost. RAID-5 is good cost and availability but bad performance.
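The "cost" and "availability" sides of the triangle can be put into numbers for identical disks. A sketch, assuming N equal disks and counting only the number of disk failures each level is guaranteed to survive; performance is left out because, as discussed elsewhere on this page, it depends heavily on the access pattern. The function name is hypothetical.

```python
def raid_tradeoff(level, disks, disk_tb):
    """Return (usable capacity in TB, disk failures always survived)."""
    if level == 0:
        return disks * disk_tb, 0          # striping only: all capacity, no redundancy
    if level == 1:
        return disk_tb, disks - 1          # N-way mirror: one disk's worth of space
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_tb, 1    # one disk's worth of space goes to parity
    raise ValueError("only RAID 0, 1 and 5 are modelled")


for level in (0, 1, 5):
    print(level, raid_tradeoff(level, 4, 1.0))
# 0 (4.0, 0)
# 1 (1.0, 3)
# 5 (3.0, 1)
```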
[edit] I'm not sure the addition to Basic Function is clear in its meaning.
The edit in question, states:
At the very simplest level, RAID combines multiple (and now even on X-large single) hard disk drives into a single logical unit.
I'm not sure I understand the meaning of "(and now even on X-large single)" and I'm guessing it should be removed. Any insights? I don't want to edit it out just because _I_ don't understand it. Dr. Zed 15:04, 22 April 2007 (UTC)
- I rewrote the whole section. The "X-large single" didn't make sense at all. Cburnett 16:19, 22 April 2007 (UTC)
[edit] Merge with RAID controllers
Since the two topics are so closely related, does anyone think these two topics should be merged? Royallywasted 07:10, 6 May 2007 (UTC)
- They should not be merged. I added {{main|RAID#Hardware RAID}} to Disk controller#Hardware RAID so each half could be better written. Cburnett 15:17, 6 May 2007 (UTC)
- Shouldn't be merged. There should be RAID about ideas and the other separate article RAID controller about specific (i.e. hardware) implementation of these ideas. --Kubanczyk 06:02, 25 June 2007 (UTC)
[edit] Slightly incorrect
The phrase "a multi-threaded operating system... can perform overlapped I/O, allowing multiple read or write requests to be initiated without waiting for completion on each request" is slightly incorrect. A system with asynchronous I/O can also allow that. (This is not merely theoretical. It would go too far to include on the article page, but an example is the original Mac OS, which even in 1984, when it only ran one application at a time, could in theory reorder asynchronous I/O requests.)
[edit] "throughput" replaced with "I/O performance"
In the lead section I read: "They offer, depending on the scheme, increased data reliability and/or throughput." I think the term throughput is misleading, since it would suggest the popular belief that RAID schemes only affect sequential performance and do not improve non-sequential performance (such as random 2 KB reads or writes), while in fact RAID is able to improve both sequential and non-sequential performance. Therefore, I have changed 'throughput' into 'I/O performance'. Does anyone disagree? --FluffleS 15:10, 20 June 2007 (UTC)
[edit] Multithreaded???
In a multi-threaded operating system (such as Linux, FreeBSD, Mac OS X, Windows NT/2000/XP/Vista and Novell NetWare) the operating system can perform overlapped I/O, allowing multiple read or write requests to be initiated without waiting for completion on each request. This is the capability that makes RAID 0/1 possible in an operating system.
I don't believe it. There is nothing about RAID which says that it needs to happen on a multi-threaded operating system, nor need requests be overlapped. These may be useful things for performance (they can also be nasty things for integrity), but they are not essential. Indeed, I thought it was often an option whether the application you wrote used asynchronous or synchronous I/O. I've been bold; give a cite if you want to put it back in. Spenny 13:19, 4 September 2007 (UTC)
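To illustrate the point that RAID does not need threads or overlapped I/O: a RAID 1 mirror can be built from nothing but blocking, one-at-a-time calls. In this hypothetical sketch, in-memory io.BytesIO objects stand in for the member disks; a real implementation would live in a driver, and issuing the writes one after another simply costs more latency than overlapping them.

```python
import io


class SyncMirror:
    """RAID 1 using only blocking, sequential I/O calls (no threads, no async I/O)."""

    def __init__(self, members, block_size=512):
        self.members = members
        self.block_size = block_size

    def write_block(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        for disk in self.members:  # each write completes before the next starts
            disk.seek(lba * self.block_size)
            disk.write(data)

    def read_block(self, lba):
        # Any mirror holds the data; alternate between them by block number.
        disk = self.members[lba % len(self.members)]
        disk.seek(lba * self.block_size)
        return disk.read(self.block_size)


mirror = SyncMirror([io.BytesIO(), io.BytesIO()])
mirror.write_block(3, b"x" * 512)
print(mirror.read_block(3) == b"x" * 512)  # True
```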
[edit] RAID 2
We need to have some discussion of RAID 2, either here or in the 'non-standard RAID' article. Almost nothing is said about RAID 2: what it is, or how it can be used. I ran across an article in the December 2002 IEEE Computer Magazine that states "Raid-2 requires the use of nonstandard disk drives and is therefore not commercially viable". But that is all it says about it. We need more information on RAID 2, even if it is not used, and on why it is not being used. 147.240.236.8 17:56, 27 September 2007 (UTC)