Talk:RAID

RAID was a good article nominee, but did not meet the good article criteria at the time. There are suggestions below for improving the article. Once these are addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.


the meaning

Any disk failure destroys the array, which becomes more likely with more disks in the array. A single disk failure destroys the entire array because when data is written to a RAID 0 drive, the data is broken into fragments. ... When one sector on one of the disks fails, however, the corresponding sector on every other disk is rendered useless because part of the data is now corrupted.

Does a sector failure destroy just data from corresponding sectors, or from the entire array? It's not entirely clear at the moment. --Tom Edwards 15:02, 20 October 2007 (UTC)
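The "more likely with more disks" part of the quoted passage can be made concrete. This is a minimal sketch that assumes every disk fails independently with the same probability over some period; that is a simplification for illustration, and the function name and the 3% figure are hypothetical. It does not settle the sector-versus-array question above, only how whole-disk failure risk grows with the number of disks in a RAID 0 set.

```python
def raid0_failure_probability(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails.

    A RAID 0 set has no redundancy, so losing any one disk loses the array.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    # The array survives only if every disk survives.
    return 1.0 - (1.0 - p_disk) ** n_disks


if __name__ == "__main__":
    # Hypothetical per-disk failure probability of 3% over the period.
    for n in (1, 2, 4, 8):
        print(n, "disks:", round(raid0_failure_probability(0.03, n), 4))
```

Under these assumptions the risk rises with every added disk, which is the point the quoted sentence makes.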

RAID 1

"Provides fault tolerance from disk errors and single disk failure..." Correct me if I'm wrong, but if more than 1 disk fails in a RAID 1 of more than 2 disks, can't it still recover from the failure since they are all mirrored? Viper5030 (talk) 18:21, 5 December 2007 (UTC)Viper5030

You mean if disk 1 mirrors disk 2 and disk 3 mirrors disk 4. In that case yes, unless both disks in a pair fail. But it's arguable that they're two independent RAID systems, and then no.- (User) WolfKeeper (Talk) 19:06, 5 December 2007 (UTC)
No, I think you misunderstand. It's possible to have a single RAID 1 array with more than 2 drives, where all the drives are mirrors of each other. What Viper is saying, and I agree with, is that to account for the case where there are more than 2 drives, the sentence should read: "Provides fault tolerance from disk errors and failure of all but one of the drives". - Concentric (20th May 08 22:04 GMT) —Preceding unsigned comment added by 152.78.174.146 (talk) 21:05, 20 May 2008 (UTC)
Please source this. Bulbous (talk) 15:59, 21 May 2008 (UTC)
Speaking theoretically, it is a case of a replication-based fault-tolerant system in which a majority quorum is used to determine "which value is correct?" RAID makes no guarantees about data integrity, so it is possible for both drives in a 2-disk RAID 1 to have the exact same error, and the RAID system will not report an error: the data matches. (That's assuming it checked for equality to begin with, which I think is a false assumption.)
So if you had a 5-disk RAID 1 you could assume that if 3 or more drives read the same value then that is the correct value. If you still have 3+ drives online then you could assume that they hold the right value if they all match. If you have a 4-disk system and 2 fail then you have no majority of disks to "vote" on the correct value.
Of course, this assumes that the RAID controller verifies correctness across drives on read. With RAID 1 you can always recover from any number of lost disks provided you have at least one drive standing (just re-mirror it!). Just hope that that one disk hasn't been corrupted in any way. Cburnett (talk) 20:50, 5 December 2007 (UTC)
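The majority-quorum idea in the comment above can be sketched as follows. This is an illustration only: as the comment itself cautions, mirroring controllers may not compare copies on read at all, and the function name and arguments here are hypothetical. A value is accepted only if more than half of the array's total mirrors (including failed ones) return it.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_read(surviving_copies: Sequence[bytes], total_disks: int) -> Optional[bytes]:
    """Return the value held by a strict majority of all mirrors, else None.

    surviving_copies holds what each still-readable mirror returned;
    total_disks is the original number of mirrors in the array.
    """
    if not surviving_copies:
        return None
    # Find the most common value among the readable copies.
    value, votes = Counter(surviving_copies).most_common(1)[0]
    # Accept it only if it wins more than half of the original disk count.
    return value if votes > total_disks / 2 else None


if __name__ == "__main__":
    # 5 mirrors, one returns a corrupted block: the other four outvote it.
    print(majority_read([b"A", b"A", b"A", b"B", b"A"], total_disks=5))
    # 4 mirrors, 2 failed: the 2 survivors agree but are not a majority of 4.
    print(majority_read([b"A", b"A"], total_disks=4))
```

Counting votes against the original disk count rather than the survivors is a design choice taken from the 4-disk example in the comment; a scheme that counted only survivors would accept that case.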

RAID 2

Where is RAID 2 (parity via Hamming codes)? —Preceding unsigned comment added by 69.202.78.173 (talk) 01:46, 24 October 2007 (UTC)

RAID 3

The comment in the description of RAID 3 doesn't make sense from a raw I/O perspective: with both RAID 3 and RAID 5, all disks are involved in a write, so there isn't any more load on the parity disk with RAID 3 than on the other drives in either situation. However, it would appear there is a performance advantage on reads with RAID 5 vs. RAID 3. Does anyone know if this is stated incorrectly? Are there any statistics showing that the difference is actually superior read speed with RAID 5 rather than worse write speed with RAID 3? 198.160.96.7 (talk) 17:59, 18 April 2008 (UTC)

RAID 4 & 5 minimum disk amount wrong

While useless, you can run RAID 4 & 5 on two disks. This is, in effect, RAID 1. -- RichiH 16:43, 30 October 2007 (UTC)

...and I think for that reason it is not appropriate to describe 4 & 5 as being possible on 2 disks. Put another way, it is not helpful to describe these logical niceties, it simply confuses. Spenny 17:14, 30 October 2007 (UTC)

RAID 2

RAID 2 should be included in this page. Although RAID 2 doesn't really have any commercial implementation, I think the article should give the reader a good overview of the different RAID implementations, and adding RAID 2 will make that overview better.

As said below: RAID 2 uses Hamming codes (wiki article available) to do error checking. Other than that, RAID 2 is very similar to RAID 3. RAID 2 hasn't been a commercial success because using Hamming codes as the error-checking mechanism requires a great deal of disk space (and the RAID controller has to perform complicated calculations to do the error checking).

I'm certainly not an expert on this issue, so somebody with more expertise should verify this and maybe add some good structured text about this to the article. —Preceding unsigned comment added by 143.129.41.95 (talk) 14:47, 5 November 2007 (UTC)
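To give a rough sense of the disk-space cost mentioned above: a single-error-correcting Hamming code over m data bits needs the smallest r check bits such that 2^r >= m + r + 1. The sketch below assumes, for illustration only, one bit position per disk, so r check bits means r extra disks; the function name is hypothetical.

```python
def hamming_parity_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    if data_bits < 1:
        raise ValueError("data_bits must be at least 1")
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    # Treat each data bit as one data disk (an assumption of this sketch).
    for data_disks in (4, 10, 32):
        check_disks = hamming_parity_bits(data_disks)
        share = check_disks / (data_disks + check_disks)
        print(data_disks, "data disks need", check_disks,
              "check disks; redundancy share:", round(share, 3))
```

For comparison, a scheme with a single dedicated parity disk, such as RAID 3, always spends exactly one disk on redundancy under the same assumption.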

[RAID0] Number of disks

...RAID 0: Striped set (minimum 2 disks) without parity...

Is it possible to use 3 disks to make a RAID 0? Or is an even number of disks (2, 4, etc.) necessary?

18:39, 4 December 2007 (UTC)Renato S. Yamane

Any number would work. For 3 disks: blocks 0,3,6,9 would go on disk 1; 1,4,7,10 on disk 2; and 2,5,8,11 on disk 3. Cburnett 19:08, 4 December 2007 (UTC)
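The block placement described in the reply above is round-robin striping, which can be sketched in a few lines. This assumes simple block-level striping with disks numbered from 1, as in the reply; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def stripe_location(block: int, n_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk number starting at 1, row on that disk)."""
    if block < 0 or n_disks < 1:
        raise ValueError("block must be >= 0 and n_disks >= 1")
    return block % n_disks + 1, block // n_disks


def layout(n_blocks: int, n_disks: int) -> Dict[int, List[int]]:
    """List which logical blocks end up on each disk."""
    disks: Dict[int, List[int]] = {d: [] for d in range(1, n_disks + 1)}
    for block in range(n_blocks):
        disk, _row = stripe_location(block, n_disks)
        disks[disk].append(block)
    return disks


if __name__ == "__main__":
    # Twelve blocks over three disks, matching the example in the reply.
    for disk, blocks in layout(12, 3).items():
        print("disk", disk, "->", blocks)
```

Nothing in the mapping depends on the disk count being even, which is why any number of disks works.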


[RAID0+1, 1+0] Number of disks

The article says that a minimum of 4 disks is required; however, it is possible to use it in a two-disk scenario. This is done, for example, in the HP ProLiant DL320 G4, which can only house two disks. —Preceding unsigned comment added by 213.112.31.36 (talk) 13:14, 8 January 2008 (UTC)

RAID n+1 Alternate Meaning

The '+1' is used in this article to suggest a nested raid system using 'raid 1'. However this term is also used to denote a hot-swap spare disk, probably incorrectly, in some publications. For instance, a RAID 5+1 is taken to mean a RAID 5 with a hot-swap disk which can be brought into the array in the event of a disk failure.

This may not be correct, but it may avoid confusion to readers if this was pointed out, and possibly lead to the correction of this misnomer. —Preceding unsigned comment added by 195.245.100.11 (talk) 13:46, 4 March 2008 (UTC)

The first mention of anything "X+Y" is in RAID#Nested levels where it says:
Nested RAIDs are usually signified by joining the numbers indicating the RAID levels into a single number, sometimes with a '+' in between. For example, RAID 10 (or RAID 1+0) conceptually consists of multiple level 1 arrays stored on physical drives with a level 0 array on top, striped over the level 1 arrays.
Which to me spells it out well enough. Not sure where this confusion could/should be addressed. Cburnett (talk) 01:08, 5 March 2008 (UTC)

Fake RAID: hardware or software?

I think this edit is incorrect, because the paragraph starting with Since these controllers use proprietary disk layouts... refers to the one that precedes it, Because these controllers often try to give the impression of being hardware RAID controllers.... I think the real error is that those two paragraphs should go at the end of the Hardware-based section, not the Software-based section. --Fstanchina (talk) 16:41, 27 January 2008 (UTC)

Issues with RAID

Concerning Atomicity: Are those issues also resolved with simple data journaling, or is that not enough? In particular, are they resolved by Reiser and by ext3 with full journaling?

Concerning Unrecoverable data: Most drives remap bad sectors on write, don't they? So, if a sector is unreadable on one disk, but there is enough redundancy, the controller could simply _write_ the reconstructed sector to the disk, and that would be the same as using its own remapping table, wouldn't it?

--JensMueller (talk) 01:01, 10 February 2008 (UTC)
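The "reconstructed sector" in the question above can be illustrated for the single-parity case. This is a minimal sketch assuming XOR parity over equal-sized blocks, as in RAID 5; the function names are hypothetical, and whether a controller then writes the result back so that the drive remaps the sector is exactly the open question, which the sketch does not answer.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> bytes:
    """Rebuild the one unreadable block (None) in a stripe of data plus parity."""
    if sum(1 for b in stripe if b is None) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    # XOR of all surviving blocks, parity included, yields the missing one.
    return xor_blocks([b for b in stripe if b is not None])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    # Pretend the second data block hit an unreadable sector.
    damaged = [data[0], None, data[2], parity]
    print(reconstruct(damaged) == data[1])
```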

Beyond RAID6

RAID5 has single parity, RAID6 has dual parity.

I'm sure there are error correcting codes that can go beyond that.

What issues would occur with e.g. a (hypothetical) 6+3 RAID 6+? Are there experimental implementations of such stuff? --JensMueller (talk) 01:08, 10 February 2008 (UTC)

Well, both of those use parity. RAID 2 uses a Hamming code which I would consider "beyond" simple parity. What exactly do you mean by "beyond"? Cburnett (talk) 03:40, 10 February 2008 (UTC)
Triple parity, i.e. that three devices can fail without losing data. --JensMueller (talk) 09:55, 10 February 2008 (UTC)
Oh, I just saw that this is addressed in Standard RAID levels ... --85.180.64.175 (talk) 00:05, 23 March 2008 (UTC)

Semi-protected

I have semi-protected the page because of CONSTANT change from "Inexpensive" to "Independent". The original article by Patterson, Gibson, & Katz is titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)". They coined the term and that's what it should remain unless someone comes up with a really compelling reason to override those who coined it.

If anyone has an idea how to avoid this constant change so semi-protection can be removed then I'm all ears! Cburnett (talk) 21:23, 22 February 2008 (UTC)

The problem is that we need a reliable source to indicate its current meaning. Just because it meant something in the 1980s, does not mean that it still means the same thing.
However, if no such source shows up, why not keep "inexpensive" and make a note (later, under "Meaning of Acronym", for example) to the effect that "'Inexpensive' is sometimes replaced with 'Independent', but the former term is the one that was used when the term 'RAID' was first coined at Berkeley"? --Ernstk (talk) 16:51, 1 March 2008 (UTC)
Second sentence: RAID is also sometimes referred to as "Redundant Arrays of Inexpensive Drives" or "Redundant Arrays of Independent Disks/Drives". That doesn't stop drive-by changing of Inexpensive to Independent. Cburnett (talk) 17:02, 1 March 2008 (UTC)
Can't the introductory sentence simply be replaced with the following?
"RAID (Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks) is......."
This includes both terms immediately so any argument over the correct one is quashed. The terms also appear in alphabetical order, if anyone should argue over which appears first (!) Both terms were used recently in the final of the respected British quiz show University Challenge - if they couldn't decide on a correct answer, no-one can! --- Soulhunter123 (talk) 21:24, 2 March 2008 (UTC)
That is one solution, yes. Though I could pose the argument that Inexpensive should appear first since it was the first used term (chronological over alphabetical since alphabetical is wholly arbitrary and dumb-luck in ordering).
The whole thing is an interesting deal. The authors used "Redundant Array of Inexpensive Disks" and first introduce the acronym RAID in section 6 as "our acronym...is RAID." Since then the acronym has been accepted over Redundant Array of Inexpensive Disks and then changed to mean something different (probably out of confusion). That considered, I reject that Inexpensive and Independent are equal in terms of how you propose they should be presented.
I'd propose not defining RAID and leaving that to the first section on History but I guarantee...GUARANTEE...someone will add it to the introductory sentence. The problem is people changing without bothering to read. The rest of the introductory sentence starts with "as named by the inventors", and Independent is not what they named it. So rationally explaining terms won't work either because people aren't thinking before they change it. Cburnett (talk) 22:01, 2 March 2008 (UTC)
The disambiguation page for "Raid" reads "Redundant Array of Independent/Inexpensive Disks", and has done for over a year. Can't this just go in the opening sentence, and let's be done with it? The second sentence can simply read "Originally dubbed Inexpensive Disks by the creators, the different naming convention (Independent) has since arisen within the industry." And finally, I'm totally impartial to the subject of RAID, but I think chronological order is silly for a neutral, encyclopedic point of view. Alphabetic is the way to go. --- Soulhunter123 (talk) 01:02, 3 March 2008 (UTC)
Please don't bastardize the meaning of "neutral" as it means in WP:NPOV. Cburnett (talk) 01:09, 5 March 2008 (UTC)

I have edited the introduction to encompass both names, entirely separately, in chronological order (due to the added descriptions in parentheses). Hopefully this will put an end to the whole issue. Personally, I still think the article is messy, particularly the introduction, which drones on for about three paragraphs longer than it needs to. If you could unprotect the page now, that would be great. --- Soulhunter123 (talk) 17:17, 7 March 2008 (UTC)

Agreed--KelvinHOWiknerd(talk) 07:40, 29 March 2008 (UTC)
Do we have a cite for "Independent"? Bulbous (talk) 14:36, 29 March 2008 (UTC)

rewrite so that dumb people can understand it?

This article is a little bit hard for non-technical people to understand. 132.161.187.38 (talk) 13:14, 13 March 2008 (UTC)

I quite agree, and I have re-written/re-structured the introduction and first section to better explain RAID in a more simplistic, readable manner. Hope this helps you and others! --- Soulhunter123 (talk) 04:27, 14 March 2008 (UTC)

Dumb people rewrite - Purpose and basics

I agree that the article lacked a simple explanation (not for dumb people, but for people who didn't know it all beforehand), and added a new "Purpose and basics" section. It's been extensively edited. I didn't entirely agree with the edit, and was trying to improve the new version, but came to the conclusion that it was basically less clear and simple than my original; I'd like others, particularly the non-technical, to consider the versions and decide what text to use. For now I've reinstated my version, extensively revised, for consideration, and would suggest that neither Soulhunter123 nor I touch the section for a while.

Basically, this section should be simple but never simplistic, and should convey:

  • RAID looks to you just like one disk.
  • Different RAIDs give you a combination of faster performance and data safety against disk failure.
  • RAID 0 uses at least 2 disks and is faster but unsafe (I originally said 2 disks, unduly restrictive)
  • RAID 1 uses 2 disks, protects data, and loses 50% of capacity
  • RAID 5 uses any number of disks, protects data, and gives more capacity than RAID 1.
  • It is possible to protect data against more than one disk failing; read the article for details.
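The capacity claims in the list above can be checked with a little arithmetic. This is a minimal sketch assuming equal-sized disks, a RAID 1 in which every disk is a mirror of the same data, and single-parity RAID 5; the function name is hypothetical, and it accepts two or more disks for every level purely for simplicity (RAID 5 arrays are normally built from at least three).

```python
def usable_capacity(level: int, n_disks: int, disk_size: float) -> float:
    """Usable capacity of an array of equal-sized disks for RAID 0, 1 or 5."""
    if n_disks < 2:
        raise ValueError("this sketch assumes at least 2 disks")
    if level == 0:
        return n_disks * disk_size        # striping only, no redundancy
    if level == 1:
        return disk_size                  # every disk holds the same data
    if level == 5:
        return (n_disks - 1) * disk_size  # one disk's worth goes to parity
    raise ValueError("only RAID 0, 1 and 5 are covered here")


if __name__ == "__main__":
    size = 500.0  # hypothetical disk size in GB
    for level, n in ((0, 2), (1, 2), (5, 4)):
        raw = n * size
        usable = usable_capacity(level, n, size)
        print("RAID", level, "with", n, "disks:", usable, "GB usable of", raw)
```

With two disks, RAID 1 gives up half the raw capacity, as the list says; RAID 5 gives up only one disk's worth however many disks are used.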

In fact, maybe the whole section should be replaced by just the above? It's actually undesirable to give too much detail and consider all possibilities (mirroring more than 2 disks); after all, it's followed by a whopping great discussion of just about everything. This section must be simple, but I do think it should avoid being simplistic.

Details I wasn't happy about:

'extra "summary" data' seems confusing

"is written alongside the main data on a disk" it's distributed over the array, not on the same disk as the data in question - this is confusing

"data on it is reconstructed from the summary data on the other disks" it's reconstructed from the remaining good data, corrected with the redundant data. I think I made the same implication

"[redundant RAIDs] requiring, on average, roughly half the size of the main data" no

"For increased performance, there are various combinations of configurations" I think for the user unfamiliar with RAID, RAID 0 is enough. The full article details the others.

"lost or corrupted" for all practical purposes, lost. Data which is corrupted but not lost makes you think of perhaps a document with a bad paragraph

"Another approach, 'RAID 1', stores the same data on each disk in the array so that the failure of one disk causes no loss. This configuration allows the user to consume only half the total capacity of the array's disks." In practice, it's very rare to mirror more than 2 discs. And if we do, we can lose all disks except one, and we lose the capacity of all disks except one.

I'm sure what I wrote can be dissected in this way, and improved in the process.

Best wishes, Pol098 (talk) 17:07, 14 March 2008 (UTC)

/* History */ I've removed Sixth and Tenth from the section about the original paper because they weren't discussed in the original paper. They are discussed elsewhere in this article. There are still a few internal inconsistencies in this article. Richard Manion 13:50, 22 April 2008 (UTC)


Reliability terms unclear

Failure rate: "The mean time to failure (MTTF) or the mean time between failure (MTBF) of a given RAID is the same as those of its constituent hard drives, regardless of what type of RAID is employed." This does not clarify whether this is the "sticker" MTTF on the drives, or a number calculated from the average or the lowest value among these drives. Failure rate is not a synonym for MTBF, though they are related. Possibly both should be defined and linked to the appropriate articles.

RAID Principles

A mention somewhere of information theory is possibly appropriate, given that RAID is a communications system where you're essentially sending data to yourself, and the channel is the RAID array. I'm pretty sure RAID codes are close to other communications codes, too.