Talk:Column-oriented DBMS

From Wikipedia, the free encyclopedia

Isn't this kind of database the same as a Pick-style database, i.e. a multivalue database?


rewrite 4/23

I agree with the discussion below; I also found the content misleading and straying off topic. I have submitted a rewrite which is substantially similar in coverage but which I believe characterizes the column-oriented DBMS better. In hindsight, I feel that the page still reads too much as a 'comparison of column-oriented and row-oriented DBMS' rather than a standalone description of 'column-oriented DBMS' as it is titled, because I mirrored some of the previous content. I'm curious to know whether others feel this is beneficial, or whether this content should be moved to a 'comparison of column-oriented and row-oriented DBMS' page, with this page more deliberately dedicated to column-oriented DBMSs. Jeskeca (talk) 01:43, 24 April 2008 (UTC)


comments from before 4/23 rewrite below

Cost of querying

I read the article:

For [read mostly] workloads, column stores are advantageous because:

  1. Queries tend to require only a few columns out of many contained in a table. Whereas a row store must read the entire table, a column store can confine its reads to the columns required.
  2. Column data, being of uniform type, is much easier to compress than row data. Thus, the actual disk reads required to read the columns required for a query may be far less than the size of the raw data in these columns.

I think this needs to be reformulated; it doesn't seem to be coming across right. Is this an advantage when running queries that extract a subset of the data? Is there an assumption that the table is not indexed? An index can be used to find a relevant subset of the data quickly, and then there's no need to read the entire table. In such a case, the first point seems to be moot. Perhaps the article should be more specific about the cases where vertical fragmentation is a benefit.

I can't see that the conclusions drawn in the second point follow either. Clarification and/or sources would be beneficial.

WikiKetil 23:16, 6 September 2007 (UTC)
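
As an illustration of the two points quoted from the article above, here is a minimal Python sketch. It assumes a full scan with no index (the case questioned above), a made-up three-column table, and run-length encoding as a stand-in for whatever compression a real column store uses. All names in it are hypothetical and it does not model any particular product.

```python
from itertools import groupby

# Hypothetical toy table: 1000 rows, 3 columns (id, country, status).
rows = [(i, "US" if i < 500 else "DE", "active") for i in range(1000)]

# Row layout: the values of each row are stored next to each other.
row_layout = [value for row in rows for value in row]

# Column layout: the values of each column are stored next to each other.
column_layout = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "status": [r[2] for r in rows],
}


def values_read_row_store(wanted_columns):
    """Assuming no index, a scan touches every value of every row,
    regardless of which columns the query asked for."""
    return len(row_layout)


def values_read_column_store(wanted_columns):
    """A scan touches only the columns the query names."""
    return sum(len(column_layout[name]) for name in wanted_columns)


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    query = ["country"]
    print("row store values read:   ", values_read_row_store(query))
    print("column store values read:", values_read_column_store(query))
    print("runs in country column:  ", len(run_length_encode(column_layout["country"])))
    print("runs in row layout:      ", len(run_length_encode(row_layout)))
```

In this toy data the country column collapses to two runs, while the interleaved row layout has no adjacent equal values to collapse. With an index on the filtered column the scan cost comparison changes, which is the caveat raised in the comment above.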

Re-write

This article needs to be rewritten so that its content explicitly describes column-based databases, not as an article that describes the traditional row-based RDBMS and then contrasts it with column-based DBs in a couple of sentences. It's the wrong way round, and it doesn't contain enough detail about column-based databases either. scope_creep 16:11, 7 September 2007 (UTC)

Article Content Needs Full Rewrite

This "definition" is ridiculous. By only citing the most rudimentary basics between the two types of databases it serves no one any good whatsoever. I am now going to spend hours searching and reading the web to find out what the real differences are and if given the time I will edit this article. Prof Kayyos 16:54, 7 September 2007 (UTC)

It's not the most rudimentary basics, it's the fundamental architecture of the data store. It's not clear to me what you mean by "the real differences"... storing the data column by column, instead of row by row, is the real difference, with the advantages and disadvantages described.
Is your background in computer science? Have you reviewed the literature on database CS at all? Georgewilliamherbert 00:32, 8 September 2007 (UTC)

Blog reference

I inserted two references earlier today, one of which was to a blog by Dave Kellogg describing column oriented RDBMS systems. He does a lot of RDBMS work and counts as an expert in the field, in addition to being head of a company in a related business space (XML servers, not DBs per se). I believe that the column serves as a good non-technical reference / info source. We can include hard references to all the academic papers and product marketing info sheets we want, but external general overview info is also good source/reference material.

There's a general desire to minimize blog usage in Wikipedia references - but the objective there is to minimize unreliable sources. For sources selected precisely because they are "general" and not focused-technical, that is somewhat a moot point.

Feel free to discuss here if there are further objections... Thanks. Georgewilliamherbert (talk) 22:25, 22 January 2008 (UTC)

The pointed-to article says "I’d recently heard that Michael Stonebraker had founded Vertica, a column-oriented DBMS company.... So I decided to try and figure out what column-oriented DBMS is and why you might want one." That is, he is new to column-oriented DBMSs. And it concludes "For more information on column-oriented DBMSs, check out the Wikipedia entry here." That is, this article is already more complete than his article. So why are we pointing to it? --Macrakis (talk) 18:10, 23 January 2008 (UTC)
He's a DBMS expert in general, but acknowledges that he is not one in column-oriented RDBMSs. He's presenting an overview of them to readers who are less familiar than he is. He's an independent source in the DBMS user community commenting on what they are, though not as an expert on them specifically. Georgewilliamherbert (talk) 00:11, 24 January 2008 (UTC)
OK, thanks for confirming my understanding. I will therefore remove the reference. --Macrakis (talk) 05:28, 24 January 2008 (UTC)

An anon added the following comment:

It simply isn't true that schema changes in a row-oriented database have to be expensive. It's really just a shortcoming of MySQL's implementation. In PostgreSQL, for example, adding or removing a column only modifies the metadata for a table. If you add a column, new rows with a value for that column will have space for it; old rows that didn't have a value will read as null.

This may all be true, but it is written as a comment, not part of the article. I've rewritten it. --Macrakis (talk) 16:01, 3 February 2008 (UTC)
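
For readers following the anon's point, the idea of a metadata-only column addition can be sketched as a toy model in Python. This is an assumption-laden illustration, not PostgreSQL's actual storage code: the class `ToyRowTable` and its methods are hypothetical, and it covers only adding a column, not removing one.

```python
class ToyRowTable:
    """Hypothetical row store in which each stored row keeps only the
    values for the columns that existed when the row was written."""

    def __init__(self, columns):
        self.columns = list(columns)  # table metadata
        self.rows = []                # stored row tuples, never rewritten

    def insert(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("expected one value per current column")
        self.rows.append(tuple(values))

    def add_column(self, name):
        # Metadata-only change: no stored row is touched.
        self.columns.append(name)

    def read(self, index):
        stored = self.rows[index]
        # Rows written before a column was added are padded with None (NULL).
        padded = stored + (None,) * (len(self.columns) - len(stored))
        return dict(zip(self.columns, padded))


if __name__ == "__main__":
    table = ToyRowTable(["id", "name"])
    table.insert(1, "old row")
    table.add_column("email")
    table.insert(2, "new row", "new@example.org")
    print(table.read(0))
    print(table.read(1))
```

The cost of `add_column` here does not depend on the number of stored rows, which is the property the comment attributes to PostgreSQL; whether a given DBMS behaves this way depends on its implementation.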