Talk:Entity-Attribute-Value model

From Wikipedia, the free encyclopedia

[edit] How about the Drawbacks?

Maps very poorly to the relational model,

The primary objective of the schema is to map "real life" concepts (i.e. heterogenous sparsely populated attributes; what you'd find in a patient's chart) instead of coercing "real life" into a relational model. You are still able to leverage relational concepts because of the methods you need to use to pivot, join, as well as perform localized normalization. Cowbert (talk) 01:54, 2 May 2008 (UTC)

Does not normalize well, Incurs overhead because it often requires data be implicitly or explicitly converted to/from character format, Many implementations waste space by providing several columns of different data types to allow for the different data types a value might be, You have to manually re-create a lot of the functionality built-in to a relational database, such as check constraints and foreign key constraints.

That is why in production you spin off EAV tables based on data types. EAV/CR directly addresses this issue: either the class table or the attribute table (if you are not using classes) stores metadata about which EAV table has all of those attribute's values. So you'd have an EAV_int, EAV_real, EAV_char, EAV_date etc. tables. When you enumerate the attribute for the entity, your business logic queries this metadata to determine which eav table to query for the value (example, patient_hemoglobin has a metadata column of 'eav_table' pointing to eav_real). Also, the constraints are now primarily metadata driven, instead of defined in the schema itself - again the primary goal is avoid having to coerce a certain representational system into a strict relational schema, where as the arguments that seem to be made are that "we must stick with relational schema" even though the whole point of the exercise is that a column-moduled schema poorly supports large and/or dynamic numbers of sparsely populated attributes (I can do a SELECT attribid, value FROM eavtable where entityid = foo vs. tossing all the null fields if I have a table where the attributes are column modeled [or worse, multiple tables of tons of sparse columns each that I'm going to have to index]). Finally, it has been admitted that the main difficulty more or less lies with attribute-centric queries (i.e. I *do* want to know every patient who is running a fever today, which brings up the necessity to pivot the attribute over into a columnar model or do some self-joins)) Cowbert (talk) 01:54, 2 May 2008 (UTC)

The fact that vendors impose "limitations" and design databases poorly is not an advantage. By the way, the example given in the article is very EASILY converted to a properly normalized relational model. Perhaps someone should come up with a better example that is NOT easily converted to a normalized relational model? As an example:

Patient Table:
[Primary Key is (Patient_ID)]
Patient_ID
Patient_Name


Patient_Test Table:
[Primary Key is (Patient_ID, Test_ID)]
Patient_ID
Test_ID
Date
Test_Type
Quantity
Unit_of_Measure

Test_Type would be related to another table containing the various tests that could be performed, and Unit_of_Measure could relate to another table with UOM identifiers and possibly a conversion factor for a common UOM.

Patient
Patient_ID Patient_Name
1 Patient XYZ
Patient_Test
Patient_ID Test_ID Date Test_Type Quantity Unit_of_Measure
1 1 1/5/1998 Hemoglobin 12.5 gm/dl
1 2 1/5/1998 Potassium 4.9 Meq/L


This is how the normalized eav schema of the above would look like:

table test_events {
eventid integer primary key,
patientid integer foreign key references patient_table(patientid),
timestamp date
}

test_metadata {
testid int pkey,
test_type varchar(),
test_datatype int fkey references eav_types,
test_unit_of_measure varchar()
}

eav_types {
datatype int pkey,
eav_table_name varchar()
}

patient_table {
patientid int pkey,
first_name varchar,
last_name varchar,
dob date
....
}

eav_real {
eventid int fkey references test_events(eventid),
testid int fkey references test_metatdata(testid),
value real
}

ok. So now for the data/meta:

patient_table VALUES { patientid = 1, first_name =... }
patient_event { eventid = 1, patientid = 1, date = 1/5/1998 } patient_event { eventid = 2, patientid = 1, date = 1/5/1998 }

test_metadata { testid = 1, test_type = 'hemoglobin', test_datatype = 1, test_unit_of_measure = 'g/dL'} test_metadata { testid = 2, test_type = 'potassium', test_datatype = 1, test_unit_of_measure = 'meq/L'}

eav_types { datatype = 1, eav_table_name = 'eav_real'}

eav_real { eventid = 1, testid = 1, value = 12.5 } eav_real { eventid = 2, testid = 2, value = 4.9 }

Cowbert (talk) 01:54, 2 May 2008 (UTC)

[edit] Downsides

Seem like good points, but should be written in 3rd person, and should be sourced! —Dfass 05:48, 24 December 2006 (UTC)

I agree. It is self-referential (referring to a previous version of the same article) in a way that is inappropriate. It also seems that many of the points are already discussed in other sections, and the information in this section could be integrated there. Nimrand 21:04, 1 June 2007 (UTC)


The drawback number 2 (scalability for large datasets) can be greatly improved by using 2 queries for retrieving the data.

  • 1st is for filtering and sorting the records, here you would self join ONLY attributes that are used in filters and order by. Ordered result of this query creates empty array with placeholders for entities.
  • 2nd is regular SELECT (or UNION when attributes are splited to few tables by type) that is filtered by entity id, which were received from 1st query. Result from this query is being fetched into array created by 1st query, which makes the result correctly sorted.

[edit] Accuracy of this article

The article makes a lot of unsubstantiated claims (which may or may not be true), many of which seem to stem from the desire to constantly compare EAV to traditional/flat/SQL databases. We can leave them tagged as {{fact}} for a while, but there's a lot of it and I was hoping someone would provide sources or just remove it.

Perhaps we could split the article into an article about EAV, and a separate article comparing it to other models. In other words, we might consider separating the factual and useful information about EAV from the subjective and oft-disputed comparisons to other models.