schema for storing different varchar fields over time?
This app I’m working on needs to store some metadata fields about an entity. The problem is that we can already foresee these fields changing a lot in the future. Right now every property of the entity maps to one column in the entity table, but altering table columns later down the road will be costly and error-prone, right?
Should I go for something like this (key-value store) instead?
MetaDataField ----- metaDataFieldID (PK), name
FieldValue -------- EntityID (PK, FK), metaDataFieldID (PK, FK), value [varchar(255)]
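As a concrete sketch of that key-value design (table and column names taken from the question; SQLite is used here in place of SQL Server purely to keep the example self-contained), the point is that a new field becomes an INSERT rather than an ALTER TABLE:

```python
import sqlite3

# Hypothetical sketch of the proposed key-value schema, using SQLite
# as a stand-in for SQL Server so the example is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MetaDataField (
    metaDataFieldID INTEGER PRIMARY KEY,
    name            VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE FieldValue (
    EntityID        INTEGER NOT NULL,
    metaDataFieldID INTEGER NOT NULL REFERENCES MetaDataField,
    value           VARCHAR(255),
    PRIMARY KEY (EntityID, metaDataFieldID)
);
""")

# Adding a brand-new field later is just an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO MetaDataField (name) VALUES ('color')")
conn.execute("INSERT INTO FieldValue VALUES (1, 1, 'red')")
row = conn.execute("""
    SELECT f.name, v.value
    FROM FieldValue v JOIN MetaDataField f USING (metaDataFieldID)
    WHERE v.EntityID = 1
""").fetchone()
print(row)  # ('color', 'red')
```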
P.S. I also thought of using the XML type on SQL Server 2005+. After talking to some people, it seems that is not a viable solution because it would be too slow for certain reporting queries.
5 Answers
You’re right, you don’t want to go changing your data schema any time a new parameter comes up!
I’ve seen two ways of doing something like this. One, just have a “meta” text field, and format the value to define both the parameter and the value. Joomla! does this, for example, to track custom article properties. It looks like this:
ProductTable
id   name     meta
--------------------------------------------------------------------------
1    prod-a   title:'a product title',desc:'a short description'
2    prod-b   title:'second product',desc:'n/a'
3    prod-c   title:'3rd product',desc:'please choose sm med or large'
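A minimal parser for a packed meta column like that (this assumes the simple `key:'value'` comma-separated encoding shown above; Joomla!'s actual format may differ) could be as short as:

```python
import re

def parse_meta(meta: str) -> dict:
    """Split a packed meta string such as "title:'a product title',desc:'n/a'"
    into a dict. Assumes values contain no embedded quotes or commas."""
    return dict(re.findall(r"(\w+):'([^']*)'", meta))

parsed = parse_meta("title:'a product title',desc:'a short description'")
print(parsed)  # {'title': 'a product title', 'desc': 'a short description'}
```

The drawback of this approach is visible here: the database cannot index or constrain individual keys, so all filtering on meta values has to happen in application code.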
Another way of handling this is to use additional tables, like this:
ProductTable
product_id   name
-----------------
1            prod-a
2            prod-b
3            prod-c

MetaParametersTable
meta_id   name
--------------
1         title
2         desc

ProductMetaMapping
product_id   meta_id   value
-----------------------------------------------
1            1         a product title
1            2         a short description
2            1         second product
2            2         n/a
3            1         3rd product
3            2         please choose sm med or large
In this case a query will need to join the tables, but you can index and optimize the tables better, query for individual meta values without returning all parameters, and so on.
Choosing between them will depend on complexity, on whether individual rows ever need different meta parameters, and on how the data will be consumed.
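For illustration, the join the second approach requires might look like this (SQLite as a portable stand-in, loaded with the sample data from the tables above); note that it can fetch one parameter without pulling back every meta row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductTable        (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE MetaParametersTable (meta_id    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ProductMetaMapping  (product_id INTEGER, meta_id INTEGER, value TEXT,
                                  PRIMARY KEY (product_id, meta_id));
INSERT INTO ProductTable        VALUES (1,'prod-a'),(2,'prod-b'),(3,'prod-c');
INSERT INTO MetaParametersTable VALUES (1,'title'),(2,'desc');
INSERT INTO ProductMetaMapping  VALUES
    (1,1,'a product title'),(1,2,'a short description'),
    (2,1,'second product'),(2,2,'n/a'),
    (3,1,'3rd product'),(3,2,'please choose sm med or large');
""")

# Fetch a single parameter for one product, without returning all meta rows.
title = conn.execute("""
    SELECT m.value
    FROM ProductMetaMapping m
    JOIN MetaParametersTable p ON p.meta_id = m.meta_id
    WHERE m.product_id = 2 AND p.name = 'title'
""").fetchone()[0]
print(title)  # second product
```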
The key-value table is a good idea, and it performs much faster than SQL Server 2005 XML indexes. I started with the same kind of XML solution in a project and had to change it to an indexed key-value table to gain performance. I think SQL Server 2008 XML indexes are faster, but I have not tried them yet.
Whether XML speed is a factor depends on the size of the data going into the xml column. We had a project that stuffed data into, and processed data from, an xml column. It was very fast until you hit around 64 KB: at 63 KB and below, getting data out or inserting took milliseconds; at 64 KB, the same operations jumped to a full minute. Go figure.
Other than that, the main issue we had was complexity. Working with XML data in SQL Server is not for the faint of heart.
Regardless, your best bet is a table of name/value pairs tied to the entity in question. It then becomes easy to support entities with different properties, or to add and remove properties dynamically. This too has its caveats: for example, once you have more than, say, 10 properties, it will be much faster to do the pivot in code.
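That "pivot in code" step can be sketched as follows, assuming the name/value rows have already been fetched from the database as (entity_id, field_name, value) tuples:

```python
from collections import defaultdict

def pivot(rows):
    """Turn flat (entity_id, field_name, value) rows into one dict of
    properties per entity -- often cheaper in application code than a
    SQL PIVOT once the number of properties grows."""
    entities = defaultdict(dict)
    for entity_id, field_name, value in rows:
        entities[entity_id][field_name] = value
    return dict(entities)

rows = [(1, 'title', 'a product title'),
        (1, 'desc', 'a short description'),
        (2, 'title', 'second product')]
print(pivot(rows))
# {1: {'title': 'a product title', 'desc': 'a short description'},
#  2: {'title': 'second product'}}
```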
There is also a pattern to consider for this, called the observation pattern.
See similar questions/answers: one, two, three.
The pattern is described in Martin Fowler’s book Analysis Patterns. It is essentially an OO pattern, but it can be implemented in a database schema too.
“altering table columns later down the road will be costly and error-prone right?”
A “table column”, as you call it, has exactly two properties: its name and its data type. Therefore, “altering a table column” can refer to only two things: altering the name or altering the data type.
Wanting to alter the name is indeed a costly and error-prone operation, but fortunately there should never be a genuine business need for it. If an established column seems somewhat inappropriate in hindsight, and “it might have been given a better name”, it is still not the case that the business incurs losses from that fact! Just stick with the old name, even if, in hindsight, it was poorly chosen.
Wanting to alter the data type is indeed a costly operation, susceptible to breaking business operations that were running smoothly, but fortunately it is quite rare for a user to come round and tell you, “Hey, I know I told you this attribute had to be a Date, but guess what, I was wrong; it has to be a Float.” Other changes of the same nature that are more likely to occur (e.g. from smallint to integer) can be avoided by being cautious when defining the database.
Other types of database changes (e.g. adding a new column) are usually not that dangerous and/or disruptive.
So don’t let yourself be scared by vague, sloganesque phrases such as “changing a database is expensive and dangerous”. They usually come from people who know too little about database management to be involved in that particular field of our profession anyway.
Maintaining queries, constraints and constraint enforcement on an EAV database is very likely to turn out to be thousands of times more expensive than “regular” database structure changes.