Entity Framework with lots of rows
I am working on medical software, and my goal is to store a large number of custom actions in a database. Since it is very important to keep track of who has done what, an action is generated every time a user does something meaningful (e.g. writes a comment, adds some medical information, etc.). The problem is that over time there will be a lot of actions, say 10,000 per patient, and there might be 50,000 patients, resulting in a total of 500 million actions (or even more).
Currently the database model looks something like this:
[Patient] 1 -- 1 [ActionBlob]
So every patient simply has one big blob which contains all actions as one big serialized byte array. Of course this won't work once the table grows, because I have to transfer the whole byte array back and forth between the database and the client all the time.
My next idea was to have a list of individually serialized actions (not one big chunk), i.e.:
[Patient] 1 -- * [Action]
but I started to wonder whether this is a good approach or not. Now when I add a new action I don't have to serialize all the other actions and transfer them to the database; I simply serialize one action and add it to the Actions table. But what about loading data — will it be super slow, since there may be 500 million rows in one table?
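For reference, the one-to-many model described above could be sketched as two Entity Framework entity classes like the following. All names here are illustrative assumptions, not taken from the actual project; the entity is called `PatientAction` rather than `Action` to avoid colliding with `System.Action`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the [Patient] 1 -- * [Action] model.
public class Patient
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<PatientAction> Actions { get; set; }
}

public class PatientAction
{
    public int Id { get; set; }
    public int PatientId { get; set; }        // foreign key back to Patient
    public DateTime Timestamp { get; set; }   // when the action happened
    public byte[] Payload { get; set; }       // one serialized action, not a whole blob
    public virtual Patient Patient { get; set; }
}
```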
So basically the questions are:

- Can SQL Server handle loading 10,000 rows from a table with 500 million rows? (These numbers may be even larger.)
- Can Entity Framework handle materialization of 10,000 entities without being very slow?
2 Answers
Your second idea is correct: millions of small rows are not a problem for a SQL database, and if you index the useful columns in the Action table, performance will be better still.
Storing actions as a blob is a very bad idea, because every time you search you will have to convert the blob back into individual records, and you get none of the database's search capabilities.
A properly indexed table with a billion records is not a problem at all for SQL Server.
And no user interface ever shows a million records at once; you always page through them — records 1 to 99, 100 to 199, and so on.
We have tables with nearly 10 million rows, and everything runs smoothly, because the frequently searched columns and the foreign keys are indexed.
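The paging pattern described above can be sketched with a standard Entity Framework `OrderBy`/`Skip`/`Take` query. This is an illustrative snippet, assuming a `DbContext` exposing an `Actions` set and the entity/property names used here:

```csharp
using System.Linq;

// Hypothetical paged load: page 'pageIndex' of 100 actions for one patient.
// An OrderBy is required before Skip/Take so the paging order is deterministic.
int pageSize = 100;
var page = context.Actions
    .Where(a => a.PatientId == patientId)
    .OrderByDescending(a => a.Timestamp)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();
```

With an index on the filtered and sorted columns, a query like this touches only the rows for one page, regardless of how many hundred million rows the table holds.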
Short answer for questions 1 and 2: yes.
But if you are inserting a large batch of rows in one go, you would do better to use SqlBulkCopy.
I’d recommend you take a look at the following:
- How to do a Bulk Insert — Linq to Entities
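As a rough sketch of that approach: instead of adding thousands of entities to the context and calling `SaveChanges`, you stream them to the server with `SqlBulkCopy`. The table and column names below are assumptions based on the model discussed in this thread:

```csharp
using System.Data;
using System.Data.SqlClient;

// Hypothetical bulk insert of many actions in one round trip.
// 'actionsTable' is a DataTable whose columns match dbo.Actions.
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Actions";
    bulk.ColumnMappings.Add("PatientId", "PatientId");
    bulk.ColumnMappings.Add("Timestamp", "Timestamp");
    bulk.ColumnMappings.Add("Payload", "Payload");
    bulk.WriteToServer(actionsTable);
}
```

SqlBulkCopy bypasses per-row change tracking entirely, which is why it is dramatically faster than Entity Framework for large batch inserts.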
About your model: you definitely shouldn't use a blob to store actions. Have an Action table with a foreign key to Patient, and make sure it has a timestamp column.
This way, whenever you have to load the actions for a given patient, you can use time as a filtering criterion (for example, load only the actions from the last two months).
Since you are likely going to fetch actions for a given patient, make sure the Patient foreign key column is indexed.
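Putting the two suggestions together, a time-filtered load might look like the following sketch (entity and property names are assumptions, matching the model discussed in this thread):

```csharp
using System;
using System.Linq;

// Hypothetical query: only the last two months of actions for one patient.
var since = DateTime.UtcNow.AddMonths(-2);
var recent = context.Actions
    .Where(a => a.PatientId == patientId && a.Timestamp >= since)
    .OrderBy(a => a.Timestamp)
    .ToList();

// To make this fast on a very large table, a composite index covering both
// the filter columns is the natural fit, e.g. (illustrative name):
//   CREATE INDEX IX_Actions_PatientId_Timestamp
//   ON dbo.Actions (PatientId, Timestamp);
```

With that composite index, the query resolves to a narrow index seek on one patient's recent rows instead of a scan over the whole table.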
Hope this helps.