Database – How to manage a high quantity of data over time with different refresh rates?
I need to design a SQL Server database capable of logging different data on different time scales.
Basically, what I have to log is data from battery cells, from lots of batteries, at any time.
Here is a basic model of the database.
The data types on the image are not the ones I actually use.
I use tinyint for most of them (1 byte)
- Time is 3 bytes
- Date is 3 bytes
So imagine :
One cell report file is emitted every 24 hours.
The attributes of a cell don't all refresh at the same frequency.
For instance :
- The Time attribute refreshes every second
- The Amps attribute refreshes every second
- The Temp1 attribute refreshes every minute
- The Date refreshes once a day
And each cell reports 24/7, for years.
Now imagine there are 1,000 batteries around the world linked to the database, and each battery has, say, 20 cells. That makes
20,000 cells reporting 24/7
So here is the problem :
If only one attribute changes, I don't want the whole row to be stored again.
Otherwise, for 20,000 cells I would need about 1 TB a year (and that is with NULL stored in place of non-refreshed values).
I hope the explanation is clear enough; don't hesitate to ask for further information.
As usual, I apologize for my English. :/
2 solutions collected from the web for "Database – How to manage a high quantity of data over time with different refresh rates?"
You need 20k inserts per second, which is certainly doable, but it shows how much data this is. Bulk-inserting at that rate will be no problem at all, but you also have to keep the data around for some time. That's going to be a lot of rows, and terabytes of storage.
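A quick back-of-the-envelope check of the volume, assuming one row per cell per second (the fastest refresh rate in the question); the 20-byte row size below is a hypothetical figure for illustration, not something from the question:

```python
# Order-of-magnitude estimate: 20,000 cells, one reading per second.
CELLS = 20_000
SECONDS_PER_YEAR = 86_400 * 365

rows_per_second = CELLS                   # the 20k inserts/s mentioned above
rows_per_year = CELLS * SECONDS_PER_YEAR  # ~6.3e11 rows

# Hypothetical compact row of 20 bytes, including index overhead.
BYTES_PER_ROW = 20
tb_per_year = rows_per_year * BYTES_PER_ROW / 1024**4

print(rows_per_year)          # 630720000000
print(round(tb_per_year, 1))  # ~11.5 TB/year at 20 bytes/row
```

Even at a few bytes per row, a naive one-row-per-reading layout lands in the terabytes per year, which is why the blob and key/value schemes below focus on not storing unchanged values at all.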
I’d consider storing this in a custom format: You could store one binary blob per battery and per hour. In that blob you are free to encode the changes any way you want. You can store non-changing columns very efficiently by not storing them at all. You can also compress the blob before storing.
This scheme still gives you indexing on battery and time; only the time resolution drops to one hour. Your app has to decode the blobs and extract the data it needs from them.
The data will be easily compressible because it is so redundant. For compression there are very fast schemes like LZO, LZ4 and QuickLZ. You can get more compression with things like 7z.
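A minimal sketch of the blob idea, assuming one blob per battery per hour. It packs only the attributes that actually changed, as (seconds-into-the-hour, attribute id, value) records, and compresses with zlib as a stand-in for LZ4/LZO (which are not in the Python standard library); the attribute ids are hypothetical:

```python
import struct
import zlib

# One record per *changed* attribute: (seconds into the hour, attribute id, value).
# Attribute ids are made up for the example: 0 = Amps, 1 = Temp1, ...
RECORD = struct.Struct("<HBf")  # 2 + 1 + 4 = 7 bytes per record

def encode_hour(changes):
    """Pack one battery-hour of changes into a compressed blob.

    `changes` is an iterable of (offset_s, attr_id, value) tuples;
    attributes that did not refresh simply produce no record at all.
    """
    raw = b"".join(RECORD.pack(off, attr, val) for off, attr, val in changes)
    return zlib.compress(raw)

def decode_hour(blob):
    """Unpack a blob back into (offset_s, attr_id, value) tuples."""
    raw = zlib.decompress(blob)
    return [RECORD.unpack_from(raw, i) for i in range(0, len(raw), RECORD.size)]

# Example hour: Amps changed three times, Temp1 once.
changes = [(0, 0, 1.852), (1, 0, 1.809), (2, 0, 1.758), (60, 1, 18.4)]
blob = encode_hour(changes)
print(len(decode_hour(blob)))  # 4
```

The repeated structure of the records is exactly the redundancy the answer mentions, so the compressor gets plenty to work with; swapping zlib for LZ4 or QuickLZ only changes the speed/ratio trade-off, not the scheme.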
Possibly a simple table with:

RowID  BattID  Dt                   Key   Value
1      1       07/02/2013 10:00:01  Amps  1.852
2      1       07/02/2013 10:00:02  Amps  1.809
3      1       07/02/2013 10:00:03  Amps  1.758
4      1       07/02/2013 10:00:03  Tmp1  18.4
5      1       07/02/2013 10:00:04  Amps  1.725
The data type of the key could be tinyint to save space.
Once you have this data, as long as the indexes are correct, you can use a PIVOT query to reconstruct the data however you want.
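A runnable sketch of this key/value layout and the pivot step, using sqlite3 so it is self-contained; the conditional aggregation shown here is the portable equivalent of T-SQL's PIVOT operator, and the numeric attribute codes (0 = Amps, 1 = Tmp1) are an assumption matching the tinyint-key suggestion above:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Narrow table: one row per changed attribute, not per cell snapshot.
con.execute("""
    CREATE TABLE reading (
        RowID   INTEGER PRIMARY KEY,
        BattID  INTEGER NOT NULL,
        Dt      TEXT    NOT NULL,   -- datetime2 in SQL Server
        "Key"   INTEGER NOT NULL,   -- tinyint in SQL Server: 0 = Amps, 1 = Tmp1
        Value   REAL    NOT NULL
    )
""")
rows = [
    (1, 1, "2013-02-07 10:00:01", 0, 1.852),
    (2, 1, "2013-02-07 10:00:02", 0, 1.809),
    (3, 1, "2013-02-07 10:00:03", 0, 1.758),
    (4, 1, "2013-02-07 10:00:03", 1, 18.4),
    (5, 1, "2013-02-07 10:00:04", 0, 1.725),
]
con.executemany("INSERT INTO reading VALUES (?, ?, ?, ?, ?)", rows)

# Pivot back to one row per timestamp, one column per attribute;
# timestamps where an attribute did not refresh come back as NULL.
pivoted = con.execute("""
    SELECT Dt,
           MAX(CASE WHEN "Key" = 0 THEN Value END) AS Amps,
           MAX(CASE WHEN "Key" = 1 THEN Value END) AS Tmp1
    FROM reading
    WHERE BattID = 1
    GROUP BY Dt
    ORDER BY Dt
""").fetchall()
for row in pivoted:
    print(row)
```

Note how the 10:00:03 row comes back with both Amps and Tmp1 filled in, while the other timestamps have NULL for Tmp1: nothing was stored for the attributes that did not refresh, which is the whole point of the scheme.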