Best practice to design big entities
What would be the best practice to design a big entity like an Employee table that contains over 100 attributes?
Should I keep them as a single table with 100 columns or should I split them and to 1..1 relations and then compose the Employee object in my code?
Any opinions? Pros and cons of each method?
2 Solutions collect form web for “Best practice to design big entities”
The answer here lies not in the Employees table, but in the wider database design. If all attributes are definitely 1-1 then I would definitely have one entity. SQL Server has optimizations you can employ when you get to the physical design, such as SPARSE columns for columns that have many NULL values.
I assume you are going through the process of normalization and Entity Relationship Diagrams at the moment. If you are, then I would suggest looking at a SuperType/SubType approach, for which Employees is normally a great candidate.
In this approach (as example) you may have a “Contacts” table, which would contain First Name, Last Name, Phone Number, etc. This would then link to your Employees table, your Customers table, your Vendors table, etc. Your employees table would then just contain the attributes that are unique to Employees, such as Staff Number, Start Date, etc.
There are several benefits to this.
- Firstly, if an employee is also a customer, for example, then you reduce data redundancy.
- You are likely to achieve a better compression ratio. This is because when you have fewer columns, there will be more rows stored on a page, meaning the name “Smith” will appear on the same page more often.
- From a Master Data perspective, if a company standard for data type of an email column is introduced, then you can change it in one place, not three places. (Forgive the slightly contrived example here but hopefully it illustrates the point).
- As both the Super Table and the Sub Table have fewer columns, each can be read faster in isolation. If joined in the same query, and placed on separate Discs then the 2 tables can be read in parallel.
Try and read up on Star-Schema and Snowflake-Schema. These are terms used when it comes to designing data warehousing schemas, and the pros and cons probably are very similar to what you are facing here.