Checking a data conforms to a list of rules
I’m interested in understanding if there is a common technique/pattern for checking data in a database conforms to a set of rules. I’m interested, perhaps, in running a service/job each night that churns through the data and raises exceptions to rules. I would quite like the system to be dynamic, i.e. it’s quite easy to define new rules.
A very simple example could be:
Customer Cash% A 50 B 25 C 30
Rule: Tell me all customers that have
< 50% cash.
I’m not really coupled to a particular tool or idea, however our expertise are C#, XML and Sql Server. All our data is in SQL SERVER Databases.
If there isn’t anything I will go back to looking at a custom rule defining system, currently my idea is based around rules defined in XML (for the dynamic aspect) and a Windows Service process data against the rules.
EDIT: I should say that this could get complex, whereby I would like to check a bunch of tables and their data against a “Model” to see whether a particular object is following rules. For example there might be a “Model” customer and you want to check how many customers don’t follow the model rules, e.g. Cash, Last Payment, Nationality etc.
EDIT: Following on from a couple of comments, I fully understand that “invalid” data should be prevented from been entered in the first place. My scenario deals with the fact that the data may be perfectly valid, however it might not follow “house rules”, things that don’t follow the rules might then need to be justified.
5 Solutions collect form web for “Checking a data conforms to a list of rules”
As it isn’t mentioned yet, I’d suggest you’d take a look at the data mining capabilities of SQLServer. One of them is the abbility to highlight exceptions in your data.
Take a look at this Highlight Exceptions Video Tutorial to get you started.
Create a script of SQL statements, with one statement being equal to a single rule. In your example, your statement might be:
INSERT INTO EXCEPTION (RULE_NAME, DETAIL) VALUES ("CASH_LEVEL_LOW", SELECT CUSTOMER_ID FROM CUSTOMER WHERE CUSTOMER.CASH < 50);
I’m not up-to-date on the syntax, but you should be able to get the gist of the idea from here. It would insert into another table one record per violation, with sufficient data so that you could locate the record easily.
Your best bet might be to create a maintenance plan in SQL Server, with one step in the plan for each rule. Each rule would check the data and insert into an exception table if it found any nonconforming rules. This would allow for you to leverage the tools provided by SQL Server and maintain fairly easy maintenance of the rules themselves (adding, removing, and reordering).
I would quite like the system to be dynamic, i.e. it’s quite easy to define new rules.
You already have such a system; it is your database. In particular, check constraints serve to prevent invalid data from being entered at all.
For a case like your example — where you want to allow the value but flag it — write a view, and have a client application issue an error if the view has any rows.
Here’s an example:
create view low_on_cash as select * from table where “Cash%” < 50 ;
In the client, you’d raise an error if
“select count(*) from low_on_cash” didn’t return 0;
If you established a convention that all such views were named with a prefix, e.g., “error_report”, your client could select all such view names from the systables for the database in one query, then iterate that list by calling
"select count(*) from " + viewname; logging an error for any that returned more than zero rows.
Since this would be data-driven, adding a new error report would consist of nothing more than creating a view with the proper name prefix; you’d not have to recompile the client.
The additional advantage is that adding any rule engine would require learning its Domain Specific Language for writing rules, training new staff on it, and even then inevitably there would be corner cases the rules didn’t easily cover. Your coders already know SQL, and it’s based on a 17 year old ANSI Standard based on 20 earler years of use, so most of the corner cases have been ironed out of the language.
Personally I don’t allow data to be entered in my database that doesn’t follow the business rules. That is what check constraints, unique indexes, and triggers are for.