Suppose I have a table RULES with three columns: A, B, and C. As data enters the system, I want to know whether any row of RULES matches it, with the condition that a NULL in a RULES column matches any value for that column. The obvious SQL is:
SELECT * FROM RULES WHERE (A = :a OR A IS NULL) AND (B = :b OR B IS NULL) AND (C = :c OR C IS NULL)
So if I have rules:
RULE  A     B     C
1     50    NULL  NULL
2     51    xyz   NULL
3     51    NULL  123
4     NULL  xyz   456
An input of (50, xyz, 456) will match rules 1 and 4.
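To make the example reproducible, here is a minimal sketch that loads the four sample rules into an in-memory SQLite database and runs the query above. SQLite, the column types, the `rule_id` column name, and the `matching_rules` helper are all assumptions made for illustration; the real table and database may differ.

```python
import sqlite3

# In-memory database holding the four sample rules (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, a INTEGER, b TEXT, c INTEGER)"
)
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        (1, 50, None, None),
        (2, 51, "xyz", None),
        (3, 51, None, 123),
        (4, None, "xyz", 456),
    ],
)

# A NULL rule column acts as a wildcard for that column.
MATCH_SQL = """
SELECT rule_id FROM rules
WHERE (a = :a OR a IS NULL)
  AND (b = :b OR b IS NULL)
  AND (c = :c OR c IS NULL)
ORDER BY rule_id
"""


def matching_rules(a, b, c):
    """Return the ids of every rule that matches the input, not just the first."""
    rows = conn.execute(MATCH_SQL, {"a": a, "b": b, "c": c}).fetchall()
    return [row[0] for row in rows]


print(matching_rules(50, "xyz", 456))  # [1, 4]
```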
Question: Is there a better way to do this? With only 3 fields this is no problem. But the actual table will have 15 columns and I worry about how well that SQL scales.
Speculation: An alternative SQL statement I came up with involves adding an extra column to the table holding a count of how many fields are not null. (So in the example, this column's value for rules 1-4 is 1, 2, 2 and 2 respectively.) With this "col_count" column, the select could be:
SELECT * FROM RULES WHERE (CASE WHEN A = :a THEN 1 ELSE 0 END) + (CASE WHEN B = :b THEN 1 ELSE 0 END) + (CASE WHEN C = :c THEN 1 ELSE 0 END) = COL_COUNT
Unfortunately, I don't have enough sample data to find out which of these approaches would perform better. Before I start creating random rules, I thought I'd ask here whether there is a better approach.
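As a sanity check on the counting idea (not a benchmark), the following sketch derives `col_count` for the sample rules and runs the CASE-based select with the three CASE expressions added together. It again assumes SQLite and made-up names (`rule_id`, `rules_by_count`); it only shows that the approach returns the expected rules, not how fast it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules ("
    "rule_id INTEGER PRIMARY KEY, a INTEGER, b TEXT, c INTEGER, col_count INTEGER)"
)

sample_rules = [
    (1, 50, None, None),
    (2, 51, "xyz", None),
    (3, 51, None, 123),
    (4, None, "xyz", 456),
]
for rule_id, a, b, c in sample_rules:
    # col_count = number of non-NULL condition columns in this rule.
    col_count = sum(value is not None for value in (a, b, c))
    conn.execute(
        "INSERT INTO rules VALUES (?, ?, ?, ?, ?)", (rule_id, a, b, c, col_count)
    )

# A rule matches when every non-NULL column equals the input, i.e. when the
# number of equal columns is the same as the number of non-NULL columns.
COUNT_SQL = """
SELECT rule_id FROM rules
WHERE (CASE WHEN a = :a THEN 1 ELSE 0 END)
    + (CASE WHEN b = :b THEN 1 ELSE 0 END)
    + (CASE WHEN c = :c THEN 1 ELSE 0 END) = col_count
ORDER BY rule_id
"""


def rules_by_count(a, b, c):
    rows = conn.execute(COUNT_SQL, {"a": a, "b": b, "c": c}).fetchall()
    return [row[0] for row in rows]


print(rules_by_count(50, "xyz", 456))  # [1, 4]
```

Note that `col_count` has to be kept in sync whenever users add or edit a rule, which is an extra maintenance cost the first query does not have.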
Note: Data mining techniques and column constraints are not feasible here. The data must be checked as it enters the system so that it can be flagged pass/fail immediately. Also, the users control the addition and removal of rules, so I can't convert the rules into column constraints or other data definition statements.
One last thing: in the end I need a list of all the rules that the data fails to pass. The solution cannot abort at the first failure.
The first query you provided is perfect. I really doubt that adding the column you mentioned would give you any more speed, since every entry is effectively checked for NOT NULL anyway: a comparison with NULL never evaluates to true. So I would guess that x = y is expanded to x IS NOT NULL AND x = y internally. Maybe someone else can clarify that.
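The NULL behaviour is easy to check directly. The sketch below uses SQLite through Python's `sqlite3` module, which is an assumption since the question does not name a database; it shows what a comparison with NULL produces, not how any particular engine expands `x = y` internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A comparison with NULL yields NULL (unknown), which Python sees as None.
null_comparison = conn.execute("SELECT NULL = 50").fetchone()[0]

# WHERE keeps only rows whose condition is true, so an unknown result drops the row.
filtered_rows = conn.execute("SELECT 1 WHERE NULL = 50").fetchall()

# CASE WHEN treats unknown as "not true" and falls through to ELSE.
case_result = conn.execute(
    "SELECT CASE WHEN NULL = 50 THEN 1 ELSE 0 END"
).fetchone()[0]

print(null_comparison)  # None
print(filtered_rows)    # []
print(case_result)      # 0
```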
All other optimizations I can think of would involve precalculation or caching. You can create [temporary] tables matching certain rules or add further columns holding matching rules.
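One possible form of caching, sketched here under assumptions: keep the rules in application memory and match incoming data there, reloading the cache whenever users change the rules. The `Rule` type, the `CACHED_RULES` list, and the function names are hypothetical; the hard-coded rows stand in for whatever is read from RULES.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    rule_id: int
    a: Optional[int]
    b: Optional[str]
    c: Optional[int]


# Hypothetical cache; in practice it would be reloaded from the RULES table
# whenever a rule is added or removed.
CACHED_RULES: List[Rule] = [
    Rule(1, 50, None, None),
    Rule(2, 51, "xyz", None),
    Rule(3, 51, None, 123),
    Rule(4, None, "xyz", 456),
]


def rule_matches(rule: Rule, values: Tuple) -> bool:
    # None in a rule column is a wildcard; otherwise the values must be equal.
    return all(
        expected is None or expected == actual
        for expected, actual in zip(rule[1:], values)
    )


def matching_rule_ids(values: Tuple) -> List[int]:
    # Scans every cached rule, so all matches are reported, not just the first.
    return [rule.rule_id for rule in CACHED_RULES if rule_matches(rule, values)]


print(matching_rule_ids((50, "xyz", 456)))  # [1, 4]
```

Whether this beats the plain query depends on how many rules there are and how often they change, so it would need measuring against real data.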