First a bit of theory: Null (SQL)
The most important parts for us from the above link:
Comparisons with NULL and the three-valued logic (3VL)
Since Null is not a member of any data domain, it is not considered a
"value", but rather a marker (or placeholder) indicating the absence
of value. Because of this, comparisons with Null can never result in
either True or False, but always in a third logical result,
Unknown.[8] The logical result of the expression below, which compares
the value 10 to Null, is Unknown:
SELECT 10 = NULL -- Results in Unknown
so that both comparisons: x = NULL
and x <> NULL
evaluates to NULL(unknown).
SQL implements three logical results, so SQL implementations must
provide for a specialized three-valued logic (3VL). The rules
governing SQL three-valued logic are shown in the tables below (p and
q represent logical states)"[9] The truth tables SQL uses for AND, OR,
and NOT correspond to a common fragment of the Kleene and ?ukasiewicz
three-valued logic (which differ in their definition of implication,
however SQL defines no such operation).
+---------+-------------+-------------+-------------+-----------+--------+
| p | q | p OR q | p AND q | p = q |p != q |
+---------+-------------+-------------+-------------+-----------+--------+
| True | True | True | True | True | False |
| True | False | True | False | False | True |
| True | Unknown | True | Unknown | Unknown | Unknown|
| False | True | True | False | False | True |
| False | False | False | False | True | False |
| False | Unknown | Unknown | False | Unknown | Unknown|
| Unknown | True | True | Unknown | Unknown | Unknown|
| Unknown | False | Unknown | False | Unknown | Unknown|
| Unknown | Unknown | Unknown | Unknown | Unknown | Unknown|
+---------+-------------+-------------+-------------+-----------+--------+
Effect of Unknown in WHERE clauses
SQL three-valued logic is encountered in Data Manipulation Language
(DML) in comparison predicates of DML statements and queries. The
WHERE clause causes the DML statement to act on only those rows for
which the predicate evaluates to True.
So in short: WHERE clause treats NULL as FALSE
Now please consider a simpler case:
SELECT * FROM T1;
| X |
|--------|
| 1 |
| (null) |
and a query:
SELECT * FROM t1 WHERE x IN (1, NULL);
The above query is a shortland to this one:
SELECT * FROM t1
WHERE x = 1
OR x = NULL
For the second row from table t
( x = NULL) this condition looks like:
WHERE NULL = 1
OR NULL = NULL
so this condition for the row x=NULL
evaluates to NULL because NULL=1
is NULL, NULL=NULL
is NULL, and NULL OR NULL
is also NULL (please see the table 3VL above).
Now consider more curious case:
SELECT * FROM t1 WHERE x NOT IN (1, NULL);
This clause x NOT IN (1, NULL)
is equivalent to NOT ( x IN (1, NULL) )
so it is also equivalent to:
NOT (
x = 1
OR
x = NULL
)
and according to De Morgan's laws it is equivalent to:
NOT ( x = 1 ) AND NOT ( x = NULL )
and (if we replace NOT x = y
with x <> y
) it's also equivalent to:
x <> 1 AND x <> NULL
Please look closely at the last condition:
WHERE
x <> 1 AND x <> NULL
We know than x <> NULL
always evaluates to NULL. We also know from the 3VL table above, that both true AND NULL
is NULL and false AND NULL
evaluates to FALSE, so the whole condition always evaluates either to FALSE or NULL, but it never evaluates to TRUE.
Therefore a query with this condition:
SELECT .....
WHERE x NOT IN ( NULL, whatever)
always returns empty resultset
And now your query, which is also curious:
SELECT * FROM t1
WHERE (id, val) NOT IN (select id, val from data2);
which can be rewriten (using constant values) to:
SELECT * FROM t1
WHERE (id, val) NOT IN (
(1, null),
(2, 2 )
)
This query is using so called row value expression
Basically a condition using the row value expressin like this
(a, b) = (x, y)
is equivalent to this one:
a = x AND b = y
so the above query can be rewritten into this one:
SELECT * FROM t1
WHERE NOT (
id = 1 AND val = NULL
OR
id = 2 AND val = 2
)
According to De Morgan's laws this is identical to:
SELECT * FROM t1
WHERE
NOT ( id = 1 AND val = NULL )
AND
NOT ( id = 2 AND val = 2 )
and further to:
SELECT * FROM t1
WHERE
( id <> 1 OR val <> NULL )
AND
( id <> 2 OR val <> 2 )
Since the first part ( id <> 1 OR val <> NULL )
of the condition evaluates to true only in a case where id <> 1
(please see the 3VL table above), this condition can be simplified into:
SELECT * FROM t1
WHERE
( id <> 1 )
AND
( id <> 2 OR val <> 2 )
and further (according to De Morgan's laws) into:
SELECT * FROM t1
WHERE
id <> 1 AND id <> 2
OR
id <> 1 AND val <> 2
so neither (1,1)
nor (2,2)
from the source data1
comply with these conditions.