Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
970 views
in Technique[技术] by (71.8m points)

oracle - NOT IN selection with NULL values

Who can explain the result of the second select. I expected the first row of data1 (1, '1') is not in data2?

with data1(id, val) as
 (select 1, '1' from dual union all 
  select 2, '2' from dual),
     data2(id, val) as
 (select 1, null from dual union all 
  select 2, '2' from dual)

select id, val
  from data1
 where (id, val) IN (select id, val from data2);

-- Result (as expected):
-- id, val
-- 2   '2'

with data1(id, val) as
 (select 1, '1' from dual union all 
  select 2, '2' from dual),
     data2(id, val) as
 (select 1, null from dual union all 
  select 2, '2' from dual)

select id, val
  from data1
 where (id, val) NOT IN (select id, val from data2)

-- No Result ???

I.e. first row (1,'1') is neither IN data2 nor NOT IN data2?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

First a bit of theory: Null (SQL)

The most important parts for us from the above link:

Comparisons with NULL and the three-valued logic (3VL)

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.[8] The logical result of the expression below, which compares the value 10 to Null, is Unknown:

SELECT 10 = NULL       -- Results in Unknown

so that both comparisons: x = NULL and x <> NULL evaluates to NULL(unknown).


SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"[9] The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and ?ukasiewicz three-valued logic (which differ in their definition of implication, however SQL defines no such operation).

+---------+-------------+-------------+-------------+-----------+--------+
|    p    |        q    |     p OR q  |     p AND q |    p = q  |p != q  |
+---------+-------------+-------------+-------------+-----------+--------+
| True    |     True    |     True    |     True    |   True    | False  |
| True    |     False   |     True    |     False   |   False   | True   |
| True    |     Unknown |     True    |     Unknown |   Unknown | Unknown|
| False   |     True    |     True    |     False   |   False   | True   |
| False   |     False   |     False   |     False   |   True    | False  |
| False   |     Unknown |     Unknown |     False   |   Unknown | Unknown|
| Unknown |     True    |     True    |     Unknown |   Unknown | Unknown|
| Unknown |     False   |     Unknown |     False   |   Unknown | Unknown|
| Unknown |     Unknown |     Unknown |     Unknown |   Unknown | Unknown|
+---------+-------------+-------------+-------------+-----------+--------+

Effect of Unknown in WHERE clauses

SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True.

So in short: WHERE clause treats NULL as FALSE




Now please consider a simpler case:

SELECT * FROM T1;

|      X |
|--------|
|      1 |
| (null) |

and a query:

SELECT * FROM t1 WHERE x IN (1, NULL);

The above query is a shortland to this one:

SELECT * FROM t1 
WHERE x = 1
  OR  x = NULL

For the second row from table t ( x = NULL) this condition looks like:

WHERE NULL = 1
   OR NULL = NULL

so this condition for the row x=NULL evaluates to NULL because NULL=1 is NULL, NULL=NULL is NULL, and NULL OR NULL is also NULL (please see the table 3VL above).



Now consider more curious case:

SELECT * FROM t1 WHERE x NOT IN (1, NULL);

This clause x NOT IN (1, NULL) is equivalent to NOT ( x IN (1, NULL) )
so it is also equivalent to:

NOT (
  x = 1
  OR
  x = NULL
)

and according to De Morgan's laws it is equivalent to:

NOT ( x = 1 ) AND NOT ( x = NULL )

and (if we replace NOT x = y with x <> y) it's also equivalent to:

 x <> 1 AND x <> NULL

Please look closely at the last condition:

WHERE 
x <> 1 AND x <> NULL

We know than x <> NULL always evaluates to NULL. We also know from the 3VL table above, that both true AND NULL is NULL and false AND NULL evaluates to FALSE, so the whole condition always evaluates either to FALSE or NULL, but it never evaluates to TRUE.

Therefore a query with this condition:

SELECT .....
WHERE x NOT IN ( NULL, whatever)

always returns empty resultset




And now your query, which is also curious:

SELECT * FROM t1
WHERE (id, val) NOT IN (select id, val from data2);

which can be rewriten (using constant values) to:

SELECT * FROM t1
WHERE (id, val) NOT IN (
       (1, null),
       (2, 2 )
)

This query is using so called row value expression

Basically a condition using the row value expressin like this

(a, b) = (x, y)

is equivalent to this one:

a = x AND b = y

so the above query can be rewritten into this one:

SELECT * FROM t1
WHERE NOT (
   id = 1 AND val = NULL
   OR
   id = 2 AND val = 2
)

According to De Morgan's laws this is identical to:

SELECT * FROM t1
WHERE 
   NOT ( id = 1 AND val = NULL )
   AND
   NOT ( id = 2 AND val = 2 )

and further to:

SELECT * FROM t1
WHERE 
   ( id <> 1 OR val <> NULL )
   AND
   ( id <> 2 OR val <> 2 )

Since the first part ( id <> 1 OR val <> NULL ) of the condition evaluates to true only in a case where id <> 1 (please see the 3VL table above), this condition can be simplified into:

SELECT * FROM t1
WHERE 
   ( id <> 1 )
   AND
   ( id <> 2 OR val <> 2 )

and further (according to De Morgan's laws) into:

SELECT * FROM t1
WHERE 
   id <> 1 AND id <> 2
   OR
   id <> 1 AND  val <> 2

so neither (1,1) nor (2,2) from the source data1 comply with these conditions.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...