Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
448 views
in Technique[技术] by (71.8m points)

hadoop - Hive command to execute NOT IN clause

I have two tables,tab1 & tab2.

tab1(T1)  tab2(T2)
a1         b1
b1         c1
c1         f1
d1         g1

I am looking for the values from table T1 that are not present in T2. In this case, the output should be a1 d1

I have tried with the following query but couldn't get the right solution.

select distinct tab1.T1 from tab1 left semi join tab2 on (tab1.T1!=tab2.T2);
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
SELECT t1.str
FROM tab1 t1 
LEFT OUTER JOIN tab2 t2 ON t1.str = t2.str
WHERE t2.str IS NULL;

Result:

OK
a1
d1

"Why is the t2.str is null condition there": Left outer joins ensure that all values from the first table are included in the results. So then what happens when there are no values in the second table: in that case all of the columns from the second table are reported as null.

So in the case above we are searching precisely for the condition that the second table entries are missing - and thus we:

  • Choose one of the never-empty (aka not null) columns from table two.
  • So: is number an always-present column? If not then please choose another one
  • Specify the condition "table1-alias"."table1-never-null-column" = null. That means that the record is actually not present in the join condition - and thus we found the records existing only in table 1.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...