Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
766 views
in Technique[技术] by (71.8m points)

python - Compare Multiple Columns to Get Rows that are Different in Two Pandas Dataframes

I have two dataframes:

df1=
    A    B   C
0   A0   B0  C0
1   A1   B1  C1
2   A2   B2  C2

df2=
    A    B   C
0   A2   B2  C10
1   A1   B3  C11
2   A9   B4  C12

and I want to find rows in df1 that are not found in df2 based on one or two columns (or more columns). So, if I only compare column 'A' then the following rows from df1 are not found in df2 (note that column 'B' and column 'C' are not used for comparison between df1 and df2)

    A    B   C
0   A0   B0  C0

And I would like to return a series with

0   False
1   True
2   True

Or, if I only compare column 'A' and column 'B' then the following rows from df1 are not found in df2 (note that column 'C' is not used for comparison between df1 and df2)

    A    B   C
0   A0   B0  C0
1   A1   B1  C1

And I would want to return a series with

0   False
1   False
2   True

I know how to accomplish this using sets but I am looking for a straightforward Pandas way of accomplishing this.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If your version is 0.17.0 then you can use pd.merge and pass the cols of interest, how='left' and set indicator=True to whether the values are only present in left or both. You can then test whether the appended _merge col is equal to 'both':

In [102]:
pd.merge(df1, df2, on='A',how='left', indicator=True)['_merge'] == 'both'

Out[102]:
0    False
1     True
2     True
Name: _merge, dtype: bool

In [103]:
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)['_merge'] == 'both'

Out[103]:
0    False
1    False
2     True
Name: _merge, dtype: bool

output from the merge:

In [104]:
pd.merge(df1, df2, on='A',how='left', indicator=True)

Out[104]:
    A B_x C_x  B_y  C_y     _merge
0  A0  B0  C0  NaN  NaN  left_only
1  A1  B1  C1   B3  C11       both
2  A2  B2  C2   B2  C10       both

In [105]:    
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)

Out[105]:
    A   B C_x  C_y     _merge
0  A0  B0  C0  NaN  left_only
1  A1  B1  C1  NaN  left_only
2  A2  B2  C2  C10       both

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...