python - Set Union in pandas

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have two columns in my DataFrame that store sets.

I want to compute the set union of the two columns with a fast vectorized operation:

df['union'] = df.set1 | df.set2

but it fails with TypeError: unsupported operand type(s) for |: 'set' and 'bool', because both columns also contain np.nan values.

Is there a good solution to overcome this?
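The 'bool' in the message suggests the NaN values were turned into False before | was applied, but that is only an inference from the error text, not something the question confirms. Either way, Python's set | operator accepts only set operands, as this small sketch shows (try_union is a hypothetical helper for illustration):

```python
import numpy as np


def try_union(left, right):
    """Return left | right, or the TypeError message if the operands are incompatible."""
    try:
        return left | right
    except TypeError as exc:
        return str(exc)


print(try_union({1, 2}, {2, 3}))  # two sets: a normal union
print(try_union({1}, np.nan))     # a raw missing value (np.nan is a float)
print(try_union({1}, False))      # the operand type reported in the question's error
```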


1 Answer

深蓝 · Answer 1 · 2021-10-23T21:42:47+0000

For these operations, pure Python (a list comprehension over zip) is generally more efficient than DataFrame.apply:

%timeit pd.Series([set1.union(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 43.3 ms per loop

%timeit df.apply(lambda x: x.A.union(x.B), axis=1)
1 loop, best of 3: 2.6 s per loop
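The comprehension above assumes every cell is a set; with np.nan in the columns, as in the question, set1.union(set2) would raise an error. A minimal NaN-tolerant variant, assuming missing values should count as empty sets (an assumption, not part of the original answer), could look like this; as_set and union_columns are hypothetical helper names. It also passes index=df.index, so the result lines up with the rows even when the DataFrame does not have a default 0..n-1 index.

```python
import numpy as np
import pandas as pd


def as_set(value):
    """Return value unchanged if it is a set; treat anything else (e.g. np.nan) as an empty set."""
    return value if isinstance(value, (set, frozenset)) else set()


def union_columns(df, col1, col2):
    """Row-wise union of two set-valued columns, tolerating missing values."""
    unions = [as_set(a) | as_set(b) for a, b in zip(df[col1], df[col2])]
    # index=df.index keeps each result aligned with its row, even for a non-default index.
    return pd.Series(unions, index=df.index)


df = pd.DataFrame({
    "set1": [{1, 2}, np.nan, {3}],
    "set2": [{2, 4}, {5}, np.nan],
})
df["union"] = union_columns(df, "set1", "set2")
print(df["union"].tolist())
```

If a row should stay missing when both sides are NaN, return np.nan for that case instead of an empty set.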

DataFrame used for the timings:

import pandas as pd
import numpy as np

# 100,000 rows; each cell is a set built from 1 to 4 random draws of the letters 'a' to 'g'
l1 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]
l2 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]

df = pd.DataFrame({'A': l1, 'B': l2})
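The %timeit lines are IPython magics. Outside IPython, a rough equivalent uses the standard timeit module. This sketch rebuilds a smaller version of the DataFrame above (the 10,000 rows and the seeded Generator are assumptions chosen so it runs quickly) and compares the two approaches; the numbers it prints depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # assumption: smaller than the answer's 100,000 rows
rng = np.random.default_rng(0)
letters = list("abcdefg")
df = pd.DataFrame({
    "A": [set(rng.choice(letters, rng.integers(1, 5))) for _ in range(n)],
    "B": [set(rng.choice(letters, rng.integers(1, 5))) for _ in range(n)],
})


def with_comprehension():
    # Plain Python loop over the two columns, wrapped back into a Series.
    return pd.Series([s1.union(s2) for s1, s2 in zip(df["A"], df["B"])], index=df.index)


def with_apply():
    # Row-wise apply: builds a Series object for every row, which is the slow part.
    return df.apply(lambda row: row.A.union(row.B), axis=1)


for name, fn in [("comprehension", with_comprehension), ("apply", with_apply)]:
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {best * 1000:.1f} ms")
```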
