For row-wise set operations like this, a plain Python list comprehension is generally much faster than DataFrame.apply with axis=1:
%timeit pd.Series([set1.union(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 43.3 ms per loop
%timeit df.apply(lambda x: x.A.union(x.B), axis=1)
1 loop, best of 3: 2.6 s per loop
DataFrame for timings:
import pandas as pd
import numpy as np
l1 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]
l2 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]
df = pd.DataFrame({'A': l1, 'B': l2})
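One detail worth noting about the list-comprehension approach: pd.Series built from a list gets a fresh default index, so if the frame has a non-default index the result will not align with it unless the index is passed explicitly. A minimal sketch of that, assuming a small made-up frame (df_small and its labels are hypothetical, not part of the timings above):

```python
import pandas as pd

# Hypothetical small frame with a non-default index, for illustration only
df_small = pd.DataFrame(
    {"A": [{"a", "b"}, {"c"}], "B": [{"b"}, {"d", "e"}]},
    index=["x", "y"],
)

# Pure-Python union per row; passing index=df_small.index keeps the
# original row labels instead of a new 0..n-1 index
union_result = pd.Series(
    [set1 | set2 for set1, set2 in zip(df_small["A"], df_small["B"])],
    index=df_small.index,
)

# Assigning back now aligns on the original labels
df_small["C"] = union_result
```

The `|` operator is equivalent to set.union for two sets; either form works inside the comprehension.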