I was applying .sample with random_state set to a constant, and after using set_index it started selecting different rows. A member that was previously included in the subset was dropped. I'm unsure how seeding selects rows. Is this expected behavior, or did something go wrong?
Here is what was done:
df.set_index('id', inplace=True, verify_integrity=True)
df_small_F = df.loc[df['gender'] == 'F'].apply(lambda x: x.sample(n=30000, random_state=47))
df_small_M = df.loc[df['gender'] == 'M'].apply(lambda x: x.sample(n=30000, random_state=46))
df_small = pd.concat([df_small_F, df_small_M], verify_integrity=True)
When I sort df_small by index and print it, the rows differ from what I got before calling set_index.
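For reference, here is a minimal sketch of how I understand seeded sampling to behave: with a fixed random_state, .sample draws the same row positions for a frame of the same length, regardless of the index labels. The toy data, the sizes, and the variable names below are made up for illustration; only the column names 'id' and 'gender' come from my code. It does not show what went wrong in my case, only that set_index by itself should not change the selection, while a change in row order (or in the number of rows) would.

```python
import pandas as pd

# Hypothetical toy data; 'id' and 'gender' mirror the column names above.
df = pd.DataFrame({"id": range(100, 110), "gender": ["F", "M"] * 5})

# With the default RangeIndex, the sampled labels are also the sampled positions.
positions = df.sample(n=3, random_state=47).index.tolist()

# set_index changes the labels only, not the row order,
# so the same positions (and therefore the same ids) are drawn.
indexed = df.set_index("id", verify_integrity=True)
same_rows = indexed.sample(n=3, random_state=47)

# Reordering the rows keeps the drawn positions but changes which ids sit there.
reordered = indexed.sort_index(ascending=False)
other_rows = reordered.sample(n=3, random_state=47)

ids_before = df["id"].iloc[positions].tolist()
ids_after_set_index = same_rows.index.tolist()
ids_after_reorder = other_rows.index.tolist()

print(ids_before, ids_after_set_index, ids_after_reorder)
```

If this is right, then anything that reorders or filters the rows between the two runs (a sort, a merge, dropped duplicates) would be enough to change the subset even with the same seed.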