I'm trying to compare cells within a data frame using pandas.
the data looks like that:
seqnames, start, end, width, strand, s1, s2, s3, sn
1, Ha412HOChr01, 1, 220000, 220000, CN2, CN10, CN2, CN2
2, Ha412HOChr01, 1, 220000, 220000, CN2, CN2, CN2, CN2
3, Ha412HOChr01, 1, 220000, 220000, CN2, CN4, CN2, CN2
n, Ha412HOChr01, 1, 220000, 220000, CN2, CN2, CN2, CN6
I was able to make individual comparisons with the following code
import pandas as pd
df = pd.read_csv("test.csv")
if df.iloc[0,5] != df.iloc[0,6]:
print("yay!")
else:
print("not intersting...")
I would like to iterate a comparison between s1 and all the other s columns, line by line in a loop or in any other more efficient methods.
when i've tried the following code:
df = pd.read_csv("test.csv")
df.columns
#make sure to change in future analysis
ref = df[' Sunflower_14_S8']
all_the_rest = df.drop(['seqnames', ' start', ' end', ' width', ' strand'], axis=1)
#all_the_rest.columns
OP = ref.eq(all_the_rest)
OP.to_csv("OP.csv")
i've got a wired output
0,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False444,False,False,False,False,False,False,False,False,False,False,False,False,False
it seems like it compare all the characters instead of the strings
I'm new to programming and I'm stuck, appreciate your help!