python - How to optimize this Pandas code to run faster

Question

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have this code to create a swarmplot from data from a DataFrame:

df = pd.DataFrame({"Refined_ID": some_id_list,
                   "Refined_Age": age_list,
                   "Name": name_list})
# Creating dataframe with strings from the lists
select = df.apply(lambda row: any([isinstance(e, str) for e in row]), axis=1)
# Excluding data from select in a new dataframe
dfAnalysis = df[~select]
dfAnalysis['Refined_Age'].replace('', np.nan, inplace=True)
dfAnalysis = dfAnalysis.dropna()
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].apply(int)
# print dfAnalysis
print(type(dfAnalysis['Refined_Age'][1]))
g = sns.swarmplot(x=dfAnalysis['Refined_ID'], y=dfAnalysis['Refined_Age'], hue=dfAnalysis['Name'], orient="v")
g.set_xticklabels(g.get_xticklabels(), rotation=30)
# print g

It's taking a crazy amount of time to run (14 hours and counting!). How can I speed it up? Also, why is the code so slow in the first place?

The three lists included in the dataframe come from a CouchDB database with about 320k documents.

UPDATE 1

I had intended to view the first 20 categories only but excluded the code to do so.

The line should have been:

x = dfAnalysis['Refined_ID'].iloc[:20]

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:47+0000

Do you really mean to draw a swarmplot with several hundred thousand points? Besides taking forever, it makes no sense at that size. Try it with the first 1000 points and see what kind of mess you get. Then use a boxplot or a violinplot instead. Try to understand your tools before using them.

From the docstring:

[...] it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).
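
As a rough illustration of that advice, here is a minimal sketch, not the asker's actual pipeline. It assumes synthetic stand-in data (the real lists come from CouchDB), reuses the column names Refined_ID, Refined_Age and Name from the question, and the 1000-point sample size is an arbitrary choice. It keeps only the first 20 categories, summarises them with a violinplot, and draws a swarmplot only for a small random sample.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the question's data (assumption: ~320k rows, 50 IDs).
rng = np.random.default_rng(0)
n_rows = 320_000
df = pd.DataFrame({
    "Refined_ID": rng.integers(0, 50, size=n_rows),
    "Refined_Age": rng.integers(0, 100, size=n_rows),
    "Name": rng.choice(["a", "b", "c"], size=n_rows),
})

# Keep the first 20 *categories*; .iloc[:20] would keep the first 20 rows instead.
first_ids = df["Refined_ID"].unique()[:20]
subset = df[df["Refined_ID"].isin(first_ids)]

# Only the swarmplot gets a small random sample of individual points.
sample = subset.sample(n=1000, random_state=0)

fig, (ax_violin, ax_swarm) = plt.subplots(1, 2, figsize=(14, 5))

# The violinplot summarises each category's distribution from all its rows.
sns.violinplot(data=subset, x="Refined_ID", y="Refined_Age", ax=ax_violin)
ax_violin.tick_params(axis="x", labelrotation=30)

# The swarmplot has to position every point individually, so it sees only the sample.
sns.swarmplot(data=sample, x="Refined_ID", y="Refined_Age", size=2, ax=ax_swarm)
ax_swarm.tick_params(axis="x", labelrotation=30)

fig.savefig("age_by_id.png")
```

Swapping sns.violinplot for sns.boxplot in the sketch gives the other alternative mentioned in the answer. The output file name age_by_id.png is hypothetical.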
