Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Creating new column in df with 2 parameters

I need to create a new column based on two conditions: keep only countries whose "Population 2020 (in thousands)" value is at least 50,000, and sort them by Recovery Rate in descending order.


df1['Recovery Rate'] = df1.apply(lambda x: (x['Total Recovered']/x['Total Infected']), axis = 1)

df1['Populated Country'] = df1.apply(if lambda row: row.Country == Country and (row: row.Population 2020 (in thousands) >= 50000), axis = 1) 

df1.sort_values(['Recovery Rate'], ascending = [False])

print(df1[['Populated Country','Recovery Rate']].head(10))

But I am having the following error for the new column code.


File "<ipython-input-25-ab35558abd61>", line 4
df1['Populated Country'] = df1.apply(if lambda row: row.Country == Country and (row: row.Population 2020 (in thousands) >= 50000), axis = 1)
                                         ^
SyntaxError: invalid syntax
>Country    Daily Tests Daily Tests per 100000 people   Pop density per sq. km  Urban Population (%)    Start Date of Quarantine/Lockdown   Start Date of Schools Closure   Start Date of Public Place Restrictions Hospital Beds per 1000 people   M-to-F Gender Ratio at Birth    ... Death rate from lung diseases per 100k people for male  Median Age  GDP 2018    Crime Index Population 2020 (in thousands)  Smokers in Population (%)   % of Females in Population  Total Infected  Total Deaths    Total Recovered
>0  Albania NaN NaN 105 63  NaN NaN NaN 2.9 1.08    ... 17.04   32.9    1.510250e+10    40.02   2877.797    28.7    49.063095   949 31  742
>1  Algeria NaN NaN 18  73  NaN NaN NaN 1.9 1.05    ... 12.81   28.1    1.737580e+11    54.41   43851.044   15.6    49.484268   7377    561 3746
>2  Argentina   NaN NaN 17  93  3/20/2020   NaN NaN 5.0 1.05    ... 42.59   31.7    5.198720e+11    62.96   45195.774   21.8    51.237348   8809    393 2872
>3  Armenia 694.0   2.342029    104 63  NaN NaN NaN 4.2 1.13    ... 35.99   35.1    1.243309e+10    20.78   2963.243    24.1    52.956577   5041    64  2164
>4  Australia   31635.0 12.405939   3   86  NaN NaN 3/23/2020   3.8 1.06    ... 22.16   38.7    1.433900e+12    42.70   25499.884   14.7    50.199623   7072    100 6431

This is the data - https://raw.githubusercontent.com/ptw2/PRGA/main/covid19_by_country.csv

This is the result I should get

>         Country  Recovery Rate
>17         China       0.943459
>87      Thailand       0.941972
>47   South Korea       0.906031
>32       Germany       0.875705
>95       Vietnam       0.811728

Can anyone help?

Question from: https://stackoverflow.com/questions/66056181/creating-new-column-in-df-with-2-parameters


1 Answer


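In this case, it's cleaner to define a function that does the computation and pass it straight to df1.apply; no lambda wrapper is needed. The function returns NaN for countries below the population threshold:

def compute_rr(row):
    # Compute the rate only where the population (in thousands) is at least 50000.
    if row['Population 2020 (in thousands)'] >= 50000:
        return row['Total Recovered'] / row['Total Infected']
    return float('nan')  # smaller countries get NaN

# axis=1 calls compute_rr once per row.
df1['Recovery Rate'] = df1.apply(compute_rr, axis=1)
# sort_values returns a new frame, so assign it back; NaN rows end up last.
df1 = df1.sort_values('Recovery Rate', ascending=False)

print(df1[['Country','Total Recovered','Total Infected','Recovery Rate']].head())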

#Output:
        Country  Total Recovered  Total Infected  Recovery Rate
17        China            79310           84063       0.943459
87     Thailand             2857            3033       0.941972
47  South Korea            10066           11110       0.906031
32      Germany           155681          177778       0.875705
95      Vietnam              263             324       0.811728
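
The same result can also be obtained without a row-wise apply. The following is a minimal sketch, assuming a small made-up frame (the country names and numbers are hypothetical, not taken from the CSV; only the column names match the question). It computes the ratio for all rows at once and uses Series.where to keep it only where the population threshold is met.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the question's CSV.
df = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Population 2020 (in thousands)': [60000.0, 2500.0, 80000.0, 50000.0],
    'Total Infected': [1000, 200, 4000, 500],
    'Total Recovered': [900, 150, 1000, 250],
})

# Compute the ratio for every row in one vectorized operation.
rate = df['Total Recovered'] / df['Total Infected']

# Boolean mask: True for countries at or above the threshold.
big = df['Population 2020 (in thousands)'] >= 50000

# where() keeps values where the mask is True and puts NaN elsewhere.
df['Recovery Rate'] = rate.where(big)

# NaN values are placed last by default, so large countries come first.
result = df.sort_values('Recovery Rate', ascending=False)
print(result[['Country', 'Recovery Rate']])
```

On a frame with many rows this usually runs faster than apply with axis=1, since the division and comparison are done column-wise rather than one row at a time.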

If you really want to drop the smaller countries from your dataframe (those with a "Population 2020 (in thousands)" value below 50,000, that is, fewer than 50 million people), add the following line after the previous code. It removes every row that has NaN in the "Recovery Rate" column.

df1 = df1[df1['Recovery Rate'].notna()]
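
The filter above uses a boolean mask built with notna(). As a small sketch on made-up data (the names and values below are hypothetical), the same rows can also be dropped with DataFrame.dropna and its subset parameter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing recovery rate.
df = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Recovery Rate': [0.9, np.nan, 0.25],
})

# Option 1: boolean mask, as in the line above.
kept_mask = df[df['Recovery Rate'].notna()]

# Option 2: dropna restricted to the one column of interest.
kept_dropna = df.dropna(subset=['Recovery Rate'])

# Both approaches keep the same rows.
print(kept_mask.equals(kept_dropna))
```

Restricting dropna with subset matters here because the full dataset has NaN in other columns (for example "Daily Tests"), and a bare dropna() would remove those rows as well.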
