I'm trying to assign an ordinal rank to a column of string answers and then build a frequency distribution ordered by that rank (which I also want to use for data visualization).
I tried this code:
education_rank = {
    ' 1st-4th': 2, ' Preschool': 1, ' 5th-6th': 3, ' 7th-8th': 4,
    ' 9th': 5, ' 10th': 6, ' 11th': 7, ' 12th': 8, ' HS-grad': 9,
    ' Assoc-voc': 10, ' Assoc-acdm': 11, ' Some-college': 12,
    ' Bachelors': 13, ' Prof-school': 14, ' Masters': 15, ' Doctorate': 16}
adult_data.education.rank = education_rank.keys()
fd_education = pd.value_counts(adult_data.education)
print(fd_education)
fd_education.index = education_rank.values()
print(fd_education)
Results:
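The first output below is the correct frequency distribution. However, once I assign the ranking as the index (second output), the counts in the second column stay in exactly the same order: the rank labels are attached by position, so each count no longer belongs to its own education level (for example, rank 2 shows 15784, which is the count for HS-grad, not 1st-4th).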
HS-grad 15784
Some-college 10878
Bachelors 8025
Masters 2657
Assoc-voc 2061
11th 1812
Assoc-acdm 1601
10th 1389
7th-8th 955
Prof-school 834
9th 756
12th 657
Doctorate 594
5th-6th 509
1st-4th 247
Preschool 83
Name: education, dtype: int64
2 15784
1 10878
3 8025
4 2657
5 2061
6 1812
7 1601
8 1389
9 955
10 834
11 756
12 657
13 594
14 509
15 247
16 83
Name: education, dtype: int64
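For reference, here is a minimal sketch of what I think the intended result looks like. As far as I can tell, value_counts orders the rows by frequency, while education_rank.values() comes out in the dict's own order, so assigning it to the index only relabels the rows by position. Translating each existing label through the dict and then sorting by rank keeps every count with its own level. The small adult_data frame here is a made-up stand-in for my real data, and the dict is shortened to the levels it contains.

```python
import pandas as pd

# Shortened copy of the ranking dict (keys keep the leading space from the data).
education_rank = {
    ' Preschool': 1, ' HS-grad': 9, ' Bachelors': 13, ' Masters': 15,
}

# Hypothetical stand-in for the real adult_data frame.
adult_data = pd.DataFrame({'education': [
    ' HS-grad', ' Bachelors', ' HS-grad', ' Preschool',
    ' Masters', ' HS-grad', ' Bachelors',
]})

# Count each education level; rows come back ordered by frequency.
fd_education = adult_data['education'].value_counts()

# Translate each label to its rank (label by label, not by position),
# then order the rows by rank.
fd_ranked = fd_education.copy()
fd_ranked.index = fd_ranked.index.map(education_rank)
fd_ranked = fd_ranked.sort_index()

print(fd_ranked)
```

With this, the row labelled 9 carries the HS-grad count rather than whatever happened to be in that position, and the rank-ordered series can be passed to a bar plot directly.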