numpy - Python Pandas: remove entries based on the number of occurrences

Question

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to remove entries from a data frame that occur fewer than 100 times. The data frame, data, looks like this:

Now I count the number of tag occurrences like this:

bytag = data.groupby('tag').aggregate(np.count_nonzero)

But then I can't figure out how to remove the entries that have a low count.

1 Answer

深蓝 · Answer 1 · 2021-10-17T02:50:10+0000

New in pandas 0.12, groupby objects have a filter method that allows these kinds of operations:

In [11]: g = data.groupby('tag')

In [12]: g.filter(lambda x: len(x) > 1)  # pandas 0.13.1
Out[12]:
   pid  tag
1    1   45
2    1   62
4    2   45
7    3   62

The function (the first argument of filter) is applied to each group (a sub-DataFrame), and the result contains the rows of the original DataFrame that belong to groups for which the function returned True.
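As a minimal, self-contained sketch of that behavior: the original data frame is not shown in the question, so the frame below is hypothetical, chosen only so that it is consistent with the output above. The min_count name and its value of 2 are also assumptions; for the question as asked it would be 100.

```python
import pandas as pd

# Hypothetical sample data; the question's actual frame is not shown.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 2, 3, 3],
    "tag": [23, 45, 62, 24, 45, 85, 12, 62],
})

min_count = 2  # assumed threshold for this small example; the question uses 100

# The lambda receives each group's sub-DataFrame and returns a single bool.
# Rows from groups where it returns True are kept; all others are dropped.
kept = data.groupby("tag").filter(lambda group: len(group) >= min_count)

print(kept)
```

Tags 45 and 62 each occur twice in this sample, so only their rows survive; every other tag occurs once and is removed.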

Note: in 0.12 the rows of the result come back in a different order than in the original DataFrame; this was fixed in 0.13+:

In [21]: g.filter(lambda x: len(x) > 1)  # pandas 0.12
Out[21]: 
   pid  tag
1    1   45
4    2   45
2    1   62
7    3   62
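If the row order or the cost of calling a Python function per group is a concern, one possible alternative (not part of the answer above) is to build a boolean mask from per-tag counts instead of using filter. The sketch below uses value_counts and map; the sample frame and the min_count threshold are again hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the question's actual frame is not shown.
data = pd.DataFrame({
    "pid": [1, 1, 1, 2, 2, 2, 3, 3],
    "tag": [23, 45, 62, 24, 45, 85, 12, 62],
})

min_count = 2  # assumed threshold for this small example; the question uses 100

# Count how often each tag occurs, then look up that count for every row.
tag_counts = data["tag"].value_counts()
row_counts = data["tag"].map(tag_counts)

# Boolean indexing keeps the rows in their original order.
kept_by_mask = data[row_counts >= min_count]

print(kept_by_mask)
```

Because the mask is aligned with the original index, the surviving rows keep the order they had in data.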
