How to reduce the size of a Dataframe in Python

Manuela · Nov 10, 2021

Hello! I have a dataframe in Python with a column called "animal" whose rows each contain one of four labels: "bird", "dolphin", "dog", or "Others". I check the number of rows for each label with:
Code:
from collections import Counter

cnt = Counter(data.animal)
print(cnt)
and I obtain:
Code:
Counter({'Others': 1366, 'dog': 922, 'bird': 133, 'dolphin': 10})
I would like to reduce the size of the classes "Others" and "dog". How can I do that? I would like to randomly remove some rows so that, for example, I end up with:
Code:
Counter({'Others': 140, 'bird': 133, 'dog': 100, 'dolphin': 10})
I know I could use drop in this way:
Code:
# Set the index of the DataFrame to the column name
data_with_index = data.set_index("animal")
# With the index, we can drop the rows for a single animal with its name
data_with_index = data_with_index.drop("Others")
But that would delete all the rows with that name. Instead, I would like to delete only a certain number of them. How can I do that?

Manuela · Nov 11, 2021

Just to make the question clearer: I start from a 2431x5 dataframe (2431 rows and 5 columns, one of which is named "animal") and I would like to end up with a 383x5 dataframe by reducing the classes "Others" and "dog", which are the largest.
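
One possible way to do this is to split the dataframe by class, randomly sample the oversized classes down to a target size with pandas' `DataFrame.sample`, and concatenate the pieces again. The sketch below is only an illustration under assumptions: the helper name `downsample`, the `targets` dictionary, the `seed` value and the toy `value` column are all made up here, and the toy dataframe has 2 columns rather than the 5 in the real data.

```python
import pandas as pd


def downsample(data, column, targets, seed=0):
    """Randomly keep at most targets[label] rows for each label listed in targets.

    Labels that are not in targets (here "bird" and "dolphin") are left untouched.
    """
    parts = []
    for label, group in data.groupby(column, sort=False):
        n = targets.get(label)
        if n is not None and len(group) > n:
            # Draw n rows at random, without replacement; seed makes it repeatable
            group = group.sample(n=n, random_state=seed)
        parts.append(group)
    # Put the pieces back together and restore the original row order
    return pd.concat(parts).sort_index()


# Toy dataframe with the same class counts as in the question (assumed data)
data = pd.DataFrame({
    "animal": ["Others"] * 1366 + ["dog"] * 922 + ["bird"] * 133 + ["dolphin"] * 10,
    "value": range(2431),
})

reduced = downsample(data, "animal", {"Others": 140, "dog": 100})
print(reduced["animal"].value_counts())
print(len(reduced))  # 383 rows: 140 + 100 + 133 + 10
```

Unlike `drop`, this keeps a random subset of each large class instead of removing the class entirely. Changing `seed` gives a different random subset.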
