'Extracting specific number of rows from dataframe
I have a csv file having two columns i.e. imagename and ID. There are multiple image names for same ID as shown in picture. Number of image names against id is different. I need to extract same number of image names against ids and drop the exceeded rows . For example if id 5 has the lowest number of images say 8 then all ids would have corresponding 8 image names.
This code is extracting first 100 id's but number of images against each ID is different i.e. ID =1 has 11 images but Id=2 has 24 images and so on
`select_id = df.loc[df['id'] <= 100]`
Expected output equal number of images against each ID Expected output
Solution 1:[1]
You can do this:
# get min number of occurrences
min_len = df['id'].value_counts().min()
# group by and get head for each group
df.groupby('id').head(min_len)
Solution 2:[2]
Welcome to Stack Overflow! When asking a question on here its a good idea to provide a 'minimum reproduceable example" and example data (https://stackoverflow.com/help/minimal-reproducible-example). I created some example data for you this time. See the code below along with how to subset like you are describing.
# Create some example data
id <- c(0,0,0,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,4)
file_names <- paste("file_number_",seq(1:length(id)),".jpg",sep = "")
dataframe <- data.frame(id = id, file_names = file_names)
# view the data
dataframe
This initial data looks like:
# id file_names
# 1 0 file_number_1.jpg
# 2 0 file_number_2.jpg
# 3 0 file_number_3.jpg
# 4 1 file_number_4.jpg
# 5 1 file_number_5.jpg
# 6 1 file_number_6.jpg
# 7 1 file_number_7.jpg
# 8 2 file_number_8.jpg
# 9 2 file_number_9.jpg
# 10 3 file_number_10.jpg
# 11 3 file_number_11.jpg
# 12 3 file_number_12.jpg
# 13 3 file_number_13.jpg
# 14 4 file_number_14.jpg
# 15 4 file_number_15.jpg
# 16 4 file_number_16.jpg
# 17 4 file_number_17.jpg
# 18 4 file_number_18.jpg
# 19 4 file_number_19.jpg
# 20 4 file_number_20.jpg
Now to subset as you describe.
# find the lowest amount of entries for an id
min <- min(table(dataframe$id))
#create and empty df for your subset
final_dataframe <- data.frame()
# get the first two images for every id
for (i in 1:length(unique(dataframe$id))) {
final_dataframe <- rbind(dataframe[which(dataframe$id == unique(dataframe$id)[i]),][1:2,],final_dataframe)
}
# view
final_dataframe
Final dataframe:
# id file_names
# 14 4 file_number_14.jpg
# 15 4 file_number_15.jpg
# 10 3 file_number_10.jpg
# 11 3 file_number_11.jpg
# 8 2 file_number_8.jpg
# 9 2 file_number_9.jpg
# 4 1 file_number_4.jpg
# 5 1 file_number_5.jpg
# 1 0 file_number_1.jpg
# 2 0 file_number_2.jpg
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | D.Manasreh |
Solution 2 | Henry Holm |