'Optimization problem: I want to group folders into N groups, keeping their order and optimizing the number of files in each group

I have a problem, and I think it should be a known optimization problem, but I don't know its name. I ask for its name.

What I want :
let's say we have folders in alphabetic orders, each folder has a specific amount of files.
I want to group the folders into N groups, keeping the alphabetic orders of the folders, and I want to optimize to have approximately the same amount of files in each group.

You can find an illustration in this image, with 9 folders, 196 files and N=2.
Of course, we have many more folders and more files in each folder.

We have already thought of a naive solution (in the image), using the division by the optimal number of files. However, it is not optimal: for a large folder with a lot of files, we can imagine that the optimal would be to have a group with just that large folder. Our algorithm doesn't allow this. We can do some exceptions for big folders, bring them to a single group before treating each side again, but I think there would be other exceptions next...
That's why I want to find an equivalent problem in order to study the different logics that already exist.

I can think about some solutions, but I really think this problem has already been studied (with others than folder/files) with better solutions that I will find myself.

Thanks for your help :)



Solution 1:[1]

I ran into the same problem, and came up with this process.
It's not always perfect, but close enough.

See demo and play with the numbers

Let's say you have the folder sizes [5,1,3,2,4,1,4] and you want to divide them into 3 groups,
so that the max sum of each group will be the smallest.

The concept goes like this:

  1. Place all the folders in group 1. create empty groups for the rest.
  2. Iterate on the folders starting from the last in group 1
  3. For each, asking - would it be smart to move to the next group?
    Which option would yield the smallest max?
    If so - we move the folder to the beginning of next group.
    If the two options are equal - we roll anyway.
  4. When a folder rolls, we ask the same question about its new group.
    Note: the rolling folder is now the last one on the new hosting group. It might not be the same folder that is rolled again.
  5. Continue until no roll is made.
  6. Moving on to the next folder (last one on group 1)
  7. When the next folder in group 1 should not roll, or when there is only one folder left in group 1 - it ends.

Step 1: Initial
[5,1,2,2,4,1,4] [] []

Step 2: Looking at the last 4 - if we keep it in group 1, the max sum would be 19.
If we move it to group 2 - it would be 15.
Since 15<=19 - we roll it.
[5,1,2,2,4,1] ? [4] []

Step 3: Rolling the 4 from group 2?3 since (4<=4)
[5,1,2,2,4,1] [] ? [4]

Step 4: Since we reached the last group, going back to group 1.
Rolling the 1 from group 1?2 since (14<=15)
[5,1,2,2,4] ? [1] [4]

Step 5: NOT rolling the 1 from group 2?3 since !(5<=4)
[5,1,2,2,4] [1] ? [4]

Step 6: Back to group 1 again. Rolling the 4 from group 1?2 since (10<=14)
[5,1,2,2] ? [4-,1] [4]

Step 7: Rolling the 1 from group 2?3 since (4<=5)
[5,1,2,2] [4] ? [1-,4]

Step 8: Rolling the 2 from group 1?2 since (8<=10)
[5,1,2] ? [2-,4] [1,4]

Step 9: NOT rolling the 4 from group 2?3 since !(9<=6)
[5,1,2] [2,4] ? [1,4]

Step 10: Rolling the 2 from group 1?2 since (8<=8)
[5,1] ? [2-,2,4] [1,4]

Step 11: NOT rolling the 4 from group 2?3 since !(9<=8)
[5,1] [2,2,4] ? [1,4]

Step 12: NOT rolling the 1 from group 1?2 since !(9<=8)
Game over.
[5,1] ? [2,2,4] [1,4]

We ended up with groups sized 6,8,5, which is the optimal in this case.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1