Function to modify a list of lists in order to prevent repeated numbers within sublists is not working completely
Stack Overflow community:
I have a list of lists of sublists named dicts, built by randomly sampling values from a df's index. A value may be repeated across the top-level elements of dicts, but not within any single element dicts[e]. For example:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]], #wrong because 3, 9 and 43 are repeated across the sublists
[[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]], #2,44 are repeated
[[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]], #23 is repeated
[[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]]] #27 and 12 are repeated
As can be seen, it doesn't matter that the number 3 appears both in the second list of sublists and in the third one. Empty lists don't matter.
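The rule above can be checked mechanically. The following is a minimal sketch and not part of the original post: the helper name `repeated_per_group` is hypothetical, and "group" here means one top-level element `dicts[e]`. It flattens each group and reports the values that occur more than once; empty sublists contribute nothing.

```python
from collections import Counter
from itertools import chain


def repeated_per_group(groups):
    """For each group, return the sorted values that occur more than once across its sublists."""
    report = []
    for group in groups:
        counts = Counter(chain.from_iterable(group))  # flatten the group and count each value
        report.append(sorted(v for v, n in counts.items() if n > 1))
    return report


dicts = [
    [[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
    [[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]],
    [[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]],
    [[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]],
    [[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]],
]

print(repeated_per_group(dicts))
# [[], [3, 9, 43], [2, 44], [23], [12, 27]]
```

An all-empty report means every group satisfies the constraint, so the same helper can be run on the function's output to see which groups still need fixing.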
I've built a function that "corrects" the repeated values, but apparently it doesn't handle every case. It takes three arguments: the list of lists described above; matrix, the df whose index supplies the numbers; and cuantosamples, a list of lists that specifies how the final result is partitioned into unevenly sized sublists. Note that the code also contains a segment that prevents a value already used as a replacement from being used again as a replacement in the next sublist:
import itertools
from itertools import islice

def vigilado(list1,matrix,cuantosamples):
    stored=[]
    lists=[[]for e in range(len(dicts))]
    vals=list(matrix.index.values)
    for e,g in zip(list1,lists):
        vig=list(itertools.chain(*e))
        dup=list(duplicates(vig))
        lendup=len(dup)
        if lendup>0:
            #assign new values
            vals=[e for e in vals if e not in dup and e not in vig and e not in stored] #if a value is repeated in sublist 1, do not pick it again
            sample=matrix.loc[vals].sample(len(dup),weights='weights')
            vls=list(sample.index.values)
            #identify values to be replaced
            dups=[i for i, j in enumerate(vig) if j in dup]
            dups2=dups[lendup:]
            for i in range(len(dups2)):
                vig[dups2[i]]=vls[i]
        g.extend(vig)
        stored.extend(vig)
    l1=[[]for e in range(0,5)]
    for e,g,h in zip(lists,cuantosamples,l1):
        iterate=iter(e)
        l2=[list(islice(iterate,0,i))for i in g]
        h.extend(l2)
    return(l1)
vigilated=vigilado(dicts,matrix,cuantosamples)
vigilated
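The final loop of `vigilado` relies on one shared iterator being consumed by successive `islice` calls. Here is a small sketch of that partitioning step in isolation. The sizes `[5, 0, 5, 3]` are an assumption read off the shape of the second list in the example (the question does not show `cuantosamples`), and the helper name `partition` is hypothetical:

```python
from itertools import islice


def partition(flat, sizes):
    """Split `flat` into consecutive chunks whose lengths are given by `sizes`."""
    it = iter(flat)  # one shared iterator, so each islice continues where the last one stopped
    return [list(islice(it, size)) for size in sizes]


flat = [3, 9, 43, 44, 17, 20, 9, 23, 3, 27, 3, 30, 43]
print(partition(flat, [5, 0, 5, 3]))
# [[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]]
```

A size of 0 yields an empty chunk, which is how the empty sublists survive the round trip through the flattened list.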
This returns the following list of lists. As can be seen, it works in most cases but not in all of them, and I don't know why:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 9, 23, 16, 27], [33, 30, 14]], #3 and 43 are no longer repeated, BUT 9 IS STILL REPEATED
[[2, 26, 42, 29, 44], [], [22, 3, 5, 31, 27]], #2 and 44 no longer repeated
[[31, 43, 32, 23, 33], [], [44, 9, 27, 6, 29]], #23 no longer repeated
[[12, 27, 9, 44, 2], [], [25, 29, 40, 1, 28]]] #27 and 12 no longer repeated
Can someone please help me? I have no idea why the code doesn't handle all cases; I thought it would solve the problem. Thanks.
Edit: this would be my desired output:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 10, 23, 16, 27], [33, 30, 14]], #9 that wasn't replaced before is replaced here with a 10
[[2, 26, 42, 29, 44], [], [22, 3, 5, 31, 27]],
[[31, 43, 32, 23, 33], [], [44, 9, 27, 6, 29]],
[[12, 27, 9, 44, 2], [], [25, 29, 40, 1, 28]]]
As you can see, it's very similar to my resulting list (my code somehow replaces all repeated values except one or two). The only change is that the 9 in dicts[1][2] is replaced by 10.
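One possible explanation for the surviving 9, offered as a sketch rather than a confirmed diagnosis: the question never shows how `duplicates` is defined. If it yields a value every time that value is seen again (the behaviour of `iteration_utilities.duplicates`, which is an assumption here), then `lendup` counts repeat occurrences, and the slice `dups[lendup:]` assumes the first `lendup` matching positions are exactly the ones to keep. That does not hold when a value appears three times, as 3 does in the second list. The stand-in `repeat_occurrences` below is hypothetical and only mimics that assumed behaviour.

```python
def repeat_occurrences(values):
    """Yield a value each time it is seen again (ASSUMED behaviour of `duplicates`)."""
    seen = set()
    for v in values:
        if v in seen:
            yield v
        else:
            seen.add(v)


# Second list from the question, flattened.
row = [3, 9, 43, 44, 17, 20, 9, 23, 3, 27, 3, 30, 43]

dup = list(repeat_occurrences(row))                # [9, 3, 3, 43]
lendup = len(dup)                                  # 4 repeat occurrences of 3 distinct values
dups = [i for i, v in enumerate(row) if v in dup]  # [0, 1, 2, 6, 8, 10, 12]
dups2 = dups[lendup:]                              # [8, 10, 12]: position 6 (the second 9) is skipped

# Positions that actually hold a repeat: every index whose value was already seen.
seen = set()
repeat_positions = []
for i, v in enumerate(row):
    if v in seen:
        repeat_positions.append(i)
    seen.add(v)

print(dups2)             # [8, 10, 12]
print(repeat_positions)  # [6, 8, 10, 12]
```

Under this assumption the replaced positions 8, 10 and 12 match the posted output (3 becomes 16, 3 becomes 33, 43 becomes 14), while position 6 keeps its 9.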
Solution 1:[1]
My answer does not pinpoint the problem in your code, but offers two approaches to your goal.
Approach 1
Generate dicts so that no index value is repeated within any element of dicts. Explanations are in the code comments.
import numpy as np
index = np.arange(100)
cuantosamples = [[5, 0, 5], [5, 0, 5, 3], [5, 0, 5], [5, 0, 5], [5, 0, 5]]
np.random.seed(0)
dicts = [
    list(map(list,  # convert np.array to list
        np.split(  # split a list into sublists
            np.random.choice(index, sum(needs), replace=False),  # generate random choices without replacement
            np.cumsum(needs)[:-1]  # how to split
        )))
    for needs in cuantosamples
]
# print(dicts)
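The question samples with `weights='weights'`, which Approach 1 as written does not use. The following is a hedged sketch of how weights could be folded in through the `p` argument of `numpy.random.Generator.choice`. The weights are random placeholders (an assumption, since the real `weights` column is not shown); `index` and `cuantosamples` are copied from the snippet above.

```python
import numpy as np

rng = np.random.default_rng(0)
index = np.arange(100)
cuantosamples = [[5, 0, 5], [5, 0, 5, 3], [5, 0, 5], [5, 0, 5], [5, 0, 5]]

# ASSUMPTION: placeholder weights; in the question they would come from matrix['weights'].
weights = rng.random(index.size)
p = weights / weights.sum()  # `p` must sum to 1

dicts = []
for needs in cuantosamples:
    # One weighted draw per group, without replacement, so no value repeats inside the group.
    drawn = rng.choice(index, size=sum(needs), replace=False, p=p)
    parts = np.split(drawn, np.cumsum(needs)[:-1])  # cut the draw into the requested sizes
    dicts.append([part.tolist() for part in parts])

print(dicts[1])
```

Because each group is drawn independently, values can still repeat across groups, which the question allows.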
Approach 2
Replace repeated values with new values. Explanations in code.
dicts = [
    [[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
    [[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]],
    [[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]],
    [[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]],
    [[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]]
]
np.random.seed(0)
new_dicts = []
for lists, needs in zip(dicts, cuantosamples):
    ary = np.array([x for l in lists for x in l])  # flatten lists into an array
    candidates = [x for x in index if x not in ary]  # find out what to be replaced with
    values, counts = np.unique(ary, return_counts=True)  # find out what to replace
    for v, c in zip(values, counts - 1):
        if c:
            picks = np.random.choice(candidates, c, replace=False)
            ary[ary == v] = np.concatenate([[v], picks])  # keep the first occurrence, replace the rest
            candidates = [x for x in candidates if x not in picks]  # never hand out the same replacement twice
    new_dicts.append(list(map(list, np.split(ary, np.cumsum(needs)[:-1]))))
new_dicts
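The question also asks that a value used as a replacement is not used again for a later list. Below is a pure-Python sketch of that variant, with the hypothetical helper `replace_repeats`. It draws replacements uniformly, so the weighted sampling from the question is deliberately left out:

```python
import random


def replace_repeats(groups, pool, rng=None):
    """Replace every repeated value inside a group with an unused value from `pool`.

    The first occurrence of each value is kept. A replacement is never a value
    already present in the group, and is never reused in a later group.
    """
    rng = rng or random.Random()
    used_replacements = set()
    result = []
    for group in groups:
        present = {v for sub in group for v in sub}
        candidates = [v for v in pool if v not in present and v not in used_replacements]
        rng.shuffle(candidates)
        seen = set()
        new_group = []
        for sub in group:
            new_sub = []
            for v in sub:
                if v in seen:
                    v = candidates.pop()  # raises IndexError if the pool is exhausted
                    used_replacements.add(v)
                seen.add(v)
                new_sub.append(v)
            new_group.append(new_sub)
        result.append(new_group)
    return result


dicts = [
    [[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
    [[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]],
    [[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]],
    [[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]],
    [[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]],
]

fixed = replace_repeats(dicts, range(100), random.Random(0))
print(fixed)
```

Walking the values in order with a `seen` set avoids index arithmetic entirely, and the nested structure is rebuilt as it goes, so no separate partitioning step is needed.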
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raymond Kwok |
