Iterate through a list in Python and delete characters after the second instance of a character from an element

Sorry, very new to python.

Essentially I have a long list of file names, some in the format NAME_XX123456 and others in the format NAME_XX123456_123456.

I need to drop everything from the second underscore onward in each element. The code below only iterates through the first two elements, though, and when it encounters a second underscore it doesn't delete the remainder, it just splits the string.

sample_list=['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

shortlist=[]
item  = "_"
count = 0
i=0
for i in range(0,len(sample_list)):
        if(item in sample_list[i]):
               count =  count + 1
               if(count == 2):
                     shortlist.append(sample_list[i].rpartition("_"))
                     i+=1
                     
               if (count == 1):
                   shortlist.append(sample_list[i])
                   i+=1
                   
               
        print(shortlist)


Solution 1:[1]

Here is a simple split-and-join approach. We split each input on the underscore, keep the first two pieces, and join them back together with an underscore as the separator.

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = ['_'.join(x.split('_')[0:2]) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
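
A small variation on the same idea, offered as a sketch rather than as part of the original answer: passing a maxsplit of 2 to str.split stops splitting after the second underscore, so any further underscores stay in a third piece that is simply discarded.

```python
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

# maxsplit=2 yields at most three pieces: the two we keep and the remainder.
output = ['_'.join(x.split('_', 2)[:2]) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```

The result is the same as with a plain split; the only difference is that the tail is never broken up further.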

You could also use regular expressions here:

import re

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = [re.sub(r'([^_]+_[^_]+)_.*', r'\1', x) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
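
The regular-expression idea could also be written as a match against a precompiled pattern instead of a substitution. This is only a sketch: PREFIX and trim are illustrative names that do not appear in the original answer, and the pattern uses * rather than + so that empty segments are tolerated.

```python
import re

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

# Hypothetical name: matches from the start of the string up to,
# but not including, the second underscore.
PREFIX = re.compile(r'^[^_]*_[^_]*')

def trim(name):
    """Return the part of name before its second underscore."""
    match = PREFIX.match(name)
    # Names with no underscore at all do not match and are returned unchanged.
    return match.group(0) if match else name

print([trim(x) for x in sample_list])
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```

Compiling the pattern once may be convenient when the same rule is applied to a long list of file names.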

Solution 2:[2]

You can simply use the split method to split each item in the list on '_' and then join the first two parts of the result, which discards everything after the second underscore. Try this:

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

res = []
for item in sample_list:
    item_split = item.split('_')
    res.append('_'.join(item_split[0:2]))  # keep only the first two parts

print(res)  # ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
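
For comparison, the same result can be reached without splitting at all, by locating the second underscore with str.find and slicing. This is a different technique from the answer above, shown as a rough sketch; truncate_at_nth and its parameters are hypothetical names introduced for illustration.

```python
def truncate_at_nth(text, sep='_', n=2):
    """Return text up to (not including) the n-th occurrence of sep."""
    pos = -1
    for _ in range(n):
        # Search for the next separator after the previous hit.
        pos = text.find(sep, pos + 1)
        if pos == -1:
            # Fewer than n separators: nothing to cut.
            return text
    return text[:pos]

sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
res = [truncate_at_nth(item) for item in sample_list]
print(res)  # ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```

Elements with only one underscore pass through unchanged, which matches the behaviour of the split-based loop.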

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Tim Biegeleisen
Solution 2    iR0ckY