'Deleting characters in csv cell

I am trying to delete characters in a .csv file that is before and follows a string of numbers.

 | Description                                     |
 | ----------------------------------------------- |
 | Case Number and Notes: 21-12311 check building  |
 | Case Number and Notes: 10-50 w/c , 21-10411     |
 | Case Number and Notes: 21-10421 , Vehicle Maint |
 | Case Number and Notes: Test                     |
 | Case Number and Notes: 21-9693 , traffic Maint  |

I need to delete all characters in this column that come before and after the 21-#####. Is this possible to do?

Eventually I'd also like to have a couple leading zeros after the dash to make the number the same length, but I'm still working on that.

Results should be

| 21-12311 |
| 21-10411 |
| 21-10421 |
| 21-9693  |


Solution 1:[1]

You can use a regular expression to extract possible case numbers. In the case of your 3rd row, you can take the last possible match per row.

For example:

import csv
import re

re_casenum = re.compile(r'\b(\d+-\d+)\b')

with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.DictReader(f_input)
    csv_output = csv.DictWriter(f_output, fieldnames=csv_input.fieldnames + ['Case Number'])
    csv_output.writeheader()
    
    for row in csv_input:
        description = row['Description']
        case_numbers = re_casenum.findall(description)
        
        if case_numbers:
            row['Case Number'] = case_numbers[-1]
            csv_output.writerow(row)

This would give you a output.csv containing:

Description,Case Number
Case Number and Notes: 21-12311 check building,21-12311
"Case Number and Notes: 10-50 w/c , 21-10411",21-10411
"Case Number and Notes: 21-10421 , Vehicle Maint",21-10421
"Case Number and Notes: 21-9693 , traffic Maint",21-9693

This would add a new Case Number column to your CSV. Note, from your example it is not clear what delimiter is used for your CSV file, you might need to add suitable delimiter=';' type parameters. The quotes are added as there is a comma in the field.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martin Evans