Split txt file by Year and ID and rename each new txt file as "Year_ID.txt"

I have a bunch of comma-separated txt files, and I want to split each file into separate text files using the common group identifiers in Column 1 (Year) and Column 3 (ID). I would also like to name each new file "Column1_Column3.txt". I do not want to keep any header in these files. I have tried many scripts and suggestions from other questions, but nothing seems to work. I am new to Python, and any suggestions would be very helpful. Thank you very much.

File format:

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781
1.0,9.0,0.0,3.0,5.0,13.5,142.2,986.5,22.7,89.3040663832,0.0,0.0,268.74681081200004
1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999
1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048

So my output should be:

File1:

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781

File2:

1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999

File3:

1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048
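The task described in the question can be done in a single pass: read each row, file it under its (Year, ID) pair, then write one headerless file per pair. Below is a minimal sketch, assuming the input has one comma-separated record per line (the pasted sample appears to have lost its line breaks) and that the raw column text ("1.0", "0.0", ...) is what should appear in the file name. The function name split_by_year_id and its out_dir parameter are hypothetical.

```python
import csv
import os
from collections import defaultdict


def split_by_year_id(input_path, out_dir="."):
    """Split a comma-separated file into one file per (Year, ID) pair.

    Year is column 1 (index 0) and ID is column 3 (index 2). Output files
    are named "<Year>_<ID>.txt" and contain no header row.
    Returns the list of paths that were written.
    """
    groups = defaultdict(list)  # (year, id) -> rows, in original order
    with open(input_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:  # skip blank or too-short lines
                continue
            groups[(row[0], row[2])].append(row)

    written = []
    for (year, id_), rows in groups.items():
        path = os.path.join(out_dir, f"{year}_{id_}.txt")
        with open(path, "w", newline="") as out:
            # "\n" instead of csv's default "\r\n" line ending
            csv.writer(out, lineterminator="\n").writerows(rows)
        written.append(path)
    return written
```

For the sample data, calling split_by_year_id("data.txt") (a placeholder file name) would produce 1.0_0.0.txt, 1.0_1.0.txt and 1.0_2.0.txt in the current directory.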



Solution 1:[1]

Assumptions:

  1. All entries are uniform
  2. Entries are housed in a 2d list
  3. All entries have at least 3 fields (so that both grouping columns, 1 and 3, are present)

Slight concern:

  • In File1, is the second entry supposed to have '2055791 ' in front of it? If so, the entries are not as uniform as this approach requires. In that case I suggest scrubbing the data beforehand, or extending this code so that it ignores such fragments.
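The assumptions and the scrubbing suggestion above can be checked mechanically. This is a minimal sketch, not part of the original answer, assuming records are separated by whitespace (newlines or the spaces seen in the pasted sample), that fields contain no spaces, and that every valid record has 13 numeric fields as in the sample. The names scrub_records and _is_number are hypothetical. It only flags fragments such as '2055791'; it cannot rebuild a value that was split across two lines.

```python
def _is_number(s):
    """Return True if s parses as a float (e.g. "13.2"), else False."""
    try:
        float(s)
    except ValueError:
        return False
    return True


def scrub_records(text, n_fields=13):
    """Split raw text into clean rows and rejected fragments.

    A whitespace-separated token is kept as a row only if it has exactly
    n_fields comma-separated values and every value is numeric; anything
    else (stray fragments, empty or non-numeric fields) is rejected.
    """
    rows, rejected = [], []
    for token in text.split():  # split on any run of whitespace
        fields = token.split(",")
        if len(fields) == n_fields and all(_is_number(x) for x in fields):
            rows.append(fields)
        else:
            rejected.append(token)
    return rows, rejected
```

The clean rows are a 2D list of strings, matching assumption 2, and the rejected list can be printed to see exactly what was dropped.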
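import csv

#grab the full list: every row of the input file as a list of strings
#("input.txt" is a placeholder; use your own file name)
with open("input.txt", newline="") as f:
    full_list = [row for row in csv.reader(f) if len(row) >= 3]

#grab every distinct value of column 1 (Year), in first-seen order
col_one_list = list(dict.fromkeys(a[0] for a in full_list))

#grab every distinct value of column 3 (ID), in first-seen order
col_three_list = list(dict.fromkeys(b[2] for b in full_list))


#split the rows into one file per Year/ID pair
for i in col_one_list:
    for j in col_three_list:
        separate_list = [entry for entry in full_list if entry[0] == i and entry[2] == j]
        if not separate_list:
            continue  # this Year/ID combination never occurs, so write no file
        with open(str(i) + "_" + str(j) + ".txt", "w") as file:
            for item in separate_list:
                # join the fields back with commas; no header line is written
                file.write(",".join(item) + "\n")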

This should be sufficient.
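The nested loops above rescan the whole list once for every Year/ID combination. A different technique, not the answerer's, is to sort by the two key columns and let itertools.groupby collect each group in one pass over the sorted rows. This sketch assumes full_list is the 2D list of string fields described in the assumptions; group_rows is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(full_list):
    """Group a 2D list of rows by (column 1, column 3).

    Returns a dict mapping (year, id) to the rows with that pair. The sort
    is stable, so rows keep their original relative order within a group.
    """
    key = itemgetter(0, 2)  # Year is index 0, ID is index 2
    ordered = sorted(full_list, key=key)  # groupby only merges adjacent items
    return {k: list(g) for k, g in groupby(ordered, key=key)}


full_list = [
    ["1.0", "9.0", "0.0", "0.0"],
    ["1.0", "11.0", "1.0", "1.0"],
    ["1.0", "9.0", "0.0", "1.0"],
]
for (year, id_), rows in group_rows(full_list).items():
    print(f"{year}_{id_}.txt", len(rows))
# prints:
# 1.0_0.0.txt 2
# 1.0_1.0.txt 1
```

The sort is required because groupby only merges consecutive items with equal keys. Keys are compared as strings, so "11.0" would sort before "9.0"; that changes the order of the groups but not their contents.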

Solution 2:[2]

It looks like a visibility issue: R has fairly complicated scoping rules. To make sure the variable is passed correctly, specify the calls as follows:

fun_1(iris, iris$Sepal.Length)
fun_2(iris, iris$Sepal.Length)

Alternatively, just pass var as a string:

library(dplyr)

fun_1 <- function(data, var) {
  data %>% summarise(mean=mean(data[[var]]))
}

fun_2 <- function(data, var) {
  fun_1(data, var)
}

fun_1(iris, 'Sepal.Length')
fun_2(iris, 'Sepal.Length')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dperry5910
Solution 2 LSM - DAT_Linux