How to make a csv row for each 2 lines in a txt file
I have a text file like this:
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz
Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz
Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz
Tomato mottle virus
And I need to get a csv file like this:
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus
Because later I want to use this like a tuple to find the compressed file, read it and get a final file with names like:
Viruses/GCF_000837105.1/Tomato mottle virus.fna
I just need to learn how to do the first part of the problem. It could be done with:
- sed
- awk
- R
- Python
Any help would be very appreciated. This is hard for me to accomplish because the original filenames are very messed up.
I have tried this:
sed -z 's/\n/,/g;s/,$/\n/' multi_headers
However, it replaces every \n with a comma.
Solution 1:[1]
Bash
You can do a paste (thanks @glenn jackman for pointing out my previous useless use of cat).
# or cat mytext.txt | paste -d "," - -
paste -d "," - - < mytext.txt
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus
R
The R function is also paste, together with sapply:
mytext <- scan("mytext.txt", character(), sep = "\n")
sapply(seq(1, length(mytext), 2), function(x) paste(mytext[x], mytext[x + 1], sep = ","))
[1] "Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A"
[2] "Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA"
[3] "Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus"
Solution 2:[2]
Using sed
$ sed '/^Viruses/{N;s/\n\(.*\)/,\1/}' multi_headers
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus
/^Viruses/ - Match lines starting with the string Viruses.
{N; - Read/append the next line of input into the pattern space.
s/\n\(.*\)/,\1/ - Replace the \n in the pattern space with a comma.
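For comparison, the same three steps can be sketched in Python. This is only an illustration of the logic described above, not a replacement for the sed command; the function name join_after_prefix and its prefix parameter are made up for this example.

```python
def join_after_prefix(lines, prefix="Viruses"):
    """Join each line that starts with `prefix` to the line that follows it."""
    out = []
    it = iter(lines)
    for line in it:
        line = line.rstrip("\n")
        if line.startswith(prefix):
            # Like sed's N: pull the next line into the current record.
            nxt = next(it, None)
            if nxt is not None:
                # Like s/\n\(.*\)/,\1/: the newline between them becomes a comma.
                line = line + "," + nxt.rstrip("\n")
        out.append(line)
    return out


sample = [
    "Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz\n",
    "Tomato mottle virus\n",
]
print("\n".join(join_after_prefix(sample)))
```

Lines that do not start with the prefix are passed through unchanged, which mirrors how the sed address restricts the command to matching lines.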
Solution 3:[3]
This might work for you (GNU sed and paste):
sed 'N;s/\n/,/' file
Append the next line to the current line and replace the newline between them with a comma.
or:
paste -sd',\n' file
Paste the file as one long string, replacing every other newline with a comma.
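To make the "every other newline" idea concrete, here is a rough Python sketch of what paste -s does with a delimiter list: the lines are joined into one string while cycling through the delimiters. The helper name paste_serial is hypothetical, and this only approximates paste's behaviour for this simple case.

```python
from itertools import cycle


def paste_serial(lines, delimiters=(",", "\n")):
    """Join lines into one string, cycling through the delimiters in order."""
    stripped = [line.rstrip("\n") for line in lines]
    if not stripped:
        return ""
    parts = [stripped[0]]
    # The delimiter before line 2 is ",", before line 3 is "\n", and so on.
    for delim, line in zip(cycle(delimiters), stripped[1:]):
        parts.append(delim)
        parts.append(line)
    # End the output with a single newline.
    return "".join(parts) + "\n"


print(paste_serial(["a.fna.gz\n", "Virus A\n", "b.fna.gz\n", "Virus B\n"]), end="")
```

With the two delimiters "," and "\n", odd-numbered newlines become commas and even-numbered ones stay as newlines, which is what pairs the lines.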
Solution 4:[4]
A simple writerows():
import csv

with open("text.txt", "r") as f:
    with open("data.csv", "w", newline="") as w:
        writer = csv.writer(w)
        # May want to add headers
        writer.writerow(["Heading1", "Heading2"])
        # splitlines() drops the trailing newlines that readlines() would keep
        x = f.read().splitlines()
        # Group the lines in pairs: [path, name]
        writer.writerows([x[i:i+2] for i in range(0, len(x), 2)])
Which yields:
Heading1,Heading2
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus
Solution 5:[5]
What about this.
with open('test.txt') as f:
    # splitlines() avoids the empty last element that split('\n') leaves
    data = f.read().splitlines()
new_data = []
# Stop at len(data) - 1 so that data[a+1] always exists
for a in range(0, len(data) - 1, 2):
    new_data.append(data[a] + ',' + data[a+1] + '\n')
with open('result.txt', 'w') as f:
    f.writelines(new_data)
or
with open('test.txt') as f_read, open('result.txt', 'w') as f_write:
    data = f_read.read().splitlines()
    new_data = []
    for a in range(0, len(data) - 1, 2):
        new_data.append(data[a] + ',' + data[a+1] + '\n')
    f_write.writelines(new_data)
Solution 6:[6]
Another R approach, relying on vector recycling.
t = readLines("txt.txt")
paste0(t[c(T, F)], ",", t[c(F, T)]) |> writeLines("txt.csv")
or for final file names
t = readLines("R/txt.txt")
sub("(?<=\\.\\d).*", "", t, perl = T) |>
(\(.) paste0(.[c(T, F)], "/", .[c(F, T)], ".fna"))()
#> [1] "Viruses/GCF_000820355.1/Sclerophthora macrospora virus A.fna"
#> [2] "Viruses/GCF_000820495.2/Influenza B virus RNA.fna"
#> [3] "Viruses/GCF_000837105.1/Tomato mottle virus.fna"
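For readers working in Python, the final-name step could be sketched as follows. This is a minimal sketch that assumes every archive path looks like the examples in the question, i.e. an accession ending in ".<digits>" followed by an underscore; the function name final_name and the regular expression are illustrative choices, not part of the R answer.

```python
import re

# Assumed shape: "<prefix>.<digits>_<rest>", e.g. "Viruses/GCF_000837105.1_Viral..."
ACCESSION = re.compile(r"(.+?\.\d+)_")


def final_name(archive_path, virus_name):
    """Build '<accession>/<virus name>.fna' from one path/name pair."""
    match = ACCESSION.match(archive_path)
    if match is None:
        # Paths that do not fit the assumed pattern are reported, not guessed.
        raise ValueError(f"unexpected path format: {archive_path!r}")
    return f"{match.group(1)}/{virus_name}.fna"


print(final_name(
    "Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz",
    "Tomato mottle virus",
))
# prints: Viruses/GCF_000837105.1/Tomato mottle virus.fna
```

Raising on an unexpected path is a deliberate choice here, since the question mentions that the original filenames are messy.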
Solution 7:[7]
Simple python3 solution, let file.txt content be
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz
Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz
Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz
Tomato mottle virus
and script.py
with open("file.txt","r") as f:
    for inx, line in enumerate(f):
        print(line.rstrip(), end='\n' if inx%2 else ',')
then
python script.py
output
Viruses/GCF_000820355.1_ViralMultiSegProj14361_genomic.fna.gz,Sclerophthora macrospora virus A
Viruses/GCF_000820495.2_ViralMultiSegProj14656_genomic.fna.gz,Influenza B virus RNA
Viruses/GCF_000837105.1_ViralMultiSegProj14079_genomic.fna.gz,Tomato mottle virus
Explanation: I use .rstrip to strip the trailing newline, then, depending on whether the line index is odd or even, I use \n or , respectively as the line end. Note that enumerate starts at 0 by default, as opposed to GNU AWK line numbers, which start at 1. Note also that using for ... in filehandle avoids loading the whole file at once, so this solution can also be used for files bigger than the available RAM.
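The same streaming idea can be combined with the csv module, which matters if a virus name ever contains a comma. The following is a sketch under that assumption; write_pairs is a hypothetical helper, and io.StringIO stands in for real files so the example is self-contained.

```python
import csv
import io


def write_pairs(src, dst):
    """Stream line pairs from src to dst as CSV rows; return the row count."""
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    # A file object is its own iterator, so zip(src, src) reads two lines
    # per step without loading the whole file into memory.
    for path, name in zip(src, src):
        writer.writerow([path.rstrip("\n"), name.rstrip("\n")])
        count += 1
    return count


src = io.StringIO("Viruses/x.fna.gz\nVirus X, strain 1\n")
dst = io.StringIO()
write_pairs(src, dst)
print(dst.getvalue(), end="")
# prints: Viruses/x.fna.gz,"Virus X, strain 1"
```

The writer quotes the second field because it contains the delimiter, so the row still parses back into exactly two columns.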
Solution 8:[8]
To add yet another solution into the mix, you can also use xargs to group the input lines by 2, then replace the first space with ',' in each output line. This relies on the file paths containing no spaces, and -d and -a are GNU xargs options.
xargs -n2 -d'\n' -a input.txt | sed 's/ /,/'
Solution 9:[9]
Make sure that when you call changeTitle the variable form.name is not undefined.
If you are using React hooks and the variable form is a state or prop, you can use useEffect and change the title state only when form.name is defined:
useEffect(() => {
  if (form.name) {
    changeTitle(form.name);
  }
}, [form.name]);
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | potong |
| Solution 4 | |
| Solution 5 | Sharim Iqbal |
| Solution 6 | |
| Solution 7 | |
| Solution 8 | Michail Alexakis |
| Solution 9 | |
