Using sed to omit specific lines of a dataset
I have a comma-separated dataset. Here is an example:
id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat
I need to use sed with a regular expression to remove every record from the student dataset that has neither an expulsion nor a serious misdemeanor. For the sample above, the command should affect only the third record (id 9274). This is my attempt:
sed -i "/^*,*,*,0,0$/d" file.csv
Any idea of what's missing?
Solution 1:[1]
You might want to use awk to check fields 4 and 5, and only return the lines where at least one of them is not 0:
awk -F, '$4 != 0 || $5 != 0' file.csv > output.csv
Or, to get the other lines:
awk -F, '$4 == 0 && $5 == 0' file.csv > output.csv
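As a rough illustration, the two awk commands above can be run against the sample from the question, which has a trailing info column. This is only a sketch: the variable names s, kept and dropped are made up for the example, and the data is fed through a here-string instead of file.csv.

```shell
#!/bin/bash
# Sample rows copied from the question, including the trailing "info" field.
s='id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat'

# Lines where field 4 or field 5 is non-zero (the header is kept as well,
# because its text fields do not compare equal to 0).
kept=$(awk -F, '$4 != 0 || $5 != 0' <<< "$s")

# Lines where both field 4 and field 5 are 0.
dropped=$(awk -F, '$4 == 0 && $5 == 0' <<< "$s")

printf 'kept:\n%s\n\ndropped:\n%s\n' "$kept" "$dropped"
```

Because awk addresses the fields by position, the extra column after field 5 does not affect the result.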
You can also use
sed -i '/,0,0$/d' file.csv
With this, you will remove all lines ending with ,0,0.
See the online demo:
#!/bin/bash
s='id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1'
sed '/,0,0$/d' <<< "$s"
Output:
id, date of birth, grade, explusion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
21,2006-14-05,0.53,4,1
To see the other lines, use the reverse command (without -i, so the file is not overwritten):
sed -n '/,0,0$/p' file.csv
It will print only the lines that end with ,0,0.
Solution 2:[2]
You seem to think * means "anything" but it means "repeat the previous regular expression zero or more times, as many as possible". Regular expressions are different from wildcards as used in many shells and search engines, where * often does mean "any string".
The regular expression .* means "any character at all, repeated as many times as possible" but in this case you clearly mean [^,]* which means "any character which isn't a comma, repeated as many times as possible."
However, sed will happily match on a substring, so just
sed -i '/,0,0$/d' file.csv
should work, or equivalently
grep -v ',0,0$' file.csv >temp && mv temp file.csv
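The [^,]* form described above can also be used to tie the match to fields 4 and 5 explicitly. That may matter for the sample in the question, where some rows carry a sixth field, so the line does not end in ,0,0. The following is a minimal sketch under that assumption. It uses sed -E (extended regular expressions, available in GNU and BSD sed), and the variable names s and filtered are only for illustration.

```shell
#!/bin/bash
# Sample rows copied from the question, including the trailing "info" field.
s='id, date of birth, grade, explusion, serious misdemeanor, info
123,2005-01-01,5.36,1,1,
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0,pass
21,2006-14-05,0.53,4,1,repeat'

# Skip three comma-free fields, then require fields 4 and 5 to be exactly 0,
# followed by either another comma or the end of the line.
filtered=$(sed -E '/^[^,]*,[^,]*,[^,]*,0,0(,|$)/d' <<< "$s")

printf '%s\n' "$filtered"
```

With this pattern the 9274 row is deleted even though it ends in ,pass, while a row such as 582,1999-05-12,8.51,0,1 is left alone.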
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | tripleee |
