awk split on first occurrence of character
I am trying to use awk to split each line. If there is more than one p or q, the second split on the ( does not work correctly (line 2 is an example). I am not able to ignore the later p or q when there is more than one occurrence. I tried ^pq, but that did not produce the desired output. Thank you :).
file
1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2
awk
awk '{split($0,a,"[pq(_]"); print "id"a[1],a[3]}' file
current
id1 120785011
id1 21.1
desired
id1 120785011
id1 143192432
Solution 1:[1]
another awk
$ awk -F'[(_]' '{split($0,a,"[pq]"); print "id"a[1],$2}' file
id1 120785011
id1 143192432
Since you don't control the number of p's and q's in the line, use two different splits: the field delimiter to find the value, and a separate split() to find the id.
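To see why the two splits do not interfere with each other, here is a small sketch that prints what each one produces for the sample lines. The function name show_splits and the sample variable are only illustrative, not part of the original answer.

```shell
# Sample input copied from the question.
sample='1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2'

# Print the pieces produced by the field separator and by split().
show_splits() {
  awk -F'[(_]' '{
    split($0, a, "[pq]")   # a[1] is the text before the first p or q
    printf "fields: $1=%s $2=%s | id part: a[1]=%s\n", $1, $2, a[1]
  }'
}

printf '%s\n' "$sample" | show_splits
```

The field separator never looks at p or q, so $2 is the first number no matter how many of them precede the parenthesis.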
Solution 2:[2]
The split function returns the number of fields, so we can take advantage of that and count back from the end:
{
n = split($0, a, /[pq(_]/)
printf "id%s %s\n", a[1], a[n-1]
}
outputs
id1 120785011
id1 143192432
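As a rough illustration of why counting from the end works, the sketch below prints the field count and the element that gets picked for each sample line. It assumes, as in the sample data, that nothing after the underscore contains another p, q, ( or _; the name count_fields is made up for this example.

```shell
# Sample input copied from the question.
sample='1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2'

# Show how n changes with the number of p/q characters,
# while a[n-1] keeps pointing at the wanted number.
count_fields() {
  awk '{
    n = split($0, a, /[pq(_]/)
    printf "n=%d wanted=a[%d]=%s\n", n, n - 1, a[n - 1]
  }'
}

printf '%s\n' "$sample" | count_fields
```

The extra q on the second line adds one element at the front of the array, so the index of the wanted value shifts from 3 to 4 but stays second to last.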
Solution 3:[3]
Here is something you can do using FS regex itself and keeping awk simple:
awk -F '[(_]|[pq]([^pq]*[pq])*' '{print "id" $1, $3}' file
id1 120785011
id1 143192432
FS regex details
[(_]: Match ( or _
|: OR
[pq]([^pq]*[pq])*: Match p or q, followed by zero or more repetitions of "0 or more non-pq characters followed by p or q"
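A quick way to check what this FS does is to dump every field it produces. This is only a sketch for inspection; the list_fields name is hypothetical, and it relies on awk picking the longest match at each separator position, which is the standard behaviour.

```shell
# Sample input copied from the question.
sample='1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2'

# Print every field produced by the FS regex, numbered.
list_fields() {
  awk -F '[(_]|[pq]([^pq]*[pq])*' '{
    for (i = 1; i <= NF; i++)
      printf "%s[%d]=%s", (i > 1 ? " " : ""), i, $i
    print ""
  }'
}

printf '%s\n' "$sample" | list_fields
```

On the second line the whole run q12q is consumed as a single separator, which is why the number still lands in field 3.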
Solution 4:[4]
I'd use sed for this since it's simple substitutions on a single line which is what sed is best for:
$ sed 's/\([^pq]*\)[^(]*(\([^_]*\).*/id\1 \2/' file
id1 120785011
id1 143192432
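If the escaped groups are hard to scan, the same substitution can be sketched with extended regular expressions and one comment per part. This assumes a sed that accepts -E (GNU and BSD sed both do); the to_ids wrapper is only there for illustration.

```shell
# Sample input copied from the question.
sample='1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2'

to_ids() {
  # ([^pq]*)  group 1: everything before the first p or q (the id)
  # [^(]*\(   skip ahead to, and consume, the opening parenthesis
  # ([^_]*)   group 2: everything up to the underscore
  # .*        swallow the rest of the line
  sed -E 's/([^pq]*)[^(]*\(([^_]*).*/id\1 \2/'
}

printf '%s\n' "$sample" | to_ids
```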
Solution 5:[5]
UPDATE 1: I realized I could make it even more succinct:
mawk 'sub("^","id")<--NF' FS='[pq][^(]+[(]|[_].+$'
It works even when there are empty rows embedded in the input, because sub() runs first, so NF is never decremented below zero, which would trigger an error message.
=============================================================
An awk-based solution that requires neither further, redundant array-splitting nor a back-reference-capable regex engine:
input :
1p11.2(120785011_120793480)x3
1q12q21.1(143192432_143450240)x1~2
command ::
mawk 'sub("^","id",$!(NF*=2<NF))' FS='[pq][^(]+[(]|[_].+$'
output :
id1 120785011
id1 143192432
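For readers who prefer the logic spelled out, here is a more explicit sketch built on the same FS idea: one separator swallows everything from the first p or q through the opening parenthesis, the other swallows everything from the underscore to the end of the line. It is not the answer's exact command; the NF > 1 guard and the explicit_ids name are assumptions added here to skip empty or non-matching rows.

```shell
# Sample input copied from the question, with an empty row added.
sample='1p11.2(120785011_120793480)x3

1q12q21.1(143192432_143450240)x1~2'

explicit_ids() {
  # After splitting, $1 is the id and $2 is the first number.
  awk -F'[pq][^(]+[(]|_.+$' 'NF > 1 { print "id" $1, $2 }'
}

printf '%s\n' "$sample" | explicit_ids
```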
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | karakfa |
| Solution 2 | glenn jackman |
| Solution 3 | anubhava |
| Solution 4 | Ed Morton |
| Solution 5 | |
