Deriving pattern for runs of letters

I have a string, "1. AGGCHRUSHCCKSGDSKCGGHCSG". I would like to find every run of "G" in it and print a pattern like this: GG-G-GG-G

If there are no Gs in the string, it should print "No G found".

I have tried basic string operations, substrings, and print in Python, but that is as far as I got. I can't find an Excel formula for this either. How can I generate this pattern?



Solution 1:[1]

You can use a regular expression to replace sequences of one or more non-"G" characters with a single dash, and then use .strip() to remove any leading or trailing dashes:

import re

data = "1. AGGCHRUSHCCKSGDSKCGGHCSG"
result = re.sub(r"[^G]+", r"-", data).strip("-")
if "G" in result:
    print(result)
else:
    print("No G found")

This outputs:

GG-G-GG-G
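
The same idea can be turned around: instead of replacing everything that is not a "G", collect the runs of "G" directly with re.findall and join them. The sketch below is only an illustration; the function name letter_runs and its letter parameter are made up here and are not part of the original answer.

```python
import re

def letter_runs(text, letter="G"):
    """Return the runs of `letter` joined by dashes, or a fallback message."""
    # re.escape guards against letters that are regex metacharacters;
    # the non-capturing group makes findall return the whole run.
    runs = re.findall(f"(?:{re.escape(letter)})+", text)
    return "-".join(runs) if runs else f"No {letter} found"

print(letter_runs("1. AGGCHRUSHCCKSGDSKCGGHCSG"))  # GG-G-GG-G
print(letter_runs("ABC"))                          # No G found
```

With this variant there are no leading or trailing dashes to strip, because only the matched runs are ever joined.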

Solution 2:[2]

EDIT - With @PranavHosangadi's suggestions:

from itertools import groupby

string = "1. AGGCHRUSHCCKSGDSKCGGHCSG"

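# Each group is one run of identical adjacent characters; keep only the runs of "G".
groups = ["".join(group) for key, group in groupby(string) if key == "G"]

# An empty list means the string contained no "G" at all.
print("-".join(groups) if groups else "No G found")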

Output:

GG-G-GG-G
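
To see why this works, it may help to look at what groupby produces before filtering: one (key, group) pair per run of equal adjacent characters. The snippet below is a small sketch for inspection only; the names text, runs, and g_runs are illustrative.

```python
from itertools import groupby

text = "1. AGGCHRUSHCCKSGDSKCGGHCSG"

# groupby yields (key, iterator) pairs, one per run of equal adjacent characters.
runs = [(key, len(list(group))) for key, group in groupby(text)]

# Keep only the runs whose key is "G" and rebuild each run as a string.
g_runs = [key * length for key, length in runs if key == "G"]

print(g_runs)            # ['GG', 'G', 'GG', 'G']
print("-".join(g_runs))  # GG-G-GG-G
```

Because groupby only merges adjacent equal items, no sorting is needed here; each separate run of "G" stays its own group.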

Solution 3:[3]

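string1 = "1. AGGCHRUSHCCKSGDSKCGGHCSG"
string2 = ""
for char in string1:
    if char == "G":
        string2 += char
    elif string2 and not string2.endswith("-"):
        # First non-G character after a run of Gs: add one separator.
        string2 += "-"
# Remove the separator left over when the string does not end in "G".
string2 = string2.rstrip("-")
print(string2 if string2 else "No G found")

A plain loop works too: append each G, and add a single dash only for the first non-G character that follows a run of Gs.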

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 alexander