Search for a header and specific prefixes in lines and print both (Python)
I want to find each "Cluster" header and check the lines below it for the ID prefixes st104 and pK (as in st104H_20170, pKH911_25081).
If the lines below a header contain both prefixes, st104 and pK, print the header and those lines.
input.txt
Cluster 1
0 673aa -st104P_06575
1 673aa -st104H_22488
3 673aa -pKH911_09284
4 673aa -pKP911_09288
Cluster 2
0 690aa -st104H_20170
1 690aa -KH911_25081
2 687aa -NE95031.1
3 685aa -TIG_004920
Cluster 3
0 685aa -st104H_27649
1 690aa -st104P_11877
2 685aa -pKP911_15300
Cluster 4
0 685aa -st104H_27649
1 690aa -st104P_11877
output
Cluster 1
0 673aa -st104P_06575
1 673aa -st104H_22488
3 673aa -pKH911_09284
4 673aa -pKP911_09288
Cluster 3
0 685aa -st104H_27649
1 690aa -st104P_11877
2 685aa -pKP911_15300
Tried:
with open("input.txt") as fh:
    result = ""
    cluster_content = ""
    for line in fh:
        if line.startswith("Cluster"):
            if all(initial in cluster_content for initial in ('st104', 'pK')):
                result += cluster_content
            cluster_content = ""
        cluster_content += line
Solution 1:[1]
This prints only the clusters that contain both labels, st104 and pK, and whose member lines all carry one of those two labels:
# True if filter_str appears once per member line, i.e. it is the only label used
def check_alone(cluster_content, filter_str, cluster_split):
    return cluster_content.count(filter_str) == len(cluster_split) - 1

def cluster_filter(cluster_content):
    filters_labels = ['st104', 'pK']
    cluster_split = cluster_content.split('\n')
    if cluster_split[-1] == '':  # remove the trailing empty string left by the final newline
        cluster_split = cluster_split[:-1]
    if len(cluster_split) < 2:  # nothing collected yet, or a header without member lines
        return
    # skip clusters that use only one of the two labels
    if check_alone(cluster_content, 'st104', cluster_split) or check_alone(cluster_content, 'pK', cluster_split):
        return
    # print the cluster only if every member line contains one of the filter labels
    if all(any(label in item for label in filters_labels) for item in cluster_split[1:]):
        print(cluster_content, end='')

with open("input.txt") as fh:
    cluster_content = ""
    for line in fh:
        if line.startswith("Cluster"):
            cluster_filter(cluster_content)  # check the cluster collected so far
            cluster_content = line
        else:
            cluster_content += line
    cluster_filter(cluster_content)  # check the last cluster in the file
Output:
Cluster 1
0 673aa -st104P_06575
1 673aa -st104H_22488
3 673aa -pKH911_09284
4 673aa -pKP911_09288
Cluster 3
0 685aa -st104H_27649
1 690aa -st104P_11877
2 685aa -pKP911_15300
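The same filtering can also be written by first grouping the lines into clusters and then testing each group. What follows is only a sketch under a few assumptions that are not stated in the question: the ID is taken to be the text after the last "-" on a member line, a label must match at the start of that ID (rather than anywhere in the line), and the helper names (read_clusters, member_prefix, keep_cluster, filter_clusters) as well as the SAMPLE data are made up for illustration.

```python
PREFIXES = ("st104", "pK")  # the two labels from the question


def read_clusters(lines):
    """Yield (header, members) pairs; a header is a line starting with 'Cluster'."""
    header, members = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("Cluster"):
            if header is not None:
                yield header, members
            header, members = line, []
        elif header is not None:
            members.append(line)
    if header is not None:
        yield header, members  # do not forget the last cluster


def member_prefix(member):
    """Return the label the ID starts with, or None if it matches no label.

    Assumption: the ID is whatever follows the last '-' on the line.
    """
    ident = member.rsplit("-", 1)[-1]
    for prefix in PREFIXES:
        if ident.startswith(prefix):
            return prefix
    return None


def keep_cluster(members):
    """True when every member has a known label and both labels occur."""
    found = [member_prefix(m) for m in members]
    return None not in found and set(found) == set(PREFIXES)


def filter_clusters(lines):
    """Return the lines of all kept clusters, headers included."""
    out = []
    for header, members in read_clusters(lines):
        if keep_cluster(members):
            out.append(header)
            out.extend(members)
    return out


# Small made-up sample in the same layout as input.txt
SAMPLE = """\
Cluster 1
0 673aa -st104P_06575
1 673aa -pKH911_09284
Cluster 2
0 690aa -st104H_20170
1 687aa -NE95031.1
Cluster 3
0 685aa -st104H_27649
1 690aa -st104P_11877
"""

print("\n".join(filter_clusters(SAMPLE.splitlines())))
```

For the real file, the open file handle can be passed directly, for example filter_clusters(fh) inside a with open("input.txt") as fh: block. Comparing the set of labels found against the set of required labels replaces the counting done in check_alone, so adding a third label would only mean extending PREFIXES.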
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
