How do I extract words containing specific key letters from a file with Python?
Sorry, I'm fairly new to Python and haven't had much training.
I want to ask how to extract words containing certain key letters from the file './models/asm/Draft_km.modelspec' in Python. For example, these lines can be found inside the .modelspec file:
m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001
I want to extract these from a large '.modelspec' file by filtering on "_kcat : 10"
and obtain them as m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10
My end goal is to be able to randomly reassign 10% of the values (-1, 1) for a genetic algorithm.
Any help is much appreciated.
Solution 1:[1]
Since you seem to be planning to modify the data, it might be useful to first split the lines into a list and then process each line individually.
with open("./models/asm/Draft_km.modelspec") as f:
    # read lines, skipping empty lines and removing trailing whitespace
    lines = [line.rstrip() for line in f if line.strip()]
If all you need to do is check for a substring, you can check each line like so:
for line in lines:
    if "_kcat : 10" in line:
        print(line)  # or do whatever you want
If you need to match more complex patterns, regular expressions as in Tim Biegeleisen's answer are the way to go.
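Since the goal is to modify the values later, it may help to split each matching line into a name and a numeric value instead of only printing it. The following is a minimal sketch, not part of the original answer: the helper name `extract_kcat` is hypothetical, and it assumes every line of interest has the form `name : value` with a numeric value.

```python
def extract_kcat(lines, suffix="_kcat"):
    """Return (name, value) pairs for lines whose parameter name ends with suffix."""
    pairs = []
    for line in lines:
        # Split "name : value" on the first colon only
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Assumption: the value after the colon is numeric
        if sep and name.endswith(suffix):
            pairs.append((name, float(value)))
    return pairs


# Small sample taken from the lines shown in the question
sample = [
    "m_BSORx_kcat : 10",
    "m_ENTERH_kcat : 10",
    "m_EX_galt_e_km : 0.001",
]
pairs = extract_kcat(sample)
print(", ".join(f"{name} : {value:g}" for name, value in pairs))
```

Matching on the name suffix rather than on the full string "_kcat : 10" keeps the filter working after the values have been changed.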
Solution 2:[2]
Using re.findall we can try:
import re

# use this to read the whole file into a single string
with open('./models/asm/Draft_km.modelspec', 'r') as file:
    inp = file.read()
# alternatively, hard-code the data shown in the question
# (this assignment replaces the file contents read above)
inp = """m_BSORx_kcat : 10
m_ENTERH_kcat : 10
m_TRPTRS_kcat : 10
m_EX_remnant1_e_kcat : 10
m_SCYSSL_kcat : 10
m_RNMK_kcat : 10
m_TAGtex_kcat : 10
m_URIDK2r_kcat : 10
m_TRPt2rpp_kcat : 10
m_GLUSy_kcat : 10
m_VPAMTr_copy2_kcat : 10
m_EX_galctn__L_e_km : 0.001
m_EX_galt_e_km : 0.001
m_EX_dgmp_e_km : 0.001
m_EX_galur_e_km : 0.001
m_EX_gam_e_km : 0.001
m_EX_gam6p_e_km : 0.001
m_EX_gbbtn_e_km : 0.001"""
matches = re.findall(r'\b\w+_kcat : \d+(?:\.\d+)?', inp)
output = ', '.join(matches)
print(output)
This prints:
m_BSORx_kcat : 10, m_ENTERH_kcat : 10, m_TRPTRS_kcat : 10, m_EX_remnant1_e_kcat : 10, m_SCYSSL_kcat : 10, m_RNMK_kcat : 10, m_TAGtex_kcat : 10, m_URIDK2r_kcat : 10, m_TRPt2rpp_kcat : 10, m_GLUSy_kcat : 10, m_VPAMTr_copy2_kcat : 10
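For the stated end goal, the same kind of pattern can be used with capture groups and `re.sub` to rewrite values in place. The sketch below rests on an assumption about what the question means: it picks roughly 10% of the `_kcat` entries at random and shifts each chosen value by a random amount between -1 and 1. The names `KCAT_LINE` and `mutate_kcat` are hypothetical, and each line is assumed to look exactly like `name : value` with no trailing whitespace.

```python
import random
import re

# Captures the parameter name and its numeric value, one line at a time
KCAT_LINE = re.compile(r"^(\w+_kcat) : (\d+(?:\.\d+)?)$", re.MULTILINE)


def mutate_kcat(text, fraction=0.1, rng=None):
    """Return text with a random subset of kcat values shifted by a delta in [-1, 1]."""
    rng = rng or random.Random()
    names = [m.group(1) for m in KCAT_LINE.finditer(text)]
    if not names:
        return text
    # Mutate at least one entry so that small inputs still change
    count = max(1, round(len(names) * fraction))
    chosen = set(rng.sample(names, count))

    def replace(match):
        name, value = match.group(1), float(match.group(2))
        if name not in chosen:
            return match.group(0)  # leave unselected lines untouched
        return f"{name} : {value + rng.uniform(-1, 1):g}"

    return KCAT_LINE.sub(replace, text)


text = "m_BSORx_kcat : 10\nm_ENTERH_kcat : 10\nm_EX_galt_e_km : 0.001"
print(mutate_kcat(text, fraction=0.5))
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which can be useful when comparing generations of a genetic algorithm. Lines that do not match the pattern, such as the `_km` entries, are returned unchanged.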
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | fsimonjetz |
| Solution 2 |
