Ignoring data using the ttp module in Python

I will explain the problem I ran into using the following sample. I can parse the data below with the template below. The {{ignore}} indicator lets a line match the template while discarding the parts of it that I don't want to capture.

from ttp import ttp
import json

data_to_parse = """
1.peace in the world
2.peace in the world world 
3.peace in the world world world 
"""

To parse this data I can use the following template.

ttp_template = """
<group name="Quote">
{{peace}} in the {{world}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}} {{ignore}}
</group>
"""

With the following code, I get the parsed data I want:

def parser(data_to_parse):

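    # use a distinct name so the local object does not shadow the function
    ttp_parser = ttp(data=data_to_parse, template=ttp_template)
    ttp_parser.parse()

    # result() returns a list; take the first item, which is a JSON string
    results = ttp_parser.result(format='json')[0]

    # decode the JSON string into Python objects
    result = json.loads(results)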

    print(result)

parser(data_to_parse)

See the output I have:

[screenshot of the parsed output, not reproduced here]

The problem is that I cannot predict how many times "world" appears at the end of each line, and I don't want to keep adding {{ignore}} indicators just to match the line and skip the words I don't need. For example, if I add the following line to my data, the template above will not catch it; I would have to add one more {{ignore}} to capture it.

4.peace in the world world world world

As I understand it, the reason is that ttp splits each line into words at the spaces. For example, if I had _ instead of spaces, as in 3.peace in the world_world_world, I could capture the data with a single line in my template. However, my data contains lines with spaces, and I need to capture those lines as well.
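
The behaviour described above can be reproduced with the standard re module. The sketch below is only an approximation for illustration, not ttp's actual implementation: it assumes that a bare {{variable}} or {{ignore}} behaves like the pattern \S+ (one run of non-space characters) and that a template line has to match the whole data line. The helper line_regex and its parameters are hypothetical names.

```python
import re

# Assumed stand-in for what a bare {{variable}} or {{ignore}} matches:
# a single run of non-space characters.
WORD = r"\S+"

def line_regex(n_ignores, open_ended=False):
    """Build a regex for '{{peace}} in the {{world}}' plus trailing words."""
    pattern = rf"^(?P<peace>{WORD}) +in +the +(?P<world>{WORD})"
    pattern += rf"(?: +{WORD})" * n_ignores   # one fixed slot per {{ignore}}
    if open_ended:
        pattern += rf"(?: +{WORD})*"          # any number of extra words
    return re.compile(pattern + r"[ \t]*$")   # the whole line must match

lines = [
    "1.peace in the world",
    "2.peace in the world world",
    "3.peace in the world world world",
    "4.peace in the world world world world",
]

# Mirrors the three groups in the template: zero, one and two ignores.
fixed = [line_regex(n) for n in range(3)]
matched_fixed = [l for l in lines if any(r.match(l) for r in fixed)]

# A single open-ended tail covers every line.
open_tail = line_regex(0, open_ended=True)
matched_open = [l for l in lines if open_tail.match(l)]

print(len(matched_fixed))  # 3
print(len(matched_open))   # 4
```

Under these assumptions, each fixed {{ignore}} accounts for exactly one extra word, which is why line 4 is missed, while a repeatable tail accepts any count. The ttp documentation describes an optional pattern argument for ignore and built-in patterns such as ORPHRASE; whether one of those expresses the same open-ended tail for this data would need to be checked there.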

So my question is: is there a way to simplify this? As you can see, I have a workaround, but I need a simpler way to handle an arbitrary number of trailing words. I would highly appreciate any advice.
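
One standard-library option that builds on the underscore observation above is to preprocess the data before handing it to ttp. The sketch below is a workaround under stated assumptions, not a ttp feature: collapse_tail, keep and joiner are hypothetical names, and the function assumes every line has the same shape as the sample (four leading words that matter, then a tail to discard).

```python
def collapse_tail(text, keep=4, joiner="_"):
    """Join every word after the first `keep` words of each line with `joiner`."""
    out = []
    for line in text.splitlines():
        words = line.split()
        if len(words) > keep:
            # Turn 'world world world' into the single token 'world_world_world'.
            words = words[:keep] + [joiner.join(words[keep:])]
        out.append(" ".join(words))
    return "\n".join(out) + "\n"

sample = """
1.peace in the world
2.peace in the world world 
3.peace in the world world world 
4.peace in the world world world world
"""

print(collapse_tail(sample))
```

After this step every line carries at most one trailing token, so the first two template groups (no {{ignore}} and a single {{ignore}}) should be enough. The trade-off is that the function rewrites every line of the input and collapses repeated spaces, so it only suits data where all lines follow this pattern.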



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
