Is there a more elegant way to read a text file containing mpz values into a list of integers?

I have a text file containing numbers that looks as follows:

[mpz(0), mpz(0), mpz(0), mpz(0), mpz(4), mpz(54357303843626),...]

Is there a simple way to parse it directly into a list of integers? It doesn't matter whether the target data type is an mpz integer or a plain Python integer.

What I have tried so far, and what works, is plain parsing (note: the target array y_val3 needs to be initialized with zeros in advance, since it may be larger than the list in the text file):

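import re

# Read the whole file and drop the surrounding brackets
with open("../prod_sum_copy.txt", "r") as text_file:
    content = text_file.read()[1:-1]
content_list = content.split(",")
y_val3 = [0] * 10000  # pre-filled with zeros; may be longer than the list in the file
print(content_list)
for idx, item in enumerate(content_list):
    # Extract the digits inside mpz(...)
    m = re.search(r'mpz\(([0-9]+)\)', item)
    y_val3[idx] = int(m.group(1))
print(y_val3)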

Although this approach works, I am not sure whether it is best practice or whether there is a more elegant way than plain parsing.

To facilitate things: Here is the original text file on GitHub. Note: This text file might grow in the future, which brings aspects such as performance and scalability into play.



Solution 1:[1]

There is a neat trick for converting data that Python printed back into the original objects: just do obj = eval(string). A full example is below.

You can use this eval() solution for almost any Python object, even a complex one, that was written to a file through print(python_object) or similar. Basically, anything that is a valid Python expression can be converted from a string by eval(), as long as the names it uses (here mpz) are imported.

With eval() you need no string processing or parsing functions at all, and no regular expressions.

Beware that eval() does not check the string it runs. If the string comes from an unknown source, it may contain malicious code that can do anything to your PC, so call eval() only on trusted strings.
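If the file might not be fully trusted, one alternative to eval() is to pull the digits out with a single regular expression. The following is only a minimal sketch, not part of the original answer: the helper name parse_mpz_list and the optional minus sign in the pattern are assumptions made for illustration, and the result is a list of plain Python integers rather than mpz objects.

```python
import re

# Matches one mpz(...) entry and captures its digits. Allowing a leading
# minus sign is an assumption; the sample data only shows non-negative values.
MPZ_PATTERN = re.compile(r"mpz\((-?\d+)\)")

def parse_mpz_list(text):
    """Return every mpz(...) value found in text as a plain Python int."""
    return [int(digits) for digits in MPZ_PATTERN.findall(text)]

sample = "[mpz(0), mpz(0), mpz(4), mpz(54357303843626)]"
values = parse_mpz_list(sample)
print(values)
```

Because findall() scans the whole string in one pass, there is no need to strip the brackets or split on commas first, and nothing from the file is ever executed.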

The code below uses a text string holding example file content. I used a string rather than a file so that the code is fully runnable by Stack Overflow visitors without needing an external file. For a file f opened for reading, just replace for line in text.split('\n'): with for line in f: and the code works unchanged.

from gmpy2 import mpz

text = '''
[mpz(12), mpz(34), mpz(56)]
[mpz(78), mpz(90), mpz(21)]
'''

nums = []
for line in text.split('\n'):
    if not line.strip():
        continue
    nums.append(eval(line))

print(nums)

Output:

[[mpz(12), mpz(34), mpz(56)], [mpz(78), mpz(90), mpz(21)]]
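The example above needs gmpy2 to be installed. Since the question says plain Python integers are acceptable, a variation is to give eval() a namespace in which the name mpz simply refers to the built-in int. This is a sketch under that assumption: the function name read_mpz_lines is hypothetical, and io.StringIO stands in for an opened file so the snippet runs on its own.

```python
import io

def read_mpz_lines(lines):
    """Evaluate each non-blank line, with mpz mapped to the built-in int."""
    # Only the name mpz is visible to the evaluated text. This narrows what a
    # line can reference, but it is NOT a security sandbox.
    namespace = {"__builtins__": {}, "mpz": int}
    return [eval(line.strip(), namespace) for line in lines if line.strip()]

# io.StringIO stands in for a file here; a file object from open() can be
# passed in the same way, since both yield one line at a time.
fake_file = io.StringIO("[mpz(12), mpz(34), mpz(56)]\n[mpz(78), mpz(90), mpz(21)]\n")
nums = read_mpz_lines(fake_file)
print(nums)
```

Emptying the builtins reduces what the evaluated text can reach, but the earlier warning still applies: this should only be run on files you trust.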

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arty