Reading a file with Fortran-formatted small floats, using numpy
I am trying to read a data file written by a Fortran program. Every once in a while it contains a very small float written without the exponent letter, like 0.3299880-104. The error message is:
>np.loadtxt(filename, usecols = (1,))
File "/home/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 928, in loadtxt
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "/home/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 659, in floatconv
return float(x)
ValueError: invalid literal for float(): 0.3299880-104
Can I do something to make Numpy able to read this data file anyway?
Solution 1:[1]
As @agentp mentioned in the comments, one approach would be to use the converters= argument to np.genfromtxt to insert the e characters before casting to float:
import numpy as np
# some example strings
strings = "0.3299880-104 0.3299880+104 0.3299880"
# create a "dummy file" (see http://stackoverflow.com/a/11970414/1461210)
try:
    from StringIO import StringIO  # Python 2
    f = StringIO(strings)
except ImportError:
    from io import BytesIO  # Python 3
    f = BytesIO(strings.encode())
# turn '+' into 'e' and '-' into 'e-' so that float() can parse the value
c = lambda s: float(s.decode().replace('+', 'e').replace('-', 'e-'))
data = np.genfromtxt(f, converters=dict(zip(range(3), [c]*3)))
print(repr(data))
# array([ 3.29988000e-105, 3.29988000e+103, 3.29988000e-001])
Solution 2:[2]
The accepted answer is helpful, but it does not handle negative values (-0.3299880 is converted to e-0.3299880) or values that already contain an exponent letter (0.3299880E+10 is converted to 0.3299880Ee10). Neither result is a valid float, so both would end up as nan values in the numpy array.
Also, the number of columns in the file to read is hard-coded (it is 3 in this case).
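The two failure modes described above can be reproduced with plain string operations. This is only a small illustration: the helper name naive_convert is made up here, and it applies the same replace chain as the lambda in Solution 1, minus the float() call.

```python
def naive_convert(s):
    # Same string transformation as the converter in Solution 1,
    # but returning the intermediate string instead of a float.
    return s.replace('+', 'e').replace('-', 'e-')

print(naive_convert("0.3299880-104"))  # 0.3299880e-104  (fine)
print(naive_convert("-0.3299880"))     # e-0.3299880     (leading sign is mangled)
print(naive_convert("0.3299880E+10"))  # 0.3299880Ee10   (existing exponent is mangled)
```

Only the first result can be parsed by float(); the other two are not valid float literals.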
These limitations can be addressed as follows:
import re
import numpy as np
def read_fortran_data_file(file):
    # count the columns in the first row of data
    number_columns = np.genfromtxt(file, max_rows=1).shape[0]
    # insert the missing "E" between a digit and the sign of the exponent
    c = lambda s: float(re.sub(r"(\d)([\+\-])(\d)", r"\1E\2\3", s.decode()))
    # actually load the content of our file
    data = np.genfromtxt(file,
                         converters=dict(zip(range(number_columns), [c] * number_columns)))
    return data
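One caveat about the converter: s.decode() assumes the field arrives as bytes. As far as I know, whether np.genfromtxt hands bytes or str to a converter depends on its encoding argument (and on the NumPy version's default), so a converter that accepts both may be more robust. A minimal sketch, where the names fortran_converter and _MISSING_E are my own and not part of NumPy:

```python
import re

# A digit directly followed by a sign and another digit means the
# exponent letter was dropped by the Fortran output format.
_MISSING_E = re.compile(r"(\d)([+-])(\d)")

def fortran_converter(field):
    """Convert one text field to float, inserting a missing 'E' if needed."""
    if isinstance(field, bytes):
        # Assumption: the file is ASCII/UTF-8 text.
        field = field.decode()
    return float(_MISSING_E.sub(r"\1E\2\3", field))

print(fortran_converter(b"0.3299880-104"))
print(fortran_converter("0.3299880E+10"))
print(fortran_converter("-0.3299880"))
```

It could be passed in the converters dictionary in place of the lambda c.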
Testing
np.genfromtxt accepts filenames or lists of strings as input.
For the demonstration I'll use the latter, but the above function works fine with filenames as input.
strings = [
    "0.3299880-104 0.3299880E+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880"
]
read_fortran_data_file(strings)
## array([ 3.29988e-105, 3.29988e+009, 3.29988e-001, 3.29988e+103,
## 3.29988e-011, -3.29988e-001])
Note on NaN values:
np.genfromtxt replaces values it could not read properly with NaN instead of raising an error, so it is worth checking the result, e.g. with the following assertion:
assert np.count_nonzero(np.isnan(data))==0, "data contains nan values"
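If the assertion fails, it helps to know where the NaN values are. A small sketch using np.argwhere; the array below is made-up example data standing in for the result of np.genfromtxt:

```python
import numpy as np

# Made-up data: one entry was not parsed and ended up as NaN.
data = np.array([[1.0, np.nan],
                 [3.0, 4.0]])

# Row/column indices of every NaN entry.
bad = np.argwhere(np.isnan(data))
print(bad)  # [[0 1]]
```

Each row of bad is a (row, column) index into data, which points back to the offending field in the file.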
Solution 3:[3]
Not numpy, but I use the following regex and function:
import re
# convert d/D to e and add it if missing
fortregexp = re.compile(r'([\d.])[dD]?(((?<=[dD])[+-]?|[+-])\d)')
def fortran_float(num):
    num = fortregexp.sub(r'\1e\2', num)
    return float(num)
text = "0.3299880-104 0.3299880D+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880"
nums = [fortran_float(i) for i in text.split()]
print(text)
print(nums)
which gives:
0.3299880-104 0.3299880D+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880
[3.29988e-105, 3299880000.0, 0.329988, 3.29988e+103, 3.29988e-11, -0.329988]
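To get a numpy array out of this approach, the function can be applied field by field to each line. A rough sketch, assuming whitespace-separated columns and the same number of fields on every line; load_fortran_table is a hypothetical helper name and the sample text is made up:

```python
import io
import re
import numpy as np

# Same regex and function as above, repeated so the example is self-contained.
fortregexp = re.compile(r'([\d.])[dD]?(((?<=[dD])[+-]?|[+-])\d)')

def fortran_float(num):
    return float(fortregexp.sub(r'\1e\2', num))

def load_fortran_table(lines):
    # Parse every non-empty line into a list of floats.
    rows = [[fortran_float(tok) for tok in line.split()]
            for line in lines if line.strip()]
    return np.array(rows)

# An in-memory stand-in for an open data file.
sample = io.StringIO("0.3299880-104 0.3299880D+10\n-0.3299880 0.3299880+104\n")
table = load_fortran_table(sample)
print(table.shape)  # (2, 2)
```

Unlike np.genfromtxt, this raises a ValueError on a field that cannot be parsed instead of inserting NaN.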
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ali_m |
| Solution 2 | |
| Solution 3 | Jellby |
