'Python programming error re: reading from files
I'm taking an online class and we were assigned the following task:
"Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form: X-DSPAM-Confidence: 0.8475 Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt when you are testing below enter mbox-short.txt as the file name."
The desired output is: "Average spam confidence: 0.750718518519"
Here is the code I've written:
fname = raw_input("Enter file name: ")
fh = open(fname)
inp = fh.read()
for line in inp:
if not line.strip().startswith("X-DSPAM-Confidence: 0.8475") : continue
pos = line.find(':')
num = float(line[pos+1:])
total = float(num)
count = float(total + 1)
print 'Average spam confidence: ', float( total / count )
The output I get is: "Average spam confidence: nan"
What am I missing?
Solution 1:[1]
values = []
#fname = raw_input("Enter file name: ")
fname = "mbox-short.txt"
with open(fname, 'r') as fh:
for line in fh.read().split('\n'): #creating a list of lines
if line.startswith('X-DSPAM-Confidence:'):
values.append(line.replace('X-DSPAM-Confidence: ', '')) # I don't know whats after the float value
values = [float(i) for i in values] # need to convert the string to floats
print 'Average spam confidence: %f' % float( sum(values) / len(values))
I just tested this against the sample data it works just fine
Solution 2:[2]
#try the code below, it is working.
fname = raw_input("Enter file name: ")
count=0
value = 0
sum=0
fh = open(fname)
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") : continue
pos = line.find(':')
num = float(line[pos+1:])
sum=sum+num
count = count+1
print "Average spam confidence:", sum/count
Solution 3:[3]
My guess from the question is that the actual 0.8475 is actually just an example, and you should be finding all the X-DSPAM-Confidence: lines and reading those numbers.
Also, the indenting on the code you added has all the calcuations outside the for loop, I'm hoping that is just a formatting error for the upload, otherwise that would also be a problem.
As a matter if simplification you can also skip the
inp = fh.read()
line and just do
for line in fh:
Another thing to look at is that total will always only be the last number you read.
Solution 4:[4]
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
count = 0
total = 0
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") : continue
count = count + 1
# print count
num = float(line[20:])
total +=num
# print total
average = total/count
print "Average spam confidence:", average
Solution 5:[5]
The way you're checking if it is the correct field is too specific. You need to look for the field title without a value (see code below). Also your counting and totaling needs to happen within the loop. Here is a simpler solution that makes use of python's built in functions. Using a list like this takes a little bit more space but makes the code easier to read in my opinion.
How about this? :D
with open(raw_input("Enter file name: ")) as f:
values = [float(line.split(":")[1]) for line in f.readlines() if line.strip().startswith("X-DSPAM-Confidence")]
print 'Average spam confidence: %f' % (sum(values)/len(values))
My output:
Average spam confidence: 0.750719
If you need more precision on that float: Convert floating point number to certain precision, then copy to String
Edit: Since you're new to python that may be a little too pythonic :P Here is the same code expanded out a little bit:
fname = raw_input("Enter file name: ")
values = []
with open(fname) as f:
for line in f.readlines():
if line.strip().startswith("X-DSPAM-Confidence"):
values.append(float(line.split(":")[1]))
print 'Average spam confidence: %f' % (sum(values)/len(values))
Solution 6:[6]
fname = raw_input("Enter file name: ")
fh = open(fname)
x_count = 0
total_count = 0
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") : continue
line = line.strip()
x_count = x_count + 1
num = float(line[21:])
total_count = num + total_count
aver = total_count / x_count
print "average spam confidence:", aver
Solution 7:[7]
user_data = raw_input("Enter the file name: ")
lines_list = [line.strip("\n") for line in open(user_data, 'r')]
def find_spam_confidence(data):
confidence_sum = 0
confidence_count = 0
for line in lines_list:
if line.find("X-DSPAM-Confidence") == -1:
pass
else:
confidence_index = line.find(" ") + 1
confidence = float(line[confidence_index:])
confidence_sum += confidence
confidence_count += 1
print "Average spam confidence:", str(confidence_sum / confidence_count)
find_spam_confidence(lines_list)
Solution 8:[8]
fname = raw_input("Enter file name: ")
fh = open(fname)
c = 0
t = 0
for line in fh:
if line.startswith("X-DSPAM-Confidence:") :
c = c + 1
p = line.find(':')
n = float(line[p+1:])
t = t + n
print "Average spam confidence:", t/c
Solution 9:[9]
fname = input("Enter file name: ")
fh = open(fname)
count = 0
add = 0
for line in fh:
if line.startswith("X-DSPAM-Confidence:"):
count = count+1
pos = float(line[20:])
add = add+pos
print("Average spam confidence:", sum/count)
Solution 10:[10]
fname = input('Enter the file name : ') # file name is mbox-short.txt
try:
fopen = open(fname,'r') # open the file to read through it
except:
print('Wrong file name') #if user input wrong file name display 'Wrong file name'
quit()
count = 0 # variable for number of 'X-DSPAM-Confidence:' lines
total = 0 # variable for the sum of the floating numbers
for line in fopen: # start the loop to go through file line by line
if line.startswith('X-DSPAM-Confidence:'): # check whether a line starts with 'X-DSPAM-Confidence:'
count = count + 1 # counting total no of lines starts with 'X-DSPAM-Confidence:'
strip = line.strip() # remove whitespace between selected lines
nline = strip.find(':') #find out where is ':' in selected line
wstring = strip[nline+2:] # extract the string decimal value
fstring = float(wstring) # convert decimal value to float
total = total + fstring # add the whole float values and put sum in to variable named 'total'
print('Average spam confidence:',total/count) # printout the average value
Solution 11:[11]
total = float(num)
You forgot here to sum the num floats. It should have been
total = total+num
Solution 12:[12]
fname = input("Enter file name: ")
fh = open(fname)
count=0
avg=0
cal=0
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") :
continue
else:
count=count+1
pos = line.find(':')
num=float(line[pos+1:])
cal=float(cal+num)
#print cal,count
avg=float(cal/count)
print ("Average spam confidence:",avg)
Solution 13:[13]
IT WORKS JUST FINE !!!
Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
if len(fname) == 0:
fname = 'mbox-short.txt'
fh = open(fname)
count = 0
tot = 0
ans = 0
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") : continue
count = count + 1
num = float(line[21:])
tot = num + tot
ans = tot / count
print("Average spam confidence:", ans)
Solution 14:[14]
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname,'r')
count=0
avg=0.0
cal=0.00
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") :
continue
else:
count=count+1
pos = line.find(':')
num=float(line[pos+1:])
cal=cal+num
#print cal,count
avg=float(cal/count)
print "Average spam confidence:",avg
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
