Python programming error re: reading from files

I'm taking an online class and we were assigned the following task:

"Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:

X-DSPAM-Confidence: 0.8475

Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt. When you are testing below, enter mbox-short.txt as the file name."

The desired output is: "Average spam confidence: 0.750718518519"

Here is the code I've written:

fname = raw_input("Enter file name: ")
fh = open(fname)
inp = fh.read()
for line in inp:
    if not line.strip().startswith("X-DSPAM-Confidence: 0.8475") : continue
pos = line.find(':')
num = float(line[pos+1:]) 
total = float(num)
count = float(total + 1)
print 'Average spam confidence: ', float( total / count )

The output I get is: "Average spam confidence: nan"

What am I missing?



Solution 1:[1]

values = []
#fname = raw_input("Enter file name: ")
fname = "mbox-short.txt"
with open(fname, 'r') as fh:
    for line in fh.read().split('\n'): #creating a list of lines
        if line.startswith('X-DSPAM-Confidence:'):
            values.append(line.replace('X-DSPAM-Confidence: ', '')) # I don't know what's after the float value

values = [float(i) for i in values] # need to convert the string to floats
print 'Average spam confidence: %f' % float( sum(values) / len(values))

I just tested this against the sample data, and it works just fine.

Solution 2:[2]

# Try the code below; it works.
fname = raw_input("Enter file name: ")
count = 0
total = 0  # avoid naming this "sum", which would shadow the built-in sum()
fh = open(fname)
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    pos = line.find(':')
    num = float(line[pos+1:])
    total = total + num
    count = count + 1
print "Average spam confidence:", total/count

Solution 3:[3]

My guess from the question is that 0.8475 is just an example, and you should be finding all the X-DSPAM-Confidence: lines and reading the numbers from them.

Also, the indentation of the code you posted puts all the calculations outside the for loop. I'm hoping that is just a formatting error from the upload; otherwise, that would also be a problem.

As a matter of simplification, you can also skip the

inp = fh.read()

line and just do

for line in fh:
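This change matters beyond simplification: iterating over the string returned by fh.read() yields one character at a time, while iterating over the file object yields whole lines, so in the original loop no "line" can ever start with "X-DSPAM-Confidence:". A small sketch, in Python 3, using io.StringIO as a stand-in for the open file (an assumption so the example runs without mbox-short.txt):

```python
import io

# io.StringIO stands in for open("mbox-short.txt") so the example is self-contained
sample = "From: someone\nX-DSPAM-Confidence: 0.8475\n"

fh = io.StringIO(sample)
chars = list(fh.read())  # iterating over a str gives single characters
print(chars[:5])         # ['F', 'r', 'o', 'm', ':']

fh = io.StringIO(sample)
lines = list(fh)         # iterating over a file-like object gives lines
print(lines)             # ['From: someone\n', 'X-DSPAM-Confidence: 0.8475\n']
```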

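Another thing to look at is that total will only ever hold the last number you read, because total = float(num) overwrites it instead of adding to it. Likewise, count = float(total + 1) is derived from total rather than counting the matching lines.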
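Putting those points together, here is a sketch of the corrected loop in Python 3 (where print() is a function and input() replaces raw_input()). The helper name average_spam_confidence and the io.StringIO sample data are illustrative assumptions, not part of the assignment:

```python
import io


def average_spam_confidence(lines):
    """Return the mean of the values on X-DSPAM-Confidence: lines.

    average_spam_confidence is a hypothetical helper name; `lines` can be an
    open file or any iterable of strings.
    """
    total = 0.0
    count = 0
    for line in lines:
        # Match the header name only, not one specific value such as 0.8475
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        pos = line.find(":")
        total += float(line[pos + 1:])  # add to the running total instead of overwriting it
        count += 1                      # count matching lines instead of deriving count from total
    return total / count  # computed once, after the loop has seen every line


if __name__ == "__main__":
    # io.StringIO stands in for open("mbox-short.txt") so this runs on its own
    sample = io.StringIO(
        "X-DSPAM-Confidence: 0.5\n"
        "Subject: not a confidence line\n"
        "X-DSPAM-Confidence: 1.0\n"
    )
    print("Average spam confidence:", average_spam_confidence(sample))  # 0.75
```

To run it on the real file, pass the open file object instead, e.g. with open(input("Enter file name: ")) as fh: print("Average spam confidence:", average_spam_confidence(fh)).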

Solution 4:[4]

# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
count = 0
total = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count = count + 1
    # print count
    num = float(line[20:])  # "X-DSPAM-Confidence:" is 19 characters plus a space, so the value starts at index 20
    total += num
    # print total
average = total/count  # compute once, after the loop has seen every line
print "Average spam confidence:", average

Solution 5:[5]

The way you're checking whether it is the correct field is too specific. You need to look for the field title without a value (see code below). Also, your counting and totaling need to happen within the loop. Here is a simpler solution that makes use of Python's built-in functions. Using a list like this takes a little more space but makes the code easier to read, in my opinion.
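To see why the original check is too specific, compare the two prefixes on a couple of sample lines. This is a Python 3 sketch; the second value, 0.6178, is made up for illustration:

```python
# Two confidence lines; the second value is made up for illustration
lines = [
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]

# Prefix that includes one specific value: matches only that one line
too_specific = [line for line in lines if line.startswith("X-DSPAM-Confidence: 0.8475")]
# Prefix with just the header name: matches every confidence line
header_only = [line for line in lines if line.startswith("X-DSPAM-Confidence:")]

print(len(too_specific))  # 1
print(len(header_only))   # 2
```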

How about this? :D

with open(raw_input("Enter file name: ")) as f:
    values = [float(line.split(":")[1]) for line in f.readlines() if line.strip().startswith("X-DSPAM-Confidence")]
    print 'Average spam confidence: %f' % (sum(values)/len(values))

My output:

Average spam confidence: 0.750719

If you need more precision on that float, see the question "Convert floating point number to certain precision, then copy to String".
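The %f format prints six digits after the decimal point, which is why this output shows 0.750719 rather than 0.750718518519. The expected output in the assignment matches Python 2's str() of a float, which uses 12 significant digits. A sketch of requesting that precision explicitly (the value below is made up, not the real average from mbox-short.txt):

```python
value = 0.123456789012345  # made-up value for illustration

print("%f" % value)           # 0.123457 (%f defaults to 6 digits after the point)
print("%.12f" % value)        # 0.123456789012 (12 digits after the point)
print(format(value, ".12g"))  # 0.123456789012 (12 significant digits, like Python 2's str())
```

For values below 1 that have no leading zeros after the decimal point, ".12f" and ".12g" agree; ".12g" is the one that mirrors Python 2's str() in general.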

Edit: Since you're new to Python, that may be a little too Pythonic :P Here is the same code expanded out a little:

fname = raw_input("Enter file name: ")
values = []
with open(fname) as f:
    for line in f.readlines():
        if line.strip().startswith("X-DSPAM-Confidence"):
            values.append(float(line.split(":")[1]))

print 'Average spam confidence: %f' % (sum(values)/len(values))

Solution 6:[6]

fname = raw_input("Enter file name: ")
fh = open(fname)
x_count = 0
total_count = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    line = line.strip()
    x_count = x_count + 1
    num = float(line[20:])  # the value starts at index 20; [21:] would drop its first digit
    total_count = num + total_count
aver = total_count / x_count

print "Average spam confidence:", aver
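A hard-coded slice offset is easy to get wrong by one. "X-DSPAM-Confidence:" is 19 characters long, so index 19 is the space and the value starts at index 20; an offset of 21 silently drops the value's first digit, which only happens to be harmless when that digit is the 0 in something like 0.8475. A Python 3 sketch that derives the offset from the prefix instead (the 1.0000 line is a hypothetical value, used only to show the failure):

```python
PREFIX = "X-DSPAM-Confidence:"

print(len(PREFIX))  # 19: index 19 is the space, the value starts at index 20

typical = "X-DSPAM-Confidence: 0.8475"
unusual = "X-DSPAM-Confidence: 1.0000"  # hypothetical value >= 1

# line[21:] drops the leading digit; that is harmless only for values like 0.xxxx
print(float(typical[21:]))  # 0.8475 (parsed from ".8475")
print(float(unusual[21:]))  # 0.0 (parsed from ".0000", which is wrong)

# Deriving the offset from the prefix avoids counting characters by hand;
# float() tolerates the leading space.
print(float(unusual[len(PREFIX):]))  # 1.0
```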

Solution 7:[7]

user_data = raw_input("Enter the file name: ")
with open(user_data, 'r') as fh:
    lines_list = [line.strip("\n") for line in fh]


def find_spam_confidence(data):
    confidence_sum = 0
    confidence_count = 0
    for line in data:  # use the argument rather than the global lines_list
        if "X-DSPAM-Confidence" in line:
            confidence_index = line.find(" ") + 1
            confidence = float(line[confidence_index:])
            confidence_sum += confidence
            confidence_count += 1
    print "Average spam confidence:", str(confidence_sum / confidence_count)

find_spam_confidence(lines_list)

Solution 8:[8]

fname = raw_input("Enter file name: ")
fh = open(fname)
c = 0
t = 0
for line in fh:
    if line.startswith("X-DSPAM-Confidence:") : 
        c = c + 1
        p = line.find(':')
        n = float(line[p+1:])
        t = t + n

print "Average spam confidence:", t/c

Solution 9:[9]

fname = input("Enter file name: ")
fh = open(fname)
count = 0
add = 0
for line in fh:
    if line.startswith("X-DSPAM-Confidence:"):
        count = count+1
        num = float(line[20:])
        add = add+num
print("Average spam confidence:", add/count)
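A related pitfall: unless you assign to it, the name sum refers to Python's built-in function, so writing sum/count when the running total lives in another variable (here add) raises a TypeError instead of printing an average. A minimal Python 3 check, with made-up numbers:

```python
count = 2
add = 1.5  # made-up running total

try:
    print(sum / count)  # "sum" here is the built-in function, not a number
except TypeError as exc:
    print("TypeError:", exc)

print(add / count)  # 0.75: divide the accumulator that was actually updated
```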

Solution 10:[10]

import sys

fname = input('Enter the file name: ')  # the file name is mbox-short.txt
try:
    fopen = open(fname, 'r')  # open the file to read through it
except OSError:  # e.g. FileNotFoundError when the name is wrong
    print('Wrong file name')  # if the user enters a wrong file name, display 'Wrong file name'
    sys.exit()
count = 0  # number of 'X-DSPAM-Confidence:' lines
total = 0  # sum of the floating-point values

for line in fopen:  # go through the file line by line
    if line.startswith('X-DSPAM-Confidence:'):  # check whether the line starts with 'X-DSPAM-Confidence:'
        count = count + 1  # count the lines that start with 'X-DSPAM-Confidence:'
        strip = line.strip()  # remove leading and trailing whitespace, including the newline
        nline = strip.find(':')  # find the position of ':' in the line
        wstring = strip[nline+2:]  # extract the decimal value as a string (skipping ': ')
        fstring = float(wstring)  # convert the string to a float
        total = total + fstring  # add the value to the running total
print('Average spam confidence:', total/count)  # print the average value
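The try/except above guards against a bad file name, but the program can still crash with a ZeroDivisionError if the file contains no matching lines. A Python 3 sketch that handles both cases; average_from_file is a hypothetical helper name, and returning None on failure is an assumed design choice:

```python
def average_from_file(fname):
    """Hypothetical helper: average the X-DSPAM-Confidence values in fname."""
    try:
        fh = open(fname)
    except OSError:  # FileNotFoundError, PermissionError, ... are OSError subclasses
        print("Wrong file name")
        return None
    total = 0.0
    count = 0
    with fh:  # closes the file when the block ends
        for line in fh:
            if line.startswith("X-DSPAM-Confidence:"):
                total += float(line.split(":", 1)[1])  # text after the first ':'
                count += 1
    if count == 0:  # avoid ZeroDivisionError when no lines match
        print("No X-DSPAM-Confidence lines found")
        return None
    return total / count
```

Usage would be something like result = average_from_file(input("Enter file name: ")), printing the result only when it is not None.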

Solution 11:[11]

total = float(num)

Here you overwrite total on every pass instead of adding num to it. With total = 0 set before the loop, it should have been

total = total+num 

Solution 12:[12]

fname = input("Enter file name: ")
fh = open(fname)
count=0
avg=0
cal=0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") :
        continue
    else:
        count=count+1
        pos = line.find(':')
        num=float(line[pos+1:])
        cal=float(cal+num)
        #print cal,count
avg=float(cal/count)
print ("Average spam confidence:",avg)

Solution 13:[13]

It works just fine!

# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")

if len(fname) == 0:  # pressing Enter with no input falls back to the sample file
    fname = 'mbox-short.txt'

fh = open(fname)
count = 0
tot = 0
ans = 0

for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count = count + 1
    num = float(line[20:])  # the value starts at index 20; [21:] would drop its first digit
    tot = num + tot

ans = tot / count
print "Average spam confidence:", ans

Solution 14:[14]

# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname,'r')
count=0
avg=0.0
cal=0.00 
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") :        
        continue
    else:
        count=count+1
        pos = line.find(':')
        num=float(line[pos+1:])
        cal=cal+num
        #print cal,count
avg=float(cal/count)
print "Average spam confidence:",avg
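The answers above are written for Python 2 or mix the two versions. As a hedged Python 3 alternative, the standard library's statistics.mean can replace the manual total/count bookkeeping. The function names below are hypothetical, and note that statistics.mean raises statistics.StatisticsError on an empty list rather than ZeroDivisionError:

```python
import statistics


def spam_confidences(lines):
    """Yield the float value from every X-DSPAM-Confidence: line (hypothetical helper)."""
    for line in lines:
        if line.startswith("X-DSPAM-Confidence:"):
            yield float(line.split(":", 1)[1])  # text after the first ':'


def average_confidence(lines):
    """Mean of the confidence values; raises statistics.StatisticsError if there are none."""
    return statistics.mean(list(spam_confidences(lines)))
```

With the real file, usage would be along the lines of with open(input("Enter file name: ")) as fh: print("Average spam confidence:", average_confidence(fh)).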