'How to prevent duplicate dictionaries from appending to list continuously (python)
I am currently using Python ver 3.9.7
So I have a lot of serial port data continuously incoming. The data I have is incoming as dictionaries appending to a list. Each element is a dictionary, as the first part is an integer, and the rest of a string. I then have subcategorised each string, and am saving it to an excel spreadsheet.
I need to prevent duplicates appending to my list. Below is my code trying to do this, however when I view the Excel log being created, I am often seeing sometimes 50k rows of the same data repeatedly.
I was able to successfully prevent duplicate excel rows with reading from a text file with my approaches, but can't seem to find a solution for the continuous incoming data.
The output I would like is unique values only for each element appending to my list, and appearing on my excel file.
Code below:
import serial
import xlsxwriter
#ser-prt-connection
ser = serial.Serial(port='COM2',baudrate=9600)
#separate X (int) and Y (string)
regex = '(?:.*\)?X=(?P<X>-?\d+)\sY=(?P<Y>.*)'
extracted_vals = [] #list to be appended
less_vals = [] #want list with no duplicates
row = 0
workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')
while True:
for line in ser.readlines():
signal = str(line)
for part in parts:
m = re.match(regex,parts)
if m is None:
continue
X,Y = m.groupdict(),values()
# each element appending to list (incl. duplicates)
item = dict(X=int(X),Y=str(Y.lower())
extracted_vals.append(item)
for i in range(0, len(extracted_vals):
i+=0
X_val =extracted_vals[i].setdefault('X')
Key = extracted_vals[i].keys()
data = extracted_vals[i].setdefault('Y')
for val in extracted_vals:
if val not in less_vals:
less_vals.append(val)
for j in range(0,len(less_vals)):
j+=0
X_val = less_vals[j].setdefault('X')
less_data = less_vals[j].setdefault('Y')
#separate Y part into substrings
Ya = less_data[0:3]
Yb = less_data[3:6]
Yc = less_data[6:]
less = X_val,Ya,Yb,Yc
#to check for no duplicates in ouput, compared with raw data
print(signal) #raw incoming data continuously
print(less) #want list, no duplicates
# write to excel file, column for X value
# 3 more columns for Ya, Yb, Yc each
if True:
row = row+1
worksheet.write(row,0,X_val)
worksheet.write(row,1,Ya)
worksheet.write(row,2,Yb)
worksheet.write(row,3,Yc)
#Chosen no. rows to write to file
if row == 10:
workbook.close()
serial.close()
Example of what a line of raw data 'signal' looks like:
X=-10Y=yyAyyBthisisYc
Example of what list 'less' is looking like for one line of raw data:
(-10, 'yyA','yyB', 'thisisYc')
#(repeats in simalar fashion for subsequent lines)
#each part has its own row in excel file
#-10 is X value
My main issue is that sometimes the data being printed is unique, but the excel file has many duplicates.
My other issue is that sometimes the data is printed as every second duplicate: like 1,2,1,2,1,2,1,2 and the same is being saved to the Excel file.
I have only been programming a few weeks now, so any advice at all in general is welcome
Solution 1:[1]
Your program has a real lot of problems. I just tried to show you how you can improve it. Bear in mind that I did not test it.
import serial
import xlsxwriter
#ser-prt-connection
ser = serial.Serial(port='COM2',baudrate=9600)
#separate X (int) and Y (string)
regex = '(?:.*\)?X=(?P<X>-?\d+)\sY=(?P<Y>.*)'
# First problem:
# maybe you don't need these lists. Perhaps, you could use a set and a list. Let's do it:
# extracted_vals = [] #list to be appended
extracted_vals = set()
less_vals = [] #want list with no duplicates
row = 0 # this will be used at the end of the program.
MAXROWS = 10 # Given that you want to limit the number of rows, I'd write that limit where it can be seen and changed easily.
#
workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')
while True: # Second problem: this loop lacks a way to terminate!
# When you write 'while True:', you also MUST write 'break' or 'return' somewhere inside the loop! I'll take care of it.
for line in ser.readlines():
signal = str(line)
#
# Third problem: where is this 'parts' defined? The following line will throw an error! Maybe you meant "for part in signal"?
for part in signal: # was 'parts'
m = re.match(regex,signal) # was 'parts'
if m is None:
continue
X,Y = m.groupdict(),values()
#
# each element appending to list (incl. duplicates)
item = {(int(X), Y.lower())} # you forgot the closing parens, this is the Fourth problem. But this also counts as Fifth:
# you cannot create a dictionary as "dict(X=int(X),Y=str(Y.lower())"! It's a syntax error.
#
# Now that extracted_vals is a set, you can write:
extracted_vals.add(item)
# instead of 'extracted_vals.append(item)'
# At this point, you have no duplicates: sets automatically ignore them!
less_vals = list(extracted_vals) # And now you have a non-duplicated list.
# Sixth problem: all the following lines were included in the "for line in ser.readlines():" loop, you forgot to de-dent them.
# I've done it for you. Written the way you wrote them, for each line read from serial you also repeated all of the following!
# That might be the main reason for creating many duplicate rows.
for i in range(0, len(less_vals) ): # I added the last closing parens, this was the Fourth problem again
# Seventh problem: if you need to process all items in a sequence, you should use
# "for item in sequence:" not "for i in range(len(sequence)):" Moreover, the
# range() function does not need a starting value, if you need it to start from 0.
i+=0 # Eighth problem: this line does nothing. Why did you write it?
X_val = less_vals[i].setdefault('X') # this will put an item {'X': None} if missing in the dictionary extracted_vals[i]
# and assigns the value of less_vals[i]["X"] (possibly None) to X_val.
#Are you sure that you need an {'X': None} item?
# You could have written: X_val = less_vals[i]["X"] or, much better:
# for elem in less_vals:
# X_val, data = elem["X"], elem["Y"]
Key = less_vals[i].keys() # this assigns to 'Key' a dictionary view object, that you will never use. Ninth problem.
data = less_vals[i].setdefault('Y') # this will put an item {'Y': None} if missing in the dictionary less_vals[i]
# and assigns the value of less_vals[i]["Y"] (possibly None) to data.
#Are you sure that you need an {'Y': None} item?
# this loop is completely useless.
# for val in less_vals: # Sixth problem, again: this loop was included in the "for i in range(0, len(less_vals) ):" loop. I de-dented it.
# if val not in less_vals:
# less_vals.append(val)
# I noticed that I could put all following lines in the previous loop, without re-reading all less_vals.
#
# for j in range(0,len(less_vals)): # Sixth problem, again: this loop too was included in the "for i in range(0, len(less_vals) ):" loop. I de-dented it.
# j+=0 # Seventh problem, again: this line does nothing. Why did you write it?
#
# so, now we are continuing the loop on less_vals:
# X_val = less_vals[j].setdefault('X') we already have it
# less_data = less_vals[j].setdefault('Y') instead of 'less_data', we use 'data' that we already have
#separate Y part into substrings
# Remember that data can be None? Tenth problem, you should have dealt with this. I'll take care of it:
if data is None:
Ya = Yb = Yc = ""
else:
Ya = data[0:3]
Yb = data[3:6]
Yc = data[6:]
less = X_val,Ya,Yb,Yc
#to check for no duplicates in ouput, compared with raw data
print(signal) # it may contain only the last line read from serial, if the raw data contained more than 1 line
print(less) # I don't think this check is useful, but if you like it...
# write to excel file, column for X value
# 3 more columns for Ya, Yb, Yc each
# if True: Eleventh problem: this line does nothing
row = row+1
worksheet.write(row,0,X_val)
worksheet.write(row,1,Ya)
worksheet.write(row,2,Yb)
worksheet.write(row,3,Yc)
#
if row >= MAXROWS:
break # this exits the for line in ser.readlines(): loop
if row >= MAXROWS:
workbook.close()
break # this exits the while True: loop, solving the Second problem.
serial.close()
This is a refined version of your program; I tried to follow my own advices. I have no serial communication, (and I have no time to mock one) so I didn't test it properly.
# A bit refined, albeit untested.
import serial
import xlsxwriter
ser = serial.Serial(port='COM2',baudrate=9600)
#separate X (int) and Y (string)
regex = '(?:.*\)?X=(?P<X>-?\d+)\sY=(?P<Y>.*)'
MAXROWS = 50
row = 0 # this will be used at the end of the program.
#
workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')
while True:
extracted_vals = set()
for line in ser.readlines():
m = re.match(regex,line)
if m is None:
continue
values = m.groupdict()
extracted_vals.add((int(values['X']), values['Y'].lower())) # each element in extracted_vals will be a tuple (integer, string)
#
for this in extracted_vals:
X_val = this[0]
data = this[1]
Ya = data[0:3]
Yb = data[3:6]
Yc = data[6:]
row = row+1
worksheet.write(row,0,X_val)
worksheet.write(row,1,Ya)
worksheet.write(row,2,Yb)
worksheet.write(row,3,Yc)
#
if row >= MAXROWS:
break
if row >= MAXROWS:
workbook.close()
break
serial.close()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
