Checking for Duplicates twice over in a File - Python

Example config.yml:

DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload

[Image: GUI]

This may not be the cleanest route to take.

I'm making a GUI that creates a config.yml file for another Python script I'm working with.

I'm using PySimpleGUI, and my button isn't functioning the way I'd expect. It correctly checks for the reference name (CurrentMinuteLoad in the example above) and rejects it if it already exists, but it skips the check for the table name (the elif branch never fires). Adding the table still works; I'm just not getting the double check that I want. Also, I have to hit the Okay button twice in the GUI for it to work, a quirk that doesn't make sense to me.

def add_table():
  window2.read()
  with open ("config.yml","r") as h:
    if values['new_ref']  in h.read():
      sg.popup('Reference name already exists')  
    elif values['new_db']  in h.read():
      sg.popup('Table name already exists')
    else:
      with open("config.yml", "a+") as f:
        f.write("\n  " + values['new_ref'] +":")
        f.write("\n    CSV_File:" + values['new_csv'])
        f.write("\n    Table_Name:" + values['new_db'])
        f.close()
        sg.popup('The reference "' + values['new_ref'] + '" has been included and will add the table "' + values['new_db'] + '" to PG Admin during the next scheduled upload')


Solution 1:[1]

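h.read() consumes the file like a stream: the first call returns the whole contents and leaves the file position at the end, so any subsequent call returns an empty string. That is why the elif check never matches. Save the result of the first call in a variable and test against that instead.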

Try editing the code like this:

with open("config.yml", "r") as h:
    content = h.read()  # read once; a second h.read() would return ''
    if values['new_ref'] in content:
        sg.popup('Reference name already exists')
    elif values['new_db'] in content:
        sg.popup('Table name already exists')
    else:
        # ...
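
The stream behaviour described above can be seen in isolation. This is a minimal sketch in which io.StringIO stands in for the open config.yml file (an assumption made so the example needs no file on disk); the variable names are illustrative only.

```python
import io

# io.StringIO stands in for the file object returned by open("config.yml").
h = io.StringIO(
    "DBtables:\n"
    "  CurrentMinuteLoad:\n"
    "    CSV_File: trend.csv\n"
    "    Table_Name: currentminuteload\n"
)

first = h.read()   # returns the whole text and moves the position to the end
second = h.read()  # nothing is left to read, so this is an empty string

ref_found = "CurrentMinuteLoad" in first                  # search of the real contents
table_found_second_read = "currentminuteload" in second   # search of '' can never match
table_found_saved = "currentminuteload" in first          # saved contents still match
```

Searching the saved string finds both names, while the second read has nothing to search, which matches the skipped elif in the question.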

Solution 2:[2]

You should update the YAML file using a real YAML parser. That lets you check for duplicate values without applying the in operator to the raw file text, which gives false positives whenever a new value is a substring of an existing value (or key).
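
The false positive is easy to reproduce with plain strings. The sketch below assumes the file contents from the question and a hypothetical new reference name, CurrentMinuteL; the set existing_keys stands in for what a parser would report as the keys under DBtables.

```python
content = (
    "DBtables:\n"
    "  CurrentMinuteLoad:\n"
    "    CSV_File: trend.csv\n"
    "    Table_Name: currentminuteload\n"
)

new_ref = "CurrentMinuteL"  # hypothetical new name, a prefix of an existing key

# Substring search on the raw text: matches inside "CurrentMinuteLoad".
substring_match = new_ref in content

# Membership test against the parsed keys: only exact matches count.
existing_keys = {"CurrentMinuteLoad"}
exact_match = new_ref in existing_keys
```

The substring test reports a duplicate that does not exist, while the exact-match test does not.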

In the following I add the same values twice and show the resulting YAML. The first time around, the checks on new_ref and new_db do not find a match, even though both are substrings of existing values. The second time, using the same values, there is of course a match on the previously added values.

import sys
import ruamel.yaml
from pathlib import Path

def add_table(filename, values, verbose=False):
    error = False
    yaml = ruamel.yaml.YAML()
    data = yaml.load(filename)
    dbtables = data['DBtables']
    if values['new_ref'] in dbtables:
        print(f'Reference name "{values["new_ref"]}" already exists') # use sg.popup in your code
        error = True
    for k, v in dbtables.items():
        if values['new_db'] in v.values():
            print(f'Table name "{values["new_db"]}" already exists')
            error = True
    if error:
        return
    dbtables[values['new_ref']] = d = {}
    for x in ['new_cv', 'new_db']:
        d[x] = values[x]
    yaml.dump(data, filename)
    if verbose:
        sys.stdout.write(filename.read_text())
    

values = dict(new_ref='CurrentMinuteL', new_cv='trend_csv', new_db='currentminutel')
add_table(Path('config.yaml'), values, verbose=True)
print('========')
add_table(Path('config.yaml'), values, verbose=True)

which gives:

DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload
  CurrentMinuteL:
    new_cv: trend_csv
    new_db: currentminutel
========
Reference name "CurrentMinuteL" already exists
Table name "currentminutel" already exists
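
Note that the output above stores the new entry under the form-field names new_cv and new_db rather than the CSV_File and Table_Name keys used by the existing entry. The following is a minimal sketch of one way to keep the duplicate check separate from the file handling and to write the original key names. The helpers find_conflicts and make_entry are hypothetical, the new_csv field name is taken from the question, and a plain dict stands in for the mapping that the YAML parser returns.

```python
def find_conflicts(dbtables, new_ref, new_db):
    """Return a message for each exact-match duplicate (empty list if none)."""
    problems = []
    if new_ref in dbtables:
        problems.append(f'Reference name "{new_ref}" already exists')
    if any(entry.get("Table_Name") == new_db for entry in dbtables.values()):
        problems.append(f'Table name "{new_db}" already exists')
    return problems


def make_entry(values):
    """Map the GUI field names onto the keys used in config.yml."""
    return {"CSV_File": values["new_csv"], "Table_Name": values["new_db"]}


# Plain dict standing in for data['DBtables'] as loaded by the parser.
dbtables = {
    "CurrentMinuteLoad": {"CSV_File": "trend.csv", "Table_Name": "currentminuteload"},
}
values = {"new_ref": "CurrentMinuteL", "new_csv": "trend.csv", "new_db": "currentminutel"}

first_try = find_conflicts(dbtables, values["new_ref"], values["new_db"])
if not first_try:
    dbtables[values["new_ref"]] = make_entry(values)

# Running the check again now reports both duplicates.
second_try = find_conflicts(dbtables, values["new_ref"], values["new_db"])
```

In the GUI, each returned message could be passed to sg.popup, and the updated mapping dumped back to the file only when the list is empty.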

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Drago96
Solution 2 Anthon