Cross-referencing data and removing entire lines that contain the same string with Python

Cutting to the chase here...

I have 2 sets of data, both in .txt files. We will call them DATA A and DATA B.

I am currently collecting information from other students for a mailing list / club application as well as gathering a bit more data from them. This data goes into DATA A. DATA A currently looks like the following example:

[email protected] | How many years in college: 3 | Address: 123 Example Blvd.
[email protected] | How many years in college: 1 | Address: 444 Example Blvd.
[email protected] | How many years in college: 2 | Address: 567 Example Blvd.
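Every field is separated by `|` and the email always comes first, so each line can be reduced to its email with a single split. Here is a minimal sketch that assumes this layout holds in both files. `extract_email` is a hypothetical helper name and the address is a made-up placeholder. Lowercasing is an extra assumption, so that `Bob@...` and `bob@...` count as the same person:

```python
def extract_email(line: str) -> str:
    """Return the email (first '|'-separated field) of a DATA A/B line.

    Assumption: the email is always the first field, and addresses are
    compared case-insensitively, so the result is lowercased.
    """
    # maxsplit=1: only the first field is needed; the rest is ignored.
    return line.split("|", 1)[0].strip().lower()


# Hypothetical placeholder address.
print(extract_email("alice@example.com | How many years in college: 3 | Address: 123 Example Blvd."))
# alice@example.com
```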

However, when people sign up at the auditorium without anyone monitoring them, they tend to leave only their email and omit some of the other information, as in the following examples:

[email protected] | N/A | N/A
[email protected] | How many years in college: 1 | N/A
[email protected] | N/A | 111 Example Blvd.
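The `N/A` placeholders also make it easy to list what a given entry is missing, which could be useful when writing the follow-up emails. Here is a small sketch that assumes the three fields always appear in the order shown above. The labels in `FIELD_NAMES` and the address are hypothetical:

```python
# Hypothetical labels for the three '|'-separated columns, in order.
FIELD_NAMES = ["email", "years in college", "address"]


def missing_fields(line: str) -> list:
    """Return the labels of fields that were left as 'N/A'."""
    values = [value.strip() for value in line.split("|")]
    # zip stops at the shorter sequence, so a truncated line yields fewer checks.
    return [name for name, value in zip(FIELD_NAMES, values) if value == "N/A"]


print(missing_fields("carol@example.com | N/A | 111 Example Blvd."))
# ['years in college']
```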

When I run the script on both files, any entry whose email already appears in DATA A must stay in DATA A but be REMOVED FROM DATA B. That should leave ONLY FRESH data (emails NOT ALREADY PRESENT in DATA A), either in DATA B itself or in an OUTPUT file containing ONLY THE EMAILS and none of the blank fields, so I can email those people and ask for the missing information.
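One way to express this is as a set difference keyed on the email. You collect every email in DATA A into a set, then keep only the DATA B entries whose email is not in it. This is a minimal in-memory sketch that assumes the email is always the first `|`-separated field. The function names and example addresses are hypothetical, and skipping repeated emails within DATA B is an added assumption:

```python
def extract_email(line: str) -> str:
    # Assumption: email is the first '|'-separated field; compare case-insensitively.
    return line.split("|", 1)[0].strip().lower()


def fresh_emails(data_a_lines, data_b_lines):
    """Return emails from DATA B that do not appear anywhere in DATA A."""
    # Set membership checks are O(1) on average, so each B line is cheap to test.
    known = {extract_email(line) for line in data_a_lines if line.strip()}
    fresh = []
    for line in data_b_lines:
        if not line.strip():
            continue  # ignore blank lines
        email = extract_email(line)
        if email not in known:
            known.add(email)  # also prevents listing the same B email twice
            fresh.append(email)
    return fresh


# Hypothetical stand-ins mirroring the example below.
data_a = [
    "alice@example.com | How many years in college: 3 | Address: 123 Example Blvd.",
    "bob@example.com | How many years in college: 1 | Address: 444 Example Blvd.",
    "carol@example.com | How many years in college: 2 | Address: 567 Example Blvd.",
    "dave@example.com | How many years in college: 2 | Address: 888 Example Blvd.",
]
data_b = [
    "erin@example.com | N/A | N/A",
    "alice@example.com | How many years in college: 3 | N/A",
    "dave@example.com | N/A | 888 Example Blvd.",
    "frank@example.com | N/A | N/A",
]
print(fresh_emails(data_a, data_b))
# ['erin@example.com', 'frank@example.com']
```

Adding each fresh email to the same `known` set is what keeps a repeated DATA B entry from appearing twice in the result.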

Here is an example:

EXAMPLE

DATA A:

[email protected] | How many years in college: 3 | Address: 123 Example Blvd.
[email protected] | How many years in college: 1 | Address: 444 Example Blvd.
[email protected] | How many years in college: 2 | Address: 567 Example Blvd.
[email protected] | How many years in college: 2 | Address: 888 Example Blvd.

DATA B:

[email protected] | N/A | N/A
[email protected] | How many years in college: 3 | N/A
[email protected] | N/A | 888 Example Blvd.
[email protected] | N/A | N/A

+++ SCRIPT IS RUN AT THIS POINT +++ (Any entry in DATA B whose email also appears in DATA A is removed from DATA B, OR an output file containing only the FRESH emails is created.)
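For the first option, which removes the common entries from DATA B itself, a sketch along these lines should work. It assumes both files are UTF-8 text with one entry per line. `prune_data_b` and the file names are hypothetical. Because it overwrites DATA B, keeping a backup copy first is sensible:

```python
import tempfile
from pathlib import Path


def extract_email(line: str) -> str:
    # Assumption: email is the first '|'-separated field; case-insensitive.
    return line.split("|", 1)[0].strip().lower()


def prune_data_b(path_a, path_b) -> int:
    """Rewrite DATA B in place, dropping lines whose email is in DATA A.

    DATA A is only read, never modified. Returns the number of lines kept.
    """
    a_lines = Path(path_a).read_text(encoding="utf-8").splitlines()
    known = {extract_email(line) for line in a_lines if line.strip()}

    # DATA B is read completely before the same file is overwritten.
    b_lines = Path(path_b).read_text(encoding="utf-8").splitlines()
    kept = [line for line in b_lines
            if line.strip() and extract_email(line) not in known]

    Path(path_b).write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(kept)


if __name__ == "__main__":
    # Demo on hypothetical files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_a = Path(tmp, "data_a.txt")
        data_b = Path(tmp, "data_b.txt")
        data_a.write_text(
            "alice@example.com | How many years in college: 3 | Address: 123 Example Blvd.\n",
            encoding="utf-8",
        )
        data_b.write_text(
            "erin@example.com | N/A | N/A\n"
            "alice@example.com | How many years in college: 3 | N/A\n",
            encoding="utf-8",
        )
        prune_data_b(data_a, data_b)
        print(data_b.read_text(encoding="utf-8"), end="")
        # erin@example.com | N/A | N/A
```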

OUTPUT:

[email protected]
[email protected]
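For the second option, which writes a separate OUTPUT file with one fresh email per line and leaves both inputs untouched, a sketch could look like this. The UTF-8 encoding, the function name `write_fresh_emails` and the file names are assumptions:

```python
import tempfile
from pathlib import Path


def extract_email(line: str) -> str:
    # Assumption: email is the first '|'-separated field; case-insensitive.
    return line.split("|", 1)[0].strip().lower()


def write_fresh_emails(path_a, path_b, path_out) -> list:
    """Write one email per line for DATA B entries not present in DATA A.

    Neither input file is modified. Returns the list of emails written.
    """
    a_lines = Path(path_a).read_text(encoding="utf-8").splitlines()
    known = {extract_email(line) for line in a_lines if line.strip()}

    fresh = []
    for line in Path(path_b).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        email = extract_email(line)
        if email not in known:
            known.add(email)  # avoid writing the same email twice
            fresh.append(email)

    Path(path_out).write_text("".join(email + "\n" for email in fresh), encoding="utf-8")
    return fresh


if __name__ == "__main__":
    # Demo with hypothetical files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_a = Path(tmp, "data_a.txt")
        data_b = Path(tmp, "data_b.txt")
        output = Path(tmp, "output.txt")
        data_a.write_text(
            "alice@example.com | How many years in college: 3 | Address: 123 Example Blvd.\n"
            "dave@example.com | How many years in college: 2 | Address: 888 Example Blvd.\n",
            encoding="utf-8",
        )
        data_b.write_text(
            "erin@example.com | N/A | N/A\n"
            "alice@example.com | How many years in college: 3 | N/A\n"
            "dave@example.com | N/A | 888 Example Blvd.\n"
            "frank@example.com | N/A | N/A\n",
            encoding="utf-8",
        )
        write_fresh_emails(data_a, data_b, output)
        print(output.read_text(encoding="utf-8"), end="")
        # erin@example.com
        # frank@example.com
```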

Would love to know how to do this in Python - thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
