When I try to create a DataFrame from a list of lists, it shows 16 empty columns and I have no idea where they come from
I'm trying to create a DataFrame from a list in which each item is itself a list holding the data of one message: the date, the sender, and the body.
def extract_data(text):
    fields = []
    fields = text.split(']')
    fields[0] = fields[0][1:]
    fields[0] = fields[0].replace('/','-')
    fields.extend(fields[1].split(':'))
    fields.pop(1)
    return fields
lines = []
messages = []
header = ['Date','Sender','Body']
with open('merlinator.txt',encoding = 'utf8') as f:
    for line in f:
        if has_date(line):
            lines.append(((line.replace('\u200e','')).strip('\n')))
        else:
            lines[len(lines) - 1] += ' ' + line.strip('\n')
lines = [l for l in lines if valid_user(l)]
If I append just one item to the list "messages" and use it to create the DataFrame, it works just fine.
lines = [l for l in lines if valid_user(l)]
messages.append(extract_data(lines[0]))
df = pd.DataFrame(messages,columns=header)
df
If I use a loop to fill the list that I'll use for the DataFrame, the list looks fine when displayed, but the DataFrame created from it shows 16 extra, empty columns.
for x in lines:
    messages.append(extract_data(x))
df = pd.DataFrame(messages)
df
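One possible explanation for the extra columns, offered as a minimal sketch rather than a confirmed diagnosis: `pd.DataFrame` sizes the table by the longest inner list and pads the shorter rows with missing values. If `extract_data` returns more than three fields for some messages (for example, when the body itself contains a `:`), every row gets the extra columns. The rows below are shortened illustrations of what `extract_data` would return, not output from the real chat file.

```python
import pandas as pd

# Illustrative rows: the second one has four fields, which is what
# happens when a body containing ':' is split on every colon.
rows = [
    ["01-12-2019 4:03:30 AM", " Almada", " sticker omitted"],
    ["01-12-2019 4:01:57 AM", " Joaquin Cibeira",
     " Si consideran que falta alguien, ej", " marinita. Lo metemos"],
]

df = pd.DataFrame(rows)

# The frame is as wide as the longest row; shorter rows are padded.
print(df.shape)

# Counting fields per row shows which messages were split too often.
row_lengths = [len(r) for r in rows]
print(max(row_lengths))
```

Checking `max(len(m) for m in messages)` on the real list would show whether some message was split into 19 fields, which would account for 16 columns beyond the expected three.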
What am I doing wrong?
P.S.: I'm learning Python and pandas, so any constructive criticism of my code is more than welcome. Also, English is not my first language, so sorry for any mistakes :)
EDIT
The input file is a WhatsApp group chat exported as a .txt file, and it looks like this:
[01/12/2019 4:01:38 AM] Joaquin Cibeira: He creado este grupo con el fin de que nadie nunca mas se pierda una juntada
[01/12/2019 4:01:57 AM] Joaquin Cibeira: Si consideran que falta alguien, ej: marinita. Lo metemos
[01/12/2019 4:03:30 AM] Almada: sticker omitted
[01/12/2019 4:03:35 AM] You added Juani Pisa
[01/12/2019 4:04:16 AM] Juani Pisa: sticker omitted
[01/12/2019 4:05:12 AM] Almada: audio omitted
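Given this format, a parser that always returns exactly three fields could look like the sketch below. It uses the `maxsplit` argument of `str.split` so that only the first `]` and the first `:` after it act as separators. The name `extract_data_maxsplit` is hypothetical, the `.strip()` calls are an addition that the original function does not have, and the sketch assumes that lines without a `sender:` part (such as "You added Juani Pisa") have already been filtered out, which is presumably what `valid_user` does.

```python
import pandas as pd

def extract_data_maxsplit(text):
    """Split '[date] sender: body' into exactly three fields."""
    # Split only at the first ']' so brackets inside the body are kept.
    date, rest = text.split(']', 1)
    # Drop the leading '[' and swap '/' for '-', as in the original code.
    date = date[1:].replace('/', '-')
    # Split only at the first ':' so colons inside the body are kept.
    sender, body = rest.split(':', 1)
    return [date, sender.strip(), body.strip()]

# Three lines taken from the sample chat export.
sample = [
    "[01/12/2019 4:01:38 AM] Joaquin Cibeira: He creado este grupo con el fin de que nadie nunca mas se pierda una juntada",
    "[01/12/2019 4:01:57 AM] Joaquin Cibeira: Si consideran que falta alguien, ej: marinita. Lo metemos",
    "[01/12/2019 4:03:30 AM] Almada: sticker omitted",
]

messages = [extract_data_maxsplit(line) for line in sample]
df = pd.DataFrame(messages, columns=['Date', 'Sender', 'Body'])
print(df)
```

Because every row now has three items, passing `columns=header` works for the whole list, and the second message keeps "ej: marinita" inside its body instead of spilling into a fourth column.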
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
