Rearrange txt to csv file using Python
I have data in a txt file in the following format:
Santosh kumar
+92 123 1234567
Voted For Voted 2 8 months ago
Doc...sapna
+92 123 1234567
Voted For Voted 2 8 months ago
Ramesh Dinani
+92 123 1234567
604PMO S BH: all & GD
Poll e)
Details Options Voters Settings Message
Mk we
+92 242342
Voted For Voted 4 8 months ago
+92 123 1234567
Voted For Voted 2 8 months ago
Nenoram Kolhi
+123 1234567
There are more rough lines of data between the numbers, like:
r SKL
+92 12323232
Voted For Voted
I need only the name and phone number, like:
Name,Number
Santosh kumar,+92 123 1234567
Nenoram Kolhi,+123 1234567
with all the rough data removed. My code is not working properly:
import csv

with open('File001.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split("+") for line in stripped if line)
    with open('log1.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('title', 'intro'))
        writer.writerows(lines)
#########
import pandas as pd

read_file = pd.read_csv('log1.csv', header=None, delimiter=',')
read_file.columns = ['Name', 'number']
read_file.to_csv('Final1.csv', index=None)
Solution 1:[1]
The use of regular expressions or "regex" should be of great help in your case.
For example, this piece of code looks for every phone number (a "+" followed by digits or spaces) and looks back two lines to get the name:
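import re

# `text` must hold the whole contents of the txt file as a single string
# pattern: a name line, an empty line, then "+" followed by digits or whitespace
re_contact = re.compile(r"\n(.*?)\n\n(\+[\d\s]+?)\n")
for contact in re_contact.finditer(text):
    print("name=", contact.group(1))
    print("number=", contact.group(2))
    print()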
This gives:
name= Santosh kumar
number= +92 123 1234567
name= Doc...sapna
number= +92 123 1234567
name= Ramesh Dinani
number= +92 123 1234567
name= Mk we
number= +92 242342
name= Voted For Voted 4 8 months ago
number= +92 123 1234567
name= r SKL
number= +92 12323232
As you can see, there is a phone number without a name clearly associated with it in your data.
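To get from there to the Name,Number file you asked for, a minimal sketch could look like the one below. It is a line-based variation rather than the exact regex above: it pairs each phone line with the non-empty line before it, so it does not depend on blank lines in the file. The names `extract_contacts`, `write_contacts`, `PHONE_RE` and `NOISE_PREFIX` are made up for this illustration, and treating every line that starts with "Voted For" as noise is an assumption based on your sample, not a general rule.

```python
import csv
import re

# A phone line: "+" followed only by digits and spaces.
PHONE_RE = re.compile(r"^\+[\d ]+$")
# Assumption: lines starting with this prefix are poll residue, never a real name.
NOISE_PREFIX = "Voted For"


def extract_contacts(text):
    """Return (name, number) pairs, pairing each phone line with the line before it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    contacts = []
    for previous, current in zip(lines, lines[1:]):
        if not PHONE_RE.match(current):
            continue
        # Skip numbers whose preceding line is noise or another number.
        if previous.startswith(NOISE_PREFIX) or PHONE_RE.match(previous):
            continue
        contacts.append((previous, current))
    return contacts


def write_contacts(contacts, csv_path):
    """Write the pairs to a CSV file with a Name,Number header."""
    with open(csv_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(("Name", "Number"))
        writer.writerows(contacts)


# Shortened sample in the same shape as the data in the question.
sample = """Santosh kumar
+92 123 1234567
Voted For Voted 2 8 months ago
Mk we
+92 242342
Voted For Voted 4 8 months ago
+92 123 1234567
Voted For Voted 2 8 months ago
Nenoram Kolhi
+123 1234567
"""
print(extract_contacts(sample))
```

On this sample it keeps Santosh kumar, Mk we and Nenoram Kolhi, and drops the number that only follows a "Voted For" line. For the real file, read it into a string and call `write_contacts(extract_contacts(text), 'Final1.csv')`. Other rough lines such as "r SKL" would still be taken as names, so you may need extra filters for your data.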
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | manu190466 |
