Python: Trying to write to .txt only lines that contain a specific word, instead of the whole text
This Python code reads list.txt, which contains website links, fetches the archived URLs for each of those websites from web.archive.org, and writes them to url.txt. What I want is to extract ONLY the lines that contain a specific "WORD". As far as I can see, my code extracts all lines if the specific "WORD" exists in any one line.
Can anyone explain why? Thank you in advance!
The code:
urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)
archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"
with open("url.txt", "w") as f_out:
    for url in urls:
        r = requests.get(archive_url.format(url))
        if 'WORD' in archive_url:
            print(r.text, file=f_out)
            print("\n", file=f_out)
I tried replacing if 'WORD' in archive_url: with if 'WORD' in url:, but then nothing is written to the TXT file!
I don't know how to print only the LINES that contain "WORD".
Solution 1:[1]
Try:
import requests

# Read the site names from list.txt, skipping blank lines
urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("url.txt", "w") as f_out:
    for url in urls:
        r = requests.get(archive_url.format(url))
        # Test each returned line on its own instead of the whole response
        for line in r.text.splitlines():
            if "your_word" in line:
                print(line, file=f_out)
        # Blank separator after each site's matches
        print("\n", file=f_out)
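If you want to check the filtering step without hitting the network, it can be pulled out into a small helper. This is only a minimal sketch: `filter_lines` and the `sample` string are hypothetical names and data made up for illustration, with `sample` standing in for `r.text`.

```python
def filter_lines(text, word):
    """Return only the lines of `text` that contain `word` (case-sensitive)."""
    return [line for line in text.splitlines() if word in line]


# Hypothetical sample standing in for r.text from the archive request
sample = (
    "http://example.com/\n"
    "http://example.com/WORD/page\n"
    "http://example.com/about\n"
)

matches = filter_lines(sample, "WORD")
for line in matches:
    print(line)
```

In the loop above you would call `filter_lines(r.text, "your_word")` and write the returned lines to `f_out`. The match is case-sensitive; comparing `word.lower()` against `line.lower()` would be one way to relax that.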
Solution 2:[2]
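# Assumes requests, urls and archive_url are defined as in the question
with open("url.txt", "w") as f_out:
    for url in urls:
        # Note: this tests the site name from list.txt, not the archived lines
        if 'WORD' in url:
            r = requests.get(archive_url.format(url))
            # write() takes a single string: pass the response body plus a newline
            f_out.write(r.text + '\n')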
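To see why the different conditions behave so differently, it may help to look at what each one actually tests. The sketch below uses made-up data: `url` and `response_text` are hypothetical stand-ins for one entry of list.txt and for `r.text`, so nothing is fetched.

```python
archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"
url = "example.com"  # hypothetical entry from list.txt
# Hypothetical response body standing in for r.text
response_text = "http://example.com/\nhttp://example.com/WORD/page\n"

# Tests the fixed template string: the result never depends on the response
in_template = "WORD" in archive_url

# Tests the site name from list.txt: true only if the domain itself contains the word
in_domain = "WORD" in url

# Tests the whole response at once: one matching line makes it true for the entire text
in_body = "WORD" in response_text

# Tests each line separately: only the matching lines survive
per_line = [line for line in response_text.splitlines() if "WORD" in line]

print(in_template, in_domain, in_body, per_line)
```

With this sample data the first two checks are false, which would explain an empty output file, while the whole-body check is true even though only one line matches. Only the per-line check selects individual lines.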
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrej Kesely |
| Solution 2 | |
