Trying to add code that extracts only lines that contain "word" from a requests response and writes them to a new .txt file

This code opens a text file (list.txt) containing websites, fetches the archived URLs for each of those websites from web.archive.org, and writes them to a new text file (url.txt). I need to keep only the links from web.archive.org that contain "word", but I am getting this error:

if 'word' in url:
IndentationError: unexpected indent

Can someone explain why and give the right code here?

The code:

urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("url.txt", "w") as f_out:
    for url in urls:

        r = requests.get(archive_url.format(url))
         if 'word' in url:
        print(r.text, file=f_out)
        print("\n", file=f_out)


Solution 1:[1]

There are two issues:

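  1. The if statement has one extra leading space, so it is indented deeper than the line above it. This is what raises IndentationError: unexpected indent.
  2. The lines after the if statement must be indented one level further so that they form its body.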
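
As a minimal sketch of both issues, the snippet below compiles three small, made-up source strings with Python's built-in compile() and reports the IndentationError message, if any. The helper name indentation_error and the sample strings are illustrative and not part of the original code; the exact wording of the second message varies between Python versions.

```python
def indentation_error(source):
    """Return the IndentationError message for `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


# Hypothetical snippets that mirror the two issues listed above.
extra_space = "r = 1\n if r:\n    print(r)\n"      # one stray space before `if`
missing_body_indent = "r = 1\nif r:\nprint(r)\n"   # body not indented under `if`
fixed = "r = 1\nif r:\n    print(r)\n"             # both issues corrected

for name, source in [
    ("extra_space", extra_space),
    ("missing_body_indent", missing_body_indent),
    ("fixed", fixed),
]:
    print(name, "->", indentation_error(source))
```
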

This should solve your problem:

import requests

urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("url.txt", "w") as f_out:
    for url in urls:

        r = requests.get(archive_url.format(url))
        if 'word' in url:
            print(r.text, file=f_out)
            print("\n", file=f_out)
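
Note that the check 'word' in url tests the site name read from list.txt, not the archived links in r.text. If the goal is to keep only archived links that contain "word", one option is to filter the response line by line. The sketch below assumes the response body lists one archived URL per line (which the output=text and fl=original parameters suggest); filter_lines and the sample text are hypothetical.

```python
def filter_lines(text, word):
    """Return the lines of `text` that contain `word` (case-sensitive)."""
    return [line for line in text.splitlines() if word in line]


# Made-up sample standing in for r.text; real responses will differ.
sample = (
    "http://example.com/\n"
    "http://example.com/word/page\n"
    "http://example.com/about\n"
)

matches = filter_lines(sample, "word")
for link in matches:
    print(link)
```

In the loop above, the two print calls could then be replaced by a loop over filter_lines(r.text, "word") that prints each matching link to f_out.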

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Desi Pilla