How to run the same function in parallel in Python
I have a Python script that scrapes through a list of links.
The process is too slow, so I want to split the work across several processes so that the code scrapes multiple links at once.
The list contains almost 5000 links.
Here is the code that I want to run in parallel:
import requests

# links contains the list of links
def fun():
    for link in links:
        requests.get(link, timeout=5)
        # ... scraping code
Solution 1:[1]
If you want to make many requests at the same time, consider aiohttp instead of requests.
requests blocks until each response arrives, whereas aiohttp lets you make HTTP requests asynchronously, so many of them can be in flight at once.
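A minimal sketch of what this could look like, assuming aiohttp is installed (`pip install aiohttp`). The names `gather_limited` and `scrape_all`, and the limit of 20 concurrent requests, are illustrative choices and not part of the original answer; the 5-second timeout mirrors the question's code.

```python
import asyncio


async def gather_limited(urls, fetch_one, limit=20):
    """Await fetch_one(url) for every URL, at most `limit` at a time.

    Returns a list of (url, result) pairs in input order. If a fetch
    raises, the exception object is stored as that URL's result.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await fetch_one(url)
            except Exception as exc:  # one bad link should not stop the rest
                return url, exc

    return await asyncio.gather(*(guarded(url) for url in urls))


async def scrape_all(urls, limit=20):
    """Download every URL with aiohttp and return (url, html_or_error) pairs."""
    import aiohttp  # third-party; imported here so the helper above stays stdlib-only

    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:

        async def fetch_one(url):
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

        return await gather_limited(urls, fetch_one, limit)


# Usage, with `links` being the list from the question:
# pages = asyncio.run(scrape_all(links))
```

The semaphore keeps the number of simultaneous requests bounded, which is worth having with about 5000 links so the target server is not hit with all of them at once.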
Solution 2:[2]
I suggest building a multithreaded program to make the requests. concurrent.futures is one of the easiest ways to multithread these kinds of requests, in particular its ThreadPoolExecutor. The documentation even includes a simple multithreaded URL request example.
Here is sample code using bs4 and concurrent.futures:
import time
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed

URLs = [ ... ]  # A long list of URLs.

def parse(url):
    # Download one page and return every <a> tag found in it.
    r = requests.get(url, timeout=5)
    soup = BeautifulSoup(r.content, 'lxml')
    return soup.find_all('a')

# Run 10 workers concurrently. The work is I/O-bound, so the best number depends
# more on the network and the target server than on the number of CPU cores.
with ThreadPoolExecutor(max_workers=10) as executor:
    start = time.time()
    futures = [executor.submit(parse, url) for url in URLs]
    results = []
    for future in as_completed(futures):
        # result() returns what parse() returned, or re-raises its exception.
        results.append(future.result())
end = time.time()
print("Time Taken: {:.6f}s".format(end - start))
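With around 5000 links, some requests will likely time out or fail, and a worker's exception only surfaces when that future's result is read. A minimal sketch of collecting successes and failures per URL, following the pattern of the URL example in the concurrent.futures documentation. The helper name `run_in_threads` is hypothetical, and it assumes the items (here, URL strings) are hashable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_threads(func, items, max_workers=10):
    """Call func(item) for each item in a thread pool.

    Returns (results, errors): two dicts keyed by item, holding the
    return value or the raised exception respectively.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which item each future belongs to.
        future_to_item = {executor.submit(func, item): item for item in items}
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # e.g. a timeout or connection error
                errors[item] = exc
    return results, errors


# Usage with the parse() function and URLs list from the code above:
# results, errors = run_in_threads(parse, URLs)
```

The failed URLs end up in `errors`, so they can be logged or retried without losing the pages that did download.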
You may also want to check out the Scrapy framework. It scrapes data concurrently and is easy to learn. It offers features such as AutoThrottle, supports rotating proxies and user agents through middleware, and can be integrated with your databases.
Solution 3:[3]
from flask import Flask, request

app = Flask(__name__)

@app.route('/form', methods=['GET', 'POST'])
def form():
    # allow for both POST AND GET
    if request.method == 'POST':
        language = request.form.get('language')
        framework = request.form.get('framework')
        return '''
<h1>The language value is: {}</h1>
<h1>The framework value is: {}</h1>'''.format(language, framework)
    # otherwise handle the get request
    return '''
<form method="POST">
<div><label>Language: <input type="text" name="language"></label></div>
<div><label>Framework: <input type="text" name="framework"></label></div>
<input type="submit" value="Submit">
</form>
'''

if __name__ == '__main__':
    app.run(debug=True)
After adding these two pieces to the script, as shown in the code above, the app works correctly:
app = Flask(__name__)

if __name__ == '__main__':
    app.run(debug=True)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | ahmedshahriar |
| Solution 3 | Ayush |
