tweepy: RateLimitError when I try to extract 1000 tweets
I am trying to get 1000 tweets with Tweepy. I know there is a limit on how many tweets I can extract, but from what I have read it should be possible to get 1000 tweets using Cursor. I extract the tweets I want and append them to a list inside a for loop, but every time I try to break out of that loop I get a RateLimitError. Below is the code I am using. Any ideas?
```python
def get_tweets(api, context):
    cnt = 0
    tweets = tweepy.Cursor(api.search,
                           q=context,
                           since='2022-02-06', until='2022-02-08',
                           result_type='recent',
                           include_entities=True,
                           monitor_rate_limit=True,
                           wait_on_rate_limit=True,
                           lang="en").items()
    return tweets

tweets_1 = get_tweets(api, 'Lakers')
tweets = []
cnt = 1
for tweet in tweets_1:
    print(f'TEXT{cnt}: {tweet.text}')
    tweets.append(tweet)
    cnt += 1
    if cnt == 1001:
        break  # every time I break here I get a RateLimitError
```
Solution 1:[1]
Are you still dealing with this problem? I'm not a pro, but maybe this will solve it.
What I'm doing is exactly what you did, just taken a bit further. I think the problem is where `wait_on_rate_limit` is passed: in my code it goes to the `tweepy.API` constructor in the auth section, not to the Cursor:
```python
import tweepy
import pandas as pd

auth1 = tweepy.OAuthHandler(consumer_key1, consumer_secret1)
api = tweepy.API(auth1, wait_on_rate_limit=True)
```
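If it helps, here is a minimal sketch (not tested, assuming Tweepy v3.x, where the search method is still `api.search`) of the question's `get_tweets` with that change applied. `monitor_rate_limit` is dropped because, as far as I know, it is not a Tweepy option, and `.items(1000)` caps the cursor so there is no need to break manually:

```python
# Sketch only: the credentials and the since/until window are placeholders
# taken from the question, not working values.
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)  # sleep instead of raising RateLimitError

def get_tweets(api, context, limit=1000):
    cursor = tweepy.Cursor(api.search,
                           q=context,
                           since='2022-02-06', until='2022-02-08',
                           result_type='recent',
                           include_entities=True,
                           lang='en')
    return [tweet for tweet in cursor.items(limit)]  # at most `limit` tweets

tweets = get_tweets(api, 'Lakers')
```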
```python
def get_trends(api, loc):
    # get the trending topics for a location (WOEID)
    trends = api.get_place_trends(loc)
    return trends[0]["trends"]

def scrape(numtweet, api, db, a, b):
    # collect `numtweet` tweets for each word in the trending slice d[a:b]
    # (e.g. [0:5] for the first batch, then [5:10] for the second);
    # d is the list of trend names built in __main__
    for word in d[a:b]:
        print(word)
        query = word + ' -filter:retweets'  # exclude retweets
        tweets = tweepy.Cursor(
            api.search_tweets,
            query,
            lang='en',
            result_type='recent',
            tweet_mode='extended').items(numtweet)
        list_tweets = [tweet for tweet in tweets]
        for tweet in list_tweets:  # appending section
            username = tweet.user.screen_name
            text = tweet.full_text.replace('\n', '').replace(',', '')
            db['trendings'].append(word)
            db['username'].append(username)
            db['text'].append(text)
    return db
```
```python
if __name__ == '__main__':
    db = {'trendings': [], 'username': [], 'text': []}
    loc = "23424977"  # the US's WOEID
    trends = get_trends(api, loc)
    d = [i['name'] for i in trends[0:10]]  # take the top 10 US trends
    db = scrape(200, api, db, 0, 5)   # 200 tweets from each of the first 5 trends
    db = scrape(200, api, db, 5, 10)  # then the next 5 trends
    df = pd.DataFrame.from_dict(db)
```
What I'm doing is getting the top 10 US trending topics (loc = "23424977" is the WOEID of the US), and from each trend I take 200 tweets, so basically I can scrape 1000 tweets per 15 minutes.
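As a quick follow-up (my addition, not part of the original answer): a few pandas calls to sanity-check the collected data; the CSV filename is just an assumed example:

```python
# Roughly 200 rows per trend are expected, ~1000 in total.
print(df.shape)
print(df['trendings'].value_counts())  # tweets collected per trending topic
print(df.head())

df.to_csv('trending_tweets.csv', index=False)  # assumed output path
```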
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nguyen Hoang Chu |
