I want to search a file of tweets to find the most popular hashtags used

For a Python project, I have been asked to collect tweets about a certain topic over a certain period of time. I now have a file with hundreds of tweets. How do I find the most popular hashtags in that file so that I can create a word cloud?



Solution 1:[1]

Let us suppose that your corpus is stored as a list of strings and that all special characters have already been removed. I am using functions from sklearn (scikit-learn), together with pandas and wordcloud:

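import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud

corpus = ['the text of your tweet', 'quote in it']
# Weight each word by TF-IDF, ignoring common English stop words
vectorizer = TfidfVectorizer(stop_words='english')
v = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
names = vectorizer.get_feature_names_out()
df = pd.DataFrame(v.toarray(), columns=names)
# Sum each word's weight over all tweets and build the cloud from those totals
cloud = WordCloud(background_color="white", max_words=50).generate_from_frequencies(df.sum(axis=0).to_dict())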
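The snippet above weights every word in the corpus. If you only want hashtags, one option is to pull them out with a regular expression and count them. This is a minimal sketch, not a definitive implementation: it assumes the raw tweets still contain their `#` characters (so they must not have been stripped beforehand), and the names `HASHTAG_RE` and `count_hashtags` are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: '#' followed by one or more word characters
# (letters, digits, underscore). Adjust it if your data needs something else.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(tweets):
    """Return a Counter mapping each lowercased hashtag to its number of occurrences."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase so that '#Python' and '#python' are counted together
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return counts


if __name__ == "__main__":
    sample = ["Loving #Python and #DataScience", "More #python tips", "no tags here"]
    counts = count_hashtags(sample)
    print(counts.most_common(2))  # [('python', 2), ('datascience', 1)]
```

If your file holds one tweet per line, you can pass the open file object straight in, since iterating over it yields lines. The resulting counts can then go to `WordCloud(...).generate_from_frequencies(dict(counts))` in the same way as the summed weights above.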

Solution 2:[2]

I will assume that you have the ID of each tweet. You need to send a GET request to this URL: " https://twitter.com/i/api/graphql/6n-3uwmsFr53-5z_w5FTVw/TweetDetail?variables=%7B%22focalTweetId<YOUR_TWEET_ID> with_rux_injections%22%3Afalse%2C%22includePromotedContent%22%3Atrue%2C%22withCommunity%22%3Atrue%2C%22withQuickPromoteEligibilityTweetFields%22%3Atrue%2C%22withBirdwatchNotes%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Atrue%2C%22__fs_responsive_web_like_by_author_enabled%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Atrue%2C%22__fs_interactive_text_enabled%22%3Atrue%2C%22__fs_responsive_web_uc_gql_enabled%22%3Afalse%2C%22__fs_responsive_web_edit_tweet_api_enabled%22%3Afalse%7D "

Note: the URL is hard to read because of the line breaks, but hopefully the idea is clear.

The first parameter is focalTweetId, which is the ID of the tweet. This API call returns a data object containing all the information about the tweet.

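const response = await fetch(url)
// fetch resolves to a Response object, so parse the JSON body before reading it
const body = await response.json()
console.log(body.data.instructions[0].entries[0].content.itemContent.tweet_results.result.legacy.retweet_count)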

I did this in JavaScript; in Python you can do the same with the requests library:

import requests

response = requests.get(url)
# ...

This gives you the retweet_count, and the response contains a lot of other useful data as well.
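To fill in the Python side, here is a rough sketch of reading retweet_count out of the parsed response. The key path is copied from the JavaScript line above and is an assumption about the response shape, not something checked against the live endpoint. The helper name `get_retweet_count` is hypothetical.

```python
def get_retweet_count(payload):
    """Follow the key path from the JavaScript example; return None if any step is missing.

    `payload` is assumed to be the already-parsed JSON body (a dict).
    """
    try:
        entry = payload["data"]["instructions"][0]["entries"][0]
        result = entry["content"]["itemContent"]["tweet_results"]["result"]
        return result["legacy"]["retweet_count"]
    except (KeyError, IndexError, TypeError):
        # The response did not have the assumed shape
        return None
```

You would call it as `get_retweet_count(requests.get(url).json())`. Returning None instead of raising keeps a loop over many tweet IDs running when one response looks different.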

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Harsh
Solution 2 YassBan