Computing relative frequencies based on dictionary
I'd like to examine the Psychological Capital (a construct consisting of four dimensions: hope, optimism, efficacy and resiliency) of founders using computer-aided text analysis in R. So far I have pulled tweets from various users into R. The data frame, called before_failure, contains 2130 tweets from 5 different users in different periods.
I then used the quanteda package to create a corpus, tokenized it, and removed punctuation, numbers and symbols:
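library(quanteda)
library(magrittr)  # provides %>%; assumed, since the original snippet does not show its library() calls
# Create a corpus from the tweet text
before_failure_corpus <- corpus(before_failure, text_field = "text")
# Tokenize, remove punctuation, numbers and symbols, then lowercase
tok_before_failure <- before_failure_corpus %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower()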
After that I created a dictionary, also with the quanteda package (the word lists themselves were compiled by other authors studying psychological capital):
#Creating Dictionary with quanteda
dict <- dictionary(list(hope = c("Accomplishments", "Achievements", "Approach", "Aspiration", "Aspire", "Aspired",
"Aspirer", "Aspires", "Aspiring", "Aspiringly", "Assurance", "Assurances", "Assure",
"Assured", "Assuredly", "Assuredness", "Assuring", "Assuringly", "Assuringness", "Belief",
"Believe", "Believed", "Believes", "Believing", "Breakthrough", "Certain", "Certainly",
"Certainty", "Committed", "Concept", "Confidence", "Confident", "Confidently",
"Convinced", "Dare say", "Deduce", "Deduced", "Deduces", "Deducing", "Desire",
"Desired", "Desires", "Desiring", "Doubt not", "Energy", "Engage", "Engagement",
"Expectancy", "Faith", "Foresaw", "Foresee", "Foreseeing", "Foreseen", "Foresees", "Goal",
"Goals", "Hearten", "Heartened", "Heartening", "Hearteningly", "Heartens", "Hope",
"Hoped", "Hopeful", "Hopefully", "Hopefulness", "Hoper", "Hopes", "Hoping", "Idea",
"Innovation", "Innovative", "Ongoing", "Opportunity", "Promise", "Promising",
"Propitious", "Propitiously", "Propitiousness", "Solution", "Solutions", "Upbeat",
"Wishes", "Wishing", "Yearn", "Yearn for", "Yearning", "Yearning for", "Yearns for"),
efficacy = c("Ability", "Accomplish", "Accomplished", "Accomplishes", "Accomplishing",
"Accomplishments", "Achievements", "Achieving", "Adept", "Adeptly", "Adeptness",
"Adroitly", "Adroitness", "All-in", "Aplomb", "Arrogance", "Arrogant", "Arrogantly",
"Assurance", "Assured", "Assuredly", "Assuredness", "Backbone", "Bandwidth", "Belief",
"Capable", "Capableness", "Capably", "Certain", "Certainly", "Certainness", "Certainty",
"Certitude", "Cocksurely", "Cocksureness", "Cocky", "Commitment", "Commitments",
"Committed", "Compelling", "Competence", "Competency", "Competent", "Competently",
"Confidence", "Confident", "Confidently", "Conviction", "Effective", "Effectively",
"Effectiveness", "Effectual", "Effectually", "Effectualness", "Efficacious", "Efficaciously",
"Efficaciousness", "Efficacy", "Equanimity", "Equanimous", "Equanimously", "Expertise",
"Expertly", "Fortitude", "Fortitudinous", "Forward", "Forwardness", "Know-how",
"Knowledgability", "Knowledgeable", "Knowledgably", "Masterful", "Masterfully", "Masterfulness",
"Masterly", "Mastery", "Overconfidence", "Overconfident", "Overconfidently",
"Persuasion", "Power", "Powerful", "Powerfully", "Powerfulness", "Prevailed",
"Prevailing", "Prevails", "Prevalence", "Prevalent", "Reassurance", "Reassure", "Reassured",
"Reassures", "Reassuring", "Self-assurance", "Self-assured", "Self-assuring", "Self-confidence",
"Self-confident", "Self-dependence", "Self-dependent", "Self-reliance",
"Self-reliant", "Stamina", "Steadily", "Steadiness", "Steady", "Strength", "Strong", "Stronger",
"Strongish", "Strongly", "Strongness", "Superior", "Superiority", "Sure", "Surely", "Sureness",
"Unblinking", "Unblinkingly", "Undoubtedly", "Undoubting", "Unflappability", "Unflappable",
"Unflinching", "Unflinchingly", "Unhesitating", "Unhesitatingly", "Unwavering",
"Unwaveringly"),
resiliency = c("Adamant", "Adamantly", "Assiduous", "Assiduously", "Assiduousness", "Backbone",
"Bandwidth", "Bears up", "Bounce", "Bounced", "Bounces", "Bouncing", "Buoyant",
"Commitment", "Commitments", "Committed", "Consistent", "Determination",
"Determined", "Determinedly", "Determinedness", "Devoted", "Devotedly",
"Devotedness", "Devotion", "Die trying", "Died trying", "Dies trying", "Disciplined",
"Dogged", "Doggedly", "Doggedness", "Drudge", "Drudged", "Drudges", "Endurance",
"Endure", "Endured", "Endures", "Enduring", "Grit", "Hammer away", "Hammered away",
"Hammering away", "Hammers away", "Held fast", "Held good", "Held up", "Hold fast",
"Holding fast", "Holding up", "Holds fast", "Holds good", "Immovability", "Immovable",
"Immovably", "Indefatigable", "Indefatigableness", "Indefatigably", "Indestructibility",
"Indestructible", "Indestructibleness", "Indestructibly", "Intransigence", "Intransigency",
"Intransigent", "Keep at", "Keep going", "Keep on", "Keeping at", "Keeping going",
"Keeping on", "Keeps at", "Keeps going", "Keeps on", "Kept at", "Kept going", "Kept on",
"Labored", "Laboring", "Never-tiring", "Never-wearying", "Perdure", "Perdured", "Perduring",
"Perseverance", "Persevere", "Persevered", "Persevering", "Persist", "Persisted",
"Persistence", "Persistent", "Persisting", "Pertinacious", "Pertinaciously", "Pertinacity",
"Rebound", "Rebounded", "Rebounding", "Rebounds", "Relentlessness", "Remain",
"Remained", "Remaining", "Remains", "Resilience", "Resiliency", "Resilient", "Resolute",
"Resolutely", "Resoluteness", "Resolve", "Resolved", "Resolves", "Resolving", "Robust",
"Sedulity", "Sedulous", "Sedulously", "Sedulousness", "Snap back", "Snapped back",
"Snapping back", "Snaps back", "Spring back", "Springing back", "Springs", "Springs back",
"Sprung back", "Stalwart", "Stalwartly", "Stalwartness", "Stand fast", "Stand firm", "Standing fast",
"Standing firm", "Stands fast", "Stands firm", "Stay", "Steadfast", "Steadfastly",
"Steadfastness", "Stood fast", "Stood firm", "Strove", "Survive", "Surviving", "Surviving",
"Tenacious", "Tenaciously", "Tenaciousness", "Tenacity", "Tough", "Uncompromising",
"Uncompromisingly", "Uncompromisingness", "Unfaltering", "Unfalteringly", "Unflagging",
"Unrelenting", "Unrelentingly", "Unrelentingness", "Unshakable", "Unshakablely",
"Unshakeable", "Unshaken", "Unshaking", "Unswervable", "Unswerved", "Unswerving",
"Unswervingly", "Unswervingness", "Untiring", "Unwavered", "Unwavering", "Unweariedness",
"Unyielding", "Unyieldingly", "Unyieldingness", "Upheld", "Uphold", "Upholding",
"Upholds", "Zeal", "Zealous", "Zealously", "Zealousness"),
optimism = c("Aspire", "Aspirer", "Aspires", "Aspiring", "Aspiringly", "Assurance", "Assured", "Assuredly",
"Assuredness", "Assuring", "Auspicious", "Auspiciously", "Auspiciousness", "Bank on",
"Beamish", "Believe", "Believed", "Believes", "Believing", "Bullish", "Bullishly", "Bullishness",
"Confidence", "Confident", "Confidently", "Encourage", "Encouraged", "Encourages",
"Encouraging", "Encouragingly", "Ensuring", "Expectancy", "Expectant", "Expectation",
"Expectations", "Expected", "Expecting", "Faith", "Good omen", "Hearten", "Heartened",
"Heartener", "Heartening", "Hearteningly", "Heartens", "Hope", "Hoped", "Hopeful",
"Hopefully", "Hopefulness", "Hoper", "Hopes", "Hoping", "Ideal", "Idealist", "Idealistic",
"Idealistically", "Ideally", "Looking up", "Looks up", "Optimism", "Optimist", "Optimistic",
"Optimistical", "Optimistically", "Outlook", "Positive", "Positively", "Positiveness",
"Positivity", "Promising", "Propitious", "Propitiously", "Propitiousness", "Reassure",
"Reassured", "Reassures", "Reassuring", "Roseate", "Rosy", "Sanguine", "Sanguinely",
"Sanguineness", "Sanguinity", "Sunniness", "Sunny")))
Now I would like to compute the relative frequency for each of the four PsyCap dimensions, that is, the number of words in the tweets that match a dimension's word list divided by the total number of words in the corpus. Unfortunately, I got stuck at this point. In the end I would like to have a table that looks like this (values are made up):
  dimensions Frequency
1       hope      0.36
2   optimism      0.50
3   efficacy      0.22
4 resiliency      0.10
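The calculation described above (dictionary matches per dimension divided by the total number of tokens) could be sketched roughly as follows. This is only a minimal sketch under assumptions: toy_tweets and toy_dict are hypothetical stand-ins for before_failure and dict, and the approach shown (tokens_lookup() followed by dfm()) is one possible way to count matches, not necessarily the intended one.

```r
library(quanteda)

# Hypothetical toy data standing in for before_failure and dict
toy_tweets <- data.frame(
  user = c("a", "b"),
  text = c("I hope we keep going and stay confident",
           "Feeling optimistic about this idea"),
  stringsAsFactors = FALSE
)
toy_dict <- dictionary(list(
  hope       = c("Hope", "Idea"),
  optimism   = c("Optimistic"),
  efficacy   = c("Confident"),
  resiliency = c("Keep going", "Stay")
))

# Same preprocessing as in the question
toks <- tokens(corpus(toy_tweets, text_field = "text"),
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
toks <- tokens_tolower(toks)

# Replace tokens by the dictionary key they match; matching is
# case-insensitive by default and multi-word entries such as
# "Keep going" are matched as phrases
hits <- dfm(tokens_lookup(toks, dictionary = toy_dict))

# Denominator: all tokens left in the corpus after preprocessing
total_tokens <- sum(ntoken(toks))

# Matches per dimension divided by the total token count
rel_freq <- colSums(hits) / total_tokens
result <- data.frame(dimensions = names(rel_freq),
                     Frequency  = as.numeric(rel_freq),
                     row.names  = NULL)
print(result)
```

With the real data, toks would be tok_before_failure and toy_dict would be dict. Because the denominator here is the token count after punctuation, numbers and symbols have been removed, the result changes if the total is meant to include them.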
I hope my explanation is sufficient. If not, do not hesitate to ask. Thank you.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
