Analysis of text using Gunning Fog index

While analysing readability using the Gunning Fog index, I have to calculate the following values:

  1. Average Sentence Length = the number of words / the number of sentences
  2. Percentage of Complex words = 100 * (the number of complex words / the number of words)
  3. Fog Index = 0.4 * (Average Sentence Length + Percentage of Complex words)
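
The three values above can be sketched in a few lines of Python. This is only a rough illustration, not a reference implementation: the function names `count_syllables` and `fog_index` are made up here, sentences are split naively on `.`, `!` and `?`, and a "complex" word is assumed to be one with three or more syllables, estimated by counting vowel groups. The percentage is scaled by 100, as in the usual statement of the formula.

```python
import re

def count_syllables(word):
    # Assumption: approximate syllables by counting runs of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    # Naive sentence split on ., ! and ? (an assumption, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Every word token is counted, including repeats and stop words.
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    average_sentence_length = len(words) / len(sentences)
    percentage_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (average_sentence_length + percentage_complex)

sample = "The cat sat on the mat. It was a beautiful day."
print(round(fog_index(sample), 2))
```

A real analysis would likely need a better sentence splitter and syllable counter; the point here is only how the three quantities fit together.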

I want to know whether the number of words should be counted after cleaning (that is, after removing duplicates and stop words), or whether it is simply the total number of words in the text, with nothing removed?

Thanks for your help!



Solution 1:[1]

No, you don't do any cleaning or 'stop-word' removal.

You are trying to calculate how easy it is to read the text. Stop words are only relevant for old-style information retrieval. Also, do not remove duplicates. Process the text as is; otherwise the result will be wrong.

If you were to remove stop words, the text would be more difficult to read, as effectively a lot of short (i.e. "easy") words would have been removed.
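
To make this concrete, here is a small sketch comparing the two inputs of the formula before and after stop-word removal. Everything in it is an assumption for illustration: the tiny `STOP_WORDS` set, the helper names `syllables` and `fog_inputs`, the sample text, and the vowel-group heuristic for treating words of three or more syllables as complex.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "to", "and", "in", "it"}

def syllables(word):
    # Assumption: approximate syllables by counting runs of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_inputs(words, sentence_count):
    # Returns (average sentence length, percentage of complex words).
    complex_count = sum(1 for w in words if syllables(w) >= 3)
    return len(words) / sentence_count, 100 * complex_count / len(words)

text = "The evaluation of the document is easy. It is a readable and short text."
sentence_count = 2  # counted by hand for this sample

words = re.findall(r"[A-Za-z']+", text)
cleaned = [w for w in words if w.lower() not in STOP_WORDS]

as_is = fog_inputs(words, sentence_count)
after_cleaning = fog_inputs(cleaned, sentence_count)

print("as is:         ", as_is)
print("after cleaning:", after_cleaning)
```

In this sample, cleaning drops 8 of the 14 words, so both the average sentence length and the share of complex words change, and the result no longer describes the text a reader would actually see.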

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Oliver Mason