TF-IDF similarity with Twitter stream

I have collected many tweets using twitter4j and saved them into different text files. Now I want to split them into time windows of 10 days each and, for each window, select its top 100 tokens by TF-IDF score. How can I do this in Java? Is it necessary to create a Lucene index?
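For reference, here is a minimal sketch of one way this could be done with plain Java collections, without a Lucene index. It rests on several assumptions that are not in the question: each 10-day window is treated as one "document", TF is the raw count of a token in the window, and IDF is log(number of windows / number of windows containing the token). The names `WindowTfIdf`, `Tweet`, `countByWindow` and `topTokens` are hypothetical, and the tokenizer is deliberately naive (it lowercases and splits on anything that is not a letter, digit, `#`, `@` or `_`, so URLs get broken into pieces).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class WindowTfIdf {

    /** Hypothetical holder for one tweet: creation time (epoch millis) and raw text. */
    static final class Tweet {
        final long epochMillis;
        final String text;
        Tweet(long epochMillis, String text) {
            this.epochMillis = epochMillis;
            this.text = text;
        }
    }

    static final long WINDOW_MILLIS = 10L * 24 * 60 * 60 * 1000; // 10 days

    /** Naive tokenizer (an assumption): lowercase, keep letters, digits, '#', '@', '_'. */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}\\p{N}#@_]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Groups tweets into 10-day windows counted from startMillis; returns token counts per window. */
    static SortedMap<Long, Map<String, Integer>> countByWindow(List<Tweet> tweets, long startMillis) {
        SortedMap<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Tweet t : tweets) {
            long window = Math.floorDiv(t.epochMillis - startMillis, WINDOW_MILLIS);
            Map<String, Integer> tf = counts.computeIfAbsent(window, k -> new HashMap<>());
            for (String token : tokenize(t.text)) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** TF-IDF with each window as one document: tf * ln(numWindows / windowsContainingToken). */
    static double score(int tf, int df, int numWindows) {
        return tf * Math.log((double) numWindows / df);
    }

    /** Returns, for each window index, its n highest-scoring tokens (ties broken alphabetically). */
    static Map<Long, List<String>> topTokens(SortedMap<Long, Map<String, Integer>> counts, int n) {
        // Document frequency: in how many windows does each token occur?
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> tf : counts.values()) {
            for (String token : tf.keySet()) {
                df.merge(token, 1, Integer::sum);
            }
        }
        int numWindows = counts.size();
        Map<Long, List<String>> result = new TreeMap<>();
        for (Map.Entry<Long, Map<String, Integer>> e : counts.entrySet()) {
            Map<String, Integer> tf = e.getValue();
            Comparator<String> byScoreDesc = Comparator
                    .comparingDouble((String tok) -> score(tf.get(tok), df.get(tok), numWindows))
                    .reversed()
                    .thenComparing(Comparator.<String>naturalOrder());
            List<String> top = new ArrayList<>(tf.keySet());
            top.sort(byScoreDesc);
            result.put(e.getKey(), new ArrayList<>(top.subList(0, Math.min(n, top.size()))));
        }
        return result;
    }
}
```

With tweets loaded from the text files into `Tweet` objects, `topTokens(countByWindow(tweets, startMillis), 100)` would give the top 100 tokens per window. Note that with this IDF definition a token that occurs in every window scores 0, and if there is only one window all scores are 0. A Lucene index is not strictly required for this kind of computation, though Lucene's analyzers could replace the naive tokenizer if better tokenization or stop-word handling is needed.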



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source