Category "elasticsearch-performance"

Recommended max number of tokens? (scalability)

I'm using the following ngram tokenizer to process 15,000 documents (and expect that to grow to as many as a million), each with up to 6,000 characters (avg 10