When using the scikit-learn library in Python, I can use the CountVectorizer to create n-grams of a desired length (e.g. 2 words) like so: from sklearn.metrics.
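To make concrete what "n-grams of length 2" means here, this is a minimal pure-Python sketch of word bigram counting. It is not the scikit-learn implementation: the helper name `word_ngrams` and the sample documents are made up for illustration, and the tokenization (lowercasing, keeping tokens of two or more word characters) is an assumption modelled on CountVectorizer's defaults. In scikit-learn itself, the setting that restricts output to 2-word n-grams is `ngram_range=(2, 2)` on `CountVectorizer`.

```python
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word n-grams of `text` as space-joined strings (hypothetical helper)."""
    # Assumption: lowercase the text and keep tokens of 2+ word characters,
    # roughly mirroring CountVectorizer's default tokenization.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    # Slide a window of n tokens across the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative documents, not taken from the original question.
docs = ["the quick brown fox", "the quick red fox"]

# Count every bigram across all documents.
counts = Counter(gram for doc in docs for gram in word_ngrams(doc, 2))
```

Here `counts` maps each bigram to how often it occurs across the documents, which is the same kind of information a count matrix holds in aggregate. A text with fewer than `n` tokens yields no n-grams.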