I have a Spark question. For each entity k in the input, I have a sequence of probabilities p_i, each with an associated value v_i. For example, the data can look like