I have a Spark question, so for the input for each entity k I have a sequence of probability p_i with a value associated v_i, for example the data can look like
apple-business-manager
mocha-phantomjs
mybinder
beat-detection
catch-block
psr-1
sqloledb
mako
tsyringe
socketexception
global-namespace
google-maps-embed
windows-nt
julia-1.7
ios-background-mode
ionic-tabs
ios-app-signing
azure-sql-server-managed-instance
fckeditor
gcc4.7
zipline
sdist
condor
riot
pihole
kdb
sass-loader
common.logging
r-grid
wrk