I have a Spark question. For each entity k, the input is a sequence of probabilities p_i, each with an associated value v_i. For example, the data can look like:
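Here is a minimal plain-Python sketch of that input shape. It assumes the raw input arrives as flat (k, p_i, v_i) rows that are then grouped per entity. The ids and numbers below are made up for illustration and are not real data. In Spark, the same grouping would typically be `groupBy("k")` with `collect_list(struct("p", "v"))` from `pyspark.sql.functions`.

```python
from collections import defaultdict

# Hypothetical flat rows (k, p_i, v_i); ids and numbers are invented for illustration.
rows = [
    ("a", 0.2, 10.0),
    ("a", 0.8, 5.0),
    ("b", 0.5, 1.0),
    ("b", 0.5, 3.0),
]

def group_by_entity(rows):
    """Collect the (p_i, v_i) pairs of each entity k, keeping input order."""
    grouped = defaultdict(list)
    for k, p, v in rows:
        grouped[k].append((p, v))
    return dict(grouped)

per_entity = group_by_entity(rows)
print(per_entity)
# {'a': [(0.2, 10.0), (0.8, 5.0)], 'b': [(0.5, 1.0), (0.5, 3.0)]}
```

One caveat on the Spark version: after a shuffle, `collect_list` does not guarantee element order. If the position of each (p_i, v_i) pair matters, an explicit index column should be carried along and sorted on.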