I have a Spark question. For each entity k, the input is a sequence of probabilities p_i, each with an associated value v_i. For example, the data can look like: