How can I compute a count Morgan fingerprint as a numpy.array?

I would like to use RDKit to generate count Morgan fingerprints and feed them to a scikit-learn model (in Python). However, I don't know how to generate the fingerprint as a numpy array. When I use

from rdkit import Chem
from rdkit.Chem import AllChem
m = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprint(m, 2, useCounts=True)

I get a UIntSparseIntVect that I would need to convert. The only thing I found was cDataStructs (see: http://rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html), but this does not currently support UIntSparseIntVect.



Solution 1:[1]

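import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('c1cccnc1C')
# Hashed count fingerprint, folded to a fixed length of 1024
fp = AllChem.GetHashedMorganFingerprint(m, 2, nBits=1024)
# Dict mapping each non-zero index to its count
fp_dict = fp.GetNonZeroElements()
arr = np.zeros((1024,))
for key, val in fp_dict.items():
    arr[key] = val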

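There seems to be no direct way to get a numpy array, so I build it from the dictionary of non-zero elements. Using the hashed fingerprint keeps every index below nBits, so the array has a fixed length.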
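Since the goal is a scikit-learn model, one array per molecule still has to be stacked into a 2-D feature matrix. Here is a minimal sketch of that step, assuming each molecule's non-zero counts are already available as a dict like the one returned by GetNonZeroElements() above. The helper name counts_to_matrix and the sample dicts are hypothetical, made up for illustration; they are not RDKit output.

```python
import numpy as np

def counts_to_matrix(count_dicts, n_bits=1024):
    """Stack {index: count} dicts into a dense (n_molecules, n_bits) array."""
    X = np.zeros((len(count_dicts), n_bits), dtype=np.uint32)
    for row, counts in enumerate(count_dicts):
        for idx, val in counts.items():
            X[row, idx] = val
    return X

# Hypothetical stand-ins for fp.GetNonZeroElements() of two molecules
example_counts = [{3: 2, 10: 1}, {10: 4, 1023: 1}]
X = counts_to_matrix(example_counts)
```

Each row of X is then one molecule's count fingerprint. n_bits must match the nBits value used when generating the hashed fingerprints, otherwise an index can fall outside the array.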

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 evilolive