RDKit fingerprint

I have 100 polymers and I want to compare their solubility using their fingerprints.

Using RDKit, I get a list of on bits for each polymer, such as [39, 80, 152, 233, 234, 265, 310, 314, 321, 356, 360, 406, 547, 650, 662, 726, 730, 801, 819, 849, 935], but I then ran into this error: "could not convert string to float: "

My first question is: how can I get just one bit for each polymer? And how can I define each bit as a single feature in RDKit?



Solution 1:[1]

Based on your problem, I believe you are using a Morgan fingerprint with radius=2 and fpSize=1024. However, a count fingerprint results in a list of hashed values. If you want to compare fingerprints, I suggest using rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect, documented at #1.
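To make "each bit as a single feature" concrete, here is a minimal sketch that turns the list of on-bit indices from the question into a fixed-length 0/1 vector, one column per bit. It assumes a fingerprint length of 1024 (the value guessed above, not confirmed by the question), and on_bits_to_vector is a hypothetical helper name, not an RDKit function.

```python
import numpy as np

def on_bits_to_vector(on_bits, n_bits=1024):
    """Turn a list of on-bit indices into a fixed-length 0/1 feature vector.

    n_bits=1024 is an assumption; it must match the fingerprint size
    that produced the indices.
    """
    vec = np.zeros(n_bits, dtype=np.uint8)
    vec[list(on_bits)] = 1  # set the listed positions to 1, leave the rest 0
    return vec

# On-bit indices copied from the question.
on_bits = [39, 80, 152, 233, 234, 265, 310, 314, 321, 356, 360,
           406, 547, 650, 662, 726, 730, 801, 819, 849, 935]

features = on_bits_to_vector(on_bits)
```

Stacking one such vector per polymer (for example with np.vstack) gives a purely numeric matrix with one row per polymer. If the "could not convert string to float" error came from passing the index list as text to a model, which is only a guess without the original code, a numeric matrix like this would avoid it.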

If you want to use a count fingerprint, see #2 and search for this passage: "The types of atom pairs and torsions are normal (default), hashed and bit vector (bv). The types of the Morgan fingerprint are bit vector (bv, default) and count vector (count)."

If you want the result as an np.array, you can run bv = GetMorganFingerprintAsBitVect(mol, radius=your_radius, nBits=1024).ToBitString(), then np.frombuffer(bv.encode(), dtype=np.uint8) - 48. The subtraction works because the bit string consists of the characters '0' and '1', whose ASCII codes are 48 and 49.
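The conversion described above can be sketched end to end as follows. The SMILES strings are hypothetical stand-ins for polymer repeat units, morgan_bits is an illustrative helper name, and radius=2 with 1024 bits follows the guess made earlier in this answer. Recent RDKit releases may print a deprecation warning for GetMorganFingerprintAsBitVect, so treat this as a sketch under those assumptions.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Hypothetical inputs: SMILES of repeat units standing in for the polymers.
smiles_list = ["CC(C)C(=O)OC", "CCc1ccccc1", "CCO"]

def morgan_bits(smiles, radius=2, n_bits=1024):
    """Return the Morgan bit-vector fingerprint as a 0/1 uint8 array."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    bv = rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    bit_string = bv.ToBitString()  # e.g. "0010...": one character per bit
    # The characters '0' and '1' have ASCII codes 48 and 49, so subtracting 48 gives 0/1.
    return np.frombuffer(bit_string.encode(), dtype=np.uint8) - 48

# One row per molecule, one column (feature) per bit.
X = np.vstack([morgan_bits(s) for s in smiles_list])
```

Each column of X is then a single binary feature, and X can be paired with the measured solubility values as the target.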

However, I cannot give a more specific explanation or solution without seeing your code, so please provide it for further support. Thank you.

#1: https://www.rdkit.org/docs/source/rdkit.Chem.rdMolDescriptors.html?highlight=getmorganfingerprintasbitvect#rdkit.Chem.rdMolDescriptors.GetMorganFingerprintAsBitVect

#2: https://www.rdkit.org/docs/GettingStartedInPython.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Take Ichiru