Convert a SMILES dataset to graphs

My idea is to build a VAE or a GAN that can generate new drugs, using graphs to represent my molecules. Here is my actual question.

I started the project with a simple pandas DataFrame made up of SMILES strings and various features, like this:

  • CC(=O)Nc1ccc(O)cc1, weight = 151.16, …

  • CC(=O)Oc1ccccc1C(=O)O, weight = 180, …

Is it possible to convert these strings into a graph data format? If so, could you give me some suggestions on how to do it?

Thank you all!



Solution 1:[1]

Yes. Use DGL-LifeSci: it has several functions for converting SMILES to graphs, depending on the kind of graph you want:

https://github.com/awslabs/dgl-lifesci/blob/master/python/dgllife/utils/mol_to_graph.py

DeepChem also has similar functionality in its built-in featurizers: https://github.com/deepchem/deepchem/blob/master/deepchem/feat/molecule_featurizers/mol_graph_conv_featurizer.py

Going straight from SMILES to a graph can sometimes be confusing. Wherever a function takes a mol rather than a SMILES string (e.g. mol_to_graph), you can first convert the SMILES to an RDKit mol with the MolFromSmiles function in rdkit.Chem:

from rdkit import Chem

mol = Chem.MolFromSmiles('Cc1ccccc1')  # returns None if the SMILES cannot be parsed
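If you would rather not depend on a graph library at first, a minimal sketch with RDKit alone can read the graph straight off the mol and store it next to the other features in the DataFrame. The helper `smiles_to_graph` and the column names `smiles`, `weight` and `graph` are hypothetical, and using only the atomic number as a node feature is a deliberate simplification:

```python
import pandas as pd
from rdkit import Chem


def smiles_to_graph(smiles):
    """Hypothetical helper: return (nodes, edges) for a SMILES string, or None if invalid.

    nodes: atomic number of each heavy atom, in RDKit atom-index order.
    edges: (begin_idx, end_idx, bond_order), stored once per direction.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for unparsable SMILES
        return None
    nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        order = bond.GetBondTypeAsDouble()  # 1.0 single, 1.5 aromatic, 2.0 double, 3.0 triple
        edges.append((i, j, order))
        edges.append((j, i, order))
    return nodes, edges


# The two rows from the question (other feature columns omitted)
df = pd.DataFrame({
    "smiles": ["CC(=O)Nc1ccc(O)cc1", "CC(=O)Oc1ccccc1C(=O)O"],
    "weight": [151.16, 180],
})
df["graph"] = df["smiles"].apply(smiles_to_graph)

nodes, edges = df.loc[0, "graph"]
print(nodes)
print(len(edges))
```

Rows whose SMILES fail to parse end up with `None` in the `graph` column, so they can be filtered out with `df[df["graph"].notna()]` before training.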

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: mrw