torch_pgn.data package

Submodules

torch_pgn.data.FingerprintDataset module

class torch_pgn.data.FingerprintDataset.FingerprintDataset(args, transform=None, pre_transform=None)

Bases: InMemoryDataset

download(): Downloads the dataset to the self.raw_dir folder.

process(): Processes the dataset to the self.processed_dir folder.

property processed_file_names: The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

torch_pgn.data.ProximityGraphDataset module

class torch_pgn.data.ProximityGraphDataset.ProximityGraphDataset(args, transform=None, pre_transform=None)

Bases: InMemoryDataset

download(): Downloads the dataset to the self.raw_dir folder.

parse_pre_transforms()

process(): Processes the dataset to the self.processed_dir folder.

property processed_file_names: The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

torch_pgn.data.data_utils module

class torch_pgn.data.data_utils.LigandOnlyPretransform

Bases: object

Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and removes any interaction edges and protein atoms from the graph.
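The filtering step described above can be illustrated with a minimal sketch. It assumes a simplified graph representation that is not taken from torch_pgn: a list of per-atom ligand flags and a list of index-pair edges. The helper name `ligand_only` is hypothetical.

```python
def ligand_only(is_ligand, edges):
    """Keep only ligand atoms and the edges that connect two ligand atoms.

    Hypothetical helper: `is_ligand` is a list of booleans (one per atom) and
    `edges` is a list of (u, v) index pairs. The real transform operates on
    the library's own graph objects.
    """
    # Map each surviving (ligand) atom's old index to its new, compacted index.
    new_index = {}
    for old, keep in enumerate(is_ligand):
        if keep:
            new_index[old] = len(new_index)
    # Drop every edge that touches a protein atom, then renumber the endpoints.
    kept_edges = [
        (new_index[u], new_index[v])
        for u, v in edges
        if u in new_index and v in new_index
    ]
    return len(new_index), kept_edges


# Atoms 0-2 are ligand, atoms 3-4 are protein; (2, 3) is an interaction edge.
flags = [True, True, True, False, False]
edge_list = [(0, 1), (1, 2), (2, 3), (3, 4)]
n_atoms, ligand_edges = ligand_only(flags, edge_list)
print(n_atoms, ligand_edges)
```

Renumbering the surviving atoms matters because edge indices must stay valid after protein atoms are dropped.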

class torch_pgn.data.data_utils.OneHotTransform

Bases: object

Transform object for ProximityGraphDataset with an atomic number feature. Transforms the integer feature into a one-hot encoding that uses the atomic number (up to 100) as an index. TODO: make it compatible with other atomic number index positions.
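As a rough illustration of the encoding, the sketch below maps an atomic number to a one-hot vector. The function name `one_hot_atomic_number` is hypothetical, and the vector length of `max_z + 1` (so that the atomic number itself is a valid index) is an assumption rather than something confirmed by the library.

```python
def one_hot_atomic_number(z, max_z=100):
    """Return a one-hot list with a 1.0 at index `z`.

    Assumption: the vector has max_z + 1 entries so that z == max_z is a
    valid index; the real transform may size or offset the vector differently.
    """
    if not 0 <= z <= max_z:
        raise ValueError(f"atomic number {z} is outside 0..{max_z}")
    vec = [0.0] * (max_z + 1)
    vec[z] = 1.0
    return vec


carbon = one_hot_atomic_number(6)
print(len(carbon), carbon.index(1.0))
```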

class torch_pgn.data.data_utils.RandomLigandTranslationTransform(mean=None, std=None, scale=0.05)

Bases: object

TODO: Not applied. Will go back later to implement; need to figure out D-MPNN complications. Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and updates any interaction edges/protein atoms in the graph.

class torch_pgn.data.data_utils.RemoveProximityEdgesPretransform

Bases: object

Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and removes any interaction edges/protein atoms from the graph.

torch_pgn.data.data_utils.format_data_directory(args)

torch_pgn.data.data_utils.normalize_distance(dataset, args=None, index=None, mean=None, std=None, yield_stats=True): Normalizes the training target to have mean 0 and standard deviation 1.

:param dataset: Dataset to normalize the targets for.
:param index: Index into dataset selecting the subset on which the normalization statistics are calculated.
:param mean: External mean to use for normalization (e.g. for test set normalization).
:param std: External standard deviation to use for normalization (e.g. for test set normalization).
:param yield_stats: If True, yield [dataset, (mean, std)]; if False, yield just [dataset].
:return: A dataset with normalized targets.

torch_pgn.data.data_utils.normalize_targets(dataset, index=None, mean=None, std=None, yield_stats=True): Normalizes the training target to have mean 0 and standard deviation 1.

:param dataset: Dataset to normalize the targets for.
:param index: Index into dataset selecting the subset on which the normalization statistics are calculated.
:param mean: External mean to use for normalization (e.g. for test set normalization).
:param std: External standard deviation to use for normalization (e.g. for test set normalization).
:param yield_stats: If True, yield [dataset, (mean, std)]; if False, yield just [dataset].
:return: A dataset with normalized targets.
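The normalization contract can be sketched on a plain list of floats. The helper `normalize` below is hypothetical and only mirrors the documented parameters; it is not the library's implementation. In particular, the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def normalize(targets, index=None, mean=None, std=None):
    """Z-score `targets`, returning (normalized, (mean, std)).

    Hypothetical stand-in for the documented behavior: statistics come from
    the subset selected by `index` unless external `mean`/`std` are supplied.
    Assumption: population standard deviation (pstdev).
    """
    subset = targets if index is None else [targets[i] for i in index]
    if mean is None:
        mean = fmean(subset)
    if std is None:
        std = pstdev(subset)
    normalized = [(t - mean) / std for t in targets]
    return normalized, (mean, std)


# Fit the statistics on the training targets ...
train_norm, (mu, sigma) = normalize([1.0, 3.0])
# ... then reuse them for the test targets instead of recomputing.
test_norm, _ = normalize([5.0], mean=mu, std=sigma)
print(train_norm, test_norm)
```

Passing the training statistics into the test-set call keeps both splits on the same scale, which is the purpose of the external `mean` and `std` parameters.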

torch_pgn.data.data_utils.parse_transforms(transforms)

torch_pgn.data.data_utils.split_test_graphs(graph_path, test_list): Splits out the core set paths and the train set from the graphs output by generate_all_graphs.

:param graph_path: The path where the graph generation placed all graphs.
:param test_list: The path to the CSV containing the graphs.
:return: None

torch_pgn.data.dmpnn_utils module

A lot of this code is repurposed from https://github.com/chemprop. This code is used in order to make the proximity graph dataset compatible with using the D-MPNN with edge messages.

class torch_pgn.data.dmpnn_utils.BatchProxGraph(mol_graphs, atom_fdim, bond_fdim)

Bases: object

A BatchProxGraph represents the graph structure and featurization of a batch of molecules. A BatchProxGraph contains the attributes of a ProxGraph plus:

* atom_fdim: The dimensionality of the atom feature vector.
* bond_fdim: The dimensionality of the bond feature vector (technically the combined atom/bond features).
* a_scope: A list of tuples indicating the start and end atom indices for each molecule.
* b_scope: A list of tuples indicating the start and end bond indices for each molecule.
* max_num_bonds: The maximum number of bonds neighboring an atom in this batch.
* b2b: (Optional) A mapping from a bond index to incoming bond indices.
* a2a: (Optional) A mapping from an atom index to neighboring atom indices.

get_a2a(): Computes (if necessary) and returns a mapping from each atom index to all neighboring atom indices. :return: A PyTorch tensor containing the mapping from each bond index to all the incoming bond indices.

get_b2b(): Computes (if necessary) and returns a mapping from each bond index to all the incoming bond indices. :return: A PyTorch tensor containing the mapping from each bond index to all the incoming bond indices.

get_components(atom_messages: bool = False)

Returns the components of the BatchProxGraph. The returned components are, in order:

* f_atoms
* f_bonds
* a2b
* b2a
* b2revb
* a_scope
* b_scope

:param atom_messages: Whether to use atom messages instead of bond messages. This changes the bond feature vector to contain only bond features rather than both atom and bond features.
:return: A tuple containing PyTorch tensors with the atom features, bond features, graph structure, and scope of the atoms and bonds (i.e., the indices of the molecules they belong to).

class torch_pgn.data.dmpnn_utils.MolGraphTransform

Bases: object

class torch_pgn.data.dmpnn_utils.ProxGraph(mol, atom_descriptors: Optional[ndarray] = None)

Bases: object

A ProxGraph represents the graph structure and featurization of a single molecule. A ProxGraph computes the following attributes:

* n_atoms: The number of atoms in the molecule.
* n_bonds: The number of bonds in the molecule.
* f_atoms: A mapping from an atom index to a list of atom features.
* f_bonds: A mapping from a bond index to a list of bond features.
* a2b: A mapping from an atom index to a list of incoming bond indices.
* b2a: A mapping from a bond index to the index of the atom the bond originates from.
* b2revb: A mapping from a bond index to the index of the reverse bond.
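The index mappings listed above can be built from an undirected edge list as in the following sketch. It follows the directed-bond bookkeeping used by chemprop's MolGraph, from which this module is adapted. The helper name `build_directed_bonds` and the edge-list input are assumptions for illustration; feature vectors are omitted.

```python
def build_directed_bonds(n_atoms, undirected_edges):
    """Build a2b, b2a, and b2revb from a list of undirected (a1, a2) edges.

    Each undirected edge becomes two directed bonds with consecutive indices.
    Hypothetical helper; atom and bond features are left out of the sketch.
    """
    a2b = [[] for _ in range(n_atoms)]  # atom -> incoming bond indices
    b2a = []                            # bond -> atom it originates from
    b2revb = []                         # bond -> index of its reverse bond
    for a1, a2 in undirected_edges:
        b1 = len(b2a)  # directed bond a1 -> a2
        b2 = b1 + 1    # directed bond a2 -> a1
        a2b[a2].append(b1)
        b2a.append(a1)
        a2b[a1].append(b2)
        b2a.append(a2)
        b2revb.extend([b2, b1])
    return a2b, b2a, b2revb


# Path graph 0 - 1 - 2.
a2b, b2a, b2revb = build_directed_bonds(3, [(0, 1), (1, 2)])
print(a2b, b2a, b2revb)
```

With this layout, n_bonds is `len(b2a)`, which is twice the number of undirected edges.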

apply_dist_norm(dist_column, mean, std)

remove_dist_norm(dist_column, mean, std)

torch_pgn.data.dmpnn_utils.prox2graph(mols) → BatchProxGraph: Converts a directory of raw proximity graphs into a BatchProxGraph. :param directory: Directory containing the raw proximity graph (PG) inputs. :return: A BatchProxGraph containing the combined molecular graph for the molecules.

torch_pgn.data.load_data module

torch_pgn.data.load_data.load_proximity_graphs(args): #TODO :param args: :return:
