torch_pgn.data package
Submodules
torch_pgn.data.FingerprintDataset module
- class torch_pgn.data.FingerprintDataset.FingerprintDataset(args, transform=None, pre_transform=None)
Bases: InMemoryDataset
- download()
Downloads the dataset to the self.raw_dir folder.
- process()
Processes the dataset to the self.processed_dir folder.
- property processed_file_names
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
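The skip behaviour described by raw_file_names and processed_file_names can be illustrated with a minimal sketch. It only mimics the check on plain folders and is not the InMemoryDataset implementation; the helper names and the file names graphs.csv and data.pt are hypothetical.

```python
import os
import tempfile


def files_present(folder, file_names):
    """Return True when every expected file already exists in folder."""
    return all(os.path.exists(os.path.join(folder, name)) for name in file_names)


def steps_to_run(raw_dir, processed_dir, raw_file_names, processed_file_names):
    """List the steps that cannot be skipped (illustrative helper only)."""
    steps = []
    if not files_present(raw_dir, raw_file_names):
        steps.append("download")
    if not files_present(processed_dir, processed_file_names):
        steps.append("process")
    return steps


with tempfile.TemporaryDirectory() as root:
    raw_dir = os.path.join(root, "raw")
    processed_dir = os.path.join(root, "processed")
    os.makedirs(raw_dir)
    os.makedirs(processed_dir)

    # Nothing exists yet, so neither step can be skipped.
    first_run = steps_to_run(raw_dir, processed_dir, ["graphs.csv"], ["data.pt"])

    # Once the raw file is present, only processing remains.
    open(os.path.join(raw_dir, "graphs.csv"), "w").close()
    second_run = steps_to_run(raw_dir, processed_dir, ["graphs.csv"], ["data.pt"])

print(first_run)
print(second_run)
```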
torch_pgn.data.ProximityGraphDataset module
- class torch_pgn.data.ProximityGraphDataset.ProximityGraphDataset(args, transform=None, pre_transform=None)
Bases: InMemoryDataset
- download()
Downloads the dataset to the self.raw_dir folder.
- parse_pre_transforms()
- process()
Processes the dataset to the self.processed_dir folder.
- property processed_file_names
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
torch_pgn.data.data_utils module
- class torch_pgn.data.data_utils.LigandOnlyPretransform
Bases: object
Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and removes any interaction edges/protein atoms from the graph.
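As a rough illustration of what removing protein atoms and interaction edges involves, the sketch below filters a graph stored as plain Python lists. The per-node is_ligand flag and the function name are assumptions made for this example; how the real transform identifies ligand atoms is not documented here.

```python
def ligand_only(node_features, is_ligand, edges):
    """Keep ligand nodes and the edges whose two endpoints are both ligand atoms."""
    # Map each kept node's old index to its new, compacted index.
    new_index = {}
    for old, keep in enumerate(is_ligand):
        if keep:
            new_index[old] = len(new_index)

    kept_features = [node_features[old] for old in new_index]
    # Any edge touching a removed (protein) atom is dropped; the rest are reindexed.
    kept_edges = [
        (new_index[u], new_index[v])
        for u, v in edges
        if u in new_index and v in new_index
    ]
    return kept_features, kept_edges


# Hypothetical graph: nodes 0 and 2 are ligand atoms, node 1 is a protein atom.
features = [[6], [7], [8]]
is_ligand = [True, False, True]
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]

kept_features, kept_edges = ligand_only(features, is_ligand, edges)
print(kept_features)
print(kept_edges)
```

Reindexing matters because edge endpoints refer to node positions, which shift once protein atoms are dropped.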
- class torch_pgn.data.data_utils.OneHotTransform
Bases: object
Transform object for ProximityGraphDataset with an atomic number feature. Transforms the integer feature into a one-hot encoding that uses the atomic number (up to 100) as an index. TODO: make it compatible with other atomic number index positions.
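A minimal sketch of the encoding idea follows. It assumes the atomic number sits in column 0 of a node's feature list and that the one-hot vector has one slot per index from 0 to 100; both are assumptions for illustration, and the function name is hypothetical.

```python
MAX_ATOMIC_NUMBER = 100  # upper bound mentioned in the description above


def one_hot_atomic_number(features, column=0, max_z=MAX_ATOMIC_NUMBER):
    """Replace the integer atomic number at `column` with a one-hot vector."""
    z = int(features[column])
    if not 0 <= z <= max_z:
        raise ValueError(f"atomic number {z} is outside 0..{max_z}")
    one_hot = [0] * (max_z + 1)  # assumed length: one slot per index 0..max_z
    one_hot[z] = 1
    # Features before and after the atomic number column are kept unchanged.
    return list(features[:column]) + one_hot + list(features[column + 1:])


carbon = [6, 0.5]  # hypothetical node features: atomic number, then one extra value
encoded = one_hot_atomic_number(carbon)
print(len(encoded), encoded.index(1), encoded[-1])
```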
- class torch_pgn.data.data_utils.RandomLigandTranslationTransform(mean=None, std=None, scale=0.05)
Bases: object
TODO: Not applied. Will go back later to implement; need to figure out DMPNN complications. Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and updates any interaction edges/protein atoms from the graph.
- class torch_pgn.data.data_utils.RemoveProximityEdgesPretransform
Bases: object
Transform object for ProximityGraphDataset to be applied in the pre-transform step. Takes a proximity graph and removes any interaction edges/protein atoms from the graph.
- torch_pgn.data.data_utils.format_data_directory(args)
- torch_pgn.data.data_utils.normalize_distance(dataset, args=None, index=None, mean=None, std=None, yield_stats=True)
Normalizes the training target to have mean 0 and stddev 1.
:param dataset: dataset to normalize the targets for
:param index: index into dataset for the subset of dataset to calculate statistics on for normalization
:param mean: external mean to use for normalization (e.g. for test set normalization)
:param std: external stddev to use for normalization (e.g. for test set normalization)
:param yield_stats: toggle to yield [dataset, (mean, std)] if True or just [dataset] if False
:return: a dataset with normalized targets
- torch_pgn.data.data_utils.normalize_targets(dataset, index=None, mean=None, std=None, yield_stats=True)
Normalizes the training target to have mean 0 and stddev 1.
:param dataset: dataset to normalize the targets for
:param index: index into dataset for the subset of dataset to calculate statistics on for normalization
:param mean: external mean to use for normalization (e.g. for test set normalization)
:param std: external stddev to use for normalization (e.g. for test set normalization)
:param yield_stats: toggle to yield [dataset, (mean, std)] if True or just [dataset] if False
:return: a dataset with normalized targets
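The interplay of index, mean, std and yield_stats can be sketched on a plain list of target values. This is not the torch_pgn implementation: the real function operates on a dataset object, and the use of the population standard deviation here is an assumption.

```python
from statistics import fmean, pstdev


def normalize(targets, index=None, mean=None, std=None, yield_stats=True):
    """Z-score targets, computing statistics on a subset unless they are supplied."""
    subset = targets if index is None else [targets[i] for i in index]
    if mean is None:
        mean = fmean(subset)
    if std is None:
        std = pstdev(subset)  # population stddev; an assumption of this sketch
    normalized = [(t - mean) / std for t in targets]
    return (normalized, (mean, std)) if yield_stats else normalized


# Statistics come from the first two entries only (e.g. the training indices).
all_targets = [1.0, 3.0, 10.0]
train_norm, (mean, std) = normalize(all_targets, index=[0, 1])

# Reuse the training statistics for held-out targets.
test_norm = normalize([2.0, 4.0], mean=mean, std=std, yield_stats=False)

print(train_norm, mean, std)
print(test_norm)
```

Passing the training mean and std when normalizing held-out data keeps both sets on the same scale.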
- torch_pgn.data.data_utils.parse_transforms(transforms)
- torch_pgn.data.data_utils.split_test_graphs(graph_path, test_list)
Splits out the core set paths and the train set from the graphs output from generate_all_graphs.
:param graph_path: The path where the graph generation placed all graphs
:param test_list: The path to the csv containing the graphs
:return: None
torch_pgn.data.dmpnn_utils module
Much of this code is repurposed from chemprop (https://github.com/chemprop). It makes the proximity graph dataset compatible with the D-MPNN using edge messages.
- class torch_pgn.data.dmpnn_utils.BatchProxGraph(mol_graphs, atom_fdim, bond_fdim)
Bases: object
A BatchMolGraph represents the graph structure and featurization of a batch of molecules. A BatchMolGraph contains the attributes of a MolGraph plus:
* atom_fdim: The dimensionality of the atom feature vector.
* bond_fdim: The dimensionality of the bond feature vector (technically the combined atom/bond features).
* a_scope: A list of tuples indicating the start and end atom indices for each molecule.
* b_scope: A list of tuples indicating the start and end bond indices for each molecule.
* max_num_bonds: The maximum number of bonds neighboring an atom in this batch.
* b2b: (Optional) A mapping from a bond index to incoming bond indices.
* a2a: (Optional) A mapping from an atom index to neighboring atom indices.
- get_a2a()
Computes (if necessary) and returns a mapping from each atom index to all neighboring atom indices.
:return: A PyTorch tensor containing the mapping from each atom index to all the neighboring atom indices.
- get_b2b()
Computes (if necessary) and returns a mapping from each bond index to all the incoming bond indices.
:return: A PyTorch tensor containing the mapping from each bond index to all the incoming bond indices.
- get_components(atom_messages: bool = False)
Returns the components of the BatchMolGraph. The returned components are, in order:
* f_atoms
* f_bonds
* a2b
* b2a
* b2revb
* a_scope
* b_scope
:param atom_messages: Whether to use atom messages instead of bond messages. This changes the bond feature vector to contain only bond features rather than both atom and bond features.
- Returns:
A tuple containing PyTorch tensors with the atom features, bond features, graph structure, and scope of the atoms and bonds (i.e., the indices of the molecules they belong to).
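The b2b and a2a mappings can be derived from a2b, b2a and b2revb. The sketch below shows one way to do so with plain lists; it assumes that the incoming bonds of a bond are the bonds entering its origin atom, excluding its own reverse bond. The class itself returns PyTorch tensors, and any padding conventions it uses are not reproduced here.

```python
def compute_b2b(a2b, b2a, b2revb):
    """For each directed bond, list the bonds entering its origin atom,
    leaving out the bond's own reverse."""
    return [
        [incoming for incoming in a2b[b2a[bond]] if incoming != b2revb[bond]]
        for bond in range(len(b2a))
    ]


def compute_a2a(a2b, b2a):
    """For each atom, list the origin atoms of its incoming bonds (its neighbors)."""
    return [[b2a[bond] for bond in incoming] for incoming in a2b]


# Path molecule 0 - 1 - 2 with directed bonds:
# bond 0: 0 -> 1, bond 1: 1 -> 0, bond 2: 1 -> 2, bond 3: 2 -> 1
a2b = [[1], [0, 3], [2]]   # incoming bonds per atom
b2a = [0, 1, 1, 2]         # origin atom per bond
b2revb = [1, 0, 3, 2]      # reverse bond per bond

b2b = compute_b2b(a2b, b2a, b2revb)
a2a = compute_a2a(a2b, b2a)
print(b2b)
print(a2a)
```

Excluding the reverse bond keeps a message from being passed straight back along the bond it arrived on.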
- class torch_pgn.data.dmpnn_utils.MolGraphTransform
Bases: object
- class torch_pgn.data.dmpnn_utils.ProxGraph(mol, atom_descriptors: Optional[ndarray] = None)
Bases: object
A ProxGraph represents the graph structure and featurization of a single molecule. A ProxGraph computes the following attributes:
* n_atoms: The number of atoms in the molecule.
* n_bonds: The number of bonds in the molecule.
* f_atoms: A mapping from an atom index to a list of atom features.
* f_bonds: A mapping from a bond index to a list of bond features.
* a2b: A mapping from an atom index to a list of incoming bond indices.
* b2a: A mapping from a bond index to the index of the atom the bond originates from.
* b2revb: A mapping from a bond index to the index of the reverse bond.
- apply_dist_norm(dist_column, mean, std)
- remove_dist_norm(dist_column, mean, std)
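To make the a2b, b2a and b2revb attributes concrete, the sketch below builds them from a list of undirected bonds by storing each bond as two directed bonds. The function name and input format are hypothetical; feature construction (f_atoms, f_bonds) is left out.

```python
def build_index_maps(n_atoms, undirected_bonds):
    """Build a2b, b2a and b2revb, storing each undirected bond as two directed bonds."""
    a2b = [[] for _ in range(n_atoms)]  # atom index -> incoming bond indices
    b2a = []                            # bond index -> atom the bond originates from
    b2revb = []                         # bond index -> index of the reverse bond
    for a1, a2 in undirected_bonds:
        forward = len(b2a)       # directed bond a1 -> a2
        backward = forward + 1   # directed bond a2 -> a1
        b2a.extend([a1, a2])
        a2b[a2].append(forward)   # the forward bond enters a2
        a2b[a1].append(backward)  # the backward bond enters a1
        b2revb.extend([backward, forward])
    return a2b, b2a, b2revb


# Path molecule 0 - 1 - 2.
a2b, b2a, b2revb = build_index_maps(3, [(0, 1), (1, 2)])
n_bonds = len(b2a)  # number of directed bonds in this sketch
print(a2b)
print(b2a)
print(b2revb)
```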
- torch_pgn.data.dmpnn_utils.prox2graph(mols) → BatchProxGraph
Converts a directory of raw proximity graphs into a BatchMolGraph.
:param directory: directory containing the raw PG inputs
:return: A BatchMolGraph containing the combined molecular graph for the molecules.
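Combining several graphs into one batch comes down to offsetting atom and bond indices and recording where each graph starts and ends. The sketch below assumes each graph is a dict of plain lists and that scope tuples are (start, end) with an exclusive end; the batch class may use different conventions (for example padding entries), so treat this as an illustration only.

```python
def batch_graphs(graphs):
    """Concatenate per-graph index maps, shifting indices by running offsets."""
    a_scope, b_scope = [], []
    a2b, b2a, b2revb = [], [], []
    n_atoms = n_bonds = 0
    for g in graphs:
        # Bond indices shift by the bonds seen so far, atom indices by the atoms.
        a2b.extend([[b + n_bonds for b in incoming] for incoming in g["a2b"]])
        b2a.extend(a + n_atoms for a in g["b2a"])
        b2revb.extend(b + n_bonds for b in g["b2revb"])
        a_scope.append((n_atoms, n_atoms + len(g["a2b"])))
        b_scope.append((n_bonds, n_bonds + len(g["b2a"])))
        n_atoms += len(g["a2b"])
        n_bonds += len(g["b2a"])
    max_num_bonds = max((len(incoming) for incoming in a2b), default=0)
    return {
        "a2b": a2b, "b2a": b2a, "b2revb": b2revb,
        "a_scope": a_scope, "b_scope": b_scope,
        "max_num_bonds": max_num_bonds,
    }


# Two hypothetical graphs: a single bond 0 - 1, and the path 0 - 1 - 2.
pair = {"a2b": [[1], [0]], "b2a": [0, 1], "b2revb": [1, 0]}
path = {"a2b": [[1], [0, 3], [2]], "b2a": [0, 1, 1, 2], "b2revb": [1, 0, 3, 2]}

batch = batch_graphs([pair, path])
print(batch["a_scope"], batch["b_scope"])
print(batch["a2b"])
```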
torch_pgn.data.load_data module
- torch_pgn.data.load_data.load_proximity_graphs(args)
TODO
:param args:
:return: