Open Protein Benchmark is an ongoing project to standardize research on machine learning for proteins.
We extracted several structure-aware protein property prediction tasks from the previous literature. The benchmark datasets are grouped into node-level prediction tasks (binding site prediction) and graph-level prediction tasks (physicochemical property prediction).
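
The difference between the two task types comes down to the shape of the labels. The sketch below uses plain Python dictionaries with made-up values; the field names `num_nodes` and `y` are hypothetical and do not reflect the package's actual data schema.

```python
# A toy protein graph with four nodes (all values are made up for illustration).
num_nodes = 4

# Node-level task (e.g. binding site prediction): one label per node.
node_level_sample = {"num_nodes": num_nodes, "y": [0, 1, 1, 0]}

# Graph-level task (e.g. physicochemical property prediction): one label per graph.
graph_level_sample = {"num_nodes": num_nodes, "y": [7.25]}

# The label count follows the prediction level.
assert len(node_level_sample["y"]) == node_level_sample["num_nodes"]
assert len(graph_level_sample["y"]) == 1
```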
We also aim to provide convenient functions for new developers who are not familiar with protein data formats such as .pdb and .mol2.
For self-supervised learning on the large collections of structures available in the Protein Data Bank (PDB) or the AlphaFold Database, we built preprocessing code that converts protein structures into graph objects supported by PyTorch Geometric.
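
As a rough illustration of the first step of such preprocessing, the sketch below reads ATOM records from PDB-format text using the fixed column layout of the format and collects one node per atom. The function name `parse_pdb_atoms`, the dictionary keys, and the example coordinates are hypothetical and not part of this package. Wrapping the result in a PyTorch Geometric object is left out to keep the example dependency-free.

```python
def parse_pdb_atoms(pdb_text):
    """Collect one node per ATOM record from PDB-format text (hypothetical helper)."""
    nodes = []
    for line in pdb_text.splitlines():
        # Only ATOM records are kept; HETATM (ligands, water) and others are skipped.
        if not line.startswith("ATOM"):
            continue
        nodes.append({
            "atom_name": line[12:16].strip(),  # PDB columns 13-16
            "res_name": line[17:20].strip(),   # PDB columns 18-20
            "chain": line[21],                 # PDB column 22
            "res_seq": int(line[22:26]),       # PDB columns 23-26
            # x, y, z in angstroms: PDB columns 31-38, 39-46, 47-54
            "pos": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return nodes


# Made-up records, laid out according to the fixed-width PDB columns.
EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00",
    "ATOM      3  N   GLY A   2      13.559   8.230  -4.906  1.00  0.00",
    "HETATM    4  O   HOH A 101      20.000  20.000  20.000  1.00  0.00",
])

nodes = parse_pdb_atoms(EXAMPLE_PDB)
```

The resulting `nodes` list holds the three ATOM records. The coordinates and residue identifiers are what a later graph-construction step would consume.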
Open Protein Benchmark directly inherits the functions and user interfaces of Open Graph Benchmark, which has become a standard framework for machine learning on molecules.
Specifically, we carry the concept of chemical bonding over to proteins, so that an edge connects two nodes if and only if a non-covalent bond exists between them.
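
To make this edge rule concrete, here is a minimal sketch that approximates non-covalent contacts with a simple distance cutoff between atoms of different residues. This is a stand-in, not the criterion the package uses: the function name `contact_edges`, the 4.0 Å cutoff, and the exclusion of same-residue pairs are all assumptions made for illustration, and the sketch does not filter out covalent peptide bonds between consecutive residues. Edges are returned as two parallel index lists, mirroring the COO `edge_index` layout used by PyTorch Geometric.

```python
import math


def contact_edges(positions, residue_ids, cutoff=4.0):
    """Return undirected contact edges as [sources, targets] (hypothetical helper).

    Assumption: two atoms are "in contact" when they belong to different
    residues and lie within `cutoff` angstroms of each other.
    """
    src, dst = [], []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs inside one residue, which are mostly covalently linked.
            if residue_ids[i] == residue_ids[j]:
                continue
            if math.dist(positions[i], positions[j]) <= cutoff:
                # Store both directions so the graph is undirected.
                src += [i, j]
                dst += [j, i]
    return [src, dst]


# Toy coordinates in angstroms; atoms 0 and 1 share a residue.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
residue_ids = [1, 1, 2, 3]
edge_index = contact_edges(positions, residue_ids)
```

The pairwise loop is quadratic in the number of atoms, which is fine for a small example. For full-size structures a spatial index such as a k-d tree or cell list would be the usual replacement.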
We aim to make Open Protein Benchmark a standard benchmark for proteins, just as MoleculeNet has become a global standard for molecules.
Since this is a growing project, please be aware that the package may not work at the current stage.