Open Protein Benchmark

Protein benchmark for machine learning.

Open Protein Benchmark is a work-in-progress project that aims to standardize machine learning research on proteins.

We extracted several structure-aware protein property prediction tasks from previous literature. These benchmark datasets fall into two categories: node-level prediction (e.g., binding site prediction) and graph-level prediction (e.g., physicochemical property prediction).
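The practical difference between the two categories is the shape of the target: a node-level task carries one label per node, while a graph-level task carries one label for the whole protein. The sketch below illustrates this with two hypothetical record types (`NodeLevelSample`, `GraphLevelSample`); they are not part of the package, and the actual label encoding used by the benchmark may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeLevelSample:
    """Hypothetical record for a node-level task such as binding site prediction."""
    num_nodes: int
    node_labels: List[int]  # one label per node, e.g. 1 = binding-site residue, 0 = other

    def __post_init__(self) -> None:
        # A node-level target must line up with the nodes of the graph.
        if len(self.node_labels) != self.num_nodes:
            raise ValueError("node-level tasks need exactly one label per node")


@dataclass
class GraphLevelSample:
    """Hypothetical record for a graph-level task such as physicochemical property prediction."""
    num_nodes: int
    graph_label: float  # a single target value for the whole protein


binding = NodeLevelSample(num_nodes=5, node_labels=[0, 1, 1, 0, 0])
prop = GraphLevelSample(num_nodes=5, graph_label=-1.2)
```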

Open Protein Benchmark provides user-friendly code for studying downstream tasks on proteins by converting protein structures into PyTorch Geometric data structures.
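This README does not document the conversion API itself. As a rough sketch, assuming residue-level nodes placed at C-alpha atoms and one-hot residue-type node features (both are assumptions, not the package's documented choices), a PDB file can be read using the fixed-column ATOM record layout and turned into the `x` and `pos` fields that PyTorch Geometric's `Data` object accepts. `pdb_to_graph_fields`, `AMINO_ACIDS` and `AA_INDEX` are hypothetical names.

```python
from typing import Dict, List, Tuple

# One-hot vocabulary over the 20 standard amino acids (an illustrative choice).
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]
AA_INDEX = {name: i for i, name in enumerate(AMINO_ACIDS)}


def pdb_to_graph_fields(pdb_text: str) -> Dict[str, list]:
    """Build residue-level node features and positions from C-alpha ATOM records.

    Returns plain lists under the attribute names PyG's Data uses (x, pos).
    HETATM records (ligands, modified residues) are ignored in this sketch.
    """
    x: List[List[float]] = []
    pos: List[Tuple[float, float, float]] = []
    seen = set()  # (chain ID, residue number, insertion code) already added
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], line[22:26], line[26])
        if key in seen:
            # Skips alternate locations and, in multi-model files, later MODELs.
            continue
        seen.add(key)
        one_hot = [0.0] * len(AMINO_ACIDS)
        res_name = line[17:20]
        if res_name in AA_INDEX:
            one_hot[AA_INDEX[res_name]] = 1.0  # non-standard residues stay all-zero
        x.append(one_hot)
        # Columns 31-38, 39-46, 47-54 (1-based) hold the x, y, z coordinates.
        pos.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return {"x": x, "pos": pos}
```

With PyTorch and PyTorch Geometric installed, the result can be wrapped as `Data(x=torch.tensor(fields["x"]), pos=torch.tensor(fields["pos"]))`. Edges are not built in this sketch.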

We also aim to provide convenient functions for new developers who are not familiar with protein data formats such as .pdb and .mol2.
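As an illustration of the kind of helper meant here, the sketch below reads a Tripos .mol2 file, whose `@<TRIPOS>ATOM` lines list `atom_id atom_name x y z atom_type ...` and whose `@<TRIPOS>BOND` lines list `bond_id origin_atom_id target_atom_id bond_type`. `read_mol2` is a hypothetical name, and the sketch assumes a single molecule per file.

```python
from typing import Dict, List, Tuple


def read_mol2(text: str) -> Dict[str, list]:
    """Read atoms and bonds from a single-molecule .mol2 string into graph fields."""
    atoms: List[Tuple[str, Tuple[float, float, float]]] = []
    bonds: List[Tuple[int, int, str]] = []
    id_to_index: Dict[int, int] = {}  # mol2 atom IDs are 1-based and may have gaps
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        fields = line.split()
        if section == "ATOM":
            id_to_index[int(fields[0])] = len(atoms)
            xyz = (float(fields[2]), float(fields[3]), float(fields[4]))
            atoms.append((fields[5], xyz))  # (SYBYL atom type, coordinates)
        elif section == "BOND":
            src, dst = id_to_index[int(fields[1])], id_to_index[int(fields[2])]
            bonds.append((src, dst, fields[3]))  # bond type: "1", "2", "ar", "am", ...
    # Each undirected bond becomes two directed edges, following PyG's edge_index convention.
    edge_index = [[s for s, _, _ in bonds] + [d for _, d, _ in bonds],
                  [d for _, d, _ in bonds] + [s for s, _, _ in bonds]]
    return {"atom_types": [t for t, _ in atoms],
            "pos": [p for _, p in atoms],
            "edge_index": edge_index,
            "bond_types": [b for _, _, b in bonds] * 2}  # aligned with edge_index columns
```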

For self-supervised learning on the large collections of structures provided by the Protein Data Bank (PDB) or the AlphaFold Database, we provide preprocessing code that converts protein structures into graph objects supported by PyTorch Geometric.
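A minimal sketch of such a bulk preprocessing loop is shown below, assuming one structure per file and a user-supplied `convert` function (for instance a PDB-to-graph parser). `preprocess_directory` is a hypothetical name, and JSON is used only to keep the example dependency-free; a PyTorch Geometric pipeline would more likely store tensors, e.g. with `torch.save`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def preprocess_directory(src_dir: str, dst_dir: str,
                         convert: Callable[[str], Dict],
                         pattern: str = "*.pdb") -> Dict[str, List[str]]:
    """Convert every matching structure file in src_dir and cache each result as JSON.

    Files whose conversion raises a parse error are recorded and skipped instead of
    aborting the run; files that already have a cached output are not recomputed.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    report: Dict[str, List[str]] = {"converted": [], "cached": [], "failed": []}
    for path in sorted(Path(src_dir).glob(pattern)):
        target = out / (path.stem + ".json")
        if target.exists():
            report["cached"].append(path.name)
            continue
        try:
            graph = convert(path.read_text())
        except (ValueError, IndexError):  # typical errors from malformed records
            report["failed"].append(path.name)
            continue
        target.write_text(json.dumps(graph))
        report["converted"].append(path.name)
    return report
```

Skipping files that are already cached lets an interrupted run over a large database resume without redoing finished work.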

By using Open Protein Benchmark, users can easily bring bulk structure databases into their own projects.

Open Protein Benchmark directly inherits the functions and user interface of Open Graph Benchmark (OGB), which has become a standard framework for machine learning on graphs, including molecular graphs.
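In OGB, a dataset exposes `get_idx_split()`, which returns a dict of `train`/`valid`/`test` indices, and an `Evaluator` scores predictions passed as an `input_dict` with `y_true` and `y_pred`. The sketch below mimics that pattern with hypothetical `ToyProteinDataset` and `ToyEvaluator` classes; it is not this package's API. The seeded random split and the MAE metric are placeholder assumptions.

```python
import random
from typing import Dict, List, Sequence


class ToyProteinDataset:
    """Hypothetical dataset following the OGB pattern: indexing plus get_idx_split()."""

    def __init__(self, graphs: Sequence[dict], seed: int = 0,
                 frac_train: float = 0.8, frac_valid: float = 0.1) -> None:
        self.graphs = list(graphs)
        self.seed = seed
        self.frac_train = frac_train
        self.frac_valid = frac_valid

    def __len__(self) -> int:
        return len(self.graphs)

    def __getitem__(self, idx: int) -> dict:
        return self.graphs[idx]

    def get_idx_split(self) -> Dict[str, List[int]]:
        # Placeholder: a seeded random split. Protein benchmarks often split by
        # sequence or structural similarity instead, to limit train/test leakage.
        idx = list(range(len(self.graphs)))
        random.Random(self.seed).shuffle(idx)
        n_train = int(self.frac_train * len(idx))
        n_valid = int(self.frac_valid * len(idx))
        return {"train": sorted(idx[:n_train]),
                "valid": sorted(idx[n_train:n_train + n_valid]),
                "test": sorted(idx[n_train + n_valid:])}


class ToyEvaluator:
    """Hypothetical evaluator using OGB's input_dict convention (y_true / y_pred)."""

    def eval(self, input_dict: Dict[str, Sequence[float]]) -> Dict[str, float]:
        y_true, y_pred = input_dict["y_true"], input_dict["y_pred"]
        if len(y_true) != len(y_pred) or not y_true:
            raise ValueError("y_true and y_pred must be non-empty and of equal length")
        mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
        return {"mae": mae}
```

Keeping the split inside the dataset object, as OGB does, means every model trained on the benchmark sees the same train/valid/test partition.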

Specifically, we apply the concept of chemical bonding to proteins: an edge connects two nodes if and only if a non-covalent bond exists between them.
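The README does not state which non-covalent interactions are detected or how. The sketch below shows one simple geometric approach at the atom level, assuming two illustrative rules (an N/O pair within 3.5 Å as a hydrogen-bond proxy, a C/C pair within 4.0 Å as a hydrophobic-contact proxy) and treating pairs closer than 2.0 Å as covalently bonded. All cutoffs and the names `interaction_type` and `noncovalent_edges` are assumptions, not the benchmark's actual rules.

```python
import math
from itertools import combinations
from typing import List, Tuple

Atom = Tuple[str, int, Tuple[float, float, float]]  # (element, residue index, xyz in angstroms)

# Illustrative cutoffs in angstroms (assumptions, not the benchmark's actual rules).
COVALENT_MAX = 2.0     # closer pairs are treated as covalently bonded and skipped
HBOND_MAX = 3.5        # N/O ... N/O distance as a hydrogen-bond proxy
HYDROPHOBIC_MAX = 4.0  # C ... C distance as a hydrophobic-contact proxy


def interaction_type(a: Atom, b: Atom) -> str:
    """Return 'hbond', 'hydrophobic', or '' when no non-covalent rule applies."""
    (ea, ra, pa), (eb, rb, pb) = a, b
    if ra == rb:
        return ""  # ignore contacts inside a single residue
    d = math.dist(pa, pb)
    if d < COVALENT_MAX:
        return ""
    if ea in ("N", "O") and eb in ("N", "O") and d <= HBOND_MAX:
        return "hbond"
    if ea == "C" and eb == "C" and d <= HYDROPHOBIC_MAX:
        return "hydrophobic"
    return ""


def noncovalent_edges(atoms: List[Atom]) -> Tuple[List[List[int]], List[str]]:
    """Build a COO edge_index (both directions) plus one type label per directed edge."""
    src: List[int] = []
    dst: List[int] = []
    types: List[str] = []
    # O(n^2) pair scan for clarity; large proteins would need a spatial grid or KD-tree.
    for i, j in combinations(range(len(atoms)), 2):
        kind = interaction_type(atoms[i], atoms[j])
        if kind:
            src += [i, j]
            dst += [j, i]
            types += [kind, kind]
    return [src, dst], types
```

Storing the interaction type per edge leaves room to use it later as an edge feature, while the edge set itself contains only non-covalent contacts.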

We aim to make Open Protein Benchmark a standard benchmark for proteins, in the same way that MoleculeNet has become a widely adopted standard for molecules.

Since this is a growing project, please be aware that the package may not work at its current stage.