New machine learning model can help identify new antibiotics
Advanced computational methods and machine learning could help reduce the high cost and complexity of antibiotic drug discovery. The COMBINE project has developed an Antimicrobial Knowledge Graph, together with models that can scan compound libraries to identify new antimicrobial compounds. The database and a machine learning model built on it are now published in the Journal of Chemical Information and Modeling.

Gadiya Y, Genilloud O, Bilitewski U, et al. Predicting Antimicrobial Class Specificity of Small Molecules Using Machine Learning. Journal of Chemical Information and Modeling. Published online February 23, 2025. doi:10.1021/ACS.JCIM.4C02347.
Antimicrobial resistance (AMR) is rapidly depleting the number of useful antibiotics. At the same time, the development of new therapeutics has slowed down. As part of the COMBINE project, researchers at Fraunhofer ITMP in Hamburg have led work within the AMR Accelerator programme to develop a machine learning (ML) model trained on publicly available in vitro data. The AntiMicrobial Knowledge Graph (AntiMicrobial-KG) is a knowledge graph containing the first-ever aggregated MIC (minimum inhibitory concentration) dataset in a FAIR-compliant format. According to Yojana Gadiya, who coordinated this effort, the models are customizable and open source. They are also transparent, making it possible to decipher the physicochemical properties required for antibacterial and antifungal activity, which supports chemical optimization in antimicrobial drug discovery.
Selecting compounds based on model predictions could eventually decrease the experimental cost of antimicrobial screening. Because the model is pre-trained, it can be used to build a compound library from scratch with chemicals that are more likely to show activity in vitro. Instead of running high-throughput screens on entire compound libraries to find an active compound, researchers can use the model to identify a subset of compounds that are more likely to be active. The authors showed that applying the ML model in early compound screening, to filter any compound library into a smaller subset with a higher probability of activity, can substantially reduce screening costs.
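The filtering step described above can be illustrated with a minimal sketch. It is not the authors' code: the `Prediction` record, the compound IDs, the probability values, and the 0.5 threshold are all hypothetical, chosen only to show how predicted activity probabilities could reduce a library to a smaller screening subset.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    compound_id: str   # hypothetical library identifier
    p_active: float    # hypothetical model output: probability of in vitro activity


def select_for_screening(predictions, threshold=0.5, max_compounds=None):
    """Keep compounds at or above the threshold, ranked by predicted probability."""
    selected = [p for p in predictions if p.p_active >= threshold]
    selected.sort(key=lambda p: p.p_active, reverse=True)
    if max_compounds is not None:
        selected = selected[:max_compounds]
    return selected


def fraction_avoided(library_size, selected_size):
    """Fraction of the library that would no longer need to be assayed."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return 1 - selected_size / library_size


# Made-up predictions for a five-compound library.
library = [
    Prediction("CPD-001", 0.91),
    Prediction("CPD-002", 0.12),
    Prediction("CPD-003", 0.67),
    Prediction("CPD-004", 0.05),
    Prediction("CPD-005", 0.48),
]

subset = select_for_screening(library, threshold=0.5)
print([p.compound_id for p in subset])                         # ['CPD-001', 'CPD-003']
print(f"{fraction_avoided(len(library), len(subset)):.0%}")    # 60%
```

In practice the threshold trades the number of compounds sent to the bench against the risk of discarding true actives, so it would need to be chosen against validation data rather than fixed in advance.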
The model is trained on the AntiMicrobial-KG database, an aggregation of public bioassay datasets in a FAIR-compliant format, which has yielded the largest training set used so far for ML applied to antimicrobial activity. The model can also be retrained with the published code on external datasets that drug developers already have access to, expanding the applicability and confidence of its predictions for research and development.
According to Yojana Gadiya, the AntiMicrobial-KG can help accelerate research in AMR drug discovery efforts by filtering out compounds that have lower chances of being active. The model has already identified some important physicochemical properties that can help identify compounds with antimicrobial properties.
“For example, it seems that hydrogen bond donors play an essential role in rewarding active from inactive compounds. This may in turn confirm that cell permeability plays a deep role in dictating antimicrobial activity independently from targets”, says Yojana Gadiya.
About the AntiMicrobial-KG
The AntiMicrobial-KG was developed within the framework of the Innovative Medicines Initiative (IMI) AMR Accelerator programme’s Scientific Interest Group on Machine Learning, coordinated by the COMBINE project. The AntiMicrobial-KG is a repository for collecting and visualizing public in vitro antimicrobial assays. Using these data, the AMR Accelerator projects have built ML models that efficiently scan compound libraries for compounds with the potential to exhibit antimicrobial activity. Using Random Forest and XGBoost algorithms, classification models were developed for four classes of microorganisms (Gram-positive, Gram-negative, acid-fast, and fungi) that outperform existing models. The ML model was tested on the EU-OPENSCREEN screening library to demonstrate its applicability in a laboratory setting.
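As a rough illustration of this kind of classifier, the sketch below trains a Random Forest on four classes. It is not the AntiMicrobial-KG pipeline: it assumes the scikit-learn library, the features are synthetic stand-ins for real compound descriptors, and the mapping of numeric labels to the four microorganism classes is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Arbitrary mapping of labels 0..3 to the four classes named in the text.
CLASS_NAMES = ["Gram-positive", "Gram-negative", "acid-fast", "fungi"]

# Synthetic data standing in for physicochemical descriptors (assumption:
# real models would use descriptors computed from chemical structures).
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Stratified split so every class appears in both training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# One probability per class for each held-out "compound".
proba = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)

for name, p in zip(CLASS_NAMES, proba[0]):
    print(f"{name}: {p:.2f}")
print(f"held-out accuracy: {accuracy:.2f}")

# Relative importance of each synthetic feature in the fitted forest.
print(model.feature_importances_.round(2))
```

The `feature_importances_` attribute gives one simple view of which inputs a forest relies on, which is the sort of transparency that lets a model point to properties such as hydrogen bond donor counts.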
The AntiMicrobial-KG uses Python scripts for model training, exploratory analysis, and KG generation, available from GitHub. The data collected with the AntiMicrobial-KG and the models are available on Zenodo. The AntiMicrobial-KG website at SciLifeLab allows users to search the database and use the pre-trained models for compound activity prediction.
About COMBINE
COMBINE has a coordinating role in the AMR Accelerator, and has a scientific mission aiming to improve 1) the design and analysis of clinical trials, and 2) animal infection model reproducibility and translation to clinical efficacy. COMBINE has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under Grant Agreement No 853967.
About the AMR Accelerator
The AMR Accelerator programme was launched in 2019 with the aim of accelerating the development of medicines for patients suffering from infections with drug-resistant Mycobacterium tuberculosis, nontuberculous mycobacteria (NTM), and Gram-negative bacteria, and of building capability for antibiotics research and development. The programme is funded by the Innovative Medicines Initiative (IMI). The AMR Accelerator programme includes nine projects: AB-Direct, COMBINE, ERA4TB, GNA NOW, PriMAVeRa, RespiriNTM & RespiriTB, TRIC-TB, and UNITE4TB. Together, the projects have a €479 million budget. The 98 partners represent key stakeholders from academia, industry, small- and medium-sized companies, patient organizations, regulators, and Health Technology Assessment.
More information & contact:
Yojana Gadiya, Fraunhofer Institute for Translational Medicine and Pharmacology ITMP
E-mail: yojana.gadiya@itmp.fraunhofer.de
Philip Gribbon, Fraunhofer Institute for Translational Medicine and Pharmacology ITMP
E-mail: philip.gribbon@itmp.fraunhofer.de