TxGemma

TxGemma is a collection of machine learning (ML) models that generate predictions, classifications, or text from therapeutics-related data. You can use the models as a starting point for building AI models for therapeutic tasks (for example, classifying molecules by property or toxicity), which requires less data and less compute than training a model from scratch.

TxGemma has been trained on a diverse set of instruction pairs drawn from the Therapeutics Data Commons (TDC).

For details about how to use the model and how it was trained, see the TxGemma model card.

Common Use Cases

The following sections present some common use cases for the model. You're free to pursue any use case, as long as it adheres to the Health AI Developer Foundations terms of use.

Therapeutic Prediction

TxGemma can be used to make predictions for a range of therapeutic tasks drawn from TDC datasets. These predictions take the form of classification (multiple choice), regression (numeric value), or generation. Examples include:

Classification

Given a drug SMILES string, the model can:

  • Predict the drug's toxicity.
  • Predict whether the drug can cross the blood-brain barrier.
  • Predict whether the drug is active against a certain protein, for example a choline transporter.
  • Predict whether the drug is a carcinogen.
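
As a rough illustration of the classification flow above, the following sketch fills a SMILES string into a multiple-choice prompt and maps the model's letter answer back to a label. The template wording, the "{Drug SMILES}" placeholder, and the helper names build_prompt and parse_choice are assumptions made for this example; the prompts TxGemma actually expects are described in the model card. No model is called here, and the parsed reply is a made-up string.

```python
import re
from typing import Dict

# Hypothetical template and placeholder. The real task prompts are
# documented with the model; this wording is only illustrative.
PLACEHOLDER = "{Drug SMILES}"
BBB_TEMPLATE = (
    "Question: Does the following molecule cross the blood-brain barrier?\n"
    "Drug SMILES: {Drug SMILES}\n"
    "(A) It does not cross  (B) It crosses\n"
    "Answer:"
)
BBB_CHOICES = {"A": "does not cross", "B": "crosses"}


def build_prompt(template: str, smiles: str) -> str:
    """Insert a SMILES string into a task template."""
    smiles = smiles.strip()
    if not smiles:
        raise ValueError("SMILES string must not be empty")
    return template.replace(PLACEHOLDER, smiles)


def parse_choice(model_output: str, choices: Dict[str, str]) -> str:
    """Map a reply such as '(B)' or 'b' to its label, or raise ValueError."""
    match = re.fullmatch(r"\s*\(?([A-Za-z])\)?\s*", model_output)
    if match is None or match.group(1).upper() not in choices:
        raise ValueError(f"Unrecognized answer: {model_output!r}")
    return choices[match.group(1).upper()]


prompt = build_prompt(BBB_TEMPLATE, "CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(prompt)
# "(B)" is an example reply string, not a real prediction.
print(parse_choice("(B)", BBB_CHOICES))
```

Rejecting any reply that is not a single choice letter keeps malformed generations from being silently counted as a class.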

Regression

The model can be used to predict numeric values; for example:

  • Given a drug SMILES string, predict the lipophilicity.
  • Given a drug SMILES string and a cell line description, predict the drug sensitivity level.
  • Given the target amino acid sequence and compound SMILES string, predict their binding affinity.
  • Given a disease description and the amino acid sequence of a gene, predict their association.
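
For regression tasks like those above, the generated text has to be converted to a number before it can be used downstream. The sketch below shows one way to do that. It assumes the reply contains a single numeric answer, and the rescale helper assumes, hypothetically, that a task reports its answer on a fixed score scale that must be mapped back to physical units. Whether a given task does this, and with which bounds, is not specified here; check the model card before relying on any particular range.

```python
import re
from typing import Tuple

# Matches integers, decimals, and scientific notation.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def parse_numeric(model_output: str) -> float:
    """Return the first number found in the model's reply."""
    match = _NUMBER.search(model_output)
    if match is None:
        raise ValueError(f"No numeric answer in: {model_output!r}")
    return float(match.group())


def rescale(score: float,
            score_range: Tuple[float, float],
            value_range: Tuple[float, float]) -> float:
    """Linearly map a score from score_range onto value_range.

    Both ranges are assumptions supplied by the caller.
    """
    lo_s, hi_s = score_range
    lo_v, hi_v = value_range
    if hi_s == lo_s:
        raise ValueError("score_range must have nonzero width")
    fraction = (score - lo_s) / (hi_s - lo_s)
    return lo_v + fraction * (hi_v - lo_v)


score = parse_numeric("Answer: 500")  # example reply, not a real prediction
# Hypothetical bounds chosen only to demonstrate the arithmetic.
print(rescale(score, (0.0, 1000.0), (-2.0, 4.0)))
```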

Generation

  • Given a product, the model can generate the reactant set.

Conversational Abilities

TxGemma-Chat models also have conversational abilities, allowing users to converse with them in natural language, as with general LLMs. Importantly, users can ask TxGemma to explain the reasoning behind its own therapeutic predictions.
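
A conversation of this kind is commonly represented as a list of role/content messages, the structure consumed by chat templates in libraries such as Hugging Face Transformers. The sketch below builds a short history in which the user follows up on a prediction by asking for the reasoning. The add_turn helper and the message text are illustrative assumptions, and no model is called.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more turn, enforcing user/assistant alternation."""
    if role not in ("user", "assistant"):
        raise ValueError(f"Unknown role: {role!r}")
    expected = "user" if not history or history[-1]["role"] == "assistant" else "assistant"
    if role != expected:
        raise ValueError(f"Expected a {expected!r} turn, got {role!r}")
    return history + [{"role": role, "content": content}]


history: List[Message] = []
history = add_turn(history, "user",
                   "Does CC(=O)Oc1ccccc1C(=O)O cross the blood-brain barrier?")
# Placeholder reply; in practice this text comes from the model.
history = add_turn(history, "assistant", "(placeholder model answer)")
history = add_turn(history, "user",
                   "Explain the reasoning behind that prediction.")

for message in history:
    print(f"{message['role']}: {message['content']}")
```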

Agentic Orchestration

TxGemma can be used as a tool within an agentic system. For example, TxGemma prompts can be packaged as tools that an agent calls as part of a more complex reasoning task, such as tools that provide toxicity, mutagenicity, or phase 1 trial viability predictions.

TxGemma's chat capabilities can also be used to help understand the basis of predictions, which can guide the agent's reasoning toward alternative approaches. These TxGemma tools can be combined with other tools, such as search or molecule, gene, and protein lookups, to provide more flexibility. An example Colab notebook shows how this can be achieved as part of the Agentic-Tx example in our preprint.
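
To make the tool idea concrete, here is a minimal sketch of wrapping prompt-based predictors as named tools that an agent loop could dispatch to. It is not the Agentic-Tx implementation. The Tool class, make_prediction_tool, call_tool, the templates, and the stub_generate function (which stands in for a real TxGemma call) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # shown to the agent so it can choose a tool
    run: Callable[[str], str]


def make_prediction_tool(name: str, description: str, template: str,
                         generate: Callable[[str], str]) -> Tool:
    """Wrap a prompt template and a text-generation function as a tool."""
    def run(smiles: str) -> str:
        prompt = template.replace("{Drug SMILES}", smiles)
        return generate(prompt).strip()
    return Tool(name, description, run)


def stub_generate(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same answer.
    return " (A) "


# Hypothetical templates; real task prompts are described in the model card.
TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        make_prediction_tool("toxicity", "Predict whether a molecule is toxic.",
                             "Is this molecule toxic? {Drug SMILES}", stub_generate),
        make_prediction_tool("mutagenicity", "Predict whether a molecule is mutagenic.",
                             "Is this molecule mutagenic? {Drug SMILES}", stub_generate),
    )
}


def call_tool(tools: Dict[str, Tool], name: str, argument: str) -> str:
    """Dispatch an agent's tool request by name."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    return tools[name].run(argument)


print(call_tool(TOOLS, "toxicity", "CCO"))
```

Keeping the generation function as a parameter means the same registry can be exercised with a stub in tests and with a real model in deployment.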

Fine-tuning

TxGemma can be fine-tuned to improve performance on the tasks it was trained on, or to add new tasks to its repertoire. To fine-tune TxGemma, you must have a dataset formatted for instruction tuning.

For an example of how to fine-tune TxGemma, including the format required of the dataset, see the TxGemma fine-tuning notebook in Colab.
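
As a rough picture of what "formatted for instruction tuning" can mean, the sketch below turns labeled rows into prompt/response pairs and serializes them as JSON Lines. The field names "prompt" and "response", the template, and the example labels are assumptions for illustration only; the fine-tuning notebook defines the format that is actually required.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Hypothetical template; replace it with the prompt for your own task.
TEMPLATE = "Is the following molecule toxic?\nDrug SMILES: {Drug SMILES}\nAnswer:"


def to_instruction_records(rows: Iterable[Tuple[str, str]],
                           template: str) -> List[Dict[str, str]]:
    """Convert (smiles, label) rows into prompt/response pairs."""
    return [
        {"prompt": template.replace("{Drug SMILES}", smiles), "response": label}
        for smiles, label in rows
    ]


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


# Made-up labels, used only to show the data layout.
rows = [("CCO", "(A)"), ("c1ccccc1", "(B)")]
records = to_instruction_records(rows, TEMPLATE)
print(to_jsonl(records))
```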

Next Steps