The AskCPG project by Qmed Asia was developed in collaboration with the Malaysian Health Technology Assessment Section (MaHTAS) to make the nation's Clinical Practice Guidelines (CPGs) accessible through an AI-powered Retrieval-Augmented Generation (RAG) system. The platform allows clinicians and medical professionals to query the CPGs conversationally, receiving evidence-based responses grounded in official guidelines.
The Challenge: moving beyond text-only queries
While the existing text-based RAG engine excelled at resolving clinical queries, it couldn't interpret medical images, which are a critical component of clinical reasoning. Many guideline inquiries are inherently visual, requiring imagery to determine disease stages and management pathways. This text-only limitation reduced the platform's utility in visually driven specialties and hindered its adoption in imaging-heavy clinical settings.
Integrating visual reasoning into the RAG framework introduced major computational and operational challenges. Most existing vision-language models (VLMs) were too large for efficient self-hosting, lacked medical domain alignment, or generated unreliable, non-factual image captions. Closed-box API solutions offered more capabilities but came with high recurring inference costs and required complicated data privacy governance and mutual agreements with the API provider.
The Solution: fine-tuning MedGemma for image captioning
To give AskCPG 'eyes' without overloading the system, Qmed Asia fine-tuned the MedGemma-4B-IT model and used it as a translator. MedGemma looks at the medical images and translates their content into structured text captions detailing image type, anatomy, key findings, and clinical significance. AskCPG then reads these captions alongside standard clinical guidelines to deliver final, grounded answers that account for both the text and the visuals in the query.
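As a rough illustration of the caption-as-translator idea, the sketch below models a structured caption with the four fields named above and folds it into the text query that a text-only RAG engine would receive. The class name `ImageCaption`, the function `build_retrieval_query`, the text layout, and the example values are all hypothetical; the actual AskCPG schema and prompt format are not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageCaption:
    """Hypothetical container for the four caption fields described in the text."""
    image_type: str
    anatomy: str
    key_findings: List[str] = field(default_factory=list)
    clinical_significance: str = ""

    def to_text(self) -> str:
        # Flatten the structured caption into plain text that a text-only
        # retriever and generator can consume.
        findings = "; ".join(self.key_findings) or "none reported"
        return "\n".join([
            f"Image type: {self.image_type}",
            f"Anatomy: {self.anatomy}",
            f"Key findings: {findings}",
            f"Clinical significance: {self.clinical_significance}",
        ])


def build_retrieval_query(question: str, caption: ImageCaption) -> str:
    # Assumed layout: the user's question first, then the caption as extra context.
    return f"{question.strip()}\n\n[Image caption]\n{caption.to_text()}"


# Invented example values, for illustration only (not real model output).
caption = ImageCaption(
    image_type="fundus photograph",
    anatomy="retina",
    key_findings=["microaneurysms", "dot-blot haemorrhages"],
    clinical_significance="findings may warrant grading against the relevant guideline",
)
question = "How should this patient be followed up?"
query = build_retrieval_query(question, caption)
print(query)
```

Keeping the caption as plain text means the existing text-based retrieval stack does not need to change; only the query it receives becomes richer.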

This diagram illustrates the fine-tuning and evaluation workflow for MedGemma. For full methodology and evaluation results, refer to the technical report. At a high level, the process consists of four key steps:
Data Preparation: The process begins with gathering a diverse clinical dataset consisting of chest X-ray images (n = 112,120), fundus images (n = 5,660), and dermatology images (n = 10,000). To ensure a balanced and statistically sound representation across these varying medical specialties, the framework applies stratified random sampling.
Knowledge Distillation: In the second stage, the sampled dataset is processed through a teacher model (OpenAI GPT-5) to extract high-level clinical intelligence. The distillation process yields a paired dataset for the 1,676 selected images, in which each image has a ground truth label and a corresponding medical caption.
Fine-Tuning: With the distilled dataset prepared, the framework introduces the student model (MedGemma-4B-IT). To adapt the model efficiently, the system uses Quantized Low-Rank Adaptation (QLoRA) fine-tuning. This parameter-efficient training method keeps the quantized base weights frozen and trains small low-rank matrices injected into the model architecture, transforming the base student model into a specialized fine-tuned variant.
Model Evaluation: The final step is the validation process to ensure clinical reliability and factual accuracy. The fine-tuned MedGemma-4B-IT model generates text captions from medical image inputs, and its performance is evaluated across two critical metrics. First, it undergoes a ground truth prediction test to see how well it identifies clinical features. Second, it is subjected to factual and contextual reliability assessment to measure faithfulness and correctness; see the technical report for full evaluation data.
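The stratified random sampling mentioned in the data preparation step can be sketched as follows. This is a minimal illustration, not the team's pipeline: the function name `stratified_sample`, the record layout, the toy pool sizes, and the per-stratum sample sizes are all assumptions, and the allocation actually used to arrive at the 1,676 images is not stated here.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, sizes, seed=0):
    """Draw sizes[s] records uniformly at random, without replacement, from each stratum s."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_of(rec)].append(rec)

    sample = []
    for stratum, n in sizes.items():
        pool = by_stratum.get(stratum, [])
        if n > len(pool):
            raise ValueError(f"stratum {stratum!r} has only {len(pool)} records, need {n}")
        sample.extend(rng.sample(pool, n))
    return sample


# Toy pools standing in for the three image collections (sizes are made up).
records = [
    {"id": f"{modality}-{i}", "modality": modality}
    for modality, count in (("chest_xray", 50), ("fundus", 20), ("dermatology", 30))
    for i in range(count)
]

# Hypothetical equal allocation per modality, so the large chest X-ray pool
# does not dominate the sampled set.
sizes = {"chest_xray": 5, "fundus": 5, "dermatology": 5}
picked = stratified_sample(records, lambda r: r["modality"], sizes, seed=42)
print(Counter(r["modality"] for r in picked))
```

Sampling within each stratum separately is what keeps smaller collections represented; a single random draw over the pooled images would be dominated by the largest collection.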
Real-world impact and clinician adoption
"MedGemma gave our RAG system the ability to 'see' — turning visual findings into structured, factual insights while keeping data private and costs low. It delivers large-model performance in a practical, privacy-focused way for real-world healthcare."
— Goh Man Fye, Chief AI Officer, Qmed Asia
Running a fine-tuned MedGemma model on Qmed Asia’s infrastructure preserved data privacy, reduced inference costs by over 60%, and integrated seamlessly into the existing RAG stack. This milestone positions AskCPG as one of Malaysia’s first multimodal, AI-powered clinical guideline assistants.
Since the multimodal version of AskCPG launched in July 2025, adoption has grown steadily, and the platform now serves over a thousand healthcare professionals across Malaysia each month. The integration of the fine-tuned MedGemma model introduced multimodal search capabilities (combining text and image inputs), which boosted user engagement by 30%, particularly for queries in ophthalmology, dermatology, and respiratory care.
Pilot feedback has been highly positive, with users describing the system as “closer to a clinical assistant” due to its ability to incorporate and contextualise medical images. Internal evaluations also showed improved reliability and stronger alignment with clinical guidelines.
Looking ahead: localized models and hospital deployments
Qmed Asia continues to enhance MedGemma’s role in AskCPG by fine-tuning it on broader and more diverse datasets to improve performance across domains such as radiology, pathology, and primary care. The team also plans to train MedGemma on Malaysia’s Clinical Practice Guidelines corpus to develop a localized Malaysia Medical LLM, aligned with national healthcare standards and linguistic context.
Following the example of this implementation, future deployments will also prioritize sovereignty through self-hosted infrastructure within hospitals and health institutions. This approach simplifies compliance with medical data governance while enabling secure, high-performance multimodal AI for Malaysia’s healthcare ecosystem.