LangExtract
LangExtract is a general-purpose Natural Language Processing (NLP) library
designed to structure and ground information extracted from unstructured text
using Large Language Models (LLMs). It is particularly well-suited for tasks
such as information extraction, entity recognition, and content structuring,
making it useful across multiple healthcare use cases. It supports integration
with a variety of LLMs, including Gemini, enabling users to create
versatile information extraction workflows.
Radiology report structuring with RadExtract
One example use case of LangExtract is RadExtract, a specialized
implementation for radiology reports that uses Gemini 2.5.
LangExtract allows users to define structured prompt templates for grounded
information extraction, ensuring outputs maintain clear and precise references
to the original source text.
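To make "grounding" concrete, here is a minimal sketch of the idea: each extracted value is stored together with the character span it occupies in the source text, so a reader can trace it back. This is not LangExtract's API. The `GroundedExtraction` type, the `ground` helper, and the sample report are hypothetical, and exact substring matching stands in for whatever alignment the library actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroundedExtraction:
    """Hypothetical record: an extracted value plus its source span."""
    label: str
    text: str
    start: Optional[int]  # inclusive character offset, None if not found
    end: Optional[int]    # exclusive character offset, None if not found


def ground(source: str, label: str, extracted: str) -> GroundedExtraction:
    """Attach a character span to `extracted` using an exact substring match.

    Assumption: the model returned text copied verbatim from `source`.
    If it did not, the extraction is kept but marked as ungrounded.
    """
    idx = source.find(extracted)
    if idx < 0:
        return GroundedExtraction(label, extracted, None, None)
    return GroundedExtraction(label, extracted, idx, idx + len(extracted))


# Hypothetical report text and model outputs, for illustration only.
report = "No pleural effusion. Mild cardiomegaly is present."
grounded = ground(report, "finding", "Mild cardiomegaly")
ungrounded = ground(report, "finding", "enlarged heart")
```

Slicing the source with the stored offsets (`report[grounded.start:grounded.end]`) returns the extracted text, which is the property that lets a viewer highlight the supporting passage. An extraction with no span, like `ungrounded` here, can be flagged for review instead of being silently trusted.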
RadExtract transforms unstructured radiology narratives into clear, structured
sections with section headers, improving the readability and clinical utility of
the data. For an example of report structuring with grounding, see the
RadExtract demo on Hugging Face.
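The structuring step can be pictured roughly as follows. This sketch assumes that each sentence of a report has already been assigned a section header and a start offset in the source. The `structure_report` function, the section names, and the sample spans are hypothetical and are not taken from RadExtract itself.

```python
from typing import Dict, List, Tuple

# (section header, sentence text, start offset in the source report)
Span = Tuple[str, str, int]


def structure_report(spans: List[Span]) -> str:
    """Group labeled sentences under section headers, in source order."""
    sections: Dict[str, List[Span]] = {}
    # Sort by offset so each section lists sentences as they appeared.
    for span in sorted(spans, key=lambda s: s[2]):
        sections.setdefault(span[0], []).append(span)

    lines: List[str] = []
    for header, members in sections.items():
        lines.append(f"{header.upper()}:")
        # Keep the offset next to each sentence so it stays traceable.
        lines.extend(f"- {text} [char {start}]" for _, text, start in members)
    return "\n".join(lines)


# Hypothetical labeled sentences, deliberately out of order.
spans = [
    ("Impression", "Mild cardiomegaly.", 52),
    ("Findings", "No pleural effusion.", 0),
    ("Findings", "Heart size is mildly enlarged.", 21),
]
structured = structure_report(spans)
```

Carrying the offset through to the rendered output is one simple way to preserve the link between each structured line and the original narrative.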
RadExtract is one of many use cases where the LangExtract library could be
useful. We encourage you to explore other use cases!
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-07-30 UTC.