Case study: Understand user intent

This page describes how to enhance an AI agent's fundamental ability to understand and respond effectively to user needs, starting with user intent classification.

Understanding user intent is a core capability that improves agent effectiveness, involving:

  • Challenge: the agent must accurately determine the user's goal, such as getting information, changing settings, or saving data.
  • Insight and approach: Gemini excels at classification tasks. The key is crafting clear prompts that enable Gemini to recognize user intents such as information retrieval, configuration changes, note-taking, or source attribution. This classification is vital for accessibility because it unifies settings within a natural language interface.
  • Example prompt structure: a prompt might ask Gemini to categorize a user query: "Analyze the following user query: USER_QUERY. Classify the user's primary intent from: INFORMATION_SEEKING, CONFIGURATION_CHANGE, NOTE_TAKING, ADDING_SOURCE, OTHER. Output only the category name."

The following sections demonstrate how LLMs like Gemini can understand user intent by classifying queries into categories such as information requests or app setting changes. This capability enhances accessibility by integrating settings into a unified natural language interface, reducing user effort.
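As a minimal sketch of the prompt structure described above, a helper can assemble the single-label classification prompt from a category list. The `buildIntentPrompt` function and `INTENT_CATEGORIES` constant here are illustrative names, not part of any SDK:

```typescript
// Illustrative category list matching the example prompt structure above.
const INTENT_CATEGORIES = [
    "INFORMATION_SEEKING",
    "CONFIGURATION_CHANGE",
    "NOTE_TAKING",
    "ADDING_SOURCE",
    "OTHER",
] as const;

// Hypothetical helper: builds the single-label classification prompt
// by substituting the user's query and listing the allowed categories.
function buildIntentPrompt(userQuery: string): string {
    return `Analyze the following user query: ${userQuery}. ` +
        `Classify the user's primary intent from: ${INTENT_CATEGORIES.join(", ")}. ` +
        `Output only the category name.`;
}
```

This prompt-only approach returns a bare category name; the sections below go further and request structured JSON so the result is machine-parseable.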

Define classification categories

First, define the QueryClassification enum to list user intents, for example, INFORMATION_SEEKING. Then, populate the classifications array with a description and examples for each intent, helping the language model understand each category. The ClassificationResponse interface specifies the expected output, including a predicted category and its reason.

import { GoogleGenerativeAI, GenerativeModel, SchemaType } from "@google/generative-ai";

enum QueryClassification {
    INFORMATION_SEEKING = "INFORMATION_SEEKING",
    UPDATE_APP_SETTINGS = "UPDATE_APP_SETTINGS",
    OTHER = "OTHER",
}

interface ClassificationDefinition {
    name: QueryClassification;
    description: string;
    examples: string[];
}

const classifications: ClassificationDefinition[] = [
    {
        name: QueryClassification.INFORMATION_SEEKING,
        description: "The user is asking for factual information.",
        examples: ["What is the capital of France?"],
    },
    {
        name: QueryClassification.UPDATE_APP_SETTINGS,
        description: "The user wants to modify app settings like dark mode or font size.",
        examples: ["Turn on dark mode."],
    },
    {
        name: QueryClassification.OTHER,
        description: "The user's query does not fit other categories.",
        examples: ["Hello."],
    },
];

interface ClassificationResponse {
    response: Array<{ queryClassification: QueryClassification; reason: string }>;
}
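Because JSON.parse returns untyped data, a small runtime check can guard the cast to ClassificationResponse before the result is used downstream. The following isClassificationResponse type guard is an illustrative addition, not part of the SDK; the enum and interface are repeated so the sketch is self-contained:

```typescript
// Mirrors the declarations above so this sketch stands alone.
enum QueryClassification {
    INFORMATION_SEEKING = "INFORMATION_SEEKING",
    UPDATE_APP_SETTINGS = "UPDATE_APP_SETTINGS",
    OTHER = "OTHER",
}

interface ClassificationResponse {
    response: Array<{ queryClassification: QueryClassification; reason: string }>;
}

// Hypothetical runtime guard: checks that parsed JSON actually has the
// ClassificationResponse shape, including a valid enum value per entry.
function isClassificationResponse(value: unknown): value is ClassificationResponse {
    if (typeof value !== "object" || value === null) return false;
    const candidate = value as { response?: unknown };
    if (!Array.isArray(candidate.response)) return false;
    return candidate.response.every((item) =>
        typeof item === "object" && item !== null &&
        Object.values(QueryClassification).includes(
            (item as { queryClassification?: unknown }).queryClassification as QueryClassification,
        ) &&
        typeof (item as { reason?: unknown }).reason === "string"
    );
}
```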

Instruct the LLM for structured classification

The following function builds a prompt that lists the classification categories and their details. Critically, it instructs the model to return structured JSON by setting responseSchema, so every response contains a queryClassification and a reason. Using responseSchema ensures the output is consistently formatted JSON that downstream code can reliably parse for subsequent actions or analysis.

async function classifyQuery(model: GenerativeModel, query: string): Promise<ClassificationResponse | null> {
    const classificationDetails = classifications.map(
        (c) => `${c.name}: ${c.description} (Examples: ${c.examples.join(', ')})`
    ).join('\n- ');

    const prompt = `Classify the following user query into one or more of the following categories:

- ${classificationDetails}

User Query: "${query}"

Return a JSON object with a field "response" which is an array of objects. Each object should have a "queryClassification" (one of ${Object.values(QueryClassification).join(', ')}) and a "reason" explaining the classification.`;

    try {
        const result = await model.generateContent({
            contents: [{ role: "user", parts: [{ text: prompt }] }],
            generationConfig: {
                responseMimeType: "application/json",
                responseSchema: {
                    type: SchemaType.OBJECT,
                    properties: {
                        response: {
                            type: SchemaType.ARRAY,
                            items: {
                                type: SchemaType.OBJECT,
                                properties: {
                                    queryClassification: {
                                        type: SchemaType.STRING,
                                        format: "enum",
                                        enum: Object.values(QueryClassification),
                                    },
                                    reason: {
                                        type: SchemaType.STRING,
                                    },
                                },
                                required: ['queryClassification', 'reason'],
                            },
                        },
                    },
                    required: ['response'],
                },
            },
        });
        const response = result.response;
        return JSON.parse(response.text()) as ClassificationResponse;
    } catch (error: any) {
        console.error("Error calling the language model API:", error);
        return null;
    }
}

Put the LLM to work for classification

To run the classification, provide your apiKey and userQuery, initialize the language model, then call classifyQuery. The output shows how the model categorized the query and why, demonstrating natural language understanding for tasks like app setting adjustments.

async function main() {
    const apiKey = "YOUR_API_KEY";
    const userQuery = "Turn on dark mode because it's easier on my eyes.";

    const genAI = new GoogleGenerativeAI(apiKey);
    const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });

    const classificationResult = await classifyQuery(model, userQuery);

    if (classificationResult && classificationResult.response) {
        console.log("Query:", userQuery);
        classificationResult.response.forEach(c => {
            console.log(`Classification: ${c.queryClassification}, Reason: ${c.reason}`);
        });
    } else {
        console.log("Could not classify the query.");
    }
}

main().catch(console.error);
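Once a classification comes back, an app can route each predicted intent to a handler. The following dispatch sketch is hypothetical (the routeClassification function and its return strings are placeholders, not part of the example above); the enum is repeated so the sketch is self-contained:

```typescript
// Mirrors the enum defined earlier so this sketch stands alone.
enum QueryClassification {
    INFORMATION_SEEKING = "INFORMATION_SEEKING",
    UPDATE_APP_SETTINGS = "UPDATE_APP_SETTINGS",
    OTHER = "OTHER",
}

// Hypothetical routing: maps each predicted intent to an app action.
// A real implementation would call answer, settings, or fallback handlers.
function routeClassification(classification: QueryClassification, query: string): string {
    switch (classification) {
        case QueryClassification.INFORMATION_SEEKING:
            return `Answering question: ${query}`;
        case QueryClassification.UPDATE_APP_SETTINGS:
            return `Applying settings change: ${query}`;
        default:
            return `No handler for query: ${query}`;
    }
}
```

Because classifyQuery can return multiple classifications per query, a caller would typically loop over classificationResult.response and route each entry.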

What's next