Case study: Refine queries and manage context

This section describes advanced agent capabilities, including correcting flawed natural language queries and effectively managing conversation context for natural interaction.

Correct flawed natural language queries

Correcting flawed natural language queries is a core capability that improves agent effectiveness. It involves the following:

  • Challenge: to achieve reliable understanding and accurate downstream processing, the agent must effectively handle inherent imperfections and ambiguities in raw user queries, such as typos, grammatical errors, vague phrasing, and speech recognition mistakes.
  • Insight and approach: to enhance response accuracy, user queries can be automatically corrected and reformulated. This process involves identifying and fixing issues like typos, recognition errors, or grammatical mistakes, often using surrounding context and user preferences for better interpretation. The query is then rephrased to be clearer and more specific, increasing the likelihood of receiving a comprehensive and correct answer.
  • Example: in the context of the Google Maps navigation assistant, the query "Okey, what's the best Ruth to get there?" could be improved to "Okay, what is the best wheelchair accessible route to the Pacific Science Center from my current location, avoiding stairs and preferring curb cuts?" This corrects the typos, resolves the ambiguity of "there" using conversation history to mean Pacific Science Center, and adds accessibility constraints from the user profile: wheelchair accessible, avoid stairs, prefer curb cuts.

The following sections illustrate a hypothetical navigation assistant where conversation history and user accessibility preferences help interpret a flawed request. The examples presented here relate to navigation, but this approach can be expanded to other use cases, as the LLM ultimately transforms the vague query into a precise, personalized, and actionable instruction for the assistant.

Response setup

The following example imports the necessary dependencies, such as the Google AI SDK and the Node.js readline module, and defines the RephrasedQueryResponse interface, which specifies the required JSON output format, including the improved query and the reason for the changes.

import { GoogleGenerativeAI, GenerativeModel, SchemaType } from "@google/generative-ai";
import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';

// Define the structure for the expected JSON response from the LLM
// This is what we want the LLM to generate: a refined query and the reason for changes.
interface RephrasedQueryResponse {
    rephrasedQuery: string; // The corrected and augmented query
    reason: string;          // Explanation of the changes made
}

Define context and instruct the LLM

This core example demonstrates how an LLM like Gemini leverages context. It defines a CONTEXT object for a hypothetical navigation assistant, encompassing location, history, and an accessibility profile. The constructPrompt function then provides detailed instructions for the LLM to use this context, refining queries by correcting errors, resolving ambiguity, and augmenting with relevant details, especially accessibility needs.

// --- Defining the context ---
// This object bundles crucial information the LLM needs to understand the user's situation.
const CONTEXT = {
    // Relevant location data (imagine the Navigation API offers this).
    currentLocation: "Near Museum of Pop Culture (MoPOP), Seattle",
    // Recent relevant conversation turns help resolve ambiguity (like "there")
    conversationHistory: [
        { role: "user", text: "Tell me more about the Pacific Science. Is it good for kids?" },
        { role: "assistant", text: "Yes, the Pacific Science is very popular...receives good reviews for accessibility..." } // Shortened
    ],
    // Key user information that can be provided by the user or inferred.
    userProfile: "Uses a manual wheelchair. Difficult navigating stairs or steep inclines (>5% gradient). Prefers smooth pavement and routes with curb cuts. Preferred mode for suitable routes is self-propelling (rolling); otherwise, considers alternative transport."
};

// --- Instructing the LLM ---
// This function builds the prompt that guides the LLM's refinement process.
// (Note: This function is combined with the API call shown in a later example to form the complete query refinement workflow)
function constructPrompt(userQuery: string, context: typeof CONTEXT): string {
    // Construct the detailed prompt instructing the LLM on how to use the context
    const prompt = `
You are an AI assistant integrated into a navigation application. Your task is
to refine user queries to be more precise and actionable for the routing engine,
especially considering user context like accessibility needs.

Analyze the provided user query, considering the user's profile, conversation
history, and potentially relevant location information.

**Instructions:**

1.  **Correct Errors:** Identify and correct potential spelling or grammatical
    errors... based on the surrounding words and conversation... Only make
    corrections if they significantly improve clarity or likely match user
    intent based on context.
2.  **Resolve Ambiguity:** Clarify ambiguous references (like "there", "it",
    "that place") using the conversation history...
3.  **Augment for Precision:** Enhance the query by adding relevant details
    derived from the provided context (user profile, conversation)... Focus
    particularly on incorporating **accessibility requirements** mentioned in
    the user profile... e.g., adding terms like 'wheelchair accessible' or
    specifying needs like 'avoid stairs'...
4.  **Formulate Query:** ...formulate a clear, specific query suitable for
    querying a detailed map routing engine.
5.  **Explain Changes:** Provide the refined query and a brief explanation... in
    the specified JSON format.

**Context:**

[User Profile]
${context.userProfile}

[Conversation History]
${context.conversationHistory.map(msg => `${msg.role}: ${msg.text}`).join('\n')}

[Potentially Relevant Location Info]
Current Location: ${context.currentLocation}
(Note: Use location info only if relevant...)

**User Query to Refine:**
"${userQuery}"

**Output Format (JSON):**
{
    "rephrasedQuery": "The precise, augmented query for the routing engine.",
    "reason": "Brief explanation of corrections and augmentations made based on context and profile."
}
`;
    return prompt;
}

Execute the API call

The following example shows the next technical step: calling the LLM API, in this case Gemini. It sends the prompt constructed in the previous example and uses the API's generationConfig feature to reliably get structured JSON output matching the RephrasedQueryResponse structure defined earlier. It also includes logic to handle and parse this response.

/**
 * Sends the prompt to the LLM and parses the structured JSON response.
 * (This completes the query refinement workflow started with constructPrompt in the previous example)
 */
async function callApiModelAndParse(
    model,
    prompt // The prompt constructed in the previous example
) {
    try {
        // Call the Gemini API's generateContent method
        const result = await model.generateContent({
            contents: [{ role: "user", parts: [{ text: prompt }] }],
            generationConfig: {
                // Request JSON output and define the expected schema
                responseMimeType: "application/json",
                responseSchema: {
                    type: "OBJECT", // SchemaType.OBJECT becomes "OBJECT"
                    properties: {
                        rephrasedQuery: { type: "STRING" }, // SchemaType.STRING becomes "STRING"
                        reason: { type: "STRING" }
                    },
                    required: ['rephrasedQuery', 'reason'],
                },
            },
        });
        const response = result.response;

        // --- Basic Validation and Parsing ---
        if (!response?.candidates?.[0]?.content?.parts?.[0]?.text) {
            console.warn("Received an empty or invalid response structure...");
            return null;
        }
        const responseText = response.candidates[0].content.parts[0].text;

        try {
            // Parse the JSON string
            const parsedResponse = JSON.parse(responseText); // (the TypeScript assertion `as RephrasedQueryResponse` isn't needed in JavaScript)
            if (!parsedResponse || !parsedResponse.rephrasedQuery || !parsedResponse.reason) {
                console.warn("Parsed JSON response is missing required fields...");
                return null;
            }
            return parsedResponse; // Return the successfully parsed object
        } catch (error) {
            console.error("Error parsing JSON response:", error);
            return null;
        }
    } catch (error) { // (the TypeScript `: any` annotation isn't needed in JavaScript)
        console.error("Error calling the Gemini API:", error);
        return null;
    }
}

Orchestrate for integration

The main function demonstrates the complete workflow of using an LLM, such as Gemini, for query refinement. It obtains the API key, gets the user's query, initializes the model, then calls the constructPrompt and callApiModelAndParse functions from the previous examples to leverage the LLM's context understanding and structured output capabilities. Finally, it displays the original query alongside the LLM-refined query, showcasing the effectiveness of the approach.

// Main function to run the workflow
async function main() {
    const apiKey = await getApiKey(); // Helper to obtain the API key (a sketch follows this example); you can replace this with your API key.
    const defaultQuery = "okey, whats the best Ruth to get there";

    // Get query from user
    const rlQuery = readline.createInterface({ input, output });
    let userQuery = await rlQuery.question(`\nEnter query (default: "${defaultQuery}"): `);
    rlQuery.close();
    userQuery = userQuery.trim() || defaultQuery;

    console.log("\nInitializing Model...");
    const genAI = new GoogleGenerativeAI(apiKey);
    const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });
    const prompt = constructPrompt(userQuery, CONTEXT); // Using the constructPrompt function defined earlier
    console.log("Refining query using context...");
    const rephrasedQueryResult = await callApiModelAndParse(model, prompt); // Using the callApiModelAndParse function defined earlier
    if (rephrasedQueryResult) {
        console.log("\n--- Query Refinement Result ---");
        console.log("Original Query: ", userQuery);
        console.log("Rephrased Query:", rephrasedQueryResult.rephrasedQuery);
        console.log("Reasoning:       ", rephrasedQueryResult.reason);
        console.log("---------------------------\n");
    } else {
        console.log("\nFailed to refine the query.\n");
    }
}

main().catch(console.error);
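
The main function calls a getApiKey helper that isn't shown above. A minimal sketch, assuming the key is read from a GEMINI_API_KEY environment variable with an interactive prompt as a fallback (the variable name and prompt text are illustrative):

// Minimal sketch of the getApiKey helper used in main().
// Assumes the key is provided via the GEMINI_API_KEY environment variable,
// falling back to an interactive prompt (illustrative, not part of the SDK).
async function getApiKey() {
    if (process.env.GEMINI_API_KEY) {
        return process.env.GEMINI_API_KEY;
    }
    const rl = readline.createInterface({ input, output });
    const key = await rl.question("Enter your Gemini API key: ");
    rl.close();
    return key.trim();
}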

Manage context and conversation

Effective context and conversation management is crucial for building capable AI agents and involves the following:

  • Challenge: the agent needs to understand the context of a query (e.g., what video segment the user is asking about) and remember the flow of the conversation for follow-up interactions.
  • Context gathering insight: a critical step identifies the relevant pieces of information, such as specific video frames or related transcript sections, pertaining to the user's current query. This extracted context becomes essential input for the agent's prompt. Getting the right context to Gemini was a key learning.
  • Conversation history insight: maintaining conversational state is crucial for natural interaction. For example, if a user says, "Change setting X to Y," and the agent replies "Okay," the user might then follow up with "Actually, can you go back to the previous settings?" You need to keep track of these past conversation components.
  • Gemini sessions approach: newer versions of the Gemini API greatly simplify conversation management through built-in session handling. By interacting within a session, the API automatically takes previous turns into account, significantly reducing the burden on the developer to manually track and re-inject conversation history. Use this session feature (a minimal sketch follows this list).
  • Code example implementation: this example adapts the navigation query correction concept from the previous section, "Correct flawed natural language queries." It now uses Gemini's chat sessions and function calling, which are key tools. Chat sessions simplify development by automatically managing conversation history, eliminating manual tracking. Function calling provides a clear purpose for the LLM-refined query: directly invoking a specific action like find_route.
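
Before walking through the full example, the following minimal sketch shows how a chat session carries earlier turns automatically, so a follow-up like "Actually, can you go back to the previous settings?" is understood without manually re-sending history. The API key handling, model name, and queries are illustrative assumptions.

// Minimal sketch of session-based history handling.
// Assumes the same @google/generative-ai SDK as the earlier example;
// the API key, model name, and queries are illustrative.
const sessionClient = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const sessionModel = sessionClient.getGenerativeModel({ model: "gemini-2.0-flash-exp" });
const sessionChat = sessionModel.startChat({ history: [] });

// Turn 1: the user changes a setting.
await sessionChat.sendMessage("Change the route preference to avoid highways.");

// Turn 2: this follow-up only makes sense with the previous turn in context.
// The session stores earlier turns, so no manual history bundling is needed.
const followUp = await sessionChat.sendMessage("Actually, can you go back to the previous settings?");
console.log(followUp.response.text());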

Model configuration with system instruction and tools

Unlike previous examples, this one defines the agent's role using a system instruction. This instruction provides persistent guidance, detailing the AI role, how to use conversation history, and crucially, the steps to internally refine a user's route request, such as correcting errors or resolving ambiguity, before calling the defined find_route tool. Providing the tool definition also makes the AI aware of the find_route function itself.

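// Note: STATIC_CONTEXT (not shown here) is assumed to hold the same user profile
// and current location data as the CONTEXT object defined in the earlier example.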
const SYSTEM_INSTRUCTION = `You are an AI assistant integrated into a navigation application.
Maintain awareness of the ongoing conversation history.
Your primary goal is to be helpful.

WHEN THE USER ASKS FOR A ROUTE OR DIRECTIONS:
1.  **CRUCIAL:** Do not ask follow-up questions... Do your best to understand and answer...
2.  **Analyze Request:** Understand the user's request using the conversation as context.
3.  **Refine Query Internally:** Before calling any tool, refine the user's request. Consider:
    * Correcting potential errors.
    * Resolving ambiguous references ('there', 'it').
    * Augmenting with details like origin, destination, and relevant user needs from the user profile.
4.  **Call Tool:** Invoke the 'find_route' function, passing the fully refined, detailed query and reasoning.
5.  **Respond to User:** After the tool provides its result, formulate a natural language response.

For general chat, respond directly without calling the tool.

**User Profile**
...${STATIC_CONTEXT.userProfile}
**Current Location**
...${STATIC_CONTEXT.currentLocation}.
`;

// --- Tool definition ---
// In JavaScript, you define the tool structure as a plain object.
// The FunctionDeclaration and Tool types are TypeScript specific.
const findRouteTool = {
    name: "find_route",
    description: `Calculates and finds a suitable route... The query should be reformulated first...`,
    parameters: { /* ... rephrasedUserQuery, reasoning ... */ } // Define parameters as a plain object as well (one possible expansion is sketched after this block)
};

// --- Model initialization ---
// The tool type is omitted, and functionDeclarations is an array of plain objects.
const tools = [{ functionDeclarations: [findRouteTool] }];
const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp",
    systemInstruction: SYSTEM_INSTRUCTION,
    tools: tools,
});
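
The parameters object in the tool definition above is elided. One possible expansion, mirroring the plain-object schema style used for responseSchema in the earlier example (the descriptions are illustrative):

// Possible expansion of the elided `parameters` object (illustrative sketch,
// following the same plain-object schema style as responseSchema earlier).
const findRouteParameters = {
    type: "OBJECT",
    properties: {
        rephrasedUserQuery: {
            type: "STRING",
            description: "The refined, context-aware route query to pass to the routing engine.",
        },
        reasoning: {
            type: "STRING",
            description: "Why the original query was corrected or augmented.",
        },
    },
    required: ["rephrasedUserQuery", "reasoning"],
};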

Start the chat and send the raw user query

The following example demonstrates the simplified interaction flow that session handling enables. We start a ChatSession, which automatically manages history. After an initial turn that provides context by discussing the Pacific Science Center, we send the user's raw, potentially flawed query, for example "Whats the best Ruth [sic] to get there", directly using chat.sendMessage(). We don't need to manually bundle the history. The model uses the session context and its system instructions to understand that this ambiguous query refers to the Pacific Science Center and needs refinement before potentially calling the find_route tool.

const chat = model.startChat({
    history: []
});

console.log("Chat session started...");
console.log("Simulating conversation...");

// --- Turn 1: (Simulated - Provides context for Turn 2) ---
const query1 = "Tell me more about the Pacific Science accessibility.";
console.log(`\nUser: ${query1}`);
let result = await chat.sendMessage(query1);
/* ... handle response 1 ... */
console.log(`Assistant: ${result.response.text()}`);


// --- Turn 2: User asks for directions ---
const query2_raw = "Whats the best Ruth to get there"; // Example flawed query
console.log(`\nUser: ${query2_raw}`);
// Send the query directly. Session handling means we don't re-send history.
// The model uses session context + system instructions to process this.
result = await chat.sendMessage(query2_raw);
let response = result.response;

// --- Function calling handling loop (starts here) ---
// (Loop checks response.candidates for functionCall - see next example)

Handle the function call and observe the refined query

The function calling mechanism works as follows:

  1. The code checks the LLM response for a functionCall.
  2. When the find_route call is detected, the local find_route function runs with arguments from the LLM. Logging args.rephrasedUserQuery and args.reasoning within find_route shows how the LLM, using session history and instructions, transforms a vague query like "best Ruth [sic] to get there" into a precise, context-aware, and personalized request before the tool is invoked.
  3. An example output displays: "What is the best wheelchair accessible route from near the Museum of Pop Culture (MoPOP), Seattle to the Pacific Science Center, avoiding stairs and steep inclines and preferring smooth pavement and curb cuts?"

// --- Tool implementation ---
async function find_route(
    { rephrasedUserQuery, reasoning } // Destructure directly, types removed
) {
    console.log(`\n--- TOOL CALL: find_route ---`);
    // **** KEY: Observe the LLM's refined query ****
    console.log(`Received refined query: "${rephrasedUserQuery}"`);
    console.log(`Reasoning: "${reasoning}"`);
    /* ... rest of placeholder implementation ... */
    return { routeInfo: `Route calculation based on '${rephrasedUserQuery}' completed. No route found (placeholder).` };
}


// --- Inside the main function's try block, after sending query2_raw ---
    let completedFunctionCall = false;
    while (!completedFunctionCall) {
        let functionCall = response.candidates?.[0]?.content?.parts?.[0]?.functionCall;
        if (functionCall) {
            completedFunctionCall = true; // Assume one call for simplicity here
            console.log(`\nAssistant requested Function Call: ${functionCall.name}`);
            const { name, args } = functionCall;
            if (name === "find_route") {
                // No cast is needed in JavaScript, where types aren't enforced at runtime
                const apiResponse = await find_route(args);
                result = await chat.sendMessage([{ functionResponse: { name: "find_route", response: apiResponse } }]);
                response = result.response;
            } else { /* ... handle unexpected call ... */ }
        } else {
            // Logic if LLM didn't call the function (e.g., asked clarification)
            console.log(`Assistant: ${response.text()}`);
            // Break or handle follow-up... (Simplified loop exit in example)
            completedFunctionCall = true; // Exit if no function call received
        }
    }
    // Get final text response after loop...
    console.log(`\nAssistant (Final Response): ${response.text()}`);

Approach for models without built-in session handling

Manually managing conversation context requires careful implementation within your application. You need to store the history, strategically select relevant parts for each API call while respecting the model's context window limit, and format the prompt correctly. This adds complexity compared to using models with built-in session handling but is essential for creating stateful conversational experiences with stateless APIs.
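
As a rough sketch of what application-managed history can look like with a stateless API, the following keeps a running history array, selects the most recent turns that fit an approximate character budget, and formats them into each prompt. The budget, prompt format, and helper names are assumptions for illustration, not a specific library's API.

// Rough sketch of application-managed conversation history.
// The character budget stands in for the model's context window limit.
const MAX_HISTORY_CHARS = 6000;
const history = []; // entries: { role: "user" | "assistant", text: string }

function buildPrompt(systemInstruction, userQuery) {
    // Walk backwards so the most recent turns are kept when the budget is tight.
    const selected = [];
    let used = 0;
    for (let i = history.length - 1; i >= 0; i--) {
        const turn = `${history[i].role}: ${history[i].text}\n`;
        if (used + turn.length > MAX_HISTORY_CHARS) break;
        selected.unshift(turn);
        used += turn.length;
    }
    // Format: instructions, then the retained history, then the new query.
    return `${systemInstruction}\n\n[Conversation History]\n${selected.join("")}\nuser: ${userQuery}`;
}

function recordTurn(userQuery, assistantReply) {
    history.push({ role: "user", text: userQuery });
    history.push({ role: "assistant", text: assistantReply });
}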

To implement application-managed history, see Manage application history for more information.

What's next