These improvements focus on making the agentive interface usable for individuals with diverse accessibility needs.
Provide compatibility with standard assistive technologies
Compatibility with standard assistive technologies is crucial for making agentive interfaces broadly accessible and involves:
- Challenge: agentive interfaces, particularly novel ones relying on custom UI elements or interaction patterns, may not be inherently compatible with standard assistive technologies like screen readers. This prevents users who rely on these tools from effectively interacting with the agent.
- Insight and approach: design and implement the agent's interface components in accordance with established accessibility guidelines such as WAI-ARIA attributes and semantic HTML. Components can encompass text outputs, input areas, and controls.
- Example: when the agent provides a text response, such as answering a question, the HTML container displaying this text should use appropriate ARIA attributes such as aria-live="polite". This enables screen readers to automatically announce the new response to the user without requiring manual navigation. Button controls for agent actions, for example, 'Start Listening' or 'Send Text Query', must have clear, descriptive text labels or aria-label attributes that are announced correctly by screen readers, as sketched below.
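The following is a minimal sketch of this pattern in frontend JavaScript; the element IDs (chatOutput, listenButton, sendButton) are assumptions for illustration.
// Mark the response container as a polite live region so screen readers
// announce new agent messages without interrupting the user or moving focus.
const chatOutput = document.getElementById('chatOutput');
chatOutput.setAttribute('aria-live', 'polite');
// Give agent action controls descriptive accessible names.
document.getElementById('listenButton').setAttribute('aria-label', 'Start Listening');
document.getElementById('sendButton').setAttribute('aria-label', 'Send Text Query');
// Append a new agent response; the live region announces it automatically.
function announceAgentResponse(text) {
  const message = document.createElement('div');
  message.textContent = text;
  chatOutput.appendChild(message);
}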
Proactive agent introduction for user orientation and collaboration
It's important for an agent to proactively introduce itself. This orients the user and fosters collaborative interaction between the user and the LLM, addressing the following:
- Challenge: when an agentive session or mode begins, users might be unaware that it has started or uncertain about what the agent enables them to do and how to interact with it. This can lead to confusion or missed opportunities.
- Insight and approach: user tests show a preference for a collaborative relationship with the AI agent. To foster this and orient the user, the agent should proactively introduce itself immediately upon activation. This introduction should briefly state its identity and outline its core functionalities or the expected interaction paradigm.
- Example: upon being enabled for a video player, the agent could output using Text-to-Speech and display the text: "Hi, I'm Blue, your agent, and I'm active now. You can ask me to describe what's happening, control playback like 'Go back 30 seconds,' or ask questions about the video content using your voice." (A browser-side sketch of this introduction follows the list.)
- Code example implementation: this example demonstrates integrating the gemini-2.0-flash model, which has built-in Google Search capabilities, into an interactive web UI. It shows how system instructions are used to define the assistant's persona, initial behavior such as a proactive introduction, and guidelines for using its capabilities. It also shows how built-in tools such as Google Search can be enabled through configuration, which can simplify your work by removing the need to implement custom search logic or external API calls for basic web retrieval.
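The introduction example above can be realized in the browser with the standard Web Speech API for Text-to-Speech. The following is a minimal sketch under that assumption; the speakIntroduction helper and the chatOutput container are illustrative names, not part of the backend example that follows.
// Display and speak the agent's introduction when it is activated.
function speakIntroduction(introText) {
  // Show the text in the UI (a live region announces it to screen readers).
  const chatOutput = document.getElementById('chatOutput');
  const message = document.createElement('div');
  message.textContent = introText;
  chatOutput.appendChild(message);
  // Speak it with the browser's built-in Text-to-Speech, if available.
  if ('speechSynthesis' in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(introText));
  }
}
speakIntroduction(
  "Hi, I'm Blue, your agent, and I'm active now. You can ask me to describe " +
  "what's happening, control playback like 'Go back 30 seconds,' or ask " +
  "questions about the video content using your voice."
);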
Study code examples for frontend-backend interaction
To implement the backend and frontend components, including their data flow and architectural choices, study the following code examples.
Configure the backend model and set up the server
To configure the backend model and set up the server in src/server.ts, use the following snippet as an example. This code defines the essential setup, which includes configuring the googleSearchTool to enable built-in search capabilities. It also includes the SYSTEM_INSTRUCTION to provide the AI with its persona, introductory prompts, and guidelines for using the configured search tool.
// --- Configuration ---
const API_KEY = process.env.GOOGLE_API_KEY; // Read the key from the environment
const MODEL_NAME = "gemini-2.0-flash";
const PORT = process.env.PORT || 5000;
if (!API_KEY) {
console.error("Error: GOOGLE_API_KEY environment variable not set.");
process.exit(1);
}
// --- Express app setup ---
// Imports for Express, path, and the Google Generative AI SDK
const express = require('express');
const path = require('path');
const { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } = require('@google/generative-ai');
const app = express();
app.use(express.json()); // Middleware to parse JSON bodies
// --- Serve static frontend files ---
// Serve index.html at the root
app.get('/', (req, res) => {
// Resolve path from 'dist' directory where compiled JS runs
res.sendFile(path.resolve(__dirname, '..', 'index.html'));
});
// Serve assistant.js
app.get('/assistant.js', (req, res) => {
// Resolve path from 'dist' directory
res.sendFile(path.resolve(__dirname, '..', 'assistant.js'));
});
// --- Gemini SDK setup ---
// Uses GoogleGenerativeAI, HarmCategory, and HarmBlockThreshold imported above
const genAI = new GoogleGenerativeAI(API_KEY);
const model = genAI.getGenerativeModel({
model: MODEL_NAME,
// Safety settings using SDK enums (ensure HarmCategory and HarmBlockThreshold are imported)
safetySettings: [
{ category: HarmCategory.HARM_CATEGORY_HARASSMENT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE },
{ category: HarmCategory.HARM_CATEGORY_HATE_SPEECH, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE },
{ category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE },
{ category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE },
],
// Tools are passed per request in the /chat route below
});
// --- Tool configuration (Built-in Google Search) ---
// Enable the built-in Google Search tool via configuration.
// (The `Tool` type is TypeScript-specific; a plain object works in JS.)
const googleSearchTool = {
// Enable Google Search retrieval
googleSearchRetrieval: {} // Use an empty object to enable default search
};
// --- System instruction: defines persona and behavior guidance ---
const SYSTEM_INSTRUCTION = `You are a helpful AI assistant integrated into this
application's UI. You have access to Google Search to find current information.
Your current location context is Sunnyvale, CA.
WHEN ACTIVATED (e.g., user clicks your button for the first time):
1. **Introduce Yourself:** Start the conversation by saying hello and briefly
explaining who you are (the UI Assistant).
2. **State Capabilities:** Clearly list what you can help the user with.
Mention capabilities like answering questions and accessing up-to-date
information using Google Search when necessary.
3. **Invite Interaction:** Prompt the user to ask a question or make a request.
WHEN THE USER ASKS questions requiring current information, facts, or real-time
data (like weather, news, stock prices, future events etc.):
1. **Use Search:** Utilize your Google Search tool internally to find the
relevant, up-to-date information, then answer based on what you find.`;
Initiate chat and send messages
The following events occur to initiate the chat and send messages from the frontend to the backend:
- The frontend, using assistant.js, sends the user's prompt and the locally stored chatHistory to the /chat endpoint on the Node.js server. A minimal sketch of this request follows the list.
- In turn, the server.ts backend's /chat route receives the request, parses the JSON, and prepares the contents array (managing the initial message and conversation history). It then constructs the GenerateContentRequest object, making sure the googleSearchTool is included in the tools array for the API call.
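Here is a minimal sketch of the frontend side of this request; the addMessageToChat and setUiLoading helpers appear in the assistant.js snippets later in this section, and the { prompt, history } body matches what the backend route destructures.
// assistant.js (sketch)
let chatHistory = []; // Locally stored conversation history
async function sendMessageToBackend(prompt) {
  addMessageToChat(prompt, 'user'); // Echo the user's message in the UI
  setUiLoading(true);
  try {
    // Send the prompt plus the locally stored history to the /chat endpoint.
    const fetchResponse = await fetch('/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: prompt, history: chatHistory }),
    });
    // ... success handling (shown later): parse the JSON, display
    // data.response, and replace chatHistory with data.history ...
  } catch (error) {
    addMessageToChat(`Error: ${error.message}`, 'system');
  } finally {
    setUiLoading(false);
  }
}
The backend side of this exchange, the /chat route in src/server.ts, follows.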
// src/server.ts (continued)
// --- API route for chat ---
// Handle client chat requests in the backend
app.post('/chat', async (req, res) => {
try {
const { prompt, history: historyRaw = [] } = req.body;
const effectivePrompt = prompt ?? ""; // Fall back to an empty string if no prompt was sent
console.log(`\nReceived request: Prompt='${effectivePrompt}', History Length=${historyRaw.length}`); // Server log
let contents = [];
const isFirstMessage = historyRaw.length === 0;
// Simple way to inject system prompt for context
if (isFirstMessage && effectivePrompt) {
contents.push({ role: 'user', parts: [{ text: SYSTEM_INSTRUCTION }] });
contents.push({ role: 'model', parts: [{ text: "Understood. I am ready to introduce myself." }] });
contents.push({ role: 'user', parts: [{ text: effectivePrompt }] }); // Actual intro trigger
} else if (!isFirstMessage) {
contents = [...historyRaw]; // Start from the history received from the client
if (effectivePrompt) { contents.push({ role: 'user', parts: [{ text: effectivePrompt }] }); }
} else { // First message, but empty/no specific prompt from FE
contents.push({ role: 'user', parts: [{ text: SYSTEM_INSTRUCTION + "\nPlease introduce yourself now." }] });
}
if (contents.length === 0) {
console.warn("Contents list is empty after processing history/prompt.");
return res.status(400).json({ error: "Cannot process empty request." });
}
console.log(`Calling generateContent with ${contents.length} content blocks.`);
// Prepare the request object for generateContent
const generateRequest = {
contents: contents,
tools: [googleSearchTool], // Enable search tool for this specific request
// generationConfig could be added here if needed (e.g., temperature)
};
// --- Make the API call ---
// (Response processing in next section)
const result = await model.generateContent(generateRequest);
const response = result.response;
// --- (Response processing logic) ---
let assistantResponseText = "Processing...";
let history_for_client = historyRaw; // Initialize
// ... response processing ...
// --- Send response to frontend ---
// (Final return statement)
// res.json({ response: assistantResponseText, history: history_for_client });
} catch (error) {
console.error("Error in /chat endpoint:", error);
res.status(500).json({ error: "An internal server error occurred." });
}
});
// --- (Global Error Handler & Server Start defined in next sections) ---
Process model responses in backend and frontend
The code examples in this section illustrate how the backend in server.ts processes model responses and communicates with the frontend in assistant.js.
On the backend in server.ts, after model.generateContent completes (in some cases having used the search tool internally), the code extracts the textual response. It then attempts to access any grounding metadata and logs it to the server console if found. The chat history, including the latest user prompt and AI response, is updated before a JSON object containing the response text and the complete history is sent to the frontend.
// src/server.ts (/chat route continued)
// --- Make the API call (code from previous section) ---
const result = await model.generateContent(generateRequest);
const response = result.response;
// --- Process the response ---
let assistantResponseText = "Sorry, I couldn't generate a response."; // Default
let groundingMetadataContent = null;
if (response) {
// Get text response using the text() accessor
try {
assistantResponseText = response.text();
} catch (e) {
console.error("Error extracting text() from response:", e);
// Fallback to manually joining parts if text() fails
if (response.candidates && response.candidates[0]?.content?.parts) {
assistantResponseText = response.candidates[0].content.parts.map(p => p.text ?? '').join('');
}
if (!assistantResponseText){ // Final fallback
assistantResponseText = "Sorry, I encountered an issue processing the response format.";
}
}
// --- Log grounding metadata (server-side) ---
// Access grounding metadata safely, if present
try {
const metadata = response.candidates?.[0]?.groundingMetadata;
const searchEntryPoint = metadata?.searchEntryPoint;
if (searchEntryPoint?.renderedContent) {
groundingMetadataContent = searchEntryPoint.renderedContent;
console.log("\n--- Grounding Metadata (Server Log) ---");
console.log(groundingMetadataContent);
console.log("--- End Grounding Metadata ---\n")
}
} catch(e) {
console.warn("Could not access grounding metadata:", e);
}
} else {
console.warn("API response object was unexpectedly undefined or null.");
// Check for prompt feedback if available on the result object itself
const feedback = result?.response?.promptFeedback;
if (feedback?.blockReason) {
assistantResponseText += ` (Request blocked: ${feedback.blockReason})`;
console.log("Request blocked:", feedback);
}
}
// --- Update history ---
const current_user_entry = { role: 'user', parts: [{ text: effectivePrompt }] };
const current_model_entry = { role: 'model', parts: [{ text: assistantResponseText }] };
const history_for_client = [...historyRaw, current_user_entry, current_model_entry];
// --- Send response to frontend ---
console.log(`Sending response: '${assistantResponseText.substring(0,50)}...', History Length=${history_for_client.length}`);
res.json({
response: assistantResponseText,
history: history_for_client,
// grounding: groundingMetadataContent // Optionally send to FE if needed
});
// ... (catch block) ...
// }); // End of app.post('/chat')
Global error handling and server startup are configured as follows:
// --- Global error handler (optional but recommended) ---
app.use((err, req, res, next) => {
console.error("Unhandled error:", err.stack);
res.status(500).json({error: 'Something broke on the server!'}); // Send JSON error
});
// --- Start server ---
app.listen(PORT, () => {
console.log(`Server running at http://localhost:${PORT}`);
});
The assistant.js
success handler is responsible for parsing the received JSON
data, displaying the assistant's response in the UI, and updating the local chat
history to synchronize with the backend's state, as shown in the following code
sample:
// assistant.js (fetch success block within sendMessageToBackend)
// ... (inside try block after fetch call)
const fetchResponse = await fetch('/chat', { /* ... */ });
// Attempt to parse JSON regardless of response.ok
const data = await fetchResponse.json().catch(e => {
console.error("Failed to parse JSON response:", e);
// Create a generic error object if JSON parsing fails
throw new Error(`Server returned non-JSON response (Status: ${fetchResponse.status} ${fetchResponse.statusText})`);
});
if (!fetchResponse.ok) {
// Throw an error using the message from parsed JSON or default HTTP status
throw new Error(`HTTP error ${fetchResponse.status}: ${data.error || fetchResponse.statusText}`);
}
if (data.error) {
// Handle errors explicitly returned in the JSON body
throw new Error(`Backend error: ${data.error}`);
}
// Frontend receiving and displaying backend response **
// Display assistant response received from backend
addMessageToChat(data.response, 'assistant');
// Update local chat history with the full history from backend
// This makes sure FE history matches the state used for the next backend call
chatHistory = data.history;
// console.log("Updated History:", chatHistory); // Uncomment for debugging
} catch (error) {
// ... (error logging and display) ...
} finally {
setUiLoading(false);
}
// ... (rest of sendMessageToBackend)
No explicit functionCall handling is needed for this setup, because the built-in search tool's execution is managed internally by the Gemini API during the generateContent call. As a result, the response already incorporates the results of any searches the model decided to perform.
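For comparison, if you later register your own function declarations as tools, the backend must detect and run those calls itself after generateContent returns. The following is a rough sketch of that check with this SDK, under the assumption that handleMyCustomTool is an application-specific helper you would write:
// server.ts sketch: only relevant for custom function tools, not the built-in search.
// This would run inside the async /chat handler, after `const response = result.response;`.
const functionCalls = response.functionCalls ? response.functionCalls() : undefined;
if (functionCalls && functionCalls.length > 0) {
  for (const call of functionCalls) {
    // Execute the requested tool yourself, then send its result back to the
    // model in a follow-up request. handleMyCustomTool is hypothetical.
    const toolResult = await handleMyCustomTool(call.name, call.args);
    console.log(`Executed custom tool ${call.name}:`, JSON.stringify(toolResult));
  }
}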
Built-in search tool implementation
A key advantage of this approach is leveraging the built-in GoogleSearchRetrieval tool. The Gemini backend infrastructure handles the execution of the search when the model deems it necessary based on the prompt and system instructions.
Example of conceptual UI interaction flow
The example in this section illustrates the expected flow in the web browser and relevant server logs.
UI interaction: User clicks "Activate Assistant"
Frontend log: Sending initial request ("Introduce yourself.") to backend (/chat)
Backend log: Received activation request.
Backend log: Calling Gemini API (generateContent) for introduction...
Backend log: Received intro response from Gemini.
Backend log: Sending intro response to frontend.
Assistant (final answer): Hello! I'm the UI Assistant integrated into this application's UI. I can help you with various tasks, answer your questions, and access up-to-date information using Google Search when necessary. How can I help you today?
User: When is the next total solar eclipse visible from California?
Frontend log: Sending query and history to backend (/chat)
Backend log: Received query: "When is the next total solar eclipse visible from California?"
Backend log: Calling Gemini API (generateContent) with GoogleSearch tool enabled...
Backend log: [Gemini API internally determines search is needed and uses the Google Search Tool...]
Backend log: Received response from Gemini API (including grounding data).
Backend log:
--- Grounding metadata (server log) ---
[Web Content] Title: Total Solar Eclipse of 2045 August 12 - NASA Eclipse Web Site
URL: https://eclipse.gsfc.nasa.gov/SEplot/SEplot2001/SE2045Aug12T.GIF
Favicon: ...
[Web Content] Title: Solar eclipse of April 8, 2024 - Wikipedia
URL: https://en.wikipedia.org/wiki/Solar_eclipse_of_April_8,_2024
Favicon: ...
[Web Content] Title: Future eclipses in California, USA - Time and Date
URL: https://www.timeanddate.com/eclipse/in/usa/california
Favicon: ...
--- End grounding metadata ---
Backend log: Sending final answer to frontend.
Assistant (final answer): The next total solar eclipse with a path crossing California will be on August 12, 2045. Before that, a significant annular solar eclipse crossed parts of northern California on October 14, 2023, and the major total eclipse of April 8, 2024, visible across other parts of the US, appeared as a partial eclipse from California.
User: What's the weather like in Sunnyvale right now?
Frontend log: Sending query and history to backend (/chat)
Backend log: Received query: "What's the weather like in Sunnyvale right now?"
Backend log: Calling Gemini API (generateContent) with GoogleSearch tool enabled...
Backend log: [Gemini API internally determines search is needed and uses the Google Search Tool...]
Backend log: Received response from Gemini API (including grounding data).
Backend log:
--- Grounding metadata (server log) ---
[Web Content] Title: Sunnyvale, CA Weather Conditions - The Weather Channel
URL: https://weather.com/weather/today/l/Sunnyvale+CA?...
Favicon: ...
[Web Content] Title: Sunnyvale, CA Current Weather - AccuWeather
URL: https://www.accuweather.com/en/us/sunnyvale-ca/...
Favicon: ...
--- End grounding metadata ---
Backend log: Sending final answer to frontend.
Assistant (final answer): The current weather in Sunnyvale, CA (as of Sunday morning, April 13, 2025) is clear with a temperature around 62°F.
User: Thanks!
Frontend log: Sending query and history to backend (/chat)
Backend log: Received query: "Thanks!"
Backend log: Calling Gemini API (generateContent)...
Backend log: [Gemini API generates conversational response directly...]
Backend log: Received final answer from Gemini API.
Backend log: Sending final answer to frontend.
Assistant (final answer): You're welcome! Let me know if you need information on anything else.
Implement frontend HTML
To create the user interface for the AI assistant, implement the following HTML structure in index.html.
<!DOCTYPE html>
<html lang="en">
<head>
<title>AI UI Assistant (Node.js Backend)</title>
<style>
body { font-family: sans-serif; max-width: 800px; margin: auto; padding: 20px; }
#chatOutput { height: 400px; border: 1px solid #ccc; overflow-y: scroll; padding: 10px; margin-bottom: 10px; background-color: #f9f9f9; }
.message { margin-bottom: 8px; padding: 8px; border-radius: 6px; line-height: 1.4; white-space: pre-wrap; }
.user { text-align: right; background-color: #d1eaff; margin-left: 20%; }
.assistant { text-align: left; background-color: #e0e0e0; margin-right: 20%; }
.system { font-style: italic; color: #666; font-size: 0.9em; text-align: center; background-color: #f0f0f0; }
#inputArea { display: flex; margin-top: 10px;}
#userInput { flex-grow: 1; padding: 8px; border: 1px solid #ccc; border-radius: 4px 0 0 4px;}
#sendButton { padding: 8px 15px; border: 1px solid #ccc; border-left: none; border-radius: 0 4px 4px 0; cursor: pointer; background-color: #eee; }
#sendButton:disabled { cursor: not-allowed; background-color: #f8f8f8; color: #aaa; }
#assistantButton { display: block; width: 100%; padding: 10px; margin-bottom: 10px; cursor: pointer; background-color: #4CAF50; color: white; border: none; border-radius: 4px; font-size: 1em;}
#assistantButton:disabled { background-color: #aaa; cursor: not-allowed;}
</style>
</head>
<body>
<h1>AI UI Assistant Demo (Node.js Backend)</h1>
<button id="assistantButton">Activate Assistant</button>
<div id="chatOutput"></div>
<div id="inputArea">
<input type="text" id="userInput" placeholder="Ask the assistant..." disabled>
<button id="sendButton" disabled>Send</button>
</div>
<script src="/assistant.js"></script>
</body>
</html>
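To connect this markup to the flow described earlier, assistant.js wires the "Activate Assistant" button to the first /chat request and enables the input controls afterwards. The following is a minimal sketch that reuses the element IDs and CSS classes above and the sendMessageToBackend flow from the earlier snippets; the helper implementations here are one possible realization, not the definitive one.
// assistant.js (sketch of UI wiring for index.html above)
const assistantButton = document.getElementById('assistantButton');
const chatOutput = document.getElementById('chatOutput');
const userInput = document.getElementById('userInput');
const sendButton = document.getElementById('sendButton');
// Append a message to the chat area using the CSS classes defined above.
function addMessageToChat(text, role) { // role: 'user' | 'assistant' | 'system'
  const div = document.createElement('div');
  div.className = `message ${role}`;
  div.textContent = text;
  chatOutput.appendChild(div);
  chatOutput.scrollTop = chatOutput.scrollHeight; // Keep the newest message visible
}
// Disable the input controls while a request is in flight.
function setUiLoading(isLoading) {
  userInput.disabled = isLoading;
  sendButton.disabled = isLoading;
}
// Activation: trigger the proactive introduction described earlier.
assistantButton.addEventListener('click', async () => {
  assistantButton.disabled = true;
  await sendMessageToBackend('Introduce yourself.');
  userInput.disabled = false;
  sendButton.disabled = false;
  userInput.focus();
});
// Send a typed query to the backend.
sendButton.addEventListener('click', async () => {
  const text = userInput.value.trim();
  if (!text) return;
  userInput.value = '';
  await sendMessageToBackend(text);
});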