Getting started with the Gemini API and Web apps
Learn how to use the Gemini API and the Google Gen AI SDK for JavaScript and TypeScript to prototype generative AI for web apps. Use the Google Gen AI SDK to make your first generative AI call using the Gemini API in your client-side web application. Explore a sample application and learn how to make multimodal prompts (that combine image and text).
Introduction to the Gemini API and prompt engineering
Explore Google AI Studio and the capabilities of the Gemini generative AI model. Learn how to design and test different types of prompts (freeform, structured, and chat), get an API key, and build a simple Node.js application.
This pathway is useful for further experimentation with Gemini and lays the groundwork for integrating its features into a web application. Optionally, you can also try out the Gemini API using a simple Node.js web application. Feel free to skip this step and return to client-side web development in this pathway.
Note that calling the Gemini API directly from your web app using the Google Gen AI SDK is only for prototyping and exploring the Gemini generative AI models. For use cases beyond prototyping (especially production or enterprise-scale apps), use Firebase AI Logic instead. It offers an SDK for Web that has additional security features, support for large media file uploads, and streamlined integrations into the Firebase and Google Cloud ecosystem. Alternatively, you can use the Google Gen AI SDK for JavaScript and TypeScript to access the Gemini models server-side.
Try out the Gemini API template in Firebase Studio
Try out the Gemini API template in Firebase Studio to quickly get started and experiment with a JavaScript-based web app that uses generative AI. The template contains a fully functioning app for you to quickly prototype with the Gemini API on the web.
Firebase Studio is a web-based integrated development environment. It supports a variety of frameworks, including development for both web and cross-platform applications. It is currently available in Public Preview.
The template uses the Vite framework to build a web app that makes multimodal prompts to the Gemini API using the Google Gen AI SDK directly or using Genkit.
To get started, follow the steps to create a new workspace using the "Gemini API" template. Select the "JavaScript Web App" environment and follow the guide to add your Gemini API key and run the application.
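To give a sense of how the key is typically wired into a Vite-based template (the exact variable name used by the template may differ; `VITE_API_KEY` here is a placeholder), Vite exposes environment variables prefixed with `VITE_` to client code:

```ts
// .env.local (keep this file out of source control)
// VITE_API_KEY=your-gemini-api-key

// Vite exposes VITE_-prefixed variables on import.meta.env,
// so client code can read the key at build time.
const apiKey = import.meta.env.VITE_API_KEY;

if (!apiKey) {
  throw new Error("Missing VITE_API_KEY; add it to your .env.local file");
}
```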
Introduction to the Google Gen AI SDK for JavaScript and TypeScript
The Google Gen AI SDK for JavaScript and TypeScript enables you to build your generative AI integration with the Gemini Developer API.
If you're calling the Gemini API directly from your mobile or web app, the Google Gen AI SDK for JavaScript and TypeScript is only for prototyping. There are additional security considerations for using a Gemini API key in web client applications: if the key is embedded in or retrieved by your client application, you risk exposing it to malicious actors. So, for use cases beyond prototyping (especially production and enterprise-scale apps), migrate to Firebase AI Logic to access Google's generative AI models directly from your client app. Alternatively, you can use the Google Gen AI SDK to access the models server-side instead.
To get started with the Google Gen AI SDK for JavaScript and TypeScript, set up a project in Google AI Studio, which includes obtaining an API key for the Gemini Developer API. Next, add the required dependency for the SDK to your build configuration, or import it directly using @google/genai. Then, you can initialize the library with your API key and make your first API call.
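As a minimal sketch of those steps (the model ID is illustrative; check the SDK reference for current model names), a first text-only call looks roughly like this:

```ts
import { GoogleGenAI } from "@google/genai";

// For prototyping only: never ship a hardcoded API key in a production web app.
const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function run() {
  // Ask a Gemini model for a single text completion.
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash", // illustrative model ID
    contents: "Explain how large language models work in one paragraph.",
  });
  console.log(response.text);
}

run();
```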
Explore the JavaScript sample app
Explore more advanced use cases for the Google Gen AI SDK for JavaScript and TypeScript with the sample app on GitHub.
This example app demonstrates several key use cases in more detail: generating text, photo reasoning (using multimodal inputs), and generating videos using Veo. It also shows how to use content streaming to improve response time by displaying partial results and using the Live API for low-latency voice and video interactions.
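To illustrate the streaming pattern the sample demonstrates (model ID and output handling are illustrative), the SDK's streaming call yields partial responses as they arrive, so the UI can render text incrementally:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function streamStory(outputEl: HTMLElement) {
  // generateContentStream returns an async iterable of partial responses,
  // so text can be displayed as soon as each chunk arrives.
  const stream = await ai.models.generateContentStream({
    model: "gemini-2.0-flash", // illustrative model ID
    contents: "Write a short story about a web developer.",
  });

  for await (const chunk of stream) {
    outputEl.textContent += chunk.text ?? "";
  }
}
```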
Follow the steps in the README to get started, which include configuring your Gemini API key and providing it to the included HTTP server and sample apps.
Multimodal prompting using the Google Gen AI SDK
Multimodal prompts combine different types of media, such as text, images, and audio. For example, you could create prompts that identify objects in an image, extract text from a photo, or reference a picture.
To get started, read this guide about file prompting strategies and multimodal concepts, which includes best practices for designing multimodal prompts.
Next, explore the multimodal capabilities of the Gemini models in Google AI Studio by uploading or selecting a file as part of your prompt.
Learn how to use multimodal inputs with the Google Gen AI SDK for JavaScript and TypeScript, find the image requirements for prompts, and explore the multimodal image reasoning demo in the sample app.
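As a sketch of what a multimodal request can look like (the base64 helper and model ID are illustrative), an image from a file input can be sent inline alongside a text part:

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

// Read a File from an <input type="file"> element as a base64 string
// (the data URL prefix is stripped before sending).
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve((reader.result as string).split(",")[1]);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

async function describeImage(file: File): Promise<string | undefined> {
  const data = await fileToBase64(file);
  // Combine an inline image part with a text part in a single prompt.
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash", // illustrative model ID
    contents: [
      { inlineData: { mimeType: file.type, data } },
      { text: "What objects are in this photo?" },
    ],
  });
  return response.text;
}
```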
For further reading, see the solution Leveraging the Gemini Pro Vision model for image understanding, multimodal prompts and accessibility.
Prepare for production by migrating to Firebase AI Logic
Using the Google Gen AI SDK for JavaScript and TypeScript to call the Gemini API directly from a web client is only for prototyping and experimentation. When you start to seriously develop your app beyond prototyping (especially as you prepare for production), transition to Firebase AI Logic and its SDK for Web.
For calling the Gemini API directly from your web app, we strongly recommend using the Firebase AI Logic client SDK for Web. This SDK offers enhanced security features for web apps, including Firebase App Check to help protect your app from unauthorized client access. When you use this SDK, you can include large media files in your requests by using Cloud Storage for Firebase. Firebase AI Logic also integrates with other products in Google's Firebase developer platform (like Cloud Firestore and Firebase Remote Config), while also giving you streamlined access to the tools, workflows, and scale offered through Google Cloud. You can choose your "Gemini API" provider: either the Vertex AI Gemini API or the Gemini Developer API (which offers a no-cost tier). Review the differences between the two providers to learn more.
Follow this guide to migrate to the Firebase AI Logic client SDK by updating your package dependencies, imports, and changing how the AI model is initialized.
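For orientation, a minimal sketch of the post-migration initialization (assuming the firebase/ai entry point and the Gemini Developer API backend; the model ID is illustrative) looks roughly like this:

```ts
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// Initialize Firebase with your project's web app config.
const firebaseApp = initializeApp({
  /* your Firebase config object */
});

// Use the Gemini Developer API backend (a Vertex AI backend is the other option).
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const model = getGenerativeModel(ai, { model: "gemini-2.0-flash" });

async function run() {
  // The API key is managed through Firebase rather than hardcoded in client code.
  const result = await model.generateContent("Hello from Firebase AI Logic!");
  console.log(result.response.text());
}

run();
```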
Quiz
Test your knowledge and earn your 'Getting started with the Gemini API and Web apps' badge.