Instances are the computing units that App Engine uses to automatically scale your application. They are similar to the virtual machines offered by infrastructure providers in that both have a set amount of memory allocated to them. However, GAE instances don't carry the overhead of an operating system or other applications, so they have more usable memory than comparable VMs. They also operate against high-level APIs rather than through layers of code down to virtual device drivers, which makes them more efficient and allows all the services to be fully managed.
This section describes what instances are, how they're used to serve your application, and how you can use the Instances Dashboard to understand and control your application.
- Introduction to Instances
- How Applications Scale
- Loading Requests
- Warmup Requests
- The Instances Dashboard
- Instance Billing
Introduction to Instances
Instances are the basic building blocks of App Engine, providing all the resources needed to successfully host your application. This includes the language runtime, the App Engine APIs, and your application's code and memory. Each instance includes a security layer to ensure that instances cannot inadvertently affect each other. At any given time, your application may be running on one instance or many instances, with requests being spread across all of them. We call these dynamic instances, because they start up and shut down automatically based on current needs.
App Engine also offers resident instances, which remain running all the time. Resident instances allow you to improve application performance and, if your application uses backends, to perform longer-running, larger tasks. You can enable resident instances by configuring a backend or by setting a minimum number of idle instances.
App Engine charges for instance usage on an hourly basis. If you expect to use a certain number of instance hours weekly, you can save money by purchasing them in advance on the Billing Settings tab of the Administration Console. Note that, if you purchase instance hours in advance, you will be charged for them whether or not you use them. You can track your instance usage on the Instances Dashboard of the Administration Console.
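To see why prepaying only saves money when you actually use the hours, here is a hypothetical weekly cost model. The rates are made-up numbers in cents per instance hour, and the rule that hours beyond the prepaid block are billed at an on-demand rate is an assumption for illustration, not App Engine's published pricing.

```python
# Hypothetical cost model: the rates and the "overflow billed on demand"
# rule are illustrative assumptions, not App Engine's actual pricing.


def weekly_instance_cost(used_hours: float, prepaid_hours: float,
                         prepaid_rate: float, on_demand_rate: float) -> float:
    """Return the weekly cost in the same unit as the rates (e.g. cents)."""
    if used_hours < 0 or prepaid_hours < 0:
        raise ValueError("hours must be non-negative")
    # Prepaid hours are charged whether or not they are used.
    prepaid_cost = prepaid_hours * prepaid_rate
    # Assumption: usage beyond the prepaid block is billed at the on-demand rate.
    overflow_hours = max(0.0, used_hours - prepaid_hours)
    return prepaid_cost + overflow_hours * on_demand_rate


# 100 prepaid hours at 5 cents/hour, overflow at 8 cents/hour (made-up rates).
for used in (60, 100, 120):
    print(used, weekly_instance_cost(used, 100, 5, 8))
```

Using only 60 of 100 prepaid hours still costs the full prepaid amount, which is the trade-off the paragraph above warns about.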
Types of Instances
App Engine offers two types of instances:
- Frontend Instances: Instances that run your code and scale dynamically with incoming requests, but limit how long each request can run. Frontend instances handle all requests by default, including task queue and cron requests.
- Backend Instances: Instances whose lifetime is determined by your configuration and that scale only within the limits you set. See Backends for more information. You can configure backends to work with task queues and cron.
How Applications Scale
App Engine applications are powered by any number of dynamic instances at any given time, depending on the volume of requests received by your application. As requests for your application increase, so does the number of dynamic instances powering it, and vice versa.
Scaling in Instances
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine also scales instances in reverse when request volumes decrease. This scaling helps ensure that all of your application's current instances are being used to optimal efficiency and cost effectiveness.
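The queue-driven behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not App Engine's scheduler: the thresholds `SCALE_UP_AT` and `SCALE_DOWN_AT`, the one-instance-per-step rule, and the function names are all hypothetical.

```python
from typing import List

# Hypothetical thresholds for a toy model; not App Engine's real values.
SCALE_UP_AT = 5.0    # average queued requests per instance that adds an instance
SCALE_DOWN_AT = 1.0  # average below which one instance is retired


def next_instance_count(instances: int, queued_requests: int,
                        min_instances: int = 1) -> int:
    """Make one scaling decision from the average per-instance queue length."""
    if instances < 1:
        raise ValueError("need at least one instance")
    avg_queue = queued_requests / instances
    if avg_queue > SCALE_UP_AT:
        return instances + 1  # queues too long: add an instance
    if avg_queue < SCALE_DOWN_AT and instances > min_instances:
        return instances - 1  # load has dropped: retire an instance
    return instances


def simulate(queued_per_step: List[int]) -> List[int]:
    """Return the instance count after each step of a load trace."""
    count = 1
    history = []
    for queued in queued_per_step:
        count = next_instance_count(count, queued)
        history.append(count)
    return history


print(simulate([2, 12, 30, 30, 30, 4, 0, 0]))  # → [1, 2, 3, 4, 5, 4, 3, 2]
```

The trace grows the pool while queues stay long and shrinks it once load falls away, mirroring the scale-up and scale-down behavior in the text.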
When an application is not being used at all, App Engine turns off its associated dynamic instances, but readily reloads them as soon as they are needed. Reloading instances may result in loading requests and additional latency for users.
You can specify a minimum number of idle instances. Setting an appropriate number of idle instances for your application based on request volume allows your application to serve every request with little latency, unless you are experiencing abnormally high request volume.
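As a rough illustration of why idle instances reduce latency, the sketch below counts how many requests in a series of bursts would land on a cold instance and trigger a loading request. The function name and its simplifying assumptions (one instance per concurrent request, only the idle instances are warm, bursts far apart) are hypothetical.

```python
from typing import List


def cold_starts(bursts: List[int], min_idle: int) -> int:
    """Count requests that must wait for a new instance (a loading request).

    Toy assumptions: every request in a burst needs its own instance, only
    the `min_idle` idle instances are warm when a burst arrives, and bursts
    are far enough apart that extra dynamic instances have shut down.
    """
    if min_idle < 0:
        raise ValueError("min_idle must be non-negative")
    return sum(max(0, burst - min_idle) for burst in bursts)


traffic = [3, 10, 1]  # concurrent requests per burst (illustrative numbers)
for idle in (0, 3, 10):
    print(idle, cold_starts(traffic, idle))  # 14, 7, then 0 cold starts
```

Only the burst that exceeds the idle pool still causes loading requests, which matches the caveat about abnormally high request volume.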
The Application Settings section of the Administration Console allows you to optimize for high performance or low cost.
Request Handling in Instances
Your application's latency has the biggest impact on the number of instances needed to serve your traffic. If you service requests quickly, a single instance can handle a lot of requests.
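One way to see why latency dominates is Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the system. The helper below is a hypothetical sketch that turns that into a lower-bound instance estimate; it ignores headroom, bursts, and loading time, and `concurrent_per_instance` is an assumed parameter, not an App Engine setting.

```python
import math


def instances_needed(requests_per_second: int, latency_ms: int,
                     concurrent_per_instance: int = 1) -> int:
    """Lower-bound instance estimate from Little's law (in-flight = rate x latency)."""
    in_flight = requests_per_second * latency_ms / 1000  # average concurrent requests
    return max(1, math.ceil(in_flight / concurrent_per_instance))


print(instances_needed(200, 100))     # 20 requests in flight, one per instance -> 20
print(instances_needed(200, 10))      # a 10x faster handler -> 2
print(instances_needed(200, 100, 8))  # 20 in flight, 8 per instance -> 3
```

Cutting latency by a factor of ten cuts the estimate by the same factor, which is the point the paragraph makes.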
Single-threaded instances (Python or Java) can currently handle one request at a time. Therefore, the number of requests an instance can handle per second is inversely proportional to request latency. For example, 10 ms latency allows 100 requests/second/instance, 100 ms latency allows 10 requests/second/instance, and so on. Multithreaded instances can handle many concurrent requests, so for them the limiting factor is CPU: the number of requests/second depends on the CPU cycles each request consumes.
For example, if a B4 backend instance (approx. 2.4 GHz) consumes 10 Mcycles/request, you can process 240 requests/second/instance. If it consumes 100 Mcycles/request, you can process 24 requests/second/instance, and so on. These numbers are the ideal case but are fairly realistic in terms of what you can accomplish on an instance. Multithreaded instances are not available for the Python 2.5 runtime.
Java and Python 2.7 apps also support concurrent requests (Java | Python 2.7), so a single instance can handle new requests while waiting for other requests to complete. Concurrency significantly reduces the number of instances your app requires, but you need to design your app specifically with multithreading in mind.
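The per-instance throughput figures above come from two simple divisions. This sketch reproduces them; the function names are hypothetical, and the clock rate is expressed in MHz (millions of cycles per second) so that it divides directly by Mcycles per request.

```python
def single_threaded_rps(latency_ms: float) -> float:
    """One request at a time: throughput is the inverse of latency."""
    return 1000.0 / latency_ms


def cpu_bound_rps(clock_mhz: float, mcycles_per_request: float) -> float:
    """CPU-bound case: (million cycles/second) / (million cycles/request)."""
    return clock_mhz / mcycles_per_request


print(single_threaded_rps(10), single_threaded_rps(100))  # 100.0 10.0
print(cpu_bound_rps(2400, 10), cpu_bound_rps(2400, 100))  # 240.0 24.0
```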
Loading Requests
When App Engine creates a new instance for your application, the instance must first load any libraries and resources required to handle the request. This happens during the first request to the instance, known as a loading request. During a loading request, your application undergoes initialization, which causes the request to take longer.
The following best practices allow you to reduce the duration of loading requests:
- Load only the code needed for startup.
- Access the disk as little as possible.
- In some cases, loading code from a zip or jar file is faster than loading from many separate files.
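The first practice, loading only what startup needs, can be approximated in Python by deferring imports until a handler actually uses them. In this sketch the standard-library `csv` and `io` modules stand in for heavier dependencies, and `lazy_module` and `handle_report` are hypothetical names.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_module(name: str):
    """Import `name` on first use and return the cached module afterwards."""
    return importlib.import_module(name)


def handle_report(rows):
    # Only requests that need CSV output pay the import cost; the instance's
    # loading request does not have to import these modules up front.
    csv = lazy_module("csv")
    io = lazy_module("io")
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


print(repr(handle_report([["a", 1], ["b", 2]])))  # 'a,1\r\nb,2\r\n'
```

The trade-off is that the first request hitting such a handler pays the import cost instead, so this suits rarely used code paths.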
Warmup Requests
Warmup requests are a specific type of loading request that load application code into an instance ahead of time, before any live requests reach it. To learn more about how to use warmup requests, see the Warmup Requests section of the Application Configuration page (Java | Python | Go).
Note: Warmup requests are not issued for every new instance. For example, if the new instance is the very first instance of your application, the user's request is sent to the application directly. As a result, you may still encounter loading requests, even with warmup requests enabled.
The Instances Dashboard
The Instances Dashboard is part of the App Engine Admin Console. It shows the instances assigned to your application, along with the following information and controls:
- Average Queries Per Second (QPS) over the last minute
- Average Latency over the last minute
- The number of requests received in the last minute
- The Age, or how long the instance has been running
- Current memory usage
- The instance's availability, either resident or dynamic.
- A Shutdown button to shut down the instance.
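If you want to reproduce the per-minute figures from your own request logs, a sketch like the following computes them. The log format, a list of hypothetical `(finish_time_s, latency_ms)` pairs for one instance, and the choice to average QPS over a fixed 60-second window are assumptions; the dashboard's exact calculation is not specified here.

```python
from typing import List, Tuple


def last_minute_stats(log: List[Tuple[float, float]], now: float):
    """Return (requests, avg_qps, avg_latency_ms) for the 60 s ending at `now`.

    `log` holds hypothetical (finish_time_s, latency_ms) pairs for one instance.
    """
    window = [lat for t, lat in log if now - 60.0 < t <= now]
    count = len(window)
    avg_qps = count / 60.0
    avg_latency = sum(window) / count if count else 0.0
    return count, avg_qps, avg_latency


log = [(0.0, 50.0), (10.0, 30.0), (70.0, 20.0), (100.0, 40.0)]
print(last_minute_stats(log, now=100.0))  # only the last two entries fall in the window
```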
Instance Billing
In general, instance usage is billed on an hourly basis according to the instance's uptime. Billing begins when the instance starts and ends fifteen minutes after the instance shuts down. You are billed for idle instances only up to the maximum number of idle instances set in the Performance Settings tab of the Admin Console. Runtime overhead is counted against the instance memory; this overhead is higher for Java applications than for Python applications.
Billing is slightly different in resident and dynamic instances:
- For resident instances, billing ends fifteen minutes after the backend is shut down.
- For dynamic instances, billing ends fifteen minutes after the last request has finished processing.
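The billing rules above reduce to simple arithmetic. This sketch assumes timestamps in seconds and does not model any rounding or minimum-charge granularity, which the text does not specify; `billed_instance_hours` and `billable_idle_instances` are hypothetical names.

```python
BILLING_TAIL_S = 15 * 60  # billing continues fifteen minutes after the end event


def billed_instance_hours(start_s: float, end_event_s: float) -> float:
    """Hours billed for one instance.

    `end_event_s` is the shutdown time for a resident instance, or the time
    the last request finished for a dynamic instance.
    """
    if end_event_s < start_s:
        raise ValueError("end event precedes instance start")
    return (end_event_s - start_s + BILLING_TAIL_S) / 3600.0


def billable_idle_instances(idle_instances: int, max_idle: int) -> int:
    # Idle instances are billed only up to the configured maximum.
    return min(idle_instances, max_idle)


print(billed_instance_hours(0, 45 * 60))  # 45 min of uptime + 15 min tail = 1.0
print(billable_idle_instances(5, 3))      # 3
```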