Instances are the computing units that App Engine uses to automatically scale your application. This section describes what instances are, how they're used to serve your application, and how you can use the Instances Dashboard to understand and control your application.
- Introduction to Instances
- How Applications Scale
- Loading Requests
- Warmup Requests
- The Instances Dashboard
- Instance Billing
Introduction to Instances
Instances are the basic building blocks of App Engine, providing all the resources needed to successfully host your application. This includes the language runtime, the App Engine APIs, and your application's code and memory. Each instance includes a security layer to ensure that instances cannot inadvertently affect each other. At any given time, your application may be running on one instance or many instances, with requests being spread across all of them.
Instances are either resident or dynamic. A dynamic instance starts up and shuts down automatically based on current demand. A resident instance runs all the time, which can improve your application's performance.
An instance runs the code included in an App Engine module.
When you configure your app, you also specify how its modules scale (the initial number of instances for a module, and how new instances are created and stopped in response to traffic) and how much time an instance is allowed to spend handling a request (its deadline). The following table shows how the various module configurations determine whether an instance is resident or dynamic:
| Resident Instances | Dynamic Instances |
|---|---|
| Auto scaling module with one or more minimum idle instances (instances up to the minimum idle count are resident; the rest are dynamic) | Auto scaling module with no minimum idle instances |
| Manual scaling module | Basic scaling module |
When App Engine was first released, the modules architecture did not exist. Applications were constructed with one frontend and multiple, optional backends. A frontend behaves like an auto scaling version of the default module. Depending on its configuration, a backend module behaves like a manual or basic scaling module. Note that backends have been deprecated.
App Engine charges for instance usage on an hourly basis. If you expect to use a certain number of instance hours weekly, you can save money by purchasing them in advance on the Billing Settings tab of the Administration Console. Note that, if you purchase instance hours in advance, you will be charged for them whether or not you use them. You can track your instance usage on the Instances Dashboard of the Administration Console.
How Applications Scale
App Engine applications are powered by any number of dynamic instances at any given time, depending on the volume of requests received by your application. As requests to your application increase, so does the number of dynamic instances powering it, and vice versa.
Scaling in Instances
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine also scales the number of instances down when request volume decreases. This scaling helps ensure that all of your application's current instances are used with optimal efficiency and cost effectiveness.
When an application is not being used at all, App Engine turns off its associated dynamic instances, but readily reloads them as soon as they are needed. Reloading instances may result in loading requests and additional latency for users.
You can specify a minimum number of idle instances. Setting an appropriate number of idle instances for your application based on request volume allows your application to serve every request with little latency, unless you are experiencing abnormally high request volume.
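The scale-up behavior described above can be sketched as a simple sizing rule. This is an illustrative sketch only, not App Engine's actual (unpublished) algorithm; the function name, capacity threshold, and the idea of a fixed per-instance queue capacity are assumptions for demonstration.

```python
# Illustrative sketch only: a simplified autoscaler sizing rule, not
# App Engine's actual algorithm. Names and thresholds are assumptions.

def instances_needed(queued_requests, per_instance_capacity, min_idle):
    """Return how many instances to keep running.

    queued_requests: total requests currently waiting across all queues
    per_instance_capacity: requests one instance can absorb before its
        queue is considered "too long"
    min_idle: minimum number of idle instances to keep warm
    """
    busy = -(-queued_requests // per_instance_capacity)  # ceiling division
    return busy + min_idle

print(instances_needed(45, 10, 2))  # → 7 (5 busy instances + 2 idle)
print(instances_needed(0, 10, 2))   # → 2 (the idle floor keeps 2 warm)
```

The `min_idle` term mirrors the minimum idle instances setting: even with no traffic at all, that many instances stay warm so new requests avoid loading latency.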
The Application Settings section of the Administration Console allows you to optimize for high performance or low cost.
Request Handling in Instances
Your application's latency has the biggest impact on the number of instances needed to serve your traffic. If you process requests quickly, a single instance can handle a lot of requests.
Single-threaded instances (Python or Java) can currently handle only one request at a time. There is therefore an inverse relationship between latency and the number of requests an instance can handle per second. For example, 10 ms latency equals 100 requests/second/instance, 100 ms latency equals 10 requests/second/instance, and so on. Multi-threaded instances can handle many concurrent requests, so their throughput is determined by the CPU consumed per request.
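The latency figures above follow from simple arithmetic: a single-threaded instance serving one request at a time can handle at most 1000 / latency-in-ms requests per second. A minimal sketch (the function name is ours, for illustration):

```python
# Throughput of a single-threaded instance is bounded by request latency:
# one request at a time means at most 1000 / latency_ms requests per second.

def max_qps_single_threaded(latency_ms):
    """Ideal requests/second/instance for a single-threaded instance."""
    return 1000.0 / latency_ms

print(max_qps_single_threaded(10))   # → 100.0 requests/second/instance
print(max_qps_single_threaded(100))  # → 10.0 requests/second/instance
```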
Java and Python 2.7 apps also support concurrent requests (Java | Python 2.7), so a single instance can handle new requests while waiting for other requests to complete. Concurrency significantly reduces the number of instances your app requires, but you need to design your app specifically with multithreading in mind.
For example, if a B4 instance (approximately 2.4 GHz) consumes 10 Mcycles/request, you can process 240 requests/second/instance; if it consumes 100 Mcycles/request, you can process 24 requests/second/instance, and so on. These numbers are the ideal case, but they are fairly realistic in terms of what you can accomplish on an instance. Multi-threaded instances are not available for the Python 2.5 runtime.
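The B4 figures follow from dividing the instance's clock rate by the cycles each request consumes. A minimal sketch of the arithmetic (constant and function names are ours):

```python
# A multi-threaded instance is CPU-bound rather than latency-bound:
# ideal throughput = clock speed / cycles consumed per request.

B4_MCYCLES_PER_SECOND = 2400  # approx 2.4 GHz, per the example above

def max_qps_multi_threaded(mcycles_per_request):
    """Ideal requests/second/instance for a CPU-bound instance."""
    return B4_MCYCLES_PER_SECOND / mcycles_per_request

print(max_qps_multi_threaded(10))   # → 240.0 requests/second/instance
print(max_qps_multi_threaded(100))  # → 24.0 requests/second/instance
```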
Loading Requests
When App Engine creates a new instance for your application, the instance must first load any libraries and resources required to handle the request. This happens during the first request to the instance, known as a loading request. During a loading request, your application undergoes initialization, which causes the request to take longer than usual.
The following best practices allow you to reduce the duration of loading requests:
- Load only the code needed for startup.
- Access the disk as little as possible.
- In some cases, loading code from a zip or jar file is faster than loading from many separate files.
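The first best practice, loading only the code needed for startup, is often implemented by deferring heavy imports until first use. A hedged sketch of the pattern; `json` stands in for a genuinely expensive dependency, and the helper name is ours:

```python
# Hedged sketch: deferring a heavy import until first use shortens the
# loading request. Here json is a stand-in for a genuinely heavy module.

_heavy = None

def get_heavy_module():
    """Import the expensive module on first use instead of at startup."""
    global _heavy
    if _heavy is None:
        import json  # deferred: runs on first call, not during loading
        _heavy = json
    return _heavy
```

Handlers call `get_heavy_module()` when they actually need it; loading requests that never touch this code path pay no import cost.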
Warmup Requests
Warmup requests are a specific type of loading request that load application code into an instance ahead of time, before any live requests are made. To learn more about how to use warmup requests, see the Warmup Requests section of the Application Configuration page (Java | Python | Go).
Note: Warmup requests are not always called for every new instance. For example, if the new instance is the very first instance of your application, the user's request is sent to the application directly. As a result, you may still encounter loading requests, even with warmup requests enabled.
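In the Python runtime, a warmup request arrives as an ordinary GET to the reserved path `/_ah/warmup` once warmup requests are enabled in the app's configuration. A minimal WSGI sketch, assuming that setup; the handler body and response text are illustrative, not a required shape:

```python
# Minimal WSGI sketch of a warmup handler. Assumes warmup requests are
# enabled in the app's configuration; App Engine then sends
# GET /_ah/warmup to a new instance before routing live traffic to it.

def app(environ, start_response):
    if environ.get('PATH_INFO') == '/_ah/warmup':
        # Do one-time initialization here: import libraries, prime caches.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'warmed up']
    # Normal request handling.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']
```

The initialization work done here is exactly the work that would otherwise slow down a user-facing loading request.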
The Instances Dashboard
The Instances Dashboard is part of the App Engine Admin Console. It shows the instances assigned to your application, along with the following information and controls:
- Average Queries Per Second (QPS) over the last minute
- Average Latency over the last minute
- The number of requests received in the last minute
- The Age, or how long the instance has been running
- Current memory usage
- The instance's availability, either resident or dynamic
- A Shutdown button to shut down the instance
Instance Billing
In general, instance usage is billed on an hourly basis based on the instance's uptime. Billing begins when the instance starts and ends fifteen minutes after it shuts down. You are billed for idle instances only up to the number of maximum idle instances set in the Performance Settings tab of the Admin Console. Runtime overhead is counted against instance memory; this overhead is higher for Java applications than for Python applications.
Billing differs slightly for resident and dynamic instances:
- For resident instances, billing ends fifteen minutes after the instance is shut down.
- For dynamic instances, billing ends fifteen minutes after the last request has finished processing.