Nitric allows developers to build serverless functions that run on various compute services from multiple cloud vendors. One of the most commonly used is AWS Lambda. One of AWS Lambda's strengths - dynamic scaling - can also be a challenging drawback due to a phenomenon known as Cold Starts.
What are cold starts?
When the first request is sent to a Lambda function, that function is cold, meaning there are no active instances available to handle the request. So, instead of immediately processing the request, the Lambda service must start an instance first. This startup time, during which the request sits idle waiting to be processed, is generally known as a cold start.
You're guaranteed at least one cold start after deploying a new function, because instances only start in response to requests. Cold starts also occur at less predictable times. For example, if no requests are made for more than 5 minutes, the previous instances have likely been terminated, so the next request starts a new instance. Also, during times of increased load, when the number of incoming requests exceeds the capacity of the current instances, new instances are started to handle the volume, leading to additional cold starts.
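To make the first two triggers concrete, here is a toy model in Go. It is a simplification for illustration only, not a description of how the Lambda service works internally: it assumes a single instance that is reclaimed after a fixed idle period, and it ignores cold starts caused by scaling out under load. The `countColdStarts` helper, the `exampleArrivals` data, and the fixed `idleTimeout` are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// idleTimeout is an assumed value based on the rough five-minute figure
// mentioned above. The real service does not promise a fixed number.
const idleTimeout = 5 * time.Minute

// exampleArrivals lists when requests arrive, measured from deployment,
// in ascending order. The values are made up for illustration.
var exampleArrivals = []time.Duration{
	0,                // first request after deploy: cold
	30 * time.Second, // instance still warm
	2 * time.Minute,  // instance still warm
	10 * time.Minute, // 8 minute gap: instance assumed reclaimed, cold
	11 * time.Minute, // warm again
}

// countColdStarts is a hypothetical helper that counts how many requests
// would hit a cold instance in this single-instance model.
func countColdStarts(arrivals []time.Duration) int {
	cold := 0
	for i, at := range arrivals {
		// The first request is always cold. Later requests are cold when
		// the gap since the previous request exceeds the idle timeout.
		if i == 0 || at-arrivals[i-1] > idleTimeout {
			cold++
		}
	}
	return cold
}

func main() {
	fmt.Println("cold starts:", countColdStarts(exampleArrivals))
}
```

With the sample data, two of the five requests are cold: the first one after deployment and the one that follows the eight-minute gap.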
So, how bad are cold starts?
Well, it isn't easy to say. Several factors impact the cold start performance of functions on Lambda. We've experienced cold starts from 40 milliseconds to more than 25 seconds for various functions and containers. This variability means cold starts can be a non-issue, invisible to your end-users, or a contributor to timeouts and other degraded user experience issues for the first user to make a request to a cold function.
Understanding the causes of, and some solutions for, cold start performance can alleviate these issues.
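Before optimizing, it helps to know how often cold starts actually happen. One common approach, sketched below in Go, is to keep a package-level counter: package-level state is created once per instance, so the first invocation an instance handles is the cold one. The `recordInvocation` function and `invocationInfo` type are hypothetical names for this sketch, and it only shows what the function itself can observe, not the startup latency the caller experiences.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var (
	// processStart is captured once, when the process (instance) starts.
	processStart = time.Now()
	// invocationCount tracks how many requests this instance has handled.
	invocationCount int64
)

// invocationInfo is a hypothetical record describing one invocation.
type invocationInfo struct {
	Cold       bool          // true only for the first request on this instance
	SinceStart time.Duration // time elapsed since the process started
}

// recordInvocation should be called at the top of a handler. It marks the
// first call on this instance as cold and every later call as warm.
func recordInvocation() invocationInfo {
	n := atomic.AddInt64(&invocationCount, 1)
	return invocationInfo{
		Cold:       n == 1,
		SinceStart: time.Since(processStart),
	}
}

func main() {
	// Simulate three requests handled by the same instance.
	for i := 1; i <= 3; i++ {
		info := recordInvocation()
		fmt.Printf("request %d: cold=%t, since start=%s\n", i, info.Cold, info.SinceStart)
	}
}
```

Logging the `Cold` flag with each request gives a rough cold start rate for real traffic, which is useful context for deciding how much optimization is worthwhile.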
What impacts cold start times?
Our testing and research identified four primary contributors to cold start times:
- Operations performed during initialization
- Instance size (memory and CPU)
- The size and quantity of files read during initialization
- Internal cache behavior within the AWS Lambda service
One quick note about containers on AWS Lambda
To maximize cross-cloud compatibility, the Nitric Framework builds functions as containers, then deploys to services like AWS Lambda. So, we're particularly interested in the cold start performance of containers running on Lambda.
Using containers on Lambda is quite different from zip file deployments. Containers provide more control but also more pitfalls. Consequently, the details below may or may not be relevant if you're building regular Lambda functions; however, most are applicable in both scenarios.
Operations during init
This factor is probably the easiest to reason about, test, and optimize. Any operations your functions perform before handling incoming requests will impact their cold start times.
Containers deployed to Lambda should minimize any code or processes that execute before calling out to the Lambda runtime and starting to handle requests.
Functions deployed as zip files have far less control over the initialization process, meaning initialization will primarily be influenced by the runtime.
Here is a Hello World example in Go to highlight where slowdowns could occur:
main.go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handle responds to every request with a plain-text greeting.
func handle(ctx context.Context, data map[string]interface{}) (interface{}, error) {
	return events.APIGatewayProxyResponse{
		StatusCode: 200,
		Body:       "Hello World",
		// The body is plain text, not base64, so this must be false.
		IsBase64Encoded: false,
	}, nil
}

func main() {
	// Any code that runs here, before lambda.Start(), impacts cold starts
	lambda.Start(handle)
}
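One way to keep work out of that startup path is to defer expensive setup until a request actually needs it. The sketch below uses `sync.Once` from the Go standard library to do this. It leaves out the Lambda SDK so that it runs on its own; `slowClient`, `newSlowClient`, and `getClient` are hypothetical stand-ins for a real dependency such as a database connection, and the sleep only simulates setup cost.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slowClient stands in for an expensive dependency. It is hypothetical.
type slowClient struct {
	greeting string
}

var (
	clientOnce sync.Once
	client     *slowClient
)

// newSlowClient simulates expensive setup work with a short sleep.
func newSlowClient() *slowClient {
	time.Sleep(50 * time.Millisecond)
	return &slowClient{greeting: "Hello"}
}

// getClient creates the client on first use and reuses it afterwards.
// sync.Once guarantees the setup runs at most once per instance.
func getClient() *slowClient {
	clientOnce.Do(func() { client = newSlowClient() })
	return client
}

// handle pays the setup cost only when a request needs the client.
func handle(name string) string {
	return getClient().greeting + ", " + name
}

func main() {
	// Nothing expensive runs before the handler is ready. In a real
	// function, this is where lambda.Start would be called.
	fmt.Println(handle("Nitric")) // first call performs the setup
	fmt.Println(handle("again"))  // later calls reuse the client
}
```

Lazy initialization does not remove the cost; it moves it to the first request that needs the dependency. It helps most when only some code paths use the dependency, so other requests never pay for it.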
Instance size and fractional vCPUs
Something unusual about serverless computing services, like AWS Lambda, is the idea of fractional vCPUs, meaning the allocation of less than 1 vCPU to a function. Lambda function vCPUs scale proportionally with their memory allocation (higher memory increases the vCPU allocation and core count). AWS Lambda currently allocates the equivalent of 1 vCPU when configuring a function with 1,769 MB of memory.
Luc van Donkersgoed produced an excellent article about Optimizing Lambda Cost with Multi-Threading, where testing showed that Lambda functions always have access to 2 or more vCPU cores. So, in the case of a 1,769 MB memory allocation providing the equivalent of 1 vCPU, the 1 vCPU limit is imposed across 2 vCPU cores with a form of CPU throttling.
This table shows the vCPU and CPU Ceiling results found in the article above:
Memory | vCPUs | CPU Ceiling |
---|---|---|
832 MB | 2 | 0.50 |
1769 MB | 2 | 1.00 |
3008 MB | 2 | 1.67 |
3009 MB | 3 | 1.70 |
5307 MB | 3 | 2.39 |
5308 MB | 4 | 2.67 |
7076 MB | 4 | 2.84 |
7077 MB | 5 | 3.86 |
8845 MB | 5 | 4.23 |
8846 MB | 6 | 4.48 |
10240 MB | 6 | 4.72 |
The result is that instance size can impact both the cold start and processing time of a function and that the number of processes or threads in your application plays a part in the outcome.
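If you want to size a worker pool or thread count to match the cores available, the thresholds in the table can be turned into a small lookup. The Go sketch below does only that: `coresForMemory` and `coreTiers` are hypothetical names, and the numbers are copied from the table above rather than from any official AWS guarantee, so treat them as observations that may change.

```go
package main

import "fmt"

// coreTier pairs an upper memory bound (in MB) with the number of vCPU
// cores observed at or below that bound in the table above.
type coreTier struct {
	maxMemoryMB int
	cores       int
}

// coreTiers is ordered by ascending memory. Values come from the table.
var coreTiers = []coreTier{
	{3008, 2},
	{5307, 3},
	{7076, 4},
	{8845, 5},
	{10240, 6},
}

// coresForMemory returns the core count the table suggests for a memory
// setting, or 0 when the value falls outside the range covered here.
func coresForMemory(memoryMB int) int {
	if memoryMB <= 0 {
		return 0
	}
	for _, tier := range coreTiers {
		if memoryMB <= tier.maxMemoryMB {
			return tier.cores
		}
	}
	return 0
}

func main() {
	for _, mb := range []int{832, 1769, 3009, 7077, 10240} {
		fmt.Printf("%5d MB -> %d cores\n", mb, coresForMemory(mb))
	}
}
```

Keep in mind that the core count is not the same as the CPU ceiling: at 1,769 MB a function sees 2 cores but is throttled to roughly 1 vCPU of total work, so running more threads than that ceiling supports will not make CPU-bound code faster.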
Image size, package size, and I/O
In our testing, we've found that while the size of a function or container can impact cold starts, what appears to be more impactful is how much of that data is read during the initialization step and the total number of files accessed. For example, reading a single large file or importing from a bundle improves the cold start times over reading many files or dependencies individually.
The amount of data read matters too. For example, importing a specific file from a library tends to be far quicker than importing the entire library.
This appears to be caused by lazy loading of image layer data, particularly during container initialization, and latency introduced by the read operations. In general, you want to access as few files and as little data as possible during the initialization of your functions. For example, we've seen improvement when using ncc to bundle Node.js applications.
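If you'd like to measure this effect for your own function, a small harness along these lines can help. The Go sketch below writes the same data as many small files and as one bundle, then times reading each. All names (`compareReads`, `readAll`, `readResult`) are hypothetical. On a local disk the files will be freshly cached, so the difference may be negligible; the numbers only become meaningful when a comparison like this runs against files baked into a container image during a cold start on Lambda.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// readResult holds the bytes read and time taken for both layouts.
type readResult struct {
	ManyBytes   int
	ManyTime    time.Duration
	BundleBytes int
	BundleTime  time.Duration
}

// readAll reads every path and returns the total bytes and elapsed time.
func readAll(paths []string) (int, time.Duration, error) {
	start := time.Now()
	total := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return 0, 0, err
		}
		total += len(data)
	}
	return total, time.Since(start), nil
}

// compareReads writes count files of size bytes each, plus one bundle with
// the same total content, into a temporary directory and times both reads.
func compareReads(count, size int) (readResult, error) {
	var res readResult
	dir, err := os.MkdirTemp("", "coldstart-io-")
	if err != nil {
		return res, err
	}
	defer os.RemoveAll(dir)

	chunk := bytes.Repeat([]byte("x"), size)
	parts := make([]string, 0, count)
	for i := 0; i < count; i++ {
		p := filepath.Join(dir, fmt.Sprintf("part-%04d.txt", i))
		if err := os.WriteFile(p, chunk, 0o600); err != nil {
			return res, err
		}
		parts = append(parts, p)
	}
	bundle := filepath.Join(dir, "bundle.txt")
	if err := os.WriteFile(bundle, bytes.Repeat(chunk, count), 0o600); err != nil {
		return res, err
	}

	if res.ManyBytes, res.ManyTime, err = readAll(parts); err != nil {
		return res, err
	}
	if res.BundleBytes, res.BundleTime, err = readAll([]string{bundle}); err != nil {
		return res, err
	}
	return res, nil
}

func main() {
	res, err := compareReads(500, 1024)
	if err != nil {
		fmt.Println("comparison failed:", err)
		return
	}
	fmt.Printf("%d bytes from many files: %s\n", res.ManyBytes, res.ManyTime)
	fmt.Printf("%d bytes from one bundle: %s\n", res.BundleBytes, res.BundleTime)
}
```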
Cold, warm and hot
Next, we need to talk about one of the most significant impacts on cold start performance in AWS Lambda: image layer caching. The first invocation of a newly deployed Lambda will be significantly slower than subsequent cold starts, sometimes tens of seconds slower. The reason for this is that Lambda caches container images zonally as well as on individual workers.
During the first cold start, all of these caches will be cold, but subsequent cold starts typically hit one of the faster caches, resulting in much better cold start performance. Here is an example of some results from a simple Node.js application running in a container on Lambda:
Description | Result |
---|---|
First request in a region (cold caches, no running containers) | 9.56s |
Subsequent cold start (warm caches, no running containers) | 2.67s |
All subsequent requests (running containers) | < 0.10s |
Luckily, there are easy options to mitigate the impacts of cold caches on cold starts. AWS recommends provisioned concurrency, which keeps instances of your functions running, ready to respond to requests. This option has the added benefit of eliminating cold starts until you hit the provisioned concurrency threshold.
Another quick option is a schedule that makes periodic requests to your functions to keep the function and caches warm. Using the Nitric Framework, this only takes a few lines of code.
import { schedule } from '@nitric/sdk';
// Execute the function every 5 minutes to keep it warm.
schedule('keep-warm').every('5 minutes', async (ctx) => ctx);
// Your existing function code should live here...
Do cold starts matter?
Now that we've looked at the causes of increased cold start times and a few options to mitigate them, the final question is: how much effort should you spend mitigating the impact of cold starts on your application?
It's easy to think that cold starts are a big problem during development. It's disheartening to make your first request to an API and need to wait a few seconds for a response. It's also easy to think it'll be the same for your users. So, to improve the user experience, you go deep down the rabbit hole of cold start optimization.
Luckily, the reality is quite different. Applications with sustained load rarely cold start, and when they do, it's trivial to deal with the latency in a user-facing application through good UX design. As Allen Helton points out in his blog post Let's Stop Talking About Serverless Cold Starts, most teams don't see cold starts as an issue in production.
We'd love to hear your experiences with cold starts. Are they a problem for your users? What techniques have you used to mitigate the impact? Come chat with us on Discord or Twitter.