One of the major drawbacks of AWS Fargate is that image pull times are relatively slow compared to EC2. This is because EC2 nodes can keep a local image cache on the instance. Fargate is serverless compute, so it does not offer this cache.

Slow pull times can be troublesome if you have spiky loads. If it takes 60 seconds for the container to launch and 10 seconds for the load balancer health checks to pass, new capacity only arrives 70 seconds after the scaling event (and with target tracking, triggering that event can take another 3 minutes). Faster pull times allow your application to scale more quickly to meet demand.

There have been some improvements in this space, notably SOCI (Seekable OCI). This is an open source technology by AWS that stores an image index in the repository, allowing for faster pull times in Fargate.

I did some benchmarking with a PHP container of 413 MB, running a simple artisan command and measuring the creation and start time. There is no load balancer involved; adding one could change the results because of the health checks.
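As a minimal sketch of how a duration can be derived from a creation and a start timestamp, the snippet below assumes both are available as ISO 8601 strings. The function names are hypothetical, and the sample values are one row from the measurements in this article.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Replace the trailing "Z" so older Python versions can parse the value.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def launch_seconds(created_at: str, started_at: str) -> float:
    # Time between task creation and container start, in seconds.
    return (parse_ts(started_at) - parse_ts(created_at)).total_seconds()


duration = launch_seconds("2024-04-10T12:33:42.190Z", "2024-04-10T12:34:43.687Z")
print(f"{duration:.3f} seconds")
```

The durations in the tables are whole seconds, so expect the fractional part to be dropped there.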

Pulling without SOCI

Starting the container in a private subnet (with a NAT instance):

Creation time	Start time	Duration
2024-04-10T12:33:42.190Z	2024-04-10T12:34:43.687Z	61 seconds
2024-04-10T12:33:39.054Z	2024-04-10T12:34:38.749Z	59 seconds
2024-04-10T12:33:42.233Z	2024-04-10T12:34:35.717Z	53 seconds
2024-04-10T12:33:40.833Z	2024-04-10T12:34:39.173Z	59 seconds
2024-04-10T12:33:39.402Z	2024-04-10T12:34:41.113Z	62 seconds
2024-04-10T12:33:41.342Z	2024-04-10T12:34:36.574Z	55 seconds
2024-04-10T12:33:41.342Z	2024-04-10T12:34:36.574Z	55 seconds
2024-04-10T12:33:39.964Z	2024-04-10T12:34:35.209Z	56 seconds
2024-04-10T12:33:37.956Z	2024-04-10T12:34:33.268Z	56 seconds
2024-04-10T12:33:38.655Z	2024-04-10T12:34:31.864Z	53 seconds

Pulling without SOCI (using VPC endpoints)

Initially I thought that I would get faster pull times by setting up a Gateway S3 endpoint (which you should have by default) and VPC endpoints (ecr.dkr and ecr.api). However, it turns out this is not the case:

Creation time	Start time	Duration
2024-04-10T12:23:47.643Z	2024-04-10T12:24:42.853Z	55 seconds
2024-04-10T12:23:50.045Z	2024-04-10T12:24:41.766Z	51 seconds
2024-04-10T12:23:53.436Z	2024-04-10T12:24:49.033Z	56 seconds
2024-04-10T12:23:51.992Z	2024-04-10T12:24:50.407Z	59 seconds
2024-04-10T12:23:50.430Z	2024-04-10T12:24:49.354Z	59 seconds
2024-04-10T12:23:50.759Z	2024-04-10T12:24:43.958Z	53 seconds
2024-04-10T12:23:52.621Z	2024-04-10T12:24:54.872Z	62 seconds
2024-04-10T12:23:52.580Z	2024-04-10T12:24:48.212Z	56 seconds
2024-04-10T12:23:51.428Z	2024-04-10T12:24:48.406Z	57 seconds
2024-04-10T12:23:49.307Z	2024-04-10T12:24:37.515Z	48 seconds

We can verify that the VPC endpoint is indeed being used by running a traceroute from inside the VPC:

This session is encrypted using AWS KMS.
sh-5.2$ traceroute 533267114484.dkr.ecr.eu-west-1.amazonaws.com
traceroute to 533267114484.dkr.ecr.eu-west-1.amazonaws.com (10.0.43.48), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *

The IP address 10.0.43.48 corresponds to the VPC endpoint IP address:

VPC endpoint IP address

Pulling containers using SOCI

Now we set up SOCI by deploying the following CloudFormation template. It monitors your ECR repositories for pushes and creates the SOCI index in the ECR repository automatically.

There are alternative ways to set up SOCI (for example in your CI/CD pipeline), but they are outside the scope of this article. After deploying the CloudFormation template, push an image to ECR. Two new artifact types appear in the repo after the push, a SOCI Index and an Image Index:

A SOCI Index and an Image Index

Make sure your Fargate platform version is 1.4.0 or higher. We can now pull the containers again from Fargate (no further changes are needed):

Creation time	Start time	Duration
2024-04-10T12:15:49.682Z	2024-04-10T12:16:10.607Z	21 seconds
2024-04-10T12:15:47.849Z	2024-04-10T12:16:08.313Z	21 seconds
2024-04-10T12:15:48.874Z	2024-04-10T12:16:08.970Z	20 seconds
2024-04-10T12:15:48.874Z	2024-04-10T12:16:10.845Z	22 seconds
2024-04-10T12:15:49.308Z	2024-04-10T12:16:11.618Z	22 seconds
2024-04-10T12:15:52.370Z	2024-04-10T12:16:13.479Z	21 seconds
2024-04-10T12:15:48.558Z	2024-04-10T12:16:10.845Z	22 seconds
2024-04-10T12:15:51.273Z	2024-04-10T12:16:10.480Z	19 seconds
2024-04-10T12:15:52.169Z	2024-04-10T12:16:20.487Z	28 seconds
2024-04-10T12:15:50.144Z	2024-04-10T12:16:15.148Z	25 seconds

Pulls with SOCI enabled take roughly 60% less time (22 seconds on average, down from 57 seconds)
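The comparison can be reproduced from the Duration columns of the tables above. This is a small sketch that uses the NAT instance measurements as the baseline; the variable names are only for illustration.

```python
from statistics import mean

# Duration column (seconds) of the "without SOCI" table (NAT instance).
without_soci = [61, 59, 53, 59, 62, 55, 55, 56, 56, 53]
# Duration column (seconds) of the SOCI table.
with_soci = [21, 21, 20, 22, 22, 21, 22, 19, 28, 25]

baseline = mean(without_soci)
soci = mean(with_soci)

# Share of the baseline time that is saved, and the speedup factor.
reduction = 1 - soci / baseline
speedup = baseline / soci

print(f"{baseline:.1f}s -> {soci:.1f}s, {reduction:.0%} less time, {speedup:.1f}x faster")
# prints "56.9s -> 22.1s, 61% less time, 2.6x faster"
```

In other words, a SOCI pull takes about 39% of the time of a regular pull for this image.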

I expect that with even bigger containers the results will be even more impressive. AWS has a blog post with a container of 1333 MB, and they report 50% faster pull times. Note that SOCI is only worth it if your container image is larger than 250 MB.

This is a notable difference, especially for services scaling behind a load balancer. Assume you configured your load balancer to require two successful health checks at an interval of 5 seconds. This means that:

Without using SOCI your container would be accepting requests after 10 seconds + 57 seconds = 67 seconds
Using SOCI your container would be accepting requests after 10 seconds + 22 seconds = 32 seconds
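The same arithmetic as a small sketch. The function and its parameters are hypothetical, and the optional alarm delay is a simplifying assumption that models the time a scaling alarm needs before the task is even created.

```python
def seconds_until_serving(
    pull_seconds: float,
    healthy_threshold: int = 2,
    interval_seconds: float = 5,
    alarm_delay_seconds: float = 0,
) -> float:
    # Time for the load balancer to see enough successful health checks.
    health_check_seconds = healthy_threshold * interval_seconds
    return alarm_delay_seconds + pull_seconds + health_check_seconds


print(seconds_until_serving(57))  # without SOCI: 67
print(seconds_until_serving(22))  # with SOCI: 32
```

Passing `alarm_delay_seconds=180` adds a 3 minute target tracking delay to both numbers, which shows that the pull time is only one part of the total reaction time.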

This can make all the difference when you have spiky workloads, especially if you are using target tracking, where it can take 3 minutes for the alarm to activate. Faster pull times allow you to respond more quickly to load.

To stop using SOCI indexes, simply delete the SOCI Index and the Image Index from the ECR repository. Note that SOCI currently only works with private repositories.

Some more tips to improve your container pull times:

Never use Docker Hub images directly. They are pulled through the NAT gateway or NAT instance, which causes extra data transfer and very slow pull times, and they are flaky due to Docker Hub rate limits. Set up pull through cache rules with ECR and Docker Hub instead
To improve scalability behind a load balancer, set the health check interval to the minimum of 5 seconds and the healthy threshold count to 2
Reduce the size of your image by basing it on alpine, or when you are using PHP extensions, use this excellent open source library that also cleans up the layers after the build
Separate your workloads, i.e. for PHP use the php-cli base image for workers and the php-apache base image for web server containers (the web server container will be larger)
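For the health check tip, here is a sketch of the settings expressed as keyword arguments for the `modify_target_group` call of the boto3 `elbv2` client. The helper name and the target group ARN are placeholders, and the call itself is left commented out because it needs AWS credentials and a real target group.

```python
def fast_health_check_settings(target_group_arn: str) -> dict:
    # Health check interval of 5 seconds and a healthy threshold of 2,
    # as suggested in the tip above.
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckIntervalSeconds": 5,
        "HealthyThresholdCount": 2,
    }


# Placeholder value, replace with the ARN of your own target group.
settings = fast_health_check_settings("YOUR_TARGET_GROUP_ARN")
print(settings)

# Applying the settings (assumes boto3 is installed and configured):
# import boto3
# boto3.client("elbv2").modify_target_group(**settings)
```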

If you need single-digit pull times, you can run ECS on EC2 instances. They can have an image cache, so they can start containers almost instantly. You must then maintain and scale the EC2 instances yourself. Or you can upvote this roadmap item for AWS and hope they introduce an image cache for Fargate!

Reduce AWS Fargate pull times with SOCI
