Why Your AWS ECS Task is Stuck in Pending—And What to Do About It
When an AWS ECS task is stuck in the pending state, the first instinct is often to blame ECS itself. However, more often than not, the real culprit lies outside ECS—from networking misconfigurations to image pull failures. If you find your ECS tasks hanging indefinitely in pending, the key is to dig into the underlying infrastructure. Let’s break down the most common reasons why your ECS task won’t transition to RUNNING and how to fix them.

It’s Not ECS—It’s Your Infrastructure

The biggest misconception when troubleshooting ECS issues is assuming that ECS is at fault. In reality, ECS is just a scheduler—it orchestrates containers but relies heavily on other AWS components to function. The most common reasons for an ECS task getting stuck in pending are:

  • A large container image that takes too long to pull
  • ECS cannot access Amazon ECR (or Docker Hub) due to networking issues
  • Your cluster has insufficient resources (CPU, memory, or Fargate quotas)
  • IAM roles are misconfigured, preventing ECS from assuming the task execution role

If your ECS tasks are stuck, the first thing you should check is not ECS itself, but your infrastructure setup.

1. Your Container Image is Too Large

One of the most overlooked issues is a large container image. ECS needs to pull the image before launching a container, and if the image is too big, this process can take time—or even fail.

How to Check

Run the following command to see your image size:

docker images

If your image is over 1GB, that’s a red flag. Large images significantly slow down the deployment process, especially when pulled over a slow network.

Solution

  • Optimize your Docker image by using multi-stage builds or lightweight base images like alpine.
  • On the EC2 launch type, let the ECS agent reuse cached images by setting:
    ECS_IMAGE_PULL_BEHAVIOR=prefer-cached
    in /etc/ecs/ecs.config. (This setting does not apply to Fargate, which has no persistent image cache.)
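To make the multi-stage idea concrete, here is a minimal sketch of a Dockerfile, assuming a Go service (the paths and build command are illustrative; adapt them to your stack):

```dockerfile
# Stage 1: build with the full toolchain (hypothetical Go service)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Stage 2: ship only the compiled binary on a tiny base image
FROM alpine:3.19
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The final image contains only the binary and the alpine base, not the Go toolchain, which typically cuts the image from over a gigabyte to tens of megabytes.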

2. ECS Can’t Access Your Image Repository

ECS needs to pull container images from Amazon ECR, Docker Hub, or another registry. If it lacks network access, your task will stay stuck in pending indefinitely.

How to Check

If you're using Fargate, check if your subnet has internet access:

  • Public subnets: Must have an Internet Gateway, and the task must have a public IP assigned (assignPublicIp enabled), or it can't use the gateway.
  • Private subnets: Must have a NAT Gateway or AWS PrivateLink (VPC endpoints for ECR) configured.

For EC2-based ECS clusters, SSH into an instance and try pulling an image manually:

docker pull <your-image-url>

If this fails, ECS can’t reach the registry.

Solution

  • For Fargate, ensure your task is in a subnet with a NAT Gateway or AWS PrivateLink.
  • For EC2, update your route tables and security groups to allow outbound HTTPS (port 443) to the registry, and make sure the instance profile grants ECR pull permissions.

3. Your Cluster is Out of Resources

If your cluster is running low on CPU or memory, ECS might not be able to schedule new tasks.

How to Check

For Fargate, check your account limits:

aws service-quotas list-service-quotas --service-code fargate

For EC2-based clusters, describe your instances:

aws ecs list-container-instances --cluster your-cluster-name
aws ecs describe-container-instances --cluster your-cluster-name --container-instances <instance-id>

If your instance doesn’t have enough memory or CPU available, ECS won’t schedule the task.
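The describe-container-instances response lists each instance's remainingResources. As an illustration, here is a small sketch (the response shape is the standard one, but the task sizes and ARNs are made up) that flags instances too small to place a task:

```python
# Sketch: flag container instances that can't fit a task needing 512 CPU units / 1024 MiB.
# Input mimics `aws ecs describe-container-instances` output (shape assumed).
NEED_CPU, NEED_MEM = 512, 1024

def can_fit(instance, need_cpu=NEED_CPU, need_mem=NEED_MEM):
    """Return True if the instance has enough remaining CPU and memory."""
    remaining = {r["name"]: r.get("integerValue", 0)
                 for r in instance["remainingResources"]}
    return remaining.get("CPU", 0) >= need_cpu and remaining.get("MEMORY", 0) >= need_mem

sample = {
    "containerInstances": [
        {"containerInstanceArn": "arn:aws:ecs:us-east-1:123456789012:container-instance/abc",
         "remainingResources": [{"name": "CPU", "integerValue": 256},
                                {"name": "MEMORY", "integerValue": 2048}]},
        {"containerInstanceArn": "arn:aws:ecs:us-east-1:123456789012:container-instance/def",
         "remainingResources": [{"name": "CPU", "integerValue": 1024},
                                {"name": "MEMORY", "integerValue": 4096}]},
    ]
}

for inst in sample["containerInstances"]:
    status = "OK" if can_fit(inst) else "TOO SMALL"
    print(inst["containerInstanceArn"].rsplit("/", 1)[-1], status)
```

If every instance prints TOO SMALL, ECS has nowhere to place the task and it sits in PENDING until capacity appears.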

Solution

  • For Fargate, request a quota increase in AWS Service Quotas.
  • For EC2, scale up your cluster by adding more instances.

4. IAM Role Misconfigurations

ECS tasks need an execution role to pull images and launch containers. If this role is missing permissions, the task stays in pending.

How to Check

Run:

aws ecs describe-tasks --cluster your-cluster-name --tasks <task-id>

If you see an error related to IAM permissions, your execution role is likely misconfigured.

Solution

Ensure your ECS task execution role has this managed policy attached:

arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

Update your IAM role with:

aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

5. Your Essential Containers are Failing Health Checks

ECS won’t move a task to RUNNING while a container is waiting on a dependency that hasn’t become HEALTHY. If a dependency container never passes its health check, the containers that depend on it—often the essential ones—wait indefinitely and the task stays in PENDING.

How to Check

Run:

aws ecs describe-tasks --cluster your-cluster-name --tasks <task-id>

If your task is waiting on a dependency container that is not HEALTHY, that’s your issue.

Solution

  • Give slow-starting containers a startPeriod in their container health checks (and, for load balancer checks, a healthCheckGracePeriodSeconds on the service).
  • Ensure your non-essential containers don’t block essential ones with overly strict dependsOn conditions.
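As an illustration, a container dependency with a health check and a start period might look like this in a task definition (the container names and health endpoint here are hypothetical):

```json
"containerDefinitions": [
  {
    "name": "app",
    "essential": true,
    "dependsOn": [
      { "containerName": "db-proxy", "condition": "HEALTHY" }
    ]
  },
  {
    "name": "db-proxy",
    "essential": false,
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 60
    }
  }
]
```

The startPeriod of 60 seconds gives db-proxy time to boot before failed checks count against it; without it, a slow start can leave the app container waiting forever.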

Conclusion

If your AWS ECS task is stuck in pending, the issue is rarely ECS itself. Instead, the most common problems stem from network misconfigurations, large images, resource shortages, or IAM role issues. The key to resolving this quickly is focusing on underlying AWS services rather than ECS itself.

Next time you’re debugging a stuck task, remember: ECS isn’t broken—your infrastructure is.

Fargate: Debugging Pending Tasks Without a Shell

Troubleshooting AWS Fargate tasks stuck in pending is fundamentally different from debugging ECS on EC2 instances. With EC2-based clusters, you can SSH into the instance, check logs, and manually pull images. With Fargate, you have no direct access to the underlying infrastructure—which means you must diagnose issues using AWS logs, task metadata, and network configurations.

If your Fargate task is stuck in pending, the two most common culprits are:

  • Large container images taking too long to pull.
  • Network misconfigurations preventing Fargate from reaching Amazon ECR (or Docker Hub).

Let’s go through how to diagnose and resolve both issues.

1. Large Container Images Cause Delays

Unlike EC2 instances, which often cache container images, every Fargate task must pull the image from scratch. If your container image is over 1GB, the pull process takes longer—sometimes long enough for ECS to time out and leave the task stuck in pending.

How to Check Image Size

If your task definition uses Amazon ECR, check your image size:

aws ecr describe-images --repository-name your-repo-name --image-ids imageTag=latest

If you're using Docker Hub or another registry, inspect your local image size:

docker images | grep your-image-name
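To make the 1GB threshold concrete, here is a small sketch that interprets the imageSizeInBytes field from a describe-images response (the response shape is standard; the sample data is made up, and note that ECR reports the compressed size):

```python
# Sketch: flag images over ~1 GB from `aws ecr describe-images` output (shape assumed).
ONE_GB = 1024 ** 3

def image_report(describe_images_response):
    """Return one human-readable line per image, flagging oversized ones."""
    lines = []
    for detail in describe_images_response["imageDetails"]:
        size = detail["imageSizeInBytes"]  # ECR reports compressed size
        tag = detail.get("imageTags", ["<untagged>"])[0]
        flag = "  <-- consider slimming" if size > ONE_GB else ""
        lines.append(f"{tag}: {size / ONE_GB:.2f} GB{flag}")
    return lines

sample = {"imageDetails": [
    {"imageTags": ["latest"], "imageSizeInBytes": 1_400_000_000},
    {"imageTags": ["slim"], "imageSizeInBytes": 180_000_000},
]}

for line in image_report(sample):
    print(line)
```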

How to Fix It

  1. Optimize your Docker image by using a lightweight base image (e.g., alpine instead of ubuntu).
  2. Use multi-stage builds to eliminate unnecessary dependencies.
  3. Enable Seekable OCI (SOCI) indexes so Fargate can lazily load image layers instead of waiting for the full image to download.

2. Network Issues: Fargate Can’t Pull Images

For Fargate to run a task, it must pull the container image from Amazon ECR, Docker Hub, or another registry. If your VPC configuration is incorrect, the task will never reach the registry—keeping it in pending forever.

How to Check Networking Issues

  1. Verify your task is in the right subnet
    • Fargate tasks need outbound access to the registry: a public subnet with a public IP assigned, or a private subnet with a NAT Gateway or AWS PrivateLink.
    • Run:
      aws ecs describe-tasks --cluster your-cluster-name --tasks <task-id>
      If networking is the issue, the stoppedReason might reference network failures.
  2. Check VPC route tables
    • If your subnet doesn’t have a NAT Gateway or VPC endpoint for Amazon ECR, Fargate can’t pull the image.
    • Run:
      aws ec2 describe-route-tables --filters Name=vpc-id,Values=<your-vpc-id>
    • Ensure there's an outbound route to either:
      • An Internet Gateway (for public subnets).
      • A NAT Gateway (for private subnets).
      • A VPC Endpoint for Amazon ECR (if you’re blocking outbound internet).
  3. Confirm security groups and IAM permissions
    • Your task execution role must include:
      {
        "Effect": "Allow",
        "Action": [
          "ecr:GetAuthorizationToken",
          "ecr:BatchCheckLayerAvailability",
          "ecr:GetDownloadUrlForLayer",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        "Resource": "*"
      }
    • Ensure security groups allow outbound HTTPS traffic to ECR or Docker Hub.
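The route table check above can be scripted. Here is a sketch that classifies a subnet's internet egress from a describe-route-tables response (the response shape is the standard one; the gateway IDs are illustrative):

```python
# Sketch: classify a subnet's egress path from `aws ec2 describe-route-tables` output.
def egress_kind(route_table):
    """Return which kind of default route the table has, if any."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        gw = route.get("GatewayId", "") or route.get("NatGatewayId", "")
        if gw.startswith("igw-"):
            return "internet-gateway"   # public subnet
        if gw.startswith("nat-"):
            return "nat-gateway"        # private subnet with egress
    return "none"  # no default route: needs a VPC endpoint for ECR, or pulls will fail

sample = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc123"},
]}
print(egress_kind(sample))  # nat-gateway
```

A result of "none" on a subnet without ECR VPC endpoints means Fargate cannot reach the registry, and the task will never leave PENDING.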

3. Debugging with Fargate-Specific Logs

Since you can’t SSH into a Fargate task, CloudWatch Logs and ECS task/service events are your primary debugging tools.

Check CloudWatch Logs

  1. Enable logging in your task definition:
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/your-task-name",
            "awslogs-region": "us-east-1",
            "awslogs-stream-prefix": "ecs"
        }
    }
  2. View logs in CloudWatch:
    aws logs tail /ecs/your-task-name --follow

If your task never writes logs, it likely failed before starting—meaning networking or IAM permissions are the issue.

Check Task Events

Run:

aws ecs describe-services --cluster your-cluster-name --services your-service-name

Look for events[]—they often contain error messages like:

  • "Task failed to start due to image pull failure" → check ECR or network access.
  • "Task stopped because essential container exited" → check image compatibility or environment variables.
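Those event strings can be triaged programmatically. Here is a sketch that maps common event messages to a likely cause (the patterns are illustrative, not an exhaustive list of ECS event text):

```python
# Sketch: map ECS service event messages to a likely cause (patterns are illustrative).
PATTERNS = [
    ("image pull failure", "check ECR permissions or network access"),
    ("essential container", "check image compatibility or environment variables"),
    ("unable to place a task", "check cluster CPU/memory capacity"),
]

def triage(event_message):
    """Return advice for the first known pattern found in the event message."""
    msg = event_message.lower()
    for needle, advice in PATTERNS:
        if needle in msg:
            return advice
    return "no known pattern; inspect the full event"

print(triage("Task failed to start due to image pull failure"))
# check ECR permissions or network access
```

Feeding the events[] array from describe-services through a helper like this turns a wall of event text into a short list of next steps.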

Final Thoughts: Debugging Fargate is Different

The biggest challenge with Fargate troubleshooting is that you can’t SSH into the underlying infrastructure. Instead, you must rely on:

  1. CloudWatch logs to check if your container started.
  2. ECS task events for failures in networking or execution.
  3. AWS CLI commands to validate image size, IAM roles, and VPC configurations.

If your Fargate task is stuck in pending, chances are it’s either:

  • Too big (image pull issues) → optimize your container size.
  • Misconfigured (networking issues) → check your NAT Gateway or VPC Endpoints.
  • Lacking permissions → ensure ECS can assume the execution role.

By focusing on these areas, you can fix pending tasks faster—without needing direct shell access.
