AWS Raw ECS Worker Manager¶
The AWS Raw ECS worker manager provisions Scaler workers as AWS Fargate tasks inside an ECS cluster. Unlike the AWS HPC Batch worker manager, which runs each Scaler task as a separate cloud job, the AWS Raw ECS worker manager launches full Scaler worker processes in Fargate containers. This means workers connect back to the scheduler and process tasks the same way local workers do, with the scheduler handling load balancing and scaling.
Prerequisites¶
An AWS account
AWS CLI installed and configured (aws configure)
Python packages: pip install opengris-scaler boto3
A VPC with at least one subnet that has internet access (a public subnet with an Internet Gateway, or a private subnet with a NAT Gateway)
Quick Start¶
Get a subnet ID from your default VPC:
aws ec2 describe-subnets \
--filters "Name=default-for-az,Values=true" \
--query "Subnets[0].SubnetId" \
--output text
Paste the result into the TOML below and run the three commands:
[[worker_manager]]
type = "aws_raw_ecs"
scheduler_address = "tcp://<SCHEDULER_PUBLIC_IP>:8516"
object_storage_address = "tcp://<SCHEDULER_PUBLIC_IP>:8517"
worker_manager_id = "wm-ecs"
ecs_subnets = "subnet-0abc1234def56789a" # paste your subnet ID here
aws_region = "us-east-1"
max_task_concurrency = 4
ecs_task_cpu = 4
ecs_task_memory = 30
# Terminal 1 — Scheduler (use your public/private IP, not 127.0.0.1)
scaler_scheduler tcp://0.0.0.0:8516 \
--policy-content "allocate=even_load; scaling=vanilla"
# Terminal 2 — AWS Raw ECS Worker Manager
$ scaler config.toml
# Terminal 3: Client (Python)
from scaler import Client

def compute(x):
    return x ** 2

with Client(address="tcp://<SCHEDULER_PUBLIC_IP>:8516") as client:
    futures = client.map(compute, range(50))
    print([f.result() for f in futures])
If you need help finding your subnet IDs or setting up permissions, follow the detailed setup below.
Detailed Setup¶
Step 1: Configure AWS Credentials¶
aws configure
# Enter your AWS Access Key ID, Secret Access Key, region (e.g. us-east-1), and output format (json)
Your IAM user needs the following permissions:
ECS: ecs:CreateCluster, ecs:DescribeClusters, ecs:RegisterTaskDefinition, ecs:DescribeTaskDefinition, ecs:RunTask, ecs:StopTask
IAM: iam:CreateRole, iam:AttachRolePolicy, iam:GetRole, iam:PassRole
EC2: ec2:DescribeSubnets, ec2:DescribeSecurityGroups
Or attach the following AWS managed policies for quick setup:
AmazonECS_FullAccess
IAMFullAccess
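If you prefer a narrower custom policy over the managed ones, the permission list can be turned into an IAM policy document. The following is a minimal sketch, not an official policy: the statement ID is hypothetical, and `"Resource": "*"` is an assumption that you may want to tighten to specific ARNs.

```python
import json

# Actions copied from the permission list above.
REQUIRED_ACTIONS = [
    "ecs:CreateCluster", "ecs:DescribeClusters",
    "ecs:RegisterTaskDefinition", "ecs:DescribeTaskDefinition",
    "ecs:RunTask", "ecs:StopTask",
    "iam:CreateRole", "iam:AttachRolePolicy", "iam:GetRole", "iam:PassRole",
    "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups",
]


def build_policy(actions=REQUIRED_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ScalerRawEcsWorkerManager",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: narrow to specific ARNs where possible
            }
        ],
    }


policy_json = json.dumps(build_policy(), indent=2)
print(policy_json)
```

The printed JSON can be saved to a file and attached to your IAM user or role as a customer managed policy.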
Step 2: Find Your Subnet IDs¶
The ECS worker manager needs at least one subnet ID to launch Fargate tasks. Find your default VPC subnets:
aws ec2 describe-subnets \
--filters "Name=default-for-az,Values=true" \
--query "Subnets[].SubnetId" \
--output text
Copy one or more subnet IDs (e.g. subnet-0abc1234def56789a).
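The same lookup can be done from Python. This is a small sketch assuming boto3 is installed and credentials are configured; the helper names `default_subnet_ids` and `to_ecs_subnets_value` are illustrative and not part of Scaler. It does not handle pagination, which should rarely matter for default subnets.

```python
def default_subnet_ids(ec2_client):
    """Return the IDs of the default subnets visible to ``ec2_client``.

    ``ec2_client`` is expected to behave like ``boto3.client("ec2")``.
    """
    response = ec2_client.describe_subnets(
        Filters=[{"Name": "default-for-az", "Values": ["true"]}]
    )
    return [subnet["SubnetId"] for subnet in response["Subnets"]]


def to_ecs_subnets_value(subnet_ids):
    """Join subnet IDs into the comma-separated form used by --ecs-subnets."""
    return ",".join(subnet_ids)


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   ids = default_subnet_ids(boto3.client("ec2", region_name="us-east-1"))
#   print(to_ecs_subnets_value(ids))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS account is available.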
Step 3: Start the Scheduler¶
The scheduler must be reachable from the Fargate tasks. The command below binds to all interfaces (0.0.0.0); workers and clients then connect using your machine’s public or private IP, not 127.0.0.1:
scaler_scheduler tcp://0.0.0.0:8516 \
--policy-content "allocate=even_load; scaling=vanilla"
Important
Fargate tasks must be able to reach the scheduler address over the network. Ensure your security group allows inbound TCP on port 8516 from the Fargate subnet CIDR, and that the scheduler binds to an accessible IP.
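A quick way to sanity-check reachability is to attempt a plain TCP connection to the scheduler port. The sketch below uses only the standard library; `can_connect` is an illustrative helper, not a Scaler function. A successful result from your own machine only shows that the scheduler is listening; to confirm the security group rule, run the check from a host inside the Fargate subnet.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: replace the placeholder with the scheduler's IP before running.
# print(can_connect("<SCHEDULER_PUBLIC_IP>", 8516))
```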
Step 4: Start the AWS Raw ECS Worker Manager¶
scaler_worker_manager aws_raw_ecs tcp://<SCHEDULER_PUBLIC_IP>:8516 \
--ecs-subnets subnet-0abc1234def56789a \
--aws-region us-east-1 \
--max-task-concurrency 4 \
--ecs-task-cpu 4 \
--ecs-task-memory 30
Or use a TOML configuration file:
$ scaler config.toml
[[worker_manager]]
type = "aws_raw_ecs"
scheduler_address = "tcp://<SCHEDULER_PUBLIC_IP>:8516"
object_storage_address = "tcp://<SCHEDULER_PUBLIC_IP>:8517"
worker_manager_id = "wm-ecs"
ecs_subnets = "subnet-0abc1234def56789a"
aws_region = "us-east-1"
max_task_concurrency = 4
ecs_task_cpu = 4
ecs_task_memory = 30
ecs_cluster = "scaler-cluster"
ecs_task_definition = "scaler-task-definition"
ecs_task_image = "public.ecr.aws/v4u8j8r6/scaler:latest"
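Before launching, it can help to confirm that the TOML file parses and that each `[[worker_manager]]` entry carries the keys you expect. The following is a rough sketch, not Scaler's own validation: the set of keys treated as mandatory is an assumption based on the Configuration Reference, and `missing_keys` is a hypothetical helper.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

CONFIG_TEXT = """
[[worker_manager]]
type = "aws_raw_ecs"
scheduler_address = "tcp://<SCHEDULER_PUBLIC_IP>:8516"
ecs_subnets = "subnet-0abc1234def56789a"
aws_region = "us-east-1"
ecs_task_cpu = 4
ecs_task_memory = 30
"""

# Keys this sketch treats as mandatory (an assumption, not Scaler's schema).
REQUIRED_KEYS = ("type", "scheduler_address", "ecs_subnets")


def missing_keys(config_text):
    """Return {index: [missing keys]} for each [[worker_manager]] entry."""
    config = tomllib.loads(config_text)
    problems = {}
    for index, entry in enumerate(config.get("worker_manager", [])):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems[index] = missing
    return problems


print(missing_keys(CONFIG_TEXT))  # prints {}
```

An empty dictionary means every entry has the keys the sketch checks for; it says nothing about whether the values themselves are valid.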
Step 5: Submit Tasks¶
from scaler import Client

def compute(x):
    return x ** 2

with Client(address="tcp://<SCHEDULER_PUBLIC_IP>:8516") as client:
    futures = client.map(compute, range(50))
    results = [f.result() for f in futures]
    print(results)
How It Works¶
The AWS Raw ECS worker manager connects to the Scaler scheduler and sends periodic heartbeats.
When the scheduler’s scaling policy requests more workers, it sends a StartWorkerGroup command.
The worker manager calls ecs:RunTask to launch a Fargate task running the Scaler worker container.
Each Fargate task runs scaler_cluster inside the container, spawning one or more worker processes (controlled by --ecs-task-cpu).
Workers connect back to the scheduler and process tasks like local workers.
When the scheduler wants to scale down, it sends a ShutdownWorkerGroup command and the worker manager stops the Fargate task.
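To make the RunTask step concrete, here is a sketch of the arguments a Fargate launch through boto3 typically needs. It illustrates the ECS API only and is not taken from the worker manager's source; `fargate_run_task_kwargs` is a hypothetical helper, and enabling a public IP is an assumption that fits public subnets.

```python
def fargate_run_task_kwargs(cluster, task_definition, subnets, count=1):
    """Build keyword arguments for boto3's ECS ``run_task`` call on Fargate."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": count,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                # Assumption: tasks in a public subnet need a public IP
                # to pull the container image.
                "assignPublicIp": "ENABLED",
            }
        },
    }


kwargs = fargate_run_task_kwargs(
    cluster="scaler-cluster",
    task_definition="scaler-task-definition",
    subnets=["subnet-0abc1234def56789a"],
)

# With boto3 and credentials available, the calls would look like:
#   import boto3
#   ecs = boto3.client("ecs", region_name="us-east-1")
#   task_arn = ecs.run_task(**kwargs)["tasks"][0]["taskArn"]   # scale up
#   ecs.stop_task(cluster="scaler-cluster", task=task_arn)     # scale down
```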
Configuration Reference¶
AWS Raw ECS Parameters¶
scheduler_address (positional, required): Address of the Scaler scheduler. Must be reachable from Fargate tasks.
--ecs-subnets (required): Comma-separated list of VPC subnet IDs for Fargate tasks.
--aws-region: AWS region (default: us-east-1).
--aws-access-key-id: AWS access key (default: uses environment/profile).
--aws-secret-access-key: AWS secret key (default: uses environment/profile).
--ecs-cluster: ECS cluster name (default: scaler-cluster). Created automatically if missing.
--ecs-task-definition: Task definition family name (default: scaler-task-definition). Created automatically if missing.
--ecs-task-image: Container image (default: public.ecr.aws/v4u8j8r6/scaler:latest).
--ecs-task-cpu: Number of vCPUs per Fargate task (default: 4). Also determines the number of worker processes per task.
--ecs-task-memory: Memory per Fargate task in GB (default: 30).
--ecs-python-requirements: Python packages to install in the container at startup (default: tomli;pargraph;parfun;pandas).
--ecs-python-version: Python version for the container (default: 3.12.11).
--max-task-concurrency (-mtc): Maximum number of Fargate tasks (default: number of CPUs − 1).
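The CPU and memory options are given in vCPUs and GB, whereas ECS task definitions express CPU in units of 1024 per vCPU and memory in MiB, and Fargate only accepts certain CPU/memory pairings. The sketch below checks a pairing and converts it; it covers whole-vCPU sizes only, and whether the worker manager performs an equivalent check internally is an assumption. `to_task_definition_units` is an illustrative name.

```python
# Fargate memory ranges in GB per vCPU size: (min, max, step).
# Only whole-vCPU sizes are listed; Fargate also offers 0.25 and 0.5 vCPU.
FARGATE_MEMORY_GB = {
    1: (2, 8, 1),
    2: (4, 16, 1),
    4: (8, 30, 1),
    8: (16, 60, 4),
    16: (32, 120, 8),
}


def to_task_definition_units(vcpus, memory_gb):
    """Convert vCPUs and GB into ECS task-definition units (CPU units, MiB)."""
    if vcpus not in FARGATE_MEMORY_GB:
        raise ValueError(f"unsupported vCPU count for this sketch: {vcpus}")
    low, high, step = FARGATE_MEMORY_GB[vcpus]
    if not (low <= memory_gb <= high) or (memory_gb - low) % step != 0:
        raise ValueError(f"{memory_gb} GB is not valid for {vcpus} vCPU on Fargate")
    return vcpus * 1024, memory_gb * 1024


print(to_task_definition_units(4, 30))  # prints (4096, 30720)
```

With the defaults of 4 vCPUs and 30 GB, the pairing sits at the upper end of what Fargate allows for that CPU size, so raising memory further also requires raising the vCPU count.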
Common Parameters¶
For worker behavior, logging, and event loop options, see Common Worker Manager Parameters.
Architecture¶
┌─────────┐     ┌───────────┐     ┌──────────────────┐     ┌─────────────────────┐
│ Client  │────>│ Scheduler │<───>│ ECS WorkerAdapter│────>│  AWS ECS (Fargate)  │
└─────────┘     └─────┬─────┘     └──────────────────┘     └──────────┬──────────┘
                      │                                               │
                      │           ┌──────────────────┐                │
                      └──────────>│  Object Storage  │<───────────────┘
                                  └──────────────────┘      (scaler_cluster
                                                             runs inside each
                                                             Fargate task)
The scheduler sends scaling commands (StartWorkerGroup / ShutdownWorkerGroup) to the ECS worker manager.
The worker manager calls ecs:RunTask to launch Fargate tasks running scaler_cluster.
Workers inside each Fargate task connect back to the scheduler and process tasks like local workers.
The worker manager auto-creates the ECS cluster and task definition on first run if they don’t exist.
Troubleshooting¶
Tasks stuck in PROVISIONING: Check that your subnets have a route to the internet (either a public subnet with an Internet Gateway, or a private subnet with a NAT Gateway). Fargate needs internet access to pull container images.
Workers can’t connect to scheduler: Ensure the scheduler address is a public/private IP reachable from the Fargate subnet. Update security group inbound rules to allow TCP traffic on port 8516.
Permission errors on RunTask: Ensure the ecsTaskExecutionRole IAM role exists and has the AmazonECSTaskExecutionRolePolicy attached. The worker manager creates this automatically on first run.