Scaling

Configure autoscaling, scale-to-zero, and idle timeout for your services.

SMLL supports two scaling modes depending on your service's deployment mode.

Always-on scaling (HPA)

When your service is set to always on with max_replicas > min_replicas, SMLL creates a Kubernetes Horizontal Pod Autoscaler (HPA) that automatically adjusts the replica count between those bounds based on observed load.

  • Min replicas: the minimum number of instances always running
  • Max replicas: the upper bound for autoscaling
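The HPA's scaling decision follows the standard Kubernetes formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A minimal sketch (the metric values and replica counts below are illustrative):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Standard Kubernetes HPA formula, clamped to the service's bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas at 90% average CPU against a 60% target -> scale up to 5
print(desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10))  # 5
```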

Scale-to-zero (on demand)

Services with on demand deployment mode scale down to zero replicas when idle, eliminating costs during inactive periods.

How it works

SMLL uses the KEDA HTTP add-on to manage scale-to-zero:

  1. An interceptor proxy sits in front of your service
  2. When traffic arrives and no replicas are running, KEDA spins up a new instance
  3. Traffic is held at the proxy until the instance is ready (typically a few seconds)
  4. After a period of no traffic, the service scales back to zero
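The request path above can be modeled as a small state machine. This is an illustrative sketch of the hold-and-forward behavior, not SMLL or KEDA internals; the class and timing parameters are our own names:

```python
import time

class ScaleToZeroProxy:
    """Illustrative model of an interceptor that holds traffic during scale-up."""

    def __init__(self, startup_seconds=2.0, idle_timeout=300.0):
        self.replicas = 0
        self.startup_seconds = startup_seconds
        self.idle_timeout = idle_timeout
        self.last_request = None

    def handle(self, request):
        if self.replicas == 0:
            # Pending traffic triggers a scale-up; the request is held
            # here until the new instance reports ready (the cold start).
            time.sleep(self.startup_seconds)
            self.replicas = 1
        self.last_request = time.monotonic()
        return f"forwarded: {request}"

    def reap_if_idle(self):
        # Invoked periodically: scale back to zero once the idle
        # timeout has elapsed with no traffic.
        if self.replicas > 0 and self.last_request is not None:
            if time.monotonic() - self.last_request >= self.idle_timeout:
                self.replicas = 0
```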

Idle timeout

The idle timeout controls how long a service stays running after the last request:

  • Default: 300 seconds (5 minutes)
  • Configurable: set a custom value in the service settings

A shorter timeout saves costs but increases cold start frequency. A longer timeout keeps the service warm for follow-up requests.
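One way to reason about the tradeoff: if requests arrive roughly independently at a steady average rate, the chance that a given request finds the service scaled to zero is the chance that the gap since the previous request exceeded the timeout, about e^(-rate × timeout). A back-of-envelope sketch (the Poisson-arrival assumption is ours, not SMLL's):

```python
import math

def cold_start_fraction(requests_per_hour, idle_timeout_seconds):
    """Approximate fraction of requests that hit a cold start, assuming
    independent (Poisson) arrivals: P(gap > timeout) = exp(-rate * timeout)."""
    rate_per_second = requests_per_hour / 3600
    return math.exp(-rate_per_second * idle_timeout_seconds)

# 12 requests/hour with the default 300 s timeout:
print(round(cold_start_fraction(12, 300), 2))  # 0.37
```

Doubling the timeout roughly squares this fraction, so low-traffic services see the biggest benefit from a longer idle timeout.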

Configuring

  1. Go to your service settings
  2. Set Deployment mode to On demand
  3. Set Max replicas (min is automatically 0)
  4. Adjust Idle timeout if needed
  5. Save
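The steps above amount to a small settings object. Expressed as data (the field names here are illustrative; SMLL's actual settings schema may differ):

```python
# Hypothetical on-demand service configuration mirroring the steps above.
service_settings = {
    "deployment_mode": "on_demand",
    "min_replicas": 0,             # fixed at 0 for on-demand services
    "max_replicas": 5,
    "idle_timeout_seconds": 300,   # default: 5 minutes
}
```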

Cold starts

The first request after scale-to-zero triggers a cold start. The latency depends on:

  • Image pull time (cached images are faster)
  • Application startup time
  • Health check completion

To minimize cold start latency, keep your Docker image small and your application startup fast.
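Because these phases run sequentially, a simple latency budget shows where tuning pays off. The numbers below are placeholders, not measured SMLL latencies:

```python
def cold_start_budget(image_pull, app_startup, health_check):
    """Rough cold-start estimate: the three phases run one after another."""
    return image_pull + app_startup + health_check

# Cached image vs. fresh pull (illustrative seconds):
print(cold_start_budget(image_pull=0.5, app_startup=2.0, health_check=1.0))   # 3.5
print(cold_start_budget(image_pull=15.0, app_startup=2.0, health_check=1.0))  # 18.0
```

In this example the image pull dominates the uncached case, which is why keeping images small (and therefore cacheable and quick to pull) is the first lever to reach for.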

Quota limits

Scaling is bounded by your VPC's resource quotas:

  Tier           Max pods
  Free           20
  Pay-as-you-go  100

See Quotas for details on requesting increases.
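Since the autoscaler's upper bound cannot exceed your VPC's pod quota, it can be worth checking max_replicas before saving. A minimal sketch using the tier limits from the quota table (the helper itself is illustrative, not part of SMLL):

```python
# Pod quotas per tier, as listed in the quota table.
QUOTA_MAX_PODS = {"free": 20, "pay-as-you-go": 100}

def fits_quota(max_replicas, tier, pods_in_use=0):
    """Return True if the requested replica ceiling fits the tier's pod quota."""
    return max_replicas + pods_in_use <= QUOTA_MAX_PODS[tier]

print(fits_quota(25, "free"))           # False: exceeds the 20-pod quota
print(fits_quota(25, "pay-as-you-go"))  # True
```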