Autoscaling your apps

Guides on autoscaling your applications with SpinKube.

1 - Using the `spin kube` plugin

A tutorial to show how autoscaler support can be enabled via the spin kube command.

Horizontal autoscaling support

In Kubernetes, a horizontal autoscaler automatically updates a workload resource (such as a Deployment or StatefulSet) with the aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more resources. This is different from vertical scaling, which for Kubernetes would mean assigning more memory or CPU to the resources that are already running for the workload.

If the load decreases, and the number of resources is above the configured minimum, a horizontal autoscaler would instruct the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

The Kubernetes plugin for Spin includes autoscaler support, which allows you to tell Kubernetes when to scale your Spin application up or down based on demand. This tutorial will show you how to enable autoscaler support via the spin kube scaffold command.

Prerequisites

Regardless of what type of autoscaling is used, you must determine how you want your application to scale by answering the following questions:

  1. Do you want your application to scale based upon system metrics (CPU and memory utilization) or based upon events (like messages in a queue or rows in a database)?
  2. If your application scales based on system metrics, how much CPU and memory does each instance of your application need to operate?

Choosing an autoscaler

The Kubernetes plugin for Spin supports two types of autoscalers: Horizontal Pod Autoscaler (HPA) and Kubernetes Event-driven Autoscaling (KEDA). The choice of autoscaler depends on the requirements of your application.

Horizontal Pod Autoscaling (HPA)

Horizontal Pod Autoscaler (HPA) scales Kubernetes pods based on CPU or memory utilization. This HPA scaling can be implemented via the Kubernetes plugin for Spin by setting the --autoscaler hpa option. This page deals exclusively with autoscaling via the Kubernetes plugin for Spin.

spin kube scaffold --from user-name/app-name:latest --autoscaler hpa --cpu-limit 100m --memory-limit 128Mi

Horizontal Pod Autoscaling is built into Kubernetes and does not require the installation of a third-party runtime. For more general information about scaling with HPA, please see the Spin Operator’s Scaling with HPA section.
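
To make the effect of the command above concrete, here is a trimmed sketch of the kind of manifest spin kube scaffold produces when --autoscaler hpa is set: a SpinApp with enableAutoscaling turned on plus a HorizontalPodAutoscaler targeting the Deployment created for it. Names and utilization values below are illustrative and the exact output can differ between plugin versions; the full, real example appears in the HPA tutorial later in this guide.

apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: app-name
spec:
  image: user-name/app-name:latest
  enableAutoscaling: true
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-name-autoscaler   # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-name
  minReplicas: 2   # --replicas (defaults to 2)
  maxReplicas: 3   # --max-replicas (defaults to 3)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # set via --autoscaler-target-cpu-utilization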

Kubernetes Event-driven Autoscaling (KEDA)

Kubernetes Event-driven Autoscaling (KEDA) is an extension of Horizontal Pod Autoscaling (HPA). In addition to scaling based on CPU or memory utilization, KEDA can scale based on events from various sources, such as messages in a queue or the number of rows in a database.

KEDA can be enabled by setting the --autoscaler keda option:

spin kube scaffold --from user-name/app-name:latest --autoscaler keda --cpu-limit 100m --memory-limit 128Mi --replicas 1 --max-replicas 10

Using KEDA to autoscale your Spin applications requires the installation of the KEDA runtime into your Kubernetes cluster. For more information about scaling with KEDA in general, please see the Spin Operator’s Scaling with KEDA section.
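
Installing KEDA itself is typically done with Helm; the same commands are used in the KEDA tutorial later in this guide:

# Add and update the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install the keda Helm chart into the keda namespace
helm install keda kedacore/keda --namespace keda --create-namespace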

Setting min/max replicas

The --replicas and --max-replicas options can be used to set the minimum and maximum number of replicas for your application. The --replicas option defaults to 2 and the --max-replicas option defaults to 3.

spin kube scaffold --from user-name/app-name:latest --autoscaler hpa --cpu-limit 100m --memory-limit 128Mi --replicas 1 --max-replicas 10

Setting CPU/memory limits and CPU/memory requests

If the node where an application is running has enough of a resource available, it’s possible (and allowed) for that application to use more of that resource than its resource request specifies. However, an application is not allowed to use more than its resource limit.

For example, if you set a memory request of 256 MiB, and that application is scheduled to a node with 8 GiB of memory and no other applications, then the application can try to use more RAM.

If you set a memory limit of 4 GiB for that application, the WebAssembly runtime will enforce that limit. The runtime prevents the application from using more than the configured resource limit. For example: when a process in the application tries to consume more than the allowed amount of memory, the WebAssembly runtime terminates the process that attempted the allocation with an out of memory (OOM) error.

The --cpu-limit, --memory-limit, --cpu-request, and --memory-request options can be used to set the CPU and memory limits and requests for your application. The --cpu-limit and --memory-limit options are required, while the --cpu-request and --memory-request options are optional.

It is important to note the following:

  • CPU/memory requests are optional and will default to the CPU/memory limit if not set.
  • CPU/memory requests must be lower than their respective CPU/memory limit.
  • If you specify a limit for a resource, but do not specify any request, and no admission-time mechanism has applied a default request for that resource, then Kubernetes copies the limit you specified and uses it as the requested value for the resource.

spin kube scaffold --from user-name/app-name:latest --autoscaler hpa --cpu-limit 100m --memory-limit 128Mi --cpu-request 50m --memory-request 64Mi
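
These flags map directly onto the resources section of the scaffolded SpinApp. As a rough sketch, the command above should yield something along these lines (field layout as in the SpinApp examples later in this guide):

spec:
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 50m
      memory: 64Mi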

Setting target utilization

Target utilization is the percentage of the resource that you want to be used before the autoscaler kicks in. The autoscaler will check the current resource utilization of your application against the target utilization and scale your application up or down based on the result.

Target utilization is based on the average resource utilization across all instances of your application. For example, if you have 3 instances of your application, the target CPU utilization is 50%, and each application is averaging 80% CPU utilization, the autoscaler will continue to increase the number of instances until all instances are averaging 50% CPU utilization.
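
Concretely, the Kubernetes HPA calculates the desired replica count as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Applying that to the example above:

desiredReplicas = ceil(3 × 80 / 50) = ceil(4.8) = 5

So a single scaling step would grow the application from 3 to 5 instances, and the autoscaler re-evaluates on its next sync interval until the average utilization settles at or below the target (or the maximum replica count is hit).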

To scale based on CPU utilization, use the --autoscaler-target-cpu-utilization option:

spin kube scaffold --from user-name/app-name:latest --autoscaler hpa --cpu-limit 100m --memory-limit 128Mi --autoscaler-target-cpu-utilization 50

To scale based on memory utilization, use the --autoscaler-target-memory-utilization option:

spin kube scaffold --from user-name/app-name:latest --autoscaler hpa --cpu-limit 100m --memory-limit 128Mi --autoscaler-target-memory-utilization 50

2 - Scaling Spin App With Horizontal Pod Autoscaling (HPA)

This tutorial illustrates how one can horizontally scale Spin Apps in Kubernetes using Horizontal Pod Autoscaling (HPA).

Horizontal scaling, in the Kubernetes sense, means deploying more pods to meet demand (different from vertical scaling whereby more memory and CPU resources are assigned to already running pods). In this tutorial, we configure HPA to dynamically scale the instance count of our SpinApps to meet the demand.

Prerequisites

Ensure you have the following tools installed:

  • Docker - for running k3d
  • kubectl - the Kubernetes CLI
  • k3d - a lightweight Kubernetes distribution that runs on Docker
  • Helm - the package manager for Kubernetes
  • Bombardier - cross-platform HTTP benchmarking CLI

We use k3d to run a Kubernetes cluster locally as part of this tutorial, but you can follow these steps to configure HPA autoscaling on your desired Kubernetes environment.

Setting Up Kubernetes Cluster

Run the following command to create a Kubernetes cluster that has the containerd-shim-spin prerequisites installed. If you already have a Kubernetes cluster, feel free to use it:

k3d cluster create wasm-cluster-scale \
  --image ghcr.io/spinkube/containerd-shim-spin/k3d:v0.17.0 \
  -p "8081:80@loadbalancer" \
  --agents 2

Deploying Spin Operator and its dependencies

First, you have to install cert-manager to automatically provision and manage TLS certificates (used by Spin Operator’s admission webhook system). For detailed installation instructions see the cert-manager documentation.

# Install cert-manager CRDs
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

# Add and update Jetstack repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install the cert-manager Helm chart
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.3

Next, run the following commands to install the Spin Runtime Class and Spin Operator Custom Resource Definitions (CRDs):

Note: In a production cluster you likely want to customize the Runtime Class with a nodeSelector that matches nodes that have the shim installed. However, in the K3d example, they’re installed on every node.
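
A customized RuntimeClass along those lines could look roughly like the sketch below; the spinkube-capable=true label is purely illustrative, and the name and handler mirror the RuntimeClass shipped in the Spin Operator release manifests:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime-spin-v2
handler: spin
scheduling:
  nodeSelector:
    # illustrative label identifying nodes that have the Spin shim installed
    spinkube-capable: "true"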

# Install the RuntimeClass
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.runtime-class.yaml

# Install the CRDs
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.crds.yaml

Lastly, install Spin Operator using Helm and the shim executor with the following commands:

# Install Spin Operator
helm install spin-operator \
  --namespace spin-operator \
  --create-namespace \
  --version 0.4.0 \
  --wait \
  oci://ghcr.io/spinkube/charts/spin-operator

# Install the shim executor
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.shim-executor.yaml

Great, now you have Spin Operator up and running on your cluster. This means you’re set to create and deploy SpinApps later on in the tutorial.

Set Up Ingress

Use the following command to set up ingress on your Kubernetes cluster. This ensures traffic can reach your SpinApp once we’ve created it in future steps:

# Setup ingress following this tutorial https://k3d.io/v5.4.6/usage/exposing_services/
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hpa-spinapp
            port:
              number: 80
EOF

Hit enter to create the ingress resource.
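
Optionally, you can confirm the Ingress resource was created before moving on:

kubectl get ingress nginx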

Deploy Spin App and HorizontalPodAutoscaler (HPA)

Next up we’re going to deploy the Spin App we will be scaling. You can find the source code of the Spin App in the apps/cpu-load-gen folder of the Spin Operator repository.

We can take a look at the SpinApp and HPA definitions in our deployment file below. As you can see, we have set our resources.limits to 500m of CPU and 500Mi of memory per Spin application, and we will scale the instance count when we’ve reached 50% CPU utilization. We’ve also defined a maximum replica count of 10 and a minimum replica count of 1:

apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: hpa-spinapp
spec:
  image: ghcr.io/spinkube/spin-operator/cpu-load-gen:20240311-163328-g1121986
  executor: containerd-shim-spin
  enableAutoscaling: true
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 400Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spinapp-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-spinapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

For more information about HPA, please visit the following links:

  • Horizontal Pod Autoscaling (Kubernetes documentation): https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
  • HorizontalPodAutoscaler Walkthrough (Kubernetes documentation): https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

Let’s deploy the SpinApp and the HPA instance onto our cluster. To apply the above configuration we use the following kubectl apply command:

# Install SpinApp and HPA
kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/hpa.yaml

You can see your running Spin application by running the following command:

kubectl get spinapps
NAME          AGE
hpa-spinapp   92m

You can also see your HPA instance with the following command:

kubectl get hpa
NAME                 REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spinapp-autoscaler   Deployment/hpa-spinapp   6%/50%    1         10        1          97m

Please note: The Kubernetes Plugin for Spin is a tool designed for Kubernetes integration with the Spin command-line interface. The Kubernetes Plugin for Spin has a scaling tutorial that demonstrates how to use the spin kube command to tell Kubernetes when to scale your Spin application up or down based on demand.

Generate Load to Test Autoscale

Now let’s use Bombardier to generate traffic to test how well HPA scales our SpinApp. The following Bombardier command will attempt to establish 40 connections during a period of 3 minutes (or less). If a request is not responded to within 5 seconds, that request will time out:

# Generate a bunch of load
bombardier -c 40 -t 5s -d 3m http://localhost:8081

To watch the load, we can run the following command to get the status of our deployment:

kubectl describe deploy hpa-spinapp
...
---

Available      True    MinimumReplicasAvailable
Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   hpa-spinapp-544c649cf4 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  11m    deployment-controller  Scaled up replica set hpa-spinapp-544c649cf4 to 1
  Normal  ScalingReplicaSet  9m45s  deployment-controller  Scaled up replica set hpa-spinapp-544c649cf4  to 4
  Normal  ScalingReplicaSet  9m30s  deployment-controller  Scaled up replica set hpa-spinapp-544c649cf4  to 8
  Normal  ScalingReplicaSet  9m15s  deployment-controller  Scaled up replica set hpa-spinapp-544c649cf4  to 10
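
You can also watch the autoscaler itself react while the load runs; the TARGETS column should climb past the 50% threshold as replicas are added:

kubectl get hpa spinapp-autoscaler --watch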

3 - Scaling Spin App With Kubernetes Event-Driven Autoscaling (KEDA)

This tutorial illustrates how one can horizontally scale Spin Apps in Kubernetes using Kubernetes Event-Driven Autoscaling (KEDA).

KEDA extends Kubernetes to provide event-driven scaling capabilities, allowing it to react to events from Kubernetes-internal and external sources using KEDA scalers. KEDA provides a wide variety of scalers to define scaling behavior based on sources like CPU, memory, Azure Event Hubs, Kafka, RabbitMQ, and more. We use a ScaledObject to dynamically scale the instance count of our SpinApp to meet the demand.

Prerequisites

Please ensure the following tools are installed on your local machine:

  • kubectl - the Kubernetes CLI
  • Helm - the package manager for Kubernetes
  • Docker - for running k3d
  • k3d - a lightweight Kubernetes distribution that runs on Docker
  • Bombardier - cross-platform HTTP benchmarking CLI

We use k3d to run a Kubernetes cluster locally as part of this tutorial, but you can follow these steps to configure KEDA autoscaling on your desired Kubernetes environment.

Setting Up Kubernetes Cluster

Run the following command to create a Kubernetes cluster that has the containerd-shim-spin prerequisites installed. If you already have a Kubernetes cluster, feel free to use it:

k3d cluster create wasm-cluster-scale \
  --image ghcr.io/spinkube/containerd-shim-spin/k3d:v0.17.0 \
  -p "8081:80@loadbalancer" \
  --agents 2

Deploying Spin Operator and its dependencies

First, you have to install cert-manager to automatically provision and manage TLS certificates (used by Spin Operator’s admission webhook system). For detailed installation instructions see the cert-manager documentation.

# Install cert-manager CRDs
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

# Add and update Jetstack repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install the cert-manager Helm chart
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.3

Next, run the following commands to install the Spin Runtime Class and Spin Operator Custom Resource Definitions (CRDs):

Note: In a production cluster you likely want to customize the Runtime Class with a nodeSelector that matches nodes that have the shim installed. However, in the K3d example, they’re installed on every node.

# Install the RuntimeClass
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.runtime-class.yaml

# Install the CRDs
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.crds.yaml

Lastly, install Spin Operator using Helm and the shim executor with the following commands:

# Install Spin Operator
helm install spin-operator \
  --namespace spin-operator \
  --create-namespace \
  --version 0.4.0 \
  --wait \
  oci://ghcr.io/spinkube/charts/spin-operator

# Install the shim executor
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.4.0/spin-operator.shim-executor.yaml

Great, now you have Spin Operator up and running on your cluster. This means you’re set to create and deploy SpinApps later on in the tutorial.

Set Up Ingress

Use the following command to set up ingress on your Kubernetes cluster. This ensures traffic can reach your Spin App once we’ve created it in future steps:

# Setup ingress following this tutorial https://k3d.io/v5.4.6/usage/exposing_services/
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: keda-spinapp
            port:
              number: 80
EOF

Hit enter to create the ingress resource.

Setting Up KEDA

Use the following command to set up KEDA on your Kubernetes cluster using Helm. Different deployment methods are described at Deploying KEDA on keda.sh:

# Add the Helm repository
helm repo add kedacore https://kedacore.github.io/charts

# Update your Helm repositories
helm repo update

# Install the keda Helm chart into the keda namespace
helm install keda kedacore/keda --namespace keda --create-namespace
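
Before moving on, you can verify that the KEDA operator pods are running in the keda namespace:

kubectl get pods --namespace keda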

Deploy Spin App and the KEDA ScaledObject

Next up we’re going to deploy the Spin App we will be scaling. You can find the source code of the Spin App in the apps/cpu-load-gen folder of the Spin Operator repository.

We can take a look at the SpinApp and the KEDA ScaledObject definitions in our deployment files below. As you can see, we have explicitly specified resource limits to 500m of cpu (spec.resources.limits.cpu) and 500Mi of memory (spec.resources.limits.memory) per SpinApp:

# https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/keda-app.yaml
apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: keda-spinapp
spec:
  image: ghcr.io/spinkube/spin-operator/cpu-load-gen:20240311-163328-g1121986
  executor: containerd-shim-spin
  enableAutoscaling: true
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 400Mi
---

We will scale the instance count when we’ve reached a 50% utilization in CPU (spec.triggers[cpu].metadata.value). We’ve also instructed KEDA to scale our SpinApp horizontally within the range of 1 (spec.minReplicaCount) and 20 (spec.maxReplicaCount):

# https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/keda-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaling
spec:
  scaleTargetRef:
    name: keda-spinapp
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "50"

The Kubernetes documentation is the place to learn more about limits and requests. Consult the KEDA documentation to learn more about ScaledObject and KEDA’s built-in scalers.

Let’s deploy the SpinApp and the KEDA ScaledObject instance onto our cluster with the following command:

# Deploy the SpinApp
kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/keda-app.yaml
spinapp.core.spinkube.dev/keda-spinapp created

# Deploy the ScaledObject
kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/keda-scaledobject.yaml
scaledobject.keda.sh/cpu-scaling created

You can see your running Spin application by running the following command:

kubectl get spinapps

NAME          READY REPLICAS   EXECUTOR
keda-spinapp  1                containerd-shim-spin

You can also see your KEDA ScaledObject instance with the following command:

kubectl get scaledobject

NAME          SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS   READY   ACTIVE   AGE
cpu-scaling   apps/v1.Deployment   keda-spinapp      1     20    cpu        True    True     7m
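
Under the hood, KEDA manages a HorizontalPodAutoscaler on behalf of the ScaledObject (by convention named keda-hpa-<scaledobject-name>), so you can inspect it directly as well:

kubectl get hpa keda-hpa-cpu-scaling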

Generate Load to Test Autoscale

Now let’s use Bombardier to generate traffic to test how well KEDA scales our SpinApp. The following Bombardier command will attempt to establish 40 connections during a period of 3 minutes (or less). If a request is not responded to within 5 seconds, that request will time out:

# Generate a bunch of load
bombardier -c 40 -t 5s -d 3m http://localhost:8081

To watch the load, we can run the following command to get the status of our deployment:

kubectl describe deploy keda-spinapp
...
---

Available      True    MinimumReplicasAvailable
Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   keda-spinapp-76db5d7f9f (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  84s   deployment-controller  Scaled up replica set keda-spinapp-76db5d7f9f to 2 from 1
  Normal  ScalingReplicaSet  69s   deployment-controller  Scaled up replica set keda-spinapp-76db5d7f9f to 4 from 2
  Normal  ScalingReplicaSet  54s   deployment-controller  Scaled up replica set keda-spinapp-76db5d7f9f to 8 from 4
  Normal  ScalingReplicaSet  39s   deployment-controller  Scaled up replica set keda-spinapp-76db5d7f9f to 16 from 8
  Normal  ScalingReplicaSet  24s   deployment-controller  Scaled up replica set keda-spinapp-76db5d7f9f to 20 from 16
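
Once the load generator finishes, you can watch the replica count fall back toward the configured minimum:

kubectl get deployment keda-spinapp --watch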