Scaling
The stack can scale horizontally and automatically to meet the demands of your customers. We do this at two levels: the cluster/node level and the pod/container level. This ensures that your application can serve fluctuating load without manual intervention, and that you are not paying for idle resources during low-traffic periods.
Cluster/Node Scaling
The Cluster Autoscaler provisions the minimum number of nodes (machines) needed to run the stack. It adds or removes nodes based on the demands of the cluster's pods. While it does not control the number of individual pods required to handle the load, it ensures the correct number of nodes are available to meet the cluster's needs.
We can set hard limits on the number of nodes the cluster can scale up to or down to. Setting a minimum is useful because node provisioning can be slow depending on your compute provider; keeping a baseline of nodes allocated lets the stack react to load quickly. The implementation varies by compute provider, but we support all major cloud providers as well as bespoke on-premises setups. Reach out to us if you have questions about your specific setup.
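As a rough illustration, on a Kubernetes-based setup these node limits are typically expressed as flags on the Cluster Autoscaler deployment. The node-group name, cloud provider, and version below are hypothetical placeholders, not our actual configuration:

```yaml
# Sketch of Cluster Autoscaler container args (illustrative values only).
# --nodes=<min>:<max>:<node-group> bounds how far that group can scale.
spec:
  containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws             # varies by compute provider
        - --nodes=3:10:my-node-group       # min 3, max 10 nodes (hypothetical group)
        - --scale-down-delay-after-add=10m # wait before considering scale-down
```

The minimum of 3 here is the "allocated set" mentioned above: those nodes stay provisioned even at low traffic so the cluster can absorb a spike without waiting for new machines.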
Pod/Container Scaling
A second form of scaling manages the number of pods/containers running on the nodes. This is achieved with the Horizontal Pod Autoscaler, which scales the number of pods in a deployment based on metrics published by our pods. The controller adjusts the number of replicas to keep these metrics within the desired target range.
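A minimal sketch of what such an autoscaler looks like, using the Kubernetes `autoscaling/v2` HorizontalPodAutoscaler API. The deployment name, metric name, and targets are illustrative assumptions, not our production values:

```yaml
# Hypothetical HPA scaling a deployment on a custom per-pod metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                      # the deployment being scaled
  minReplicas: 2                      # hard lower bound
  maxReplicas: 20                     # hard upper bound
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second   # hypothetical metric published by the pods
        target:
          type: AverageValue
          averageValue: "100"         # add replicas when the per-pod average exceeds this
```

The controller compares the observed per-pod average against the target and grows or shrinks the replica count, within the min/max bounds, to bring it back into range.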
As with the Cluster Autoscaler, we can set hard limits on the number of pods the deployment can scale up to or down to. We can also assign different policies to scaling up and scaling down. This is useful, for instance, for an application with a spiky load pattern: you may want to scale up quickly but scale down slowly to prevent rapid fluctuations in resources.
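Asymmetric up/down policies like these can be sketched with the `behavior` field of an `autoscaling/v2` HorizontalPodAutoscaler. The specific windows and rates below are illustrative, not our defaults; this fragment would sit under the HPA's `spec`:

```yaml
# Sketch: scale up aggressively, scale down conservatively.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to spikes immediately
    policies:
      - type: Percent
        value: 100                    # allow up to doubling the replicas...
        periodSeconds: 60             # ...per minute
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes of low load before shrinking
    policies:
      - type: Pods
        value: 1                      # then remove at most one pod...
        periodSeconds: 120            # ...every two minutes
```

The long scale-down stabilization window is what prevents the rapid resource fluctuations described above: a brief dip in load no longer triggers an immediate shrink that would have to be undone on the next spike.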