On December 1st, 2024, AWS announced EKS Auto Mode. Their aim is, as usual, to get your company a working Kubernetes cluster as quickly and seamlessly as possible. And no, I am not going to argue whether you need Kubernetes or not, at least not in this post.

We have gone from installing Kubernetes on EC2 and managing the control plane ourselves, to AWS Fargate, and now to the latest iteration: EKS Auto Mode.

The idea this time is for you to run your workloads without needing to care about:

  • Scalability: Even though auto scaling groups existed before, EKS Auto Mode uses Karpenter under the hood to make this simpler than ever.

  • VMs (Nodes) Running Your Containers: Are they running the latest OS version? What happens if there is a new vulnerability?

  • Networking, Storage, Identity, and Load Balancing: These are almost always handled via a quick helm install if you do not choose Auto Mode, but now it is simpler than ever. For example, they offer a simple way to deploy an Application Load Balancer:

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      labels:
        app.kubernetes.io/name: LoadBalancerController
      name: alb
    spec:
      controller: eks.amazonaws.com/alb
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      namespace: game-2048
      name: ingress-2048
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: service-2048
                    port:
                      number: 80
    

    But this is the same as if you had installed the AWS Load Balancer Controller yourself.
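For completeness, the Ingress above expects a `service-2048` Service to exist. A minimal sketch of what that could look like (the selector label is an assumption, borrowed from the AWS 2048 sample app):

```yaml
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  # With target-type: ip the controller registers pod IPs directly
  # in the target group, so a plain ClusterIP Service is enough.
  type: ClusterIP
  selector:
    app.kubernetes.io/name: app-2048   # hypothetical label; match your Deployment
  ports:
    - port: 80
      targetPort: 80
```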

How was I able to try it before it was released?

Well, I did not. So yes, you can complain about the clickbait title. The reality is that by the end of 2023 I was tasked with bootstrapping the infrastructure for my team. Since nothing was reusable (the rest of the company's infrastructure was not in the cloud) and we were going to use AWS, I started from scratch.

At that time, the team consisted solely of me. The iterations were crazy fast, supported sometimes by colleagues from other teams, sometimes by prior knowledge, and sometimes by the internet: seeing what people were talking about, what the new hotness was, and so on. Some of the most important things I ended up adding to the infrastructure were:

Bottlerocket

Bottlerocket is an open-source, Linux-based operating system purpose-built for hosting containers. If you are a developer, you might draw a parallel to the scratch image, but for your nodes. Of course, this is not 100% accurate, as the latter is totally empty while Bottlerocket includes some essential components, but the idea is the same: minimize the attack surface as much as possible.

Load Balancer controller

When running such complex infrastructure, one should always exercise failover routines. In the worst-case scenario of no recovery, one strategy I have seen is to plan and rehearse releasing a new cluster and routing traffic to it. Either an Application or a Network Load Balancer helps big time here, as you can quickly point a target group at different nodes. You can even do A/B testing for new clusters, redirecting only a percentage of the traffic, and more.
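Within a single cluster, the AWS Load Balancer Controller exposes the same weighted-forwarding idea through its `actions` annotation. A sketch of a 90/10 split (the action name and service names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: weighted-example
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # The action name ("blue-green") must match the backend service name below.
    alb.ingress.kubernetes.io/actions.blue-green: >
      {"type":"forward","forwardConfig":{"targetGroups":[
        {"serviceName":"service-blue","servicePort":"80","weight":90},
        {"serviceName":"service-green","servicePort":"80","weight":10}]}}
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blue-green
                port:
                  name: use-annotation   # tells the controller to use the action above
```

For shifting traffic between entire clusters, the equivalent is done at the ALB listener level, swapping or weighting target groups that point at each cluster's nodes.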

Karpenter

This is the golden player, and I went from not knowing about it to almost having a Karpenter poster in my bedroom. In a nutshell, you define a manifest describing what a node pool should look like, including all the allowed instance types, architectures, taints (which tell the scheduler whether a pod may be placed on a given node), availability zones, etc. Karpenter then spawns nodes that respect your manifest and runs your workloads on them. It also scales, within limits you configure, whenever the Kubernetes scheduler fails to place new pods. It will even bid for cheaper machines when they are available (for example, Spot Instances) and can consolidate your workloads so you are not left paying for idle memory or CPU.
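To make that concrete, here is a sketch of a Karpenter v1 NodePool covering the knobs mentioned above. The zones, taint key, and CPU limit are illustrative values, and it assumes a default EC2NodeClass already exists:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default        # assumed to be defined separately
      requirements:
        # Prefer cheap capacity but allow on-demand as a fallback.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64", "amd64"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["eu-west-1a", "eu-west-1b"]   # example zones
      taints:
        - key: example.com/dedicated   # hypothetical taint
          value: "batch"
          effect: NoSchedule
  limits:
    cpu: "100"   # cap total provisioned CPU for this pool
  disruption:
    # Let Karpenter consolidate underutilized nodes.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```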

Does it sound familiar?

In practice, I have essentially been running what EKS Auto Mode offers but without using the feature. So how is it?

In one year, we never actually had to fail over the cluster for any reason other than practicing and refining the process in case we needed it. Even though we had some cases of rogue nodes in a node pool, the only times we needed to be careful were when performing Kubernetes updates, which trigger an update of the managed node group where Karpenter runs. This in turn causes Karpenter to start replacing the nodes it manages with the new AMI version.

This can be tricky: updating the Kubernetes version without first updating Karpenter can break the supported compatibility matrix (which happened to us last week).

Therefore, I had two thoughts regarding this new AWS release:

  1. They are releasing something that looks like what I have been running for a year, which feels reassuring.
  2. I want to share my thoughts with the community, since EKS Auto Mode has been out for 4 months and some people might still be skeptical about using it.

Limitations

There are two things to consider before clicking the “Create cluster” button.

Vendor lock-in: This is unavoidable. If you are in for the pros, you are in for the cons. However, the only way to avoid vendor lock-in is to host everything yourself. Buying your own machines is quite cheap, but then you need to hire people with the knowledge to maintain everything. I consider myself a good asset for cloud infrastructure, but ask me to patch a Linux kernel and I will definitely need to search how to do that. Hiring competent people will end up costing a lot of money anyway.

Then again, you might want to set a common ground where you still use the cloud but avoid overly specific features from provider A or B. This is also super tricky because one of the reasons to use the cloud is its ease of getting started; as soon as you start doing workarounds, you end up paying a lot more and might not reap the benefits.

Using cloud but trying to avoid provider-tailored functionality is the equivalent of building an abstraction layer over your database so you can swap it out later. Do not do that. First, it will probably never happen. Second, it will likely require work anyway, as you might be switching from a provider that supports transactions to one that does not, or does it differently. So what are you going to do with your BeginTx() func?

All the big cloud companies are running in the same direction: just as AWS has Karpenter, there is an open source effort to bring a Karpenter equivalent to GCP, and so on.

Pricing: As of the time of writing, the price calculator shows that Auto Mode costs roughly 12% more than doing everything yourself. Is that a lot? That is up to you to decide, but it definitely adds up over time.

Verdict

As with all new products, you need to consider whether this helps your specific use case. If I needed to start something from scratch with no time, and money were not an issue, I would jump right in. But some of the questions you need to ask yourself are:

  1. Do you have the FTE capacity to do this with IaC to achieve reproducible cluster creations?
  2. Are you short on money, or do you have a budget that can accommodate this? Support your decision with the pricing calculators.
  3. Do you have some custom loads that might not be supported by running EKS Auto Mode?
  4. What is your time-to-market target? Maybe this solution can help you bootstrap your operations, and then later you can replace it by deploying everything yourself.

If you like the post you can buy me a cup of coffee here :)