2024

Build a managed k8s cluster

👷 This is a work in progress

Build Me A Cluster - The Managed Edition

So you have been asked to build a Kubernetes cluster. What does it mean to build a cluster? At the very least, you will need a spec for your node pool (AKS and GKE) or node group (EKS), persistent storage and your pick of CNI provider.

Kubernetes cluster

I would argue, however, that building a cluster does not end there. In fact, what many think of as building a cluster is only the beginning, because a cluster ultimately has to run workloads. Those workloads have to be deployed securely and, once up and running, they have to be monitored and observed. One good reason to think about deploying workloads right at the outset is that, to manage and secure the cluster, the cluster operator must deploy certain workloads right after build (more on this later).

On the subject of workloads alone, you have to consider the following:

  1. Establishing security baselines.
  2. Choosing and configuring the continuous delivery channels for workloads.
  3. Securing workload deployment to the cluster.
  4. Pre-configuring for observability, monitoring and telemetry.

Security baselines

Setup logging and monitoring

You will need to set up audit logging/API server logs for the cluster upon creation, and you will likely be archiving these logs into object storage (S3/GCS/Azure Blob Storage etc.) and perhaps even setting up analytics and alarms for them.

Kubernetes cluster with logging and monitoring

Enable PodSecurity

Enable the PodSecurity Admission controller if not enabled by default and configure the default policy for the cluster.
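To make this concrete, here is a minimal sketch of one common way to apply a Pod Security policy: labelling each namespace at creation time. The label keys and the three levels are standard Kubernetes. The function name and the chosen defaults (enforce baseline, warn and audit on restricted) are my assumptions for illustration, not a recommendation for your cluster.

```python
# Pod Security Admission reads these labels from a Namespace object.
# The three levels below are the ones defined by Kubernetes itself.
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_pod_security(name, enforce="baseline",
                                warn="restricted", audit="restricted"):
    """Build a Namespace manifest (as a dict) carrying Pod Security labels.

    The defaults are illustrative assumptions, not a cluster-wide standard.
    """
    for level in (enforce, warn, audit):
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": enforce,
                "pod-security.kubernetes.io/warn": warn,
                "pod-security.kubernetes.io/audit": audit,
            },
        },
    }
```

Serializing the returned dict to JSON or YAML gives something you can feed to kubectl apply as part of the cluster build.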

Policy engine

Provision a policy engine such as Kyverno or Open Policy Agent (OPA) to ensure workloads comply with known best practices and with the organization's or line of business's (LOB) own standards.

Kubernetes cluster with logging and monitoring

Secure the workload container images

When it comes to deploying workloads, the first thing you would like to do is ensure container images come from trusted sources. To that end, you are very likely to be pulling images from trusted container registries only. In environments with higher security requirements - think financial institutions, defense department contractors - you may not be allowed to reach out to public registries at all, and all requests will need to go to an internally hosted container registry such as JFrog Artifactory. Even if the requirements are not as stringent, you will likely be using a cloud provider's container registry (ECR/ACR/GCP Artifact Registry) and configuring virtual image repos to pull securely from one or more image repos. This configuration should become part of the cluster build process. In short, you have to ensure the images can come only from trusted sources.

Kubernetes container image sources
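The rule "images may only come from trusted sources" is the kind of check a policy engine enforces at admission time. As a rough sketch of the logic only (not of how Kyverno or OPA express it), the function below flags images in a Pod manifest that do not start with an allowed registry prefix. The registry name is hypothetical.

```python
# Hypothetical internal registry; the trailing slash stops look-alike hosts
# such as registry.internal.example.com.evil.io from matching the prefix.
ALLOWED_REGISTRIES = ("registry.internal.example.com/",)


def untrusted_images(pod, allowed=ALLOWED_REGISTRIES):
    """Return the images in a Pod manifest (dict) that are not from an allowed registry."""
    spec = pod.get("spec") or {}
    containers = []
    # Init and ephemeral containers pull images too, so check all three lists.
    for key in ("initContainers", "containers", "ephemeralContainers"):
        containers.extend(spec.get(key) or [])
    prefixes = tuple(allowed)
    return [c["image"] for c in containers if not c["image"].startswith(prefixes)]
```

An image reference with no registry in it, such as nginx:latest, is flagged as well, which is what you want here: it would otherwise be pulled from a public registry.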

Secure access to cloud resources

Using Kubernetes Secrets to hold credentials required to access cloud resources is a practice frowned upon. There is the ever-present danger of your secrets being exposed or stolen. Then there is the additional burden of rotating credentials periodically and recreating/syncing secrets. You want your workloads to just have access, and this is done using workload identity. Workload identity is the way forward. It requires granting IAM roles to the cluster owner/operator so they can create Kubernetes service accounts that give their workloads access to the cloud resources those workloads require.
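On the Kubernetes side, workload identity mostly comes down to a ServiceAccount carrying a provider-specific annotation that points at a cloud identity. Here is a minimal sketch that builds such a ServiceAccount manifest. The annotation keys are the ones the three providers document; the function name and the example role ARN in the usage note are made up. The cloud-side IAM binding still has to be created separately and is not shown.

```python
# Annotation each provider looks for on the ServiceAccount.
IDENTITY_ANNOTATIONS = {
    "eks": "eks.amazonaws.com/role-arn",          # IAM role ARN
    "gke": "iam.gke.io/gcp-service-account",      # Google service account email
    "aks": "azure.workload.identity/client-id",   # managed identity client ID
}


def workload_identity_service_account(name, namespace, provider, identity):
    """Build a ServiceAccount manifest (dict) bound to a cloud identity."""
    if provider not in IDENTITY_ANNOTATIONS:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IDENTITY_ANNOTATIONS[provider]: identity},
        },
    }
```

For example, workload_identity_service_account("reader", "apps", "eks", "arn:aws:iam::111122223333:role/example-reader") yields a ServiceAccount that Pods in the apps namespace can reference, with no credentials stored in a Secret.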

“When you hear hoofbeats, think horses, not zebras”

That is a common medical school idiom: Doctors should consider the most likely possibility first when thinking of a diagnosis.

Here's my story where the hoofbeats turned out to be those of neither horses nor zebras, but of a herd of water buffaloes. And man, did I get trampled!

TL;DR

  1. Horses? - My 11-year old daughter has food poisoning. I take her to the pediatrician.
  2. Zebras? - No, she has appendicitis. I rush her to the hospital.
  3. Water buffaloes! - The pathology report: Appendiceal Neuroendocrine Tumor. More surgery in the offing.
  4. Trampled - Yet another surgery and Hurricane Beryl, all in a span of 4 days.
  5. A happy ending - Much to be thankful for.

Prologue

I have been divorced a few years now. I share custody of my 3 kids - my 14-year old son and my 11-year old twin daughters - with their mother. They alternate weeks living with me and their mother. I don't have family support in town and there's too much for a single parent to handle, even if I have the kids every alternate week. Blanca came to work for us when the twins were born - like many twins, they were born with some health issues - and I needed the help. She is someone I have trusted to take care of my kids for 11 years now.

Horses

Friday, January 26, 2024 : The kids are with their mother this week. I get a text from her that Eva vomited a few times. Most likely food poisoning. The strawberries are the prime suspect but she couldn't be sure.

Monday, January 29, 2024: It's my week with the kids. The kids come over (Monday is transition day, that's how we set it up). Eva is still not feeling great. We take the dogs for a walk on Monday and Tuesday and she cannot walk fast because it hurts her tummy. A certain amount of post-retching discomfort is to be expected but this feels different. Pediatrician time.

Zebras

Wednesday, January 31, 2024: I pick Eva from school mid-morning to take her to the pediatrician, who takes not even 5 minutes to diagnose her with appendicitis. We are sent straight to the children's hospital. On the way, I call her mother. At radiology, she is diagnosed with perforated appendicitis and surgery is scheduled for the same day. All the while, she has been stoic and taken everything in stride. I make arrangements for my other daughter to go to after-school childcare, the dogs to be boarded for a few days. I know that I will need to stay overnight with her so I ask Blanca to stay overnight with the other kids.

As I am doing all this, Eva is being wheeled in to pre-op and that's when she finally breaks down. Her mother is able to get someone to cover her shift and make it to the hospital, before surgery, to see her. The surgery goes well. The general surgeon sees 200 of these every year. No biggie for him.

I stay overnight at the hospital. Her mother and I work out a new schedule. The other kids go over to her so I can focus on Eva.

Thundering herd of water buffaloes

Friday, February 2, 2024: Eva makes great post-surgery progress. By Friday evening, the doctor on call is comfortable releasing her. We decide she needs to avoid going back-and-forth between my home and her mother's, so I drop her off at her mother's. I go to pick up the dogs from the boarding place before heading home. That's when I get a message from the hospital. Pathology report has been posted: Appendiceal Neuroendocrine Tumor (ANET). I have never felt so cold.

Who has time for grief?

Friday, February 2, 2024: I spend the evening calling my closest friends and howling on the phone for many hours. Patience of the saints must have descended upon them to have listened to me for so long. Another message from the hospital. Pediatric oncology appointment for the 19th.

Sunday February 18, 2024: It's time to tell Eva about what the next day's visit is all about. Yet again, she's stoic. Where does she get her strength and her calmness? I come from a family of anxious people.

Monday February 19, 2024: We meet the oncologist. He explains that ANETs are rare and rarer still in children. Almost all are discovered after appendectomies. Fewer than 1% of all appendectomies reveal ANETs. Good news was this one was a well-differentiated G1 tumor. This is to say it is slow growing. Bad news was the pathology reported tumor cells close to where the appendix was "cut". Positive margins, it's called. He could not be sure there were no tumor cells on the other side of the cut. Then there was the perforation and the pus that was released. He advised surgery. She would lose part of her colon and part of her small intestine. The doctor asked Eva if she had any questions. "Will I have to miss summer camp?" she asked. The great worries of childhood!

Wednesday February 21, 2024: Post-op follow up with the general surgeon. He recommends a surgery plan. Since it's a slow-growth tumor, there's no rush. We decide to schedule surgery sometime after the school year is over.

Interlude

School is over. The girls attend camp. Surgery is scheduled for July 5th. I am not worried about the surgery. I am worried about the pathology report. Will they get everything or is there yet another shoe to drop? I arrange a hiking trip to Yosemite with my friend. It's not really about the hiking itself. Last time I came unglued when the pathology report blindsided me. This time I know that I should not be alone should I get bad news. The trip is set for 6 days post-surgery.

Surgery (again)

Thursday July 4, 2024: The kids are with me this week. Happy Independence Day. It's prep day for Eva. Antibiotics. Colon cleanse (it's not pleasant). Anti-emetic for the nausea. A day spent on the can. Screw fireworks.

Friday July 5, 2024: Eva checks into the hospital. She is calm. That is until she changes into the hospital gown. She starts sobbing. By the time she is in pre-op, she is back to her stoic self. Her mother arrives soon afterwards. She is to stay at the hospital overnight with Eva on this day.

The surgery goes well and I head back home afterwards to the other kids. I am to return the next morning to stay with Eva for the next 48-hours.

Sunday, July 7, 2024: Eva makes better than expected progress and she is discharged early in the evening. This time I take her home. Meanwhile, Hurricane Beryl has been barreling our way and I am underprepared.

Rough Weather

Monday, July 8, 2024: Hurricane Beryl hits. The kids are to go over to their mother's but the roads are impassable. The kids get to stay with me an extra day. We lose power later in the day. Eva's finally bummed: no more iPad.

Tuesday, July 9, 2024: The hurricane has passed. Still no power. It's beginning to get warm and uncomfortable without the AC. Their mother finally makes it through and picks them up. She has a whole house generator so I know the kids are better off there. I have a small generator and my internet is still up so I continue to work.

Thursday, July 11, 2024: Still no power. It's hot and humid already. It is too hot to sleep but I have to take the early flight out to Fresno, CA (that's base camp for the hike). I have a 6-hour layover at DFW. There I finally get the pathology report: no evidence of tumor cells. I finally exhale after 5 months.

Gratitude

Saturday, July 13, 2024: We hike the Mist Trail to Vernal & Nevada Falls.

Vernal Falls

Nevada Falls

A special thanks

I want to thank my boss at the time, whom I cannot name for reasons that will become clear soon. I want to thank him for his support during a trying time, for giving me the grace, time and space I needed to care for Eva, accompany her to her doctors' visits while juggling other parental duties. He is a man of faith and he was called to become a missionary by his church, in a part of the world that has been known to be hostile to missionaries. I cannot thank him enough.

Kubernetes init containers - the Datadog example

EDIT - July 11 2024 - Adding explicit links to external docs

When I look at the examples of init containers and how they are used, I am left dissatisfied because the examples feel contrived. So I decided to furnish a real example: this is how Datadog (DD) does it, and you can read about it in the Datadog documentation linked at the end of this post. Hold off on reading that documentation if you are not yet familiar with Kubernetes operators. It will only confuse you.

The init container's job

The DD instrumentation library is injected into an application container by way of an init container. For example, suppose you have a containerized Java application. Before your application container is run, the DD init container is run, which adds the required Java library - as a jar file - to the filesystem and terminates. Then the actual application container runs and it starts using the jar file.
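To show the shape of the pattern, here is a minimal sketch of a Pod manifest in which an init container copies a jar into a shared emptyDir volume and the application container picks it up from there. This is a generic illustration, not Datadog's actual injected spec: the container names, paths and the copy command are all assumptions. JAVA_TOOL_OPTIONS is a standard JVM environment variable.

```python
def pod_with_agent_init_container(name, app_image, init_image):
    """Build a Pod manifest (dict) where an init container hands a jar to the app.

    All names and paths are hypothetical; only the structure matters.
    """
    mount = {"name": "agent-lib", "mountPath": "/agent-lib"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Scratch volume that both containers can see.
            "volumes": [{"name": "agent-lib", "emptyDir": {}}],
            # Runs to completion before the app container starts.
            "initContainers": [{
                "name": "copy-agent-lib",
                "image": init_image,
                "command": ["sh", "-c", "cp /agent.jar /agent-lib/agent.jar"],
                "volumeMounts": [dict(mount)],
            }],
            "containers": [{
                "name": "app",
                "image": app_image,
                # The JVM reads this variable and loads the agent jar at startup.
                "env": [{
                    "name": "JAVA_TOOL_OPTIONS",
                    "value": "-javaagent:/agent-lib/agent.jar",
                }],
                "volumeMounts": [dict(mount)],
            }],
        },
    }
```

The application image itself is untouched, which is the whole point: the jar arrives through the volume at run time.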

Why not add the jar file during container build?

You may be asking yourselves why this could not have been done at application container build-time? The simple reason: you may not be in a position to recontainerize the application. It may well be a 3rd-party application container which needs to be instrumented.

The Admission Controller's role (optional)

To continue with the Java application example, the gist is this: when the DD agent runs in the cluster, it also runs a DD Admission Controller (DDAC), which registers itself with the Kubernetes control plane. The DDAC intercepts Pod creation requests to the Kubernetes API server before the Pod objects are persisted. This is where the Pod's spec is modified to insert an init container, provided the Pod carries the annotations that the DDAC is looking for. Pods without those annotations are left alone. The annotations tell the DDAC which Pods need the init container inserted in their spec and, furthermore, which version of the jar file to inject.
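For the curious, this is roughly what a mutating admission webhook does with such a request. The sketch below is not Datadog's code. It takes an AdmissionReview request (as a dict), looks for a hypothetical annotation, and answers with a JSONPatch that adds an init container. The annotation key and image name are invented; the AdmissionReview response shape (uid, allowed, patchType, base64-encoded patch) is the standard admission.k8s.io/v1 one. A real injector would also add the shared volume and environment variables, which I have left out.

```python
import base64
import json

# Hypothetical annotation; its value is taken to be the library version to inject.
INJECT_ANNOTATION = "example.com/inject-java-lib"


def review_response(admission_review):
    """Answer an AdmissionReview request, patching annotated Pods only."""
    request = admission_review["request"]
    pod = request["object"]
    response = {"uid": request["uid"], "allowed": True}

    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    version = annotations.get(INJECT_ANNOTATION)
    if version is not None:
        init = {
            "name": "inject-java-lib",
            "image": f"registry.example.com/java-lib-init:{version}",  # hypothetical image
        }
        if pod["spec"].get("initContainers"):
            # Append to the existing list.
            patch = [{"op": "add", "path": "/spec/initContainers/-", "value": init}]
        else:
            # The list does not exist yet, so create it.
            patch = [{"op": "add", "path": "/spec/initContainers", "value": [init]}]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Pods without the annotation get an "allowed" response with no patch, so they pass through unchanged.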

While I have used Java applications as an example, the same technique is applicable to application containers where the application may have been written in .NET/Python/Golang etc.

  1. The Datadog Admission Controller - https://docs.datadoghq.com/containers/cluster_agent/admission_controller/?tab=datadogoperator
  2. Datadog automatic instrumentation with local library injection - https://docs.datadoghq.com/tracing/trace_collection/library_injection_local/?tab=kubernetes
  3. Datadog tutorial: Instrumenting a Java application - https://docs.datadoghq.com/tracing/guide/tutorial-enable-java-admission-controller/#instrument-your-app-with-datadog-admission-controller

kubectl - the underappreciated tool for the Kubernetes developer

EDIT: Grammatical errors fixed.

It's a trap

Admiral Ackbar

kubectl - a dev's perspective

For those who have gotten into the world of Kubernetes tooling, kubectl remains an essential and perhaps underrated tool. Its value lies not just in direct use, or in indirect use (as in a bash script), but also in what it can teach us about making calls to the Kubernetes API server. This first entry in the series is not a beginner-level one, but it will set the tone for more articles down the line.

The TL;DR version

  1. kubectl ... --v=9 is the Kubernetes developer's underrated friend - it reveals much about which Kubernetes API server endpoints are being called, with what parameters and the actual HTTP requests to and responses from the API server.
  2. kubectl transforms, in unexpected ways, the JSON/YAML used when creating or editing Kubernetes resources, and likewise the resources it fetches and displays. The only good way to observe these differences is by increasing the verbosity of the output. What I am saying is that what kubectl displays is not necessarily what the API server served up.

The God-is-in-the-details version

Have you ever tried to, say, get a list of all resources of a kind in a namespace? Say you would like to get a list of all ConfigMaps in a namespace.

Suppose I have 2 ConfigMaps in a namespace, the first of which is

apiVersion: v1
data:
  hello: world
  yoda: do or do not. There is no try
kind: ConfigMap
metadata:
  labels:
    app: blog
  name: hello-world

and the second of which is

apiVersion: v1
data:
  palpatine: There is a great disturbance in the force
  vader: I find your lack of faith disturbing
kind: ConfigMap
metadata:
  labels:
    app: blog
  name: sithisms

and we kubectl apply both of these to a namespace list-resources-ns to create them.

Note the .metadata.labels in both ConfigMaps : app: blog

Once the ConfigMaps have been created, let us fetch both of them simultaneously from the cluster by selecting them using the label we applied to them, app: blog.

kubectl get configmap --selector app=blog -n list-resources-ns 

which gives us the output

NAME          DATA   AGE
hello-world   2      25m
sithisms      2      22m

but now if we change the command above to

kubectl get configmap --selector app=blog -n list-resources-ns -oyaml

we get something like this (.resourceVersion, .creationTimestamp values and the like notwithstanding, those would vary in your case)

apiVersion: v1
items:
- apiVersion: v1
  data:
    hello: world
    yoda: do or do not. There is no try
  kind: ConfigMap
  metadata:
    creationTimestamp: "2024-06-30T17:46:35Z"
    labels:
      app: blog
    name: hello-world
    namespace: list-resources-ns
    resourceVersion: "11757"
    uid: 01c03ca8-da39-44ba-991b-7dc4818440cd
- apiVersion: v1
  data:
    palpatine: There is a great disturbance in the force
    vader: I find your lack of faith disturbing
  kind: ConfigMap
  metadata:
    creationTimestamp: "2024-06-30T17:49:39Z"
    labels:
      app: blog
    name: sithisms
    namespace: list-resources-ns
    resourceVersion: "11806"
    uid: 81f99d38-3de8-460f-911f-0d954f165ed1
kind: List
metadata:
  resourceVersion: ""

So far no surprises. Except for what is going on behind the scenes.

  1. First of all, unsurprisingly, kubectl is issuing an HTTP GET to the Kubernetes API server. In fact the GET is issued to the relative URL /api/v1/namespaces/list-resources-ns/configmaps?labelSelector=app%3Dblog&limit=500
  2. Secondly, the API server is returning not YAML but JSON, and kubectl converts the JSON to YAML before showing you the output. This shouldn't surprise most devs: YAML is better suited to configuration files, but JSON is better suited to transporting data since YAML is pretty finicky about indentation.
  3. Thirdly, and this might surprise many, you would expect the API server to have returned the JSON equivalent of the above YAML:
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "v1",
            "data": {
                "hello": "world",
                "yoda": "do or do not. There is no try"
            },
            "kind": "ConfigMap",
            "metadata": {
                "creationTimestamp": "2024-06-30T17:46:35Z",
                "labels": {
                    "app": "blog"
                },
                "name": "hello-world",
                "namespace": "list-resources-ns",
                "resourceVersion": "11757",
                "uid": "01c03ca8-da39-44ba-991b-7dc4818440cd"
            }
        },
        {
            "apiVersion": "v1",
            "data": {
                "palpatine": "There is a great disturbance in the force",
                "vader": "I find your lack of faith disturbing"
            },
            "kind": "ConfigMap",
            "metadata": {
                "creationTimestamp": "2024-06-30T17:49:39Z",
                "labels": {
                    "app": "blog"
                },
                "name": "sithisms",
                "namespace": "list-resources-ns",
                "resourceVersion": "11806",
                "uid": "81f99d38-3de8-460f-911f-0d954f165ed1"
            }
        }
    ],
    "kind": "List",
    "metadata": {
        "resourceVersion": ""
    }
}

except that what the API server actually returns is

{
  "kind": "ConfigMapList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "15095"
  },
  "items": [
    {
      "metadata": {
        "name": "hello-world",
        "namespace": "list-resources-ns",
        "uid": "01c03ca8-da39-44ba-991b-7dc4818440cd",
        "resourceVersion": "11757",
        "creationTimestamp": "2024-06-30T17:46:35Z",
        "labels": {
          "app": "blog"
        },
        "managedFields": [
          {
            "manager": "kubectl-create",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2024-06-30T17:46:35Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:data": {
                ".": {},
                "f:hello": {},
                "f:yoda": {}
              }
            }
          }
        ]
      },
      "data": {
        "hello": "world",
        "yoda": "do or do not. There is no try"
      }
    },
    {
      "metadata": {
        "name": "sithisms",
        "namespace": "list-resources-ns",
        "uid": "81f99d38-3de8-460f-911f-0d954f165ed1",
        "resourceVersion": "11806",
        "creationTimestamp": "2024-06-30T17:49:39Z",
        "labels": {
          "app": "blog"
        },
        "managedFields": [
          {
            "manager": "kubectl-create",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2024-06-30T17:49:39Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:data": {
                ".": {},
                "f:vader": {}
              }
            }
          }
        ]
      },
      "data": {
        "palpatine": "There is a great disturbance in the force",
        "vader": "I find your lack of faith disturbing"
      }
    }
  ]
}

Don't believe me? Try issuing the following

kubectl get cm --selector app=blog -n list-resources-ns -ojson --v=9
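If you would rather rebuild the request by hand, the relative URL from step 1 is easy to reconstruct. A small sketch, using only the Python standard library; the function name is mine, and the limit of 500 is simply the value seen in the URL above.

```python
from urllib.parse import urlencode


def configmap_list_path(namespace, label_selector, limit=500):
    """Build the relative URL for listing ConfigMaps by label selector."""
    # urlencode percent-encodes the '=' inside the selector as %3D.
    query = urlencode({"labelSelector": label_selector, "limit": limit})
    return f"/api/v1/namespaces/{namespace}/configmaps?{query}"


print(configmap_list_path("list-resources-ns", "app=blog"))
# /api/v1/namespaces/list-resources-ns/configmaps?labelSelector=app%3Dblog&limit=500
```

Passing that path to kubectl get --raw prints the API server's response body as is, without kubectl's usual reshaping.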

The differences are many and profound. Here they are in tabular form.

| Differences | As printed by kubectl | As sent by API server |
| --- | --- | --- |
| .kind | List | ConfigMapList |
| .items[*].kind | ConfigMap | Field absent |
| .items[*].metadata.managedFields | Field absent | Field present |

Now, why does this matter? It may not matter most of the time. But suppose you are a dev creating a client-side tool, using one of the officially supported Kubernetes client libraries, or just curl, or any number of HTTP libraries, and you are looking to parse the JSON returned by the API server. You could easily be flummoxed, as I was.

I was using the Kubernetes Python client library to fetch a set of Secrets in a cluster, check whether the data in each Secret was valid, and write back only the Secrets that needed updating. Except the client kept insisting that apiVersion was absent and kind had not been set in the Secret objects. It was not until I pulled up kubectl and did a kubectl get --selector ... --v=9 that I realized what had gone wrong. The Secret objects - in the Python client the class is called V1Secret - were indeed incomplete, because I had constructed the V1Secret objects directly from the .items[*] JSON objects, and those did not have .kind and .apiVersion set.
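Here is a minimal sketch of the kind of fix this calls for, working on the raw JSON (plain dicts) rather than on V1Secret objects. It copies apiVersion from the list onto each item and derives the item kind by dropping the "List" suffix from the list's kind. That suffix convention holds for the responses shown above (ConfigMapList); treating it as a general rule is my assumption, and the function name is made up.

```python
def items_with_type_meta(list_response):
    """Return copies of a list response's items with kind and apiVersion filled in."""
    list_kind = list_response["kind"]
    if not list_kind.endswith("List"):
        raise ValueError(f"not a list kind: {list_kind}")
    item_kind = list_kind[: -len("List")]        # e.g. SecretList -> Secret
    api_version = list_response["apiVersion"]

    completed = []
    for item in list_response.get("items") or []:
        obj = dict(item)                          # shallow copy; leave the input untouched
        obj.setdefault("kind", item_kind)
        obj.setdefault("apiVersion", api_version)
        completed.append(obj)
    return completed
```

With the two fields restored, each item is a complete object again and can be sent back to the API server on its own.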