Kenneth Kasuba
Director of Security, AI Research
Over the past four years, I've been responsible for architecting and hardening EKS clusters across organizations ranging from Series B startups to Fortune 500 financial services firms. Across every engagement, the pattern is the same: teams adopt Kubernetes for velocity, then discover, usually through a painful audit or an actual incident, that their security posture is nowhere near where it needs to be.
This post distills the architecture patterns I've standardized on after securing EKS clusters running production workloads across AWS, GCP (via GKE with EKS migration patterns), and Azure (AKS). These aren't theoretical recommendations. Every pattern here has been implemented, tested under load, and validated against the CIS Kubernetes Benchmark and the NSA/CISA Kubernetes Hardening Guide v1.2.
The 5 Misconfigurations I Find in Every Audit
Before diving into architecture, let me share the five issues I find in literally every Kubernetes security assessment. If you recognize your environment here, you're not alone, but you do need to act.
- Overprivileged IRSA roles (or no IRSA at all). Teams either skip IAM Roles for Service Accounts entirely and let pods inherit the node's instance profile, or they create a single "app" IAM role with s3:* and secretsmanager:* and bind it to every service account. In my experience, this is the single highest-risk finding because it turns any container escape into a full AWS account compromise.
- Default namespace usage with no network policies. I've walked into environments with 200+ pods in the default namespace, zero NetworkPolicy objects, and flat east-west traffic. Any compromised pod can reach every other pod, the metadata service, and often the Kubernetes API server.
- No admission control. No OPA Gatekeeper, no Kyverno, no Pod Security Standards enforcement. Developers can deploy privileged containers, mount the host filesystem, and disable all security contexts. The AWS EKS Best Practices Guide explicitly recommends admission control as a baseline, yet I find it absent in roughly 70% of clusters I audit.
- No runtime detection. No Falco, no Tetragon, no syscall monitoring of any kind. This means that even if an attacker exploits something like CVE-2022-23648 (the containerd image-volume vulnerability that exposes host files) or CVE-2021-25741 (the kubelet subPath symlink vulnerability allowing host filesystem access), you won't know until they've exfiltrated data or moved laterally.
- Stale, unpatched AMIs and control plane versions. I routinely find clusters running EKS versions two or three minor releases behind, with node AMIs that haven't been rotated in months. The control plane and data plane version skew alone violates the Kubernetes version skew policy, and unpatched nodes carry known CVEs in containerd, runc, and the kubelet.
Zero-Trust Pod Architecture
The pattern I've standardized on treats every pod as a potential adversary. This isn't paranoia. It's the only defensible architecture when you're running multi-tenant workloads at scale. The diagram below shows how I layer security controls around every pod in a production namespace:
Let me walk through each layer and the specific implementation details.
Layer 1: Admission Control with OPA Gatekeeper and Kyverno
In my experience, the most impactful security control you can deploy to an EKS cluster is admission control. I've shifted from using OPA Gatekeeper exclusively to a hybrid approach where Kyverno handles the common cases and Gatekeeper handles complex cross-resource policies.
Here's the Kyverno ClusterPolicy I deploy to every cluster as a baseline. This single policy prevents the majority of privilege escalation attacks:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: baseline-pod-security
annotations:
policies.kyverno.io/title: Baseline Pod Security
policies.kyverno.io/category: Pod Security
policies.kyverno.io/severity: high
policies.kyverno.io/description: >-
Enforces baseline pod security standards across all namespaces.
Maps to CIS Kubernetes Benchmark sections 5.2.x.
spec:
validationFailureAction: Enforce
background: true
rules:
- name: deny-privileged-containers
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Privileged containers are not allowed. Set securityContext.privileged to false."
pattern:
spec:
=(initContainers):
- =(securityContext):
=(privileged): false
containers:
- =(securityContext):
=(privileged): false
- name: require-non-root
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Containers must run as non-root. Set runAsNonRoot to true."
pattern:
spec:
=(securityContext):
=(runAsNonRoot): true
containers:
- securityContext:
runAsNonRoot: true
- name: deny-host-namespaces
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Host namespaces (hostPID, hostIPC, hostNetwork) are not allowed."
pattern:
spec:
=(hostPID): false
=(hostIPC): false
=(hostNetwork): false
- name: restrict-volume-types
match:
any:
- resources:
kinds:
- Pod
validate:
message: "hostPath volumes are not allowed. Use configMap, emptyDir, projected, secret, downwardAPI, persistentVolumeClaim, or CSI volumes instead."
deny:
conditions:
any:
- key: "{{ request.object.spec.volumes[].hostPath | length(@) }}"
operator: GreaterThan
value: 0
- name: require-read-only-root
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Root filesystem must be read-only."
pattern:
spec:
containers:
- securityContext:
readOnlyRootFilesystem: true

For more complex policies, like ensuring every ServiceAccount that has an IRSA annotation also has a corresponding NetworkPolicy in the same namespace, I use OPA Gatekeeper's ConstraintTemplate with Rego:
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8srequirenetworkpolicy
spec:
crd:
spec:
names:
kind: K8sRequireNetworkPolicy
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8srequirenetworkpolicy
violation[{"msg": msg}] {
input.review.kind.kind == "Pod"
namespace := input.review.object.metadata.namespace
not has_network_policy(namespace)
msg := sprintf("Namespace %v has no NetworkPolicy. All namespaces with workloads must have at least one NetworkPolicy.", [namespace])
}
has_network_policy(namespace) {
some i
pol := data.inventory.namespace[namespace]["networking.k8s.io/v1"]["NetworkPolicy"][i]
}

Layer 2: Network Segmentation with Cilium
I migrated away from the default VPC CNI's network policy support to Cilium two years ago and haven't looked back. The reasons are concrete: Cilium's eBPF-based datapath gives you L7 visibility (you can see HTTP methods, gRPC calls, and DNS queries at the pod level), kernel-level enforcement that doesn't depend on iptables chains, and transparent mTLS via Cilium's service mesh integration.
Here's the Terraform configuration I use to deploy Cilium on EKS via the Helm provider, with the security-relevant options enabled (there is no official Cilium Terraform registry module, so this uses a helm_release resource against the upstream chart):

resource "helm_release" "cilium" {
  name       = "cilium"
  namespace  = "kube-system"
  repository = "https://helm.cilium.io"
  chart      = "cilium"
  version    = "1.15.3"
  values = [yamlencode({
kubeProxyReplacement = "strict"
k8sServiceHost = var.eks_cluster_endpoint
k8sServicePort = "443"
hubble = {
enabled = true
relay = { enabled = true }
ui = { enabled = true }
metrics = {
enabled = [
"dns:query;rcode;ips",
"drop:sourceContext;destinationContext;reason",
"tcp:flag;sourceContext;destinationContext",
"flow:sourceContext;destinationContext",
"http:method;status;sourceContext;destinationContext"
]
}
}
# Enable transparent encryption
encryption = {
enabled = true
type = "wireguard"
}
# Default deny all ingress/egress
policyEnforcementMode = "always"
# eBPF-based masquerading (replaces iptables)
bpf = {
masquerade = true
tproxy = true
}
# Host-level firewall
hostFirewall = {
enabled = true
}
# Enable bandwidth manager for fair queuing
bandwidthManager = {
enabled = true
}
})]
})]
}

The critical setting is policyEnforcementMode = "always". This implements default-deny at the cluster level: no pod can communicate with any other pod unless explicitly allowed by a CiliumNetworkPolicy. Combined with Hubble's flow logs, this gives you complete visibility into every network connection in the cluster.
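To make the default-deny posture concrete, here's the shape of a CiliumNetworkPolicy I'd pair with a typical workload. The namespace, labels, ports, and paths are illustrative, not taken from a real deployment; the point is the layering of identity-based ingress, DNS egress, and L7 HTTP filtering:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-processor        # illustrative names throughout
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payment-processor
  ingress:
    # Only the ingress gateway may reach this pod, and only on 8443
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: ingress
            app: gateway
      toPorts:
        - ports:
            - port: "8443"
              protocol: TCP
  egress:
    # DNS to kube-dns only, with L7 visibility into every query
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    # HTTP egress to one downstream service, restricted at L7 to two routes
    - toEndpoints:
        - matchLabels:
            app: ledger
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "POST"
                path: "/v1/transactions"
              - method: "GET"
                path: "/v1/balances/.*"
```

Because enforcement is default-deny, anything not named here (the IMDS endpoint, other namespaces, the public internet) is unreachable from this pod.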
Layer 3: IRSA Hardening: The Pattern Most Teams Get Wrong
IRSA (IAM Roles for Service Accounts) is the single most important security feature in EKS, and it's also the one I see misconfigured most frequently. The core issue is that teams create IAM trust policies that are too broad.
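Before showing the pattern, it helps to audit what you already have. The helper below is a sketch I'm including for illustration (the function name and heuristics are mine, not part of any AWS tooling); it parses an IRSA trust-policy document and flags the two most common mistakes: a missing or wildcarded :sub condition and a missing :aud condition.

```python
import json

def audit_irsa_trust_policy(policy_json: str) -> list[str]:
    """Flag common IRSA trust-policy mistakes. Illustrative heuristic only."""
    findings = []
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Action") != "sts:AssumeRoleWithWebIdentity":
            continue
        cond = stmt.get("Condition", {})
        equals = cond.get("StringEquals", {})
        like = cond.get("StringLike", {})
        # Collect sub/aud conditions regardless of which operator holds them
        subs = {k: v for k, v in {**equals, **like}.items() if k.endswith(":sub")}
        auds = {k: v for k, v in equals.items() if k.endswith(":aud")}
        if not subs:
            findings.append("no :sub condition -- any service account behind "
                            "the OIDC provider can assume this role")
        for key, value in subs.items():
            if key in like and "*" in str(value):
                findings.append(f"wildcard :sub claim {value!r} collapses the "
                                "service-account identity boundary")
        if not auds:
            findings.append("no :aud condition -- token audience is not "
                            "pinned to sts.amazonaws.com")
    return findings
```

Run it against every role whose trust policy federates to your cluster's OIDC provider; anything it flags is a candidate for the scoped pattern below.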
Here's the pattern I enforce. Notice the Condition block: it restricts the role to a specific service account in a specific namespace, not just "any service account in the OIDC provider":
resource "aws_iam_role" "app_payment_processor" {
name = "${var.cluster_name}-payment-processor"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = {
Federated = var.oidc_provider_arn
}
Action = "sts:AssumeRoleWithWebIdentity"
Condition = {
StringEquals = {
"${var.oidc_provider}:sub" = "system:serviceaccount:payments:payment-processor"
"${var.oidc_provider}:aud" = "sts.amazonaws.com"
}
}
}]
})
# Permission boundary: defense in depth
permissions_boundary = aws_iam_policy.workload_boundary.arn
tags = {
managed-by = "terraform"
cluster = var.cluster_name
namespace = "payments"
service = "payment-processor"
data-class = "pci"
}
}
# Least-privilege policy: only the specific S3 bucket and KMS key needed
resource "aws_iam_role_policy" "payment_processor" {
name = "payment-processor-access"
role = aws_iam_role.app_payment_processor.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = "${var.payment_bucket_arn}/transactions/*"
Condition = {
StringEquals = {
"s3:x-amz-server-side-encryption" = "aws:kms"
}
}
},
{
Effect = "Allow"
Action = [
"kms:Decrypt",
"kms:GenerateDataKey"
]
Resource = var.payment_kms_key_arn
}
]
})
}

The permission boundary (permissions_boundary) is the key defense-in-depth measure that most teams skip. Even if someone modifies the inline policy to grant broader access, the boundary policy caps the maximum permissions. I define the boundary to prohibit IAM modification, Organizations actions, and access to management-plane resources.
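For reference, a boundary along those lines might look like the following sketch. The allowed-service list is illustrative and should be tailored per cluster; what matters is the explicit Deny on identity- and organization-level actions, which wins over any Allow an inline policy could later add:

```hcl
resource "aws_iam_policy" "workload_boundary" {
  name        = "eks-workload-boundary" # illustrative name
  description = "Caps the maximum permissions any IRSA workload role can hold"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowDataPlaneServices"
        Effect = "Allow"
        # Tailor this list to the services your workloads actually use
        Action = [
          "s3:*", "sqs:*", "sns:*", "dynamodb:*",
          "kms:Decrypt", "kms:GenerateDataKey",
          "secretsmanager:GetSecretValue",
          "logs:CreateLogStream", "logs:PutLogEvents"
        ]
        Resource = "*"
      },
      {
        Sid    = "DenyManagementPlane"
        Effect = "Deny"
        Action = [
          "iam:*", "organizations:*", "account:*",
          "sts:AssumeRole" # each workload gets exactly one role; no chaining
        ]
        Resource = "*"
      }
    ]
  })
}
```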
Multi-Cloud IAM Integration Patterns
One of the questions I get most frequently is how pod identity works across cloud providers and which approach is "best." Having implemented all three in production, here's my honest comparison:
Cross-Provider Identity Comparison
Enforcing One Identity Per Service
All three providers have converged on the same fundamental pattern: projected service account tokens with OIDC federation. The implementation details differ, but the security model is equivalent. The critical thing to understand is that in all three cases, the security boundary is the ServiceAccount-to-Namespace binding. If you allow multiple services to share a ServiceAccount, you've collapsed your identity boundary.
The pattern I enforce is one ServiceAccount per microservice per namespace, with the IAM role scoped to exactly that combination. In Terraform, this looks like a module that takes the namespace, service account name, and a minimal IAM policy document as inputs, and produces the fully-scoped IRSA binding as output. This eliminates the manual error of copy-pasting trust policies with incorrect sub claims.
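As a sketch of that module's interface (the layout and variable names are mine, not a published module), the key design choice is that the sub claim is derived inside the module from the namespace and service account inputs, so the trust policy can never disagree with them:

```hcl
variable "cluster_name" { type = string }
variable "namespace" { type = string }
variable "service_account" { type = string }
variable "policy_json" { type = string } # minimal, caller-supplied IAM policy
variable "oidc_provider" { type = string }
variable "oidc_provider_arn" { type = string }

locals {
  # Derived once, so it cannot drift from the inputs
  sub_claim = "system:serviceaccount:${var.namespace}:${var.service_account}"
}

resource "aws_iam_role" "this" {
  name = "${var.cluster_name}-${var.namespace}-${var.service_account}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = var.oidc_provider_arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${var.oidc_provider}:sub" = local.sub_claim
          "${var.oidc_provider}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "this" {
  name   = "workload-access"
  role   = aws_iam_role.this.id
  policy = var.policy_json
}

output "role_arn" {
  value = aws_iam_role.this.arn # annotate the ServiceAccount with this
}
```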
Layer 4: Runtime Detection with Falco and Tetragon
Admission control and network policies are preventive controls. You also need detective controls: something that watches what's actually happening inside containers at runtime. I deploy both Falco and Tetragon, because they complement each other: Falco excels at high-level behavioral rules with a rich community rule library, while Tetragon provides kernel-level enforcement via eBPF that can actually kill a process before it completes a malicious action.
Here's a custom Falco rule I've written that catches a specific attack pattern I've seen in the wild: an attacker exploiting a web application vulnerability to access the IMDS endpoint and steal IAM credentials:
- rule: Detect IMDS Token Theft via Web Process
desc: >
Detects when a web server process (nginx, apache, node, python, java)
makes an HTTP connection to the EC2 Instance Metadata Service.
This is a strong indicator of SSRF exploitation or container escape
attempting to steal IAM credentials.
condition: >
evt.type in (connect, sendto) and
evt.dir = < and
fd.sip = "169.254.169.254" and
container.id != host and
(proc.name in (nginx, apache2, httpd, node, python, python3, java, dotnet) or
proc.pname in (nginx, apache2, httpd, node, python, python3, java, dotnet))
output: >
IMDS access from web process detected
(container=%container.name pod=%k8s.pod.name ns=%k8s.ns.name
process=%proc.name parent=%proc.pname cmdline=%proc.cmdline
connection=%fd.name user=%user.name image=%container.image.repository)
priority: CRITICAL
tags: [aws, imds, ssrf, credential_theft, mitre_credential_access]
- rule: Detect Service Account Token Read
desc: >
Detects processes reading the projected service account token.
While normal for application initialization, reading this token
from a shell or unexpected process indicates potential credential theft.
condition: >
open_read and
fd.name startswith /var/run/secrets/kubernetes.io/serviceaccount and
container.id != host and
not proc.name in (vault-agent, aws-iam-authenticator, envoy, pilot-agent) and
proc.pname in (bash, sh, dash, zsh, curl, wget)
output: >
Service account token read by suspicious process
(container=%container.name pod=%k8s.pod.name ns=%k8s.ns.name
process=%proc.name parent=%proc.pname file=%fd.name)
priority: HIGH
tags: [kubernetes, credential_access, service_account]

For Tetragon, I configure enforcement policies that go beyond detection: they actually prevent the malicious action. This TracingPolicy kills any process that attempts to load a kernel module from within a container:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: block-kernel-module-load
spec:
kprobes:
- call: "init_module"
syscall: false
args:
- index: 0
type: "nop"
selectors:
- matchNamespaces:
- namespace: Pid
operator: NotIn
values:
- "host_ns"
matchActions:
- action: Sigkill
- call: "finit_module"
syscall: false
args:
- index: 0
type: "nop"
selectors:
- matchNamespaces:
- namespace: Pid
operator: NotIn
values:
- "host_ns"
matchActions:
- action: Sigkill

This is particularly important for mitigating kernel exploit chains. If an attacker gains code execution inside a container and attempts to escalate privileges by loading a malicious kernel module, Tetragon terminates the process at the kernel level before the module loads. This would have been effective against several real-world container escape techniques that rely on CAP_SYS_MODULE or abuse writable cgroup mounts.
The Shift-Left Pipeline: Catching Misconfigurations Before They Deploy
All of the runtime controls I've described are your last line of defense. The real goal is to catch misconfigurations before they ever reach the cluster. Over the past two years, I've refined a shift-left pipeline that catches approximately 90% of security issues before they leave the developer's machine or CI environment:
The pipeline has four stages, and the key insight is that each stage catches a different class of issues:
Stage 1: Pre-commit Hooks
I use Kyverno's CLI in pre-commit hooks. Developers get immediate feedback before they even push code:
# .pre-commit-config.yaml
# Neither the kyverno nor the trivy repo publishes pre-commit hook
# definitions, so these run as local hooks and assume both CLIs are
# installed on the developer's machine.
repos:
  - repo: local
    hooks:
      - id: kyverno-apply
        name: Kyverno Policy Check
        language: system
        entry: kyverno apply ./policies/ --resource
        files: '(deployment|statefulset|daemonset|pod).*\.ya?ml$'
      - id: trivy-config
        name: Trivy Config Scan
        language: system
        entry: trivy config --severity HIGH,CRITICAL --exit-code 1
        files: '.*\.ya?ml$'

Stage 2: CI Policy Scanning
In CI (I use GitHub Actions for most clients, but this works with any CI system), I run a comprehensive policy scan that includes OPA Conftest for custom Rego policies, Trivy for both config and image scanning, and SBOM generation with Syft:
# .github/workflows/security-scan.yml
name: Security Scan
on:
pull_request:
paths:
- 'k8s/**'
- 'Dockerfile*'
- 'terraform/**'
jobs:
policy-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run OPA Conftest
  # conftest ships as a container image; there is no official conftest Action
  uses: docker://openpolicyagent/conftest:v0.49.1
  with:
    args: test k8s/ --policy policies/
- name: Trivy Image Scan
  uses: aquasecurity/trivy-action@0.24.0 # pin a release, not @master
  with:
    scan-type: image
    # assumes the build job tags and pushes images by commit SHA
    image-ref: ${{ env.IMAGE_REPO }}:${{ github.sha }}
    severity: HIGH,CRITICAL
    exit-code: 1
    format: sarif
    output: trivy-results.sarif
- name: Upload SARIF
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: trivy-results.sarif
- name: Generate SBOM
uses: anchore/sbom-action@v0
with:
format: spdx-json
output-file: sbom.spdx.json

Stage 3: ArgoCD Admission Gate
ArgoCD serves as my GitOps deployment engine, and it's the final gate before resources reach the cluster. I configure ArgoCD to sync through the same admission webhooks that protect the cluster, so even if someone pushes a non-compliant manifest to the Git repo, ArgoCD's sync will fail with a clear policy violation message.
The pattern is straightforward: Kyverno or Gatekeeper is already running in the cluster in Enforce mode. When ArgoCD attempts to apply a resource that violates policy, the admission webhook rejects the request, ArgoCD marks the sync as failed, and the team gets alerted via Slack or PagerDuty. This creates a feedback loop where developers learn the policies by encountering them during deployment, not during an incident post-mortem.
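One concrete way to wire the alerting half of that loop is argocd-notifications. The trigger and template below are a minimal sketch (the trigger, template, and channel names are illustrative), firing whenever a sync operation ends in Error or Failed:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
  template.app-sync-failed: |
    message: |
      Sync failed for {{.app.metadata.name}}:
      {{.app.status.operationState.message}}
  # The token is resolved from argocd-notifications-secret
  service.slack: |
    token: $slack-token
```

Applications opt in via an annotation such as notifications.argoproj.io/subscribe.on-sync-failed.slack: security-alerts, so the team that owns the app sees the policy violation in their own channel.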
Stage 4: Runtime Monitoring
Even with three layers of shift-left controls, runtime monitoring catches what static analysis cannot: zero-day exploits, supply chain attacks in base images, and insider threats. The Falco and Tetragon rules I showed earlier form this layer, with alerts routed to a SIEM (I typically use Datadog or Elastic Security) for correlation and incident response.
Putting It All Together: The Terraform Module Structure
I package all of these controls into a single Terraform module that can be applied to any EKS cluster. The module structure looks like this:
module "eks_security_baseline" {
source = "git::https://github.com/internal/terraform-eks-security.git?ref=v3.2.1"
cluster_name = module.eks.cluster_name
cluster_endpoint = module.eks.cluster_endpoint
oidc_provider_arn = module.eks.oidc_provider_arn
oidc_provider = module.eks.oidc_provider
# Cilium configuration
enable_cilium = true
cilium_version = "1.15.3"
cilium_encryption = "wireguard"
cilium_default_deny = true
cilium_hubble_enabled = true
# Admission control
enable_kyverno = true
kyverno_version = "3.1.4"
kyverno_validation_action = "Enforce"
enable_gatekeeper = true
gatekeeper_version = "3.15.1"
# Runtime security
enable_falco = true
falco_version = "0.37.1"
falco_driver = "modern_ebpf"
enable_tetragon = true
tetragon_version = "1.0.2"
# Monitoring integration
alert_webhook_url = var.slack_security_webhook
siem_endpoint = var.datadog_log_endpoint
# IRSA defaults
irsa_permission_boundary = aws_iam_policy.workload_boundary.arn
irsa_token_expiration = 3600 # 1 hour, not the webhook default of 24
tags = var.common_tags
}

Notice irsa_token_expiration = 3600. The default IRSA token lifetime set by the pod identity webhook is 24 hours, which is far too long. If an attacker exfiltrates a projected service account token, they have a 24-hour window to use it from outside the cluster. I reduce this to 1 hour for all workloads and 15 minutes for highly sensitive ones. The AWS SDK handles token refresh transparently, so there's no application impact.
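Presumably a module like this implements the setting via the pod identity webhook's token-expiration annotation on each ServiceAccount; applied by hand, the same control looks like this (the name, namespace, and role ARN are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-processor
  namespace: payments
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/prod-payment-processor
    # Shorten the projected token lifetime from the webhook default
    eks.amazonaws.com/token-expiration: "3600"
```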
Real-World Incident: Why This Matters
Let me share a sanitized version of an incident that validated this architecture. During a penetration test on a financial services client's EKS cluster, the red team exploited a Server-Side Request Forgery (SSRF) vulnerability in a web application to reach the EC2 Instance Metadata Service (IMDS). In a cluster without IRSA, this would have given them the node's IAM role, which typically has permissions to pull ECR images, write CloudWatch logs, and interact with the EKS API. From there, lateral movement is trivial.
In this case, the layered defense worked exactly as designed:
- IRSA meant the node's instance profile had minimal permissions: no S3, no Secrets Manager, no ability to assume other roles. The IMDS credentials were nearly useless.
- Cilium network policy blocked the pod's connection to 169.254.169.254 entirely (we allow IMDS access only from specific system pods that need it). The SSRF attempt never reached IMDS.
- Falco detected the attempted connection to the metadata service from a web process (using the rule I showed above) and fired a CRITICAL alert within 800ms.
- Tetragon logged the full process tree, giving the incident response team a complete timeline: which process initiated the connection, what command-line arguments were used, and the parent process chain back to the container entrypoint.
The total time from exploit attempt to SOC alert was under 2 seconds. The red team's report noted this was the most effective container security architecture they'd encountered across their client base.
Operational Considerations
A few things I've learned the hard way that aren't in any documentation:
Falco: Use the Modern eBPF Driver
Falco's modern eBPF driver is worth the kernel version requirement. The older kernel module driver caused node instability under high syscall volumes in our load tests. Since moving to the modern eBPF driver (requires kernel 5.8+, which all current EKS AMIs support), we've had zero Falco-related node issues. If you're still on the kernel module driver, migrate immediately.
Kyverno: Audit Mode Before Enforce
Kyverno's Audit mode is your friend during rollout. Never deploy Kyverno policies in Enforce mode to an existing cluster without first running in Audit for at least two weeks. I've seen well-intentioned policy rollouts take down production because a critical system pod (usually something in kube-system) violated the new policy. Use Kyverno's policy reports to identify violations, fix them, then switch to Enforce.
Cilium: Plan Default-Deny Rollout
Cilium's policyEnforcementMode: always requires careful planning. When you enable default-deny, every pod that doesn't have an explicit CiliumNetworkPolicy will lose all network connectivity. I handle this by deploying network policies as part of the same Helm chart or Kustomize overlay as the application, so the policy and the workload arrive together. For existing clusters, start with policyEnforcementMode: default and add policies namespace by namespace.
Admission Webhook Latency at Scale
Monitor your admission webhook latency. Both Kyverno and Gatekeeper add latency to every API server request. In clusters with high pod churn (1000+ pod creates per minute during scaling events), I've seen webhook latency spike to 500ms+, which cascades into Deployment rollout timeouts. Set resource requests/limits appropriately, run multiple replicas, and monitor the kyverno_admission_review_duration_seconds metric closely.
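If you run the Prometheus Operator, a starting-point alert on that metric might look like the rule below. The 500ms threshold and label values are illustrative; tune them against your own churn profile:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: admission-webhook-latency
  namespace: monitoring
spec:
  groups:
    - name: admission-control
      rules:
        - alert: KyvernoAdmissionLatencyHigh
          expr: |
            histogram_quantile(0.99,
              sum(rate(kyverno_admission_review_duration_seconds_bucket[5m])) by (le)
            ) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "p99 Kyverno admission review latency above 500ms for 10m"
```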
What's Next: The Architecture Is Never Done
The patterns I've described here represent the current state of production-hardened EKS security, but the landscape is evolving rapidly. I'm actively evaluating Cilium's service mesh mode as a replacement for Istio (fewer moving parts, better performance), exploring Tetragon's new runtime enforcement policies for file integrity monitoring, and working on integrating Sigstore/cosign image verification into the admission pipeline so that only signed, attested images can run in production.
If you're starting from scratch, my recommendation is: IRSA first, Kyverno second, Cilium third, Falco fourth. Each layer is independently valuable, and you can adopt them incrementally without disrupting existing workloads. The key is to start. A partial implementation of this architecture is infinitely better than the default EKS configuration, which provides almost no workload security out of the box.
The AWS EKS Best Practices Guide is an excellent companion to this post, and I'd encourage every EKS operator to treat the CIS Kubernetes Benchmark as a minimum baseline, not an aspirational target. If you're running production workloads on EKS and you haven't implemented at least IRSA and admission control, you have an urgent security gap that needs to be addressed now, not next quarter.