Kubernetes Scenario-Based Interview Questions

Question 1: Your application running as a Deployment in Kubernetes suddenly starts experiencing a high number of 503 errors. You check the pod status and see that all pods are running and healthy. How would you begin to diagnose the problem?

Expected Answer:

I would start by investigating the application logs within the pods. Even though the pods are reported as healthy by Kubernetes, the application itself might be experiencing issues. I would use kubectl logs <pod-name> to check for application-level errors, exceptions, or unusual behavior. I would also check the resource consumption (CPU and memory) of the pods with kubectl top pod to see if the application is being overloaded, even if it hasn’t reached the point of crashing or failing its probes. Next, I would examine the Service definition and any associated Ingress or LoadBalancer configuration to ensure traffic is being routed correctly to the pods; a 503 from an ingress proxy or load balancer often means the Service has no ready endpoints, which kubectl get endpoints <service-name> will reveal (see the sketch below). Finally, I’d check the network policies to ensure they aren’t inadvertently blocking traffic.
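
A 503 from an ingress proxy or cloud load balancer frequently traces back to a Service/pod mismatch. A minimal sketch of the Service fields worth double-checking (names and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # must match the Deployment's pod template labels exactly
  ports:
    - port: 80
      targetPort: 8080 # must match the port the container actually listens on
```

If kubectl get endpoints my-app returns an empty list, the selector matches no ready pods, which would explain the 503s.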

Skill/Concept Tested: Application debugging in Kubernetes, log analysis, resource monitoring, understanding of Services and Ingress/LoadBalancers, Network Policies.

Question 2: You have a stateful application that requires persistent storage. You’ve deployed it as a StatefulSet with a PersistentVolumeClaim (PVC). One of the underlying nodes hosting the persistent volume fails. What happens to your application and its data? How does Kubernetes handle this?

Expected Answer:

When a node hosting a persistent volume used by a StatefulSet fails, Kubernetes will attempt to reschedule the affected pod to a healthy node. However, the PersistentVolume (PV) is bound to a specific availability zone or node (depending on the storage provisioner). Kubernetes will need to ensure that the PV is accessible from the new node. The behavior depends on the storage provisioner and how the PV was provisioned. For example, with cloud-based storage like EBS or GCE Persistent Disks, the volume can typically be detached from the failed instance and re-attached to the new instance where the pod is being scheduled. This ensures data persistence. The StatefulSet controller will wait for the PV to be available on the new node before starting the pod. There might be a short period of downtime while the volume is being re-attached.
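
A minimal StatefulSet sketch with a volumeClaimTemplate; the image and storage class are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db             # headless Service giving each pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # one PVC per replica; re-bound to the pod after rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3 # hypothetical cloud storage class (e.g., EBS-backed)
        resources:
          requests:
            storage: 10Gi
```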

Skill/Concept Tested: Understanding of StatefulSets, Persistent Volumes (PVs), PersistentVolumeClaims (PVCs), data persistence in Kubernetes, node failures, and storage provisioner behavior.

Question 3: You want to expose your application running in a Deployment to the outside world. What are the different ways to achieve this in Kubernetes, and what are the trade-offs of each?

Expected Answer:

There are primarily three ways to expose an application; minimal manifests for these options follow the list:

  • Service of type NodePort: This exposes the service on each Node’s IP at a static port (default range: 30000-32767). External traffic can access the service using <NodeIP>:<NodePort>.
    • Trade-offs: Simple to set up, but port management can be cumbersome, and you need an external load balancer or other mechanism to route traffic to the correct Node IPs for high availability.
  • Service of type LoadBalancer: This provisions a cloud provider’s load balancer that automatically routes external traffic to the pods of the service. The cloud provider manages the load balancer.
    • Trade-offs: Easy to use in cloud environments, provides high availability and scalability for external access. However, it incurs costs associated with the cloud provider’s load balancer. It’s also cloud-specific.
  • Ingress: This provides HTTP and HTTPS routing to services based on rules defined in Ingress resources. It typically uses an Ingress controller (like Nginx or Traefik) to implement the routing.
    • Trade-offs: Provides flexible routing rules (path-based, host-based), TLS termination, and can manage multiple services with a single IP address. Requires an Ingress controller to be deployed and configured. Can be more complex to set up initially.
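
As a rough sketch of the second and third options (names, host, and ingress class are hypothetical), a LoadBalancer Service and an Ingress routing to it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer          # change to NodePort for the NodePort variant
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx     # assumes an NGINX Ingress controller is installed
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```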

Skill/Concept Tested: Understanding of Kubernetes Services (NodePort, LoadBalancer), Ingress, and the trade-offs between different external access methods.

Question 4: Your application needs to securely access secrets, such as API keys and passwords. How would you manage and provide these secrets to your Kubernetes pods?

Expected Answer:

The recommended way to manage secrets in Kubernetes is the built-in Secret object. Secrets can be created with kubectl create secret or defined in YAML files. Note that Secret values are only base64 encoded, not encrypted, so access should be restricted via RBAC and encryption at rest enabled where possible. To consume secrets in a pod, you can mount them as files in a volume or expose them as environment variables, as sketched below. For more advanced secret management, especially in production environments, I would consider external secret management solutions like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, integrated with Kubernetes through tools or operators (e.g., the Vault agent injector or the AWS Secrets and Configuration Provider). These solutions offer stronger security features such as encryption at rest, fine-grained access control, and secret rotation.
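
A minimal sketch (names and values are placeholders) of a Secret consumed both as an environment variable and as a mounted file:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials
type: Opaque
stringData:                   # stringData avoids hand-encoding base64
  api-key: replace-me
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: api-credentials
              key: api-key
      volumeMounts:
        - name: creds
          mountPath: /etc/creds   # api-key appears as the file /etc/creds/api-key
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: api-credentials
```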

Skill/Concept Tested: Kubernetes Secrets, secret management best practices, understanding of mounting volumes and environment variables, awareness of external secret management solutions.

Question 5: You want to ensure that a critical application always has at least three replicas running. How would you configure this in Kubernetes?

Expected Answer:

To ensure at least three replicas of an application are always running, I would use a Deployment object. In the Deployment specification, I would set the replicas field to 3. The Deployment controller will then ensure that three healthy replicas are always available. If any pod fails or the node it’s running on goes down, the Deployment controller will automatically create new pods to maintain the desired replica count.
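
A minimal Deployment sketch (the image is a placeholder):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 3                 # the controller continuously reconciles toward three pods
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      containers:
        - name: app
          image: nginx:1.25
```

For voluntary disruptions such as node drains, a PodDisruptionBudget with minAvailable: 3 complements this by blocking evictions that would drop below three ready pods (which also means replicas must exceed three for drains to proceed).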

Skill/Concept Tested: Understanding of Kubernetes Deployments and the replicas field, self-healing capabilities of Deployments.

Question 6: You need to update your application to a new version with minimal downtime. How would you perform a rolling update using Kubernetes?

Expected Answer:

Kubernetes Deployments provide built-in support for rolling updates. When I update the spec.template.spec.containers[*].image field of a Deployment, Kubernetes performs a rolling update by gradually replacing the old pods with new ones. Two parameters control the pace: maxUnavailable caps how many pods may be unavailable (below the desired count) during the update, and maxSurge caps how many extra pods may be created above the desired count. I can configure both in the Deployment’s spec.strategy.rollingUpdate section, as sketched below. Kubernetes handles the creation of new pods and the termination of old ones in a controlled manner, ensuring minimal disruption to the application.
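
The relevant fragment of the Deployment manifest, with illustrative values:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # at most one pod below the desired replica count
      maxSurge: 1             # at most one extra pod above the desired replica count
```

kubectl rollout status deployment/<name> tracks the update, and kubectl rollout undo deployment/<name> reverts it if the new version misbehaves.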

Skill/Concept Tested: Understanding of Deployment update strategies, specifically rolling updates, maxUnavailable, and maxSurge.

Question 7: Your application is experiencing intermittent network connectivity issues. How would you troubleshoot network problems within your Kubernetes cluster?

Expected Answer:

Troubleshooting network issues in Kubernetes involves several steps:

  1. Check Pod Networking: Verify that pods within the same namespace and across namespaces can communicate with each other. Use kubectl exec <pod-name> -- ping <another-pod-ip> or kubectl exec <pod-name> -- curl <another-pod-ip>:<port>.
  2. Inspect Service Endpoints: Ensure that the Service has healthy endpoints (pods selected by the Service’s selector). Use kubectl get endpoints <service-name>.
  3. Examine Network Policies: Verify that Network Policies are not inadvertently blocking traffic between pods or namespaces. Use kubectl get networkpolicy and describe specific policies (a sample policy follows this list).
  4. Check DNS Resolution: Ensure that pods can resolve internal Kubernetes DNS names (e.g., <service-name>.<namespace>.svc.cluster.local). Use kubectl exec <pod-name> -- nslookup <service-name>.
  5. Investigate CNI Plugin: The Container Network Interface (CNI) plugin is responsible for pod networking. Check the logs of the CNI plugin’s pods (usually in the kube-system namespace) for any errors.
  6. Node-Level Issues: Check the network configuration on the underlying nodes, including firewall rules (e.g., iptables, firewalld).
  7. Ingress/LoadBalancer Issues: If the issue involves external access, investigate the Ingress controller logs or the cloud provider’s LoadBalancer health checks and configuration.
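
For step 3, a sample NetworkPolicy worth reviewing (labels and port are hypothetical); once any policy selects a pod, traffic not matched by a from clause is denied:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend            # the pods this policy protects
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
```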

Skill/Concept Tested: Kubernetes networking fundamentals, Services, Endpoints, Network Policies, DNS resolution, understanding of CNI plugins, basic Linux networking commands.

Question 8: You want to limit the resources (CPU and memory) that a particular set of pods can consume. How would you achieve this in Kubernetes? What are the differences between requests and limits?

Expected Answer:

I would use Resource Requests and Limits in the pod specification.

  • Requests: Define the minimum amount of CPU and memory that Kubernetes should reserve for the pod. Kubernetes will attempt to schedule the pod on a node that has at least these resources available.
  • Limits: Define the maximum amount of CPU and memory that a pod is allowed to consume. If a pod tries to exceed its memory limit, it might be OOMKilled (Out Of Memory Killed). If a pod exceeds its CPU limit, it will be throttled, potentially impacting performance.

It’s important to set appropriate requests to ensure pods are scheduled on suitable nodes and to set reasonable limits to prevent a single pod from consuming all available resources on a node, which could impact other pods.
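
A container-level fragment of the pod spec, with illustrative values:

```yaml
containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        cpu: 250m             # used by the scheduler to pick a node
        memory: 256Mi
      limits:
        cpu: 500m             # CPU beyond this is throttled
        memory: 512Mi         # memory beyond this triggers an OOMKill
```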

Skill/Concept Tested: Resource management in Kubernetes, understanding of Resource Requests and Limits.

Question 9: You have a multi-tenant Kubernetes cluster, and you want to isolate the resources and access control for different teams or applications. How would you approach this?

Expected Answer:

I would use Namespaces to provide logical isolation. Each team or application can have its own namespace. Within each namespace, I would implement the following:

  • Resource Quotas: To limit the total amount of resources (CPU, memory, storage, etc.) that can be consumed within a namespace.
  • Network Policies: To control network traffic between pods within the same namespace and across different namespaces.
  • RBAC (Role-Based Access Control): To define fine-grained permissions for users and service accounts within each namespace, allowing different teams to manage their own resources without affecting others.
  • Limit Ranges: To set default resource requests and limits for pods within a namespace.

By combining these mechanisms, I can create a more secure and isolated multi-tenant environment.
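
For example, a ResourceQuota for a hypothetical team-a namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"        # sum of CPU requests across all pods in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```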

Skill/Concept Tested: Multi-tenancy in Kubernetes, Namespaces, Resource Quotas, Network Policies, RBAC, Limit Ranges.

Question 10: You are monitoring your Kubernetes cluster and notice that a particular node is consistently under high CPU load. How would you investigate the cause?

Expected Answer:

To investigate a node with high CPU load:

  1. Identify Running Pods: Use kubectl get pods --all-namespaces -o wide | grep <node-name> to list all pods running on the problematic node.
  2. Check Pod Resource Usage: Use kubectl top pod --all-namespaces --sort-by=cpu to see which pods consume the most CPU, and cross-reference against the list of pods on that node from step 1.
  3. Analyze Pod Logs: For the high-CPU-consuming pods, use kubectl logs <pod-name> -n <namespace> to check for any application-level issues, errors, or unusual activity that might be causing high CPU usage.
  4. Exec into Pods: If necessary, use kubectl exec -it <pod-name> -n <namespace> -- top or other profiling tools within the container to get a more detailed view of the processes consuming CPU.
  5. Check Node-Level Processes: SSH into the node (if possible) and use tools like top, htop, or vmstat to identify any non-containerized processes contributing to the high CPU load.
  6. Review Kubernetes System Pods: Check the resource usage and logs of Kubernetes system pods running on the node (e.g., kubelet, kube-proxy) in the kube-system namespace.

Skill/Concept Tested: Node monitoring and troubleshooting, identifying resource-intensive pods, using kubectl top, log analysis, basic Linux system monitoring tools.

Question 11: You need to automatically scale your application horizontally based on the CPU utilization. How would you configure this in Kubernetes?

Expected Answer:

I would use the Horizontal Pod Autoscaler (HPA). I would define an HPA object that targets my Deployment and specifies the target CPU utilization percentage (e.g., 70%). The HPA controller periodically queries the resource metrics API (provided by metrics-server) to get the current CPU utilization of the pods in the Deployment. Based on the target and current utilization, the HPA automatically adjusts the number of replicas in the Deployment to maintain the desired average CPU usage. I would need to ensure that metrics-server is deployed and functioning correctly in my cluster for the HPA to work; a sketch follows.
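
A minimal HPA sketch targeting a hypothetical my-app Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # keep average CPU near 70% of the pods' requests
```

Note that utilization is measured relative to the pods’ CPU requests, so those must be set on the Deployment for the HPA to function.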

Skill/Concept Tested: Horizontal Pod Autoscaler (HPA), understanding of metrics-server, autoscaling based on CPU utilization.

Question 12: You want to run a batch job in your Kubernetes cluster that should only run once and then terminate. What type of Kubernetes object would you use?

Expected Answer:

For a one-off batch job, I would use a Job object. A Job creates one or more pods and ensures that a specified number of them successfully complete. Once the specified number of successful completions is reached, the Job controller marks the Job as completed, and the pods are typically not restarted.
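
A minimal Job sketch (image and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-task
spec:
  backoffLimit: 3             # retry failed pods up to three times
  template:
    spec:
      restartPolicy: Never    # Job pods must use Never or OnFailure
      containers:
        - name: task
          image: busybox:1.36
          command: ["sh", "-c", "echo processing && sleep 5"]
```

For recurring work, a CronJob wraps the same template in a schedule.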

Skill/Concept Tested: Understanding of Kubernetes Jobs for batch processing.

Question 13: You have a set of configuration files that your application needs to access. How would you manage these configuration files in Kubernetes?

Expected Answer:

I would primarily use ConfigMap objects to manage configuration files. ConfigMaps allow you to store configuration data as key-value pairs or as entire files. Pods can then consume ConfigMaps as environment variables, command-line arguments, or as files mounted in a volume. For sensitive configuration data, I would use Secrets instead of ConfigMaps. For more complex scenarios or versioned configurations, I might consider using externalized configuration management tools integrated with Kubernetes.
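
A sketch of a ConfigMap mounted as a file (names and contents are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    log.level=INFO
    cache.size=128
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: config
          mountPath: /etc/app # app.properties appears as /etc/app/app.properties
  volumes:
    - name: config
      configMap:
        name: app-config
```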

Skill/Concept Tested: Kubernetes ConfigMaps, understanding of how pods consume ConfigMaps (environment variables, volumes), distinction between ConfigMaps and Secrets.

Question 14: Your application relies on another service within the same Kubernetes cluster. How would your application discover and communicate with this other service?

Expected Answer:

Within a Kubernetes cluster, services are discovered using DNS. Kubernetes automatically assigns a DNS name to each Service in the format <service-name>.<namespace>.svc.cluster.local (and the plain <service-name> resolves within the same namespace). My application can use this DNS name to resolve the Service’s cluster IP, which load-balances across the pods backing the Service. Kubernetes also injects environment variables with service information into pods, but only for Services that already existed when the pod was created, so DNS is the preferred mechanism. My application can then use standard networking libraries to communicate with the resolved address and port of the other service.

Skill/Concept Tested: Service discovery in Kubernetes using DNS, understanding of the <service-name>.<namespace>.svc.cluster.local format.

Question 15: You are deploying a new version of your application, and you want to test it with a small percentage of the incoming traffic before fully rolling it out. How can you achieve this in Kubernetes?

Expected Answer:

This can be achieved using various techniques, often involving Ingress controllers or Service Mesh solutions:

  • Canary Deployments with Ingress: You can deploy the new version of your application alongside the old version with a small number of replicas. Then, configure your Ingress rules to route a small percentage of traffic (e.g., based on headers or weights) to the new version’s service, while the majority of traffic goes to the old version’s service (see the sketch after this list).
  • Service Mesh (e.g., Istio, Linkerd): Service meshes provide more advanced traffic management capabilities, including weighted traffic splitting. You can configure the service mesh to send a percentage of traffic to the canary deployment.
  • Blue/Green Deployments: Although not exactly canary, a blue/green deployment involves deploying the new version alongside the old one and then switching all traffic at once after testing. This can be combined with canary testing on the “blue” (new) environment before the full switch.
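
For the Ingress-based approach, a sketch assuming the NGINX Ingress controller (host and service names are hypothetical); this canary Ingress sits alongside the main one for the same host:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # route ~10% of traffic here
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-v2                       # Service for the canary Deployment
                port:
                  number: 80
```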

Skill/Concept Tested: Canary deployments, traffic splitting, understanding of Ingress controllers and Service Meshes.

Question 16: You need to ensure that a specific pod is always running on a node with a particular label (e.g., disktype=ssd). How would you achieve this?

Expected Answer:

I would use Node Selectors or Node Affinity in the pod specification.

  • Node Selector: This is a simple way to constrain pods to nodes with specific labels. In the pod’s spec.nodeSelector field, I would add a key-value pair matching the desired label (e.g., disktype: ssd). Kubernetes will only schedule the pod on nodes that have this label.
  • Node Affinity: This provides more flexible and expressive ways to constrain pods to nodes. You can specify requiredDuringSchedulingIgnoredDuringExecution (the scheduler must place the pod on a matching node) or preferredDuringSchedulingIgnoredDuringExecution (the scheduler will try to place the pod on a matching node but will still schedule it elsewhere if no match is found). You can also use In, NotIn, Exists, DoesNotExist, Gt, and Lt operators for more complex label matching.

For this specific requirement (always running on a node with disktype=ssd), I would use requiredDuringSchedulingIgnoredDuringExecution node affinity or a node selector.
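
Both variants as pod-spec fragments:

```yaml
# Option 1: simple node selector
spec:
  nodeSelector:
    disktype: ssd
---
# Option 2: equivalent hard requirement expressed as node affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
```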

Skill/Concept Tested: Pod scheduling, Node Selectors, Node Affinity.

Question 17: You suspect that a container within a pod is crashing frequently. How would you investigate this?

Expected Answer:

  1. Check Pod Status: Use kubectl get pods to check the status of the pod. Look for states like CrashLoopBackOff or a high restart count.
  2. View Recent Events: Use kubectl describe pod <pod-name> to see recent events related to the pod, which might include information about container crashes and reasons.
  3. Examine Container Logs: Use kubectl logs <pod-name> -c <container-name> --previous to view the logs of the container from the previous instance (before the crash). This often provides clues about the cause of the crash.
  4. Check Resource Limits: Ensure that the container is not exceeding its defined resource limits (memory or CPU), which could lead to termination.
  5. Review Application Health Checks: If the container has liveness or readiness probes configured, check if these probes are failing, leading to restarts (a sample probe follows this list).
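
For step 5, a liveness probe sketch (path and port are assumptions); failureThreshold consecutive failures cause the kubelet to restart the container:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10     # too short a delay can itself cause a restart loop
  periodSeconds: 5
  failureThreshold: 3
```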

Skill/Concept Tested: Pod and container lifecycle, kubectl get pods, kubectl describe pod, kubectl logs, understanding of resource limits and health probes.

Question 18: You need to perform maintenance on one of your Kubernetes worker nodes (e.g., kernel upgrade). How would you safely take the node offline without impacting your running applications?

Expected Answer:

The recommended way to safely take a node offline is as follows:

  1. Cordon the Node: Use kubectl cordon <node-name> to mark the node as unschedulable. This prevents new pods from being scheduled on it.
  2. Drain the Node: Use kubectl drain <node-name> --ignore-daemonsets. This command gracefully evicts all pods running on the node except those managed by DaemonSets, which run on every node and cannot be rescheduled elsewhere. If any pods use emptyDir local storage, the --delete-emptydir-data flag (--delete-local-data on older kubectl versions) must be added explicitly, acknowledging that the local data will be lost. Evicted pods are rescheduled onto other healthy nodes in the cluster (assuming sufficient resources are available).
  3. Perform Maintenance: Once all pods (except DaemonSet pods) have been evicted, you can safely perform the required maintenance on the node.
  4. Uncordon the Node: After the maintenance is complete and the node is back online and healthy, use kubectl uncordon <node-name> to make it schedulable again. Kubernetes will then be able to schedule new pods onto this node.

Skill/Concept Tested: Node maintenance, understanding of kubectl cordon, kubectl drain, and kubectl uncordon, graceful pod eviction, DaemonSets.

Question 19: You want to implement basic authentication for accessing your application exposed via an Ingress. How would you configure this?

Expected Answer:

Basic authentication for an Ingress can be implemented in several ways, often relying on the Ingress controller’s capabilities:

  • Using Ingress Annotations: Many Ingress controllers (like Nginx) support annotations to configure basic authentication. This typically involves creating a secret containing the htpasswd file with username/password pairs and then referencing this secret in the Ingress annotation.
  • External Authentication Services: For more complex authentication requirements, you can integrate with external authentication services using mechanisms supported by your Ingress controller or a service mesh.
  • Application-Level Authentication: While not strictly an Ingress configuration, you could implement authentication logic directly within your application. However, for basic authentication, the Ingress controller level is often preferred for its simplicity and centralized control.

I would typically opt for using Ingress annotations with an htpasswd secret for basic HTTP authentication.
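
A sketch assuming the NGINX Ingress controller (host and names are hypothetical). The secret would be created first, e.g. htpasswd -c auth <user> followed by kubectl create secret generic basic-auth --from-file=auth:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth   # Secret holding the htpasswd file
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```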

Skill/Concept Tested: Ingress configuration, basic authentication, understanding of Ingress controller annotations, Kubernetes Secrets.

Question 20: You are using Helm to manage your application deployments. You need to upgrade your application to a new version defined in an updated Helm chart. How would you perform this upgrade?

Expected Answer:

To upgrade an application using Helm, I would use the helm upgrade command. This command takes the release name of the existing deployment and the path to the updated Helm chart (or the chart name and repository details). Helm then compares the new chart with the previously deployed release and applies the necessary changes to the Kubernetes resources. Before the actual upgrade, I would typically preview the changes with the helm-diff plugin (helm diff upgrade <release-name> <chart>) to understand the modifications that will be applied. I can also pass flags such as --values or --set to provide updated configuration values during the upgrade, and revert with helm rollback if the new release misbehaves.

Skill/Concept Tested: Helm package manager, performing application upgrades with helm upgrade, understanding of Helm charts and releases, using helm diff.
