DevConf.cz 2021 has ended
Friday, February 19 • 12:15pm - 12:40pm
Let There Be Topology-Awareness in Kube-Scheduler!

Performance-critical workloads in industries like telco, HPC, and IoT require topology information in order to use co-located CPU cores and devices. Despite the success of Kubernetes Topology Manager, the current native scheduler does not select a node based on its topology. It's time to solve this problem.

We will introduce the audience to hardware topology, the current state of Topology Manager, gaps in the current scheduling process, and prior out-of-tree solutions. We'll explain the workarounds available right now: writing custom schedulers, creating scheduling extensions, using node selectors, or assigning resources manually or semi-automatically. All these methods have their drawbacks.

Finally, we will explain how we plan to improve the native scheduler to work with Topology Manager. Attendees will learn about both the current workarounds and the future of topology-aware scheduling in Kubernetes.

Kubernetes has taken the world by storm, attracting unconventional workloads such as HPC, edge, IoT, telco and communication service providers, 5G, AI/ML, and NFV solutions. This talk will benefit users, engineers, and cluster admins deploying performance-sensitive workloads on Kubernetes. Adding newer nodes alongside older ones in data centers results in hardware heterogeneity. To save physical space in the data centers, newer nodes are packed with more CPUs and enhanced hardware capabilities. Exposing fine-grained topology information for optimised workload placement would help service providers and VNF vendors too.
We'll explain the numerous challenges encountered in deploying workloads efficiently when the hardware topology of the underlying bare-metal infrastructure is not understood and cannot be used for scheduling.
The scheduler's lack of knowledge of resource topology can lead to unpredictable application performance, generally under-performance, and in the worst case a complete mismatch between resource requests and kubelet policies: a pod is scheduled onto a node where it is destined to fail, potentially entering a failure loop. Exposing cluster-level topology to the scheduler empowers it to make intelligent NUMA-aware placement decisions that optimize cluster-wide workload performance. This would benefit the Telco User Group in Kubernetes, Kubernetes itself, and the overall CNCF ecosystem, enabling improved application performance without impacting user experience.
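
To make the mismatch concrete, here is a minimal toy sketch (not the Kubernetes scheduler API and not the speakers' implementation). It assumes a kubelet policy that requires all of a pod's resources to come from one NUMA zone, in the spirit of Topology Manager's `single-numa-node` policy. The `Node` class, the two filter functions, the node names, and the `example.com/nic` resource are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Resource name -> quantity, e.g. {"cpu": 4}. Hypothetical simplification.
Resources = Dict[str, int]


@dataclass
class Node:
    name: str
    # Free resources per NUMA zone. Illustrative data only; a real cluster
    # would need some component to report this to the scheduler.
    numa_zones: List[Resources]

    def total_free(self) -> Resources:
        """Aggregate view: roughly what a topology-unaware scheduler sees."""
        total: Resources = {}
        for zone in self.numa_zones:
            for res, qty in zone.items():
                total[res] = total.get(res, 0) + qty
        return total


def fits(request: Resources, free: Resources) -> bool:
    """True if every requested quantity is available in `free`."""
    return all(free.get(res, 0) >= qty for res, qty in request.items())


def aggregate_filter(node: Node, request: Resources) -> bool:
    """Topology-unaware check: compare against node-wide totals only."""
    return fits(request, node.total_free())


def numa_aware_filter(node: Node, request: Resources) -> bool:
    """Topology-aware check: some single NUMA zone must satisfy the request."""
    return any(fits(request, zone) for zone in node.numa_zones)


# node-a has enough CPUs and a NIC overall, but never in the same zone.
node_a = Node("node-a", [{"cpu": 2, "example.com/nic": 1},
                         {"cpu": 6, "example.com/nic": 0}])
# node-b can serve the whole request from its first zone.
node_b = Node("node-b", [{"cpu": 4, "example.com/nic": 1},
                         {"cpu": 4, "example.com/nic": 0}])

request = {"cpu": 4, "example.com/nic": 1}

unaware = [n.name for n in (node_a, node_b) if aggregate_filter(n, request)]
aware = [n.name for n in (node_a, node_b) if numa_aware_filter(n, request)]
print(unaware)  # ['node-a', 'node-b']
print(aware)    # ['node-b']
```

In this toy model the topology-unaware check accepts node-a, where the assumed single-zone policy would then reject the pod at admission; the topology-aware check filters node-a out before the pod ever lands there.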

Swati Sehgal

Principal Software Engineer, Red Hat
Swati Sehgal is a Principal Software Engineer in the Ecosystem Engineering Group at Red Hat. She works to enhance OpenShift and its platform to deliver best-in-class networking applications, leading edge solutions and innovative enhancements across the stack. Her work includes working...

Friday February 19, 2021 12:15pm - 12:40pm CET
Session Room 5