Scheduler Activations

| Kernel-level threads | User-level threads |
| --- | --- |
| OS is aware of them and manages them | Purpose: express concurrency in applications |
| Purpose: leverage physical parallelism | Managed by user-level thread libraries |
| Misconception: they run only in kernel space. In fact they execute user code in user space; the kernel just manages them | Example: pthreads (exit, yield, etc.) |
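
To make the kernel-level side concrete, a minimal sketch using Python's `threading` module (in CPython each `Thread` is backed by a kernel thread that the OS schedules, yet all of the code it runs executes in user space):

```python
import threading

# In CPython, each threading.Thread maps 1:1 to a kernel thread:
# the OS is aware of it and schedules it, but the function it runs
# executes in user space. The kernel just manages the thread.
counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:          # user-space code, kernel-managed thread
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                # the kernel tells us when each thread exits
print(counter)  # 4000
```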

User vs Kernel Thread

(Figure: user vs. kernel view of threads. State for a user-level thread lives in user space; state for a kernel thread lives in the kernel. When a thread calls read() and blocks in the kernel, the user-level library is unaware of the block.)

Threading Models

  • M:N threading - multiplex M user-level threads over N kernel threads
    • where M = user-level threads
    • where N = kernel threads
    • typically M >> N
  • M:M threading - map each user-level thread directly to a kernel thread
  • M:1 threading - one kernel thread multiplexes all user-level threads, so the app runs on only one core (not used much anymore)
    • no physical parallelism
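
The M:1 model can be sketched with generators standing in for user-level threads: many "threads" multiplexed onto the single kernel thread running the interpreter, with `yield` playing the role of a user-level yield(). The names here are ours, not a real thread library:

```python
import collections

# Toy M:1 user-level "thread library": a round-robin scheduler over
# generators. Switching threads is just a yield; the kernel never sees it.
def run(tasks):
    ready = collections.deque(tasks)   # the user-level runqueue
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run the thread until it yields
            ready.append(task)         # round-robin: back of the queue
        except StopIteration:
            pass                       # user-level thread exited
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(run([worker("A", 2), worker("B", 2)]))
# ['A:0', 'B:0', 'A:1', 'B:1']
```

Note what's missing: if any generator made a blocking syscall, the whole process would stall, which is exactly the M:1 "no kernel awareness" problem.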

Tradeoffs between user-level and kernel-level threads

| User-level threads | Kernel-level threads |
| --- | --- |
| Lower overhead | Higher overhead |
| Kernel events are hidden from the user level: threads can block in the kernel in many ways (I/O, page faults) and the library doesn't see it | OS awareness of blocking; OS can preempt kernel-level threads |
| Customizable, specialized | General purpose |
| | Hardware parallelism |
| | More isolation |

Scheduler Activations

(Figure: scheduler activations. The kernel decides how to allocate cores (C1, C2, C3); the user level decides how to schedule threads among its allocated cores via scheduler activations (SA1, SA2, SA3). When SA1 blocks in the kernel on a read(), the kernel creates a new activation SA4 and makes an upcall into the user-level scheduler: "SA1 blocked, use SA4 instead". On the upcall, the user scheduler sees that U1 and U2 are runnable and schedules them on the available activations.)

  • OS just allocates cores
  • User-level schedules threads across cores
  • This is called 2-level scheduling
  • invariant: # of active SAs = # of cores
  • OS adds a core to an app: create a new sched. activation
  • OS revokes a core:
    • the kernel preempts two SAs and uses one of them to make the upcall that notifies the user-level scheduler
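
The upcall protocol above can be simulated in a few lines. This is a toy sketch; the class and method names are our invention, not the real scheduler-activations interface. The kernel's only decisions are which cores the app holds and when to upcall; the user-level scheduler decides what runs on each activation:

```python
# Toy simulation of scheduler activations (all names are ours, not a real API).
class UserScheduler:
    def __init__(self, threads):
        self.runnable = list(threads)
        self.activations = {}            # SA id -> user thread running on it

    # --- upcalls: the kernel notifies the user level of its decisions ---
    def upcall_new_activation(self, sa):
        # kernel granted a core: schedule a runnable thread on the new SA
        if self.runnable:
            self.activations[sa] = self.runnable.pop(0)

    def upcall_blocked(self, sa, new_sa):
        # an SA blocked in the kernel (e.g. read()); the kernel created a
        # fresh SA so the user level keeps its full allocation of cores
        del self.activations[sa]
        self.upcall_new_activation(new_sa)

sched = UserScheduler(["U1", "U2", "U3"])
sched.upcall_new_activation("SA1")   # kernel adds a core
sched.upcall_new_activation("SA2")   # and another
sched.upcall_blocked("SA1", "SA3")   # SA1 blocks; "use SA3 instead"
print(sched.activations)             # {'SA2': 'U2', 'SA3': 'U3'}
```

The invariant from the notes holds throughout: the number of active SAs equals the number of cores the kernel has granted.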

In Go

  • the runtime tries to detect when a kernel thread is blocked (e.g. in a syscall) and then wakes up or creates another kernel-level thread to keep goroutines running
  • this is not scheduler activations: there are no kernel upcalls, just detection inside the runtime

Summary of Scheduler Activations

  • Pros/Cons of user-level and kernel-level threads
  • Use of upcalls to notify the user level whenever scheduling decisions have been made

Decade of Wasted Cores

  • Linus Torvalds, what do you know?
    • his temper
    • his response to commits
    • ego
    • Git, Linux
    • advocate for monolithic kernels
    • advocate for open source software

Scheduler Goals

  • Fair allocation of compute - policy
  • Prioritize important tasks - policy
  • Work conserving - no core is idle if there are runnable threads
  • Performance - cache locality
  • Efficiency - power usage, locality
  • Scaling to many cores
  • General purpose

Single-Core Scheduling

(Figure: single-core scheduling. Threads wait on a runqueue in FIFO order; CFS, the Completely Fair Scheduler, decides which thread runs next.)

CFS: Completely Fair Scheduler

  • Preemptive - timeslice (e.g. a few ms)
  • Weighted fairness - each task accumulates vruntime (virtual runtime) scaled by its weight; the task with the smallest vruntime runs next
  • Implemented using a red-black tree ordered by vruntime (the leftmost node is next to run)
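
The vruntime mechanism can be sketched in a few lines. This is a simplification (we assume each run charges timeslice/weight); Linux keeps tasks in a red-black tree keyed by vruntime, and a heap gives the same pick-the-leftmost behavior for this sketch:

```python
import heapq

# CFS-style sketch: always run the task with the smallest vruntime, and
# charge it timeslice/weight, so heavier tasks accumulate vruntime slower
# and therefore get picked more often.
def cfs(tasks, timeslice, rounds):
    # tasks: {name: weight}; all vruntimes start at 0
    tree = [(0.0, name) for name in tasks]
    heapq.heapify(tree)                        # stand-in for the RB tree
    schedule = []
    for _ in range(rounds):
        vruntime, name = heapq.heappop(tree)   # leftmost = min vruntime
        schedule.append(name)
        vruntime += timeslice / tasks[name]    # weighted charge
        heapq.heappush(tree, (vruntime, name))
    return schedule

# A task with weight 2 runs about twice as often as one with weight 1:
print(cfs({"hi": 2, "lo": 1}, timeslice=1.0, rounds=6))
```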

Multi-Core Scheduling

  • one runqueue per core
  • (Figure: multi-core scheduling - each core has its own runqueue.)
  • Why one runqueue per core?
    • avoids contention on a single shared queue
      • so the runqueue isn’t the bottleneck

Challenges

  • load imbalance
    • expensive to remedy
  • complex cache hierarchies
  • hyperthreads (SMT)

    (Figure: cache hierarchy. Hyperthreads (SMT) share a physical core; each physical core has its own L1 and L2 caches; all cores on a socket (NUMA node) share the L3 cache / LLC; other sockets are further away.)

Load Balancing in Linux

  • periodic load balancing
  • “emergency” load balancing - when a core goes idle, it tries to steal work from busier runqueues
  • on thread creation/wakeup
    • try to place the thread on an idle core
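
The "emergency" (idle-core) balancing idea can be sketched with one deque per core; the function name and structure here are ours, not the Linux code path:

```python
from collections import deque

# Work-conserving sketch: an idle core steals from the busiest runqueue
# rather than sitting idle while runnable threads wait elsewhere.
def steal_if_idle(runqueues, core):
    # runqueues: one deque per core, as in Linux's per-core runqueues
    if runqueues[core]:
        return runqueues[core].popleft()        # local work: no stealing
    victim = max(range(len(runqueues)), key=lambda c: len(runqueues[c]))
    if runqueues[victim]:
        return runqueues[victim].pop()          # steal from the busiest core
    return None                                 # nothing runnable anywhere

rqs = [deque(), deque(["t1", "t2", "t3"]), deque(["t4"])]
stolen = steal_if_idle(rqs, 0)
print(stolen)  # t3  (core 0 was idle, so it stole from core 1)
```

Stealing keeps the scheduler work-conserving, at the cost of moving a thread away from its cached state, which is exactly the tension the hierarchical balancing tries to manage.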

Metrics

  • In Linux:
    • number of threads
    • sum of weights (priorities)
    • load - combines weights with average CPU utilization
    • groups - balancing compares scheduling groups of cores, not individual cores

Load-Balancing Algorithm

  • Idea: if we didn’t care about cache locality, we would want to perfectly balance cores, but we do care!
    • hierarchical approach
      • 1 core per group balances the group’s load
      • avoids needing to move threads across sockets
      • (Figure: load balancing works bottom-up through the hierarchy - hyperthreads within a core, then cores within a socket, then across sockets.)

Group Imbalance Bug

  • 8 NUMA nodes, 8 cores each
  • 64 compilation threads, each with load 1
  • 2 R threads, each with load 64
  • Each R thread ran on its own node; its high load made that node’s average load look busy, so the 64 compile threads were balanced across the other nodes while the remaining 7 cores on each R node sat idle
    • FIX: compare groups by their minimum core load rather than their average
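
The fix can be seen with plain arithmetic on a toy version of the workload (the numbers below are illustrative):

```python
# Group-imbalance bug in miniature: a group's load was its *average* core
# load, so one heavyweight thread made a whole node look busy and the
# balancer refused to hand its idle cores any work.
node_a = [64, 0, 0, 0, 0, 0, 0, 0]   # one R thread of load 64, 7 idle cores
node_b = [1, 1, 1, 1, 1, 1, 1, 1]    # eight compile threads of load 1

avg = lambda cores: sum(cores) / len(cores)

# By average, node_a (8.0) looks busier than node_b (1.0), so the
# balancer never pushes threads toward node_a's idle cores:
print(avg(node_a), avg(node_b))      # 8.0 1.0

# By minimum core load, node_a (0) is clearly where work should go:
print(min(node_a), min(node_b))      # 0 1
```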

Overload on Wakeup Bug

  • thread wakeups
  • threads were woken onto the busy local core rather than onto another, idle core
  • fix: favor idle cores for wakeups
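
A sketch of the wakeup-placement fix (the helper is hypothetical, not the actual Linux code path):

```python
# On wakeup, prefer an idle core over the possibly-busy local core.
def place_on_wakeup(local_core, queue_lengths):
    # queue_lengths[c] = runnable threads already queued on core c
    if queue_lengths[local_core] == 0:
        return local_core                      # local core is idle: best case
    idle = [c for c, n in enumerate(queue_lengths) if n == 0]
    return idle[0] if idle else local_core     # else favor any idle core

print(place_on_wakeup(0, [3, 0, 2]))  # 1: core 0 is busy, core 1 is idle
```

Preferring the local core is a locality heuristic; the bug was applying it even when other cores sat completely idle, which violates work conservation.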

Summary of Wasted Cores

  • Design of scalable CPU scheduler with per-core runqueues
  • Approaches to balancing load
  • CPU scheduling is complex