  • Interfaces - exposed vs transparent
  • Centralization vs distribution
  • Implementation

Dist OS

[Drawing: diskless workstations and file servers connected by a local network]

Sprite

Technology trends are good for motivating system design and research problems, e.g., when one metric keeps scaling but another doesn't.

  • larger memories - increasing DRAM, more storage capacity
  • rise of networking - interconnected machines
    • workstations connected by a network
    • sharing
      • finding files (harder than on a single time-shared machine)
      • how to utilize idle machines
    • management/administration
    • transparency - users should not notice that they are running on a distributed platform
  • Multiprocessors
    • parallelism and sharing
    • OS multiprocessors
    • the designers were thinking about how to better support parallelism within the operating system itself

Features of Sprite

  • Distributed file system
    • utilize file caching
      • feasible thanks to larger memories
  • RPCs
  • Process migration:
    • utilizes idle machines
  • Single global namespace
    • helps with finding files
  • Subsystem + Monitors (RCU)
    • OS multiprocessor
  • Shared Fork
    • parallelism and sharing

Caching

Sprite caches file data on both the client and the server.

Sprite OS Caching

[Drawing: Clients A and B cache 4KB file blocks from the file server; client caches can become mutually inconsistent]

  • concurrent (write) sharing
    • disable caching; reads and writes for that file go through the server
  • sequential (write) sharing
    • version numbers - when a file changes, its version number is incremented, so a client can detect that its cached blocks are stale (see the sketch below)
  • System design principle: keep the common case fast and worry less about the uncommon cases
  • backing store - regular files
    • everything is represented as a file, which helps with caching because everything is the same type
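A minimal sketch of that version check in C, assuming hypothetical structure and field names; the real logic lives in the Sprite kernel's file open path:

```c
/* Detecting stale cached blocks via version numbers (hypothetical names). */
#include <stdint.h>
#include <stdbool.h>

struct cached_file {
    uint64_t file_id;
    uint64_t version;      /* version when our blocks were cached */
    bool     has_blocks;   /* any 4KB blocks cached locally? */
};

struct open_reply {
    uint64_t version;          /* server's current version of the file */
    bool     caching_disabled; /* set when concurrent write sharing seen */
};

void client_open(struct cached_file *f, const struct open_reply *r)
{
    if (f->has_blocks && f->version != r->version) {
        /* Sequential sharing: another client modified the file since we
           cached it, so our blocks are stale. Discard them and refetch
           from the server on demand. */
        f->has_blocks = false;
    }
    f->version = r->version;
    /* If r->caching_disabled, bypass the cache entirely: all reads and
       writes for this file go through the server. */
}
```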

Physical Memory Allocation

  • file buffer cache vs virtual memory
  • Sprite storage

    [Drawing: physical memory dynamically split between the file buffer cache and virtual memory, based on page age]
  • the split adjusts dynamically based on workload requirements (see the sketch below)
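A minimal sketch of the age-based split, assuming hypothetical pool structures; the real policy is inside the Sprite kernel:

```c
/* The file buffer cache and virtual memory each track the age (time
   since last reference) of their oldest page; a needed frame is taken
   from whichever pool's oldest page is older. */
#include <stdint.h>

struct pool {
    const char *name;
    uint64_t oldest_page_age;   /* ticks since last reference */
    uint64_t frames;            /* frames currently owned */
};

struct pool fs_cache = { "file buffer cache", 0, 0 };
struct pool vm       = { "virtual memory",    0, 0 };

/* Steal a frame from whichever pool holds the older (colder) page;
   over time this lets the split track the workload. */
struct pool *steal_frame_from(void)
{
    struct pool *victim =
        (fs_cache.oldest_page_age >= vm.oldest_page_age) ? &fs_cache : &vm;
    victim->frames--;
    return victim;
}
```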

Process Migration

  • Allow processes to migrate across workstations
  • Goals: transparency - don’t want it visible to the users
  • pmake - parallel make tool
  • Freeze the process, then transfer its state to the target machine
  • Kernel calls:
    • Different kernels hold different state, so the same call can return different results
    • How can we do this transparently?
      • forward location-dependent calls back to the home machine (see the sketch below)
    • Today - migrate VMs instead, because a VM encapsulates the kernel state
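A rough sketch of the forwarding idea, with hypothetical names (`is_location_dependent`, `rpc_to_home`) standing in for Sprite's kernel-to-kernel RPC machinery:

```c
/* Transparent kernel-call handling for a migrated process: calls that
   depend on home-machine kernel state run at home via RPC. */
#include <stdint.h>

struct process {
    int      migrated;       /* nonzero once the process has moved */
    uint32_t home_machine;   /* where it originally ran */
};

static int is_location_dependent(int nr)
{
    /* e.g., calls touching per-kernel state (process table, local
       device state) must execute on the home machine. */
    return nr == 42;   /* placeholder predicate */
}

static long rpc_to_home(uint32_t home, int nr, void *args)
{
    (void)home; (void)nr; (void)args;
    return 0;   /* stand-in for a kernel-to-kernel RPC */
}

static long local_syscall(int nr, void *args)
{
    (void)nr; (void)args;
    return 0;   /* stand-in for the normal local syscall path */
}

long dispatch_syscall(struct process *p, int nr, void *args)
{
    if (p->migrated && is_location_dependent(nr))
        return rpc_to_home(p->home_machine, nr, args);
    return local_syscall(nr, args);
}
```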

Shared Fork

“Processes in a shared address space” - threads.

  • Unix fork: child shares code (text) with the parent
  • Sprite shared_fork: shares both code and data (heap)

  • Why not share stack?
      • threads need separate (diverging) execution states, which requires a separate stack per thread
  • Today
    • pthread_create - threads share code and data (see the example below)
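A runnable example of the modern equivalent: `pthread_create` gives every thread the same code and heap but its own stack (the race on `shared_counter` is left unsynchronized for brevity):

```c
/* Compile with: cc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

int shared_counter = 0;            /* shared data: visible to all threads */

void *worker(void *arg)
{
    int local = 0;                 /* stack variable: private per thread */
    local++;
    shared_counter++;              /* shared heap/global: all threads see it */
    printf("thread %ld: local=%d shared=%d\n",
           (long)arg, local, shared_counter);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Both threads updated the same shared_counter, but each had its
       own `local` on its own stack. */
    return 0;
}
```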

Sprite Summary

  • file caching protocol
  • When thinking about research/system design:
    • Consider technology trends in system design

LegoOS

Monolithic Servers - CPU + Memory + Storage

Monolithic Server

[Drawing: a monolithic server, “all in one box”: CPUs and DRAM on a memory bus; NIC and GPU on a PCIe bus]
Properties:

  • Cache-coherent interconnect: a write by one CPU to DRAM is visible to all CPUs
  • High bandwidth and low latency within the same server
  • A network card (NIC) connects the server to the data center network

Hardware Disaggregation

Separate the components of a monolithic server and connect each of them to the data center network individually instead.

HW Disaggregation

[Drawing: CPU and DRAM components attached individually to the data center network]

  • organize HW into blades

Benefits

  • resource elasticity
    • scale memory or CPU resources up independently of each other
  • specialized hardware
    • Easier to adopt new specialty hardware
  • more fine-grained failure domains
    • improve reliability of system overall
  • improve resource utilization
    • users often overestimate their resource needs; disaggregation gives more flexibility
    • bin-packing problem - the scheduler has many different jobs and must find a server for each one
      • how to pack jobs onto servers as efficiently as possible…
      • imperfect packing strands resources, e.g., a server with free cores but too little free memory for any waiting job

Performance is a non-goal, because disaggregation adds overhead.

  • the benefits outweigh the overhead
  • as long as performance is close enough to Linux on a monolithic server

Challenges

  • within-server interconnect:
    • ~500 Gbps bandwidth, ~50 ns latency
  • data center network (then): 40 Gbps, ~6 µs latency (over 100× the in-server latency)
  • data center network (today): 100+ Gbps, a few µs latency

LegoOS

LegoOS needs to implement a significant part of Linux's interface so that existing applications can run.

Split Kernel

  • three component types: pComp (processing), mComp (memory), sComp (storage)
  • monitor - the manager running on each component

    LegoOS Split Kernel

    [Drawing: a pComp (CPUs + 4GB ExCache + NIC), an mComp (memory + NIC), and storage (disks + NIC), connected by the data center network]

    • needed some memory in the pComp (the ExCache) for caching to get reasonable performance (sketch below)

    In the mComp:

    • page tables
    • TLBs
    • file buffer cache (key idea: it takes up a lot of memory, so it belongs in the mComp)
    • virtual → physical translation

    In the pComp:

    • virtual addresses only
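A sketch of what a pComp memory access might look like with the ExCache in front of remote memory; all helper names are hypothetical:

```c
/* The pComp works only with virtual addresses; virtual-to-physical
   translation happens at the mComp, so an ExCache miss becomes a
   page fetch over the data center network. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096UL

static void *excache_lookup(uint64_t vpage) { (void)vpage; return 0; }
static void *excache_insert(uint64_t vpage, const void *data)
{
    (void)vpage; (void)data; return 0;   /* stand-in for cache fill */
}
static void rpc_fetch_page(uint32_t mcomp, uint64_t vpage, void *buf)
{
    (void)mcomp; (void)vpage;
    memset(buf, 0, PAGE_SIZE);           /* stand-in for network fetch */
}

void *access_page(uint32_t mcomp_id, uint64_t vaddr)
{
    uint64_t vpage = vaddr & ~(PAGE_SIZE - 1);
    void *page = excache_lookup(vpage);
    if (page)
        return page;                     /* hit: local-DRAM speed */

    char buf[PAGE_SIZE];                 /* miss: cross-network RPC */
    rpc_fetch_page(mcomp_id, vpage, buf);
    return excache_insert(vpage, buf);
}
```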

2-level memory management

  • vRegions - coarse-grained virtual address ranges assigned to memory components
  • vma trees - fine-grained memory allocation within a vRegion
    • live entirely within one memory component (see the sketch below)
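A sketch of the two levels, with hypothetical structures (LegoOS uses trees for vmas; a linked list stands in here):

```c
/* Level 1: a coarse vRegion table maps big chunks of the virtual
   address space to owning memory components. Level 2: fine-grained
   vma lookup happens only inside the owning mComp. */
#include <stdint.h>
#include <stddef.h>

#define VREGION_SIZE (1UL << 30)         /* assume 1GB vRegions */
#define NUM_VREGIONS 256

struct vma { uint64_t start, end; struct vma *next; };

struct vregion {
    uint32_t    owner_mcomp;             /* which mComp manages it */
    struct vma *vmas;                    /* populated only on that mComp */
};

struct vregion vregion_table[NUM_VREGIONS];

/* Level 1: coarse routing; any component can answer this. */
uint32_t owning_mcomp(uint64_t vaddr)
{
    return vregion_table[(vaddr / VREGION_SIZE) % NUM_VREGIONS].owner_mcomp;
}

/* Level 2: fine-grained lookup, local to the owning mComp. */
struct vma *find_vma(uint64_t vaddr)
{
    struct vregion *vr = &vregion_table[(vaddr / VREGION_SIZE) % NUM_VREGIONS];
    for (struct vma *v = vr->vmas; v; v = v->next)
        if (vaddr >= v->start && vaddr < v->end)
            return v;
    return NULL;
}
```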

Implementation

  • emulated on commodity hardware by enabling and disabling different pieces of HW on each machine

Evaluation

  • small working sets - performance a little worse, within 2x of a standard Linux server
  • large working sets - better performance
    • accessing remote memory over the network beats swapping to local storage

Adoption

  • disaggregated GPUs
  • disaggregated storage - blob storage
  • disaggregated memory - still active area of research
  • machines have grown larger, which makes the bin-packing problem easier

LegoOS Summary

  • Splitkernel approach: a disaggregated OS
  • Exploring an extreme point in the design space
    • you learn a lot about the limits, and how to back off from them and create something valuable