Building on the Previous Course
The prior course covered process management from an operational perspective (how to run, monitor, and kill processes). This lesson goes deeper: how does the OS actually manage processes and threads, why does concurrency make software hard, and what are the practical implications for engineering software?
The Process Model
Every running program is a process. Each process has:
- Its own virtual address space (it thinks it has all the memory)
- Its own file descriptor table (open files and sockets)
- At least one thread of execution
- A process ID (PID) and a parent process
When a process calls fork(), it creates a near-identical copy of itself (the child process). The child gets a new PID and a copy of the parent's memory (modern kernels share pages copy-on-write until one side modifies them). This is how web servers like nginx and simulation job schedulers spawn workers.
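A minimal sketch of fork() in Python (Unix-only; os.fork is not available on Windows). The exit code 7 is an arbitrary value chosen to show the parent observing the child's exit status:

```python
import os

pid = os.fork()  # Duplicates the calling process
if pid == 0:
    # Child branch: fork() returned 0. This process has its own PID
    # and its own copy of the parent's memory.
    os._exit(7)  # Exit immediately with an arbitrary status code
else:
    # Parent branch: fork() returned the child's PID.
    _, status = os.waitpid(pid, 0)  # Reap the child
    exit_code = os.waitstatus_to_exitcode(status)
    print(f"child {pid} exited with code {exit_code}")
```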
Threads: Concurrency Within a Process
A thread is a unit of execution within a process. Multiple threads in the same process share memory space — they can read and write the same variables.
```
Process
├── Thread 1: handles user request A
├── Thread 2: handles user request B
└── Thread 3: handles user request C
    (all share the same heap memory)
```
Why threads: CPU-bound tasks can be parallelized across CPU cores. I/O-bound tasks (waiting for network, disk) can be interleaved so one waiting thread doesn’t block others.
Why threads are hard: Shared memory means race conditions.
```python
# RACE CONDITION: two threads modify the same variable
import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Not atomic: read-increment-write

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)
# Expected: 2,000,000. Actual: somewhere between 1M and 2M
```
The counter += 1 operation is not atomic — it’s three steps (read, add, write) and the OS can context-switch between any two of them.
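You can see those separate steps directly by disassembling the bytecode (exact opcode names vary across CPython versions):

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1  # one statement, several bytecode instructions

# Prints separate load, add, and store instructions for counter —
# the interpreter can switch threads between any two of them.
dis.dis(increment_once)
```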
Concurrency Primitives
Lock / Mutex
Only one thread can hold the lock at a time. Others wait.
```python
import threading

lock = threading.Lock()
counter = 0

def safe_increment():
    global counter
    for _ in range(1_000_000):
        with lock:  # Acquire lock; release on exit
            counter += 1

t1 = threading.Thread(target=safe_increment)
t2 = threading.Thread(target=safe_increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # Now counter is always exactly 2,000,000
```
The downside: Locks serialize execution. If all threads spend most of their time waiting for the same lock, you’ve gained nothing from threading.
Deadlock: Thread A holds Lock 1, waits for Lock 2. Thread B holds Lock 2, waits for Lock 1. Both wait forever.
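The standard defense is a global lock-acquisition order: if every thread takes the locks in the same order, the circular wait can never form. A minimal sketch (the lock names and the done counter are illustrative):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
done = 0

# Both functions acquire lock_a before lock_b. If one took lock_b
# first, two threads could each hold one lock and wait forever.
def task_one():
    global done
    with lock_a:
        with lock_b:
            done += 1  # critical section touching both resources

def task_two():
    global done
    with lock_a:  # same order as task_one, so no cycle can form
        with lock_b:
            done += 1

t1 = threading.Thread(target=task_one)
t2 = threading.Thread(target=task_two)
t1.start(); t2.start()
t1.join(); t2.join()
print(done)  # both threads finish; no deadlock
```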
Semaphore: Like a lock, but allows N concurrent holders (useful for limiting parallel connections, worker pool sizes).
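A sketch of a semaphore capping concurrency at three workers; the sleep stands in for real work such as a network call, and the peak counter just verifies the cap:

```python
import threading
import time

sem = threading.BoundedSemaphore(3)  # at most 3 concurrent holders
state_lock = threading.Lock()
active = 0
peak = 0

def worker():
    global active, peak
    with sem:  # blocks if 3 workers are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate work (e.g. a network call)
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"peak concurrency: {peak}")  # never exceeds 3
```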
Python’s GIL and Its Engineering Implications
Python has a Global Interpreter Lock (GIL) — only one thread executes Python bytecode at a time. This means Python threads don’t give you parallelism for CPU-bound tasks.
For CPU-bound scientific computing in Python, the solutions:
- NumPy/SciPy: Many heavy numerical operations release the GIL internally, so threads can achieve real parallelism for that work.
- multiprocessing: Use processes instead of threads. Each has its own GIL. True parallelism, but higher overhead.
- concurrent.futures.ProcessPoolExecutor: Higher-level API for process parallelism.
- Cython, C extensions: Write the bottleneck in C and release the GIL.
- Python 3.13+: Free-threaded mode (experimental GIL removal).
```python
# CPU-BOUND: use processes, not threads
from concurrent.futures import ProcessPoolExecutor

def run_simulation(case_params):
    # Heavy numerical work — runs in a separate process.
    # expensive_fem_solve is a stand-in for your actual solver.
    result = expensive_fem_solve(case_params)
    return result

if __name__ == "__main__":  # required on platforms that spawn workers
    cases = [{"load": 100}, {"load": 200}, {"load": 300}, {"load": 400}]
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(run_simulation, cases))
```
Async I/O: Concurrency Without Threads
For I/O-bound tasks (HTTP requests, database queries, file reading), async/await is often better than threads. A single thread can manage thousands of concurrent I/O operations.
```python
import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch_sensor_data(session, sensor_id):
    url = f"https://api.monitoring.io/sensors/{sensor_id}/latest"
    async with session.get(url) as response:
        return await response.json()

async def fetch_all_sensors(sensor_ids):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_sensor_data(session, sid) for sid in sensor_ids]
        return await asyncio.gather(*tasks)

# Fetch 500 sensor readings — all concurrent
results = asyncio.run(fetch_all_sensors(range(500)))
```
Exercise 12.1: Concurrency Model Analysis
Exercise: For each scenario, choose the appropriate concurrency model (threads, processes, async, none) and justify:
- A Python script runs 50 independent FEM jobs, each taking 30 minutes of CPU time
- A web server handles 1,000 simultaneous HTTP requests, each waiting < 100ms for database responses
- A real-time data logger reads from 200 sensors simultaneously via serial ports (I/O-bound)
- A post-processing script reads a 10GB binary results file and computes statistics
- A dashboard server fetches data from 5 external APIs to compose a response, each API call taking 200ms
For each: name the model, explain the choice, identify the key risk.
Quiz
A Python simulation script uses threading.Thread to parallelize CPU-intensive FEM solves, expecting 4x speedup on a 4-core machine. Actual speedup is ~1.1x. What is the cause?
- A) The threads are running on only one core due to OS scheduling
- B) Python’s GIL prevents true CPU parallelism for CPU-bound threaded code
- C) FEM solves are inherently sequential — they can’t be parallelized
- D) Thread creation overhead cancels out the parallelism benefit
Answer
B) Python’s GIL prevents true CPU parallelism for CPU-bound threaded code.
The GIL allows only one thread to execute Python bytecode at a time, regardless of how many CPU cores are available. For CPU-bound tasks, Python threads don’t give parallelism — they time-slice on a single core. The fix is multiprocessing.Pool or ProcessPoolExecutor, which creates separate processes, each with its own GIL, achieving true parallelism.