Why Databases Are a Design Decision, Not a Given
Choosing a database is an architectural decision with major long-term consequences. Different database technologies make different trade-offs. Using the wrong one for your access patterns leads to performance problems that are extremely hard to fix later.
Relational Databases (SQL)
Structure: Data is organized into tables (relations) with rows and columns. Relationships between tables are expressed via foreign keys. Queries use SQL.
ACID properties (what relational databases guarantee):
- Atomicity: A transaction either fully succeeds or fully fails. No partial writes.
- Consistency: Every transaction brings the database from one valid state to another.
- Isolation: Concurrent transactions behave as if they were sequential.
- Durability: Committed data survives crashes.
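Atomicity is easy to see in code. A minimal sketch using Python's built-in sqlite3 module (the table and column names are illustrative; any ACID-compliant database behaves the same way):

```python
import sqlite3

# In-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")

try:
    with conn:  # opens a transaction: commits on success, rolls back on error
        conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'queued')")
        conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'running')")  # violates PK
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back

# Atomicity: neither row was written, even though the first INSERT succeeded.
count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count)  # 0
```

The first `INSERT` succeeded in isolation, but because the transaction as a whole failed, the database discards it: no partial writes.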
-- Schema for a structural analysis platform
-- (minimal users table included so the foreign key below resolves)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE projects (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    owner_user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE analysis_jobs (
    id SERIAL PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id),
    status VARCHAR(50) CHECK (status IN ('queued', 'running', 'complete', 'failed')),
    submitted_at TIMESTAMP DEFAULT NOW(),
    completed_at TIMESTAMP,
    max_displacement DOUBLE PRECISION,
    max_stress DOUBLE PRECISION
);
-- Query: all completed jobs with results for a project
SELECT j.id, j.completed_at, j.max_displacement, j.max_stress
FROM analysis_jobs j
WHERE j.project_id = 42
AND j.status = 'complete'
ORDER BY j.completed_at DESC
LIMIT 10;
Use relational databases when: You have structured data with well-defined relationships, you need ACID transactions, or your queries join multiple entities. Examples: PostgreSQL, SQLite, MySQL/MariaDB.
NoSQL Databases
Document Databases (MongoDB, Firestore)
Store data as JSON-like documents. Flexible schema — documents in the same collection can have different fields.
// MongoDB document: flexible schema
{
  "_id": "job_abc123",
  "type": "modal_analysis",
  "model": { "nodes": 48000, "elements": 45000, "material": "steel" },
  "results": {
    "frequencies": [2.3, 5.1, 8.7, 12.4],
    "mode_shapes": ["<binary_blob>"],
    "convergence_history": [0.1, 0.05, 0.01, 0.003, 0.0009]
  },
  "metadata": { "solver_version": "2.4.1", "duration_s": 847 }
}
Use when: Schema varies between records, data is deeply nested, or you are prototyping rapidly.
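Flexible schema cuts both ways: the application must tolerate missing fields. A minimal Python sketch (the document shapes are illustrative, not a fixed API):

```python
# Two jobs in the same "collection" with different shapes.
jobs = [
    {"_id": "job_1", "type": "modal_analysis",
     "results": {"frequencies": [2.3, 5.1, 8.7]}},
    {"_id": "job_2", "type": "static_analysis",
     "results": {"max_displacement": 0.012}},  # no "frequencies" field
]

def first_frequency(job):
    """Read a nested field defensively; a document may lack it entirely."""
    freqs = job.get("results", {}).get("frequencies", [])
    return freqs[0] if freqs else None

print([first_frequency(j) for j in jobs])  # [2.3, None]
```

With a relational schema, the missing column would be caught at write time; with documents, the read path has to handle it.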
Time-Series Databases (InfluxDB, TimescaleDB, Prometheus)
Optimized for timestamped numerical data — exactly what sensors produce.
-- TimescaleDB (PostgreSQL extension) hypertable
CREATE TABLE sensor_readings (
    time       TIMESTAMPTZ NOT NULL,
    sensor_id  INTEGER NOT NULL,
    value      DOUBLE PRECISION,
    unit       VARCHAR(20)
);
SELECT create_hypertable('sensor_readings', 'time');
-- Automatic chunking by time, fast range queries
SELECT time_bucket('1 minute', time) AS minute,
       sensor_id,
       AVG(value) AS avg_reading
FROM sensor_readings
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 42
GROUP BY minute, sensor_id
ORDER BY minute DESC;
Use when: Storing sensor readings, metrics, logs, or any data where time is the primary query dimension.
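The bucketed aggregation that time_bucket performs is easy to emulate in application code, which helps when prototyping before committing to a database. A sketch assuming readings arrive as (unix timestamp, value) pairs:

```python
from collections import defaultdict

def bucket_average(readings, bucket_s=60):
    """Group (unix_ts, value) pairs into fixed time windows and average each,
    analogous to time_bucket('1 minute', time) + AVG(value)."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % bucket_s].append(value)  # floor to bucket start
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

readings = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
print(bucket_average(readings))  # {0: 2.0, 60: 6.0}
```

A time-series database does exactly this, but at scale: the chunking by time happens at the storage layer, so only the relevant chunks are even read.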
Key-Value Stores (Redis, DynamoDB)
Extremely fast O(1) lookups by key. Redis keeps its dataset in RAM, so reads typically complete in microseconds (sub-millisecond including the network round trip).
Use when: Caching, session storage, job queue management, feature flags, rate limiting.
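The caching pattern can be sketched without Redis at all: a dict plus expiry timestamps captures the core idea, analogous to Redis's SET with an EX expiry. (Redis adds persistence options, eviction policies, and atomic operations on top.)

```python
import time

class TTLCache:
    """Toy key-value store with per-key expiry, like SET key value EX n."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_s=60):
        self._data[key] = (value, time.monotonic() + ttl_s)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

cache = TTLCache()
cache.set("session:42", {"user": "alice"}, ttl_s=0.05)
print(cache.get("session:42"))  # {'user': 'alice'}
time.sleep(0.1)
print(cache.get("session:42"))  # None -- expired
```

Session storage and rate limiting are both variations on this: a key, a small value, and a lifetime.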
Data Modeling for Engineering Workflows
The entity-relationship (ER) approach:
- Identify entities (nouns in your domain): Project, User, Model, Job, Result, Sensor
- Identify relationships: User owns Projects, Project has many Jobs
- Identify cardinalities: One Project → Many Jobs (one-to-many)
- Identify attributes for each entity
- Normalize to reduce redundancy (3NF for most applications)
Normalization vs. denormalization:
- Normalized: Data stored in one place, referenced by key. No redundancy. Queries require joins. Best for write-heavy, consistency-critical data.
- Denormalized: Data duplicated for read performance. Faster reads, slower writes, risk of inconsistency. Best for read-heavy analytical queries.
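The trade-off is concrete in SQL. A sketch using Python's sqlite3 (tables are illustrative): normalized storage keeps each project name in one row and joins to fetch it; the denormalized variant copies the name onto every job, so a rename must touch every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized: the project name lives in one place; jobs reference it by key.
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY,
                       project_id INTEGER REFERENCES projects(id));
    INSERT INTO projects VALUES (1, 'Bridge A');
    INSERT INTO jobs VALUES (10, 1), (11, 1);
""")
# Reads need a join, but a rename is a single-row write:
conn.execute("UPDATE projects SET name = 'Bridge A (retrofit)' WHERE id = 1")
rows = conn.execute("""
    SELECT j.id, p.name FROM jobs j JOIN projects p ON p.id = j.project_id
""").fetchall()
print(rows)  # [(10, 'Bridge A (retrofit)'), (11, 'Bridge A (retrofit)')]

# Denormalized: the name is copied onto every job. Reads skip the join,
# but a rename must update every copy or the copies drift apart.
conn.executescript("""
    CREATE TABLE jobs_denorm (id INTEGER PRIMARY KEY, project_name TEXT);
    INSERT INTO jobs_denorm VALUES (10, 'Bridge A'), (11, 'Bridge A');
""")
conn.execute("UPDATE jobs_denorm SET project_name = 'Bridge A (retrofit)' "
             "WHERE project_name = 'Bridge A'")
```

The "risk of inconsistency" above is exactly the second UPDATE: if it misses a row (or half-fails without a transaction), two jobs disagree about their project's name.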
Exercise 14.1: Schema Design for Sensor Data
Exercise: Design a complete database schema for a structural health monitoring (SHM) platform. The system monitors 50 bridges, each with 100–500 sensors.
Requirements:
- Store sensor metadata (type, location, installation date, calibration parameters)
- Store sensor readings at high frequency (1,000 Hz for some sensors)
- Track alert events when readings exceed thresholds
- Store processed results (natural frequencies, daily summaries)
- Support queries: “all readings from bridge X in the last hour,” “all alert events this week”
Tasks:
- Choose the appropriate database technology for each data type. Justify each choice.
- Design the table/collection schema for each
- Define indexes for the most common query patterns
- Estimate data volume for 1 year
- Describe your data retention strategy
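For the volume-estimate task, a back-of-envelope calculation is enough to start. The sensor counts, sampling split, and bytes-per-reading below are illustrative assumptions, not the exercise's answer; substitute your own design's numbers.

```python
# Illustrative assumptions -- replace with your own design's numbers.
high_rate_sensors = 500       # assumed count sampling at 1,000 Hz
low_rate_sensors = 9_500      # remaining sensors, assumed 1 Hz
bytes_per_reading = 16        # e.g. 8-byte timestamp + 8-byte float, uncompressed

seconds_per_year = 365 * 24 * 3600
readings_per_year = (high_rate_sensors * 1_000
                     + low_rate_sensors * 1) * seconds_per_year
raw_tb = readings_per_year * bytes_per_reading / 1e12

print(f"{readings_per_year:.2e} readings/year, ~{raw_tb:.0f} TB raw")
# prints: 1.61e+13 readings/year, ~257 TB raw
```

Numbers at this scale are why the retention question matters: time-series databases routinely compress such data by an order of magnitude and downsample old readings to daily summaries.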
Quiz
A structural health monitoring team is choosing a database for storing 1,000 readings per second from 300 sensors. They’re considering PostgreSQL (relational) vs. InfluxDB (time-series). Which is more appropriate, and why?
- A) PostgreSQL, because ACID compliance ensures no readings are lost
- B) InfluxDB, because time-series databases are optimized for timestamped numerical data ingestion and time-range queries
- C) Both are equivalent for this use case
- D) Neither — only NoSQL document databases can handle this volume
Answer
B) InfluxDB, because time-series databases are optimized for timestamped numerical data ingestion and time-range queries.
Time-series databases are specifically designed for the pattern “insert many timestamped values, query by time range, compute aggregates over time windows.” They use columnar storage, efficient compression, automatic partitioning by time, and built-in downsampling. TimescaleDB (a PostgreSQL extension) is a good middle ground — time-series performance with ACID guarantees.