Cloud Models, Cost/Control Trade-Offs & Modern Deployment

The Cloud Service Model Spectrum

In Lesson 15, we introduced cloud computing and deployment pipelines. Now we go deeper: understanding the spectrum of cloud service models, how to reason about costs, and how to make informed decisions about where your engineering workloads should run.

Cloud services exist on a spectrum from “you manage everything” to “you manage nothing.” Here’s the landscape:

    You Manage Everything                          Provider Manages Everything
    ←————————————————————————————————————→

    On-Premises       IaaS             PaaS             FaaS/Serverless   SaaS
    +-------------+  +-------------+  +-------------+  +-------------+  +-------------+
    | Application |  | Application |  | Application |  | Function    |  |             |
    | Data        |  | Data        |  | Data        |  | code        |  |             |
    | Runtime     |  | Runtime     |  |             |  |             |  |             |
    | Middleware  |  | Middleware  |  |             |  |             |  |    Fully    |
    | OS          |  | OS          |  |             |  |             |  |   Managed   |
    | Virtualizn  |  |             |  |             |  |             |  |             |
    | Servers     |  |             |  |             |  |             |  |             |
    | Storage     |  |             |  |             |  |             |  |             |
    | Networking  |  |             |  |             |  |             |  |             |
    +-------------+  +-------------+  +-------------+  +-------------+  +-------------+
     You manage:      You manage:      You manage:      You manage:      You manage:
     EVERYTHING       App + OS up      App + Data       Function code    Configuration

Each step to the right trades control for convenience. Each step to the left trades convenience for control. There is no universally correct position on this spectrum — the right choice depends on your workload, team, budget, and compliance requirements.

Key insight: The cloud is not a place. It’s someone else’s computer, with someone else’s software managing it. Every layer the provider manages is a layer you don’t control. Understanding this trade-off is fundamental.

IaaS: Infrastructure as a Service

Examples: AWS EC2, Google Compute Engine, Azure Virtual Machines, DigitalOcean Droplets

IaaS gives you virtual machines (VMs) in the cloud. You get raw compute, storage, and networking. Everything from the operating system up is your responsibility.

You Manage                  Provider Manages
--------------------------  ---------------------------------
Application code            Physical servers
Runtime & dependencies      Networking infrastructure
Operating system & patches  Virtualization layer
Security configuration      Power, cooling, physical security
Scaling decisions           Hardware replacement

Best for:

  • Workloads requiring full OS control (custom kernels, GPU drivers, specific library versions)
  • Legacy applications that can’t be easily containerized
  • Engineering simulations requiring specific hardware configurations (e.g., 64-core machines with 512 GB RAM for FEM solvers)
  • Compliance requirements mandating OS-level security controls

Trade-off: Maximum flexibility, but you’re responsible for patching, scaling, and availability. If your EC2 instance’s underlying hardware fails at 3 AM, AWS will eventually reschedule the instance onto healthy hardware — but your application must handle the stop and restart gracefully.
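Handling that restart gracefully usually means checkpointing: persist progress after each unit of work, and resume from the last checkpoint on startup. A minimal sketch of the pattern — the file path, field names, and the toy "solver loop" are illustrative, not from any particular solver:

```python
import json
import os
import tempfile

# Illustrative location; a real solver would checkpoint to durable storage (EBS, S3)
CHECKPOINT = os.path.join(tempfile.gettempdir(), "solver_checkpoint.json")

def load_checkpoint():
    """Resume from the last completed step, or start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "state": 0.0}

def save_checkpoint(step, state):
    """Write to a temp file first so a crash mid-write cannot corrupt the checkpoint."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CHECKPOINT)  # atomic rename

def run_solver(total_steps):
    """Run (or resume) a toy iteration loop, checkpointing after every step."""
    ck = load_checkpoint()
    state = ck["state"]
    for step in range(ck["step"], total_steps):
        state += 1.0  # stand-in for one real solver iteration
        save_checkpoint(step + 1, state)
    return state
```

If the instance is stopped mid-run, the next invocation of `run_solver` picks up at the first incomplete step instead of starting over — the same idea applies whether the interruption is a hardware failure or a spot reclaim.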

PaaS: Platform as a Service

Examples: Heroku, AWS Elastic Beanstalk, Google App Engine, Railway, Render

PaaS abstracts away the operating system. You deploy your application code, and the platform handles the runtime, scaling, and infrastructure.

You Manage                Provider Manages
------------------------  ----------------------
Application code          Operating system
Data & configuration      Runtime & middleware
Deployment configuration  Scaling infrastructure
                          Load balancing
                          OS patching & security

Best for:

  • Web applications and APIs where you don’t need OS-level control
  • Rapid prototyping and MVPs
  • Small teams without dedicated DevOps engineers
  • Engineering dashboards, result viewers, and internal tools

Trade-off: Faster deployment and less operational burden, but you’re constrained to the platform’s supported languages, runtimes, and configurations. If your FEM solver requires a specific Fortran compiler with custom flags, PaaS probably won’t work.
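One practical consequence of the PaaS model: since the platform owns the OS and may move your app between machines, configuration must come from the environment rather than local files. A minimal sketch, assuming Heroku-style platforms that inject settings such as `PORT` and `DATABASE_URL` as environment variables (the defaults here are my own):

```python
import os

def load_config():
    """Read platform-injected settings, with safe local defaults for development."""
    return {
        # PaaS platforms typically tell your app which port to bind to
        "port": int(os.environ.get("PORT", "8000")),
        # the database location is injected by the platform, never hard-coded
        "database_url": os.environ.get("DATABASE_URL", "sqlite:///local.db"),
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }
```

The same code then runs unchanged locally and on the platform; only the environment differs.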

FaaS / Serverless: Functions as a Service

Examples: AWS Lambda, Google Cloud Functions, Azure Functions, Cloudflare Workers

Serverless takes abstraction further: you deploy individual functions, and the platform handles everything else. You don’t think about servers at all — you think about events and responses.

Here’s a typical Lambda function for processing uploaded simulation results:

import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Process a newly uploaded simulation result file."""
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    # Download the result file
    response = s3.get_object(Bucket=bucket, Key=key)
    data = json.loads(response["Body"].read().decode("utf-8"))

    # Extract summary statistics
    summary = {
        "max_stress": max(data["stress_values"]),
        "min_stress": min(data["stress_values"]),
        "mean_stress": sum(data["stress_values"]) / len(data["stress_values"]),
        "node_count": len(data["stress_values"]),
        "source_file": key,
    }

    # Store summary in results bucket
    s3.put_object(
        Bucket="simulation-summaries",
        Key=key.replace(".json", "-summary.json"),
        Body=json.dumps(summary),
    )

    return {"statusCode": 200, "body": json.dumps(summary)}

Best for:

  • Event-driven processing (file uploads, API requests, queue messages)
  • Lightweight data transformations and validations
  • Glue logic between services
  • Infrequent workloads where you don’t want to pay for idle servers

Limitations you must understand:

Constraint                 Typical Limit (AWS Lambda)          Impact on Engineering Workloads
-------------------------  ----------------------------------  ------------------------------------------
Cold start latency         100 ms – 10 s (language-dependent)  Unacceptable for real-time control systems
Execution time limit       15 minutes                          Cannot run FEM simulations (hours/days)
Memory limit               10 GB                               Cannot load large mesh files in memory
CPU allocation             Proportional to memory (≤ 6 vCPUs)  Not suitable for CPU-intensive solvers
Deployment package size    250 MB (unzipped)                   Large scientific libraries may not fit
No persistent local state  512 MB /tmp                         Cannot store intermediate results locally

Key insight: Serverless is excellent for orchestration — triggering, routing, and lightweight processing — but it is fundamentally unsuitable for computation-heavy engineering workloads. A common architectural pattern is using Lambda to orchestrate jobs that run on EC2 or Fargate.

SaaS: Software as a Service

Examples: GitHub, Slack, Jira, Salesforce, Google Workspace, ANSYS Cloud

SaaS is the far end of the spectrum: the provider manages everything. You use the software through a web browser or API. You manage only your data and configuration.

For engineering teams, SaaS is increasingly relevant: cloud-based FEM solvers (ANSYS Cloud, SimScale), project management (Jira, Linear), collaboration (Slack, Teams), and version control (GitHub, GitLab). The trade-off is stark: zero operational burden, but you’re completely dependent on the provider’s roadmap, pricing, and availability.

Cost Modeling: Thinking Like a Financial Engineer

Cloud costs are the most frequently underestimated aspect of cloud migration. Engineers who understand cloud cost drivers make better architectural decisions.

Key Cost Drivers

Cost Category     What Drives It                                     How to Control It
----------------  -------------------------------------------------  --------------------------------------------------------------
Compute           Instance type, hours running, number of instances  Right-sizing, spot instances, auto-scaling, reserved instances
Storage           Volume (GB), storage class, IOPS requirements      Lifecycle policies, tiered storage, compression
Data Transfer     Egress (data leaving the cloud), cross-region      CDN caching, keeping processing near data, compression
Managed Services  Databases, queues, load balancers, API gateways    Evaluate build-vs-buy for each service

Rough Estimation Example

Scenario: An engineering firm runs 500 FEM simulation jobs per day. Each job requires 16 vCPUs and 64 GB RAM for approximately 2 hours.

Job Requirements:
  - 500 jobs/day × 2 hours/job = 1,000 compute-hours/day
  - Instance selection (each job needs 16 vCPUs and 64 GB RAM):
      c6i.4xlarge (16 vCPUs, 32 GB RAM): not enough memory
      r6i.4xlarge (16 vCPUs, 128 GB RAM): pays for double the needed memory
      Chosen: 2 × r6i.2xlarge per job (8 vCPUs, 64 GB RAM each, 16 vCPUs total)

Cost Comparison (approximate, us-east-1, 2026 pricing):

  Option 1: On-Demand
    $0.504/hr × 2 instances × 2 hrs × 500 jobs = $1,008/day
    Monthly: ~$30,240

  Option 2: Spot Instances (70% discount typical)
    $0.151/hr × 2 instances × 2 hrs × 500 jobs = $302/day
    Monthly: ~$9,060
    Risk: Spot instances can be interrupted. Need checkpointing.

  Option 3: Reserved Instances (1-year, no upfront, ~40% discount)
    Capacity for the average load: 2,000 instance-hours/day ÷ 24 ≈ 83 concurrent instances
    $0.302/hr × 83 instances × 24 hrs × 30 days ≈ $18,050/month
    Cheaper than on-demand, but you pay around the clock; if jobs cluster
    in business hours, much of the reserved capacity sits idle overnight

  Option 4: Spot + On-Demand hybrid
    80% spot ($0.151/hr) + 20% on-demand ($0.504/hr) fallback
    0.8 × $302/day + 0.2 × $1,008/day ≈ $443/day → ~$13,300/month
    Best reliability/cost balance

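The on-demand and spot arithmetic above is easy to script, which makes it cheap to re-run the comparison whenever prices or job counts change. A small helper (the function and parameter names are my own):

```python
def monthly_compute_cost(rate_per_hr, instances_per_job, hours_per_job,
                         jobs_per_day, days=30):
    """Monthly cost of running every job on identically priced instances."""
    daily = rate_per_hr * instances_per_job * hours_per_job * jobs_per_day
    return daily * days

# Option 1: on-demand r6i.2xlarge at $0.504/hr, 2 instances x 2 hrs x 500 jobs
on_demand = monthly_compute_cost(0.504, 2, 2, 500)   # ~$30,240/month

# Option 2: spot at roughly a 70% discount
spot = monthly_compute_cost(0.151, 2, 2, 500)        # ~$9,060/month
```

Encoding the model as a function also makes the sensitivity obvious: cost scales linearly with every input, so halving job runtime halves the bill regardless of pricing model.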
Tip: The cheapest cloud is the cloud you don’t use. Before migrating a workload, ask: “Could we reduce the compute requirement by 10x with a better algorithm?” Algorithmic optimization often beats infrastructure optimization.

AI-Augmented Infrastructure Decisions

AI tools can generate infrastructure configurations quickly. But they often lack context about engineering workload characteristics. Here’s a real example of an engineer correcting AI-generated infrastructure:

Prompt to AI: “Design an AWS architecture for running FEM simulation jobs submitted via a web API.”

AI-generated suggestion: “Use AWS Lambda triggered by API Gateway. Store input files in S3. Lambda processes the simulation and stores results back in S3.”

Engineer’s correction:

# AI suggested Lambda — but FEM jobs run for 2+ hours.
# Lambda has a 15-minute timeout. This architecture will fail.

# Corrected architecture:
# 1. API Gateway + Lambda: Receives job submission (lightweight, < 1 sec)
# 2. Lambda writes job metadata to SQS queue
# 3. ECS Fargate (or EC2 Spot) pulls jobs from SQS
# 4. Fargate runs the actual FEM solver (no time limit)
# 5. Results stored in S3
# 6. SNS notification sent when complete

# Why Fargate with Spot:
# - No time limit (unlike Lambda)
# - No server management (unlike raw EC2)
# - Spot pricing (~70% discount)
# - Auto-scales to zero when no jobs
# - Containers can include FEM solver + dependencies

# Why not Lambda:
# - 15-min timeout is a hard limit (cannot be changed)
# - 10 GB memory limit (large meshes need 64+ GB)
# - 6 vCPU max (FEM solvers need 16-32 cores)
# - Cold start latency wastes solver initialization time

Key insight: AI will confidently suggest architectures that violate hard constraints. The engineer’s role is to know the constraints that the AI doesn’t. In this case: FEM jobs cannot run in Lambda because they exceed its time, memory, and CPU limits. This is exactly the kind of judgment that makes software engineering a human discipline.
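That "know the constraints" step can even be encoded as a pre-flight check that runs before any architecture is accepted. A sketch using the Lambda limits from the table earlier — the function name and job fields are my own invention:

```python
# Hard AWS Lambda limits (approximate values from the table above)
LAMBDA_MAX_SECONDS = 15 * 60
LAMBDA_MAX_MEMORY_GB = 10
LAMBDA_MAX_VCPUS = 6

def fits_in_lambda(job):
    """Return (ok, reasons): can this job legally run in Lambda at all?"""
    reasons = []
    if job["runtime_seconds"] > LAMBDA_MAX_SECONDS:
        reasons.append("exceeds 15-minute execution limit")
    if job["memory_gb"] > LAMBDA_MAX_MEMORY_GB:
        reasons.append("exceeds 10 GB memory limit")
    if job["vcpus"] > LAMBDA_MAX_VCPUS:
        reasons.append("exceeds ~6 vCPU allocation")
    return (not reasons, reasons)
```

A typical FEM job (2 hours, 64 GB, 16 vCPUs) fails all three checks, while the lightweight job-submission handler passes — which is precisely the split the corrected architecture makes.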

Exercise 16.1: Cloud Architecture Cost Model

Exercise: Design a cloud architecture and estimate monthly costs for the following scenario:

Scenario: A structural engineering consultancy serves 100 client firms. Each firm submits an average of 10 simulation jobs per day. Each job requires between 8 and 32 vCPUs (varying by complexity) and runs for 1–4 hours. The system must also support 5 concurrent tasks:

  1. Job submission API — accepts job definitions via REST, validates input, stores files
  2. Job queue management — prioritizes jobs, manages scheduling
  3. Simulation execution — runs the actual FEM solver
  4. Result processing — extracts summaries, generates reports
  5. Client dashboard — web application showing job status and results

For each task, decide:

  • Which cloud service model (IaaS, PaaS, FaaS, SaaS)?
  • Which specific AWS/GCP/Azure service?
  • Estimated monthly cost (rough order of magnitude)

Present your answer as a table with columns: Task, Service Model, Specific Service, Justification, and Estimated Monthly Cost.

Stretch goal: Calculate the total monthly cost and compare it against the cost of buying equivalent on-premises hardware (assume a 3-year amortization and 20% annual maintenance cost).

Quiz

Question: Your team runs a machine learning training job for 2 hours every day. The job requires a GPU instance costing $3.06/hour on-demand or $1,500/month for a reserved instance. Which pricing model is more cost-effective?

  a) Reserved instance, because you use it every day.
  b) On-demand, because 2 hours/day is only about 8% utilization.
  c) Spot instance, because ML training can always be interrupted.
  d) Serverless, because the job only runs for 2 hours.

Answer

b) On-demand, because 2 hours/day is only about 8% utilization.

Let’s do the math: On-demand cost = $3.06/hr × 2 hrs/day × 30 days = $183.60/month. Reserved cost = $1,500/month. On-demand is 8x cheaper for this usage pattern. Reserved instances only make sense when utilization is high enough that the discount overcomes the commitment to 24/7 payment. At 2 hours/day (~8% utilization), you’re paying for 22 hours of idle time with a reserved instance. Option (c) is tempting but incorrect because ML training jobs often cannot tolerate interruption mid-epoch without losing progress. Option (d) is wrong because GPU workloads typically exceed serverless constraints.
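The comparison generalizes: a reserved instance breaks even once your monthly on-demand spend exceeds the flat reservation price. Using the numbers from the question (variable names are my own):

```python
on_demand_rate = 3.06      # $/hr for the GPU instance
reserved_monthly = 1500.0  # flat monthly reservation price

# Actual usage: 2 hrs/day for 30 days
monthly_on_demand = on_demand_rate * 2 * 30   # ~$183.60

# Monthly hours at which on-demand spend equals the reservation
break_even_hours = reserved_monthly / on_demand_rate   # ~490 hrs/month
break_even_hours_per_day = break_even_hours / 30       # ~16 hrs/day
```

At roughly 16 hours of use per day the reservation starts to pay off; at 2 hours/day it clearly does not, which is why option (b) wins.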