CI/CD Quest
Automate Everything with GitHub Actions
Lesson 1: What CI Is
Continuous Integration (CI) is the practice of frequently merging code changes into a shared repository, where each merge triggers an automated build and test sequence. The core idea is simple: instead of developers working in isolation for weeks and then attempting a painful "big bang" merge, everyone integrates their work early and often, ideally multiple times per day.
The term was popularized by Kent Beck as part of Extreme Programming (XP) in the late 1990s, but the principle is timeless: the longer you wait to integrate code, the more painful and error-prone it becomes. CI eliminates "integration hell" by making integration a non-event.
In a CI workflow, when a developer pushes code to a branch or opens a pull request, an automated system checks out the code, installs dependencies, compiles (if applicable), and runs the full test suite. If anything fails, the team is notified immediately, usually within minutes. This tight feedback loop is what makes CI transformative.
Key principles of CI include: maintaining a single source repository, automating the build, making the build self-testing, keeping the build fast (under 10 minutes is ideal), and fixing broken builds immediately. CI is not just a tool; it's a team discipline.
Lesson 2: What CD Is
CD stands for two related but distinct practices: Continuous Delivery and Continuous Deployment. Both build on CI, but they differ in the final step.
Continuous Delivery means that every change that passes all stages of your production pipeline is ready to be deployed to production. The deployment itself requires a manual approval step: someone clicks a button or approves a release. The key insight is that the software is always in a deployable state.
Continuous Deployment goes one step further: every change that passes all tests is automatically deployed to production with zero human intervention. There is no manual gate. If the tests pass, it ships. Companies like Netflix, GitHub, and Etsy practice continuous deployment, shipping hundreds of times per day.
| Aspect | Continuous Delivery | Continuous Deployment |
|---|---|---|
| Auto-deploy to production? | No (manual approval required) | Yes (fully automatic) |
| Manual step required? | Yes (approve/trigger deploy) | No |
| Risk level | Lower (human checkpoint) | Higher (requires excellent test coverage) |
| Best for | Regulated industries, early teams | Mature teams with strong testing |
| Deployment frequency | On-demand (daily/weekly) | Every passing commit |
Most teams start with Continuous Delivery and graduate to Continuous Deployment as their test suites mature and confidence grows. The important thing is that both approaches require a robust CI pipeline as their foundation.
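In GitHub Actions terms (covered later in this quest), the gap between the two practices can be as small as one environment protection rule. A minimal sketch; the "production" environment configuration and deploy script are hypothetical:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Continuous Delivery: configure the "production" environment with
    # required reviewers, and this job pauses for manual approval.
    # Continuous Deployment: remove the reviewers and the same job
    # ships automatically on every green run.
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh   # hypothetical deploy script
```

The pipeline is identical either way; only the gate changes.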
Lesson 3: The Old Way
Before CI/CD, software teams followed a painful process that often looked like this: developers would work on features in isolation for days or weeks, maybe on long-lived feature branches. When it was time to release, someone would declare a "merge day" or "integration week" where everyone tried to combine their changes. Chaos ensued.
Manual testing was the norm. A QA team would receive a build, then spend days or weeks clicking through the application, following test scripts on paper or spreadsheets. Bugs found during this phase meant sending the code back to developers, who had already context-switched to other work. The feedback loop could be weeks long.
Deployment was a terrifying, all-hands-on-deck affair. Teams would schedule "deployment windows", often Friday nights or weekends, where senior engineers would SSH into production servers, manually copy files, run database migrations by hand, and pray. Rollback meant restoring from a backup taken hours earlier (if someone remembered to take one).
The "it works on my machine" problem was endemic. Developers ran different OS versions, different dependency versions, and different configurations. Code that worked perfectly in development would crash in production because of environmental differences that nobody tracked.
Lesson 4: The CI/CD Promise
CI/CD transforms software delivery from a stressful, error-prone ordeal into a boring, predictable routine, and "boring" is exactly what you want in production deployments. The promise is straightforward: every push is tested, every merge is safe, and deployment is just another step in the pipeline.
With a mature CI/CD pipeline, a developer pushes code, automated tests run within minutes, and if everything passes, the code is either ready to deploy or automatically deployed. The entire team can see the status of every build. There's no ambiguity about whether the code works: the pipeline tells you.
The business benefits are significant. According to DORA's State of DevOps research comparing elite and low performers, teams with mature CI/CD practices deploy 208x more frequently, have 106x faster lead time from commit to deploy, recover from incidents 2,604x faster, and have a 7x lower change failure rate. These aren't marginal improvements; they're orders of magnitude.
CI/CD also changes team culture for the better. When deployment is safe and easy, teams ship smaller changes more frequently. Smaller changes are easier to review, easier to test, easier to debug, and easier to roll back. It creates a virtuous cycle of quality and velocity.
Lesson 5: The Feedback Loop
The feedback loop is the heartbeat of CI/CD. When a developer pushes code, they need to know, as quickly as possible, whether that code is correct. The shorter the feedback loop, the cheaper and easier it is to fix problems. A bug caught in 5 minutes costs almost nothing to fix. The same bug caught in 5 weeks costs exponentially more.
In a well-optimized CI pipeline, the feedback loop looks like this: a developer pushes code and within 1-2 minutes, linting and formatting checks complete. Within 3-5 minutes, unit tests finish. Within 5-10 minutes, integration tests pass. The developer gets a green checkmark (or a red X with specific failure details) before they've even context-switched to another task.
Fast feedback loops change developer behavior. When tests run in 3 minutes, developers run them on every commit. When tests take 45 minutes, developers batch up changes and test less frequently, which means bugs accumulate and are harder to isolate. Speed isn't just a convenience; it directly impacts code quality.
The feedback loop extends beyond just tests. Code review is faster when PRs are small and the CI status is visible. Deployment feedback (monitoring, error rates, performance metrics) closes the loop on whether the code works correctly in production. The goal is to shrink every feedback loop in the development process.
Lesson 6: Pipeline Anatomy
A CI/CD pipeline is a series of automated stages that code passes through from commit to production. While every team's pipeline is unique, most follow a common pattern: trigger → checkout → build → test → deploy. Understanding this anatomy helps you design effective pipelines.
The trigger starts the pipeline. This could be a push to a branch, a pull request being opened, a scheduled time (cron), or a manual trigger. The trigger determines when and why the pipeline runs.
The checkout stage clones your repository onto the CI runner: a fresh, clean machine. This ensures every build starts from a known state, eliminating "works on my machine" problems. The build stage installs dependencies, compiles code (if applicable), and prepares the application.
The test stage is where CI earns its keep. This typically includes unit tests, integration tests, linting, type checking, and security scanning. Tests run in parallel where possible to keep the pipeline fast. If any test fails, the pipeline stops and reports the failure.
The deploy stage (the CD part) takes a validated build and pushes it to an environment: staging for review, or production for users. This might involve building a Docker image, uploading to a cloud provider, or syncing files to a server.
# A typical pipeline flow:
#
# Push Code
#     |
#     v
# +----------+
# | Checkout |   Clone repo to clean runner
# +----+-----+
#      |
#      v
# +----------+
# |  Build   |   Install deps, compile
# +----+-----+
#      |
#      v
# +------------------------------+
# |        Test (parallel)       |
# |  +------+  +------+  +-----+ |
# |  | Lint |  | Unit |  | E2E | |
# |  +------+  +------+  +-----+ |
# +--------------+---------------+
#                |
#                v
# +----------+
# |  Deploy  |   Push to staging/production
# +----------+
Lesson 7: Git as the Foundation
CI/CD is deeply intertwined with Git. Every modern CI/CD system is triggered by Git events: pushes, pull requests, tags, and merges. Understanding how Git workflows feed into CI/CD is essential for designing effective pipelines.
The most common workflow is GitHub Flow: create a feature branch from main, make commits, open a pull request, CI runs automatically on the PR, get code review, merge to main, CD deploys. This simple workflow works for most teams and maps naturally to GitHub Actions.
Branches are your CI/CD entry points. You'll configure different pipeline behaviors for different branches: PRs might run tests and linting, pushes to main might run tests plus deploy to staging, and tagged releases might deploy to production. Branch protection rules ensure that code can't be merged without passing CI.
Pull requests are where CI/CD visibility shines. When you open a PR on GitHub, Actions automatically runs your workflows and reports the status directly on the PR page. Reviewers can see at a glance whether the code passes all checks. You can even require CI to pass before merging; this is called a "status check" requirement.
# Typical Git + CI/CD workflow
git checkout -b feature/add-login
# ... make changes ...
git add .
git commit -m "feat: add login page"
git push origin feature/add-login
# -> CI runs automatically on push
# -> Open PR on GitHub
# -> CI runs again on PR
# -> Reviewers see green checks
# -> Merge to main
# -> CD deploys to staging/production
Pro tip: protect your main branch early. Require status checks to pass and at least one review before merging. This single setting prevents most "broken main" incidents.
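The branch-specific behaviors described above can be sketched in a single workflow; the job layout and deploy script below are illustrative, not prescriptive:

```yaml
on:
  pull_request:
    branches: [main]   # PRs: run tests only
  push:
    branches: [main]   # Pushes to main: tests + staging deploy
    tags: ['v*']       # Tagged releases: production deploy

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  deploy-staging:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh staging      # hypothetical script
  deploy-production:
    runs-on: ubuntu-latest
    needs: test
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production   # hypothetical script
```

One workflow, three behaviors: the trigger filters and if conditions decide which jobs actually run.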
Lesson 8: Green Builds
A "green build" means the CI pipeline has passed: all tests, linting, and checks succeeded. Maintaining a green main branch is perhaps the most important team discipline in CI/CD. When main is green, any team member can confidently branch from it, deploy from it, or base a hotfix on it.
The rule is simple: never merge a failing build. If CI reports a failure on a pull request, the code doesn't get merged until it's fixed. No exceptions, no "I'll fix it later," no "it's just a flaky test." This discipline is what separates teams that successfully practice CI from those that just have a CI server running in the background.
When a build does break on main (it happens to everyone), fixing it becomes the team's top priority. Drop what you're doing and fix the build. The longer main stays broken, the more it blocks other developers and the harder the fix becomes as more changes pile on top.
Flaky tests (tests that sometimes pass and sometimes fail on the same code) are the enemy of green builds. They erode trust in the pipeline. When developers see random failures, they start ignoring CI results, which defeats the entire purpose. Invest time in eliminating flaky tests. Quarantine them if needed, but don't let them pollute your signal.
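One common quarantine pattern in GitHub Actions is a separate job that is allowed to fail. The continue-on-error key is real Actions syntax; the tag-filtering flags on the test commands are hypothetical:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test -- --exclude-tag flaky   # hypothetical flag
  quarantined:
    runs-on: ubuntu-latest
    continue-on-error: true   # failures here never turn the build red
    steps:
      - uses: actions/checkout@v4
      - run: npm test -- --tag flaky           # hypothetical flag
```

The main test job stays trustworthy while the quarantined job keeps producing data on the flaky tests until they're fixed.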
Lesson 9: The CI/CD Landscape
The CI/CD tool market is crowded, with options ranging from fully managed cloud services to self-hosted open-source solutions. Here's a comparison of the major platforms as of 2026:
| Platform | Type | Config | Free Tier | Best For |
|---|---|---|---|---|
| GitHub Actions | Cloud (GitHub-native) | YAML | 2,000 min/month (private), unlimited (public) | GitHub-hosted projects, open source |
| GitLab CI | Cloud + Self-hosted | YAML | 400 min/month | GitLab users, all-in-one DevOps |
| CircleCI | Cloud | YAML | 6,000 min/month | Complex pipelines, Docker-native |
| Jenkins | Self-hosted | Groovy/Declarative | Free (open source) | Maximum customization, legacy |
| Azure DevOps | Cloud + Self-hosted | YAML | 1,800 min/month | Microsoft/.NET ecosystem |
| Travis CI | Cloud | YAML | Limited | Open source (declining popularity) |
| Buildkite | Hybrid | YAML | Free (self-hosted agents) | Large teams, self-hosted runners |
The trend in 2026 is clear: YAML-based configuration, cloud-hosted runners, and deep integration with source control platforms. Jenkins, once the dominant player, has been largely replaced by cloud-native solutions for new projects, though it remains widely used in enterprises with existing investments.
GitHub Actions has become the default choice for projects hosted on GitHub, which is the vast majority of open-source and many commercial projects. Its native integration with GitHub's UI, pull requests, and ecosystem gives it a significant advantage over third-party CI services.
Lesson 10: Why GitHub Actions
GitHub Actions has become the dominant CI/CD platform for good reasons. It's not just another CI tool bolted onto GitHub; it's deeply integrated into the platform in ways that third-party tools can't match.
Native integration is the killer feature. CI status appears directly on pull requests. Workflow files live in your repository. Branch protection rules can require Actions checks. Issues, PRs, releases, and deployments all trigger workflows natively. There's no webhook configuration, no external service to manage, no separate UI to check.
The marketplace offers over 15,000 pre-built actions for everything from deploying to AWS to sending Slack notifications. Instead of writing bash scripts for common tasks, you reference a community action with a single line of YAML. Verified publishers and community ratings help you choose quality actions.
The free tier is generous: unlimited minutes for public repositories, and 2,000 minutes per month for private repos on the free plan (3,000 for Pro/Team, 50,000 for Enterprise). For open-source projects, this means completely free CI/CD with no restrictions.
Other advantages include: YAML-based configuration (easy to read and version-control), matrix builds for testing across multiple OS/language versions, built-in secret management, environment protection rules, and OIDC for secure cloud deployments without storing credentials.
Lesson 11: GitHub Actions Cost
Understanding GitHub Actions pricing helps you optimize your pipeline budget. The good news: for many teams, CI/CD on GitHub Actions is effectively free.
Public repositories get unlimited free minutes on GitHub-hosted runners. This makes GitHub Actions the best deal in CI/CD for open-source projects: zero cost, no limits.
For private repositories, pricing depends on your GitHub plan:
| Plan | Included Minutes/Month | Storage |
|---|---|---|
| Free | 2,000 | 500 MB |
| Pro / Team | 3,000 | 1 GB |
| Enterprise | 50,000 | 50 GB |
Minutes are counted differently by runner OS. Linux minutes count at 1x, Windows at 2x, and macOS at 10x. So 2,000 free minutes means 2,000 Linux minutes or 1,000 Windows minutes or 200 macOS minutes.
In January 2026, GitHub reduced runner prices by up to 39%, making Actions even more cost-effective. Overage pricing (after free minutes are exhausted) starts at $0.008/min for Linux. As of March 2026, self-hosted runners now incur a $0.002/min platform charge; previously, self-hosted runners had zero per-minute cost.
| Runner OS | Per-Minute Rate (Overage) | Minute Multiplier |
|---|---|---|
| Linux (x64) | $0.008 | 1x |
| Windows | $0.016 | 2x |
| macOS | $0.080 | 10x |
| Self-hosted (platform charge, March 2026) | $0.002 | n/a |
Lesson 12: When CI/CD Is Overkill
Is CI/CD ever overkill? Let's consider the scenarios where you might think it's unnecessary: you're working solo, the project is tiny, or it's just a quick prototype. Even in these cases, the answer is almost always: set up CI/CD anyway.
A solo developer with a 10-line script might not need a full deployment pipeline. But even then, a simple CI workflow that runs your tests on push takes 5 minutes to set up and catches the embarrassing mistakes you make at 2 AM. The cost is near zero; the benefit is real.
The only scenarios where CI/CD is genuinely overkill are: one-off scripts you'll never touch again, pure documentation repositories with no build step, or learning/playground repos where you're just experimenting. Even documentation repos benefit from link-checking and spell-checking in CI.
Here's the truth: the "setup cost" of CI/CD has dropped to near zero with GitHub Actions. A basic workflow is 15 lines of YAML. Templates and starter workflows mean you can have CI running in under 5 minutes. The question isn't "is my project big enough for CI/CD?" but "why wouldn't I add it?"
# The simplest useful CI workflow - took 30 seconds to write
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
Track 1 Quiz
Test your knowledge: 5 questions, +25 XP per correct answer
Q1. What is the main difference between Continuous Delivery and Continuous Deployment?
Continuous Delivery means code is always ready to deploy but requires a manual approval step. Continuous Deployment automatically deploys every change that passes tests.
Q2. How many free CI minutes per month do public GitHub repositories get?
Public repositories on GitHub get unlimited free Actions minutes. The 2,000 min/month limit applies to private repos on the free plan.
Q3. What is the ideal CI pipeline execution time?
Under 10 minutes is the ideal target for a CI pipeline. Fast feedback loops encourage developers to commit and test frequently.
Q4. Which minute multiplier applies to macOS runners on GitHub Actions?
macOS runners consume minutes at 10x the rate of Linux runners. A 10-minute macOS job uses 100 of your included minutes.
Q5. What should a team do when the main branch build breaks?
A broken main branch blocks the entire team. Fixing it should be the top priority to maintain the CI discipline and keep everyone productive.
Lesson 1: Workflows
A workflow is an automated process defined in a YAML file that lives in your repository's .github/workflows/ directory. Each YAML file in this directory represents one workflow. GitHub automatically detects and registers these files; no UI configuration is needed.
Workflows are the top-level organizational unit in GitHub Actions. A single repository can have multiple workflows, each triggered by different events and performing different tasks. For example, you might have one workflow for CI (running tests), another for CD (deploying), and another for maintenance tasks (dependency updates, stale issue cleanup).
Every workflow file must define at least two things: when it should run (the on trigger) and what it should do (the jobs section). The name field is optional but highly recommended: it appears in the GitHub Actions UI and makes it much easier to identify workflows at a glance.
# .github/workflows/ci.yml
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Running tests..."
      - run: npm test
Workflow files support all YAML features including comments, anchors, and multi-line strings. They're version-controlled alongside your code, which means changes to your CI/CD pipeline go through the same review process as code changes, a huge advantage over UI-configured CI systems.
Lesson 2: Events (Triggers)
Events are what cause a workflow to run. GitHub Actions supports dozens of event types, covering almost everything that can happen in a GitHub repository. The on key in your workflow file specifies which events trigger the workflow.
The most common events are push (code pushed to a branch), pull_request (PR opened, updated, or closed), and workflow_dispatch (manual trigger from the UI). But Actions can respond to issues being opened, releases being published, schedules (cron), labels being applied, and many more.
You can combine multiple events in a single workflow and filter them by branch, path, or type. For example, you might want tests to run on pushes to main and on all pull requests, but deployment only on published releases.
# Multiple event triggers with filters
on:
  push:
    branches: [main, develop]
    paths:
      - 'src/**'
      - 'tests/**'
  pull_request:
    branches: [main]
    types: [opened, synchronize, reopened]
  schedule:
    - cron: '0 6 * * 1'   # Every Monday at 6 AM UTC
  workflow_dispatch:      # Manual trigger button in UI
  release:
    types: [published]
Event types let you be even more specific. For pull_request, you can trigger on opened, synchronize (new commits pushed), closed, labeled, review_requested, and more. This granularity lets you build sophisticated automation without running unnecessary workflows.
Use paths filters to avoid running CI when only documentation or non-code files change. This saves minutes and keeps your pipeline fast. For example, changes to README.md probably don't need to trigger your full test suite.
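A sketch of such a docs-skipping trigger (the patterns are illustrative):

```yaml
on:
  push:
    branches: [main]
    paths-ignore:
      - '**.md'
      - 'docs/**'
```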
Lesson 3: Jobs
Jobs are the main organizational units within a workflow. Each job runs on a separate runner (virtual machine) and consists of a series of steps. By default, jobs run in parallel: if you have three jobs, all three start simultaneously on separate runners.
Every job must specify a runs-on value that determines which type of runner executes it. You can also give jobs a human-readable name that appears in the GitHub UI. Jobs can depend on other jobs using the needs keyword, creating sequential execution chains.
Jobs are isolated from each other. They run on separate machines, so they don't share file systems, environment variables, or state. If you need to pass data between jobs, you use artifacts or job outputs. This isolation is a feature: it means jobs can't accidentally interfere with each other.
jobs:
  lint:
    name: Lint Code
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  deploy:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [lint, test]   # Only runs after lint AND test pass
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
In this example, lint and test run in parallel (no dependencies), and deploy only runs after both succeed. The if condition further restricts deployment to pushes to main.
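For the job-isolation point above: small values cross job boundaries via job outputs. A sketch in which the version value is made up for illustration:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - id: meta
        # Write a step output; the job maps it to a job output above
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
  release:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - run: echo "Releasing ${{ needs.build.outputs.version }}"
```

For larger files (build artifacts, coverage reports), use the upload-artifact and download-artifact actions instead.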
Lesson 4: Steps
Steps are the individual tasks within a job. They execute sequentially: step 2 doesn't start until step 1 finishes. Each step either runs a shell command (using run) or uses a pre-built action (using uses). Steps share a file system within the same job, so files created by one step are available to subsequent steps.
Steps can have an optional name that appears in the workflow logs, making it much easier to identify what each step does. Without names, steps are labeled by their command or action, which can be cryptic.
Each step runs in a fresh shell by default (bash on Linux/macOS, PowerShell on Windows). Environment variables set with export in one run step don't persist to the next; use the env key or the $GITHUB_ENV file to share environment variables between steps.
steps:
  # Step 1: Use a pre-built action
  - name: Checkout code
    uses: actions/checkout@v4
  # Step 2: Use an action with configuration
  - name: Setup Node.js
    uses: actions/setup-node@v4
    with:
      node-version: '20'
      cache: 'npm'
  # Step 3: Run a shell command
  - name: Install dependencies
    run: npm ci
  # Step 4: Multi-line shell commands
  - name: Run tests and report
    run: |
      echo "Running test suite..."
      npm test -- --coverage
      echo "Tests complete!"
  # Step 5: Step with environment variables
  - name: Deploy
    run: ./deploy.sh
    env:
      API_KEY: ${{ secrets.API_KEY }}
      DEPLOY_ENV: production
The uses keyword references actions by their GitHub repository path and version tag (e.g., actions/checkout@v4). The with keyword passes input parameters to actions. The env keyword sets environment variables for that specific step.
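To persist an environment variable from one step to later steps, append to the $GITHUB_ENV file mentioned above; the variable name here is illustrative:

```yaml
steps:
  - name: Compute build ID
    run: echo "BUILD_ID=build-$(date +%s)" >> "$GITHUB_ENV"
  - name: Use it in a later step
    run: echo "Deploying $BUILD_ID"
```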
Lesson 5: Runners
Runners are the machines that execute your workflow jobs. GitHub provides hosted runners (managed virtual machines) and you can also set up self-hosted runners on your own hardware. Most teams use GitHub-hosted runners for standard workloads.
GitHub-hosted runners come in several flavors:
| Runner Label | OS | CPU | RAM | Storage | Architecture |
|---|---|---|---|---|---|
| ubuntu-latest | Ubuntu Linux | 4 vCPU | 16 GB | 14 GB SSD | x64 |
| windows-latest | Windows Server | 4 vCPU | 16 GB | 14 GB SSD | x64 |
| macos-13 | macOS (Intel) | 4 vCPU | 14 GB | 14 GB SSD | x64 |
| macos-14 / macos-15 | macOS (ARM) | 3 vCPU (M1) | 7 GB | 14 GB SSD | ARM64 |
| ubuntu-24.04-arm (preview) | Ubuntu Linux | 4 vCPU | 16 GB | 14 GB SSD | ARM64 |
| windows-11-arm (preview) | Windows 11 | 4 vCPU | 16 GB | 14 GB SSD | ARM64 |
Each runner starts fresh for every job: it's a brand-new virtual machine with no leftover state from previous jobs. Pre-installed software includes common languages (Python, Node.js, Go, Java, .NET), tools (Docker, git, curl, jq), and package managers. You can see the full list of pre-installed software in the runner images repository.
Self-hosted runners run on your own infrastructure. You install the runner application, connect it to your GitHub organization or repository, and jobs can be routed to it via custom labels. Self-hosted runners are useful when you need specialized hardware (GPUs), network access to internal resources, or specific OS/architecture configurations.
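Jobs reach a self-hosted runner through its labels in runs-on; the gpu label and training script below are hypothetical:

```yaml
jobs:
  train:
    # Matches a self-hosted runner registered with all of these labels
    runs-on: [self-hosted, linux, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: ./train_model.sh   # hypothetical workload
```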
Lesson 6: Actions
Actions are reusable units of code that perform a specific task in your workflow. Instead of writing bash scripts for common operations, you reference an action with the uses keyword. Actions encapsulate complex logic behind a simple interface with well-defined inputs and outputs.
GitHub maintains a set of official actions that form the backbone of most workflows:
| Action | Purpose |
|---|---|
| actions/checkout@v4 | Clone your repository onto the runner |
| actions/setup-python@v5 | Install and configure Python |
| actions/setup-node@v4 | Install and configure Node.js |
| actions/cache@v4 | Cache dependencies between runs |
| actions/upload-artifact@v4 | Upload files from a job |
| actions/download-artifact@v4 | Download artifacts in a later job |
Actions are referenced by a GitHub repository path plus a version, in the form owner/repo@ref, where the ref can be a tag, branch, or commit SHA. Always pin to a specific version (like @v4) rather than using @main, because the author could push breaking changes to the default branch at any time.
steps:
  # Official GitHub action
  - uses: actions/checkout@v4
  # Third-party action from the marketplace
  - uses: peaceiris/actions-gh-pages@v4
    with:
      github_token: ${{ secrets.GITHUB_TOKEN }}
      publish_dir: ./build
  # Action pinned to a specific commit SHA (most secure)
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
Lesson 7: The Marketplace
The GitHub Actions Marketplace hosts over 15,000 community-built actions covering everything from cloud deployments to Slack notifications to code quality checks. Before writing custom bash scripts, check the marketplace first; someone has probably already built what you need.
Finding actions is straightforward: visit github.com/marketplace and search by keyword. Each action has a listing page showing its description, usage examples, inputs/outputs, version history, and community ratings.
Not all marketplace actions are created equal. Here's how to evaluate them:
| Signal | Good Sign | Red Flag |
|---|---|---|
| Publisher | Verified publisher badge ✓ | Unknown publisher, no org |
| Stars | Hundreds or thousands | Single digits |
| Maintenance | Recent commits, active issues | No updates in 12+ months |
| Security | Minimal permissions, pinned deps | Requests broad permissions |
| Documentation | Clear README, examples, changelog | Sparse or missing docs |
Some categories of popular marketplace actions include: cloud deployment (AWS, GCP, Azure), container tools (Docker build/push), code quality (linting, testing, coverage), notifications (Slack, Discord, email), release management (changelog, version bumping), and security scanning (CodeQL, Snyk, Trivy).
Lesson 8: Workflow Syntax
Let's break down every major section of a GitHub Actions workflow file. Understanding the syntax is essential because YAML is sensitive to indentation and structure: a misplaced space can break your entire workflow.
Here's a fully annotated workflow demonstrating the key syntax elements:
# The workflow name (appears in GitHub UI)
name: Complete CI/CD Pipeline

# When this workflow runs
on:
  push:
    branches: [main]     # Only on pushes to main
  pull_request:
    branches: [main]     # PRs targeting main
  workflow_dispatch:     # Manual trigger from UI

# Environment variables available to ALL jobs
env:
  NODE_ENV: production
  APP_NAME: my-app

# The jobs that make up this workflow
jobs:
  # Job ID (used in `needs` references)
  build-and-test:
    # Human-readable name
    name: Build & Test
    # Runner selection
    runs-on: ubuntu-latest
    # Job-level environment variables
    env:
      CI: true
    # Steps execute sequentially
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install
        run: npm ci
      - name: Test
        run: npm test

  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    needs: build-and-test                # Depends on previous job
    if: github.ref == 'refs/heads/main'  # Only on main
    environment: production              # Uses environment protection
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
The hierarchy is: Workflow → Jobs → Steps. The on section defines triggers, env sets variables (at workflow, job, or step level), jobs contains the work, and each job has steps that execute commands or actions.
Lesson 9: Your First Workflow
Let's create your first GitHub Actions workflow from scratch. This is the "Hello World" of CI/CD: a minimal workflow that runs on every push and pull request, demonstrating all the core concepts.
Step 1: In your repository, create the workflows directory:
mkdir -p .github/workflows
Step 2: Create a file called .github/workflows/ci.yml:
name: My First CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  hello-ci:
    name: Hello CI
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Say hello
        run: echo "Hello CI! This is my first workflow."
      - name: Show environment info
        run: |
          echo "Runner OS: $RUNNER_OS"
          echo "GitHub Event: $GITHUB_EVENT_NAME"
          echo "Branch: $GITHUB_REF_NAME"
          echo "Commit: $GITHUB_SHA"
          echo "Repository: $GITHUB_REPOSITORY"
      - name: List files
        run: |
          echo "Repository contents:"
          ls -la
          echo ""
          echo "Working directory: $(pwd)"
Step 3: Commit and push:
git add .github/workflows/ci.yml
git commit -m "ci: add first workflow"
git push origin main
Step 4: Go to the "Actions" tab in your GitHub repository. You'll see your workflow running (or completed). Click on it to see the logs for each step. You should see your "Hello CI!" message, environment information, and file listing.
Congratulations: you've just set up continuous integration. Every push to main and every pull request will now trigger this workflow. In the next lessons, we'll replace the "echo" commands with real build and test steps.
Lesson 10: Triggering Events Deep Dive
Now that you understand the basics, let's explore the full range of workflow triggers and how to configure them precisely.
Push with branch and path filters:
# Only run when Python files in src/ change on main or develop
on:
  push:
    branches:
      - main
      - develop
      - 'release/**'   # Glob pattern: release/1.0, release/2.0, etc.
    paths:             # Note: paths and paths-ignore cannot be combined
      - 'src/**/*.py'  # on the same event; use '!' patterns to exclude
      - 'tests/**/*.py'
      - 'requirements*.txt'
      - '!docs/**'
      - '!*.md'
Pull request with activity types:
# Run on specific PR activities
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled]
    branches: [main]
Scheduled (cron) triggers:
# Run on a schedule (UTC timezone)
on:
  schedule:
    - cron: '0 6 * * 1-5'   # 6 AM UTC, Monday-Friday
    - cron: '0 0 1 * *'     # Midnight UTC, 1st of each month
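For reference, a cron expression has five fields, read left to right; here is an annotated sketch of the weekday schedule above:

```yaml
# ┌───────────── minute (0-59)
# │ ┌─────────── hour (0-23)
# │ │ ┌───────── day of month (1-31)
# │ │ │ ┌─────── month (1-12)
# │ │ │ │ ┌───── day of week (0-6, Sunday = 0)
# │ │ │ │ │
#   0 6 * * 1-5   # 6 AM UTC, Monday through Friday
```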
Manual dispatch with inputs:
# Manual trigger with user-provided parameters
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'staging'
        type: choice
        options: [staging, production]
      dry_run:
        description: 'Dry run (no actual deploy)?'
        type: boolean
        default: true
Workflow run (chain workflows):
# Run after another workflow completes
on:
  workflow_run:
    workflows: ["CI Pipeline"]
    types: [completed]
    branches: [main]
The paths filter is one of the most impactful optimizations. In a monorepo, you can ensure that changes to the frontend only trigger frontend tests, and backend changes only trigger backend tests. This can cut your CI costs and time dramatically.
Lesson 11: Workflow Runs
Once a workflow is triggered, it creates a "workflow run": a specific execution with its own logs, status, and artifacts. Understanding how to read and manage workflow runs is essential for debugging and monitoring your CI/CD pipeline.
In the Actions tab of your repository, you see a list of all workflow runs. Each shows the workflow name, the triggering event (push, PR, schedule), the branch, the commit message, and a status indicator (green check, red X, yellow circle for in-progress, or gray circle for skipped).
Clicking a workflow run shows you the job graph: which jobs ran, their status, and their dependencies. Clicking a specific job shows you each step's logs with timestamps. Failed steps are highlighted in red, and you can expand them to see the exact error message.
Re-running failed jobs: If a job fails due to a transient error (network timeout, flaky test), you can re-run just the failed jobs without re-running the entire workflow. Click "Re-run failed jobs" in the workflow run page. You can also re-run all jobs or a specific job.
Debugging tips:
# Enable debug logging for a re-run:
#   go to the workflow run → "Re-run all jobs" →
#   check "Enable debug logging"
#
# Or set these secrets in your repository:
#   ACTIONS_STEP_DEBUG = true     (step-level debug output)
#   ACTIONS_RUNNER_DEBUG = true   (runner-level debug output)

# Add debug output in your steps:
steps:
  - name: Debug context
    run: |
      echo "Event: ${{ github.event_name }}"
      echo "Ref: ${{ github.ref }}"
      echo "SHA: ${{ github.sha }}"
      echo "Actor: ${{ github.actor }}"
      echo "Workflow: ${{ github.workflow }}"
Lesson 12: Workflow File Naming
GitHub Actions automatically detects any .yml or .yaml file in the .github/workflows/ directory. The filename itself doesn't affect functionality; it's purely for your organizational benefit. However, good naming conventions make a big difference in maintainability.
Common naming conventions:
.github/workflows/
├── ci.yml        # Main CI pipeline (tests, lint)
├── cd.yml        # Deployment pipeline
├── release.yml   # Release creation and publishing
├── codeql.yml    # Security scanning
├── stale.yml     # Stale issue/PR cleanup
├── docs.yml      # Documentation building/deployment
└── nightly.yml   # Scheduled nightly tasks
(Note that Dependabot's configuration is not a workflow; it lives at .github/dependabot.yml and is covered in the security-scanning lesson.)
Some teams prefer more descriptive names, especially in monorepos:
.github/workflows/
├── backend-ci.yml
├── frontend-ci.yml
├── deploy-staging.yml
├── deploy-production.yml
├── e2e-tests.yml
└── infrastructure-validate.yml
The name field inside the YAML file is what appears in the GitHub Actions UI, not the filename. You can name the file ci.yml but give it the display name "Full CI Pipeline with Matrix Testing"; the UI will show the display name.
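For instance (the filename and display name here are purely illustrative):

```yaml
# .github/workflows/ci.yml
name: Full CI Pipeline with Matrix Testing   # what the Actions UI displays
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```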
There's no limit to the number of workflow files you can have, but each active workflow consumes runner minutes when triggered. Be mindful of creating workflows with overlapping triggers that do redundant work.
A common convention is short, descriptive kebab-case filenames (e.g., ci.yml, deploy-production.yml), and human-readable titles for the name field. This keeps the filesystem tidy while making the UI informative.
Track 2 Quiz
Test your knowledge: 5 questions, +25 XP each correct answer
Q1. Where do GitHub Actions workflow files live in a repository?
Workflow files must be placed in the .github/workflows/ directory. GitHub automatically detects any .yml or .yaml file in this location.
Q2. By default, how do jobs in a workflow execute?
Jobs run in parallel by default. To create sequential execution, use the 'needs' keyword to define dependencies between jobs.
Q3. How many vCPUs does a standard GitHub-hosted Linux runner have?
Standard GitHub-hosted Linux runners come with 4 vCPUs, 16 GB RAM, and 14 GB SSD storage.
Q4. What's the most secure way to pin a third-party action?
Pinning to a full commit SHA is the most secure method because SHAs are immutable. Version tags can be moved by the action author, potentially pointing to different (compromised) code.
Q5. Which trigger allows users to run a workflow manually from the GitHub UI?
workflow_dispatch creates a 'Run workflow' button in the Actions tab. It also supports custom input parameters like dropdowns, text fields, and booleans.
Lesson 1: Python CI
Let's build a real-world CI pipeline for a Python project. This workflow installs Python, manages dependencies, runs tests with coverage, and reports results. It's the foundation that most Python projects should start with.
name: Python CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Test Python
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'   # Built-in pip caching
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r requirements-dev.txt
      - name: Run linting
        run: |
          pip install ruff
          ruff check .
          ruff format --check .
      - name: Run type checking
        run: |
          pip install mypy
          mypy src/ --ignore-missing-imports
      - name: Run tests with coverage
        run: |
          pip install pytest pytest-cov
          pytest tests/ -v --cov=src --cov-report=xml --cov-report=term-missing
      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage.xml
This workflow uses actions/setup-python@v5 which handles Python installation and version management. The built-in cache: 'pip' option caches downloaded packages between runs, significantly speeding up dependency installation.
The pipeline follows a logical order: lint first (fastest, catches formatting issues), then type check (catches type errors without running code), then tests (the most thorough but slowest check). Failing early on linting saves time and runner minutes.
Pin a specific Python version (e.g., '3.12') rather than using 3.x. This ensures reproducible builds. When you're ready to test multiple versions, use a matrix strategy (covered in the next lesson).
Lesson 2: Node.js CI
Here's a production-ready CI workflow for a Node.js project. It handles dependency installation with caching, runs tests, and builds the project.
name: Node.js CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Test Node.js
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'   # Caches ~/.npm
      - name: Install dependencies
        run: npm ci   # Clean install from lock file
      - name: Run linting
        run: npm run lint
      - name: Run type checking
        run: npm run typecheck   # If using TypeScript
      - name: Run tests
        run: npm test -- --coverage
      - name: Build
        run: npm run build
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 7
Key Node.js CI best practices: Use npm ci instead of npm install; it's faster, stricter, and ensures reproducibility by installing exactly what's in package-lock.json. The cache: 'npm' option in setup-node caches the npm cache directory between runs.
The build artifact is uploaded with actions/upload-artifact@v4, making the built files available for download or for use in subsequent deployment jobs. The retention-days: 7 setting automatically cleans up old artifacts to save storage.
In a monorepo, you can run npm ci --workspace=packages/my-app to install dependencies only for the package that changed. Combined with path filters, this can dramatically reduce CI time.
Lesson 3: Multi-Version Testing
Testing against a single language version is risky: your code might work on Python 3.12 but break on 3.10. Matrix strategies let you test against multiple versions simultaneously, catching compatibility issues before your users do.
name: Multi-Version Python CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Python ${{ matrix.python-version }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12', '3.13']
      fail-fast: false   # Don't cancel other versions if one fails
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/ -v
The strategy.matrix creates a separate job for each value. With four Python versions, this workflow runs four parallel jobs, each on its own runner. The ${{ matrix.python-version }} expression substitutes the current value in each job.
fail-fast: false is important for version testing. By default, GitHub Actions cancels all matrix jobs when any single one fails. Setting it to false ensures all versions run to completion, giving you a complete picture of compatibility rather than just the first failure.
Always quote version numbers in the matrix: unquoted, 3.10 is interpreted as 3.1 (a float), which will try to install Python 3.1 instead of 3.10; a common and frustrating gotcha.
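You can see the underlying number-vs-string behavior with plain Python, which truncates 3.10 the same way a YAML parser does when the value is unquoted:

```python
unquoted = 3.10    # what YAML sees without quotes: a float
quoted = "3.10"    # what YAML sees with quotes: a string

print(unquoted)  # 3.1  -- the trailing zero is gone
print(quoted)    # 3.10 -- preserved exactly
```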
Lesson 4: Matrix Strategy Deep Dive
Matrix strategies can combine multiple dimensions โ not just language versions. You can test across operating systems, dependency versions, configuration options, and more. This is incredibly powerful for ensuring broad compatibility.
jobs:
  test:
    name: ${{ matrix.os }} / Py ${{ matrix.python }} / ${{ matrix.deps }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python: ['3.11', '3.12']
        deps: [latest, minimal]
      fail-fast: false
      max-parallel: 4   # Limit concurrent jobs
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - name: Install latest dependencies
        if: matrix.deps == 'latest'
        run: pip install -r requirements.txt
      - name: Install minimal dependencies
        if: matrix.deps == 'minimal'
        run: pip install -r requirements-minimal.txt
      - run: pytest tests/
This creates 3 × 2 × 2 = 12 jobs. You can use include to add specific combinations and exclude to remove them:
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    python: ['3.11', '3.12']
    exclude:
      # Don't test 3.11 on macOS (not needed)
      - os: macos-latest
        python: '3.11'
    include:
      # Add a special combination with extra config
      - os: ubuntu-latest
        python: '3.13'
        experimental: true
        deps: nightly
The maximum number of jobs per workflow run is 256. For large matrices, use max-parallel to limit concurrency and avoid overwhelming your runner pool or burning through minutes too quickly.
Lesson 5: Caching
Caching is one of the most impactful optimizations for CI pipelines. Without caching, every workflow run downloads and installs dependencies from scratch. With caching, dependencies are stored between runs, and installation takes seconds instead of minutes.
The actions/cache@v4 action provides a general-purpose caching mechanism. It stores files at a given path, identified by a key. If the key matches on a subsequent run, the cached files are restored instead of being rebuilt.
# Python pip caching
- name: Cache pip packages
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: |
      pip-${{ runner.os }}-

# Node.js npm caching
- name: Cache node modules
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

# Rust cargo caching
- name: Cache cargo
  uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/bin/
      ~/.cargo/registry/index/
      ~/.cargo/registry/cache/
      ~/.cargo/git/db/
      target/
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-
The cache key strategy is crucial. Use hashFiles() to include your lock file in the key โ when dependencies change, the lock file changes, generating a new cache key. The restore-keys fallback allows partial cache matches when the exact key isn't found, which is still faster than a clean install.
Before and after caching comparison:
| Step | Without Cache | With Cache |
|---|---|---|
| Install Python deps | 45-90 seconds | 5-10 seconds |
| Install Node.js deps | 30-60 seconds | 3-8 seconds |
| Rust build | 5-15 minutes | 30-90 seconds |
The setup actions (actions/setup-python@v5 and actions/setup-node@v4) have built-in caching via the cache parameter. Use the built-in caching when available; it's simpler than configuring actions/cache manually.
Lesson 6: Artifacts
Artifacts are files produced during a workflow run that you want to persist: build outputs, test reports, coverage data, compiled binaries, screenshots, and logs. GitHub Actions provides upload-artifact and download-artifact actions to pass files between jobs or make them available for download after the run completes.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: dist-files
          path: dist/
          retention-days: 5   # Auto-delete after 5 days

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Download build output
        uses: actions/download-artifact@v4
        with:
          name: dist-files
          path: dist/
      - name: Deploy
        run: |
          echo "Deploying files from dist/..."
          ls -la dist/
Artifacts are the primary way to pass files between jobs (remember, jobs run on separate runners with separate file systems). A common pattern is to build in one job, then deploy the build output from another job.
You can upload multiple artifacts with different names, and download them selectively. Artifacts are visible in the workflow run UI and can be downloaded as ZIP files by anyone with repository access.
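actions/download-artifact@v4 also supports fetching several artifacts at once by name pattern; a sketch, assuming artifacts were uploaded with names like coverage-unit and coverage-integration:

```yaml
- name: Download all coverage artifacts
  uses: actions/download-artifact@v4
  with:
    pattern: coverage-*    # glob over artifact names
    merge-multiple: true   # combine the matches into one directory
    path: coverage/
```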
Set retention-days to a reasonable value (5-30 days) to avoid accumulating storage costs. Build artifacts that are deployed don't need to be retained for long. Test reports might need longer retention for auditing.
Lesson 7: Test Reporting
Running tests in CI is only half the battle: you also need clear, accessible reporting so the team can quickly understand test results without digging through raw logs.
Coverage reporting shows which lines of code are covered by tests:
- name: Run tests with coverage
  run: |
    pytest tests/ \
      --cov=src \
      --cov-report=xml:coverage.xml \
      --cov-report=term-missing \
      --cov-branch \
      --junitxml=test-results.xml

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    file: coverage.xml
    token: ${{ secrets.CODECOV_TOKEN }}
    fail_ci_if_error: false
Coverage badges provide at-a-glance visibility in your README. Services like Codecov and Coveralls generate badges automatically and can be configured to fail PRs that decrease coverage below a threshold.
For Node.js projects, similar reporting is available:
- name: Run tests with coverage
  run: |
    npx jest --coverage --ci --reporters=default --reporters=jest-junit
  env:
    JEST_JUNIT_OUTPUT_DIR: ./reports

- name: Upload test results
  uses: actions/upload-artifact@v4
  if: always()   # Upload even if tests fail
  with:
    name: test-results
    path: |
      reports/
      coverage/
The if: always() condition ensures test reports are uploaded even when tests fail, which is exactly when you need them most.
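always() is one of several status-check functions available in if: conditions; the scripts below are placeholders:

```yaml
- run: ./notify-team.sh
  if: success()     # the default when no if: is given
- run: ./collect-crash-logs.sh
  if: failure()     # an earlier step in this job failed
- run: ./cleanup.sh
  if: always()      # runs on success, failure, or cancellation
- run: ./report-cancelled.sh
  if: cancelled()   # the run was cancelled
```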
Lesson 8: Linting
Linting catches code quality issues, style violations, and potential bugs without running the code. It's typically the fastest check in your pipeline and should run first: catching a formatting issue in 10 seconds is better than waiting 5 minutes for tests to reveal it.
Python linting with Ruff (extremely fast, replaces flake8 + isort + many more):
name: Lint

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install Ruff
        run: pip install ruff
      - name: Check linting
        run: ruff check .
      - name: Check formatting
        run: ruff format --check .
JavaScript/TypeScript linting with ESLint and Prettier:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: ESLint
        run: npx eslint . --max-warnings 0
      - name: Prettier check
        run: npx prettier --check "src/**/*.{ts,tsx,js,jsx,json,css}"
The --max-warnings 0 flag for ESLint treats warnings as errors in CI, preventing gradual accumulation of code quality issues that everyone ignores.
Run ruff format . or npx prettier --write . locally to fix issues before pushing. CI should only check, never auto-fix, to keep the pipeline predictable.
Lesson 9: Type Checking
Static type checking catches an entire class of bugs at compile time: null reference errors, wrong argument types, missing return values, and more. Adding type checking to your CI pipeline catches these errors before they reach production.
Python type checking with mypy:
- name: Type check with mypy
  run: |
    pip install mypy
    mypy src/ --strict --ignore-missing-imports

# Or with pyright (faster, used by VS Code):
- name: Type check with pyright
  run: |
    pip install pyright
    pyright src/
TypeScript type checking:
- name: Type check
  run: npx tsc --noEmit   # --noEmit checks types without producing output files
Type checking often catches bugs that unit tests miss, especially around edge cases, null handling, and function contracts. A function that accepts str but is sometimes called with None won't be caught by tests that only pass strings, but a type checker will flag it immediately.
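A minimal illustration of that exact scenario (the function and values are made up for the demo): mypy or pyright rejects the None call at analysis time, while at runtime the bug only surfaces when the bad call actually executes:

```python
def shout(message: str) -> str:
    # A type checker flags shout(None): None is not compatible with str.
    return message.upper() + "!"

print(shout("deploy"))  # DEPLOY!

try:
    shout(None)  # a checker rejects this line before the code ever runs
except AttributeError as exc:
    print(f"Runtime failure a type checker would have prevented: {exc}")
```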
For Python projects, you can start with relaxed type checking and gradually make it stricter:
# pyproject.toml - Start lenient, tighten over time
[tool.mypy]
python_version = "3.12"
warn_return_any = true
warn_unused_configs = true
check_untyped_defs = true
# Enable stricter checks as you add type annotations:
# disallow_untyped_defs = true
# strict = true
Start with check_untyped_defs = true, then gradually enable stricter settings as you add type annotations. CI can enforce that type coverage doesn't decrease.
Lesson 10: Security Scanning
CI is the perfect place to catch security vulnerabilities before they reach production. Automated security scanning can detect known vulnerabilities in dependencies, find code-level security issues, and ensure compliance with security policies.
CodeQL (GitHub's own semantic code analysis):
name: CodeQL Analysis

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'   # Weekly scan

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: python, javascript
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
Dependabot automatically creates PRs to update vulnerable dependencies. Configure it in .github/dependabot.yml:
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
2026 Security Roadmap: GitHub is adding several new security features to Actions: deterministic dependency locking (a dependencies: section in workflow YAML), scoped secrets with fine-grained access, a native egress firewall to control outbound network traffic, and policy-driven execution controls with actor and event rules via rulesets. These features address the most common supply-chain attack vectors in CI/CD.
Lesson 11: Build Artifacts
Many projects need a build step that compiles source code, bundles assets, or produces distributable packages. CI is the ideal place to produce build artifacts because the build environment is clean, reproducible, and consistent across all team members.
name: Build & Package

on:
  push:
    branches: [main]
    tags: ['v*']

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install and build
        run: |
          npm ci
          npm run build
        env:
          NODE_ENV: production
      - name: Upload build
        uses: actions/upload-artifact@v4
        with:
          name: production-build
          # Exclude source maps from the uploaded artifact
          path: |
            dist/
            !dist/**/*.map
          retention-days: 30

  package-python:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Build Python package
        run: |
          pip install build
          python -m build   # Creates sdist and wheel in dist/
      - name: Upload package
        uses: actions/upload-artifact@v4
        with:
          name: python-package
          path: dist/*.whl
Build artifacts produced in CI have important properties: they're built from the exact commit that was tested, they're built in a clean environment (no leftover state from previous builds), and they're identical for everyone; no "it builds differently on my machine" problems.
Lesson 12: Fail Fast
The "fail fast" principle means stopping the pipeline as early as possible when a problem is detected. There's no point running a 10-minute test suite if a 10-second lint check would have caught the issue. Structuring your pipeline to fail fast saves time and runner minutes.
Pipeline ordering for fast failure:
jobs:
  # Stage 1: Fast checks (seconds)
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ruff && ruff check .
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install mypy && mypy src/

  # Stage 2: Tests (minutes), only if fast checks pass
  test:
    needs: [lint, typecheck]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt && pytest

  # Stage 3: Build (minutes), only if tests pass
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp .
In matrix strategies, fail-fast controls whether other matrix jobs are cancelled when one fails:
strategy:
  fail-fast: true   # Default: cancel all if one fails
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
Set fail-fast: true (the default) when you want quick feedback: if the code fails on Ubuntu, it'll probably fail on macOS too, so why wait? Set fail-fast: false when you need the full picture, like when debugging a platform-specific issue.
Lesson 13: Parallel Jobs
Running jobs in parallel is one of the simplest ways to speed up your CI pipeline. Instead of running lint, test, and build sequentially (total time = sum of all), run them simultaneously (total time = longest single job).
name: Parallel CI

on: [push, pull_request]

jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ruff && ruff check .   # ~10 seconds

  typecheck:
    name: Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install pyright && pyright src/   # ~30 seconds

  test-unit:
    name: Unit Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          pip install -r requirements.txt
          pytest tests/unit/ -v   # ~2 minutes

  test-integration:
    name: Integration Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: testpass
        ports: ['5432:5432']
    steps:
      - uses: actions/checkout@v4
      - run: |
          pip install -r requirements.txt
          pytest tests/integration/ -v   # ~3 minutes

  # This diagram shows the parallel execution:
  #
  # ┌────────┐  ┌────────────┐  ┌────────────┐  ┌───────────────────┐
  # │  Lint  │  │ Type Check │  │ Unit Tests │  │ Integration Tests │
  # │  ~10s  │  │    ~30s    │  │   ~2min    │  │       ~3min       │
  # └────────┘  └────────────┘  └────────────┘  └───────────────────┘
  #      │            │               │                  │
  #      └────────────┴───────┬───────┴──────────────────┘
  #                           │
  #              Total: ~3 minutes (not 5.5 min sequential)

  deploy:
    name: Deploy
    needs: [lint, typecheck, test-unit, test-integration]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - run: echo "All checks passed! Deploying..."
With parallel execution, the total CI time equals the duration of the slowest job (3 minutes for integration tests) rather than the sum of all jobs (5.5 minutes). For larger pipelines, the savings are even more dramatic.
Track 3 Quiz
Test your knowledge: 4 questions, +25 XP each correct answer
Q1. Why should you use `npm ci` instead of `npm install` in CI pipelines?
`npm ci` does a clean install from package-lock.json, is faster (skips some resolution steps), and fails if the lock file is out of sync with package.json, ensuring reproducible builds.
Q2. Why must you quote Python version '3.10' in a YAML matrix?
In YAML, 3.10 without quotes is parsed as the floating-point number 3.1. You must quote it as '3.10' to preserve it as a string, ensuring Python 3.10 (not 3.1) is installed.
Q3. What is the maximum number of jobs per workflow run in a matrix strategy?
GitHub Actions supports a maximum of 256 jobs per workflow run. Large matrices should be designed with this limit in mind.
Q4. What does `fail-fast: false` do in a matrix strategy?
By default, when one matrix job fails, all other running jobs are cancelled. Setting fail-fast to false lets all jobs run to completion, giving you the full picture of which combinations pass and which fail.
Lesson 1: Repository Secrets
Secrets are encrypted values that you store in your repository's settings and access in workflows. They're used for sensitive data like API keys, access tokens, database passwords, and signing certificates. GitHub encrypts secrets using libsodium sealed boxes and never exposes them in logs.
To create a secret: go to your repository → Settings → Secrets and variables → Actions → New repository secret. Give it a name (uppercase with underscores by convention) and a value.
Important rules about secrets:
- Secrets are not passed to workflows triggered by pull requests from forks (security protection)
- Secret values cannot be read back from the UI after creation; they can only be updated or deleted
- If a secret value is accidentally printed in logs, GitHub automatically masks it with ***
- Secrets are decrypted and exposed to the runner process only where you explicitly reference them
- Limits: 100 secrets per repository (plus 100 per environment and 1,000 per organization), each up to 48 KB
# Using secrets in a workflow
steps:
  - name: Deploy to production
    run: ./deploy.sh
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      DATABASE_URL: ${{ secrets.DATABASE_URL }}
Lesson 2: Using Secrets
Accessing secrets in workflows uses the ${{ secrets.SECRET_NAME }} expression syntax. Secrets can be passed to steps as environment variables, used in with parameters for actions, or used in conditional expressions.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Pass as environment variable
      - name: Deploy
        run: |
          echo "Deploying to $DEPLOY_TARGET..."
          curl -X POST "$API_ENDPOINT/deploy" \
            -H "Authorization: Bearer $API_TOKEN" \
            -d '{"version": "${{ github.sha }}"}'
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}
          API_ENDPOINT: ${{ secrets.API_ENDPOINT }}
          DEPLOY_TARGET: production

      # Pass as action input
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      # Use in conditional
      - name: Notify Slack
        if: secrets.SLACK_WEBHOOK != ''
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text": "Deployment complete!"}'
If you reference a secret that doesn't exist, it evaluates to an empty string; it doesn't cause an error. This can be useful for optional integrations (like the Slack example above) but can also silently break your workflow if you misspell a secret name.
Prefer env: to pass secrets as environment variables rather than interpolating them directly in run: commands. This is more secure because environment variables are handled by the OS, while ${{ }} interpolation is expanded into the script text before it runs and could be logged in debug mode.
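Passing the secret through env: and checking for the empty string gives a safe pattern for optional integrations (the webhook secret name and payload are illustrative):

```yaml
- name: Notify Slack (optional)
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}   # empty if the secret is unset
  run: |
    if [ -z "$SLACK_WEBHOOK" ]; then
      echo "SLACK_WEBHOOK not configured; skipping notification"
      exit 0
    fi
    curl -X POST "$SLACK_WEBHOOK" -d '{"text": "Deployment complete!"}'
```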
Lesson 3: Organization Secrets
If you manage multiple repositories, creating the same secrets in each one is tedious and error-prone. Organization secrets let you define secrets once at the organization level and share them across selected repositories.
Organization secrets are created in your organization's settings → Secrets and variables → Actions. You can control access with three visibility levels:
| Visibility | Description |
|---|---|
| All repositories | Every repo in the org can access the secret |
| Private repositories | Only private repos can access it |
| Selected repositories | Only specific repos you choose can access it |
When a repository has a secret with the same name as an organization secret, the repository secret takes precedence. This allows individual repos to override organization defaults when needed.
# Organization secrets are accessed the same way as repo secrets
steps:
  - name: Deploy with org-wide AWS credentials
    run: aws s3 sync ./build s3://my-bucket/
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.ORG_AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.ORG_AWS_SECRET_ACCESS_KEY }}
Lesson 4: Environment Secrets
Environments in GitHub Actions represent deployment targets like staging, production, and preview. Each environment can have its own secrets and variables, allowing the same workflow to use different credentials depending on where it's deploying.
Create environments in your repository → Settings → Environments. Each environment can have unique secrets (e.g., production has the production database URL, staging has the staging database URL).
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging   # Uses staging secrets
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: ./deploy.sh
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}   # Staging DB URL
          API_KEY: ${{ secrets.API_KEY }}             # Staging API key

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production   # Uses production secrets
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./deploy.sh
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}   # Production DB URL
          API_KEY: ${{ secrets.API_KEY }}             # Production API key
The secret names are identical in both jobs (DATABASE_URL, API_KEY), but the values differ because each job runs in a different environment. This keeps your workflow DRY while maintaining environment isolation.
Environment secrets take precedence over repository secrets of the same name, so you can define a fallback DATABASE_URL at the repo level and override it in each environment.
Lesson 5: Variables
Variables are like secrets but for non-sensitive configuration values. They're stored in plain text and visible in the repository settings. Use variables for values like feature flags, deployment URLs, version numbers, and configuration options that aren't secret.
Variables are accessed with ${{ vars.VARIABLE_NAME }} (note: vars, not secrets). Like secrets, variables exist at repository, environment, and organization levels.
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Configure deployment
        run: |
          echo "Deploying to: ${{ vars.DEPLOY_URL }}"
          echo "Region: ${{ vars.AWS_REGION }}"
          echo "Feature flags: ${{ vars.FEATURE_FLAGS }}"
          echo "Max instances: ${{ vars.MAX_INSTANCES }}"
      - name: Build with configuration
        run: npm run build
        env:
          NEXT_PUBLIC_API_URL: ${{ vars.API_URL }}
          NEXT_PUBLIC_ENV: ${{ vars.ENVIRONMENT_NAME }}

# Example variable values:
#
# Repository level:
#   APP_NAME = "my-app"
#   LOG_LEVEL = "info"
#
# Environment: staging
#   DEPLOY_URL = "https://staging.example.com"
#   AWS_REGION = "us-east-1"
#   MAX_INSTANCES = "2"
#
# Environment: production
#   DEPLOY_URL = "https://example.com"
#   AWS_REGION = "us-east-1"
#   MAX_INSTANCES = "10"
Lesson 6: Environment Protection Rules
Environment protection rules add safety gates to your deployment pipeline. They prevent accidental deployments and enforce approval workflows, especially important for production environments.
Protection rules are configured per environment in repository settings. Available rules include:
| Rule | Description | Settings |
|---|---|---|
| Required reviewers | Specified people must approve before deploy | Up to 6 reviewers |
| Wait timer | Delay between approval and deployment | 1 to 43,200 minutes (30 days) |
| Deployment branches | Only specific branches can deploy | Branch name patterns or tags |
jobs:
deploy-production:
runs-on: ubuntu-latest
environment:
name: production
url: https://example.com # Shows as deployment URL in GitHub UI
steps:
- uses: actions/checkout@v4
- name: Deploy to production
run: ./deploy.sh
# When this job reaches the 'production' environment:
# 1. GitHub pauses the workflow
# 2. Required reviewers receive a notification
# 3. A reviewer approves (or rejects) in the GitHub UI
# 4. If a wait timer is set, the countdown begins
# 5. Only after all conditions are met does the job execute
# 6. Only pushes to allowed branches can trigger this workflow
Protection rules create an audit trail. Every deployment shows who approved it, when, and from which branch. This is invaluable for regulated industries and for post-incident analysis.
Lesson 7: GITHUB_TOKEN
GITHUB_TOKEN is a special secret that GitHub automatically creates for every workflow run. It provides authenticated access to the GitHub API and repository, scoped to the current repository. You don't need to create or manage it — it's always available.
By default, GITHUB_TOKEN has read/write access to common permissions, but you can (and should) restrict it using the permissions key:
jobs:
release:
runs-on: ubuntu-latest
# Restrict token to only what's needed
permissions:
contents: write # Create releases
pull-requests: read # Read PR info
issues: read # Read issue info
steps:
- uses: actions/checkout@v4
- name: Create release
run: |
gh release create "v1.0.0" \
--title "Release v1.0.0" \
--notes "First release" \
--target main
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Token also used implicitly by many actions
- name: Comment on PR
uses: actions/github-script@v7
with:
script: |
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: 'CI passed! Ready for review.'
})
Available permission scopes include: id-token, contents, actions, attestations, checks, deployments, discussions, issues, models, packages, pages, pull-requests, security-events, and statuses.
Always set explicit permissions in your workflow. The default permissions are often broader than needed. Following the principle of least privilege, grant only the specific permissions your workflow requires.
Lesson 8: OIDC
OpenID Connect (OIDC) lets your GitHub Actions workflows authenticate to cloud providers (AWS, GCP, Azure) without storing long-lived credentials as secrets. Instead, GitHub generates a short-lived token for each workflow run, which the cloud provider verifies and exchanges for temporary credentials.
This is significantly more secure than storing access keys as secrets. OIDC tokens are short-lived (valid only for the duration of the job), automatically scoped to the specific repository and workflow, and leave no permanent credentials to rotate or leak.
AWS OIDC authentication:
jobs:
deploy:
runs-on: ubuntu-latest
permissions:
id-token: write # Required for OIDC
contents: read
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials via OIDC
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
aws-region: us-east-1
# No access keys needed!
- name: Deploy to S3
run: aws s3 sync ./build s3://my-app-bucket/
GCP OIDC authentication:
- name: Authenticate to Google Cloud
uses: google-github-actions/auth@v2
with:
workload_identity_provider: 'projects/123/locations/global/workloadIdentityPools/github/providers/my-repo'
service_account: 'deploy@my-project.iam.gserviceaccount.com'
Azure OIDC authentication:
- name: Azure login
uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
Lesson 9: The env Context
Environment variables in GitHub Actions can be set at three levels: workflow, job, and step. Each level's variables are available to all children. Step-level variables override job-level, which override workflow-level.
name: Environment Variables Demo
# Workflow-level: available to ALL jobs and steps
env:
APP_NAME: my-application
LOG_LEVEL: info
jobs:
build:
runs-on: ubuntu-latest
# Job-level: available to all steps in this job
env:
NODE_ENV: production
BUILD_NUMBER: ${{ github.run_number }}
steps:
- name: Show variables
# Step-level: only available in this step
env:
STEP_VAR: hello
run: |
echo "App: $APP_NAME" # Workflow-level
echo "Node env: $NODE_ENV" # Job-level
echo "Build: $BUILD_NUMBER" # Job-level
echo "Step var: $STEP_VAR" # Step-level
- name: Set dynamic variable
run: echo "COMMIT_SHORT=$(git rev-parse --short HEAD)" >> $GITHUB_ENV
- name: Use dynamic variable
run: echo "Short commit: $COMMIT_SHORT"
The $GITHUB_ENV file is special — writing name=value pairs to it sets environment variables for all subsequent steps in the same job. This is how you dynamically set variables based on previous step results.
For multi-line values:
# Multi-line environment variable
echo "JSON_CONFIG<<EOF" >> $GITHUB_ENV
echo '{"key": "value", "nested": {"a": 1}}' >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
Use $GITHUB_ENV sparingly — it's a shared mutable state between steps, which can make workflows harder to debug. Prefer step outputs ($GITHUB_OUTPUT) for passing data between steps when possible.
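As a sketch of that preference, the dynamic short-SHA value from the earlier example can be passed as a step output instead (the step id meta is illustrative):

```yaml
steps:
  - name: Compute short SHA
    id: meta
    run: echo "commit_short=$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"
  - name: Use the output
    run: echo "Short commit: ${{ steps.meta.outputs.commit_short }}"
```

Because the value is namespaced under steps.meta.outputs, a later step can't accidentally clobber it the way a shared environment variable could be.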
Lesson 10: Contexts & Expressions
GitHub Actions provides several contexts — objects containing information about the workflow run, repository, event, and environment. You access them with the ${{ }} expression syntax.
| Context | Contains | Example |
|---|---|---|
| github | Event info, repo, branch, SHA, actor | ${{ github.ref_name }} |
| env | Environment variables | ${{ env.MY_VAR }} |
| secrets | Encrypted secrets | ${{ secrets.API_KEY }} |
| steps | Outputs from previous steps | ${{ steps.build.outputs.version }} |
| matrix | Current matrix values | ${{ matrix.os }} |
| runner | Runner info (OS, arch, temp dir) | ${{ runner.os }} |
| job | Current job status | ${{ job.status }} |
| needs | Outputs from dependent jobs | ${{ needs.build.outputs.version }} |
Expressions support operators, functions, and conditionals:
steps:
# String comparison
- if: github.ref == 'refs/heads/main'
run: echo "On main branch"
# Boolean logic
- if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: echo "Push to main"
# Contains function
- if: contains(github.event.head_commit.message, '[skip ci]')
run: echo "Skipping CI"
# String functions
- if: startsWith(github.ref, 'refs/tags/v')
run: echo "Tag push: ${{ github.ref_name }}"
# Status functions
- if: always() # Run even if previous steps failed
run: echo "Always runs"
- if: failure() # Only if a previous step failed
run: echo "Something failed!"
# Format function
- run: echo "${{ format('Hello {0}, run #{1}', github.actor, github.run_number) }}"
Use github.event_name to vary behavior between push and PR triggers, and github.ref_name to get the branch or tag name. These two properties cover most conditional logic needs.
Lesson 11: Security Best Practices
CI/CD pipelines have access to your secrets, your code, and your production infrastructure. A compromised pipeline is a compromised system. Follow these security best practices to minimize risk.
1. Never echo secrets:
# WRONG — can leak in logs
- run: echo "Token is ${{ secrets.API_TOKEN }}"
# CORRECT — pass as environment variable
- run: curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com
env:
API_TOKEN: ${{ secrets.API_TOKEN }}
2. Pin actions to commit SHAs:
# RISKY — tag can be moved to malicious code
- uses: actions/checkout@v4
# SECURE — SHA is immutable
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
3. Minimal permissions:
# Set restrictive permissions at workflow level
permissions:
contents: read
jobs:
deploy:
permissions:
contents: read
id-token: write # Only the deploy job needs OIDC
4. Protect against script injection:
# VULNERABLE — PR title could contain malicious commands
- run: echo "PR: ${{ github.event.pull_request.title }}"
# SAFE — pass untrusted input via environment variable
- run: echo "PR: $PR_TITLE"
env:
PR_TITLE: ${{ github.event.pull_request.title }}
5. Limit secret exposure: Only pass secrets to the specific steps that need them. Don't set them at the workflow or job level unless every step in the job needs access.
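A minimal sketch of this principle (sign.sh and SIGNING_KEY are hypothetical names) attaches the secret only to the step that uses it:

```yaml
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # No secrets here: the build step doesn't need them
      - run: npm run build
      # Secret exposed only to the signing step
      - name: Sign artifact
        run: ./sign.sh dist/app.tar.gz
        env:
          SIGNING_KEY: ${{ secrets.SIGNING_KEY }}
```

If any earlier step is compromised, it never sees SIGNING_KEY in its environment.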
Workflows triggered by pull requests from forks don't receive your secrets by default. Don't use pull_request_target to bypass this — it gives fork PRs access to your secrets, which is a critical security vulnerability when combined with checking out untrusted code.
Lesson 12: Secret Rotation
Secrets should be rotated regularly — even if you don't suspect a breach. Rotation limits the window of exposure if a secret is compromised and ensures that old credentials can't be used by former team members or leaked logs.
Manual rotation process:
- Generate new credentials in the external service (AWS, API provider, etc.)
- Update the secret in GitHub (Settings → Secrets → Update)
- Verify the workflow still works with the new credentials
- Revoke the old credentials in the external service
Automated rotation with a scheduled workflow:
name: Rotate Secrets
on:
schedule:
- cron: '0 0 1 * *' # Monthly on the 1st
workflow_dispatch: # Manual trigger for emergency rotation
jobs:
rotate:
runs-on: ubuntu-latest
permissions:
id-token: write
steps:
- name: Authenticate to AWS via OIDC
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/SecretRotationRole
aws-region: us-east-1
- name: Rotate API key
run: |
# Generate new key
NEW_KEY=$(aws secretsmanager get-random-password --query 'RandomPassword' --output text)
# Update in AWS Secrets Manager
aws secretsmanager update-secret --secret-id my-api-key --secret-string "$NEW_KEY"
# Update in GitHub Secrets via API
# (requires a GitHub App or PAT with repo admin permissions)
echo "Secret rotated successfully"
2026 Update — Scoped Secrets: GitHub's 2026 security roadmap includes scoped secrets, which will allow even finer-grained control over which workflows, jobs, and steps can access specific secrets. This reduces the blast radius if any part of your pipeline is compromised.
The best secret is no secret: prefer OIDC for cloud authentication, use GITHUB_TOKEN for GitHub API access (automatically generated and short-lived), and rotate remaining secrets on a regular schedule.
Track 4 Quiz
Test your knowledge — 4 questions, +25 XP each correct answer
Q1. What happens when you reference a secret that doesn't exist?
A non-existent secret evaluates to an empty string without causing an error. This can be useful for optional secrets but can also silently break workflows if you misspell a secret name.
Q2. What is the main advantage of OIDC over stored access keys?
OIDC generates short-lived tokens for each workflow run. There are no long-lived credentials stored as secrets, eliminating the risk of leaked keys and the need for manual rotation.
Q3. How many required reviewers can an environment protection rule have?
Environment protection rules support up to 6 required reviewers. All specified reviewers must approve before the deployment proceeds.
Q4. What is the secure way to use untrusted input (like PR titles) in a run command?
Always pass untrusted input via environment variables. Direct interpolation in `run:` commands is vulnerable to script injection — a malicious PR title could execute arbitrary commands.
Lesson 1: Job Dependencies
By default, jobs in a workflow run in parallel. The needs keyword creates dependencies between jobs, forcing them to run sequentially. This is essential when one job's output is required by another — you can't deploy before tests pass.
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm run lint
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm test
build:
needs: [lint, test] # Waits for BOTH lint and test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm run build
deploy-staging:
needs: build
runs-on: ubuntu-latest
steps:
- run: echo "Deploying to staging..."
deploy-production:
needs: deploy-staging
runs-on: ubuntu-latest
environment: production
steps:
- run: echo "Deploying to production..."
# Execution graph:
# lint ──┐
#        ├──> build ──> deploy-staging ──> deploy-production
# test ──┘
The needs keyword accepts a single job ID or an array of job IDs. A job with multiple dependencies waits until all of them complete successfully. If any dependency fails, the dependent job is skipped (unless you use if: always()).
You can also create complex dependency graphs. For example, staging deployment might need both build and security-scan, while production deployment needs staging deployment plus manual approval.
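A sketch of such a graph (security-scan is an illustrative job name; the production gate comes from environment protection rules):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Scanning..."
  deploy-staging:
    needs: [build, security-scan]   # Fan-in: waits for both jobs
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying to staging..."
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production         # Manual approval via protection rules
    steps:
      - run: echo "Deploying to production..."
```

build and security-scan run in parallel; deploy-staging starts only after both succeed.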
Lesson 2: Conditional Execution
The if keyword controls whether a job or step runs. Combined with expressions and status functions, you can build sophisticated conditional logic in your workflows.
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm test
deploy:
needs: test
runs-on: ubuntu-latest
# Only deploy on push to main (not PRs)
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
steps:
- run: echo "Deploying..."
notify-success:
needs: deploy
runs-on: ubuntu-latest
# Only if deploy succeeded
if: success()
steps:
- run: echo "Deployment succeeded!"
notify-failure:
needs: [test, deploy]
runs-on: ubuntu-latest
# Only if any previous job failed
if: failure()
steps:
- run: echo "Something failed! Check the logs."
cleanup:
needs: [test, deploy]
runs-on: ubuntu-latest
# Always run, even if previous jobs failed or were cancelled
if: always()
steps:
- run: echo "Cleaning up resources..."
report:
needs: test
runs-on: ubuntu-latest
# Only on cancelled workflows
if: cancelled()
steps:
- run: echo "Workflow was cancelled"
The four status functions are: success() (default if no if specified — runs only if all dependencies succeeded), failure() (at least one dependency failed), always() (runs regardless of dependency status), and cancelled() (workflow was cancelled).
The if: always() condition is essential for cleanup steps. Without it, resource cleanup only runs on success — and failed runs are exactly when you need cleanup most (e.g., tearing down test infrastructure).
Lesson 3: Reusable Workflows
Reusable workflows let you define a workflow once and call it from other workflows — like functions for your CI/CD. This eliminates duplication when multiple repositories or workflows need the same pipeline logic.
The reusable workflow (callee) — defined with the workflow_call trigger:
# .github/workflows/reusable-test.yml
name: Reusable Test Pipeline
on:
workflow_call:
inputs:
python-version:
description: 'Python version to test'
required: false
type: string
default: '3.12'
run-coverage:
description: 'Whether to run coverage'
required: false
type: boolean
default: true
secrets:
codecov-token:
required: false
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
- run: pip install -r requirements.txt
- run: pytest tests/ -v ${{ inputs.run-coverage && '--cov=src' || '' }}
- if: inputs.run-coverage
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.codecov-token }}
The caller workflow references the reusable workflow with the uses key:
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
test-3-11:
uses: ./.github/workflows/reusable-test.yml
with:
python-version: '3.11'
secrets:
codecov-token: ${{ secrets.CODECOV_TOKEN }}
test-3-12:
uses: ./.github/workflows/reusable-test.yml
with:
python-version: '3.12'
secrets:
codecov-token: ${{ secrets.CODECOV_TOKEN }}
# Can also call workflows from other repos:
# uses: my-org/shared-workflows/.github/workflows/test.yml@main
Many organizations maintain a central repository (e.g., my-org/ci-workflows) for reusable workflows. All repos in your organization can reference them. This creates a single source of truth for your CI/CD standards.
Lesson 4: Composite Actions
Composite actions bundle multiple steps into a single reusable action. While reusable workflows operate at the workflow level, composite actions operate at the step level — they're used within a job's steps. Think of them as custom functions you can call from any workflow.
# .github/actions/setup-and-test/action.yml
name: 'Setup and Test Python'
description: 'Install Python, dependencies, and run tests'
inputs:
python-version:
description: 'Python version'
required: false
default: '3.12'
test-path:
description: 'Path to test directory'
required: false
default: 'tests/'
outputs:
coverage:
description: 'Test coverage percentage'
value: ${{ steps.test.outputs.coverage }}
runs:
using: 'composite'
steps:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
cache: 'pip'
- name: Install dependencies
shell: bash
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest pytest-cov
- name: Run tests
id: test
shell: bash
run: |
COVERAGE=$(pytest ${{ inputs.test-path }} --cov=src --cov-report=term | grep 'TOTAL' | awk '{print $4}')
echo "coverage=$COVERAGE" >> $GITHUB_OUTPUT
Using the composite action in a workflow:
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup and test
id: test
uses: ./.github/actions/setup-and-test
with:
python-version: '3.12'
- name: Report coverage
run: echo "Coverage: ${{ steps.test.outputs.coverage }}"
Unlike reusable workflows, composite actions live in their own directory (with an action.yml file) and work at the step level, giving you more flexibility in how you compose your jobs.
Lesson 5: Concurrency
Concurrency controls prevent multiple runs of the same workflow from executing simultaneously. This is critical for deployments — you don't want two deployment workflows racing to push different versions to production.
name: Deploy
on:
push:
branches: [main]
# Concurrency group: only one run per branch
concurrency:
group: deploy-${{ github.ref }}
cancel-in-progress: true # Cancel older runs when new one starts
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: ./deploy.sh
The group key defines the concurrency scope. All workflow runs with the same group key are considered part of the same concurrency group. Only one run in a group can be active at a time.
cancel-in-progress: true cancels the currently running workflow when a new one starts. This is ideal for CI on feature branches — if you push three commits in quick succession, only the last one needs to be tested. Without cancel-in-progress, the new run queues and waits.
# Common concurrency patterns:
# Per-branch (CI): cancel old runs on same branch
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
# Per-environment (deploy): queue, don't cancel
concurrency:
group: deploy-production
cancel-in-progress: false
# Per-PR: cancel old runs for same PR
concurrency:
group: pr-${{ github.event.pull_request.number }}
cancel-in-progress: true
Use cancel-in-progress: true for CI workflows (no need to test superseded commits) and cancel-in-progress: false for deployments (you want them to complete in order, not skip versions).
Lesson 6: Timeout
Every GitHub Actions job has a default timeout of 360 minutes (6 hours). If your job doesn't complete within this window, it's automatically cancelled. You should set explicit, shorter timeouts to catch hung jobs early and conserve runner minutes.
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 15 # Kill if tests take more than 15 minutes
steps:
- uses: actions/checkout@v4
- run: npm test
build:
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- uses: actions/checkout@v4
- run: npm run build
deploy:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: actions/checkout@v4
- name: Deploy with step timeout
timeout-minutes: 5 # Individual step timeout
run: ./deploy.sh
Without explicit timeouts, a hung process (infinite loop, network stall, zombie process) will burn runner minutes for up to 6 hours before being killed. That's potentially expensive and blocks other runs in the concurrency group.
Set timeouts based on your typical job duration plus a reasonable buffer. If your tests normally take 5 minutes, a 15-minute timeout catches hangs without false-positiving on slow runs. Monitor your job durations over time and tighten timeouts as your pipeline stabilizes.
Lesson 7: Continue on Error
By default, if any step fails, the job stops immediately. The continue-on-error: true flag lets a step fail without failing the entire job. This is useful for non-critical checks, optional integrations, and experimental tests.
jobs:
quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# Critical: must pass
- name: Run tests
run: pytest tests/
# Non-critical: nice to have, but don't block the pipeline
- name: Performance benchmark
continue-on-error: true
run: |
pytest tests/benchmarks/ --benchmark-only
# Even if benchmarks fail, the job continues
# Experimental: testing new Python version
- name: Test on Python 3.13 (experimental)
continue-on-error: true
run: |
# This might fail — it's bleeding edge
pip install -r requirements.txt
pytest tests/
# This step runs even if the above failed
- name: Upload results
run: echo "All checks complete"
You can also use continue-on-error at the job level in matrix strategies to mark experimental combinations:
strategy:
matrix:
include:
- python: '3.12'
experimental: false
- python: '3.13'
experimental: true
continue-on-error: ${{ matrix.experimental }}
Use continue-on-error sparingly. When a step is allowed to fail silently, failures can go unnoticed for weeks. Always pair it with explicit monitoring or alerting for the "soft-failed" step.
Lesson 8: Step Outputs
Steps can produce outputs that subsequent steps in the same job can consume. This is how you pass data between steps — version numbers, computed values, file paths, and flags.
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Get version
id: version # ID is required to reference outputs
run: |
VERSION=$(cat package.json | jq -r '.version')
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "short_sha=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
- name: Check if tests should run
id: check
run: |
if git diff --name-only HEAD~1 | grep -q 'src/'; then
echo "run_tests=true" >> $GITHUB_OUTPUT
else
echo "run_tests=false" >> $GITHUB_OUTPUT
fi
- name: Use version
run: |
echo "Building version ${{ steps.version.outputs.version }}"
echo "Commit: ${{ steps.version.outputs.short_sha }}"
- name: Conditional test run
if: steps.check.outputs.run_tests == 'true'
run: npm test
- name: Build with version tag
run: |
docker build -t myapp:${{ steps.version.outputs.version }} .
docker tag myapp:${{ steps.version.outputs.version }} myapp:latest
The pattern is: set id on the producing step, write name=value to $GITHUB_OUTPUT, then reference with ${{ steps.<id>.outputs.<name> }}. The output file supports only single-line values — for multi-line content, use a delimiter:
# Multi-line output
echo "changelog<<EOF" >> $GITHUB_OUTPUT
git log --oneline HEAD~5..HEAD >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
Unlike environment variables ($GITHUB_ENV), outputs are namespaced by step ID, making them more explicit and less prone to naming collisions.
Lesson 9: Job Outputs
Job outputs let you pass data between jobs. Since jobs run on separate runners, you can't share files directly — job outputs provide a way to pass small values (strings, numbers, JSON) from one job to dependent jobs.
jobs:
build:
runs-on: ubuntu-latest
# Declare outputs at the job level
outputs:
version: ${{ steps.version.outputs.version }}
image-tag: ${{ steps.docker.outputs.tag }}
should-deploy: ${{ steps.check.outputs.deploy }}
steps:
- uses: actions/checkout@v4
- name: Determine version
id: version
run: echo "version=$(cat VERSION)" >> $GITHUB_OUTPUT
- name: Build Docker image
id: docker
run: |
TAG="${{ steps.version.outputs.version }}-$(git rev-parse --short HEAD)"
docker build -t myapp:$TAG .
echo "tag=$TAG" >> $GITHUB_OUTPUT
- name: Check deployment conditions
id: check
run: |
if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
echo "deploy=true" >> $GITHUB_OUTPUT
else
echo "deploy=false" >> $GITHUB_OUTPUT
fi
deploy:
needs: build
runs-on: ubuntu-latest
if: needs.build.outputs.should-deploy == 'true'
steps:
- name: Deploy version
run: |
echo "Deploying image: myapp:${{ needs.build.outputs.image-tag }}"
echo "Version: ${{ needs.build.outputs.version }}"
Job outputs have a maximum size of 1 MB per job and 50 MB total per workflow run. For larger data, use artifacts instead. The output values must be strings — for structured data, serialize to JSON and parse with fromJson() in the consuming job.
Job outputs are for small values, not files — to share files between jobs, use upload-artifact and download-artifact instead.
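A sketch of that JSON round-trip (job and field names are illustrative):

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      config: ${{ steps.emit.outputs.config }}
    steps:
      - id: emit
        # Serialize structured data as a single-line JSON string
        run: echo 'config={"region":"us-east-1","replicas":3}' >> "$GITHUB_OUTPUT"
  apply:
    needs: plan
    runs-on: ubuntu-latest
    steps:
      # fromJson() parses the string back into an object
      - run: echo "Region is ${{ fromJson(needs.plan.outputs.config).region }}"
```

The producing job always hands over a plain string; fromJson() in the consumer restores the structure.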
Lesson 10: Dynamic Matrix
Sometimes you need matrix values that aren't known at YAML-write time. Dynamic matrices generate matrix values from a previous step's output, enabling data-driven pipelines that adapt to your repository's content.
name: Dynamic Matrix
on: [push]
jobs:
detect-changes:
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
- name: Detect changed services
id: set-matrix
run: |
# Find which services have changed files
CHANGED=$(git diff --name-only HEAD~1 | grep '^services/' | cut -d'/' -f2 | sort -u)
# Convert to JSON array
MATRIX=$(echo "$CHANGED" | jq -R -s -c 'split("\n") | map(select(length > 0))')
echo "matrix=$MATRIX" >> $GITHUB_OUTPUT
echo "Will test: $MATRIX"
test:
needs: detect-changes
if: needs.detect-changes.outputs.matrix != '[]'
runs-on: ubuntu-latest
strategy:
matrix:
service: ${{ fromJson(needs.detect-changes.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
- name: Test ${{ matrix.service }}
run: |
cd services/${{ matrix.service }}
npm test
The key function is fromJson(), which converts a JSON string from a job output into a matrix value. The producing job constructs a JSON array, and the consuming job parses it into matrix entries.
This pattern is powerful for monorepos: instead of testing everything on every push, you detect which services changed and only test those. For a monorepo with 20 services, this can reduce CI time from 20 parallel jobs to just the 1-3 that actually changed.
When no services changed, the matrix is empty and the job would error — use if: needs.detect-changes.outputs.matrix != '[]' to skip the job when there's nothing to test.
Lesson 11: Path Filters
Path filters control which file changes trigger a workflow. They're one of the most effective ways to reduce unnecessary CI runs, especially in monorepos or repositories with documentation alongside code.
# Only run when source code or tests change
on:
push:
paths:
- 'src/**'
- 'tests/**'
- 'requirements*.txt'
- 'pyproject.toml'
paths-ignore:
- 'docs/**'
- '*.md'
- '.github/ISSUE_TEMPLATE/**'
# Separate workflow for documentation
---
name: Docs
on:
push:
paths:
- 'docs/**'
- 'mkdocs.yml'
jobs:
deploy-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pip install mkdocs-material && mkdocs build
You can use paths (include list) or paths-ignore (exclude list), but not both in the same event. If you use paths, only changes matching those patterns trigger the workflow. If you use paths-ignore, all changes trigger except those matching the patterns.
Glob patterns are supported: ** matches any number of directories, * matches any characters except /, and ? matches zero or one of the preceding character. Patterns are matched against the full file path from the repository root.
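For instance, under these pattern rules the following filters would match as the comments suggest (file names are illustrative):

```yaml
on:
  push:
    paths:
      - 'src/**'         # src/app.py, src/lib/util.py (any depth under src/)
      - 'config/*.json'  # config/app.json, but not config/envs/app.json
      - '**.ts'          # any .ts file anywhere in the repository
```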
Many teams add paths-ignore: ['*.md', 'docs/**'] to their CI workflow. Changes to README files and documentation don't need to trigger a full test suite. This simple filter can save thousands of CI minutes per month.
Lesson 12: Manual Inputs
The workflow_dispatch trigger creates a "Run workflow" button in the Actions tab, with optional input parameters. This is perfect for on-demand operations like deployments, data migrations, and maintenance tasks.
name: Manual Deployment
on:
workflow_dispatch:
inputs:
environment:
description: 'Target environment'
required: true
type: choice
options:
- staging
- production
version:
description: 'Version to deploy (e.g., v1.2.3)'
required: true
type: string
dry-run:
description: 'Dry run (simulate without deploying)?'
required: false
type: boolean
default: false
log-level:
description: 'Log verbosity'
required: false
type: choice
options:
- info
- debug
- trace
default: info
jobs:
deploy:
runs-on: ubuntu-latest
environment: ${{ inputs.environment }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.version }}
- name: Deploy
run: |
echo "Environment: ${{ inputs.environment }}"
echo "Version: ${{ inputs.version }}"
echo "Dry run: ${{ inputs.dry-run }}"
echo "Log level: ${{ inputs.log-level }}"
if [[ "${{ inputs.dry-run }}" == "true" ]]; then
echo "DRY RUN — skipping actual deployment"
else
./deploy.sh --env ${{ inputs.environment }}
fi
Supported input types: string (free text), number, choice (dropdown), boolean (checkbox), and environment (environment selector). Maximum 10 inputs per workflow, with a 65,535 character payload limit.
Always include a dry-run boolean input for destructive operations. It lets operators verify what would happen before actually doing it. Default it to true for extra safety.
Lesson 13: Scheduled Workflows
The schedule trigger runs workflows on a cron schedule, useful for nightly builds, dependency updates, security scans, and data refresh tasks. Schedules use POSIX cron syntax in UTC timezone.
on:
schedule:
# ┌───────────── minute (0-59)
# │ ┌─────────── hour (0-23)
# │ │ ┌───────── day of month (1-31)
# │ │ │ ┌─────── month (1-12)
# │ │ │ │ ┌───── day of week (0-6, Sunday=0)
# │ │ │ │ │
- cron: '30 5 * * 1-5' # 5:30 AM UTC, weekdays
- cron: '0 0 * * 0' # Midnight UTC, Sundays
# Common patterns:
# '0 * * * *' - Every hour
# '0 6 * * *' - Daily at 6 AM UTC
# '0 6 * * 1-5' - Weekdays at 6 AM UTC
# '0 0 * * 0' - Weekly on Sunday midnight
# '0 0 1 * *' - Monthly on the 1st
# '*/15 * * * *' - Every 15 minutes
Example: Nightly dependency check and test:
name: Nightly CI
on:
schedule:
- cron: '0 4 * * *' # 4 AM UTC daily
jobs:
test-latest-deps:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install latest dependencies (no lock file)
run: pip install -r requirements.txt --upgrade
- name: Run tests
run: pytest tests/ -v
- name: Notify on failure
if: failure()
run: |
curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
-d '{"text": "⚠️ Nightly CI failed! Latest dependencies may have breaking changes."}'
Important limitations: scheduled workflows only run on the default branch. The shortest interval is every 5 minutes, but GitHub may delay or skip scheduled runs during high-load periods. Scheduled workflows may also be disabled after 60 days of repository inactivity.
Remember that cron schedules run in UTC: '0 9 * * *' is 9 AM UTC, which is 1 AM PST, 4 AM EST, or 6 PM JST. Always calculate the local time equivalent and add a comment for clarity.
Lesson 14: Repository Dispatch
Repository dispatch triggers workflows from external events via the GitHub API. This lets you integrate GitHub Actions with any system that can make HTTP requests โ webhooks from other services, custom scripts, Slack bots, or even other CI systems.
# Workflow triggered by external event
name: External Trigger
on:
repository_dispatch:
types: [deploy, run-tests, refresh-data]
jobs:
handle-event:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Handle deploy event
if: github.event.action == 'deploy'
run: |
echo "Deploy triggered by: ${{ github.event.client_payload.triggered_by }}"
echo "Version: ${{ github.event.client_payload.version }}"
echo "Environment: ${{ github.event.client_payload.environment }}"
- name: Handle test event
if: github.event.action == 'run-tests'
run: |
echo "Running tests for: ${{ github.event.client_payload.test_suite }}"
pytest ${{ github.event.client_payload.test_suite }}
Triggering from external systems:
# Trigger via curl
curl -X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ghp_YOUR_TOKEN" \
https://api.github.com/repos/OWNER/REPO/dispatches \
-d '{
"event_type": "deploy",
"client_payload": {
"version": "v1.2.3",
"environment": "production",
"triggered_by": "slack-bot"
}
}'
# Trigger via Python
import requests
requests.post(
"https://api.github.com/repos/OWNER/REPO/dispatches",
headers={
"Authorization": "Bearer ghp_YOUR_TOKEN",
"Accept": "application/vnd.github+json"
},
json={
"event_type": "deploy",
"client_payload": {
"version": "v1.2.3",
"environment": "production"
}
}
)
Track 5 Quiz
Test your knowledge: 5 questions, +25 XP each correct answer
Q1. What does `cancel-in-progress: true` do in a concurrency group?
When cancel-in-progress is true, a new workflow run in the same concurrency group cancels the currently running one. This is ideal for CI branches where only the latest push matters.
Q2. What is the maximum size of job outputs per job?
Job outputs have a maximum size of 1 MB per job and 50 MB total per workflow run. For larger data, use artifacts instead.
Q3. Which function converts a JSON string from a job output into a matrix value?
The fromJson() function parses a JSON string into a GitHub Actions value, enabling dynamic matrix strategies based on computed data from previous jobs.
Q4. How many inputs can a workflow_dispatch trigger have?
workflow_dispatch supports a maximum of 10 inputs per workflow, with a total payload limit of 65,535 characters.
Q5. What is the default job timeout in GitHub Actions?
The default timeout is 360 minutes (6 hours). You should set explicit, shorter timeouts on your jobs to catch hung processes early and save runner minutes.
Lesson 1: Deployment Strategies
Choosing the right deployment strategy depends on your application's architecture, tolerance for downtime, and risk appetite. Here are the main strategies used in production:
| Strategy | How It Works | Downtime | Risk | Rollback Speed |
|---|---|---|---|---|
| Direct/In-place | Replace old version with new | Brief | High | Slow (redeploy) |
| Blue-Green | Two identical envs; switch traffic | Zero | Low | Instant (switch back) |
| Canary | Route % of traffic to new version | Zero | Very low | Fast (route to old) |
| Rolling | Gradually replace instances | Zero | Medium | Medium |
Direct/in-place is the simplest: stop the old version, start the new one. Fine for small apps and staging environments, but causes downtime and has no easy rollback.
Blue-Green maintains two identical production environments. The "blue" environment runs the current version while "green" gets the new version. After testing green, you switch the load balancer to point at green. Rollback is instant: switch back to blue. The downside is maintaining two full environments.
Canary releases send a small percentage of traffic (e.g., 5%) to the new version while the rest continues hitting the old version. If metrics look good, gradually increase to 100%. This minimizes blast radius: if the new version has issues, only 5% of users are affected.
Rolling updates replace instances one at a time. In a cluster of 10 servers, you update one, verify it's healthy, then update the next. At any point during the rollout, both versions are serving traffic.
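Canary routing is usually deterministic, so a given user consistently sees the same version for the whole rollout. A minimal hash-based sketch (illustrative only; in production the split lives in the load balancer or service mesh):

```python
import hashlib

def route(user_id: str, canary_percent: int) -> str:
    """Deterministically route a user: the same user always hits the same version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 5) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")  # close to 5%

# deterministic: repeated calls for the same user agree
assert route("user-42", 5) == route("user-42", 5)
```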
Lesson 2: GitHub Pages
GitHub Pages provides free static site hosting directly from your repository. Combined with GitHub Actions, it's a zero-cost deployment pipeline for documentation, blogs, portfolios, and any static website.
name: Deploy to GitHub Pages
on:
push:
branches: [main]
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: pages
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- name: Build
run: |
npm ci
npm run build
- name: Upload Pages artifact
uses: actions/upload-pages-artifact@v3
with:
path: dist/
deploy:
needs: build
runs-on: ubuntu-latest
environment:
name: github-pages
url: ${{ steps.deploy.outputs.page_url }}
steps:
- name: Deploy to GitHub Pages
id: deploy
uses: actions/deploy-pages@v4
This workflow uses GitHub's native Pages deployment actions, which handle all the infrastructure. The build job produces the static files, uploads them as a Pages artifact, and the deploy job publishes them to GitHub Pages.
For simpler setups or sites using Jekyll, Hugo, or plain HTML, you can also use the popular peaceiris/actions-gh-pages action which pushes to a gh-pages branch.
Once published, your site is available at https://username.github.io/repo-name/.
Lesson 3: Docker Build & Push
Containerizing your application with Docker is the foundation of modern deployment. CI/CD pipelines typically build the Docker image, run tests against it, push it to a registry, and then deploy from the registry.
name: Docker Build & Push
on:
push:
branches: [main]
tags: ['v*']
jobs:
docker:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write # For GitHub Container Registry
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=ref,event=branch
type=semver,pattern={{version}}
type=sha,prefix=
- name: Build and push
uses: docker/build-push-action@v6
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
This workflow uses Docker Buildx for multi-platform builds and layer caching. The docker/metadata-action automatically generates appropriate tags: branch names for branches, version numbers for tags, and short SHAs for every push.
The cache-from/cache-to: type=gha settings use GitHub's built-in caching to store Docker layers between builds, dramatically speeding up rebuilds when only a few layers change.
Lesson 4: Cloud Deployment: AWS
Deploying to AWS from GitHub Actions uses OIDC for authentication (no stored credentials) and AWS CLI or specialized actions for the actual deployment. Here are common patterns for different AWS services:
Deploy to Amazon ECS (containers):
jobs:
deploy-ecs:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
aws-region: us-east-1
- name: Login to Amazon ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Build and push to ECR
run: |
docker build -t $ECR_REGISTRY/myapp:${{ github.sha }} .
docker push $ECR_REGISTRY/myapp:${{ github.sha }}
- name: Update ECS service
run: |
aws ecs update-service \
--cluster production \
--service myapp \
--force-new-deployment
Deploy static site to S3 + CloudFront:
- name: Deploy to S3
run: |
aws s3 sync ./dist s3://my-website-bucket/ --delete
aws cloudfront create-invalidation \
--distribution-id ${{ secrets.CF_DISTRIBUTION_ID }} \
--paths "/*"
Deploy Lambda function:
- name: Deploy Lambda
run: |
zip -r function.zip src/ requirements.txt
aws lambda update-function-code \
--function-name my-function \
--zip-file fileb://function.zip
Use OIDC authentication (aws-actions/configure-aws-credentials@v4 with role-to-assume) instead of storing AWS access keys as secrets. It's more secure and eliminates the need for key rotation.
Lesson 5: Cloud Deployment: GCP & Azure
Google Cloud Platform: deploy to Cloud Run:
jobs:
deploy-gcp:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: actions/checkout@v4
- name: Authenticate to GCP
uses: google-github-actions/auth@v2
with:
workload_identity_provider: 'projects/123/locations/global/workloadIdentityPools/github/providers/my-repo'
service_account: 'deploy@my-project.iam.gserviceaccount.com'
- name: Set up Cloud SDK
uses: google-github-actions/setup-gcloud@v2
- name: Build and deploy to Cloud Run
run: |
gcloud builds submit --tag gcr.io/my-project/myapp:${{ github.sha }}
gcloud run deploy myapp \
--image gcr.io/my-project/myapp:${{ github.sha }} \
--region us-central1 \
--platform managed
Azure: deploy to App Service:
jobs:
deploy-azure:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Login to Azure
uses: azure/login@v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Deploy to Azure Web App
uses: azure/webapps-deploy@v3
with:
app-name: my-web-app
package: ./dist
- name: Azure logout
run: az logout
All three major cloud providers support OIDC authentication from GitHub Actions, eliminating the need for stored credentials. The setup varies slightly (AWS uses IAM roles, GCP uses Workload Identity Federation, Azure uses App Registrations), but the principle is the same: short-lived tokens exchanged at runtime.
Lesson 6: SSH Deployment
For applications hosted on traditional VPS servers (DigitalOcean, Linode, Hetzner), SSH-based deployment is a straightforward approach. You connect to the server via SSH and execute deployment commands.
name: Deploy via SSH
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/deploy_key
chmod 600 ~/.ssh/deploy_key
ssh-keyscan -H ${{ secrets.SERVER_HOST }} >> ~/.ssh/known_hosts
- name: Deploy with rsync
run: |
rsync -avz --delete \
-e "ssh -i ~/.ssh/deploy_key" \
./dist/ \
${{ secrets.SERVER_USER }}@${{ secrets.SERVER_HOST }}:/var/www/myapp/
- name: Restart application
run: |
ssh -i ~/.ssh/deploy_key ${{ secrets.SERVER_USER }}@${{ secrets.SERVER_HOST }} << 'EOF'
cd /var/www/myapp
npm install --production
pm2 restart myapp
echo "Deployment complete!"
EOF
- name: Smoke test
run: |
sleep 10
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.example.com/health)
if [ "$STATUS" != "200" ]; then
echo "Health check failed with status $STATUS"
exit 1
fi
echo "Health check passed!"
The workflow stores the SSH private key as a secret, sets up the SSH configuration, uses rsync to sync files to the server, then executes remote commands to restart the application. A smoke test at the end verifies the deployment succeeded.
Lesson 7: Vercel & Netlify
Vercel and Netlify offer the simplest deployment experience for frontend applications. Both platforms can deploy automatically from GitHub without any workflow configuration, but using GitHub Actions gives you more control over the process.
Vercel deployment with GitHub Actions:
name: Deploy to Vercel
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Deploy to Vercel
run: |
npm install -g vercel
vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}
env:
VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
Netlify deployment with GitHub Actions:
name: Deploy to Netlify
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- run: npm ci && npm run build
- name: Deploy to Netlify
uses: netlify/actions/cli@master
with:
args: deploy --prod --dir=dist
env:
NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
Both platforms also support preview deployments for pull requests: every PR gets its own unique URL for testing. This is incredibly valuable for design review and QA before merging.
Lesson 8: Release Creation
Automating release creation ensures that every release is consistent, well-documented, and tagged properly. GitHub Actions can create releases automatically when you push a tag.
name: Create Release
on:
push:
tags:
- 'v*' # Trigger on version tags: v1.0.0, v2.1.3, etc.
jobs:
release:
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for changelog
- name: Generate changelog
id: changelog
run: |
# Get commits since last tag
PREVIOUS_TAG=$(git describe --tags --abbrev=0 HEAD^ 2>/dev/null || echo "")
if [ -z "$PREVIOUS_TAG" ]; then
CHANGES=$(git log --oneline --pretty=format:"- %s (%h)" HEAD)
else
CHANGES=$(git log --oneline --pretty=format:"- %s (%h)" $PREVIOUS_TAG..HEAD)
fi
echo "changes<<EOF" >> $GITHUB_OUTPUT
echo "$CHANGES" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
- name: Create GitHub release
run: |
gh release create ${{ github.ref_name }} \
--title "Release ${{ github.ref_name }}" \
--notes "${{ steps.changelog.outputs.changes }}" \
--verify-tag
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Optionally attach build artifacts
- run: npm ci && npm run build
- run: tar czf dist.tar.gz dist/
- run: |
gh release upload ${{ github.ref_name }} dist.tar.gz
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
The workflow triggers on tag pushes matching v*, generates a changelog from commit messages since the last tag, and creates a GitHub release with the changelog as the release notes. Build artifacts can be attached to the release for easy download.
Use the gh CLI (pre-installed on runners) for release management: it's simpler than calling the GitHub API directly and handles authentication via GITHUB_TOKEN automatically.
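The flat commit list can also be grouped by conventional-commit type before it becomes release notes. A small illustrative Python sketch (the `group_changelog` helper and its section names are assumptions, not part of the gh CLI):

```python
import re
from collections import defaultdict

SECTIONS = {"feat": "Features", "fix": "Bug Fixes"}

def group_changelog(commit_lines: list[str]) -> str:
    """Group '- type(scope): message (sha)' lines into titled sections."""
    grouped = defaultdict(list)
    for line in commit_lines:
        m = re.match(r"- (\w+)(?:\([^)]*\))?!?: (.*)", line)
        kind = m.group(1) if m and m.group(1) in SECTIONS else "other"
        grouped[kind].append(m.group(2) if m else line.lstrip("- "))
    out = []
    for kind, title in {**SECTIONS, "other": "Other"}.items():
        if grouped[kind]:
            out.append(f"## {title}")
            out.extend(f"- {msg}" for msg in grouped[kind])
    return "\n".join(out)

print(group_changelog([
    "- feat(auth): add SSO login (a1b2c3d)",
    "- fix: handle empty cart (d4e5f6a)",
    "- chore: bump deps (aabbccd)",
]))
```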
Lesson 9: Version Bumping
Automated version bumping removes the manual step of updating version numbers. Tools like semantic-release analyze commit messages to determine the next version, update version files, create tags, and generate changelogs automatically.
How it works: Conventional commit messages encode the type of change:
# Patch release (1.0.0 → 1.0.1)
fix: resolve login timeout issue
fix(auth): handle expired tokens correctly
# Minor release (1.0.0 → 1.1.0)
feat: add dark mode toggle
feat(dashboard): add export to CSV
# Major release (1.0.0 → 2.0.0)
feat!: redesign API endpoints
BREAKING CHANGE: /api/v1 is now /api/v2
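The bump rules above can be sketched in a few lines of Python. This is a simplified model of what semantic-release does, not its actual implementation:

```python
import re

def next_version(current: str, commits: list[str]) -> str:
    """Pick the next semver based on conventional commit messages."""
    major, minor, patch = map(int, current.split("."))
    bump = None
    for msg in commits:
        # 'feat!:' / 'feat(scope)!:' or a BREAKING CHANGE footer forces a major bump
        if "BREAKING CHANGE" in msg or re.match(r"\w+(\([^)]*\))?!:", msg):
            bump = "major"
            break
        if msg.startswith("feat"):
            bump = "minor"
        elif msg.startswith("fix") and bump is None:
            bump = "patch"
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # no release-worthy commits

print(next_version("1.0.0", ["fix: resolve login timeout"]))     # 1.0.1
print(next_version("1.0.0", ["feat: add dark mode", "fix: x"]))  # 1.1.0
print(next_version("1.0.0", ["feat!: redesign API endpoints"]))  # 2.0.0
```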
Semantic release workflow:
name: Release
on:
push:
branches: [main]
jobs:
release:
runs-on: ubuntu-latest
permissions:
contents: write
issues: write
pull-requests: write
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install semantic-release
run: npm install -g semantic-release @semantic-release/changelog @semantic-release/git
- name: Release
run: npx semantic-release
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Semantic-release reads the commit history since the last release, determines the version bump type, updates the version in package.json, generates a CHANGELOG.md, creates a git tag, and publishes a GitHub release, all automatically.
Enforce the convention with a commit-message linter (commitlint) and a CI check. This ensures every commit message follows the format, so semantic-release can always determine the correct version bump.
Lesson 10: Database Migrations
Running database migrations as part of your CI/CD pipeline ensures that database schema changes are applied consistently and automatically. This eliminates the risky practice of running migrations manually in production.
name: Deploy with Migrations
on:
push:
branches: [main]
jobs:
test-migrations:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: testdb
POSTGRES_USER: test
POSTGRES_PASSWORD: testpass
ports: ['5432:5432']
options: --health-cmd pg_isready --health-interval 10s
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Test migrations
run: |
pip install -r requirements.txt
# Run migrations against test database
alembic upgrade head
# Verify migrations are reversible
alembic downgrade -1
alembic upgrade head
env:
DATABASE_URL: postgresql://test:testpass@localhost:5432/testdb
deploy:
needs: test-migrations
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
- name: Run production migrations
run: |
pip install -r requirements.txt
alembic upgrade head
env:
DATABASE_URL: ${{ secrets.PRODUCTION_DATABASE_URL }}
- name: Deploy application
run: ./deploy.sh
The workflow first tests migrations against a disposable database (using GitHub's service containers), verifying both the upgrade and downgrade paths. Only after migrations pass testing does it run them against the production database.
Always write reversible migrations, and test both upgrade and downgrade in CI.
Lesson 11: Smoke Tests
Smoke tests are lightweight checks that verify your application is running correctly after deployment. They're the final safety net: if smoke tests fail, you know the deployment broke something and should be rolled back.
jobs:
deploy:
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
- run: ./deploy.sh
smoke-test:
needs: deploy
runs-on: ubuntu-latest
steps:
- name: Wait for deployment to stabilize
run: sleep 30
- name: Health check
run: |
STATUS=$(curl -sf -o /dev/null -w "%{http_code}" https://myapp.example.com/health)
if [ "$STATUS" != "200" ]; then
echo "❌ Health check failed: HTTP $STATUS"
exit 1
fi
echo "✅ Health check passed: HTTP $STATUS"
- name: API smoke test
run: |
# Test critical endpoints
ENDPOINTS=(
"https://myapp.example.com/api/status"
"https://myapp.example.com/api/users?limit=1"
"https://myapp.example.com/api/products?limit=1"
)
for URL in "${ENDPOINTS[@]}"; do
STATUS=$(curl -sf -o /dev/null -w "%{http_code}" "$URL")
if [ "$STATUS" != "200" ]; then
echo "❌ FAIL: $URL returned $STATUS"
exit 1
fi
echo "✅ PASS: $URL"
done
- name: Response time check
run: |
TIME=$(curl -sf -o /dev/null -w "%{time_total}" https://myapp.example.com/)
echo "Response time: ${TIME}s"
if (( $(echo "$TIME > 5.0" | bc -l) )); then
echo "❌ Response too slow: ${TIME}s > 5.0s"
exit 1
fi
Smoke tests should be fast (under 2 minutes), reliable (no flaky assertions), and test only critical paths. They're not a replacement for your full test suite; they're a quick sanity check that the deployment didn't break anything fundamental.
Implement a /health endpoint in every application that returns 200 when the app is running and can connect to its dependencies (database, cache, external APIs). This single endpoint powers health checks across monitoring, load balancers, and CI smoke tests.
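Fixed `sleep` calls before a health check are fragile; a retry loop with exponential backoff is more robust. A stdlib-only Python sketch (the `wait_healthy` helper is hypothetical, not part of the workflows above):

```python
import time
import urllib.request

def backoff_delays(retries: int, base: float = 2.0) -> list[float]:
    """Exponential backoff schedule: 1s, 2s, 4s, ... between attempts."""
    return [base ** i for i in range(retries)]

def wait_healthy(url: str, retries: int = 5) -> bool:
    """Poll a /health endpoint, backing off between failed attempts."""
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused / timeout: app not up yet
        time.sleep(delay)
    return False

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In the workflow, the smoke-test step would call `wait_healthy("https://myapp.example.com/health")` and exit non-zero when it returns False.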
Lesson 12: Rollback
When a deployment goes wrong, you need to get back to a working state as quickly as possible. Automated rollback triggered by failed smoke tests minimizes downtime and human panic.
jobs:
deploy:
runs-on: ubuntu-latest
outputs:
previous-version: ${{ steps.version.outputs.previous }}
steps:
- uses: actions/checkout@v4
- name: Record current version
id: version
run: |
PREV=$(curl -s https://myapp.example.com/api/version | jq -r '.version')
echo "previous=$PREV" >> $GITHUB_OUTPUT
- name: Deploy new version
run: ./deploy.sh
smoke-test:
needs: deploy
runs-on: ubuntu-latest
steps:
- name: Run smoke tests
id: smoke
run: |
sleep 15
curl -sf https://myapp.example.com/health || exit 1
rollback:
needs: [deploy, smoke-test]
runs-on: ubuntu-latest
if: failure() # Only run if smoke tests failed
steps:
- uses: actions/checkout@v4
- name: Rollback to previous version
run: |
echo "Rolling back to: ${{ needs.deploy.outputs.previous-version }}"
./deploy.sh --version ${{ needs.deploy.outputs.previous-version }}
- name: Verify rollback
run: |
sleep 15
curl -sf https://myapp.example.com/health
echo "Rollback successful"
- name: Notify team
run: |
curl -X POST ${{ secrets.SLACK_WEBHOOK }} -d '{
"text": "⚠️ Deployment rolled back! Version ${{ needs.deploy.outputs.previous-version }} restored. Check logs."
}'
The pattern: deploy → smoke test → if smoke tests fail, trigger the rollback job. The rollback job restores the previous version and verifies it's working. The team is notified so they can investigate what went wrong.
Lesson 13: Deployment Notifications
Keeping the team informed about deployment status is essential. Automated notifications via Slack, Discord, or email ensure everyone knows when a deployment succeeds, fails, or needs attention.
Slack notification:
- name: Notify Slack on success
if: success()
run: |
curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
-H 'Content-Type: application/json' \
-d '{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "✅ *Deploy succeeded*\n*Repo:* ${{ github.repository }}\n*Branch:* ${{ github.ref_name }}\n*Actor:* ${{ github.actor }}\n*Commit:* `${{ github.sha }}`"
}
}
]
}'
- name: Notify Slack on failure
if: failure()
run: |
curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
-H 'Content-Type: application/json' \
-d '{
"text": "❌ Deploy FAILED for ${{ github.repository }} by ${{ github.actor }}. <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>"
}'
Discord notification:
- name: Notify Discord
if: always()
run: |
STATUS="${{ job.status }}"
COLOR=$([[ "$STATUS" == "success" ]] && echo "3066993" || echo "15158332")
curl -X POST ${{ secrets.DISCORD_WEBHOOK }} \
-H 'Content-Type: application/json' \
-d "{
\"embeds\": [{
\"title\": \"Deployment $STATUS\",
\"description\": \"${{ github.repository }} deployed by ${{ github.actor }}\",
\"color\": $COLOR,
\"url\": \"${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\"
}]
}"
Track 6 Quiz
Test your knowledge: 4 questions, +25 XP each correct answer
Q1. Which deployment strategy offers instant rollback by switching traffic between two identical environments?
Blue-Green deployment maintains two identical environments. You deploy to the inactive one, test it, then switch traffic. Rollback is instant: just switch traffic back to the original environment.
Q2. What does the `--delete` flag do in `aws s3 sync`?
The --delete flag removes files from the S3 destination that don't exist in the local source directory, ensuring S3 is an exact mirror of your build output.
Q3. Why should you test database migration rollbacks (downgrade) in CI?
A migration without a working rollback is a one-way door. If something goes wrong after applying it in production, you can't undo the schema change. Testing both upgrade and downgrade in CI catches this before production.
Q4. When should smoke tests run in a deployment pipeline?
Smoke tests run immediately after deployment to verify the application is functioning correctly in the target environment. They're the final safety net before users interact with the new version.
Lesson 1: LLM Eval in CI
Large Language Model evaluations in CI ensure that changes to prompts, model configurations, or supporting code don't degrade output quality. Just as traditional CI runs unit tests on every commit, ML CI runs evaluation suites against model outputs.
name: LLM Eval
on:
push:
paths:
- 'prompts/**'
- 'src/llm/**'
- 'evals/**'
jobs:
eval:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
cache: 'pip'
- name: Install dependencies
run: pip install -r requirements.txt
- name: Run LLM evaluations
run: |
python -m evals.run_suite \
--suite prompts/eval_suite.yaml \
--model gpt-4o \
--output results/eval_results.json
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Check eval thresholds
run: |
python -m evals.check_thresholds \
--results results/eval_results.json \
--min-accuracy 0.85 \
--min-relevance 0.80 \
--max-latency-p95 5000
- name: Upload eval results
uses: actions/upload-artifact@v4
if: always()
with:
name: eval-results
path: results/
The workflow triggers only when prompt files or LLM code changes (using path filters). It runs an evaluation suite against the model, checks that quality metrics meet minimum thresholds, and uploads results for review. If any metric drops below the threshold, the pipeline fails.
Common evaluation metrics include: accuracy (correct vs. expected output), relevance (output relevance to input), coherence, latency, cost per request, and task-specific metrics like summarization quality or code generation correctness.
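The threshold gate can be a few lines of Python. A sketch of what a script like `check_thresholds` might do for minimum-value metrics (the results format here is an assumption):

```python
def check_thresholds(results: dict, thresholds: dict) -> list[str]:
    """Return failure messages for metrics below their minimum; empty means pass."""
    failures = []
    for metric, minimum in thresholds.items():
        value = results.get(metric)
        if value is None or value < minimum:
            failures.append(f"{metric}: {value} below required {minimum}")
    return failures

results = {"accuracy": 0.91, "relevance": 0.78}
failures = check_thresholds(results, {"accuracy": 0.85, "relevance": 0.80})
print(failures)
# in CI, raise SystemExit(1) when failures is non-empty so the job fails
```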
Lesson 2: Prompt Regression Testing
Prompt regression testing detects when changes to prompts, system messages, or few-shot examples cause quality degradation. It's the ML equivalent of snapshot testing: you compare current outputs against a known-good baseline.
# tests/test_prompt_regression.py
import json
import pytest
from pathlib import Path
from myapp.llm import generate_response
BASELINE_FILE = Path("tests/baselines/prompt_outputs.json")
@pytest.fixture
def baseline():
return json.loads(BASELINE_FILE.read_text())
TEST_CASES = [
{
"id": "summarize-article",
"input": "Summarize: The Federal Reserve raised rates by 25bps...",
"criteria": {
"must_contain": ["Federal Reserve", "interest rate"],
"max_length": 200,
"sentiment": "neutral"
}
},
{
"id": "classify-support-ticket",
"input": "I can't log in to my account after resetting password",
"criteria": {
"expected_category": "authentication",
"confidence_min": 0.8
}
}
]
@pytest.mark.parametrize("case", TEST_CASES, ids=[c["id"] for c in TEST_CASES])
def test_prompt_output(case, baseline):
response = generate_response(case["input"])
# Check structural requirements
if "max_length" in case["criteria"]:
assert len(response) <= case["criteria"]["max_length"]
if "must_contain" in case["criteria"]:
for term in case["criteria"]["must_contain"]:
assert term.lower() in response.lower(), f"Response missing required term: {term}"
# Compare with baseline (allow some variation)
if case["id"] in baseline:
from difflib import SequenceMatcher
similarity = SequenceMatcher(None, response, baseline[case["id"]]).ratio()
assert similarity > 0.6, f"Output diverged from baseline (similarity: {similarity:.2f})"
# CI workflow for prompt regression
- name: Run prompt regression tests
run: pytest tests/test_prompt_regression.py -v --tb=long
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Update baselines (on main only)
if: github.ref == 'refs/heads/main'
run: python scripts/update_baselines.py
Lesson 3: Model Testing
Model testing in CI verifies that ML models produce correct, consistent outputs. This includes loading the model, running inference on test inputs, and checking that outputs meet quality thresholds.
name: Model Tests
on:
push:
paths:
- 'models/**'
- 'src/inference/**'
jobs:
test-model:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
lfs: true # Pull large files (model weights)
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: |
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
- name: Run model tests
run: |
pytest tests/test_model.py -v \
--model-path models/classifier_v2.pt \
--test-data tests/fixtures/test_samples.json
# tests/test_model.py
import torch
import pytest
def test_model_loads(model_path):
'''Model file exists and loads without errors.'''
model = torch.load(model_path, map_location="cpu")
assert model is not None
def test_model_output_shape(model, sample_input):
'''Model produces expected output dimensions.'''
output = model(sample_input)
assert output.shape == (1, 10) # batch=1, classes=10
def test_model_accuracy_threshold(model, test_dataset):
'''Model accuracy meets minimum threshold.'''
correct = 0
total = len(test_dataset)
for input_tensor, label in test_dataset:
pred = model(input_tensor.unsqueeze(0)).argmax(dim=1)
if pred.item() == label:
correct += 1
accuracy = correct / total
assert accuracy >= 0.90, f"Accuracy {accuracy:.2%} below 90% threshold"
def test_model_inference_time(model, sample_input):
'''Inference completes within latency budget.'''
import time
start = time.time()
for _ in range(100):
model(sample_input)
avg_ms = (time.time() - start) / 100 * 1000
assert avg_ms < 50, f"Average inference {avg_ms:.1f}ms exceeds 50ms budget"
Install the CPU-only PyTorch wheels (--index-url https://download.pytorch.org/whl/cpu) in CI to avoid installing CUDA drivers. Most model tests don't need a GPU: you're testing correctness, not training speed.
Lesson 4: Data Validation
Bad data is the #1 cause of ML model failures. Data validation in CI catches data quality issues before they corrupt your models: schema violations, missing values, distribution drift, and constraint violations.
# validate_data.py - schema validation with Pandera
import pandera as pa
from pandera import Column, Check, DataFrameSchema
import pandas as pd
training_schema = DataFrameSchema({
"feature_1": Column(float, Check.in_range(-1.0, 1.0), nullable=False),
"feature_2": Column(float, Check.greater_than(0), nullable=False),
"category": Column(str, Check.isin(["A", "B", "C"]), nullable=False),
"label": Column(int, Check.isin([0, 1]), nullable=False),
"timestamp": Column(pa.DateTime, nullable=False),
}, coerce=True, strict=True)
def validate_training_data(filepath: str):
df = pd.read_csv(filepath)
# Schema validation
training_schema.validate(df)
# Distribution checks
assert len(df) >= 1000, f"Too few samples: {len(df)}"
assert df["label"].mean() > 0.1, "Severe class imbalance: <10% positive"
assert df["label"].mean() < 0.9, "Severe class imbalance: >90% positive"
assert df.duplicated().mean() < 0.05, "More than 5% duplicate rows"
print(f"✅ Validation passed: {len(df)} rows, {len(df.columns)} columns")
if __name__ == "__main__":
validate_training_data("data/training_data.csv")
# CI workflow for data validation
- name: Validate training data
run: python validate_data.py
- name: Check data freshness
run: |
LATEST=$(python -c "
import pandas as pd
df = pd.read_csv('data/training_data.csv')
print(df['timestamp'].max())
")
echo "Latest data point: $LATEST"
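Distribution drift, mentioned above, can be caught with even a crude statistic. A dependency-free sketch that flags a shift in a feature's mean, measured in baseline standard deviations (the helper and the 0.2 tolerance are illustrative choices, not a standard API):

```python
import math
import random

def mean_drift(baseline: list[float], current: list[float]) -> float:
    """Standardized mean shift: |mean difference| in units of baseline std dev."""
    mu = sum(baseline) / len(baseline)
    var = sum((x - mu) ** 2 for x in baseline) / len(baseline)
    std = math.sqrt(var) or 1e-12  # avoid division by zero for constant features
    mu_new = sum(current) / len(current)
    return abs(mu_new - mu) / std

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
same = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(0.8, 1.0) for _ in range(5000)]
print(f"no drift: {mean_drift(baseline, same):.2f}")     # near 0
print(f"drifted:  {mean_drift(baseline, shifted):.2f}")  # near 0.8
# in CI: fail the job when drift exceeds an agreed tolerance, e.g. 0.2
```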
Lesson 5: Notebook Testing
Jupyter notebooks are popular for ML experimentation but notoriously hard to test. They accumulate stale outputs, hidden state, and import errors that only surface when someone tries to re-run them. CI can automatically execute notebooks and verify they run cleanly.
name: Notebook CI
on:
push:
paths:
- 'notebooks/**'
- 'src/**'
jobs:
test-notebooks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install papermill nbval jupyter
- name: Execute notebooks with Papermill
run: |
for notebook in notebooks/*.ipynb; do
echo "Executing: $notebook"
papermill "$notebook" "output_${notebook##*/}" \
--no-progress-bar \
-p DATA_PATH data/sample.csv \
-p EPOCHS 1 \
-p QUICK_RUN true
done
- name: Validate notebook outputs with nbval
run: |
pytest --nbval-lax notebooks/ -v
- name: Upload executed notebooks
uses: actions/upload-artifact@v4
if: always()
with:
name: executed-notebooks
path: output_*.ipynb
Papermill executes notebooks with parameterized inputs: you can override variables like data paths, epochs, and flags to create a "quick mode" that runs in CI without the full training time. nbval validates that notebook cells execute without errors.
Add a QUICK_RUN parameter to every notebook that, when set to True, uses a tiny dataset and minimal iterations. This lets CI verify the notebook runs end-to-end in minutes instead of hours.
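The pattern relies on papermill's parameter injection: values in a cell tagged `parameters` are overridden by the `-p` flags in the workflow above. A sketch of how a notebook might consume QUICK_RUN (the defaults and the `effective_config` helper are illustrative):

```python
# parameters cell: tag this cell "parameters" so papermill -p flags can override it
DATA_PATH = "data/full_dataset.csv"  # hypothetical default path
EPOCHS = 20
QUICK_RUN = False

# later in the notebook: shrink the workload when CI injects QUICK_RUN=true
def effective_config(epochs: int, quick_run: bool) -> dict:
    """Tiny workload for CI smoke runs, full workload otherwise."""
    if quick_run:
        return {"epochs": 1, "sample_rows": 100}
    return {"epochs": epochs, "sample_rows": None}

print(effective_config(EPOCHS, QUICK_RUN))  # full run: {'epochs': 20, 'sample_rows': None}
print(effective_config(EPOCHS, True))       # CI run:   {'epochs': 1, 'sample_rows': 100}
```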
Lesson 6: Large File Handling
ML projects often involve large files: model weights (hundreds of MB to GB), datasets, pre-trained embeddings. These files don't belong in regular Git, which is designed for text files and small binaries. Here's how to handle them in CI.
Git LFS (Large File Storage):
# Track large files with Git LFS
git lfs install
git lfs track "*.pt" "*.h5" "*.onnx" "*.bin"
git lfs track "data/*.csv" "data/*.parquet"
git add .gitattributes
git commit -m "Configure Git LFS tracking"
# CI with Git LFS
steps:
- uses: actions/checkout@v4
with:
lfs: true # Pull LFS files
# Or selectively pull only needed LFS files:
- uses: actions/checkout@v4
with:
lfs: false
- run: git lfs pull --include="models/production_model.pt"
DVC (Data Version Control): For teams that need more sophisticated data management, DVC tracks data files separately and stores them in S3, GCS, or Azure Blob Storage.
- name: Pull data with DVC
run: |
pip install dvc[s3]
dvc pull models/classifier.pt.dvc
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
Lesson 7: GPU Runners
Some ML tests require GPU access — model training validation, GPU-specific inference tests, CUDA compatibility checks. GitHub-hosted runners don't include GPUs, so you need alternative approaches.
Option 1: Self-hosted GPU runners
jobs:
gpu-test:
runs-on: [self-hosted, gpu, linux] # Custom labels
steps:
- uses: actions/checkout@v4
- name: Verify GPU access
run: nvidia-smi
- name: Run GPU tests
run: |
pytest tests/test_gpu.py -v -k "gpu"
env:
CUDA_VISIBLE_DEVICES: "0"
Option 2: Cloud GPU services
jobs:
gpu-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run GPU tests on cloud
run: |
# Use a cloud GPU service API
# (Lambda Cloud, RunPod, Modal, etc.)
modal run tests/gpu_test_suite.py
env:
MODAL_TOKEN: ${{ secrets.MODAL_TOKEN }}
Cost-saving strategies:
# Only run GPU tests on main branch, not on every PR
jobs:
gpu-test:
if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'gpu-test')
runs-on: [self-hosted, gpu]
steps:
- uses: actions/checkout@v4
- run: pytest tests/test_gpu.py
Lesson 8: Docker for ML
Docker containers provide reproducible ML environments with exact dependency versions — CUDA drivers, PyTorch builds, system libraries, and Python packages. This eliminates "works on my GPU but not yours" problems.
# Dockerfile for ML
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
# System dependencies (Ubuntu 22.04 ships python3 + python3-pip;
# there is no python3.12-pip package)
RUN apt-get update && apt-get install -y \
python3 python3-pip git \
&& rm -rf /var/lib/apt/lists/*
# Python ML dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ /app/src/
COPY models/ /app/models/
WORKDIR /app
CMD ["python", "-m", "src.inference.serve"]
# CI workflow for ML Docker
name: ML Docker Build
on:
push:
branches: [main]
paths:
- 'Dockerfile'
- 'requirements.txt'
- 'src/**'
jobs:
build:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write # Required to push to GHCR with GITHUB_TOKEN
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3 # Required for type=gha cache
- name: Log in to GHCR
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build ML Docker image
uses: docker/build-push-action@v6
with:
context: .
push: true
tags: ghcr.io/${{ github.repository }}/ml-service:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
test:
needs: build
runs-on: ubuntu-latest
container:
image: ghcr.io/${{ github.repository }}/ml-service:${{ github.sha }}
credentials: # Needed to pull a private GHCR image
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v4 # tests/ live in the repo, not in the image
- name: Run inference test
run: python -m pytest tests/ -v
Lesson 9: Cost Management
ML CI/CD can be expensive — GPU compute, large model downloads, lengthy training runs, and LLM API calls add up quickly. Smart pipeline design can reduce costs dramatically without sacrificing quality.
name: Cost-Optimized ML CI
on:
push:
branches: [main]
pull_request:
branches: [main]
workflow_dispatch: # Enables the on-demand full-training-test job below
jobs:
# Cheap checks — run on every push
lint-and-type:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pip install ruff mypy && ruff check . && mypy src/
# Medium cost — run on PRs, not draft PRs
unit-tests:
if: github.event.pull_request.draft == false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pytest tests/unit/ -v
# Expensive — run only on main branch
eval-suite:
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: python -m evals.run_full_suite
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
# Very expensive — run weekly or on-demand only
full-training-test:
if: github.event_name == 'workflow_dispatch'
runs-on: [self-hosted, gpu]
timeout-minutes: 60
steps:
- uses: actions/checkout@v4
- run: python train.py --epochs 5 --data data/validation_set.csv
Cost optimization strategies:
- Use path filters to skip ML pipeline when non-ML files change
- Skip expensive tests on draft PRs (github.event.pull_request.draft == false)
- Run LLM evals only on main, not on every feature branch push
- Use CPU runners for non-GPU tests (free tier vs. GPU costs)
- Cache model downloads and datasets between runs
- Use smaller test datasets in CI; full datasets only in nightly runs
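The first strategy, path filtering, lives in the workflow trigger itself; a sketch (the listed paths are illustrative):

```yaml
# Run the ML pipeline only when ML-relevant files change
on:
  push:
    paths:
      - 'src/**'
      - 'training/**'
      - 'requirements.txt'

# Or invert it: run on everything except documentation
# on:
#   push:
#     paths-ignore:
#       - 'docs/**'
#       - '**.md'
```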
Lesson 10: Model Registry
A model registry stores validated model versions with metadata, making it easy to track which model is in production, roll back to a previous version, and audit the model lifecycle. CI can automatically push validated models to the registry.
name: Model Registry Push
on:
push:
branches: [main]
paths:
- 'models/**'
- 'training/**'
jobs:
validate-and-push:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
lfs: true
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Validate model
run: |
pip install -r requirements.txt
python -m tests.validate_model \
--model models/classifier_v2.pt \
--min-accuracy 0.92 \
--max-size-mb 500
- name: Push to Hugging Face Hub
run: |
pip install huggingface_hub
python -c "
from huggingface_hub import HfApi
api = HfApi()
api.upload_file(
path_or_fileobj='models/classifier_v2.pt',
path_in_repo='classifier_v2.pt',
repo_id='myorg/my-model',
commit_message='CI: model update from ${{ github.sha }}'
)
"
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
- name: Tag model version
run: |
VERSION=$(python -c "import json; print(json.load(open('models/metadata.json'))['version'])")
echo "Published model version: $VERSION"
Lesson 11: API Testing
If your ML model is served via an API, testing that API in CI ensures that the serving layer works correctly — request handling, response format, error handling, authentication, and performance.
name: ML API Tests
on:
push:
paths:
- 'src/api/**'
- 'models/**'
jobs:
api-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: pip install -r requirements.txt
- name: Start API server
run: |
python -m src.api.serve --port 8000 &
# Poll the health endpoint instead of a fixed sleep; model load time varies
for i in $(seq 1 30); do
curl -sf http://localhost:8000/health > /dev/null && break
sleep 2
done
- name: Test API endpoints
run: |
# Health check
curl -sf http://localhost:8000/health | jq .
# Prediction endpoint
RESPONSE=$(curl -sf -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This product is amazing, I love it!"}')
echo "Response: $RESPONSE"
# Validate response structure
echo "$RESPONSE" | jq -e '.prediction' > /dev/null
echo "$RESPONSE" | jq -e '.confidence' > /dev/null
CONFIDENCE=$(echo "$RESPONSE" | jq '.confidence')
echo "Confidence: $CONFIDENCE"
# Batch endpoint
curl -sf -X POST http://localhost:8000/predict/batch \
-H "Content-Type: application/json" \
-d '{"texts": ["Great!", "Terrible.", "OK"]}' | jq .
# Error handling
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{}')
[ "$STATUS" = "422" ] && echo "✅ Validation error handled correctly"
Lesson 12: The Eval-Gate Pattern
The eval-gate pattern blocks deployment if model evaluation scores drop below a defined threshold. It's the ML equivalent of a test suite — if evals don't pass, the model doesn't ship.
name: Eval Gate
on:
push:
branches: [main]
jobs:
evaluate:
runs-on: ubuntu-latest
outputs:
passed: ${{ steps.gate.outputs.passed }}
scores: ${{ steps.eval.outputs.scores }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- run: pip install -r requirements.txt
- name: Run evaluation suite
id: eval
run: |
SCORES=$(python -m evals.run_suite --output json)
echo "scores=$SCORES" >> $GITHUB_OUTPUT
echo "Evaluation scores:"
echo "$SCORES" | python -m json.tool
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Eval gate check
id: gate
run: |
python - <<'PY'
import json, os, sys
scores = json.loads('''${{ steps.eval.outputs.scores }}''')
thresholds = {
    'accuracy': 0.90,
    'relevance': 0.85,
    'safety': 0.95,
    'latency_p95_ms': 3000
}
failed = []
for metric, threshold in thresholds.items():
    value = scores.get(metric, 0)
    if metric == 'latency_p95_ms':
        if value > threshold:
            failed.append(f'{metric}: {value} > {threshold}')
    elif value < threshold:
        failed.append(f'{metric}: {value} < {threshold}')
# Write the job output directly: piping log text into GITHUB_OUTPUT would
# pollute it, and a trailing `| tee` would swallow the exit code
with open(os.environ['GITHUB_OUTPUT'], 'a') as out:
    out.write('passed=' + ('false' if failed else 'true') + '\n')
if failed:
    print('❌ EVAL GATE FAILED:')
    for f in failed:
        print(f'  - {f}')
    sys.exit(1)
print('✅ All eval gates passed')
PY
deploy:
needs: evaluate
if: needs.evaluate.outputs.passed == 'true'
runs-on: ubuntu-latest
environment: production
steps:
- run: echo "Deploying model that passed eval gates..."
The eval-gate pattern creates a formal quality bar that every model change must clear before reaching production. It protects against subtle regressions that might not be caught by traditional unit tests — a prompt change that makes responses slightly less accurate, or a model update that increases latency.
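Stripped of workflow plumbing, the gate reduces to a small comparison function; a standalone sketch (the thresholds mirror the workflow above, the `_ms` naming convention is our assumption):

```python
def check_eval_gate(scores: dict, thresholds: dict) -> list:
    """Return failure messages; an empty list means the gate passes.

    Metrics whose names end in '_ms' are latencies (lower is better);
    all other metrics are quality scores (higher is better).
    """
    failed = []
    for metric, threshold in thresholds.items():
        value = scores.get(metric, 0)  # a missing metric fails closed
        if metric.endswith('_ms'):
            if value > threshold:
                failed.append(f"{metric}: {value} > {threshold}")
        elif value < threshold:
            failed.append(f"{metric}: {value} < {threshold}")
    return failed
```

Failing closed on missing metrics matters: if the eval suite silently stops reporting `safety`, the gate should block the deploy rather than wave it through.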
Track 7 Quiz
Test your knowledge — 4 questions, +25 XP each correct answer
Q1. Why should LLM evaluation tests NOT check for exact string matches?
LLM outputs are inherently non-deterministic — the same prompt can produce different wordings. Tests should check structural properties (length, format), required content (key terms), and similarity to baselines rather than exact matches.
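Concretely, an eval test asserts structure rather than wording (a sketch; the length bounds, key terms, and helper name are illustrative):

```python
def validate_llm_summary(output: str) -> None:
    """Structural checks that survive run-to-run wording changes."""
    # Length bounds catch empty or runaway responses
    assert 50 <= len(output) <= 2000, "summary length out of bounds"
    # Required content: key terms must appear, exact phrasing doesn't matter
    for term in ("refund", "30 days"):
        assert term.lower() in output.lower(), f"missing key term: {term}"
    # Format: no raw template placeholders leaking into the response
    assert "{" not in output and "}" not in output, "template placeholder leaked"
```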
Q2. What is the purpose of a QUICK_RUN parameter in Jupyter notebooks?
A QUICK_RUN parameter lets CI execute the notebook with a tiny dataset and minimal iterations, verifying it runs end-to-end in minutes instead of the hours a full training run would take.
Q3. Why shouldn't you commit large model files directly to Git?
Git stores the full history of every file. A 500 MB model committed once means every clone downloads that 500 MB forever, even if the file is later deleted. Use Git LFS or DVC instead.
Q4. What does the eval-gate pattern do?
The eval-gate pattern acts as a quality gate โ if model evaluation metrics (accuracy, relevance, safety, latency) fall below defined thresholds, the deployment is blocked. It's the ML equivalent of a failing test suite.
Lesson 1: Why Self-Hosted
Self-hosted runners are machines you manage that execute GitHub Actions workflows. While GitHub-hosted runners cover most use cases, self-hosted runners are necessary when you need capabilities that managed runners don't provide.
Common reasons to self-host:
- Specialized hardware: GPUs for ML, Apple Silicon for iOS builds, high-memory machines for data processing
- Network access: Reaching internal services, databases, or APIs behind a firewall
- Pre-installed software: Licensed software, custom toolchains, or specific OS configurations
- Performance: More CPU/RAM than standard runners, SSDs, faster network
- Cost: For very high-volume usage, self-hosted can be cheaper than GitHub-hosted (though note the new $0.002/min platform charge as of March 2026)
- Compliance: Data residency requirements, air-gapped environments, specific security controls
| Aspect | GitHub-Hosted | Self-Hosted |
|---|---|---|
| Setup | Zero — just use it | You provision and maintain |
| Cost | Included minutes + overage | Your hardware + $0.002/min platform |
| Clean state | Fresh VM every job | Persistent (unless ephemeral) |
| Hardware | Standard specs | Whatever you want |
| Network | Public internet only | Your network |
| Maintenance | Managed by GitHub | Managed by you |
Lesson 2: Setting Up a Runner
Setting up a self-hosted runner involves downloading the runner application, configuring it, and connecting it to your GitHub repository or organization.
Step 1: Go to Settings → Actions → Runners → New self-hosted runner. GitHub shows OS-specific instructions.
Step 2: Download and configure the runner on your machine:
# Download the runner (Linux example)
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf actions-runner-linux-x64-2.321.0.tar.gz
# Configure — connects to your repo/org
./config.sh \
--url https://github.com/YOUR_ORG/YOUR_REPO \
--token YOUR_REGISTRATION_TOKEN \
--name my-runner \
--labels gpu,linux,x64 \
--work _work
# Start the runner
./run.sh
# Or install as a systemd service (recommended for production)
sudo ./svc.sh install
sudo ./svc.sh start
Step 3: Use the runner in workflows:
jobs:
build:
runs-on: [self-hosted, linux, x64]
steps:
- uses: actions/checkout@v4
- run: echo "Running on self-hosted runner!"
gpu-job:
runs-on: [self-hosted, gpu] # Target GPU-labeled runners
steps:
- uses: actions/checkout@v4
- run: nvidia-smi
The runs-on field accepts an array of labels. The job is routed to a runner that matches ALL specified labels. This lets you target specific hardware configurations precisely.
Lesson 3: Runner Labels
Labels are how you route jobs to specific runners. Every runner has default labels (operating system, architecture) and can have custom labels for your specific needs.
Default labels (assigned automatically): self-hosted, OS label (linux, macos, windows), architecture (X64, ARM64).
Custom labels describe your runner's capabilities:
# Add labels during configuration
./config.sh --url ... --token ... \
--labels gpu,cuda-12,a100,high-memory
# Or add labels in GitHub UI:
# Settings → Actions → Runners → Select runner → Edit labels
# Routing jobs with labels
jobs:
# Route to any self-hosted Linux runner
basic:
runs-on: [self-hosted, linux]
# Route to GPU runner with CUDA 12
ml-training:
runs-on: [self-hosted, gpu, cuda-12]
# Route to high-memory runner for data processing
data-pipeline:
runs-on: [self-hosted, high-memory]
# Route to macOS ARM runner for iOS builds
ios-build:
runs-on: [self-hosted, macos, ARM64]
# Route to runner in specific datacenter
eu-deploy:
runs-on: [self-hosted, linux, eu-west]
When multiple runners match all the labels, GitHub Actions picks one randomly. This provides basic load balancing across a pool of identical runners.
Tip: use a hierarchy of labels such as gpu, gpu-a100, gpu-a100-80gb. This lets you target broadly (gpu) or specifically (gpu-a100-80gb) depending on the job's requirements.
Lesson 4: Security
Self-hosted runners introduce significant security considerations. Unlike GitHub-hosted runners (fresh VMs destroyed after each job), self-hosted runners are persistent machines that can accumulate sensitive data between jobs.
Key security risks:
- Persistent state: Files, environment variables, and processes from previous jobs may be accessible to subsequent jobs
- Fork PRs: Anyone who forks your public repo can submit PRs that execute code on your runner
- Privilege escalation: If the runner process has elevated permissions, malicious workflows can exploit them
- Network access: Runners on your internal network can reach internal services
Security best practices:
# 1. Use ephemeral runners (fresh for each job)
# Configure runner with --ephemeral flag
# ./config.sh --url ... --token ... --ephemeral
# 2. Restrict to private repos only
# Settings → Actions → General →
# "Allow GitHub Actions from private repos only"
# 3. Use a dedicated, unprivileged user
# Never run the runner as root
# 4. Limit network access with firewall rules
# Only allow outbound to github.com and required services
# 5. Clean up after each job (if not ephemeral)
jobs:
build:
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
- run: ./build.sh
# Always clean up
- if: always()
run: |
rm -rf $RUNNER_WORKSPACE/*
docker system prune -f
Lesson 5: macOS Runners
macOS runners — both GitHub-hosted and self-hosted — are essential for iOS app development, macOS software, and any project requiring Apple's development tools (Xcode, Swift, code signing).
GitHub-hosted macOS runners (available to all plans):
| Label | Architecture | CPU | RAM | Minute Multiplier |
|---|---|---|---|---|
| macos-13 | Intel x64 | 4 vCPU | 14 GB | 10x |
| macos-14 | Apple M1 | 3 CPU | 7 GB | 10x |
| macos-15 | Apple M1 | 3 CPU | 7 GB | 10x |
name: iOS Build & Test
on:
push:
branches: [main]
jobs:
build-ios:
runs-on: macos-14 # Apple Silicon
steps:
- uses: actions/checkout@v4
- name: Select Xcode version
run: sudo xcode-select -s /Applications/Xcode_15.app
- name: Build
run: |
xcodebuild build \
-workspace MyApp.xcworkspace \
-scheme MyApp \
-destination 'platform=iOS Simulator,name=iPhone 15'
- name: Run tests
run: |
xcodebuild test \
-workspace MyApp.xcworkspace \
-scheme MyApp \
-destination 'platform=iOS Simulator,name=iPhone 15' \
-resultBundlePath TestResults.xcresult
- name: Upload test results
uses: actions/upload-artifact@v4
if: always()
with:
name: test-results
path: TestResults.xcresult
Lesson 6: Docker-in-Docker
Many CI pipelines need to build Docker images or run Docker containers during the build process. On GitHub-hosted runners, Docker is pre-installed. On self-hosted runners or containerized runners, you may need Docker-in-Docker (DinD) or Docker socket mounting.
# Docker is pre-installed on GitHub-hosted runners
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build Docker image
run: docker build -t myapp:latest .
- name: Run tests in container
run: |
docker run --rm myapp:latest pytest tests/
- name: Docker Compose for integration tests
run: |
docker compose -f docker-compose.test.yml up -d
sleep 10
docker compose -f docker-compose.test.yml exec -T app pytest tests/integration/
docker compose -f docker-compose.test.yml down
Using service containers (GitHub's built-in approach — no DinD needed):
jobs:
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_PASSWORD: testpass
ports: ['5432:5432']
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7
ports: ['6379:6379']
steps:
- uses: actions/checkout@v4
- name: Run tests with services
run: pytest tests/
env:
DATABASE_URL: postgresql://postgres:testpass@localhost:5432/postgres
REDIS_URL: redis://localhost:6379
Tip: prefer services containers over Docker Compose in CI. They're simpler, have built-in health checks, and integrate natively with the workflow. Use Docker Compose only when you need complex multi-container setups that services can't express.
Lesson 7: Act โ Local Testing
act is an open-source tool that runs GitHub Actions workflows locally using Docker. It's invaluable for testing workflow changes without pushing to GitHub and waiting for runners. Install it with brew install act (macOS) or from the GitHub repository.
# Run the default event (push) for all workflows
act
# Run a specific event
act pull_request
# Run a specific workflow
act -W .github/workflows/ci.yml
# Run a specific job
act -j test
# Dry run (show what would run without executing)
act -n
# Use a specific runner image (default is slim)
act -P ubuntu-latest=catthehacker/ubuntu:full-22.04
# Pass secrets
act -s MY_SECRET=myvalue
act --secret-file .secrets # From file
# Pass event payload
act -e event.json
Example .secrets file for local testing:
# .secrets (add to .gitignore!)
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
Limitations of act: it doesn't support all GitHub Actions features (services, matrix strategies may have issues), some actions that depend on GitHub-specific APIs won't work, and runner environments differ slightly from GitHub's official images. Despite these limitations, it catches 80%+ of workflow errors before pushing.
Tip: run act -n (dry run) first to verify your workflow structure and syntax without executing any steps. It's the fastest way to catch YAML errors and job configuration issues.
Lesson 8: Debugging
When workflows fail, you need effective debugging strategies. GitHub Actions provides several tools for diagnosing issues, from verbose logging to interactive debugging.
Enable debug logging:
# Option 1: Set repository secrets
# ACTIONS_STEP_DEBUG = true → Verbose step output
# ACTIONS_RUNNER_DEBUG = true → Runner-level diagnostics
# Option 2: Re-run with debug logging
# Go to failed run → "Re-run all jobs" → Check "Enable debug logging"
# Option 3: Add debug output in your workflow
steps:
- name: Debug information
run: |
echo "=== GitHub Context ==="
echo "Event: ${{ github.event_name }}"
echo "Ref: ${{ github.ref }}"
echo "SHA: ${{ github.sha }}"
echo "Actor: ${{ github.actor }}"
echo "=== Runner Info ==="
echo "OS: ${{ runner.os }}"
echo "Arch: ${{ runner.arch }}"
echo "Temp: ${{ runner.temp }}"
echo "=== Environment ==="
env | sort
echo "=== Disk Space ==="
df -h
echo "=== Installed Tools ==="
python --version 2>/dev/null || echo "No Python"
node --version 2>/dev/null || echo "No Node"
docker --version 2>/dev/null || echo "No Docker"
Common debugging patterns:
# Upload logs/artifacts on failure for inspection
- name: Upload debug logs
if: failure()
uses: actions/upload-artifact@v4
with:
name: debug-logs
path: |
**/*.log
**/test-results/
# SSH into a runner for interactive debugging (using tmate)
- name: Setup tmate session
if: failure()
uses: mxschmitt/action-tmate@v3
timeout-minutes: 15
Tip: add an env | sort debug step to verify environment variables, and use if: failure() to upload artifacts and logs only when things go wrong.
Lesson 9: Performance
Optimizing CI performance means faster feedback loops and lower costs. Here are the most impactful techniques, ordered by effort-to-impact ratio.
1. Cache aggressively (biggest impact, easiest fix):
- uses: actions/cache@v4
with:
path: |
~/.cache/pip
~/.npm
~/.cargo
node_modules/
key: deps-${{ runner.os }}-${{ hashFiles('**/lockfile') }}
restore-keys: deps-${{ runner.os }}-
2. Parallelize jobs:
# Instead of one sequential job doing everything:
jobs:
lint: { runs-on: ubuntu-latest, steps: [...] } # 30s
typecheck: { runs-on: ubuntu-latest, steps: [...] } # 45s
unit-test: { runs-on: ubuntu-latest, steps: [...] } # 2m
e2e-test: { runs-on: ubuntu-latest, steps: [...] } # 3m
# Total wall time: ~3 minutes (not 6.25 minutes)
3. Use path filters: Skip CI when only docs or non-code files change.
4. Cancel redundant runs:
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
5. Optimize Docker builds:
# Use layer caching
- uses: docker/build-push-action@v6
with:
cache-from: type=gha
cache-to: type=gha,mode=max
Performance benchmarks to aim for:
| Pipeline Stage | Target Time |
|---|---|
| Lint + Format check | < 30 seconds |
| Type checking | < 1 minute |
| Unit tests | < 3 minutes |
| Integration tests | < 5 minutes |
| Docker build (cached) | < 2 minutes |
| Total pipeline | < 10 minutes |
Lesson 10: Monitoring
Monitoring your CI/CD system is just as important as monitoring your production application. Without monitoring, you won't know when pipelines slow down, runners become unhealthy, or failure rates spike.
Key metrics to track:
- Pipeline duration: How long does CI take? Is it trending up?
- Queue time: How long do jobs wait before a runner picks them up?
- Success rate: What percentage of runs pass? What's the flake rate?
- Runner utilization: Are self-hosted runners busy or idle?
- Cost: Total runner minutes consumed, broken down by workflow
GitHub's built-in monitoring:
# Use the GitHub API to track metrics
- name: Report pipeline metrics
if: always()
env:
GH_TOKEN: ${{ github.token }}
run: |
# run_started_at is an ISO timestamp, so convert before doing arithmetic
STARTED=$(gh api "repos/${{ github.repository }}/actions/runs/${{ github.run_id }}" --jq .run_started_at)
DURATION=$(( $(date +%s) - $(date -d "$STARTED" +%s) ))
curl -X POST https://your-metrics-service.com/ci-metrics \
-H "Content-Type: application/json" \
--data @- <<EOF
{
"repository": "${{ github.repository }}",
"workflow": "${{ github.workflow }}",
"status": "${{ job.status }}",
"duration_seconds": $DURATION,
"run_id": ${{ github.run_id }},
"branch": "${{ github.ref_name }}"
}
EOF
2026 Update — Actions Data Stream: GitHub's 2026 security roadmap includes Actions Data Stream, which provides real-time telemetry for workflow runs, job execution, and runner activity. This will enable native integration with monitoring platforms (Datadog, Grafana, Splunk) without custom reporting steps.
Lesson 11: GitHub Apps
GitHub Apps provide a more powerful alternative to personal access tokens (PATs) for CI/CD automation. Apps can be granted fine-grained permissions, installed on specific repositories, and generate short-lived tokens — similar to OIDC but for GitHub API access.
name: Automated PR Creation
on:
schedule:
- cron: '0 6 * * 1' # Weekly
jobs:
update-deps:
runs-on: ubuntu-latest
steps:
- name: Generate app token
id: app-token
uses: actions/create-github-app-token@v1
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
- uses: actions/checkout@v4
with:
token: ${{ steps.app-token.outputs.token }}
- name: Update dependencies
run: |
pip install pip-tools
pip-compile --upgrade requirements.in -o requirements.txt
- name: Create PR
run: |
# CI runners have no git identity configured; commit fails without one
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git checkout -b deps/weekly-update
git add requirements.txt
git commit -m "chore: weekly dependency update"
git push origin deps/weekly-update
gh pr create \
--title "chore: weekly dependency update" \
--body "Automated weekly dependency update" \
--base main
env:
GH_TOKEN: ${{ steps.app-token.outputs.token }}
GitHub Apps are preferred over PATs because: they have granular permissions (only access what's needed), they generate short-lived tokens, they act as their own identity (not tied to a user account), and they can be installed org-wide with consistent permissions.
Tip: use a GitHub App token when a workflow needs more than the automatic GITHUB_TOKEN allows (like triggering other workflows or accessing other repos). It's more secure than a personal access token.
Lesson 12: Alternative Platforms
While this course focuses on GitHub Actions, understanding alternatives helps you make informed decisions and migrate if needed. Here's a detailed comparison of the major CI/CD platforms in 2026:
| Feature | GitHub Actions | GitLab CI | CircleCI | Jenkins |
|---|---|---|---|---|
| Configuration | YAML (per-repo) | YAML (per-repo) | YAML (per-repo) | Groovy / Declarative |
| Hosting | Cloud + self-hosted | Cloud + self-hosted | Cloud | Self-hosted only |
| Free tier | 2,000 min/mo (private) | 400 min/mo | 6,000 min/mo | Free (self-hosted) |
| Marketplace | 15,000+ actions | Templates | Orbs | 1,800+ plugins |
| Docker support | Native | Native (best) | Native | Plugin |
| Matrix builds | Yes (256 max) | Yes | Yes | Yes (plugin) |
| Secrets mgmt | Built-in + OIDC | Built-in | Built-in | Plugin-based |
| Learning curve | Low | Low-Medium | Medium | High |
| Best for | GitHub projects | GitLab projects | Complex pipelines | Max customization |
GitLab CI is the strongest alternative, especially if your code is on GitLab. It has excellent Docker integration, built-in container registry, and a comprehensive DevOps platform. Its CI/CD is arguably more mature than GitHub Actions for complex pipelines.
CircleCI excels at complex pipeline orchestration and has a generous free tier. Its "orbs" (reusable pipeline packages) are comparable to GitHub Actions marketplace.
Jenkins is the legacy choice — infinitely customizable but requires significant operational expertise. New projects rarely choose Jenkins, but it remains widely used in enterprises.
Lesson 13: The Ideal Setup
After covering all the tools and techniques, here's what a mature, production-ready CI/CD setup looks like. This is the "ideal" configuration that balances speed, safety, cost, and maintainability.
# The ideal CI/CD architecture:
#
# GitHub-hosted runners: Standard CI/CD
# - Linting, testing, building
# - Docker image builds
# - Cloud deployments (via OIDC)
# - Scheduled maintenance tasks
#
# Self-hosted runners: Specialized needs
# - GPU-intensive ML workloads
# - iOS builds (macOS)
# - Internal network access
# - Custom hardware requirements
# .github/workflows/ci.yml — The main pipeline
name: CI/CD Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
concurrency:
group: ci-${{ github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
# Stage 1: Fast checks (parallel) — ~30 seconds
lint:
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: '3.12', cache: 'pip' }
- run: pip install ruff && ruff check . && ruff format --check .
typecheck:
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: '3.12', cache: 'pip' }
- run: pip install -r requirements.txt mypy && mypy src/
# Stage 2: Tests (parallel) — ~3 minutes
test:
needs: [lint, typecheck]
runs-on: ubuntu-latest
timeout-minutes: 15
strategy:
matrix:
python: ['3.11', '3.12']
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: '${{ matrix.python }}', cache: 'pip' }
- run: pip install -r requirements.txt && pytest tests/ -v --cov=src
# Stage 3: Build — ~2 minutes
build:
needs: test
runs-on: ubuntu-latest
timeout-minutes: 10
permissions: { contents: read, packages: write }
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3 # required for type=gha cache
- uses: docker/login-action@v3
with: { registry: ghcr.io, username: '${{ github.actor }}', password: '${{ secrets.GITHUB_TOKEN }}' }
- uses: docker/build-push-action@v6
with:
push: ${{ github.ref == 'refs/heads/main' }}
tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
# Stage 4: Deploy (main only) — ~3 minutes
deploy:
needs: build
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
timeout-minutes: 10
environment: production
permissions: { id-token: write, contents: read }
steps:
- uses: actions/checkout@v4
- uses: aws-actions/configure-aws-credentials@v4
with: { role-to-assume: 'arn:aws:iam::123:role/Deploy', aws-region: 'us-east-1' }
- run: ./deploy.sh
# Stage 5: Verify
smoke-test:
needs: deploy
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- run: curl -sf https://myapp.example.com/health
This pipeline embodies all the best practices: fast checks first, parallel execution, caching, concurrency control, minimal permissions, OIDC authentication, environment protection, explicit timeouts, and post-deployment verification. Total wall-clock time: approximately 8-10 minutes from push to production.
Track 8 Quiz
Test your knowledge — 5 questions, +25 XP each correct answer
Q1. What is the main security risk of using self-hosted runners with public repositories?
In a public repo, anyone can fork it, modify the workflow file, and submit a PR. If self-hosted runners are configured, the PR's workflow executes on your runner, potentially accessing your internal network and resources.
Q2. What does the `--ephemeral` flag do when configuring a self-hosted runner?
Ephemeral runners handle exactly one job and then automatically de-register. This ensures a clean state for every job, similar to GitHub-hosted runners, and is the recommended security practice.
Q3. According to the performance targets, what should the total CI pipeline time be?
The target total pipeline time is under 10 minutes. This provides fast feedback while allowing thorough testing. Individual stages should be even faster: linting under 30 seconds, unit tests under 3 minutes.
Q4. Why are GitHub Apps preferred over personal access tokens (PATs) for CI automation?
GitHub Apps offer fine-grained permissions, short-lived tokens, and act as their own identity rather than being tied to a human user. If a team member leaves, a PAT tied to their account stops working; an App continues operating.
Q5. What is the recommended approach for most teams' runner strategy?
The ideal setup uses GitHub-hosted runners for standard CI/CD (testing, building, deploying) and self-hosted runners only for specialized needs like GPU compute, internal network access, or custom hardware requirements.