Data Centre Network Automation with APSTRA and Terraform: A CI/CD Approach

Table of Contents

  1. Introduction
  2. Terraform Overview
  3. Terraform with APSTRA
  4. GitLab CI/CD Pipeline Architecture
  5. Pipeline Implementation Scenarios
  6. State Management
  7. Hybrid Management Approach
  8. Scheduled Validation
  9. Examples
  10. Best Practices

Introduction

This document provides a comprehensive guide to implementing network automation in a data centre environment using APSTRA as the network orchestration platform and Terraform as the infrastructure as code tool, all managed through GitLab CI/CD pipelines.

APSTRA serves as the Single Source of Truth (SSOT) for network configuration, while Terraform provides a declarative approach to managing infrastructure. GitLab CI/CD enables automated testing, validation, and deployment of network changes.

This approach allows for:

  • Version-controlled network changes
  • Automated testing and validation
  • Controlled approvals and deployments
  • Configuration drift detection
  • Blend of manual and automated management where appropriate

Terraform Overview

What is Terraform?

Terraform is an infrastructure as code tool that allows you to define both cloud and on-premises resources in human-readable configuration files that you can version, reuse, and share. Terraform creates and manages resources through their APIs.

Key Terraform Concepts

  • HCL (HashiCorp Configuration Language): The language used to define infrastructure
  • Resources: The infrastructure components to manage
  • Providers: Plugins that interface with specific platforms (like APSTRA)
  • State: A record of what infrastructure Terraform manages
  • Plan: A preview of changes Terraform will make
  • Apply: The execution of planned changes

Terraform Workflow

The standard Terraform workflow consists of:

  1. Write: Define resources in Terraform configuration files
  2. Init: Download the required providers and configure the state backend (terraform init)
  3. Plan: Preview changes before applying
  4. Apply: Apply the changes to create/modify/delete resources
  5. Destroy: Remove resources when no longer needed
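In a CI job the same workflow has to run without prompts. The sketch below is a minimal illustration, not part of any tool: the helper names terraform_steps and run_workflow are hypothetical, and it assumes Python 3 and the terraform binary are available on the runner. It runs init, plan with a saved plan file, and apply of that file, so the apply step executes exactly the plan that was reviewed.

```python
import subprocess
from typing import List

def terraform_steps(plan_file: str = "tfplan") -> List[List[str]]:
    """Hypothetical helper: the core workflow as non-interactive Terraform CLI calls."""
    return [
        ["terraform", "init", "-input=false"],                       # providers and backend
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],  # save the plan for review
        ["terraform", "apply", "-input=false", plan_file],           # apply exactly that plan
    ]

def run_workflow(workdir: str, dry_run: bool = True) -> List[str]:
    """Print each step and, unless dry_run, execute it; check=True stops on the first failure."""
    executed = []
    for cmd in terraform_steps():
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)
        executed.append(line)
    return executed
```

Calling run_workflow(".", dry_run=True) only prints the three commands, which is a safe way to check the sequence before pointing it at a real APSTRA instance.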

Terraform with APSTRA

Juniper Networks provides an official Terraform provider for APSTRA, allowing you to manage various aspects of your data centre network through Terraform.

What Can Be Managed?

The APSTRA provider allows management of:

  • Blueprints (network designs)
  • Rack types and templates
  • IP pools and VNI pools
  • Security zones and routing policies
  • Device configurations and interfaces
  • Virtual networks (VLANs)

Provider Configuration

Basic provider configuration in Terraform:

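terraform {
  required_providers {
    apstra = {
      source  = "Juniper/apstra"
      version = "~> 0.0" # any 0.x release; pin the exact version you have tested
    }
  }
}

# Credentials are read from the APSTRA_USER and APSTRA_PASS environment variables
# (set them as masked GitLab CI/CD variables) rather than written in configuration.
provider "apstra" {
  url                     = var.apstra_url
  tls_validation_disabled = false # keep TLS certificate validation enabled
}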

GitLab CI/CD Pipeline Architecture

GitLab CI/CD provides a framework for automating the build, test, and deployment of infrastructure changes. Here's an overview of the key components in GitLab terminology:

Pipeline Components

Diagram: GitLab CI/CD Pipeline Components

  • Pipeline: The top-level component that contains jobs and stages
  • Stages: Sequential groupings of jobs (e.g., validate, plan, deploy)
  • Jobs: The actual work units that run scripts in the CI/CD environment
  • Runners: Servers that execute the jobs
  • Artifacts: Files generated during jobs that can be passed between stages
  • Cache: Speeds up jobs by reusing content from previous runs

Pipeline Flow

Diagram: Standard Network Change Pipeline Flow

  1. Validate Stage:
    • Terraform validation job: Checks syntax and basic structure
    • Linting jobs: Ensure code quality standards
  2. Plan Stage:
    • Terraform plan job: Creates execution plan
    • Plan analysis job: Evaluates the impact of changes
  3. Approve Stage:
    • Manual approval job: Requires human review and approval
  4. Apply Stage:
    • Terraform apply job: Implements the approved changes
    • Post-deployment validation job: Verifies successful implementation
  5. Verify Stage:
    • Network testing job: Tests actual network functionality
    • Documentation job: Updates documentation based on changes
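The plan analysis job can work from Terraform's machine-readable plan. A minimal sketch, assuming the plan has been rendered with terraform show -json tfplan > plan.json; the function names are hypothetical. It counts the planned actions in the resource_changes list, where a replacement appears as a delete plus a create, which gives reviewers a quick sense of the blast radius before approval.

```python
import json
from collections import Counter
from typing import Dict

def summarize_plan(plan: Dict) -> Counter:
    """Count planned actions in the JSON produced by `terraform show -json tfplan`."""
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if sorted(actions) == ["create", "delete"]:
            counts["replace"] += 1            # replacements appear as delete + create
        elif actions not in (["no-op"], ["read"]):
            counts[actions[0]] += 1           # "create", "update" or "delete"
    return counts

def load_plan(path: str) -> Dict:
    """Read a plan previously rendered with `terraform show -json tfplan > plan.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

In the job script this would be used as print(summarize_plan(load_plan("plan.json"))).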

Pipeline Implementation Scenarios

Creating New Network Components

Diagram: Pipeline for New Components

  1. Network engineer creates configuration in Git repository
  2. Pipeline validates the syntax and structure
  3. Terraform plan creates a change plan
  4. Team reviews and approves the changes
  5. Terraform applies changes to APSTRA
  6. APSTRA implements changes on the network
  7. Pipeline verifies the changes were applied correctly

Example .gitlab-ci.yml for Creating Components:

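stages:
  - validate
  - plan
  - approve
  - apply
  - verify

# Each job starts in a clean workspace, so Terraform jobs run init first
default:
  image:
    name: hashicorp/terraform:1.5.7
    entrypoint: [""]
  before_script:
    - terraform init -input=false -backend-config="username=gitlab-ci-token" -backend-config="password=${CI_JOB_TOKEN}"

validate:
  stage: validate
  script:
    - terraform fmt -check
    - terraform validate

plan:
  stage: plan
  script:
    - terraform plan -input=false -out=tfplan
  artifacts:
    paths:
      - tfplan

approve:
  stage: approve
  before_script: []  # no Terraform needed for the approval gate
  script:
    - echo "Plan reviewed and approved"
  when: manual
  allow_failure: false  # blocks later stages until someone runs this job

apply:
  stage: apply
  script:
    # A saved plan is applied exactly as reviewed, without an interactive prompt
    - terraform apply -input=false tfplan
  dependencies:
    - plan

verify:
  stage: verify
  image: python:3.12
  before_script: []
  script:
    - python scripts/test_connectivity.py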
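The verify job calls scripts/test_connectivity.py, which is not shown here. A minimal sketch of what such a script might contain, assuming that a TCP connection to a known port on each target is an acceptable reachability test; the TARGETS list, its addresses and the check_targets name are hypothetical placeholders for your own checks.

```python
import socket
from typing import Dict, List, Tuple

# Hypothetical targets: (host, TCP port) pairs that should answer after a change
TARGETS: List[Tuple[str, int]] = [
    ("10.1.0.1", 22),
    ("10.1.1.1", 22),
]

def check_targets(targets: List[Tuple[str, int]], timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Map each target to True if it accepts a TCP connection within the timeout."""
    results: Dict[Tuple[str, int], bool] = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:                      # refused, unreachable or timed out
            results[(host, port)] = False
    return results

def main() -> int:
    """Print a line per target; return a non-zero exit code if any target failed."""
    results = check_targets(TARGETS)
    for (host, port), ok in results.items():
        print(f"{host}:{port} {'OK' if ok else 'UNREACHABLE'}")
    return 0 if all(results.values()) else 1

# In scripts/test_connectivity.py the entry point would be:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Returning a non-zero exit code is what makes GitLab mark the verify job as failed.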

Modifying Existing Components

Diagram: Pipeline for Modifying Components

Modification follows a similar flow to creation but includes:

  • Generation of a "diff" showing the exact changes
  • Impact analysis before approval
  • Rollback plans in case of issues
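Terraform's JSON plan carries the before and after values of each resource, which is enough to produce a focused diff for reviewers. A sketch, assuming terraform show -json output loaded as a dict; the function name is hypothetical, and it only reports top-level attributes. Values that Terraform will only learn during apply are flagged in after_unknown and reported as "(known after apply)".

```python
from typing import Dict, List

def changed_attributes(plan: Dict) -> List[str]:
    """List top-level attribute changes for resources planned for an in-place update."""
    lines: List[str] = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if change["actions"] != ["update"]:
            continue
        before = change.get("before") or {}
        after = change.get("after") or {}
        unknown = change.get("after_unknown") or {}
        keys = set(before) | set(after) | {k for k, v in unknown.items() if v is True}
        for key in sorted(keys):
            old = before.get(key)
            if unknown.get(key) is True:       # value is only known after apply
                lines.append(f"{rc['address']}.{key}: {old!r} -> (known after apply)")
            elif old != after.get(key):
                lines.append(f"{rc['address']}.{key}: {old!r} -> {after.get(key)!r}")
    return lines
```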

Deleting Components

Diagram: Pipeline for Deleting Components

Deletion requires special handling:

  1. Identification of components to be removed
  2. Impact analysis to ensure no dependencies are broken
  3. Additional approval gates for critical components
  4. Staged deletion with testing between stages
  5. Post-deletion network validation
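One way to implement the additional approval gate is a plan check that fails the pipeline when the plan would delete or replace resources of critical types, unless a reviewer has explicitly allowed it. A sketch under stated assumptions: the CRITICAL_TYPES set and the ALLOW_DESTROY CI/CD variable are hypothetical names, and the plan is the terraform show -json output loaded as a dict.

```python
import os
from typing import Dict, List, Mapping, Set

# Hypothetical: resource types whose deletion needs an explicit extra approval
CRITICAL_TYPES: Set[str] = {
    "apstra_datacenter_blueprint",
    "apstra_datacenter_routing_policy",
}

def critical_deletions(plan: Dict, critical_types: Set[str] = CRITICAL_TYPES) -> List[str]:
    """Addresses of critical resources the plan deletes, including replacements."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["type"] in critical_types and "delete" in rc["change"]["actions"]
    ]

def deletion_gate(plan: Dict, env: Mapping[str, str] = os.environ) -> int:
    """Return 1 (fail the job) if critical deletions are planned without ALLOW_DESTROY=true."""
    blocked = critical_deletions(plan)
    if blocked and env.get("ALLOW_DESTROY") != "true":   # ALLOW_DESTROY is a hypothetical CI variable
        print("Plan deletes critical resources; set ALLOW_DESTROY=true after review:")
        for address in blocked:
            print(f"  - {address}")
        return 1
    return 0
```

Deletions of non-critical types pass the gate and are still covered by the normal approval stage.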

State Management

Terraform uses state files to track what infrastructure it manages and how it's configured.

What is Terraform State?

State is a JSON document that maps Terraform resources to real-world infrastructure and tracks metadata like resource dependencies.

Diagram: Terraform State Lifecycle

Example of a State File (simplified)

{
  "version": 4,
  "terraform_version": "1.5.7",
  "serial": 15,
  "lineage": "f8b632c5-9876-5432-a123-4567890abcde",
  "outputs": {
    "blueprint_id": {
      "value": "blueprint-dc1-prod",
      "type": "string"
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "apstra_datacenter_blueprint",
      "name": "dc1",
      "provider": "provider[\"registry.terraform.io/juniper/apstra\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "id": "blueprint-dc1-prod",
            "name": "dc1-prod",
            "template_id": "datacenter-template"
          }
        }
      ]
    },
    {
      "mode": "managed",
      "type": "apstra_datacenter_device",
      "name": "leaf01",
      "provider": "provider[\"registry.terraform.io/juniper/apstra\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "id": "device-leaf01",
            "blueprint_id": "blueprint-dc1-prod",
            "name": "leaf01",
            "system_id": "DCS-7050CX3-32S-D"
          }
        }
      ]
    }
  ]
}
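Because state is plain JSON, the mapping it records can be read directly, although terraform state list and terraform state show are the supported way to inspect it day to day. The sketch below only illustrates the structure: it derives resource addresses and their real-world IDs from a version 4 state document such as the example above, assuming the module, mode and index_key fields as they appear in Terraform's state format. The function name is hypothetical.

```python
from typing import Dict, Optional

def state_index(state: Dict) -> Dict[str, Optional[str]]:
    """Map each resource instance address in a v4 state document to its real-world ID."""
    index: Dict[str, Optional[str]] = {}
    for res in state.get("resources", []):
        prefix = f"{res['module']}." if "module" in res else ""   # e.g. "module.network."
        if res["mode"] == "data":
            prefix += "data."
        base = f"{prefix}{res['type']}.{res['name']}"
        for inst in res.get("instances", []):
            key = inst.get("index_key")                            # set by count / for_each
            if key is None:
                address = base
            elif isinstance(key, int):
                address = f"{base}[{key}]"
            else:
                address = f'{base}["{key}"]'
            index[address] = inst.get("attributes", {}).get("id")
    return index
```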

GitLab as a State Backend

GitLab-managed Terraform state uses Terraform's http backend. GitLab stores the state encrypted, keeps previous versions, and provides locking to prevent concurrent modifications.

Diagram: GitLab Terraform State Backend

Configuration example:

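terraform {
  backend "http" {
    address        = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>"
    lock_address   = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>/lock"
    unlock_address = "https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/<STATE_NAME>/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
  }
}

# Backend blocks cannot reference variables, so the CI job supplies credentials at init time
# (alternatively, set the TF_HTTP_USERNAME and TF_HTTP_PASSWORD environment variables):
#   terraform init \
#     -backend-config="username=gitlab-ci-token" \
#     -backend-config="password=${CI_JOB_TOKEN}"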
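The three addresses differ only in the /lock suffix, so they can be generated from the project ID and state name rather than typed by hand. A small sketch with hypothetical helper names; it URL-encodes the state name in case it contains characters such as "/", and the returned keys match the http backend argument names so they can be passed as -backend-config flags.

```python
from typing import Dict, List
from urllib.parse import quote

def gitlab_state_addresses(gitlab_url: str, project_id: int, state_name: str) -> Dict[str, str]:
    """Build http-backend addresses for a GitLab-managed Terraform state."""
    base = (
        f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}"
        f"/terraform/state/{quote(state_name, safe='')}"
    )
    return {
        "address": base,
        "lock_address": f"{base}/lock",
        "unlock_address": f"{base}/lock",   # GitLab uses the same URL for lock and unlock
    }

def backend_config_args(addresses: Dict[str, str]) -> List[str]:
    """Turn the addresses into `terraform init -backend-config=key=value` arguments."""
    return [f"-backend-config={key}={value}" for key, value in addresses.items()]
```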

Hybrid Management Approach

A pragmatic approach is to use a combination of manual management and Terraform automation, particularly for brownfield environments.

Diagram: Hybrid Management Model

Core Infrastructure (Manual Management)

Some infrastructure components are better managed manually:

  • Physical switch installation and initial configuration
  • Core network architecture changes
  • Software upgrades on network devices
  • Critical security patches
  • Emergency changes requiring immediate action

Project-Specific Infrastructure (Terraform)

Project-related network elements are ideal for Terraform:

  • Application-specific VLANs and subnets
  • Security zones for applications
  • Routing policies for new services
  • Non-critical network changes

Importing Existing Resources

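Terraform can import existing resources to bring them under management. With the terraform import command, the matching resource block must already exist in the configuration, and the format of the import ID is defined by each resource type in the provider: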

# Import an existing VLAN into Terraform management
terraform import apstra_vlan.existing_vlan vlan-123
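When many existing objects must be brought under management, running terraform import by hand for each one is error-prone. Terraform 1.5 and later also accept import blocks in configuration, which are planned and reviewed like any other change. The sketch below, with a hypothetical import_blocks helper and placeholder IDs, renders those blocks from an inventory mapping resource addresses to APSTRA object IDs; the resource blocks themselves still have to be written.

```python
from typing import Dict

def import_blocks(inventory: Dict[str, str]) -> str:
    """Render Terraform >= 1.5 `import` blocks for address -> existing object ID pairs."""
    blocks = []
    for address, object_id in sorted(inventory.items()):
        blocks.append(
            "import {\n"
            f"  to = {address}\n"
            f'  id = "{object_id}"\n'
            "}\n"
        )
    return "\n".join(blocks)

# Hypothetical inventory exported from APSTRA
print(import_blocks({"apstra_vlan.existing_vlan": "vlan-123"}))
```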

Coexistence of Manual and Automated Changes

It's important to understand that implementing a CI/CD pipeline with Terraform does not prevent manual changes through the APSTRA interface. This is both a benefit and a challenge:

Benefits of Allowing Manual Changes

  • Emergency Fixes: Network engineers can quickly respond to outages without waiting for a pipeline
  • Learning Curve: Teams can gradually adopt automation while still using familiar tools
  • Complex Operations: Some operations might be easier to perform manually than to script
  • Flexibility: Some situations require human judgment and manual intervention

Challenges of Mixed Management

  • Configuration Drift: Manual changes can cause the actual state to differ from Terraform state
  • Visibility Issues: Changes made outside the pipeline may not be properly documented
  • Potential Conflicts: Manual changes to resources that Terraform manages are reverted by the next pipeline apply unless the configuration is updated to match
  • Accountability: It may be harder to track who made what changes and why

Example Scenario: Combining Manual and Automated Changes

Consider a scenario where your team manages a data centre network with both approaches:

  1. Tuesday: An urgent security patch needs to be applied to core switches:
    • Network engineer logs into APSTRA directly
    • Updates firmware on core switches
    • Implements security policy changes manually
    • Documents changes in the change management system
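  2. Wednesday: The scheduled pipeline detects configuration drift:
    • The drift detection job finds differences between the Terraform configuration and the live state of Terraform-managed objects; objects created entirely outside Terraform, such as the new security policy, do not appear in a plan and are identified from the change management record instead
    • The job creates a GitLab issue with the details
    • The team reviews the changes and decides how to handle each one
    • Temporary changes: leave them in place until they are no longer needed, bearing in mind that the next apply reverts any manual change to an attribute Terraform manages
    • Permanent changes: update the Terraform configuration to match and import the manually created objects into Terraform state
  3. Thursday: The team reconciles the permanent manual changes:
    • Adds a resource block describing the manually created security policy
    • Imports the existing policy into Terraform state so that Terraform tracks it instead of trying to create it again
  4. Friday: A new deployment runs through the pipeline:
    • The plan accounts for both the original automated components and the newly imported manual changes
    • Both sets of configuration are now tracked in Terraform

Thursday: Reconciliation of manual changes. First, describe the manually created security policy in the Terraform configuration:

# Add to Terraform to track the previously manual change
resource "apstra_security_policy" "emergency_policy" {
  blueprint_id = data.apstra_datacenter_blueprint.main.id
  name         = "emergency-security-policy"
  # Policy details matching the manual configuration
}

Then import the existing policy into Terraform state:

# Import the manually created security policy
terraform import apstra_security_policy.emergency_policy policy-123

Friday: A new application deployment is initiated through the GitLab pipeline: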

# GitLab-managed Terraform configuration for new application
resource "apstra_vlan" "app_vlan" {
  blueprint_id = data.apstra_datacenter_blueprint.main.id
  name         = "finance-app"
  vlan_id      = 300
}

resource "apstra_security_zone" "app_zone" {
  blueprint_id = data.apstra_datacenter_blueprint.main.id
  name         = "finance-secure"
  # Security policies...
}

This flexible approach recognizes that real-world network operations require both automation and manual intervention. The key is to have processes that help these approaches coexist and eventually converge.

Scheduled Validation

Regular validation checks that the network remains in sync with the intended configuration and surfaces drift before the next change is deployed.

Diagram: Configuration Drift Detection

Drift Detection Pipeline

A scheduled pipeline runs periodically to:

  1. Fetch current state from APSTRA
  2. Compare with Terraform state
  3. Identify discrepancies (drift)
  4. Report findings
  5. Optionally generate remediation plans

Example .gitlab-ci.yml for Drift Detection:

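verify_state:
  stage: verify
  script:
    - terraform init -input=false -backend-config="username=gitlab-ci-token" -backend-config="password=${CI_JOB_TOKEN}"
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present (drift)
    # Capture the code explicitly; otherwise a non-zero exit would end the job here
    - |
      rc=0
      terraform plan -input=false -detailed-exitcode -out=drift.tfplan || rc=$?
      if [ "$rc" -eq 2 ]; then
        echo "Drift detected! Generating report..."
        terraform show -no-color drift.tfplan > drift-report.txt
        # Create report and issue
      elif [ "$rc" -eq 0 ]; then
        echo "No drift detected. Configuration is in sync."
      else
        echo "terraform plan failed with exit code $rc"
        exit "$rc"
      fi
  artifacts:
    paths:
      - drift-report.txt
  only:
    - schedules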
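The "Create report and issue" step can call the GitLab Issues API (POST /projects/:id/issues). A sketch that separates building the request from sending it, so the request can be inspected first. It assumes a project access token stored in a masked CI/CD variable (the name DRIFT_BOT_TOKEN is hypothetical), since CI_JOB_TOKEN is limited to a small set of API endpoints; CI_API_V4_URL and CI_PROJECT_ID are GitLab's predefined variables.

```python
import json
import os
import urllib.request
from typing import Mapping

def build_drift_issue_request(api_url: str, project_id: str, token: str, report: str) -> urllib.request.Request:
    """Prepare, without sending, a GitLab API request that opens a drift issue."""
    body = {
        "title": "Configuration drift detected",
        "description": "Scheduled terraform plan reported drift:\n\n" + report,
        "labels": "drift,network",            # comma-separated label names
    }
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/projects/{project_id}/issues",
        data=json.dumps(body).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )

def open_drift_issue(report: str, env: Mapping[str, str] = os.environ) -> int:
    """Send the request from a CI job and return the HTTP status (201 when created)."""
    request = build_drift_issue_request(
        env["CI_API_V4_URL"],     # predefined by GitLab, e.g. https://gitlab.example.com/api/v4
        env["CI_PROJECT_ID"],     # predefined by GitLab
        env["DRIFT_BOT_TOKEN"],   # hypothetical masked variable holding a project access token
        report,
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

The report text would typically be the output of terraform show -no-color on the saved drift plan.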

Examples

Example 1: Deploying Infrastructure for a New Application

This example shows how to deploy network infrastructure for a new HR application.

# HR application network infrastructure
resource "apstra_vlan" "hr_app" {
  blueprint_id = data.apstra_datacenter_blueprint.existing.id
  name = "hr-application"
  vlan_id = 234
}

resource "apstra_security_zone" "hr_zone" {
  blueprint_id = data.apstra_datacenter_blueprint.existing.id
  name = "hr-secure-zone"
}

resource "apstra_datacenter_routing_policy" "hr_policy" {
  blueprint_id = data.apstra_datacenter_blueprint.existing.id
  name = "hr-routing-policy"
  # Policy details...
}

Example 2: Modifying IP Address Pools

# Expand the loopback IP pool for a growing network
resource "apstra_ip_pool" "expanded_pool" {
  name = "Expanded-Loopback-Pool"
  subnets = [
    {
      network = "10.1.0.0/24"
      description = "Existing Spine Loopbacks"
    },
    {
      network = "10.1.1.0/24"
      description = "Existing Leaf Loopbacks"
    },
    {
      network = "10.1.2.0/24"
      description = "New Expansion Loopbacks"
    }
  ]
}
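Before expanding a pool it is worth confirming that the new subnet does not overlap the existing ones, or any other allocated range you add to the list. A sketch using Python's standard ipaddress module; the subnets come from the example above and the find_overlaps helper is hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import List, Tuple

def find_overlaps(subnets: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of subnets in the list that overlap."""
    networks = [ipaddress.ip_network(s) for s in subnets]   # raises ValueError if host bits are set
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]

pool = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
print(find_overlaps(pool))                    # -> []
print(find_overlaps(pool + ["10.1.0.0/23"]))  # -> [('10.1.0.0/24', '10.1.0.0/23'), ('10.1.1.0/24', '10.1.0.0/23')]
```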

Example 3: Mixed Management - Manual Core with Terraform Segments

# Referencing manually managed infrastructure
data "apstra_datacenter_blueprint" "core" {
  name = "core-infrastructure"
}

# Project networks keyed by name, e.g. { "web" = { vlan_id = 310 } }
variable "project_networks" {
  type = map(object({
    vlan_id = number
  }))
}

# Managing project-specific segments with Terraform
resource "apstra_vlan" "project_vlans" {
  for_each = var.project_networks

  blueprint_id = data.apstra_datacenter_blueprint.core.id
  name         = each.key
  vlan_id      = each.value.vlan_id
}
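Since var.project_networks drives resource creation, a linting job can reject bad input before terraform plan runs. A sketch, assuming the variable is supplied as JSON (for example in a hypothetical project.auto.tfvars.json file) with a vlan_id per network name, as used by each.value.vlan_id above; it checks the 802.1Q range of 1-4094 and duplicate IDs. The function name is hypothetical.

```python
import json
from typing import Dict, List

def validate_project_networks(networks: Dict[str, Dict]) -> List[str]:
    """Return human-readable errors; an empty list means the input is valid."""
    errors: List[str] = []
    seen: Dict[int, str] = {}
    for name, spec in sorted(networks.items()):
        vlan_id = spec.get("vlan_id")
        if type(vlan_id) is not int or not 1 <= vlan_id <= 4094:
            errors.append(f"{name}: vlan_id {vlan_id!r} is outside 1-4094")
        elif vlan_id in seen:
            errors.append(f"{name}: vlan_id {vlan_id} already used by {seen[vlan_id]}")
        else:
            seen[vlan_id] = name
    return errors

tfvars = json.loads('{"project_networks": {"web": {"vlan_id": 310}, "db": {"vlan_id": 310}}}')
print(validate_project_networks(tfvars["project_networks"]))  # -> ['web: vlan_id 310 already used by db']
```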

Best Practices

Repository Organization

  • Modular Structure: Organize configurations into logical modules
  • Environment Separation: Use different directories or repositories for dev/test/prod
  • Documentation: Include README files and comments
  • Consistent Naming: Establish naming conventions for all resources

State Management

  • Remote State: Always use GitLab or another remote backend, never local state
  • State Locking: Ensure locking is enabled to prevent concurrent modifications
  • State Backup: Regularly backup state files
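  • Sensitive Data: State can hold sensitive values in plain text, so restrict access to the state backend; supply credentials to the provider through environment or CI/CD variables rather than resource attributes, and note that marking a variable as sensitive only hides it from CLI output, not from state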

Pipeline Configuration

  • Idempotency: Ensure all pipeline steps can be run multiple times safely
  • Timeouts: Set appropriate timeouts for network operations
  • Retry Logic: Implement retries for transient network issues
  • Notifications: Configure alerts for pipeline failures and drift detection
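Retry logic for transient API or network errors is commonly implemented as exponential backoff with jitter. A generic sketch; the with_retries name, the exceptions treated as transient, and the delay values are assumptions to tune for your environment. The sleep function is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient errors with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except transient:
            if attempt == attempts:
                raise                                    # retries exhausted: surface the error
            delay = base_delay * 2 ** (attempt - 1)      # 2 s, 4 s, 8 s, ... by default
            sleep(delay + random.uniform(0, delay / 2))  # jitter avoids synchronized retries
    raise AssertionError("unreachable")                  # loop always returns or raises
```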

Hybrid Management

  • Clear Boundaries: Clearly define what's managed manually vs. by Terraform
  • Documentation: Document manual procedures for components not in Terraform
  • Import Strategy: Plan for gradually importing manual resources into Terraform
  • Emergency Procedures: Define processes for bypassing automation in emergencies

Security

  • Least Privilege: Use role-based access control to give the CI/CD pipeline's APSTRA and GitLab accounts only the permissions they need
  • Credential Management: Store secrets in GitLab CI/CD variables or a vault
  • Audit Trail: Maintain logs of all changes
  • Review Process: Require peer review for all network changes