Shameless plug: This is related to a EuroPython 2022 talk I am giving, My Journey Using Docker as a Development Tool.

For most of my common dev tasks, I’ve started to rely on docker/docker compose to run commands locally. I have also started using VS Code devcontainers to provide a consistent environment for all developers working on a project.

The main reason for this is to avoid needing to install dependencies on my host machine. In theory, all I should need is a Docker daemon and the docker CLI to interact with that daemon. This also makes it far easier for any new developer to get set up and start working on my project.
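For example, on a fresh machine the only sanity check needed is that Docker and the compose plugin are available; nothing here is project specific:

docker version
docker compose version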

What inspired me to make this change now (in my banter bus project) was that I wanted to upgrade to Python 3.10 to use some of the new typing features. However, when I tried to upgrade, my CI pipeline started failing, and after hours of trying to debug it, I ended up using Docker and everything ran smoothly.

This also gives me a more consistent environment between my local machine and CI, so in theory there is less chance of something passing locally but failing in CI.

Now that we know why we want to do it, let’s look at how.

Before

Let’s take a look at what a typical CI pipeline may look like for a Python project (using banter bus). In this example, we will be using a FastAPI web service which uses Poetry to manage its dependencies.

image: python:3.10.5

variables:
  DOCKER_DRIVER: overlay2
  PIP_CACHE_DIR: "${CI_PROJECT_DIR}/.cache/pip"
  PIP_DOWNLOAD_DIR: ".pip"
  DOCKER_HOST: tcp://docker:2375

cache:
  key: "${CI_JOB_NAME}"
  paths:
    - .cache/pip
    - .venv

.test:
  services:
    - name: mongo:4.4.4
      alias: banter-bus-database
    - name: redis:6.2.4
      alias: banter-bus-message-queue
    - name: registry.gitlab.com/banter-bus/banter-bus-management-api:test
      alias: banter-bus-management-api
    - name: registry.gitlab.com/banter-bus/banter-bus-management-api/database-seed:latest
      alias: banter-bus-database-seed
  variables:
    MONGO_INITDB_ROOT_USERNAME: banterbus
    MONGO_INITDB_ROOT_PASSWORD: banterbus
    MONGO_INITDB_DATABASE: test
    BANTER_BUS_MANAGEMENT_API_DB_USERNAME: banterbus
    BANTER_BUS_MANAGEMENT_API_DB_PASSWORD: banterbus
    BANTER_BUS_MANAGEMENT_API_DB_HOST: banter-bus-database
    BANTER_BUS_MANAGEMENT_API_DB_PORT: 27017
    BANTER_BUS_MANAGEMENT_API_DB_NAME: test
    BANTER_BUS_MANAGEMENT_API_WEB_PORT: 8090
    BANTER_BUS_MANAGEMENT_API_CLIENT_ID: client_id
    BANTER_BUS_MANAGEMENT_API_USE_AUTH: "False"
    MONGO_HOSTNAME: banter-bus-database:27017
    BANTER_BUS_CORE_API_DB_USERNAME: banterbus
    BANTER_BUS_CORE_API_DB_PASSWORD: banterbus
    BANTER_BUS_CORE_API_DB_HOST: banter-bus-database
    BANTER_BUS_CORE_API_DB_PORT: 27017
    BANTER_BUS_CORE_API_DB_NAME: test
    BANTER_BUS_CORE_API_MANAGEMENT_API_URL: http://banter-bus-management-api
    BANTER_BUS_CORE_API_MANAGEMENT_API_PORT: 8090
    BANTER_BUS_CORE_API_CLIENT_ID: client_id
    BANTER_BUS_CORE_API_USE_AUTH: "False"
    BANTER_BUS_CORE_API_MESSAGE_QUEUE_HOST: banter-bus-message-queue
    BANTER_BUS_CORE_API_MESSAGE_QUEUE_PORT: 6379

stages:
  - test

before_script:
  - pip download --dest=${PIP_DOWNLOAD_DIR} poetry
  - pip install --find-links=${PIP_DOWNLOAD_DIR} poetry
  - poetry config virtualenvs.in-project true
  - poetry install -vv

test:lint:
  stage: test
  only:
    - merge_request
  script:
    - poetry run pre-commit run --all-files

test:unit-tests:
  stage: test
  only:
    - merge_request
  script:
    - poetry run pytest -v tests/unit

test:integration-tests:
  stage: test
  only:
    - merge_request
  extends:
    - .test
  script:
    - poetry run pytest -v tests/integration

The above looks quite complicated, but put simply: the before_script section runs in every job and installs our dependencies. All jobs also use the python:3.10.5 image, which is where our code is cloned in the CI pipeline.

Where our .pre-commit-config.yaml looks something like this:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.3.0
    hooks:
      - id: check-yaml
        args: ["--allow-multiple-documents"]
  - repo: local
    hooks:
      - id: forbidden-files
        name: forbidden files
        entry: found copier update rejection files; review them and remove them
        language: fail
        files: "\\.rej$"
      - id: black
        name: black
        entry: poetry run black
        language: system
        types: [python]
      - id: flake8
        name: flake8
        entry: poetry run flake8
        language: system
        types: [python]
      - id: isort
        name: isort
        entry: poetry run isort --settings-path=.
        language: system
        types: [python]
      - id: pyupgrade
        name: pyupgrade
        entry: poetry run pyupgrade
        language: system
        types: [python]
        args: [--py310-plus]
      - id: mypy
        name: mypy
        description: Check python types.
        entry: poetry run mypy
        language: system
        types: [python]

pre-commit is a tool we can use to add hooks that run before we commit our code to git, checking that the code is consistent with the rules we have defined. We can also use it as a single lint job that bundles multiple linting tools together. Here we are checking code formatting, linting, import sorting etc. The details don’t matter too much, but at the moment we need a virtualenv locally to run this.
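As an aside, if you also want these hooks to run automatically on every commit (rather than only as a CI lint job), pre-commit can install them into your local git hooks:

poetry run pre-commit install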

Integration Tests

A slightly more interesting job is the integration tests: it requires other Docker containers, as our tests need MongoDB and Redis to run. We can define these as services and then reference them in our job like so:

test:integration-tests:
  stage: test
  only:
    - merge_request
  extends:
    - .test
  script:
    - poetry run pytest -v tests/integration

Note the extends clause, which essentially merges the .test section with our job, so it will end up looking something like this:

test:integration-tests:
  stage: test
  only:
    - merge_request
  services:
    - name: mongo:4.4.4
      alias: banter-bus-database
    - name: redis:6.2.4
      alias: banter-bus-message-queue
    - name: registry.gitlab.com/banter-bus/banter-bus-management-api:test
      alias: banter-bus-management-api
    - name: registry.gitlab.com/banter-bus/banter-bus-management-api/database-seed:latest
      alias: banter-bus-database-seed
  variables:
    MONGO_INITDB_ROOT_USERNAME: banterbus
    MONGO_INITDB_ROOT_PASSWORD: banterbus
    MONGO_INITDB_DATABASE: test
    BANTER_BUS_MANAGEMENT_API_DB_USERNAME: banterbus
    BANTER_BUS_MANAGEMENT_API_DB_PASSWORD: banterbus
    BANTER_BUS_MANAGEMENT_API_DB_HOST: banter-bus-database
    BANTER_BUS_MANAGEMENT_API_DB_PORT: 27017
    BANTER_BUS_MANAGEMENT_API_DB_NAME: test
    BANTER_BUS_MANAGEMENT_API_WEB_PORT: 8090
    BANTER_BUS_MANAGEMENT_API_CLIENT_ID: client_id
    BANTER_BUS_MANAGEMENT_API_USE_AUTH: "False"
    MONGO_HOSTNAME: banter-bus-database:27017
    BANTER_BUS_CORE_API_DB_USERNAME: banterbus
    BANTER_BUS_CORE_API_DB_PASSWORD: banterbus
    BANTER_BUS_CORE_API_DB_HOST: banter-bus-database
    BANTER_BUS_CORE_API_DB_PORT: 27017
    BANTER_BUS_CORE_API_DB_NAME: test
    BANTER_BUS_CORE_API_MANAGEMENT_API_URL: http://banter-bus-management-api
    BANTER_BUS_CORE_API_MANAGEMENT_API_PORT: 8090
    BANTER_BUS_CORE_API_CLIENT_ID: client_id
    BANTER_BUS_CORE_API_USE_AUTH: "False"
    BANTER_BUS_CORE_API_MESSAGE_QUEUE_HOST: banter-bus-message-queue
    BANTER_BUS_CORE_API_MESSAGE_QUEUE_PORT: 6379
  script:
    - poetry run pytest -v tests/integration

We also need to define a bunch of environment variables so our containers can communicate with each other. These are of course specific to my apps, but you can imagine a real-life project needing a similar set of environment variables. As you can see, this can get a bit messy, and what is running locally may differ slightly from what is running in CI.

I have been caught out by these env variables in the past. Note variables like BANTER_BUS_CORE_API_MANAGEMENT_API_URL: http://banter-bus-management-api; the hostname in the URL must match the alias we gave the service container:

    - name: registry.gitlab.com/banter-bus/banter-bus-management-api:test
      alias: banter-bus-management-api

Docker DNS is clever enough to resolve that alias to the container’s IP address. This is also different from how we run things locally.
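To make this concrete, inside the job container the alias behaves like any other hostname, so something like the following (purely illustrative) resolves to the service container:

# The alias is resolvable from within the job
getent hosts banter-bus-management-api
curl -s http://banter-bus-management-api:8090/ || true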

After

Now we run all our dev tasks in Docker. We will use docker compose to manage all of the containers; it makes managing multiple containers a lot easier. We define all of them in our docker-compose.yml file.

services:
  app:
    container_name: banter-bus-core-api
    build:
      context: .
      dockerfile: Dockerfile
      target: development
      cache_from:
        - registry.gitlab.com/banter-bus/banter-bus-core-api:development
    environment:
      XDG_DATA_HOME: /commandhistory/
      BANTER_BUS_CORE_API_DB_USERNAME: banterbus
      BANTER_BUS_CORE_API_DB_PASSWORD: banterbus
      BANTER_BUS_CORE_API_DB_HOST: banter-bus-database
      BANTER_BUS_CORE_API_DB_PORT: 27017
      BANTER_BUS_CORE_API_DB_NAME: test
      BANTER_BUS_CORE_API_MANAGEMENT_API_URL: http://banter-bus-management-api
      BANTER_BUS_CORE_API_MANAGEMENT_API_PORT: 8090
      BANTER_BUS_CORE_API_CLIENT_ID: client_id
      BANTER_BUS_CORE_API_USE_AUTH: "False"
      BANTER_BUS_CORE_API_MESSAGE_QUEUE_HOST: banter-bus-message-queue
      BANTER_BUS_CORE_API_MESSAGE_QUEUE_PORT: 6379
    ports:
      - 127.0.0.1:8080:8080
    volumes:
      - ./:/app
      - /app/.venv/ # This stops local .venv getting mounted
    depends_on:
      - database
      - management-api
      - message-queue
      - database-seed

  management-api:
    container_name: banter-bus-management-api
    image: registry.gitlab.com/banter-bus/banter-bus-management-api:test
    environment:
      BANTER_BUS_MANAGEMENT_API_DB_USERNAME: banterbus
      BANTER_BUS_MANAGEMENT_API_DB_PASSWORD: banterbus
      BANTER_BUS_MANAGEMENT_API_DB_HOST: banter-bus-database
      BANTER_BUS_MANAGEMENT_API_DB_NAME: banter_bus_management_api
      BANTER_BUS_MANAGEMENT_API_WEB_PORT: 8090
      BANTER_BUS_MANAGEMENT_API_CLIENT_ID: client_id
      BANTER_BUS_MANAGEMENT_API_USE_AUTH: "False"
    ports:
      - 127.0.0.1:8090:8090
    depends_on:
      - database

  database:
    container_name: banter-bus-database
    image: mongo:4.4.4
    environment:
      MONGO_INITDB_ROOT_USERNAME: banterbus
      MONGO_INITDB_ROOT_PASSWORD: banterbus
      MONGO_INITDB_DATABASE: banterbus
    volumes:
      - /data/db
    ports:
      - 27017:27017

  database-gui:
    container_name: banter-bus-database-gui
    image: mongoclient/mongoclient:4.0.1
    depends_on:
      - database
    environment:
      - MONGOCLIENT_DEFAULT_CONNECTION_URL=mongodb://banterbus:banterbus@banter-bus-database:27017
    volumes:
      - /data/db
    ports:
      - 127.0.0.1:4000:3000

  database-seed:
    container_name: banter-bus-database-seed
    image: registry.gitlab.com/banter-bus/banter-bus-management-api/database-seed:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: banterbus
      MONGO_INITDB_ROOT_PASSWORD: banterbus
      MONGO_INITDB_DATABASE: banter_bus_management_api
      MONGO_HOSTNAME: banter-bus-database:27017
    depends_on:
      - database

  message-queue:
    container_name: banter-bus-message-queue
    image: redis:6.2.4
    volumes:
      - /data/datastore:/data
    ports:
      - 127.0.0.1:6379:6379

Note: This file already existed because I wanted to provide an easy way to start up my “tech stack” locally; it just wasn’t being used in CI.
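The app service builds a development target from the project’s Dockerfile, which I haven’t shown here. As a rough sketch only (not the actual file; the uvicorn entrypoint and module path are assumptions), a multi-stage Dockerfile with such a target could look like:

FROM python:3.10.5 AS base
WORKDIR /app
# Keep the virtualenv inside /app/.venv, which the compose file masks with an anonymous volume
RUN pip install poetry && poetry config virtualenvs.in-project true
COPY pyproject.toml poetry.lock ./

FROM base AS development
# Install everything, including dev dependencies such as pytest and pre-commit
RUN poetry install
COPY . .
# Hypothetical entrypoint; the real module path will differ
CMD ["poetry", "run", "uvicorn", "app.main:app", "--reload", "--host", "0.0.0.0", "--port", "8080"]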

How do we run our dev tasks?

  • lint: docker compose run app poetry run pre-commit run --all-files
  • integration tests: docker compose run app poetry run pytest -v tests/integration
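And if we just want the whole “tech stack” running locally, say to poke the API manually, we can bring everything up and tear it down again with:

docker compose up --build
docker compose down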

Then our CI pipeline could look as simple as this:

image: docker

services:
  - docker:dind

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_HOST: tcp://docker:2375

before_script:
  - docker compose build

stages:
  - test

test:lint:
  stage: test
  only:
    - merge_request
  script:
    - docker compose run app poetry run pre-commit run --all-files

test:unit-tests:
  stage: test
  only:
    - merge_request
  script:
    - docker compose run app poetry run pytest -v tests/unit

test:integration:
  stage: test
  only:
    - merge_request
  script:
    - docker compose run app poetry run pytest -v tests/integration

Now, before each job, we build our Docker images with docker compose build. Then to run a dev task we do something like:

docker compose run app <command to run>

So to run unit tests we could do:

docker compose run app poetry run pytest -v tests/unit
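One small tweak worth mentioning (an addition of mine, not something the pipeline above relies on): passing --rm removes the one-off container once the command finishes, so they don’t pile up locally:

docker compose run --rm app poetry run pytest -v tests/unit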

Aside

We could simplify this further if we use a Makefile and make the target wrap docker compose run app poetry run pytest -v tests/unit for us.

.PHONY: unit_tests
unit_tests: ## Run all the unit tests
	@docker compose run app poetry run pytest -v tests/unit

Then our ci job would look something like:

test:unit-tests:
  stage: test
  only:
    - merge_request
  script:
    - make unit_tests

I think this is a lot more readable and a lot easier to type. We can also leverage terminal auto-completion and add a help target, so a user can see all the targets they can run.
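For example, a common self-documenting pattern (a sketch I would add, not part of the original pipeline) is a help target that parses the ## comments above, so make help lists every target with its description:

.PHONY: help
help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*## "}; {printf "%-20s %s\n", $$1, $$2}'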