Lab 04 — Portable Data Pipelines with Containers

Deadlines:

End of lab session (GitHub checkpoint): commit & push your progress to your team repository.
Before next lab (eClass submission): upload (1) a .zip with your code and (2) a PDF export of labs/lab04/README.md.

Submission contents:

(1) a .zip with your code, and
(2) a PDF export of labs/lab04/README.md.

Intro to what you need to do

In this lab you will take the pipeline you built in Lab 03 and package it inside a Docker container. The pipeline logic does not change (no new application code). What you will learn is how to describe an environment, build an image, run a container, persist data with volumes, and manage the whole thing with Docker Compose. These are skills you can use for everything you deploy from now on, including the final project.

Copy your Lab 03 pipeline code into the new lab folder. You need run_pipeline.py, the pirlib/ directory, and requirements.txt.

Create the following structure, adding the remaining files during the lab as they are requested:

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   └── lab04/
│       ├── README.md
│       ├── Dockerfile
│       ├── docker-compose.yml
│       ├── .dockerignore
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Make sure Docker is installed

Before anything else, verify that Docker and Docker Compose are both available on your Pi. Open a terminal and run:

docker --version
docker compose version

If Docker is installed, both commands print a version number. If either command returns "command not found", Docker is not installed yet, so follow the steps below.
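The same check can be scripted so that each tool gets a clear status line. This is a minimal sketch; the function names have_cmd and check_docker_tools are illustrative and not part of the lab code.

```shell
# Return 0 if the command named in $1 is available on PATH, 1 otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line for the Docker CLI and one for the Compose plugin.
check_docker_tools() {
  if have_cmd docker; then
    echo "docker: found ($(docker --version 2>/dev/null))"
    if docker compose version >/dev/null 2>&1; then
      echo "compose: found"
    else
      echo "compose: missing"
    fi
  else
    echo "docker: missing"
    echo "compose: missing (needs the docker CLI)"
  fi
}

check_docker_tools
```

If either line reports "missing", continue with the installation steps that follow.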

The fastest way to install Docker on a Raspberry Pi running Raspberry Pi OS (or any Debian-based system) is the official convenience script:

curl -fsSL https://get.docker.com | sh

This downloads and runs the official Docker installation script. It will detect your OS and architecture (the Pi 5 uses arm64) and install the correct packages automatically. It takes a minute or two.

By default, the Docker daemon runs as root, so every docker command would need sudo. To fix that, add your user to the docker group:

sudo usermod -aG docker $USER

$USER automatically expands to your current username so you don’t need to type it manually.
-aG docker means append your user to the docker group (the -a is important; without it you’d be removed from all other groups).

This change only takes effect in new login sessions. You must fully log out and log back in (or reboot) before the new group membership is recognised. Simply opening a new terminal tab is not enough.

To confirm it worked, after logging back in run:

groups

You should see docker listed among your groups. Then run:

docker run hello-world

If you see a success message, Docker is installed and your user has the correct permissions. You’re ready to continue.
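To tell "group added but session not refreshed" apart from "group never added", it helps to check the groups of the current session specifically. The sketch below assumes a POSIX shell; session_has_group is a hypothetical helper name.

```shell
# Return 0 if the CURRENT login session carries group $1, 1 otherwise.
# "id -nG" without a user name lists the groups of the running process,
# which is what matters for talking to the Docker daemon.
session_has_group() {
  for g in $(id -nG); do
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

if session_has_group docker; then
  echo "this session has docker group access"
else
  echo "docker group not active in this session; log out and back in"
fi
```

Running id -nG "$USER" (with a user name) reads the group database instead, so it can already list docker while the current session still lacks it.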

Write the Dockerfile

Create a file called Dockerfile in labs/lab04/. This is the recipe Docker follows to build an image. Think of it as a script that sets up a fresh machine from scratch, installs everything your code needs, copies your files in, and defines how to start the application.

Recall from the lecture that every instruction in a Dockerfile creates a layer. Layers are cached. This means the order of instructions matters a lot for build speed. If you copy your code first and then install packages, any small code change forces Docker to reinstall all packages from scratch. If you install packages first and copy code second, a code change only rebuilds the last layer. On a Raspberry Pi where pip install can take minutes, this difference is very noticeable.

Here is a starting point (THESE ARE INSTRUCTIONS ONLY, NOT A VALID DOCKERFILE!):

<choose a slim Python base image from Docker Hub> (e.g. python:3.11-slim)
<set the working directory inside the container>
<copy your requirements file> .
<RUN pip install --no-cache-dir -r <your requirements file>>
<copy your library folder>/ <your library folder>/
<copy your pipeline entry point script> .
<run a default command when the container starts, for the output path use a >

Go through it line by line and make sure you understand what each instruction does.

Your requirements.txt must include rpi-lgpio, which handles GPIO access inside the container. The library is installed inside the image when it is built.
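The caching rule from this section (install dependencies before copying code) can be checked mechanically once your Dockerfile exists. The following is a rough sketch, assuming the entry script is named run_pipeline.py as in the lab structure; the helper names are hypothetical and the check only looks at line order, not at whether the file is otherwise valid.

```shell
# Print the number of the first line in file $1 that matches regex $2.
# Prints nothing when no line matches.
first_line_matching() {
  awk -v re="$2" '$0 ~ re { print NR; exit }' "$1"
}

# Return 0 if "RUN ... pip install" appears before the COPY of the entry script.
deps_before_code() {
  install_line=$(first_line_matching "$1" '^RUN .*pip install')
  code_line=$(first_line_matching "$1" '^COPY .*run_pipeline[.]py')
  [ -n "$install_line" ] && [ -n "$code_line" ] && [ "$install_line" -lt "$code_line" ]
}

if [ -f Dockerfile ]; then
  if deps_before_code Dockerfile; then
    echo "OK: pip install runs before the code is copied"
  else
    echo "Check the order: a code change would invalidate the pip install layer"
  fi
fi
```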

.dockerignore

Create a .dockerignore file (THESE ARE INSTRUCTIONS ONLY, NOT A VALID FILE; you need to fill it in with the actual patterns requested, e.g. *.jsonl):

<bytecode cache directories>
<compiled Python bytecode files>
<virtual environment directories>
<output data files you do not want baked into the image>
<version control directories>

When Docker builds an image, it sends the entire directory (the “build context”) to the Docker daemon. Without .dockerignore, that context includes your virtual environment, bytecode cache, old output files, and the git history. That slows down every build, and any broad COPY instruction would bake those files into the image and make it unnecessarily large.
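To see what is currently inflating the build context, you can list the usual suspects and their sizes before writing the ignore rules. This is a sketch only: the name patterns below are common defaults chosen for illustration (venv, .venv, .git, __pycache__, *.pyc, *.jsonl), and find_context_bloat is a hypothetical helper.

```shell
# List entries under directory $1 that usually do not belong in a build context.
# Matching directories are reported once and not descended into.
find_context_bloat() {
  find "$1" \
    \( -type d \( -name __pycache__ -o -name .git -o -name venv -o -name .venv \) -prune -print \) \
    -o \( -type f \( -name '*.pyc' -o -name '*.jsonl' \) -print \)
}

# Only scan when run from a folder that actually contains a Dockerfile.
if [ -f Dockerfile ]; then
  find_context_bloat . | while IFS= read -r path; do
    du -sh "$path"
  done
fi
```

Anything this prints is a candidate for a line in .dockerignore.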

Build the image

cd labs/lab04/
docker build -t motion-pipeline .

Docker reads the Dockerfile, executes each instruction, and produces an image tagged motion-pipeline. The first build will be slow because it pulls the base image and installs packages. Rebuild after a code change and watch how much faster it is. That is the layer cache at work.

Check what you built:

docker images

Note the image size. We will come back to this later.
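For the report questions on image size and layers, the raw numbers can also be read directly. The sketch below assumes the image was tagged motion-pipeline as above; bytes_to_mib is a hypothetical helper, and the docker calls are skipped when the image does not exist.

```shell
# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

if command -v docker >/dev/null 2>&1 && docker image inspect motion-pipeline >/dev/null 2>&1; then
  # .Size is the image size in bytes.
  size_bytes=$(docker image inspect --format '{{.Size}}' motion-pipeline)
  echo "motion-pipeline: $(bytes_to_mib "$size_bytes") MiB"
  # One row per layer, newest first, with the instruction that created it.
  docker history motion-pipeline
fi
```

The figure may differ slightly from the SIZE column of docker images, which typically uses decimal units.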

Run the container

Before running, create a local directory for the output:

mkdir -p output

Now run:

docker run --rm \
  --privileged \
  --device /dev/gpiomem0:/dev/gpiomem \
  --device /dev/gpiochip0:/dev/gpiochip0 \
  -v $(pwd)/output:/data \
  motion-pipeline

Three flags to understand:

--device /dev/gpiomem0:/dev/gpiomem: containers are isolated from the host by default and cannot see hardware devices. This flag passes the host's GPIO memory device into the container so your sampler can read the PIR sensor; the second --device flag does the same for the GPIO chip device. Without device access, you will get a permission or file-not-found error when the code tries to access GPIO.

-v $(pwd)/output:/data: this is a bind mount. It maps your local output/ directory to /data inside the container. When the pipeline writes to /data/motion_pipeline.jsonl, the file actually appears in output/motion_pipeline.jsonl on the host. This is one way to have data survive after the container stops. Remember, a container’s own writable layer is thrown away when the container is removed.

--rm: removes the container automatically after it exits. During development this keeps things tidy; without it you accumulate stopped containers that sit around doing nothing.

Trigger some motion events and verify that output/motion_pipeline.jsonl contains valid records. Open the file and check that the format matches your Lab 03 output.
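A quick shape check can catch an empty file or stray log lines before you compare the records by eye. This is a shallow sketch, not a JSON parser: it only verifies that every non-blank line starts with { and ends with }. The helper name looks_like_jsonl is hypothetical.

```shell
# Return 0 if file $1 is non-empty and every non-blank line looks like
# a single JSON object ("{ ... }"); return 1 otherwise.
looks_like_jsonl() {
  [ -s "$1" ] || return 1
  bad=$(awk 'NF > 0 && $0 !~ /^[{].*[}]$/ { n++ } END { print n + 0 }' "$1")
  [ "$bad" -eq 0 ]
}

out=output/motion_pipeline.jsonl
if [ -f "$out" ]; then
  if looks_like_jsonl "$out"; then
    echo "$out: $(wc -l < "$out") lines, all shaped like JSON objects"
  else
    echo "$out: empty or contains lines that are not JSON objects"
  fi
fi
```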

Overriding the default command

The CMD in the Dockerfile provides defaults, but anything you write after the image name replaces it. This is useful for running experiments without rebuilding:

Run it with values that make sense for both the normal and the slow-consumer experiment:

docker run \
  --rm \
  --privileged \
  --device /dev/gpiomem0:/dev/gpiomem \
  --device /dev/gpiochip0:/dev/gpiochip0 \
  -v $(pwd)/output:/data \
  motion-pipeline \
  python run_pipeline.py \
    --device-id pir-docker-01 \
    --pin 4 \
    --sample-interval 0.1 \
    --cooldown 5 \
    --min-high 0.2 \
    --queue-size 50 \
    --consumer-delay 0.5 \
    --duration 6000 \
    --out /data/motion_pipeline.jsonl \
    --verbose

Run both a normal and a slow-consumer experiment inside Docker, the same way you did in Lab 03. Verify the output makes sense.
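Since the two experiments differ only in a couple of arguments, it can help to generate the command rather than retype it. The sketch below only prints the command line so you can review it first. The flags are the ones used in this lab, but the consumer-delay values (0 and 2), the 60-second duration, and the per-experiment output file names are assumptions chosen for illustration; use the values from your own Lab 03 runs.

```shell
# Print the docker run command for one experiment.
# $1 = label used in the output file name, $2 = consumer delay in seconds.
experiment_cmd() {
  printf '%s ' docker run --rm --privileged \
    --device /dev/gpiomem0:/dev/gpiomem \
    --device /dev/gpiochip0:/dev/gpiochip0 \
    -v "$(pwd)/output:/data" \
    motion-pipeline \
    python run_pipeline.py \
    --device-id pir-docker-01 --pin 4 \
    --sample-interval 0.1 --cooldown 5 --min-high 0.2 \
    --queue-size 50 --consumer-delay "$2" \
    --duration 60 \
    --out "/data/motion_$1.jsonl" --verbose
  echo
}

experiment_cmd normal 0
experiment_cmd slow 2
```

Copy the printed line into the terminal to run it. Writing each experiment to its own file keeps the two result sets from being mixed.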

Inspecting a running container

While the container is running, open another terminal and try the commands below. Remember that the container only runs for the “duration” you specified.

docker ps                        # see running containers
docker stats                     # live CPU/memory usage
docker logs <container-id>       # see stdout/stderr output
docker exec -it <container-id> /bin/bash   # open a shell inside the container

That last command is very useful for debugging. You can look around the filesystem, check if files exist, see what is installed. Type exit to leave.

Try resource limits

The lecture covered how Docker can cap CPU and memory per container. On a Raspberry Pi this is a real concern: a runaway process can eat all the memory and crash the system.

Try:

docker run --rm \
  --privileged \
  --device /dev/gpiomem0:/dev/gpiomem \
  --device /dev/gpiochip0:/dev/gpiochip0 \
  -v $(pwd)/output:/data \
  --memory=64m \
  motion-pipeline

Does the memory limit work on the Pi?

Then try --cpus=0.5 and --cpus=0.01. Does the pipeline still work? Does it get killed? You can watch what is happening with docker stats in another terminal.

When a container exceeds an enforced memory limit (you will see that the limit cannot be enforced here), it is killed and you will see exit code 137 (128 + 9, which means “killed by signal 9”). This is important to understand: the limit is a hard wall.
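The arithmetic behind exit code 137 follows the usual shell convention that codes above 128 encode a signal number. A small sketch (describe_exit is a hypothetical helper) makes the decoding explicit:

```shell
# Describe a process exit code.
# By shell convention, a code above 128 means "terminated by signal (code - 128)".
describe_exit() {
  if [ "$1" -gt 128 ]; then
    echo "exit $1: killed by signal $(( $1 - 128 ))"
  elif [ "$1" -eq 0 ]; then
    echo "exit 0: success"
  else
    echo "exit $1: exited on its own with a non-zero status"
  fi
}

describe_exit 137   # prints: exit 137: killed by signal 9
```

After a docker run command returns, the code is available in $?, so describe_exit $? on the very next line reports how the container ended.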

Use Docker Compose

Up to now you have been typing long docker run commands with several flags. That is fine for quick tests, but it gets old fast. Docker Compose lets you write the entire configuration in a YAML file and run it with a single command.

Create docker-compose.yml (AGAIN, THESE ARE INSTRUCTIONS ONLY; YOU NEED TO REPLACE THEM WITH THE CORRECT KEYS AND VALUES):

<define your services>:
  <name your service>:
    <specify how to build the image>:
      <directory where Docker should look for the Dockerfile>
      <name of your Dockerfile>
    <grant the container elevated hardware privileges>
    <list the hardware devices to pass through from host to container>:
      - "<host path to GPIO memory device>:<where to expose it inside the container>"
      - "<host path to GPIO chip device>:<where to expose it inside the container>"
    <list the volumes to mount>:
      - <pick a name for your named volume>:<path inside the container where data will be written>
    <resource limits section>:
      <limits subsection>:
        <maximum RAM the container is allowed to use>
        <maximum share of a CPU core the container may consume>
    <when Docker should automatically restart the container after it stops>
<declare named volumes so Docker creates and manages them>:
  <same volume name as above>:

In words, the file should: build the image from the Dockerfile in the current directory, give the container access to the GPIO devices, mount a named volume called pipeline-data at /data, limit memory to 128 MB (include this even if it has no effect on the Pi) and CPU to half a core, and restart the container if it crashes.

The volume here is a named volume managed by Docker, not a bind mount to a local directory. Docker decides where to store it on disk. The advantage is that it survives docker compose down and works the same way regardless of your current directory. The trade-off is that you cannot just ls a local folder to see the files. You need to either find the path with docker volume inspect (run docker volume ls first, because Compose normally prefixes the volume name with the project name, for example lab04_pipeline-data), or exec into the container to look.
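Before starting the stack, a rough completeness check of the file can save a failed run. The sketch below only greps for the standard Compose key names this lab's description implies (services, build, privileged, devices, volumes, restart); it does not validate YAML structure or values, and missing_compose_keys is a hypothetical helper. For a real validation, docker compose config parses the file and prints the resolved configuration.

```shell
# Print each expected key that does not appear in Compose file $1.
# Prints nothing when all keys are present.
missing_compose_keys() {
  for key in services build privileged devices volumes restart; do
    grep -q -E "^[[:space:]]*${key}:" "$1" || echo "$key"
  done
}

if [ -f docker-compose.yml ]; then
  missing_compose_keys docker-compose.yml | sed 's/^/missing key: /'
fi
```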

Start it:

docker compose up --build

--build forces a rebuild of the image. Without it, Compose uses the last built image, which might be stale if you changed your code.

To stop:

docker compose down

To verify data persistence, stop and start again:

docker compose down
docker compose up

The JSONL file should still be there from the previous run, because the named volume was not deleted. If you want to start clean, use docker compose down -v — the -v flag removes volumes too.

Slow-consumer experiment in Compose

Override the command in the Compose file so that it runs either the slow-consumer or the fast-consumer experiment.

Your final docker-compose.yml should start one of the two experiments when you run docker compose up.


The command field overrides the CMD from the Dockerfile, just like passing arguments after the image name in docker run.

Useful Compose commands

docker compose up --build        # build and start
docker compose up -d             # start in background (detached)
docker compose down              # stop and remove containers
docker compose down -v           # also remove volumes
docker compose logs              # see output from all services
docker compose logs -f           # follow logs in real time
docker compose ps                # see running services

Clean up

After you are done experimenting:

docker compose down -v           # stop and remove volumes
docker images                    # see what images exist
docker rmi motion-pipeline       # remove a specific image
docker system prune              # remove stopped containers, unused networks, dangling images, build cache

Storage on a Pi is limited, so get in the habit of cleaning up.
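To see whether a cleanup is due, you can look at the free space and at what Docker itself is holding. This is a small sketch; free_mib is a hypothetical helper, and the Docker part is skipped when the CLI is not installed.

```shell
# Print the free space, in whole MiB, of the filesystem that holds path $1.
# df -P guarantees one line per filesystem; column 4 is available KiB.
free_mib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

echo "free space on /: $(free_mib /) MiB"

if command -v docker >/dev/null 2>&1; then
  # Summarise disk used by images, containers, volumes and build cache.
  docker system df || true
fi
```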

Docker vs. virtual environments — a discussion

Up to Lab 03, your deployment strategy was a virtual environment and a requirements.txt. That works, and it is a good practice. But it has limits.

Docker isolates everything from the operating system up. The base image defines the OS (Debian, Alpine, whatever you choose), the system libraries, and the Python version. Your Dockerfile installs packages in a controlled environment. The result is an image that runs the same way on your Pi, on a classmate’s Pi, on a CI server, and on a cloud VM as long as the architecture matches.

There is a cost. Docker images are larger than a requirements.txt. Building is slower than pip install. You need Docker installed on the machine (which itself takes resources). And for simple scripts that only need a few pip packages and no system dependencies, a venv is perfectly fine.

The question is not “which one is better.” It is “when do you need which one.” For a quick prototype on your own machine, a venv is faster and simpler. For anything that needs to run on more than one machine, or survive an OS update, or be deployed by someone who is not you, a container is more reliable. In production edge systems containers are the standard.

It is also worth noting that these are not mutually exclusive. You can (and many people do) develop locally in a venv for fast iteration, and then package into Docker for deployment. The venv is your development tool; the Docker image is your deployment artifact.

Report questions

Answer the following in your labs/lab04/README.md after the implementation and experiments are complete.

Dockerfile and images

RQ1: What base image did you use and why?
RQ2: How many layers does your Dockerfile create? Which instructions produce new layers?
RQ3: What is the size of your built image?
RQ4: Why do we copy requirements.txt and install dependencies before copying the rest of the code? What would happen if we reversed the order?

Running containers

RQ5: What does --device /dev/gpiomem do and why is it needed?
RQ6: What happens to the JSONL output if you run the container without a volume mount (-v)?
RQ7: Did the pipeline behave the same inside Docker as it did running directly on the Pi in Lab 03? Any differences?

Resource limits

RQ8: What happened when you set --memory=64m? Does the limit work on the Pi? Why or why not?
RQ9: Why are resource limits important on edge devices in general?

Docker Compose

RQ10: What is the advantage of writing a docker-compose.yml instead of using docker run with flags?
RQ11: What is the difference between a bind mount (-v $(pwd)/output:/data) and a named volume (pipeline-data:/data)?
RQ12: What does restart: unless-stopped do and why does it matter for an edge device?

Docker vs. virtual environments

RQ13: What does a virtual environment isolate, and what does it not isolate?
RQ14: Give one concrete example where a requirements.txt and a venv would not be enough to reproduce your Lab 03 setup on a different machine.
RQ15: Give one scenario where a virtual environment is perhaps a better choice than Docker.
RQ16: In the context of the Smart Wastebin project, which approach (venv or Docker) would you prefer to use for a final deployment, and why?

Project hint: Smart Wastebin

The Smart Wastebin will have multiple components running together: sensor pipelines, maybe an MQTT broker, storage, a dashboard. Each of these will be a service in a docker-compose.yml. Now that you know how to containerize one service, you have the foundation for the whole thing.

Start thinking about which parts of the system should be separate containers and which should live together. A good rule of thumb: things with different dependencies, different scaling needs, or different development cycles should be separate. The sensor pipeline might be stable while you are still iterating on the dashboard. If they are separate containers you can redeploy one without touching the other.

What should be finished before you leave the lab

Before the end of the session you should have: copied your Lab 03 pipeline code, written and built a working Dockerfile with a .dockerignore, run the pipeline inside Docker with GPIO access and a volume mount, run both normal and slow-consumer experiments inside Docker, tested resource limits, written and used a docker-compose.yml, verified output persists across container restarts, updated labs/lab04/README.md with code and report answers, and pushed to GitHub.

Final checklist (Lab 04)

Lab 03 pipeline code copied into labs/lab04/
Dockerfile builds successfully
.dockerignore created
Pipeline runs in Docker with GPIO access (--device /dev/gpiomem)
JSONL output persists via volume
Container can be stopped and output file survives
Resource limits tested (--memory)
docker-compose.yml written and working
Normal run completed in Docker
Slow-consumer run completed in Docker
labs/lab04/README.md contains code, run steps, and report answers
Commit and push completed

Deliverables and submission

What must exist in the repository (by end of lab)

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   └── lab04/
│       ├── README.md
│       ├── Dockerfile
│       ├── docker-compose.yml
│       ├── .dockerignore
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab04/README.md` must contain

Two clearly separated parts:

Code / runbook
Answers to report questions

Same style as previous labs.

End of lab session — GitHub checkpoint

Before leaving:

commit your progress
push to your team GitHub repository

Minimum expectation:

all deliverables tracked by Git
latest commit pushed
commit message is clear

Before next lab — eClass submission

Submit both:

Code archive (.zip)
PDF export of labs/lab04/README.md

Required PDF filename format:

lab04_REPORT_<team>.pdf

What follows is a greek version of the same lab

Εργαστήριο 04 — Φορητές Data Pipelines με Containers

Προθεσμίες:

Τέλος εργαστηριακής συνεδρίας (GitHub checkpoint): κάντε commit & push την πρόοδό σας στο team repository.
Πριν το επόμενο εργαστήριο (υποβολή στο eClass): ανεβάστε (1) ένα .zip με τον κώδικά σας και (2) ένα PDF export του labs/lab04/README.md.

Περιεχόμενα υποβολής:

(1) ένα .zip με τον κώδικά σας, και
(2) ένα PDF export του labs/lab04/README.md.

Εισαγωγή στο τι πρέπει να κάνετε

Σε αυτό το εργαστήριο θα πάρετε το pipeline που φτιάξατε στο Lab 03 και θα το συσκευάσετε μέσα σε ένα Docker container. Η λογική του pipeline δεν αλλάζει (δεν υπάρχει νέος κώδικας). Αυτό που θα μάθετε είναι πώς να περιγράφετε ένα περιβάλλον, να κτίζετε ένα image, να τρέχετε ένα container, να διατηρείτε δεδομένα με volumes, και να διαχειρίζεστε τα πάντα με Docker Compose. Αυτές είναι δεξιότητες που μπορείτε να χρησιμοποιείτε σε ό,τι κι αν κάνετε deploy από εδώ και πέρα, συμπεριλαμβανομένου του τελικού project.

Αντιγράψτε τον κώδικα του pipeline από το Lab 03 στον νέο φάκελο. Χρειάζεστε το run_pipeline.py, τον κατάλογο pirlib/, και το requirements.txt.

Δημιουργήστε (προσθέτοντας αρχεία κατά τη διάρκεια του εργαστηρίου όπως ζητείται) την ακόλουθη δομή:

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   └── lab04/
│       ├── README.md
│       ├── Dockerfile
│       ├── docker-compose.yml
│       ├── .dockerignore
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Βεβαιωθείτε ότι το Docker είναι εγκατεστημένο

Πριν από οτιδήποτε άλλο, επαληθεύστε ότι τόσο το Docker όσο και το Docker Compose είναι διαθέσιμα στο Pi σας. Ανοίξτε ένα terminal και τρέξτε:

docker --version
docker compose version

Αν είναι εγκατεστημένο, θα δείτε εκτυπωμένους αριθμούς έκδοσης και για τα δύο. Αν κάποια εντολή επιστρέψει command not found, το Docker δεν είναι εγκατεστημένο ακόμα, οπότε ακολουθήστε τα παρακάτω βήματα.

Ο πιο γρήγορος τρόπος εγκατάστασης του Docker σε ένα Raspberry Pi με Raspberry Pi OS (ή οποιοδήποτε Debian-based σύστημα) είναι το επίσημο convenience script:

curl -fsSL https://get.docker.com | sh

Αυτό κατεβάζει και εκτελεί το επίσημο installation script του Docker. Θα εντοπίσει αυτόματα το OS και την αρχιτεκτονική σας (το Pi 5 χρησιμοποιεί arm64) και θα εγκαταστήσει τα σωστά packages. Χρειάζεται ένα-δύο λεπτά.

Από προεπιλογή, το Docker daemon τρέχει ως root, οπότε κάθε εντολή docker θα χρειαζόταν sudo. Για να το διορθώσετε, προσθέστε τον χρήστη σας στο group docker:

sudo usermod -aG docker $USER

Το $USER επεκτείνεται αυτόματα στο τρέχον username σας, οπότε δεν χρειάζεται να το πληκτρολογήσετε χειροκίνητα.
Το -aG docker σημαίνει append — προσθήκη του χρήστη σας στο group docker (το -a είναι σημαντικό· χωρίς αυτό θα αφαιρεθείτε από όλα τα άλλα groups).

Αυτή η αλλαγή ισχύει μόνο σε νέες login sessions. Πρέπει να κάνετε πλήρη logout και login ξανά (ή reboot) πριν αναγνωριστεί η νέα ιδιότητα μέλους του group. Το άνοιγμα νέας καρτέλας terminal δεν αρκεί.

Για να επιβεβαιώσετε ότι λειτούργησε, αφού συνδεθείτε ξανά τρέξτε:

groups

Θα πρέπει να βλέπετε το docker στη λίστα των groups σας. Στη συνέχεια τρέξτε:

docker run hello-world

Αν δείτε μήνυμα επιτυχίας, το Docker είναι εγκατεστημένο και ο χρήστης σας έχει τα σωστά permissions. Είστε έτοιμοι να συνεχίσετε.

Γράψτε το Dockerfile

Δημιουργήστε ένα αρχείο με το όνομα Dockerfile μέσα στο labs/lab04/. Αυτή είναι η συνταγή που ακολουθεί το Docker για να κτίσει ένα image. Σκεφτείτε το σαν ένα script που στήνει ένα φρέσκο μηχάνημα από μηδέν, εγκαθιστά ό,τι χρειάζεται ο κώδικάς σας, αντιγράφει τα αρχεία σας μέσα, και ορίζει τον τρόπο εκκίνησης της εφαρμογής.

Θυμηθείτε από τη διάλεξη ότι κάθε εντολή σε ένα Dockerfile δημιουργεί ένα layer. Τα layers αποθηκεύονται στη cache. Αυτό σημαίνει ότι η σειρά των εντολών έχει μεγάλη σημασία για την ταχύτητα του build. Αν αντιγράψετε πρώτα τον κώδικά σας και μετά εγκαταστήσετε packages, οποιαδήποτε μικρή αλλαγή κώδικα αναγκάζει το Docker να επανεγκαταστήσει όλα τα packages από μηδέν. Αν εγκαταστήσετε πρώτα packages και αντιγράψετε δεύτερο τον κώδικα, μια αλλαγή κώδικα ξαναχτίζει μόνο το τελευταίο layer. Σε ένα Raspberry Pi όπου η pip install μπορεί να πάρει λεπτά, αυτή η διαφορά είναι πολύ αισθητή.

Ακολουθεί ένα σημείο εκκίνησης (ΑΥΤΕΣ ΕΙΝΑΙ ΜΟΝΟ ΟΔΗΓΙΕΣ, ΟΧΙ ΕΓΚΥΡΟ DOCKERFILE!):

<επιλέξτε ένα slim Python base image από το Docker Hub> (π.χ. python:3.11-slim)
<ορίστε τον working directory μέσα στο container>
<αντιγράψτε το requirements file> .
<RUN pip install --no-cache-dir -r <το requirements file σας>>
<αντιγράψτε τον φάκελο της βιβλιοθήκης σας>/ <φάκελος βιβλιοθήκης>/
<default command όταν ξεκινά το container>
COPY <το entry point script του pipeline σας>
<τρέξτε ένα default command κατά την εκκίνηση του container, για το output path χρησιμοποιήστε >

Διαβάστε γραμμή-γραμμή:

Το requirements.txt σας πρέπει να περιλαμβάνει το rpi-lgpio — βοηθά με την πρόσβαση στο GPIO μέσα στο container. Η βιβλιοθήκη εγκαθίσταται μέσα στο image.

.dockerignore

Δημιουργήστε ένα αρχείο .dockerignore (ΠΑΛΙ ΑΥΤΕΣ ΕΙΝΑΙ ΟΔΗΓΙΕΣ, ΠΡΕΠΕΙ ΝΑ ΤΟ ΣΥΜΠΛΗΡΩΣΕΤΕ ΜΕ ΤΑ ΚΑΤAΛΛΗΛΑ ΣΤΟΙΧΕΙΑ, π.χ. *.jsonl):

<κατάλογοι bytecode cache>
<compiled Python bytecode αρχεία>
<κατάλογοι virtual environment>
<output data αρχεία που δεν θέλετε να είναι baked μέσα στο image>
<κατάλογοι version control>

Όταν το Docker κτίζει ένα image, στέλνει ολόκληρο τον κατάλογο (το “build context”) στο Docker daemon. Χωρίς .dockerignore, αντιγράφει το virtual environment, τη bytecode cache, παλιά output αρχεία, και το git history. Αυτό επιβραδύνει κάθε build και κάνει το image περιττά μεγάλο.

Κτίστε το image

cd labs/lab04/
docker build -t motion-pipeline .

Το Docker διαβάζει το Dockerfile, εκτελεί κάθε εντολή, και παράγει ένα image με το tag motion-pipeline. Το πρώτο build θα είναι αργό γιατί κατεβάζει το base image και εγκαθιστά packages. Ξαναχτίστε μετά από μια αλλαγή κώδικα και παρατηρήστε πόσο πιο γρήγορο είναι. Αυτή είναι η layer cache σε δράση.

Ελέγξτε τι κτίσατε:

docker images

Σημειώστε το μέγεθος του image. Θα επανέλθουμε σε αυτό αργότερα.

Τρέξτε το container

Πριν εκτελέσετε, δημιουργήστε έναν τοπικό κατάλογο για το output:

mkdir -p output

Τώρα τρέξτε:

docker run --rm \
  --privileged \
  --device /dev/gpiomem0:/dev/gpiomem \
  --device /dev/gpiochip0:/dev/gpiochip0 \
  -v $(pwd)/output:/data \
  motion-pipeline

Τρία flags που πρέπει να κατανοήσετε:

--device /dev/gpiomem — τα containers είναι απομονωμένα από τον host από προεπιλογή. Δεν μπορούν να δουν hardware devices. Αυτό το flag περνά το GPIO memory device μέσα στο container ώστε ο sampler να μπορεί να διαβάσει τον PIR αισθητήρα. Χωρίς αυτό, θα λάβετε σφάλμα permission ή file-not-found όταν ο κώδικας προσπαθήσει να έχει πρόσβαση στο GPIO.

-v $(pwd)/output:/data — αυτό είναι ένα bind mount. Αντιστοιχεί τον τοπικό κατάλογο output/ στο /data μέσα στο container. Όταν το pipeline γράφει στο /data/motion_pipeline.jsonl, το αρχείο εμφανίζεται στο output/motion_pipeline.jsonl στον host. Αυτός είναι ένας τρόπος να επιβιώνουν τα δεδομένα αφού σταματήσει το container. Θυμηθείτε, το δικό του writable layer ενός container διαγράφεται όταν αυτό τερματίζει.

--rm — αφαιρεί αυτόματα το container μετά τον τερματισμό του. Κατά την ανάπτυξη αυτό διατηρεί τα πράγματα τακτοποιημένα· χωρίς αυτό συσσωρεύετε σταματημένα containers που απλώς κάθονται εκεί.

Προκαλέστε μερικά motion events και επαληθεύστε ότι το output/motion_pipeline.jsonl περιέχει έγκυρες εγγραφές. Ανοίξτε το αρχείο και ελέγξτε ότι η μορφή ταιριάζει με το output του Lab 03.

Αντικατάσταση της default εντολής

Το CMD στο Dockerfile παρέχει τιμές default, αλλά ό,τι γράφετε μετά το όνομα του image το αντικαθιστά. Αυτό είναι χρήσιμο για να τρέχετε πειράματα χωρίς να ξαναχτίζετε:

Τρέξτε με τιμές που έχουν νόημα τόσο για το κανονικό όσο και για το slow consumer πείραμα.

docker run \
  --rm \
  --privileged \
  --device /dev/gpiomem0:/dev/gpiomem \
  --device /dev/gpiochip0:/dev/gpiochip0 \
  -v $(pwd)/output:/data \
  motion-pipeline \
  python run_pipeline.py \
    --device-id pir-docker-01 \
    --pin 4 \
    --sample-interval 0.1 \
    --cooldown 5 \
    --min-high 0.2 \
    --queue-size 50 \
    --consumer-delay 0.5 \
    --duration 6000 \
    --out /data/motion_pipeline.jsonl \
    --verbose

Τρέξτε και ένα κανονικό και ένα slow-consumer πείραμα μέσα στο Docker, όπως κάνατε στο Lab 03. Επαληθεύστε ότι το output έχει νόημα.

Επιθεώρηση ενός τρέχοντος container

Ενώ το container τρέχει (σε άλλο terminal, θυμηθείτε ότι τρέχει για τη “duration” που ορίσατε), δοκιμάστε τα εξής:

docker ps                        # δείτε τα τρέχοντα containers
docker stats                     # live CPU/memory usage
docker logs <container-id>       # δείτε stdout/stderr output
docker exec -it <container-id> /bin/bash   # ανοίξτε ένα shell μέσα στο container

Η τελευταία εντολή είναι πολύ χρήσιμη για debugging. Μπορείτε να περιηγηθείτε στο filesystem, να ελέγξετε αν υπάρχουν αρχεία, να δείτε τι είναι εγκατεστημένο. Πληκτρολογήστε exit για να φύγετε.

Δοκιμάστε resource limits

Η διάλεξη κάλυψε πώς το Docker μπορεί να περιορίσει CPU και memory ανά container. Σε ένα Raspberry Pi αυτό είναι πραγματική ανάγκη — μια διαδικασία που ξεφεύγει μπορεί να καταναλώσει όλη τη μνήμη και να καταρρεύσει το σύστημα.

Δοκιμάστε:

docker run --rm \
  --device /dev/gpiomem \
  -v $(pwd)/output:/data \
  --memory=64m \
  motion-pipeline

Λειτουργεί ο περιορισμός μνήμης στο Pi;

Στη συνέχεια δοκιμάστε --cpus=0.5 και --cpus=0.01. Συνεχίζει να λειτουργεί το pipeline; Τερματίζεται βίαια; Μπορείτε να παρακολουθείτε τι συμβαίνει με το docker stats σε άλλο terminal.

Όταν ένα container υπερβεί το memory limit (θα δείτε ότι δεν μπορείτε να το περιορίσετε εδώ), το Docker το τερματίζει — θα δείτε exit code 137 (που σημαίνει “killed by signal 9”). Αυτό είναι σημαντικό να κατανοήσετε: το limit είναι σκληρό όριο.

Χρησιμοποιήστε Docker Compose

Μέχρι τώρα πληκτρολογούσατε μακριές εντολές docker run με πολλά flags. Είναι εντάξει για γρήγορες δοκιμές, αλλά κουράζει γρήγορα. Το Docker Compose σας επιτρέπει να γράψετε ολόκληρη τη διαμόρφωση σε ένα YAML αρχείο και να την εκτελέσετε με μία μόνο εντολή.

Δημιουργήστε το docker-compose.yml (ΠΑΛΙ ΑΥΤΕΣ ΕΙΝΑΙ ΟΔΗΓΙΕΣ ΠΟΥ ΠΡΕΠΕΙ ΝΑ ΣΥΜΠΛΗΡΩΣΕΤΕ ΜΕ ΤΙΣ ΣΩΣΤΕΣ ΕΝΤΟΛΕΣ):

<ορίστε τα services σας>:
  <ονομάστε το service σας>:
    <ορίστε πώς να κτιστεί το image>:
      <κατάλογος όπου το Docker θα ψάξει για το Dockerfile>
      <όνομα του Dockerfile>
    <δώστε στο container elevated hardware privileges>
    <λίστα hardware devices για πέρασμα από τον host στο container>:
      - "<host path στο GPIO memory device>:<που να εκτεθεί μέσα στο container>"
      - "<host path στο GPIO chip device>:<που να εκτεθεί μέσα στο container>"
    <λίστα volumes για mount>:
      - <επιλέξτε ένα όνομα για το named volume>:<path μέσα στο container όπου θα γράφονται δεδομένα>
    <ενότητα resource limits>:
      <υποενότητα limits>:
        <μέγιστη RAM που επιτρέπεται στο container>
        <μέγιστο ποσοστό CPU core που μπορεί να καταναλώσει το container>
    <πότε το Docker να κάνει αυτόματα restart το container αφού σταματήσει>
<δηλώστε named volumes ώστε το Docker να τα δημιουργεί και να τα διαχειρίζεται>:
  <ίδιο όνομα volume με παραπάνω>:

Κτίστε το image από το Dockerfile στον τρέχοντα κατάλογο, δώστε πρόσβαση στο GPIO device, κάντε mount ένα named volume με το όνομα pipeline-data στο /data, περιορίστε τη μνήμη στα 128MB (βάλτε το ακόμα και αν δεν λειτουργεί στο Pi) και την CPU στα μισά ενός core, και κάντε restart το container αν κρασάρει.

Το volume εδώ είναι ένα named volume που διαχειρίζεται το Docker, όχι ένα bind mount σε τοπικό κατάλογο. Το Docker αποφασίζει πού να το αποθηκεύσει στο δίσκο. Το πλεονέκτημα είναι ότι επιβιώνει μετά το docker compose down και λειτουργεί με τον ίδιο τρόπο ανεξάρτητα από τον τρέχοντα κατάλογό σας. Το μειονέκτημα είναι ότι δεν μπορείτε απλώς να κάνετε ls σε έναν τοπικό φάκελο για να δείτε τα αρχεία — χρειάζεται είτε να χρησιμοποιήσετε docker volume inspect pipeline-data για να βρείτε το path, ή να κάνετε exec μέσα στο container για να ρίξετε μια ματιά.

Εκκινήστε:

docker compose up --build

Το --build αναγκάζει rebuild του image. Χωρίς αυτό, το Compose χρησιμοποιεί το τελευταίο built image, που μπορεί να είναι παλιό αν αλλάξατε τον κώδικά σας.

Για διακοπή:

docker compose down

Για να επαληθεύσετε την persistence των δεδομένων, σταματήστε και ξεκινήστε ξανά:

docker compose down
docker compose up

Το JSONL αρχείο θα πρέπει να είναι ακόμα εκεί από την προηγούμενη εκτέλεση, γιατί το named volume δεν διαγράφηκε. Αν θέλετε να ξεκινήσετε από μηδέν, χρησιμοποιήστε docker compose down -v — το flag -v αφαιρεί και τα volumes.

Slow-consumer experiment in Compose

Replace the command in the Compose file to run either the slow-consumer or the fast-consumer experiment.

Your final Compose file should start one of the two experiments when the service comes up.

The command field overrides the CMD from the Dockerfile, exactly like passing arguments after the image name in docker run.
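The override rule can be written down as a tiny model. This is only a sketch of the behavior for exec-form lists, not Docker's own code; the function name is invented and --example-flag is a placeholder, not a real option of run_pipeline.py.

```python
from typing import List, Optional


def effective_argv(
    entrypoint: Optional[List[str]],
    cmd: Optional[List[str]],
    override: Optional[List[str]] = None,
) -> List[str]:
    """Model which process a container starts (exec-form lists only).

    entrypoint and cmd come from the image; override stands for the Compose
    `command` field or the arguments after the image name in `docker run`.
    """
    # An override replaces CMD entirely; ENTRYPOINT, if any, stays in front.
    args = override if override is not None else (cmd or [])
    return list(entrypoint or []) + list(args)


# Hypothetical image default and override, for illustration only.
default_cmd = ["python", "run_pipeline.py"]
print(effective_argv(None, default_cmd))
print(effective_argv(None, default_cmd, ["python", "run_pipeline.py", "--example-flag"]))
```

The point of the model is that the override replaces the default as a whole: anything that was in CMD and is still wanted has to be repeated in command.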

Useful Compose commands

docker compose up --build        # build and start
docker compose up -d             # start in the background (detached)
docker compose down              # stop and remove containers
docker compose down -v           # also remove volumes
docker compose logs              # show output from all services
docker compose logs -f           # follow logs in real time
docker compose ps                # list running services

Clean up

When you are done experimenting:

docker compose down -v           # stop and remove volumes
docker images                    # list existing images
docker rmi motion-pipeline       # remove a specific image
docker system prune              # remove stopped containers, unused networks, dangling images, build cache (volumes are kept unless you add --volumes)

Storage on a Pi is limited, so get into the habit of cleaning up.

Docker vs. virtual environments: a discussion

Up to Lab 03, your deployment strategy was a virtual environment and a requirements.txt. That works and is good practice. But it has limits.

Docker isolates everything from the operating system userland up (containers still share the host kernel). The base image defines the OS (Debian, Alpine, whatever you choose), the system libraries, and the Python version. Your Dockerfile installs packages in a controlled environment. The result is an image that runs the same way on your Pi, on a classmate's Pi, on a CI server, and on a cloud VM, as long as the architecture matches.
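To see these layers for yourself, a short script can report what the running interpreter observes. The sketch below is only an illustration (the function names are made up); run it on the host, inside a venv, and inside the container, then compare the output.

```python
import platform
import sys


def os_name() -> str:
    """Distribution name from os-release, or 'unknown' if unavailable."""
    try:
        # Available from Python 3.10; raises OSError if no os-release file exists.
        return platform.freedesktop_os_release().get("PRETTY_NAME", "unknown")
    except (AttributeError, OSError):
        return "unknown"


def describe_environment() -> dict:
    """Collect facts that can differ between a host, a venv, and a container."""
    return {
        "python_version": platform.python_version(),
        "machine": platform.machine(),             # CPU architecture
        "kernel": platform.release(),              # kernel release string
        "os": os_name(),                           # userland distribution
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Inside a container the os and python_version entries come from the image, while machine and kernel still reflect the host, which is why the architecture has to match.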

There is a cost. Docker images are larger than a requirements.txt. A build is slower than pip install. You need Docker installed on the machine (and it consumes resources too). And for simple scripts that need only a few pip packages and no system dependencies, a venv is perfectly adequate.

The question is not "which is better." It is "when do you need which." For a quick prototype on your own machine, a venv is faster and simpler. For anything that must run on more than one machine, or survive an OS update, or be deployed by someone other than you, a container is more reliable. In production edge systems, containers are the standard.

It is also worth noting that the two are not mutually exclusive. You can (and many people do) develop locally in a venv for fast iteration, and then package in Docker for deployment. The venv is your development tool; the Docker image is your deployment artifact.

Report questions

Answer the following in labs/lab04/README.md once the implementation and experiments are complete.

Dockerfile and images

RQ1: Which base image did you use, and why?
RQ2: How many layers does your Dockerfile create? Which instructions produce new layers?
RQ3: What is the size of your built image?
RQ4: Why do we copy requirements.txt and install dependencies first, before copying the rest of the code? What would happen if we reversed the order?

Running containers

RQ5: What does --device /dev/gpiomem do, and why is it needed?
RQ6: What happens to the JSONL output if you run the container without a volume mount (-v)?
RQ7: Did the pipeline behave the same way inside Docker as it did directly on the Pi in Lab 03? Were there any differences?

Resource limits

RQ8: What happened when you set --memory=32m? Does this work on the Pi? Why or why not?
RQ9: Why are resource limits important on edge devices in general?

Docker Compose

RQ10: What is the advantage of writing a docker-compose.yml instead of using docker run with flags?
RQ11: What is the difference between a bind mount (-v $(pwd)/output:/data) and a named volume (pipeline-data:/data)?
RQ12: What does restart: unless-stopped do, and why does it matter for an edge device?

Docker vs. virtual environments

RQ13: What does a virtual environment isolate, and what does it not isolate?
RQ14: Give a concrete example where a requirements.txt and a venv would not be enough to reproduce the Lab 03 setup on another machine.
RQ15: Give a scenario where a virtual environment is probably a better choice than Docker.
RQ16: In the context of the Smart Wastebin project, which approach (venv or Docker) would you prefer for a final deployment, and why?

Project hint: Smart Wastebin

The Smart Wastebin will have many components running together: sensor pipelines, perhaps an MQTT broker, storage, a dashboard. Each of these will be a service in a docker-compose.yml. Now that you know how to containerize one service, you have the foundation for the whole system.

Start thinking about which parts of the system should be separate containers and which should live together. A good rule of thumb: things with different dependencies, different scaling needs, or different development cycles should be separate. The sensor pipeline may be stable while you are still iterating on the dashboard. If they are separate containers, you can redeploy one without touching the other.

What should be finished before you leave the lab

Before the end of the session you should have: copied the pipeline code from Lab 03, written and built a working Dockerfile with a .dockerignore, run the pipeline inside Docker with GPIO access and a volume mount, run both the normal and the slow-consumer experiments inside Docker, tried resource limits, written and used a docker-compose.yml, verified that the output persists across container restarts, updated labs/lab04/README.md with code and report answers, and pushed to GitHub.

Final checklist (Lab 04)

Pipeline code from Lab 03 copied into labs/lab04/
Dockerfile builds successfully
.dockerignore created
Pipeline runs in Docker with GPIO access (--device /dev/gpiomem)
JSONL output persists via a volume
Container can be stopped and the output file survives
Resource limits tested (--memory)
docker-compose.yml written and working
Normal run completed in Docker
Slow-consumer run completed in Docker
labs/lab04/README.md contains code, run steps, and report answers
Commit and push completed

Deliverables and submission

What must exist in the repository (by end of lab)

/
├── README.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   └── lab04/
│       ├── README.md
│       ├── Dockerfile
│       ├── docker-compose.yml
│       ├── .dockerignore
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab04/README.md` must contain

Two clearly separated parts:

Code / runbook
Answers to the report questions

Same style as the previous labs.

End of lab session: GitHub checkpoint

Before you leave:

commit your progress
push to the team GitHub repository

Minimum expectation:

all deliverables tracked by Git
latest commit pushed
commit message is clear

Before next lab: eClass submission

Submit both:

Code archive (.zip)
PDF export of labs/lab04/README.md

Required PDF filename format:

lab04_REPORT_<team>.pdf
