Lab 05 — From Raw Sensor Data to Structured Models

Deadlines:

End of lab session (GitHub checkpoint): commit & push your progress to your team repository.
Before next lab (eClass submission): upload (1) a .zip with your code and (2) a PDF export of labs/lab05/README.md.

Submission contents:

(1) a .zip with your code, and
(2) a PDF export of labs/lab05/README.md.

Intro to what you need to do

Look at a record from your pipeline output:

{"event_time": "2026-04-10T14:32:01.123Z", "device_id": "pir-01", "event_type": "motion", "motion_state": "detected", "seq": 7, "run_id": "abc123", "ingest_time": "2026-04-10T14:32:01.130Z", "pipeline_latency_ms": 7.0}

This works for your pipeline, but it is completely opaque to anyone else. What is pir-01? A product code? A room number? An IP address? (How did you decide on the device_id?) What does motion_state: detected mean: did something move, or did the sensor detect that nothing is moving? What unit is pipeline_latency_ms in? Is that obvious, or are you relying on the field name to carry that information?

In this lab you will model the things your system is made of (the sensor, the wastebin it is mounted on, the space it sits in, and the observations it produces) using JSON-LD. Then you will modify your pipeline so its output events carry references back to these models. The result is data that describes itself: anyone reading your output can follow the links and understand what device produced it, where it was deployed, what it is attached to, and what the values mean.

You will work through five stages: model the sensor, model the wastebin and environment, design a semantic @context, connect your pipeline output to those models, and finally produce a diagram showing entities, properties, and relationships.

Create the following structure:

/
├── README.md
├── docs/
│   └── ontology.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   ├── lab04/
│   └── lab05/
│       ├── README.md
│       ├── models/
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   ├── environment.jsonld
│       │   └── context.jsonld
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Copy your pipeline code from Lab 04 (or from Lab 03 if you prefer, since the pipeline logic is the same). The changes you make will be in the output format and the new model files.

Model the sensor

Start by describing the sensor itself: not the data it produces, just the physical device. What is it? What does it measure? What are its characteristics? What is it attached to?

Create models/sensor.jsonld. Think about what someone who has never seen your project would need to know about this device.

Here is a minimal starting point. Fill it in, remove anything you do not use, and expand it significantly:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "lab801": "your own context?"
  },
  "@id": "urn:your_sensor_id",
  "@type": "Something",
  "name": "name for the sensor",
  "description": "HC-SR501 passive infrared motion sensor ...",
  ...
}

A good sensor description should include much more. Go through the datasheet for your sensor and think about what matters:

Identity: what is its unique ID? What is the manufacturer model name/number?
Sensing principle: what physical phenomenon does it detect? (passive infrared radiation from warm bodies)
What it observes: motion? presence? occupancy? Be precise, these are different things.
Hardware connection: which GPIO pin? What voltage does it operate at?
Detection characteristics: what is its detection range? Detection angle? What is the minimum time between detections (the cooldown you implemented in Lab 02)?
Operating conditions: what temperature range does it work in? Indoor only or also outdoor?
Deployment: what is it mounted on? Where is it deployed? (You will link to the wastebin and environment models here.)
Status: is it currently active? When was it installed?

Not all of these will map to existing vocabulary terms, and that is fine. You can come back to them when you define your own terms for project-specific properties later in the lab. For now, include them using whatever field names make sense.

A few things about the JSON-LD structure:

@context defines the vocabularies you are drawing from. You are not locked into a single one: you can mix several, as the example above does with schema.org, SOSA, and SSN.

Look at what else is available (these are some examples; you can find others online as well):

schema.org — general-purpose, widely used, good for names, descriptions, locations
SOSA/SSN (https://www.w3.org/TR/vocab-ssn/) — purpose-built for sensors and observations
SAREF (https://saref.etsi.org/) — ETSI standard for smart appliances and IoT
Smart Data Models (https://smartdatamodels.org/) — ready-made models for many IoT domains

Browse these. See what fits. Pick what makes sense for your sensor and justify your choices in the report.

@id is the unique identifier for this entity. Using a URN like urn:dev:team-05:pir-01 keeps it unambiguous as long as every team uses its own prefix. This is the ID that your observation records will reference later: when a motion event says “I was produced by urn:dev:team-05:pir-01”, anyone can look up that ID and find the full sensor description.
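
As a rough illustration of the template above, the sketch below builds a small sensor description in Python and serializes it. The identifier, the name, and the property names gpioPin and cooldownSeconds (with their values) are assumptions made for this example; they are placeholders to be replaced with your own values and mapped to proper terms later in the lab.

```python
import json

# Hypothetical identifier: replace it with your team's own value.
SENSOR_ID = "urn:dev:team-05:pir-01"


def build_sensor_model(sensor_id: str) -> dict:
    """Return a minimal JSON-LD description of the PIR sensor."""
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "sosa": "http://www.w3.org/ns/sosa/",
        },
        "@id": sensor_id,
        "@type": "sosa:Sensor",
        "name": "Bin lid PIR sensor",
        "description": "HC-SR501 passive infrared motion sensor",
        # Placeholder property names and values (assumptions for this sketch).
        "gpioPin": 17,
        "cooldownSeconds": 5.0,
    }


sensor = build_sensor_model(SENSOR_ID)
text = json.dumps(sensor, indent=2)
# Round-trip through the parser to confirm the content is valid JSON.
assert json.loads(text)["@id"] == SENSOR_ID
print(text)
```

Writing the printed text to models/sensor.jsonld gives you a file that any JSON parser accepts, which is a useful first check before you add more properties.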

Model the wastebin and the environment

Your sensor does not exist in a vacuum. It is mounted on something (a wastebin) and that something is located somewhere (a room, a building, a campus zone). These are separate entities with their own properties and their own relationships.

The wastebin

Create models/wastebin.jsonld. Think about what a wastebin entity needs to describe:

Identity: unique ID, a human-readable name or label
Physical properties: capacity (in liters), material, dimensions, color
Operational properties: what type of waste does it accept? What collection zone does it belong to? What is its current status (active, full, maintenance)?
Sensors mounted on it: link to the PIR sensor. Model this so that it supports more than one sensor, since a real project may add more sensors in the future.
Location: where is it? (link to the environment)

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    etc... 
  },
  "@id": "urn:something",
  "@type": "what type?",
  "name": "what name",
  "description": "describe it"
}

Expand it. Think about what a waste collection service would need to know about this bin. Think about what a campus management system would need. The more properties you add, the more useful the model becomes.

The environment

Create models/environment.jsonld. This describes the physical space where the wastebin is deployed.

{
  "@context": {
    "@vocab": "https://schema.org/",
    "bot": "https://w3id.org/bot#",
    etc... 
  },
  "@id": "urn:someid",
  "@type": "the type selected",
  "name": "its name",
  "description": ""
}

Think about:

Spatial hierarchy: is this a room inside a floor inside a building inside a campus? You can model that as nested entities or as properties.
Location coordinates: latitude/longitude if outdoors, room number if indoors
What is in this space: which wastebins are deployed here? Which sensors?
Zone information: is this a high-traffic area? What collection route covers it?
Environmental conditions: indoor/outdoor? Covered/uncovered?

The bot vocabulary (Building Topology Ontology, https://w3id.org/bot) is designed for describing spaces and their relationships. schema.org/Place is another option. There are others as well. Pick what you prefer as long as it makes sense.

Relationships matter

The real value of these models is in how the entities connect:

The sensor is mountedOn the wastebin and deployedIn the environment
The wastebin hasSensor the sensor (and will have more sensors later) and locatedIn the environment
The environment contains the wastebin

These links should appear in the actual JSON-LD files. For example, in sensor.jsonld:

"mountedOn": "urn:wastebin:bin-01",
"deployedIn": "urn:env:lab-room-101"

And in wastebin.jsonld:

"hasSensor": ["urn:dev:pir-01"],
"locatedIn": "urn:env:lab-room-101"

Make these bidirectional where it makes sense. When you look at any one entity, you should be able to follow links to find the others.
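
One way to check that the links actually resolve is to load the three models and look for references that point to an @id no model defines. The sketch below does this on in-memory dictionaries that reuse the example IDs from this section; the function name find_dangling_links and the deliberately wrong bin-02 reference are assumptions for illustration only.

```python
def find_dangling_links(models, link_keys):
    """Return (source_id, key, target_id) for links whose target is not modeled."""
    known_ids = {model["@id"] for model in models}
    dangling = []
    for model in models:
        for key in link_keys:
            targets = model.get(key, [])
            if isinstance(targets, str):
                targets = [targets]  # treat a single link like a one-item list
            for target in targets:
                if target not in known_ids:
                    dangling.append((model["@id"], key, target))
    return dangling


sensor = {
    "@id": "urn:dev:pir-01",
    "mountedOn": "urn:wastebin:bin-01",
    "deployedIn": "urn:env:lab-room-101",
}
wastebin = {
    "@id": "urn:wastebin:bin-01",
    "hasSensor": ["urn:dev:pir-01"],
    "locatedIn": "urn:env:lab-room-101",
}
# Deliberate mistake for the demo: bin-02 is not described by any model.
environment = {"@id": "urn:env:lab-room-101", "contains": ["urn:wastebin:bin-02"]}

LINK_KEYS = ["mountedOn", "deployedIn", "hasSensor", "locatedIn", "contains"]
problems = find_dangling_links([sensor, wastebin, environment], LINK_KEYS)
print(problems)
```

In a real run you would build the list of models with json.load on the three files in models/ instead of using inline dictionaries.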

Design the context for observations

Your pipeline currently outputs events like this:

{"event_time": "2026-04-10T14:32:01.123Z", "device_id": "pir-01", "event_type": "motion", "motion_state": "detected", "seq": 7, "run_id": "abc123", "ingest_time": "2026-04-10T14:32:01.130Z", "pipeline_latency_ms": 7.0}

These field names are meaningful to you, but they carry no formal semantics. event_time could mean anything: the time the event happened? The time it was received? The time it was processed?

Create models/context.jsonld, a JSON-LD context that maps your pipeline’s field names to well-defined terms (as you did for the other models):

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",

    "event_time": {
      "@id": "something",
      "@type": "something"
    },
    "device_id": "something",
    "event_type": "@type",
    "motion_state": "something"
  }
}

Now anyone who reads this context knows exactly what event_time means, not because of the field name, but because it points to a published, documented standard term.

Go through every field in your pipeline output and map it. Some fields have clear standard equivalents. Others do not. For the ones that do not, you need to define your own terms.
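
To keep track of which fields still need an explicit mapping, a small helper can compare a sample record against the keys of your context. This is only a sketch: the function name unmapped_fields is an assumption, and the context shown is the partial template from above with its placeholder values. Note that when @vocab is set, a JSON-LD processor would still expand unmapped fields against that default vocabulary, so "unmapped" here means "not mapped explicitly".

```python
def unmapped_fields(record: dict, context: dict) -> list:
    """Return record fields that have no explicit entry in the context."""
    return sorted(
        key for key in record if not key.startswith("@") and key not in context
    )


record = {
    "event_time": "2026-04-10T14:32:01.123Z",
    "device_id": "pir-01",
    "event_type": "motion",
    "motion_state": "detected",
    "seq": 7,
    "run_id": "abc123",
    "ingest_time": "2026-04-10T14:32:01.130Z",
    "pipeline_latency_ms": 7.0,
}
# Partial context with placeholder values, as in the template above.
context = {
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "event_time": {"@id": "something", "@type": "something"},
    "device_id": "something",
    "event_type": "@type",
    "motion_state": "something",
}
todo = unmapped_fields(record, context)
print(todo)
```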

Creating your own namespace

Not everything fits a standard vocabulary. pipeline_latency_ms, seq, and run_id (and perhaps others you can think of) are pipeline-internal concepts; SOSA and schema.org have nothing to say about them. For these, you create your own namespace.

Your GitHub repository is a good base for this. Since your repo has a URL, you can use it as a namespace that other people can actually follow to find documentation. (A small caveat: if your repo is private, someone without access would not be able to read the definitions. For this exercise that is fine: the practice of documenting your terms is what matters.)

You will create a simple Markdown file in your repo that describes every custom term, so the namespace is not just a naming convention but real documentation that anyone with access can read.

Step 1 — Pick your namespace URL

Use this pattern:

https://github.com/<your-org>/<your-repo>/blob/main/docs/ontology.md#

For example, if your team’s repo is https://github.com/iot-team-05/smart-wastebin, your namespace becomes:

https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#

The # at the end is important: it means individual terms get appended as fragments. So pipeline:latencyMs expands to:

https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#latencyMs

That is a real URL someone can open in a browser. If each term has its own heading in the document, GitHub Markdown generates an anchor for that heading automatically, so the link can even scroll to the right section.
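
The prefix expansion described above can be mimicked in a few lines of Python. This sketch only concatenates a declared prefix with the term's suffix; it is not a JSON-LD processor, and the function name expand_term is an assumption for illustration.

```python
def expand_term(compact: str, prefixes: dict) -> str:
    """Expand 'prefix:suffix' using the declared prefixes; leave other values as-is."""
    prefix, separator, suffix = compact.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + suffix
    return compact


PREFIXES = {
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "sosa": "http://www.w3.org/ns/sosa/",
}

expanded = expand_term("pipeline:latencyMs", PREFIXES)
print(expanded)
```

Without the trailing # in the namespace, the same concatenation would produce a URL ending in ontology.mdlatencyMs, which is why the fragment separator matters.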

Step 2 — Create the documentation file

Create docs/ontology.md in your repository. This file documents every custom term your project defines, the ones that do not come from standard vocabularies like SOSA or schema.org.

Here is an example:

# Smart Wastebin — Custom Ontology Terms

Base namespace: `https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#`

Prefix used in JSON-LD: `pipeline`

## Pipeline Terms example

### latencyMs

- **Type:** `xsd:float`
- **Description:** Time in milliseconds between event creation by the producer
  and ingestion by the consumer. Measures how long a record spent in the queue.

Add a section for each custom term you define. As your project grows and you add new terms (e.g., for fill level), add them here.

Step 3 — Use the namespace in your context

In models/context.jsonld, declare the prefix pointing to your documentation file and map all your fields, both standard and custom:

Example only: fill it in with all of your fields.

{
  "@context": {
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "device_id": "sosa:madeBySensor",

    "seq": {
      "@id": "pipeline:sequenceNumber",
      "@type": "xsd:integer"
    }
  }
}

Now "seq": 7 is not a mystery number but a pipeline:sequenceNumber of type xsd:integer, and the pipeline: prefix resolves to a document in your repo where someone can read exactly what that means.

Step 4 — Update your repo structure

Your repository should now include:

/
├── docs/
│   └── ontology.md
├── labs/
│   └── lab05/
│       ├── models/
│       │   ├── context.jsonld    ← references the namespace
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   └── environment.jsonld
│       └── ...

Commit and push docs/ontology.md. Once it is on GitHub, the namespace URL is live: anyone (with access) can follow it and read your term definitions. That is the difference between a namespace that is just a naming trick and one that actually works as documentation.

Tips for defining good custom terms

Naming: pipeline:latencyMs is clearer than pipeline:plm. Be descriptive.
Types: always declare the datatype (xsd:float, xsd:dateTime, xsd:integer, xsd:string). This removes ambiguity about whether "7.0" is a string or a number.
Documentation: every custom term you use in your context should have a matching entry in docs/ontology.md. If a term is not documented, it is not really defined — it is just a label.
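
The documentation rule can be checked mechanically. The sketch below collects the custom terms used in a context and compares them with the third-level headings of the ontology file. It assumes each term is documented under a "### term" heading, as in the example file above; the function names and the pipeline_latency_ms mapping are illustrative assumptions.

```python
import re


def custom_terms(context: dict, prefix: str) -> set:
    """Collect term names that the context maps into the given prefix."""
    terms = set()
    for definition in context.values():
        iri = definition.get("@id") if isinstance(definition, dict) else definition
        if isinstance(iri, str) and iri.startswith(prefix + ":"):
            terms.add(iri.split(":", 1)[1])
    return terms


def documented_terms(markdown: str) -> set:
    """Collect the names used as '### name' headings in the ontology file."""
    return set(re.findall(r"^###[ \t]+(\S+)[ \t]*$", markdown, flags=re.MULTILINE))


context = {
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "seq": {"@id": "pipeline:sequenceNumber", "@type": "xsd:integer"},
    # Hypothetical mapping added for this example.
    "pipeline_latency_ms": {"@id": "pipeline:latencyMs", "@type": "xsd:float"},
}
ontology_md = "## Pipeline Terms\n\n### latencyMs\n\n- **Type:** `xsd:float`\n"

missing = custom_terms(context, "pipeline") - documented_terms(ontology_md)
print(sorted(missing))
```

Here sequenceNumber is reported because the sample ontology text documents only latencyMs.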

The point is not to force every field into an existing vocabulary. Standard vocabularies give you interoperability for free: if you use sosa:resultTime, any SOSA-aware tool already knows what that is. Custom terms give you precision for things that are specific to your system. Use both.

Connect the pipeline output

Now modify run_pipeline.py so that the events it produces reference your models and include the context.

Each event should:

Reference the sensor that produced it (by @id)
Reference the wastebin or environment it relates to (by @id)
Include or reference the @context so the data is self-describing
Use @type to declare what kind of thing this record is

You have a choice about how to handle @context in a streaming JSONL pipeline:

Inline it in every record: simple, and every line is self-contained
Reference a file ("@context": "models/context.jsonld"): compact, but the reader needs access to that file
Put it once at the top of the output file and keep the lines without it: practical, but not standard JSONL

Think about which makes sense. There is no single right answer: each has trade-offs for file size, portability, and ease of parsing. Discuss this in your report.
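
As one possible shape for the change in run_pipeline.py, the sketch below wraps a raw event so that it carries a @context reference, a @type, and the @id values of the models. It uses the file-reference option for @context purely as an example. The identifiers, the field name observed_in, and the choice of sosa:Observation as the type are assumptions; adapt them to your own models and context.

```python
import json

# Hypothetical identifiers and path: use the @id values from your own models.
SENSOR_ID = "urn:dev:team-05:pir-01"
ENVIRONMENT_ID = "urn:env:lab-room-101"
CONTEXT_REF = "models/context.jsonld"


def to_linked_event(raw: dict) -> dict:
    """Return a copy of a raw pipeline event that references the models."""
    event = {"@context": CONTEXT_REF, "@type": "sosa:Observation"}
    event.update(raw)
    # Replace the bare device name with the sensor's @id from sensor.jsonld.
    event["device_id"] = SENSOR_ID
    # Hypothetical field name; map it in context.jsonld like the other fields.
    event["observed_in"] = ENVIRONMENT_ID
    return event


raw_event = {
    "event_time": "2026-04-10T14:32:01.123Z",
    "device_id": "pir-01",
    "event_type": "motion",
    "motion_state": "detected",
    "seq": 7,
}
linked = to_linked_event(raw_event)
print(json.dumps(linked))  # one JSONL line
```

Switching to the inline option would only mean replacing CONTEXT_REF with the loaded context dictionary, at the cost of repeating it on every line.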

Run the pipeline and verify the output. Each line should now reference a known sensor, a known environment, and carry enough context that someone seeing it for the first time can understand what it describes.
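
Verification can be partly automated. The following sketch parses each output line and reports missing keys or sensor references that do not match a known @id. The required keys and the device_id field are assumptions based on the examples in this lab; adjust them to the fields your events actually carry.

```python
import json

# Assumed minimum for a self-describing event in this lab.
REQUIRED_KEYS = ("@context", "@type", "device_id")


def check_jsonl(lines, known_sensor_ids):
    """Return a list of human-readable problems found in JSONL lines."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in record:
                errors.append(f"line {number}: missing {key}")
        if record.get("device_id") not in known_sensor_ids:
            errors.append(f"line {number}: unknown sensor {record.get('device_id')!r}")
    return errors


good = json.dumps({
    "@context": "models/context.jsonld",
    "@type": "sosa:Observation",
    "device_id": "urn:dev:team-05:pir-01",
})
bad = json.dumps({"device_id": "pir-01", "event_type": "motion"})

errors = check_jsonl([good, bad], {"urn:dev:team-05:pir-01"})
for message in errors:
    print(message)
```

To run it on real output, pass the open file object in place of the list, since iterating a text file yields its lines.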

Draw the entity diagram

Create a visual diagram showing all your entities, their properties, and the relationships between them. This is the kind of diagram you would put in project documentation or show in a presentation.

The diagram should include:

Every entity: the sensor, the wastebin, the environment (and any sub-entities you created, like a building or a campus zone)
Key properties of each entity: not every single field, but the important ones, such as ID, type, name, and the domain-specific properties that define what the entity is
Every relationship: which entity links to which, and what the relationship is called (mountedOn, locatedIn, hasSensor, deployedIn, contains, etc.)
The observation: show how an observation record connects to the sensor and the environment

You can draw this however you like: a hand-drawn sketch that you photograph, a diagram tool (draw.io, Mermaid, PlantUML, Lucidchart), or even ASCII art. What matters is that the diagram is clear and complete. Someone should be able to look at it and understand your entire data model without reading the JSON-LD files.

Think of this as one of the initial data models, the kind we discussed in the lecture. It shows the big picture: what the entities are, how they relate, and what describes them.

Include the diagram in your labs/lab05/README.md.
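
If you choose Mermaid, the relationship part of the diagram can be generated from the models so that it stays in sync with the files. The sketch below emits Mermaid flowchart text with one labeled arrow per link. It covers relationships only (not properties), and the function names and sample IDs are assumptions for illustration.

```python
import re


def node(entity_id: str) -> str:
    """Mermaid node: a safe identifier plus the full @id as its label."""
    safe = re.sub(r"\W", "_", entity_id)
    return f'{safe}["{entity_id}"]'


def to_mermaid(models, link_keys) -> str:
    """Build Mermaid flowchart text with one labeled edge per relationship."""
    lines = ["graph TD"]
    for model in models:
        for key in link_keys:
            targets = model.get(key, [])
            if isinstance(targets, str):
                targets = [targets]
            for target in targets:
                lines.append(f"    {node(model['@id'])} -->|{key}| {node(target)}")
    return "\n".join(lines)


models = [
    {"@id": "urn:dev:pir-01", "mountedOn": "urn:wastebin:bin-01"},
    {"@id": "urn:wastebin:bin-01", "locatedIn": "urn:env:lab-room-101"},
]
diagram = to_mermaid(models, ["mountedOn", "locatedIn"])
print(diagram)
```

The printed text can be pasted into a mermaid code block in the README, and you can then add the key properties of each entity by hand.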

Validate your thinking

Once you have the models, the updated pipeline, and the diagram, take a step back:

Can someone understand a single event without reading your code? Open one JSONL line. Following the @context and your relationships (e.g., madeBySensor and observedIn links), can you trace back to the sensor description, the wastebin, and the environment? If yes, your model works.
What if you add a second sensor? Say you add an ultrasonic distance sensor to measure bin fill level. What new model files would you need? What changes in the existing ones? What stays the same? (Hint: the structure of observations does not change; you just add a new sensor entity and maybe new terms in the context.)
What if another team uses a completely different sensor but follows the same context structure? Could a downstream application process both teams’ data without modification?
What properties did you not include that a real deployment would need? Think about maintenance schedules, battery levels, firmware versions, calibration dates. You do not need to add them all — but identifying what is missing is part of good modeling.

You do not need to implement these, just think about them.

Report questions

Answer the following in your labs/lab05/README.md after the implementation is complete.

Modeling decisions

RQ1: Which vocabularies/ontologies did you use across your models? Why did you choose them over alternatives?
RQ2: What properties did you include in your sensor description? Which ones came from standard vocabularies and which ones did you define yourself?
RQ3: What properties did you include in your wastebin description? How did you decide what to include and what to leave out?
RQ4: How did you model the relationships between sensor, wastebin, and environment? Show the relevant @id references from each JSON-LD file.
RQ5: Were there properties you wanted to include but could not find a standard term for? How did you handle them?

Context and namespace

RQ6: Show your complete @context and explain each mapping. For each field, why did you choose that particular standard term (or why did you define a custom one)?
RQ7: How did you define your custom namespace? What URL did you use and why?
RQ8: Take one field from your old pipeline output (e.g., event_time). What did it mean before? What does it mean now that it is mapped to a standard term? What is the practical difference?
RQ9: What is the role of @context in JSON-LD? What happens if you remove it: is the JSON still valid? Is it still self-describing?
RQ10: How did you handle the @context in your streaming JSONL pipeline: inline, external reference, or something else? What are the trade-offs of your choice?

The diagram

RQ11: Include your entity-relationship diagram in the report. Explain the diagram briefly: what are the entities, what are the key relationships, and how does an observation connect to the rest of the model?

Interoperability and extensibility

RQ12: Another team uses a different motion sensor (e.g., microwave radar instead of PIR) but follows the same JSON-LD context. Could a downstream application process both teams’ data without modification? Why or why not?
RQ13: You need to add an ultrasonic distance sensor to measure bin fill level. What new JSON-LD files would you create? What would you change in existing files? What would stay the same?
RQ14: What properties are missing from your models that a real-world deployment would need? Name at least three and explain why they matter.
RQ15: Look at one domain-specific data model repository (e.g., SAREF, Smart Data Models, SSN). Find a model related to waste management, sensors, or smart buildings. How does it compare to what you built?

Reflection

RQ16: In the DIKW pyramid from the lecture, where does your raw Lab 03 JSONL output sit? Where does the JSON-LD version sit? What moved it up the pyramid?
RQ17: In your own words, what is the difference between data that works and data that communicates information?
RQ18: If you had to explain to a non-technical person why your pipeline now produces “better” data, what would you say?

Project hint: Smart Wastebin

The models you built in this lab are the foundation for the project. The wastebin model you started here will grow. Think about what else it needs: a fill level property? A lid state (open/closed)? A collection history (when was it last emptied)? An alert threshold (send a notification when the bin is threshold% full)? What makes sense to include in your data model?

The entity diagram you drew is the beginning of your project’s data architecture. As you add components, update the diagram. It is much cheaper to change a diagram than to refactor a whole schema.

What should be finished before you finish this lab

Before the end you should have: created sensor.jsonld with detailed sensor properties, created wastebin.jsonld with bin properties and sensor links, created environment.jsonld with location properties and wastebin/sensor links, created context.jsonld with field mappings including a custom namespace, modified run_pipeline.py so output events reference the models, run the pipeline and verified the output, drawn an entity-relationship diagram, updated labs/lab05/README.md with code, models, diagram, and report answers, and pushed to GitHub.

Final checklist (Lab 05)

models/sensor.jsonld created with detailed properties
models/wastebin.jsonld created with bin description and sensor links
models/environment.jsonld created with location and spatial context
All three models reference each other via @id links
models/context.jsonld maps all pipeline fields to vocabulary terms
Custom namespace defined for project-specific terms (using team GitHub URL or similar)
At least one standard vocabulary used (schema.org, SOSA/SSN, SAREF, or other)
run_pipeline.py modified — output events reference sensor and environment by @id
Output events include @context (inline or reference)
Pipeline runs and produces valid JSONL output
Entity-relationship diagram created and included in README
labs/lab05/README.md contains code, models, diagram, and report answers
Commit and push completed

Deliverables and submission

What must exist in the repository (by end of lab)

/
├── README.md
├── docs/
│   └── ontology.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   ├── lab04/
│   └── lab05/
│       ├── README.md
│       ├── models/
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   ├── environment.jsonld
│       │   └── context.jsonld
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab05/README.md` must contain

Two clearly separated parts:

Code / runbook — include your JSON-LD models (or relevant excerpts) and the entity diagram directly in the README so the report is self-contained
Answers to report questions

Same style as previous labs.

End of lab session — GitHub checkpoint

Before leaving:

commit your progress
push to your team GitHub repository

Minimum expectation:

all deliverables tracked by Git
latest commit pushed
commit message is clear

Before next lab — eClass submission

Submit both:

Code archive (.zip)
PDF export of labs/lab05/README.md

Required PDF filename format:

lab05_REPORT_<team>.pdf

What follows is the same lab, originally provided in Greek

Lab 05: From Raw Sensor Data to Structured Models

Deadlines:

End of lab session (GitHub checkpoint): commit & push your progress to your team repository.
Before next lab (eClass submission): upload (1) a .zip with your code and (2) a PDF export of labs/lab05/README.md.

Submission contents:

(1) a .zip with your code, and
(2) a PDF export of labs/lab05/README.md.

Introduction to what you need to do

Look at a record from your pipeline output:

{"event_time": "2026-04-10T14:32:01.123Z", "device_id": "pir-01", "event_type": "motion", "motion_state": "detected", "seq": 7, "run_id": "abc123", "ingest_time": "2026-04-10T14:32:01.130Z", "pipeline_latency_ms": 7.0}

This works for your pipeline, but it is completely opaque to anyone else. What is pir-01? A product code? A room number? An IP address? (How did you decide on the device_id?) What does motion_state: detected mean: did something move, or did the sensor detect that nothing is moving? What unit is pipeline_latency_ms in? Is that obvious, or are you relying on the field name to carry that information?

In this lab you will model the things your system is made of (the sensor, the wastebin it is mounted on, the space it sits in, and the observations it produces) using JSON-LD. Then you will modify your pipeline so its output events carry references back to these models. The result is data that describes itself: anyone reading your output can follow the links and understand what device produced it, where it was deployed, what it is attached to, and what the values mean.

You will work through five stages: model the sensor, model the wastebin and environment, design a semantic @context, connect your pipeline output to those models, and finally produce a diagram showing entities, properties, and relationships.

Create the following structure:

/
├── README.md
├── docs/
│   └── ontology.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   ├── lab04/
│   └── lab05/
│       ├── README.md
│       ├── models/
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   ├── environment.jsonld
│       │   └── context.jsonld
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Copy your pipeline code from Lab 04 (or from Lab 03 if you prefer, since the pipeline logic is the same). The changes you make will be in the output format and the new model files.

Model the sensor

Start by describing the sensor itself: not the data it produces, but the physical device. What is it? What does it measure? What are its characteristics? What is it attached to?

Create models/sensor.jsonld. Think about what someone who has never seen your project would need to know about this device.

Below is a minimal starting point. Fill it in, remove anything you do not use, and expand it significantly:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "lab801": "your own context?"
  },
  "@id": "urn:your_sensor_id",
  "@type": "Something",
  "name": "name for the sensor",
  "description": "HC-SR501 passive infrared motion sensor ...",
  ...
}

A good sensor description should include much more. Go through the datasheet for your sensor and think about what matters:

Identity: what is its unique ID? What is the manufacturer model name/number?
Sensing principle: what physical phenomenon does it detect? (passive infrared radiation from warm bodies)
What it observes: motion? presence? occupancy? Be precise, these are different things.
Hardware connection: which GPIO pin? What voltage does it operate at?
Detection characteristics: what is its detection range? Detection angle? What is the minimum time between detections (the cooldown you implemented in Lab 02)?
Operating conditions: what temperature range does it work in? Indoor only or also outdoor?
Deployment: what is it mounted on? Where is it deployed? (You will link to the wastebin and environment models here.)
Status: is it currently active? When was it installed?

Not all of these will map to existing vocabulary terms, and that is fine. You can come back to them when you define your own terms for project-specific properties later in the lab. For now, include them using whatever field names make sense.

A few remarks about the JSON-LD structure:

@context defines the vocabularies you are drawing from. You are not limited to a single one: you can mix several (see the example, where multiple vocabularies are combined).

Look at what else is available (these are some examples; you can find others online as well):

schema.org: general-purpose, widely used, good for names, descriptions, locations
SOSA/SSN (https://www.w3.org/TR/vocab-ssn/): purpose-built for sensors and observations
SAREF (https://saref.etsi.org/): ETSI standard for smart appliances and IoT
Smart Data Models (https://smartdatamodels.org/): ready-made models for many IoT domains

Browse these. See what fits. Pick what makes sense for your sensor and justify your choices in the report.

@id is the unique identifier for this entity. Using a URN like urn:dev:team-05:pir-01 keeps it unambiguous as long as every team uses its own prefix. This is the ID that your observation records will reference later: when a motion event says “I was produced by urn:dev:team-05:pir-01”, anyone can look up that ID and find the full sensor description.

Model the wastebin and the environment

Your sensor does not exist in a vacuum. It is mounted on something (a wastebin) and that something is located somewhere (a room, a building, a campus zone). These are separate entities with their own properties and their own relationships.

The wastebin

Create models/wastebin.jsonld. Think about what a wastebin entity needs to describe:

Identity: unique ID, a human-readable name or label
Physical properties: capacity (in liters), material, dimensions, color
Operational properties: what type of waste does it accept? What collection zone does it belong to? What is its current status (active, full, maintenance)?
Sensors mounted on it: link to the PIR sensor. Model this so that it supports more than one sensor, since a real project may add more sensors in the future.
Location: where is it? (link to the environment)

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    etc... 
  },
  "@id": "urn:something",
  "@type": "what type?",
  "name": "what name",
  "description": "describe it"
}

Expand it. Think about what a waste collection service would need to know about this bin. Think about what a campus management system would need. The more properties you add, the more useful the model becomes.

The environment

Create models/environment.jsonld. This describes the physical space where the wastebin is deployed.

{
  "@context": {
    "@vocab": "https://schema.org/",
    "bot": "https://w3id.org/bot#",
    etc... 
  },
  "@id": "urn:someid",
  "@type": "the type selected",
  "name": "its name",
  "description": ""
}

Think about:

Spatial hierarchy: is this a room inside a floor inside a building inside a campus? You can model that as nested entities or as properties.
Location coordinates: latitude/longitude if outdoors, room number if indoors
What is in this space: which wastebins are deployed here? Which sensors?
Zone information: is this a high-traffic area? What collection route covers it?
Environmental conditions: indoor/outdoor? Covered/uncovered?

The bot vocabulary (Building Topology Ontology, https://w3id.org/bot) is designed for describing spaces and their relationships. schema.org/Place is another option. There are others as well. Pick what you prefer as long as it makes sense.

Relationships matter

The real value of these models is in how the entities connect:

The sensor is mountedOn the wastebin and deployedIn the environment
The wastebin hasSensor the sensor (and will have more sensors later) and is locatedIn the environment
The environment contains the wastebin

These links should appear in the actual JSON-LD files. For example, in sensor.jsonld:

"mountedOn": "urn:wastebin:bin-01",
"deployedIn": "urn:env:lab-room-101"

And in wastebin.jsonld:

"hasSensor": ["urn:dev:pir-01"],
"locatedIn": "urn:env:lab-room-101"

Make these links bidirectional where it makes sense. Starting from any entity, you should be able to follow links to find the others.
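One way to check that the links really resolve in both directions is a short script. The sketch below is only an illustration: it uses in-memory dictionaries instead of reading your files, and the identifiers, the LINK_PROPS set, and the INVERSE table are assumptions taken from the example relations above, not a required design.

```python
# Hypothetical in-memory copies of the link properties of the three models.
# In practice you would json.load() the files under models/.
models = {
    "urn:dev:pir-01": {
        "mountedOn": "urn:wastebin:bin-01",
        "deployedIn": "urn:env:lab-room-101",
    },
    "urn:wastebin:bin-01": {
        "hasSensor": ["urn:dev:pir-01"],
        "locatedIn": "urn:env:lab-room-101",
    },
    "urn:env:lab-room-101": {
        "contains": ["urn:wastebin:bin-01"],
    },
}

# Properties treated as links to other entities (assumed names).
LINK_PROPS = {"mountedOn", "deployedIn", "hasSensor", "locatedIn", "contains"}

# Assumed inverse pairs: if A --prop--> B, then B --inverse--> A should exist.
INVERSE = {"mountedOn": "hasSensor", "locatedIn": "contains"}


def as_list(value):
    """Treat a single link and a list of links uniformly."""
    return value if isinstance(value, list) else [value]


def missing_links(models):
    """Return (source, property, target, problem) tuples for broken links."""
    problems = []
    for source, entity in models.items():
        for prop, value in entity.items():
            if prop not in LINK_PROPS:
                continue  # ordinary property such as name or description
            for target in as_list(value):
                if target not in models:
                    problems.append((source, prop, target, "unknown @id"))
                    continue
                inverse = INVERSE.get(prop)
                back = as_list(models[target].get(inverse, [])) if inverse else []
                if inverse and source not in back:
                    problems.append((source, prop, target, f"no {inverse} back-link"))
    return problems


print(missing_links(models))  # an empty list means every link resolves
```

If you rename a relation or add a new pair, the two tables at the top are the only things that need to change.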

Designing the context for observations

Your pipeline currently produces events like this:

{"event_time": "2026-04-10T14:32:01.123Z", "device_id": "pir-01", "event_type": "motion", "motion_state": "detected", "seq": 7, "run_id": "abc123", "ingest_time": "2026-04-10T14:32:01.130Z", "pipeline_latency_ms": 7.0}

These field names are meaningful to you, but they carry no formal semantics. event_time could mean anything: the time the event occurred? The time it was received? The time it was processed?

Create models/context.jsonld, a JSON-LD context that maps your pipeline's field names to well-defined terms (as you did for the other models):

{
  "@context": {
    "@vocab": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",

    "event_time": {
      "@id": "something",
      "@type": "something"
    },
    "device_id": "something",
    "event_type": "@type",
    "motion_state": "something"
  }
}

Once the placeholders are filled in, anyone reading this context knows exactly what event_time means, not because of the field name, but because it points to a published, documented standard term.

Go through every field in your pipeline output and map it. Some fields have clear standard mappings. Others do not; for those, you need to define your own terms.
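A quick way to see which fields still need a mapping is to compare an event against the terms you have defined so far. This is a minimal sketch under stated assumptions: the `mapped` set simply lists the four field names that appear in the context template above, and `unmapped_fields` is a hypothetical helper, not part of any library.

```python
# Example event from the lab text.
event = {
    "event_time": "2026-04-10T14:32:01.123Z",
    "device_id": "pir-01",
    "event_type": "motion",
    "motion_state": "detected",
    "seq": 7,
    "run_id": "abc123",
    "ingest_time": "2026-04-10T14:32:01.130Z",
    "pipeline_latency_ms": 7.0,
}

# Field names that already have an entry in the context template (assumed).
mapped = {"event_time", "device_id", "event_type", "motion_state"}


def unmapped_fields(event, mapped_terms):
    """Return the event fields that have no explicit term definition yet."""
    return sorted(
        key for key in event
        if not key.startswith("@") and key not in mapped_terms
    )


print(unmapped_fields(event, mapped))
# → ['ingest_time', 'pipeline_latency_ms', 'run_id', 'seq']
```

Keep in mind that when "@vocab" is set, a JSON-LD processor still expands unmapped keys against that vocabulary (for example to https://schema.org/seq), which is rarely what you mean. An explicit mapping for every field avoids that.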

Creating your own namespace

Not everything fits into a standard vocabulary. pipeline_latency_ms, seq, and run_id (and perhaps others you can think of) are concepts internal to your pipeline; SOSA and schema.org have nothing to say about them. For these, you create your own namespace.

Your GitHub repository is a good basis for this. Since your repository has a URL, you can use it as a namespace that other people can actually follow to find documentation. (One caveat: if your repository is private, someone without access will not be able to read the definitions. For this exercise that is fine; what matters is the practice of documenting your terms.)

You will create a simple Markdown file in your repository that describes each custom term, so that the namespace is not just a naming convention but real documentation that anyone with access can read.

Step 1: Choose your namespace URL

Use this pattern:

https://github.com/<your-org>/<your-repo>/blob/main/docs/ontology.md#

For example, if your team's repository is https://github.com/iot-team-05/smart-wastebin, your namespace becomes:

https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#

The trailing # matters: it means individual terms are appended as fragments. So pipeline:latencyMs expands to:

https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#latencyMs

This is a real URL that someone can open in a browser. If the document has a heading for each term (GitHub generates an anchor for every Markdown heading automatically), the browser will even jump to the right section.
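The expansion itself is plain string concatenation of the prefix URL and the term name. The sketch below illustrates only that one step; `expand_term` is a hypothetical helper, and a real JSON-LD processor does considerably more (term definitions, @vocab, type coercion).

```python
def expand_term(compact, prefixes):
    """Expand 'prefix:name' to a full IRI using a prefix table."""
    prefix, sep, name = compact.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + name
    return compact  # not a known compact IRI; return it unchanged


prefixes = {
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "sosa": "http://www.w3.org/ns/sosa/",
}

print(expand_term("pipeline:latencyMs", prefixes))
# → https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#latencyMs

# Without the trailing '#', the term is glued onto the file name instead.
no_hash = {"pipeline": prefixes["pipeline"].rstrip("#")}
print(expand_term("pipeline:latencyMs", no_hash))
# → https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.mdlatencyMs
```

The second call shows why the separator belongs in the namespace URL: the processor never inserts one for you.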

Step 2: Create the documentation file

Create docs/ontology.md in your repository. This file documents every custom term your project defines, that is, the ones that do not come from standard vocabularies such as SOSA or schema.org.

Here is an example:

# Smart Wastebin: Custom Ontology Terms

Base namespace: `https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#`

Prefix used in JSON-LD: `pipeline`

## Example Pipeline Terms

### latencyMs

- **Type:** `xsd:float`
- **Description:** Time in milliseconds between the creation of the event by the producer
  and its ingestion by the consumer. Measures how long a record spent in the queue.

Add a section for each custom term you define. As your project grows and you add new terms (e.g. for fill level), add them here.

Step 3: Use the namespace in your context

In models/context.jsonld, declare the prefix that points to your documentation file and map all of your fields, both standard and custom:

EXAMPLE: PLEASE FILL IN ALL YOUR FIELDS

{
  "@context": {
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "device_id": "sosa:madeBySensor",

    "seq": {
      "@id": "pipeline:sequenceNumber",
      "@type": "xsd:integer"
    }
  }
}

Now "seq": 7 is not a mysterious number but a pipeline:sequenceNumber of type xsd:integer, and the pipeline: prefix resolves to a document in your repository where anyone can read exactly what that means.

Step 4: Update your repository structure

Your repository should now include:

/
├── docs/
│   └── ontology.md
├── labs/
│   └── lab05/
│       ├── models/
│       │   ├── context.jsonld    ← references the namespace
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   └── environment.jsonld
│       └── ...

Commit and push docs/ontology.md. Once it is on GitHub, the namespace URL is live: anyone (with access) can follow it and read the definitions of your terms. This is the difference between a namespace that is merely a naming trick and one that actually works as documentation.

Tips for defining good custom terms

Naming: pipeline:latencyMs is clearer than pipeline:plm. Be descriptive.
Types: always declare the datatype (xsd:float, xsd:dateTime, xsd:integer, xsd:string). This removes ambiguity about whether "7.0" is a string or a number.
Documentation: every custom term you use in your context must have a matching entry in docs/ontology.md. If a term is not documented, it is not really defined; it is just a label.
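The documentation tip can be checked mechanically. The following sketch assumes that each custom term has its own level-3 heading in docs/ontology.md (as in the example file) and that custom terms use the pipeline prefix; both strings below are small in-memory stand-ins for your real files, and the helper names are hypothetical.

```python
import re

# Stand-in for the "@context" object of models/context.jsonld.
context = {
    "pipeline": "https://github.com/iot-team-05/smart-wastebin/blob/main/docs/ontology.md#",
    "seq": {"@id": "pipeline:sequenceNumber", "@type": "xsd:integer"},
    "pipeline_latency_ms": {"@id": "pipeline:latencyMs", "@type": "xsd:float"},
}

# Stand-in for docs/ontology.md; sequenceNumber is deliberately missing.
ontology_md = """\
# Smart Wastebin: Custom Ontology Terms

### latencyMs

- **Type:** `xsd:float`
"""


def custom_terms(context, prefix="pipeline"):
    """Collect term names used as 'prefix:name' in the context."""
    terms = set()
    for value in context.values():
        iri = value.get("@id") if isinstance(value, dict) else value
        if isinstance(iri, str) and iri.startswith(prefix + ":"):
            terms.add(iri.split(":", 1)[1])
    return terms


def documented_terms(markdown):
    """Collect the names that appear as level-3 headings."""
    return set(re.findall(r"^###[ \t]+(\S+)[ \t]*$", markdown, flags=re.MULTILINE))


def undocumented_terms(context, markdown, prefix="pipeline"):
    """Return custom terms that have no heading in the documentation."""
    return sorted(custom_terms(context, prefix) - documented_terms(markdown))


print(undocumented_terms(context, ontology_md))
# → ['sequenceNumber']
```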

The goal is not to force every field into an existing vocabulary. Standard vocabularies give you interoperability for free: if you use sosa:resultTime, any tool that supports SOSA already knows what it is. Custom terms give you precision for things specific to your system. Use both.

Connecting the pipeline output

Now modify run_pipeline.py so that the events it produces reference your models and include the context.

Each event must:

Reference the sensor that produced it (via @id)
Reference the wastebin or environment it relates to (via @id)
Include or reference the @context so that the data is self-describing
Use @type to declare what kind of thing this record is
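The four requirements above can be met by wrapping each event before it is written. This is a hedged sketch, not the expected solution: the function name, the madeBySensor and observedIn keys, the choice of sosa:Observation as @type, and the identifiers are all assumptions, and the key names only carry meaning if your context.jsonld defines them.

```python
import json

# Assumed identifiers; use the @id values from your own model files.
SENSOR_ID = "urn:dev:pir-01"
ENVIRONMENT_ID = "urn:env:lab-room-101"
CONTEXT_REF = "models/context.jsonld"


def to_linked_event(event):
    """Return a copy of the event with context, type, and model references."""
    linked = {
        "@context": CONTEXT_REF,                    # reference, not inline
        "@type": "sosa:Observation",                # what kind of record this is
        "madeBySensor": {"@id": SENSOR_ID},         # link to sensor.jsonld
        "observedIn": {"@id": ENVIRONMENT_ID},      # link to environment.jsonld
    }
    linked.update(event)  # keep the original pipeline fields unchanged
    return linked


raw = {"event_time": "2026-04-10T14:32:01.123Z", "motion_state": "detected", "seq": 7}
print(json.dumps(to_linked_event(raw)))
```

Writing the references as {"@id": ...} objects rather than bare strings makes it explicit that the value is a link to another entity and not a literal.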

You have a choice in how to handle the @context in a streaming JSONL pipeline:

Embed it in every record: simple, every line is self-contained
Reference a file ("@context": "models/context.jsonld")
Put it once at the top of the output file and keep the lines free of it: practical, but not standard JSONL

Think about which makes sense. There is no single right answer; each option has trade-offs in file size, portability, and ease of parsing. Discuss this in your report.
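To make the file-size side of the trade-off concrete, you can serialize the same event both ways and compare the line lengths. The context below is a deliberately tiny stand-in, and the mapping of event_time to sosa:resultTime is an assumption for illustration only; your real context will be larger, so the per-line overhead will be larger too.

```python
import json

event = {"event_time": "2026-04-10T14:32:01.123Z", "motion_state": "detected", "seq": 7}

# Tiny illustrative context (assumed mappings, not a complete answer).
context = {
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "event_time": {"@id": "sosa:resultTime", "@type": "xsd:dateTime"},
}

# Option 1: embed the whole context in every line.
inline_line = json.dumps({"@context": context, **event})

# Option 2: point to the context file from every line.
referenced_line = json.dumps({"@context": "models/context.jsonld", **event})

overhead = len(inline_line) - len(referenced_line)
print(f"inline: {len(inline_line)} chars, referenced: {len(referenced_line)} chars")
print(f"extra characters per line when embedding: {overhead}")
```

The referenced form is smaller, but a reader who receives only the JSONL file without models/context.jsonld can no longer interpret it, which is the portability cost.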

Run the pipeline and verify the output. Every line should now reference a known sensor and a known environment, and carry enough context that someone seeing it for the first time can understand what it describes.
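Verification can be partly automated. The sketch below checks one JSONL line at a time; the key names madeBySensor and observedIn and the set of known identifiers are assumptions, so adapt them to whatever your own context and models define.

```python
import json

# Assumed @id values of the entities described in models/.
KNOWN_IDS = {"urn:dev:pir-01", "urn:wastebin:bin-01", "urn:env:lab-room-101"}


def check_line(line):
    """Return a list of problems found in one JSONL output line."""
    record = json.loads(line)
    problems = []
    for key in ("@context", "@type"):
        if key not in record:
            problems.append(f"missing {key}")
    for key in ("madeBySensor", "observedIn"):
        ref = record.get(key)
        # Accept both {"@id": "..."} objects and bare identifier strings.
        ref_id = ref.get("@id") if isinstance(ref, dict) else ref
        if ref_id not in KNOWN_IDS:
            problems.append(f"{key} does not point to a known @id")
    return problems


good = json.dumps({
    "@context": "models/context.jsonld",
    "@type": "sosa:Observation",
    "madeBySensor": {"@id": "urn:dev:pir-01"},
    "observedIn": {"@id": "urn:env:lab-room-101"},
    "seq": 7,
})
print(check_line(good))          # → []
print(check_line('{"seq": 7}'))  # lists the missing pieces
```

To check a whole run, call check_line on every line of the output file and report the line numbers that return a non-empty list.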

Draw the entity diagram

Create a visual diagram showing all your entities, their properties, and the relationships between them. This is the kind of diagram you would put in a project's documentation or show in a presentation.

The diagram must include:

Every entity: the sensor, the wastebin, the environment (and any sub-entities you created, such as a building or campus zone)
Key properties of each entity: not every single field, but the important ones, such as ID, type, name, and the domain-specific properties that define what the entity is
Every relationship: which entity connects to which, and what the relationship is called (mountedOn, locatedIn, hasSensor, deployedIn, contains, etc.)
The observation: show how an observation record connects to the sensor and the environment

You can draw it however you like: a hand-drawn sketch that you photograph, a diagramming tool (draw.io, Mermaid, PlantUML, Lucidchart), or even ASCII art. What matters is that the diagram is clear and complete. Someone should be able to look at it and understand your entire data model without reading the JSON-LD files.
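If you choose Mermaid, the relationship part of the diagram can be generated from a list of triples, which keeps it in sync with your models. This is an optional sketch: the entity labels and the observation links (madeBySensor, observedIn) are assumptions, and it covers only the relationships, so the key properties of each entity still have to be added by hand.

```python
# (source entity, relationship name, target entity); labels are illustrative.
RELATIONS = [
    ("Sensor", "mountedOn", "Wastebin"),
    ("Sensor", "deployedIn", "Environment"),
    ("Wastebin", "hasSensor", "Sensor"),
    ("Wastebin", "locatedIn", "Environment"),
    ("Environment", "contains", "Wastebin"),
    ("Observation", "madeBySensor", "Sensor"),
    ("Observation", "observedIn", "Environment"),
]


def to_mermaid(relations):
    """Render the triples as a left-to-right Mermaid flowchart."""
    lines = ["graph LR"]
    for source, label, target in relations:
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


print(to_mermaid(RELATIONS))
```

Paste the printed text into a mermaid code block in your README; GitHub renders such blocks as diagrams.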

Think of this as one of the initial data models, the kind we discussed in the lecture. It shows the big picture: what the entities are, how they relate, and what describes them.

Include the diagram in your labs/lab05/README.md.

Checking your thinking

Once you have the models, the updated pipeline, and the diagram, take a step back:

Can someone understand a single event without reading your code? Open one JSONL line. By following the @context and your relationships (e.g. the madeBySensor and observedIn links), can you trace back to the sensor description, the wastebin, and the environment? If so, your model works.
What if you add a second sensor? Say you add an ultrasonic distance sensor to measure the bin's fill level. Which new model files would you need? What would you change in the existing ones? What would stay the same? (Hint: the structure of the observations does not change; you just add a new sensor entity and perhaps new terms to the context.)
What if another team uses a completely different sensor but follows the same JSON-LD context structure? Could a downstream application process both teams' data without modification?
Which properties did you leave out that a real deployment would need? Think of maintenance schedules, battery levels, firmware versions, calibration dates. You do not need to add them all, but recognizing what is missing is part of good modeling.

You do not need to implement these; just think them through.

Report questions

Answer the following in your labs/lab05/README.md once the implementation is complete.

Modeling decisions

RQ1: Which vocabularies/ontologies did you use in your models? Why did you choose them over the alternatives?
RQ2: Which properties did you include in your sensor description? Which came from standard vocabularies and which did you define yourself?
RQ3: Which properties did you include in your wastebin description? How did you decide what to include and what to leave out?
RQ4: How did you model the relationships between sensor, wastebin, and environment? Show the relevant @id references from each JSON-LD file.
RQ5: Were there properties you wanted to include but could not find a standard term for? How did you handle them?

Context and namespace

RQ6: Show your complete @context and explain each mapping. For each field, why did you choose that particular standard term (or why did you define a custom one)?
RQ7: How did you define your custom namespace? Which URL did you use and why?
RQ8: Take one field from your old pipeline output (e.g. event_time). What did it mean before? What does it mean now that it is mapped to a standard term? What is the practical difference?
RQ9: What is the role of @context in JSON-LD? What happens if you remove it: is the JSON still valid? Is it still self-describing?
RQ10: How did you handle the @context in your streaming JSONL pipeline: inline, external reference, or something else? What are the trade-offs of your choice?

The diagram

RQ11: Include your entity-relationship diagram in the report. Explain the diagram briefly: what are the entities, what are the key relationships, and how does an observation connect to the rest of the model?

Interoperability and extensibility

RQ12: Another team uses a different motion sensor (e.g. microwave radar instead of PIR) but follows the same JSON-LD context structure. Could a downstream application process both teams' data without modification? Why or why not?
RQ13: You need to add an ultrasonic distance sensor to measure the bin's fill level. Which new JSON-LD files would you create? What would you change in the existing files? What would stay the same?
RQ14: Which properties are missing from your models that a real deployment would need? Name at least three and explain why they matter.
RQ15: Look at a repository of domain-specific data models (e.g. SAREF, Smart Data Models, SSN). Find a model related to waste management, sensors, or smart buildings. How does it compare with what you created?

Reflection

RQ16: In the DIKW pyramid from the lecture, where does the raw JSONL output of Lab 03 sit? Where does the JSON-LD version sit? What moved it higher up the pyramid?
RQ17: In your own words, what is the difference between data that works and data that communicates information?
RQ18: If you had to explain to a non-technical person why your pipeline now produces "better" data, what would you say?

Project hint: Smart Wastebin

The models you created in this lab are the foundation for the project. The wastebin model you started here will grow. Think about what else it needs: a fill-level property? A lid state (open/closed)? A collection history (when was it last emptied)? An alert threshold (send a notification when the bin is filled to a given percentage)? What makes sense to include in your data model?

The entity diagram you drew is the beginning of your project's data architecture. As you add elements, update the diagram. It is far cheaper to change a diagram than to restructure an entire schema.

What must be done before you finish this lab

Before the end you should have: created sensor.jsonld with detailed sensor properties, created wastebin.jsonld with wastebin properties and sensor links, created environment.jsonld with location properties and wastebin/sensor links, created context.jsonld with field mappings including a custom namespace, modified run_pipeline.py so that outgoing events reference the models, run the pipeline and verified the output, drawn an entity-relationship diagram, updated labs/lab05/README.md with code, models, diagram, and report answers, and pushed to GitHub.

Final checklist (Lab 05)

Created models/sensor.jsonld with detailed properties
Created models/wastebin.jsonld with wastebin description and sensor links
Created models/environment.jsonld with location and spatial context
All three models reference each other via @id links
models/context.jsonld maps all pipeline fields to vocabulary terms
Defined a custom namespace for project-specific terms (using the team's GitHub URL or similar)
Used at least one standard vocabulary (schema.org, SOSA/SSN, SAREF, or other)
Modified run_pipeline.py: outgoing events reference sensor and environment via @id
Outgoing events include @context (inline or by reference)
The pipeline runs and produces valid JSONL output
Created an entity-relationship diagram and included it in the README
labs/lab05/README.md contains code, models, diagram, and report answers
Commit and push completed

Deliverables and submission

What must be in the repository (by the end of the lab)

/
├── README.md
├── docs/
│   └── ontology.md
├── labs/
│   ├── lab01/
│   ├── lab02/
│   ├── lab03/
│   ├── lab04/
│   └── lab05/
│       ├── README.md
│       ├── models/
│       │   ├── sensor.jsonld
│       │   ├── wastebin.jsonld
│       │   ├── environment.jsonld
│       │   └── context.jsonld
│       ├── requirements.txt
│       ├── run_pipeline.py
│       └── pirlib/
│           ├── __init__.py
│           ├── sampler.py
│           └── interpreter.py

Do not include:

venv/
__pycache__/
*.pyc
output/ or *.jsonl
large temporary files unless explicitly requested

What `labs/lab05/README.md` must contain

Two clearly separated parts:

Code / runbook: include your JSON-LD models (or relevant excerpts) and the entity diagram directly in the README so that the report is self-contained
Answers to the report questions

Same style as in the previous labs.

End of lab session (GitHub checkpoint)

Before you leave:

commit your progress
push to your team's GitHub repository

Minimum expectation:

all deliverables are tracked by Git
the latest commit has been pushed
the commit message is clear

Before the next lab (eClass submission)

Submit both:

Code archive (.zip)
PDF export of labs/lab05/README.md

Required PDF filename format:

lab05_REPORT_<team>.pdf

Context-aware Data Modeling