Preprocessor
A Preprocessor is a pluggable service that semantically annotates a raw ePI with codes from standard terminologies (such as SNOMED-CT or ICPC-2) to produce a preprocessed ePI, written p(ePI).
Purpose
Preprocessors transform raw ePI into p(ePI) by:
- Reading ePI narrative text
- Identifying medically relevant sections
- Annotating with standard terminology codes
- Embedding annotations as HTML class attributes
Technical Implementation
Required Endpoint
Every Preprocessor must implement a /preprocess REST endpoint:
- Input: ePI Bundle (FHIR JSON)
- Output: p(ePI) Bundle (FHIR JSON)
- Specification: OpenAPI definition
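As an illustration of the endpoint contract above, here is a minimal sketch using only the Python standard library. The HTTP method (POST), the port, the `application/fhir+json` response type, and the names `preprocess`, `PreprocessHandler`, and `serve` are assumptions made for this example; the OpenAPI definition remains the authoritative description of the interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def preprocess(bundle: dict) -> dict:
    """Placeholder annotation step (hypothetical): validates and returns the Bundle."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    # A real Preprocessor would annotate the Composition here.
    return bundle


class PreprocessHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the endpoint is called with POST and a JSON body.
        if self.path != "/preprocess":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = preprocess(json.loads(self.rfile.read(length)))
        except ValueError as exc:  # also covers json.JSONDecodeError
            self.send_error(400, str(exc))
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/fhir+json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server; the port number is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), PreprocessHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping `preprocess` as a plain function separate from the HTTP handler makes the annotation logic testable without a running server.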
Service Discovery
The service must be deployed with the following Kubernetes label:
eu.gravitate-health.fosps.preprocessing=true
This label lets the Focusing Manager discover the Preprocessor automatically.
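A small pre-deployment check can catch a missing or mistyped label. The sketch below assumes the label sits under `metadata.labels` of a Service manifest (the text above does not say which resource kind carries it), and the names `is_discoverable` and `my-preprocessor` are hypothetical. It does not reproduce how the Focusing Manager performs discovery.

```python
DISCOVERY_LABEL = "eu.gravitate-health.fosps.preprocessing"


def is_discoverable(manifest: dict) -> bool:
    """Return True if a manifest (parsed into a dict) carries the preprocessing label."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    # Kubernetes label values are strings, so compare against "true", not True.
    return labels.get(DISCOVERY_LABEL) == "true"


# Hypothetical manifest; only the label key and value come from the text above.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "my-preprocessor",
        "labels": {DISCOVERY_LABEL: "true"},
    },
}
```

Because label values are strings, a YAML manifest needs the value quoted (`"true"`); an unquoted `true` is parsed as a boolean.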
Containerization
- Packaged as Docker containers
- Deployed via Kubernetes
- Language/framework agnostic
Annotation Mechanism
1. Add HtmlElementLink Extensions
Preprocessors add HtmlElementLink extensions to the Composition:
{
  "extension": [{
    "url": "elementClass",
    "valueString": "pregnancy-warning"
  }, {
    "url": "concept",
    "valueCodeableConcept": {
      "coding": [{
        "system": "http://snomed.info/sct",
        "code": "77386006"
      }]
    }
  }]
}
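The extension above can be assembled programmatically. The following sketch builds the same structure and attaches it to a Composition held as a plain dict; the function names `html_element_link` and `add_link` are hypothetical. The example above does not show a `url` on the outer extension, so the sketch leaves it out as well rather than guessing one.

```python
def html_element_link(element_class: str, system: str, code: str) -> dict:
    """Build an HtmlElementLink-style extension mirroring the JSON example."""
    return {
        "extension": [
            {"url": "elementClass", "valueString": element_class},
            {
                "url": "concept",
                "valueCodeableConcept": {
                    "coding": [{"system": system, "code": code}]
                },
            },
        ]
    }


def add_link(composition: dict, link: dict) -> None:
    """Append the link to the Composition's extension list, creating it if absent."""
    composition.setdefault("extension", []).append(link)


composition = {"resourceType": "Composition"}
add_link(
    composition,
    html_element_link("pregnancy-warning", "http://snomed.info/sct", "77386006"),
)
```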
2. Modify HTML Narrative
Add class attributes to HTML tags in Composition.text.div:
<p class="pregnancy-warning">
Do not use during pregnancy.
</p>
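One way to apply such a class is to parse the narrative as XML, since a FHIR narrative is XHTML. The sketch below uses Python's `xml.etree.ElementTree`; the function name `annotate` and the substring match are stand-ins for whatever detection logic a real Preprocessor uses. It appends to an existing class attribute rather than overwriting it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML as the default namespace so tags are written without a prefix.
# Note: this changes module-wide ElementTree state.
ET.register_namespace("", XHTML_NS)


def annotate(div_xhtml: str, needle: str, class_name: str) -> str:
    """Add class_name to every element whose own leading text contains needle."""
    root = ET.fromstring(div_xhtml)
    for element in root.iter():
        if element.text and needle in element.text:
            classes = element.get("class", "").split()
            if class_name not in classes:
                element.set("class", " ".join(classes + [class_name]))
    return ET.tostring(root, encoding="unicode")


narrative = (
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<p>Do not use during pregnancy.</p><p>Take with food.</p></div>"
)
annotated = annotate(narrative, "pregnancy", "pregnancy-warning")
```

Here only the first paragraph receives the class; the second is left untouched, and the text content of both is unchanged.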
Stacking Support
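Preprocessors can run one after another (stacking), so each one must handle input that an earlier Preprocessor has already annotated:
- The input may already be a p(ePI) produced by a previous Preprocessor
- Do not add an annotation that is already present
- Append new classes to an existing class attribute instead of nesting another tag
- Check for existing HtmlElementLink extensions before adding one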
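The stacking rules amount to making each annotation step idempotent. A minimal sketch, assuming the Composition is a plain dict and that two links count as duplicates when class, system, and code all match (the helper names and that matching rule are assumptions):

```python
def merge_classes(existing: str, new_class: str) -> str:
    """Append new_class to a class attribute value unless it is already listed."""
    tokens = existing.split()
    if new_class not in tokens:
        tokens.append(new_class)
    return " ".join(tokens)


def link_exists(composition: dict, element_class: str, system: str, code: str) -> bool:
    """Check whether an equivalent HtmlElementLink extension is already present."""
    for link in composition.get("extension", []):
        parts = {part.get("url"): part for part in link.get("extension", [])}
        if parts.get("elementClass", {}).get("valueString") != element_class:
            continue
        codings = (
            parts.get("concept", {})
            .get("valueCodeableConcept", {})
            .get("coding", [])
        )
        if any(c.get("system") == system and c.get("code") == code for c in codings):
            return True
    return False
```

For example, `merge_classes("highlight", "pregnancy-warning")` yields `"highlight pregnancy-warning"`, and calling it again with the result leaves the value unchanged. A Preprocessor would call `link_exists` before appending a new extension.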
HTML Modification Rules
Allowed:
- Adding HTML class attributes
- Wrapping text in <span> tags
- Adding classes to existing tags (<p>, <h1>, <li>, etc.)
Prohibited:
- Removing content
- Changing narrative text
- Deleting HTML elements
- Modifying approved wording
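A Preprocessor can guard against accidental violations by comparing the narrative text before and after annotation. The sketch below is a partial check under stated assumptions: it verifies only that the concatenated text is identical, so it would not notice a deleted empty element. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def narrative_text(div_xhtml: str) -> str:
    """Concatenate all text nodes of the narrative in document order."""
    return "".join(ET.fromstring(div_xhtml).itertext())


def text_preserved(before: str, after: str) -> bool:
    """Return True if annotation left the narrative wording untouched."""
    return narrative_text(before) == narrative_text(after)


before = "<div><p>Do not use during pregnancy.</p></div>"
after = (
    '<div><p class="pregnancy-warning">Do not use '
    '<span class="pregnancy">during pregnancy</span>.</p></div>'
)
ok = text_preserved(before, after)
```

Adding a class and wrapping text in a span both pass this check, while any edit to the wording itself makes it fail.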
Performance Considerations
Preprocessors should take on the expensive work:
- Heavy computation: NLP models, semantic analysis
- Slow operations: External API calls, database lookups
Doing this work during preprocessing allows Lenses to remain lightweight and fast.
Related Concepts
- p(ePI) - Output format
- ePI - Input content
- Standard Terminologies - Annotation codes
- Focusing Manager - Orchestrator
- Lens - Consumes annotations
- Focusing - Overall workflow