Annotation Tool
The Annotation Tool is a proposed tool that lets experts semantically annotate free text in ePIs with standard terminology labels. It serves either as a manual preprocessing step or as a source of training data for NLP models.
Purpose
Enables human experts to:
- Manually annotate ePI text
- Tag sections with standard terminologies
- Generate training datasets for NLP services
- Validate Preprocessor output
- Create gold-standard annotations
Features
Text Selection
- Highlight text spans in ePI narrative
- Select existing HTML elements (<p>, <h1>, etc.)
- Create new semantic boundaries
- Nested annotation support
Terminology Search
Find and apply codes from:
- SNOMED-CT: Clinical concepts
- ICPC-2: Primary care classification
- LOINC: Laboratory observations
- ATC: Medication classification
Features:
- Free-text search
- Hierarchy browser
- Recent/favorite codes
- Synonym matching
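As a rough illustration of free-text search with synonym matching, the sketch below searches a small in-memory concept list. The names `Concept`, `SAMPLE_CONCEPTS`, and `search_concepts` are hypothetical, and the handful of entries is sample data only; the tool is proposed, so a real implementation would presumably query a terminology service instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    system: str                      # terminology name, e.g. "SNOMED-CT"
    code: str                        # concept code within that terminology
    display: str                     # preferred label
    synonyms: tuple[str, ...] = ()   # alternative labels used for matching


# Illustrative sample entries only (assumption: not the tool's real index).
SAMPLE_CONCEPTS = [
    Concept("SNOMED-CT", "73211009", "Diabetes mellitus", ("diabetes",)),
    Concept("SNOMED-CT", "38341003", "Hypertensive disorder",
            ("high blood pressure", "hypertension")),
    Concept("ICPC-2", "K86", "Hypertension uncomplicated",
            ("high blood pressure",)),
    Concept("ATC", "A10BA02", "Metformin"),
]


def search_concepts(query, concepts=SAMPLE_CONCEPTS):
    """Return concepts whose display label or any synonym contains the query."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for concept in concepts:
        labels = (concept.display, *concept.synonyms)
        if any(needle in label.lower() for label in labels):
            hits.append(concept)
    return hits
```

Case-insensitive substring matching keeps the example short; ranking, hierarchy browsing, and recent/favorite codes would sit on top of a lookup like this.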
Annotation Management
Create Annotations
- Select text span
- Search terminology
- Choose concept code
- Define elementClass name
- Preview in context
- Save annotation
Edit Annotations
- Modify concept code
- Adjust text boundaries
- Change elementClass
- Add notes/rationale
Validate Annotations
- Check for overlaps
- Verify code appropriateness
- Review completeness
- Export validation report
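The overlap check can be sketched as follows. Because nested annotations are supported, this example only flags spans that cross each other without one fully containing the other. The `Span` type and `find_partial_overlaps` function are hypothetical names, and character offsets with an exclusive end are an assumption about how spans might be stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    code: str   # concept code attached to the span


def find_partial_overlaps(spans):
    """Return pairs of spans that overlap without one containing the other."""
    # Sort by start; for equal starts put the longer span first so that
    # a contained span always comes after its container.
    ordered = sorted(spans, key=lambda s: (s.start, -s.end))
    conflicts = []
    for i, outer in enumerate(ordered):
        for inner in ordered[i + 1:]:
            if inner.start >= outer.end:
                break  # every later span starts even further to the right
            # inner starts inside outer; it is fine only if it also ends inside
            if inner.end > outer.end:
                conflicts.append((outer, inner))
    return conflicts
```

For example, spans covering offsets 0-10 and 2-5 are treated as valid nesting, while 0-10 and 8-12 are reported as a conflict for the annotator to resolve.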
Output Formats
Generate p(ePI) compatible output:
- HtmlElementLink extensions
- HTML with class attributes
- FHIR Bundle JSON
- Training dataset format
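A minimal sketch of two of these outputs is shown below: wrapping an annotated span in an HTML element with a class attribute, and building the matching HtmlElementLink extension as a plain dictionary. The extension URL and its `elementClass`/`concept` sub-extension layout are assumptions based on the Gravitate Health FHIR implementation guide and should be verified against the published profile. The helper names and the `pregnancyCategory` class are hypothetical.

```python
import html

# Assumption: canonical URL of the HtmlElementLink extension; verify before use.
HTML_ELEMENT_LINK_URL = (
    "https://hl7.eu/fhir/ig/gravitate-health/StructureDefinition/HtmlElementLink"
)


def wrap_span(text, start, end, element_class):
    """Wrap text[start:end] in a <span> carrying the given class attribute."""
    before, target, after = text[:start], text[start:end], text[end:]
    css_class = html.escape(element_class, quote=True)
    return (
        f"{html.escape(before)}"
        f'<span class="{css_class}">{html.escape(target)}</span>'
        f"{html.escape(after)}"
    )


def html_element_link(element_class, system, code, display):
    """Build an extension dict linking an element class to a coded concept."""
    coding = {"system": system, "code": code, "display": display}
    return {
        "url": HTML_ELEMENT_LINK_URL,
        "extension": [
            {"url": "elementClass", "valueString": element_class},
            {
                "url": "concept",
                "valueCodeableReference": {"concept": {"coding": [coding]}},
            },
        ],
    }
```

The class name written into the HTML and the `elementClass` value in the extension must match, since that shared name is what ties the markup to the concept code.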
Collaboration Features
Support multiple annotators:
- Assign sections to experts
- Track annotation progress
- Compare inter-annotator agreement
- Resolve conflicts
- Consensus building
Training Data Export
For NLP services:
- Export as IOB/BIO format
- JSON-LD for entity recognition
- Custom ML framework formats
- Split train/test/validation sets
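The BIO export and dataset split could look roughly like the sketch below. It assumes annotations are non-overlapping `(start, end, label)` character spans and uses a simple regex tokenizer. The function names, the `CONDITION` label, and the 80/10/10 default split are illustrative choices rather than part of the tool's specification.

```python
import random
import re

# Words and single punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_bio(text, spans):
    """Convert character-level spans into (token, BIO tag) pairs."""
    rows = []
    for match in TOKEN_RE.finditer(text):
        tag = "O"
        for start, end, label in spans:
            # A token is tagged only if it lies entirely inside the span.
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        rows.append((match.group(), tag))
    return rows


def split_dataset(items, train=0.8, validation=0.1, seed=0):
    """Shuffle deterministically and split into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

A fixed seed keeps the split reproducible, which matters when annotators and ML engineers need to compare results on the same test set.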
User Roles
Medical Expert
- Annotate clinical concepts
- Validate terminology selection
- Review automated annotations
Content Curator
- Manage annotation projects
- Assign tasks
- Monitor progress
ML Engineer
- Export training data
- Validate dataset quality
- Integrate with NLP pipelines
Integration
Connects to:
- FHIR Server for ePI retrieval
- Terminology services (SNOMED-CT, ICPC-2)
- Preprocessor validation
- NLP Services training pipelines
- SM Tool for Supporting Material
- Keycloak for access control
Quality Assurance
Features:
- Inter-annotator agreement metrics (Cohen's Kappa)
- Annotation guidelines enforcement
- Consistency checks
- Expert review workflow
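For reference, Cohen's Kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two equal-length label sequences, for example per-token tags from two annotators. The function name is hypothetical, and returning 1.0 when both annotators use a single identical label throughout is a convention chosen here to avoid division by zero.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

With annotator A labelling four tokens as `X, X, O, O` and annotator B as `X, O, O, O`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.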
Related Concepts
- ePI - Annotated content
- p(ePI) - Output format
- Standard Terminologies - Applied codes
- Preprocessor - Output validated by annotations; trained on exported data
- NLP Services - Training data consumer
- SM Tool - Related tagging tool