Role Overview
We are seeking an experienced Data Engineer to join our growing team. In this role, you will design, build, and maintain our data infrastructure – from robust data pipelines and warehousing solutions to data validation and standardization processes. You will work closely with multidisciplinary teams (Data Science, R&D, Clinical, Product) to ensure accurate, secure, and high-quality data flows that power both real-time production alerts and in-depth clinical research.
Key Responsibilities
Data Warehousing & Architecture
- Design and maintain scalable data warehouse/data lake solutions for both production and research environments.
- Set up datasets that make it easy to compare raw data, production server data, and research environment data.
- Enable deep-dive data exploration for clinicians, data scientists, and researchers.
ETL Pipelines
- Develop, optimize, and monitor ETL workflows to ingest healthcare data from multiple sources (HL7, FHIR, custom APIs).
- Implement robust data validation processes (e.g., schema checks, quality checks) at each stage of the pipeline.
- Troubleshoot data ingestion and mapping errors, ensuring reliable flows for both real-time and batch processes.
Data Standardization & Mapping
- Build and maintain mappers to standardize clinical codes for drugs (ATC), lab tests (LOINC), diagnoses (ICD), vitals, units, and frequencies.
- Collaborate with clinical experts and product teams to ensure correct mapping of care areas, dosage forms, and other healthcare specifics.
Data Validation & Deployment
- Oversee data validation as part of the deployment process, comparing new data feeds or rule updates to historical baselines.
- Implement real-time validation and anomaly detection techniques (e.g., out-of-range vitals, unknown drug codes) to ensure data integrity.
Collaboration & Research Support
- Work with Data Scientists and Researchers to support the creation of statistical rules, integration of knowledge bases (Medi-Span), and iterative development of medication alerts.
- Facilitate a smooth handoff from the research environment to production, ensuring minimal disruption and maximum accuracy.
Qualifications
Education:
Bachelor’s or Master’s in Computer Science, Software Engineering, Information Systems, or a related field.
Experience:
3+ years in data engineering or a related role (healthcare/clinical analytics experience is a plus).
Technical Skills:
- Python programming skills are required.
- Proficiency in SQL, the pandas library, and Scala and/or Java is an advantage.
- Hands-on experience with ETL frameworks is an advantage.
- Familiarity with cloud-based data warehousing platforms is an advantage.
Data Modeling:
Understanding of data modeling practices.
Version Control & DevOps:
Knowledge of Git, CI/CD pipelines, and containerization (Docker) is an advantage.
Soft Skills:
Excellent problem-solving abilities, communication skills, and a collaborative mindset.