Data Engineer | MedAware

Role Overview

We are seeking an experienced Data Engineer to join our growing team. In this role, you will design, build, and maintain our data infrastructure – from robust data pipelines and warehousing solutions to data validation and standardization processes. You will work closely with multidisciplinary teams (Data Science, R&D, Clinical, Product) to ensure accurate, secure, and high-quality data flows that power both real-time production alerts and in-depth clinical research.

Key Responsibilities

Data Warehousing & Architecture

Design and maintain scalable data warehouse/data lake solutions for both production and research environments.
Set up datasets that make raw data, production server data, and research environment data easy to compare.
Enable deep-dive data exploration for clinicians, data scientists, and researchers.

ETL Pipelines

Develop, optimize, and monitor ETL workflows to ingest healthcare data from multiple sources (HL7, FHIR, custom APIs).
Implement robust data validation processes (e.g., schema checks, quality checks) at each stage of the pipeline.
Troubleshoot data ingestion or mapping errors, ensuring reliable flows for both real-time and batch processes.

Data Standardization & Mapping

Build and maintain mappers to standardize clinical codes for drugs (ATC), lab tests (LOINC), diagnoses (ICD), vitals, units, and frequencies.
Collaborate with clinical experts and product teams to ensure correct mapping of care areas, dosage forms, and other healthcare specifics.

Data Validation & Deployment

Oversee data validation as part of the deployment process, comparing new data feeds or rule updates to historical baselines.
Implement real-time validation and anomaly detection techniques (e.g., out-of-range vitals, unknown drug codes) to ensure data integrity.

Collaboration & Research Support

Work with Data Scientists and Researchers to support the creation of statistical rules, integration of knowledge bases (Medi-Span), and iterative development of medication alerts.
Facilitate a smooth handoff from the research environment to production, ensuring minimal disruption and maximum accuracy.

Qualifications

Education:

Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Systems, or a related field.

Experience:

3+ years in data engineering or a related role (healthcare/clinical analytics experience is a plus).

Technical Skills:

Python programming skills are required.
Proficiency in SQL, the pandas library, and Scala and/or Java is an advantage.
Hands-on experience with ETL frameworks is an advantage.
Familiarity with cloud-based platforms for data warehousing is an advantage.

Data Modeling:

Understanding of data modeling practices.

Version Control & DevOps:

Knowledge of Git, CI/CD pipelines, and containerization (Docker) is an advantage.

Soft Skills:

Excellent problem-solving abilities, communication skills, and a collaborative mindset.