Harshitha Data Engineer
Email: *****************@*****.*** | Mobile: +1 31526275 | LinkedIn
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 2+ years of experience developing and optimizing data pipelines across IT and healthcare domains. Proficient in Python, SQL, PySpark, Databricks, AWS, and Azure, with expertise in building scalable ETL workflows and production-ready data solutions. Skilled at collaborating with cross-functional teams to streamline architecture, enhance data quality, and deliver actionable insights that support business decisions.
TECHNICAL SKILLS
• Data Engineering: ETL/ELT Development, Data Modeling, Data Pipelines, Schema Evolution, Data Lakehouse Architecture, Data Quality & Governance
• Programming & Scripting: Python, SQL, PySpark, Bash, JSON, YAML
• Cloud Platforms: Azure (ADF, Synapse Analytics, ADLS Gen2, Key Vault, Azure Functions), AWS (S3, Redshift, Glue, EMR, Lambda)
• Data Tools & Frameworks: Databricks, Apache Airflow, Kafka, dbt, Delta Lake, Apache Spark, Hive, Hudi, PostgreSQL, SSIS
• DevOps & Automation: Git, GitHub Actions, CI/CD, Terraform (IaC), Jenkins, Docker (Basic), CloudWatch, Azure Monitor
• Visualization & BI: Power BI, Tableau (Basic), Looker (Beginner)
• Methodologies & Standards: Agile/Scrum, DataOps, DevOps, SDLC, HIPAA Compliance
PROFESSIONAL EXPERIENCE
Harmonecare Detroit, MI
Data Engineer Oct 2024 – Present
• Developed and maintained ETL/ELT pipelines using PySpark and Apache Airflow to ingest, process, and transform healthcare data from diverse sources, improving data accuracy and analytics readiness while reducing reporting timelines.
• Integrated HL7/FHIR clinical data into Amazon S3 using Databricks notebooks, applying JSON-based schema definitions to ensure interoperability between EMR systems and downstream applications.
• Designed a Lakehouse architecture leveraging Delta Lake and Amazon Redshift to centralize storage, support transformations, and provide a HIPAA-compliant environment for analytics and regulatory use cases.
• Developed reusable Python scripts for data validation, profiling, and anomaly detection, helping the team automate quality checks and reduce manual testing efforts across pipelines.
• Built secure data ingestion pipelines from on-prem SQL Server to Amazon S3, using parameterized workflows and governance practices to ensure consistency, reusability, and compliance.
• Implemented data partitioning strategies and the Parquet file format in S3 and Delta Lake, reducing storage costs and improving query performance for large analytical workloads.
• Collaborated with data science teams to preprocess and curate EHR datasets, exposing model-ready features that supported clinical risk modeling and predictive analytics initiatives.
• Set up monitoring and alerting frameworks using AWS CloudWatch and Lambda, providing visibility into pipeline health and enabling faster resolution of failures.
• Automated handling of schema drift and source changes by developing metadata-driven PySpark transformations, reducing downtime and manual intervention during updates.
• Documented data flows, governance policies, and transformation logic in Confluence to maintain transparency, support audit readiness, and align with HIPAA and SDLC standards.
Kellton Hyderabad, India
Data Engineer Nov 2021 – Jul 2023
• Gained hands-on experience with Databricks and Delta Lake to process and organize datasets into bronze, silver, and gold layers, supporting reporting and analytics requirements.
• Assisted in creating secure data ingestion workflows using Azure Data Factory (ADF) and Azure Key Vault to move data from on-premises systems into Azure Data Lake Storage Gen2 (ADLS Gen2).
• Contributed to the design and maintenance of ETL/ELT pipelines using PySpark, ADF, and Apache Airflow, helping the team ingest and transform structured and semi-structured data.
• Wrote basic Python scripts and simple JSON configurations to automate ingestion and validation tasks, reducing repetitive manual work for the team.
• Supported data modeling and transformation activities in Azure Synapse Analytics, preparing datasets for dashboards and downstream applications.
• Helped manage schema changes by assisting in building metadata-driven pipelines in ADF, reducing manual updates when source formats varied.
• Monitored daily pipeline runs with Azure Monitor and basic alerting rules, ensuring smooth data loads and quick issue resolution.
• Collaborated with data science and BI teams by preparing cleaned and curated datasets in PostgreSQL for reporting and analysis.
• Contributed to CI/CD practices by learning Git-based version control on GitHub and helping deploy ADF pipelines and Databricks notebooks under team guidance.
• Documented data flows, pipeline steps, and governance practices in Confluence to support audits and maintain project transparency.
EDUCATION
Master's in Applied Data Science – Clarkson University