ASHRITHA NARRA
Mail: ************@*****.*** Mobile: +1-669-***-**** LinkedIn
Professional Summary
Senior Data Engineer with hands-on experience building and maintaining scalable data pipelines and cloud-based solutions on AWS and Azure. Skilled in Python, PySpark, Apache Spark, Kafka, and Airflow, with a solid grasp of ETL processes, data modeling, and performance tuning for large datasets. Experienced in managing both structured and unstructured data while ensuring data quality, governance, and compliance with standards like GDPR and SOC 2. Enjoys collaborating with cross-functional teams, automating workflows with CI/CD and Kubernetes, and delivering reliable, efficient, and secure data solutions. Passionate about turning complex data into actionable insights and improving systems that help teams make smarter decisions.
Technical Skills
•Programming & Scripting: Python, Java, SQL, PySpark, Shell Scripting
•Big Data & Processing: Apache Spark, Spark Streaming, Hadoop, Kafka, AWS Glue, Azure Databricks
•Data Modeling & Storage: MySQL, PostgreSQL, Oracle, Amazon RDS, Azure SQL Database, DynamoDB, Parquet, JSON, CSV, Log Files, HL7, FHIR
•Data Engineering & Pipelines: ETL/ELT Development, Real-time Data Streaming, Metadata-driven Pipelines, Schema Evolution, Data Lineage, Data Quality & Validation, Error Handling, Partitioning & Bucketing, File Compaction
•Cloud Platforms: AWS (S3, Glue, EMR, Lambda, ECS, EKS, SQS, SNS, CloudWatch, DynamoDB, RDS, KMS); Azure (Databricks, Data Factory, Key Vault, Azure Monitor, Azure SQL Database, Azure Purview)
•Workflow Orchestration: Apache Airflow, Azure Data Factory, Jenkins, GitHub Actions, AWS CodePipeline
•DevOps & CI/CD: Docker, Kubernetes, Jenkins, GitHub Actions, Azure DevOps, CI/CD Automation
•Security & Compliance: OAuth2, JWT, AWS KMS, Azure Key Vault, TLS/SSL, Data Encryption, Access Control, HIPAA, HITRUST, GDPR
•Visualization & Reporting: Power BI, Tableau, Grafana, Kibana, Lucidchart
•Web & API Development: REST APIs, GraphQL, Spring Boot, Swagger/OpenAPI
•Monitoring & Observability: Prometheus, Grafana, CloudWatch, Datadog, PagerDuty
•Collaboration & Methodologies: GitHub, GitLab, Confluence, Agile/Scrum, Code Reviews, Incident Response, Governance Frameworks
Professional Experience
Purevisitx, Austin, TX Dec 2024 – Present
Sr Data Engineer
•Designed and deployed automated ETL workflows using Python, AWS Glue, and PySpark, enabling efficient processing of multi-terabyte enterprise datasets for analytics and reporting.
•Developed scalable ingestion and storage frameworks to integrate structured (CSV, Parquet, RDBMS) and unstructured (JSON, log files) data into Amazon S3, optimizing in Parquet format and reducing storage costs by 25%.
•Tuned SQL queries and large-scale Spark jobs through partitioning, caching, and broadcast joins, reducing ETL runtimes by 30% and improving SLA adherence.
•Implemented robust data validation frameworks in PySpark and AWS Glue to enforce schema standards, apply business rules, and enhance data accuracy by 15%.
•Engineered real-time streaming pipelines using Apache Kafka and Spark Streaming, enabling near real-time analytics and fraud detection dashboards.
•Designed optimized relational data models in MySQL and PostgreSQL, ensuring scalability, normalization, and smooth integration with analytical systems.
•Created reusable, metadata-driven ETL frameworks in PySpark and AWS Glue, reducing development effort and ensuring consistency across multiple data projects.
•Implemented monitoring and alerting solutions with Apache Airflow, Amazon CloudWatch, and AWS Lambda, proactively tracking data pipeline health, SLA compliance, and automated recovery.
•Collaborated with governance and compliance teams to implement frameworks around PII handling, GDPR, HIPAA, and SOC 2, ensuring secure data management.
•Optimized Spark performance using partitioning, bucketing, and file compaction strategies, improving query performance by 40% on billion-row datasets.
•Enhanced DevOps automation by integrating Jenkins, Docker, and AWS CodePipeline for CI/CD deployment of ETL and data processing pipelines.
•Developed visualization and monitoring dashboards using Tableau, Grafana, and CloudWatch Metrics to improve data observability and stakeholder reporting.
•Documented data lineage, architecture diagrams, and process workflows using Confluence and Lucidchart, improving team onboarding and audit readiness.
•Performed peer code reviews on GitHub/GitLab, ensuring best practices for Spark optimization, Airflow DAG design, and AWS pipeline management.
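The metadata-driven validation pattern described in the bullets above can be illustrated with a minimal plain-Python sketch (dataset names, rule shapes, and record fields here are hypothetical, not the production framework):

```python
# Minimal sketch of metadata-driven validation: each dataset's required
# fields and business rules live in config, and one generic runner applies
# them. Dataset and field names are illustrative.
RULES = {
    "orders": {
        "required": ["order_id", "amount"],
        "checks": [("amount", lambda v: v is not None and v >= 0)],
    },
}

def validate(dataset: str, records: list) -> tuple:
    """Split records into (valid, rejected) per the dataset's rule config."""
    cfg = RULES[dataset]
    valid, rejected = [], []
    for rec in records:
        missing = [f for f in cfg["required"] if f not in rec]
        failed = [f for f, check in cfg["checks"] if not check(rec.get(f))]
        (rejected if missing or failed else valid).append(rec)
    return valid, rejected
```

In the actual pipelines this shape would be expressed over PySpark DataFrames, but the config-driven structure is the same: adding a dataset means adding an entry to the rule metadata, not writing a new job.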
Matrix-IFS, New Jersey May 2024 – Nov 2024
Data Engineer
•Developed and orchestrated data pipelines using Apache Airflow and Azure Data Factory, automating ingestion and transformation of Companies House financial data feeds.
•Built PySpark transformations on Azure Databricks for cleansing, parsing, and enrichment of multi-source datasets, enabling accurate entity resolution across 10M+ financial records.
•Designed normalized and denormalized schemas in Oracle, MySQL, and Azure SQL Database, improving query efficiency by 35% and supporting large-scale compliance and financial reporting.
•Ensured regulatory data integrity by implementing referential integrity checks, audit trails, and data validation rules, meeting FCA and SEC compliance requirements.
•Secured sensitive financial data through OAuth2 authentication, Spring Security, and Azure Key Vault, ensuring encryption in transit and at rest with TLS/SSL.
•Implemented schema evolution strategies in PySpark, handling upstream API changes gracefully and ensuring uninterrupted data flow.
•Created automated testing suites (unit & integration) for ETL pipelines, reducing production data errors by 20% and ensuring data reliability.
•Configured monitoring and alerting with Apache Airflow, Azure Monitor, and Application Insights, improving SLA compliance from 92% to 99% and minimizing pipeline downtime.
•Automated ETL deployments via CI/CD pipelines using Jenkins, GitHub Actions, and Azure DevOps, enabling zero-downtime releases and consistent environment management.
•Authored data lineage documentation and transformation mappings in Confluence and Azure Purview, accelerating audit readiness by 30%.
•Developed interactive dashboards in Power BI and Tableau to visualize financial KPIs, enabling compliance teams to identify anomalies in corporate filings.
•Collaborated with business analysts and compliance officers to translate complex regulatory requirements into robust data engineering solutions integrated within Azure-based risk systems.
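The additive schema-evolution strategy mentioned above can be sketched in plain Python (the target schema and field names are hypothetical): new upstream fields are absorbed rather than breaking the load.

```python
# Minimal sketch of additive schema evolution: records are conformed to a
# target schema, missing fields get defaults, and unknown upstream fields
# are preserved under "extras" instead of failing the pipeline.
TARGET_SCHEMA = {"company_id": None, "name": "", "filed_at": None}

def conform(record: dict) -> dict:
    out = {field: record.get(field, default) for field, default in TARGET_SCHEMA.items()}
    extras = {k: v for k, v in record.items() if k not in TARGET_SCHEMA}
    if extras:
        out["extras"] = extras
    return out
```

In PySpark the same idea is typically handled with options like permissive schema merging; the point of the sketch is the policy, not the API: upstream API changes degrade gracefully instead of halting ingestion.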
Harmonecare, Detroit, MI July 2023 – April 2024
Data Engineer
•Built scalable ingestion pipelines for third-party healthcare APIs (EHR/EMR, HL7, FHIR) using AWS Glue, AWS Lambda, and Amazon S3, centralizing patient and clinical records into an enterprise-wide healthcare data lake.
•Implemented OAuth2 and JWT authentication for REST API sessions, ensuring secure and compliant access to sensitive PHI and healthcare transactions.
•Developed PySpark jobs on Amazon EMR to normalize JSON, HL7, and FHIR responses from multiple healthcare providers, enabling unified data structures for clinical and operational reporting.
•Implemented robust error handling, retry logic, and Amazon SQS/SNS-based dead-letter queues, reducing healthcare API ingestion failures by 40% and improving reliability of patient data pipelines.
•Containerized ingestion pipelines with Docker and deployed them on Kubernetes (EKS), achieving auto-scaling and high availability for critical healthcare data services.
•Established CI/CD pipelines with Jenkins, GitHub Actions, and AWS CodePipeline, enabling automated deployments across Dev, QA, and Prod environments in compliance with HIPAA change management.
•Defined data reconciliation and validation rules for patient encounters, lab results, and billing records using AWS Glue DataBrew and PySpark, improving accuracy of daily clinical and financial reconciliation reports.
•Authored runbooks and RCA documentation in Confluence, cutting healthcare incident resolution time by 25% for operational support teams.
•Enhanced observability with Prometheus, Grafana, and Amazon CloudWatch dashboards, providing real-time visibility into API health, ingestion latency, and compliance monitoring.
•Implemented HIPAA- and HITRUST-compliant encryption and tokenization using AWS KMS and TLS/SSL, ensuring end-to-end protection of sensitive healthcare datasets.
•Partnered with QA and compliance teams to integrate automated testing frameworks for healthcare pipelines, reducing regression issues and audit findings.
•Collaborated with cloud engineers to optimize EKS workloads and S3 storage tiers, reducing healthcare cloud infrastructure costs by 20%.
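The retry-with-dead-letter pattern used for the healthcare API ingestion above can be sketched in plain Python (the handler and message shapes are illustrative; in production the dead-letter sink was Amazon SQS/SNS):

```python
# Minimal sketch of retry logic with a dead-letter queue: each message gets
# a bounded number of attempts, and messages that exhaust their retries are
# captured for later inspection instead of being silently dropped.
def process_with_retries(messages, handler, max_attempts=3):
    processed, dead_letter = [], []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(msg))
                break
            except Exception:
                if attempt == max_attempts:
                    dead_letter.append(msg)
    return processed, dead_letter
```

Bounding attempts keeps a single bad record (for example, a malformed HL7 segment) from blocking the rest of the batch, while the dead-letter list preserves it for reprocessing after a fix.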
Amazon Web Services (AWS) July 2022 – June 2023
Full Stack Software Developer
•Learned to design and develop web applications using Angular, Java (Spring Boot), and REST APIs, building responsive interfaces for modern browsers and devices.
•Gained hands-on experience building RESTful and GraphQL APIs with Spring Boot, integrating them with Amazon RDS (PostgreSQL/MySQL) and Amazon DynamoDB.
•Applied input validation and server-side schema checks using Spring Validation and Hibernate Validators to ensure data correctness and reduce errors.
•Created observability dashboards with Amazon CloudWatch, AWS X-Ray, and Elastic Stack, monitoring API performance and latency metrics.
•Implemented basic authentication and authorization using OAuth2, JWT, and Amazon Cognito, and secured data using AWS KMS and TLS.
•Explored backend optimization techniques such as service dependency management, query tuning, and caching with Amazon ElastiCache (Redis).
•Set up automated alerts and notifications using Amazon CloudWatch Alarms, Prometheus, and PagerDuty, gaining exposure to incident management.
•Collaborated with peers and mentors to deploy applications on AWS services including ECS, EKS, Lambda, S3, SQS, and SNS, learning about cloud scalability and availability.
•Assisted in integrating CI/CD pipelines using AWS CodePipeline, GitHub Actions, and Docker, learning to automate deployment processes.
•Documented technical details and API contracts using Confluence and Swagger/OpenAPI, improving understanding of team collaboration and onboarding practices.
•Participated in monitoring and troubleshooting using CloudWatch Logs, Kibana, and Datadog, learning how to identify and resolve issues in web applications.
•Followed security and compliance guidelines with mentorship from senior engineers, gaining awareness of SOC 2, ISO 27001, and GDPR requirements.
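The declarative server-side validation pattern described above (Spring Validation / Hibernate Validators in that role) can be sketched in Python for illustration; the field names and constraints here are hypothetical:

```python
# Minimal sketch of declarative payload validation: constraints are declared
# per field and checked before the payload reaches business logic, returning
# all errors at once rather than failing on the first.
import re

CONSTRAINTS = {
    "email": lambda v: isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 150,
}

def validate_payload(payload: dict) -> list:
    """Return a list of validation error messages (empty if payload is valid)."""
    errors = []
    for field, check in CONSTRAINTS.items():
        if field not in payload:
            errors.append(f"{field}: missing")
        elif not check(payload[field]):
            errors.append(f"{field}: invalid")
    return errors
```

Collecting every error in one pass mirrors how bean-validation frameworks report constraint violations, which keeps API error responses useful to the caller.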
Education
Master of Data Science, University of North Texas – Denton, TX, USA May 2022
Bachelor of Computer Science, RMD Engineering College – Chennai, TN, India April 2020