Reethika Reddy Lachagari
DATA ENGINEER
816-***-**** | ****************@*****.*** | Austin, TX | LinkedIn | Portfolio

Data Engineer with 3+ years of experience executing complex data modeling, data warehousing, and CI/CD pipeline strategies across cloud-native ecosystems. Proven success in deploying scalable, real-time data pipelines using PySpark, Delta Lake, and Airflow. Adept at delivering actionable insights through BI platforms while enforcing robust data governance across AWS, Azure, and GCP.

EDUCATION
Master of Science in Computer Science
Campbellsville University, Louisville, KY, USA
Jan 2023 – Dec 2024
Bachelor of Science
Osmania University, Hyderabad, Telangana, India
Aug 2018 – Jul 2021
EXPERIENCE
Northern Trust, USA Nov 2024 – Present
Data Engineer
• Orchestrated high-volume ETL workflows using Azure Data Factory and Airflow, enabling a 30% improvement in data accuracy across 2M+ daily records.
• Transformed 1TB+ datasets via PySpark and Spark SQL to drive a 60% boost in processing throughput.
• Automated 150+ workflows using Python, AWS Lambda, and Jenkins as part of a robust CI/CD pipeline.
• Modeled NoSQL schema in MongoDB to optimize semi-structured data access patterns.
• Delivered KPI-driven dashboards with Power BI and Tableau for real-time business monitoring.
• Tuned Redshift and Glue queries on AWS to improve response times by 67% for 10M+ records.
• Unified batch and streaming data in Azure Data Lake using Delta Lake architecture.
• Spearheaded CI/CD integration via Jenkins and Git, cutting deployment issues by 40%.
• Codified cloud infrastructure using Terraform to enforce scalable, consistent provisioning.
• Championed Agile delivery using Jira and centralized documentation via Confluence.

CitiusTech, India Jan 2020 – Jul 2022
Data Engineer
• Engineered ingestion frameworks in Databricks and Apache NiFi for 500GB+ diverse healthcare data, aligning with data modeling best practices.
• Analyzed over 1TB of patient claims using PySpark to enable targeted cohort segmentation.
• Streamlined real-time data ingestion from IoT devices via Apache Kafka, slashing latency bottlenecks.
• Implemented Delta Lake-based Lakehouse model to ensure ACID-compliant and scalable storage.
• Developed reusable LookML blocks for 30+ Looker dashboards to accelerate self-service BI adoption.
• Consolidated 20+ data sources into live analytics via AWS QuickSight.
• Supported distributed data warehousing using Cassandra and BigQuery across 50M+ records.
• Lowered compute costs by optimizing Hive batch processing with advanced partition strategies.
• Scaled CI/CD-enabled ETL workflows using Jenkins and Git across 40+ deployments.
• Collaborated with cross-functional teams via Jira and drove documentation standardization using Confluence.

SKILLS
• Languages & Scripting: Python, SQL, PySpark, Spark SQL, LookML, Bash
• Big Data & ETL: Apache Spark, Hadoop, Databricks, Apache Airflow, Azure Data Factory, Apache NiFi, AWS Glue, Apache Kafka, Delta Lake
• Cloud Platforms: AWS (S3, EC2, Lambda, Glue, Redshift, Athena, CloudWatch, EMR), Azure (Synapse, Data Lake), GCP (BigQuery), Lakehouse Architecture
• Databases: MySQL, SQL Server, MongoDB, Cassandra, Snowflake, AWS Redshift
• BI & Analytics: Power BI (DAX), Tableau, Looker, AWS QuickSight, Jupyter Notebooks
• DevOps & Automation: Jenkins, Git, Docker, Jira, Confluence, Terraform
• Methodologies: Agile, Scrum
• Soft Skills: Communication, Collaboration, Stakeholder Management
• Governance & Orchestration: Data Catalog (Glue Data Catalog), Data Governance, Airflow, NiFi

PROJECTS
Real-Time Retail Analytics Platform
• Engineered real-time ingestion pipelines using AWS Glue, Lambda, and S3 to handle over 1M transactions daily.
• Developed a Redshift-powered data warehouse model for real-time Power BI dashboards with 25+ KPIs.
• Reduced reporting latency to 10 minutes via streaming architecture and optimized data modeling techniques.
• Integrated Jenkins/Git for automated CI/CD deployment, enhancing release reliability by 40%.

Healthcare Claims ETL Pipeline
• Architected scalable PySpark ETL pipelines in Azure Databricks to validate and transform 500K+ insurance claims weekly.
• Delivered analytics-ready data to Snowflake warehouse with quality assurance logic that reduced errors by 45%.
• Created 15+ interactive dashboards using LookML in Looker for clinical and financial analysis.
• Accelerated sprint velocity by 20% through Agile sprint planning and Jira-based workflow optimization.

CERTIFICATIONS
• HackerRank - Python & SQL Certifications
• Microsoft Certified - Fabric Data Engineer Associate
• AWS Certified - Cloud Practitioner