Harshith Chilukuri
DATA ENGINEER
Location: TX | Email: **************@*****.*** | Phone: 469-***-**** | LinkedIn
PROFESSIONAL SUMMARY
Data Engineer with 4+ years of experience designing, building, and optimizing large-scale data pipelines, real-time streaming systems, and cloud-native architectures across AWS, Azure, and GCP.
Proficient in developing end-to-end ETL workflows using Spark, Kafka, Airflow, and SQL, including both batch and real-time streaming pipelines, supporting large-scale, high-throughput data platforms.
Skilled in programming with Python, Scala, SQL, and Java, leveraging automation to improve data processing efficiency and reduce latency in mission-critical environments.
Strong background in data quality management, governance, and regulatory compliance (SOX, Basel III, CCAR, GDPR) across financial services, e-commerce, and education industries.
Experienced in enabling self-service analytics by building APIs, integrations, and BI solutions, including Power BI, Tableau, and QuickSight dashboards for business stakeholders.
TECHNICAL SKILLS
Languages & Frameworks: Python, PySpark, SQL, Spark-SQL, PL/SQL
Big Data & Cloud: AWS (S3, Glue, Redshift, EMR, Athena, IAM), Azure (ADF, Synapse, Stream Analytics), Databricks, Kafka, Hadoop
Data Warehousing: Snowflake, Redshift, Hive, Azure SQL DW, PostgreSQL, MySQL, Oracle
Orchestration & CI/CD: Airflow, Jenkins, Terraform, Git, GitLab, Bitbucket
Data Governance & Quality: Apache Atlas, Collibra, Informatica DQ, Talend DQ
Visualization: Power BI, Tableau, Looker
Python Libraries: Pandas, NumPy, SQLAlchemy, PyTest
PROFESSIONAL EXPERIENCE
BNY MELLON TX, USA
Data Engineer Feb 2024 – Present
Engineered and automated ETL pipelines with AWS Glue, S3, Kinesis, and PySpark to process 2TB+ of daily transaction data, enabling real-time fraud detection and 30% faster risk analysis.
Optimized Snowflake warehouse structures, reducing query execution time by 25% during critical trading periods.
Built ML pipelines with Python, R, TensorFlow, PyTorch, and Pandas, improving forecasting accuracy by 35%, and deployed scalable models within Java-based real-time ETL workflows.
Led migration of 50GB+ daily transactional data from on-prem systems to AWS Redshift, enhancing analytics and regulatory reporting with Athena.
Deployed Apache Airflow for ETL orchestration, reducing pipeline failures by 30% and ensuring SLA compliance.
Enforced data governance and regulatory compliance (SOX, Basel III, CCAR, and GDPR) across Snowflake and SQL environments.
Collaborated with product managers, data scientists, and engineers to ensure ETL pipelines support AI/ML workflows and business analytics at scale.
KPMG India
Data Engineer Jul 2020 – Jul 2022
Automated data warehouse operations and ETL workflows using Apache Spark, SparkSQL, AWS S3, and Hive, improving reliability and accessibility.
Designed and maintained AWS infrastructure including EC2, S3, IAM, and Elastic File System, ensuring scalability and data security.
Transformed and migrated large datasets using AWS Glue and EMR, integrating structured and semi-structured data into the enterprise data lake.
Implemented CI/CD pipelines with Jenkins, Terraform, and AWS, enabling efficient integration and deployment cycles.
Created and optimized SQL stored procedures, functions, and triggers across Redshift and relational databases to improve performance.
Utilized AWS Step Functions, Data Pipeline, and Glue Workflows for orchestration, cataloging data with Crawlers, and automating ETL scheduling.
Cipla India
Data Analyst Sep 2019 – Jun 2020
Developed Python-based ETL workflows using AWS Glue to ingest and transform datasets across multiple business domains for analytics reporting.
Built Power BI dashboards to visualize mobility and operational KPIs, improving data-driven decision-making.
Designed and validated predictive models using SAS, R, and Scikit-learn to identify high-risk patients and forecast claim denials.
Led data cleaning, validation, and standardization to maintain high-quality and reliable datasets across analytics pipelines.
Supported data governance initiatives by implementing access controls and metadata documentation to ensure compliance.
PROJECTS
Financial Data Lake Integration for Regulatory Analytics
Integrated audit and financial data from 6 departments into Delta Lake on AWS S3, enabling traceable lineage and time-travel queries.
Modeled star schemas in ER Studio and automated data refreshes with Airflow for real-time dashboards.
Metadata Catalog and Validation Automation
Built a metadata discovery framework with Informatica EDC for lineage tracking and impact analysis across 10+ enterprise data systems.
Automated schema validations with PL/SQL and Python, reducing QA dependency and speeding up data delivery.
EDUCATION
University of North Texas, Denton, TX | Aug 2020 – May 2023
Master's in Advanced Data Analytics