Harshith Chilukuri
DATA ENGINEER
Location: TX | Email: **************@*****.*** | Phone: 469-***-**** | LinkedIn
PROFESSIONAL SUMMARY
Data Engineer with 4+ years of experience designing, building, and optimizing large-scale data pipelines, real-time streaming systems, and cloud-native architectures across AWS, Azure, and GCP.
Proficient in developing end-to-end ETL workflows using Spark, Kafka, Airflow, and SQL, including both batch and real-time streaming pipelines, supporting large-scale, high-throughput data platforms.
Skilled in programming with Python, Scala, SQL, and Java, leveraging automation to improve data processing efficiency and reduce latency in mission-critical environments.
Strong background in data quality management, governance, and regulatory compliance (SOX, Basel III, CCAR, GDPR) across financial services, e-commerce, and education industries.
Experienced in enabling self-service analytics by building APIs, integrations, and BI solutions, including Power BI, Tableau, and QuickSight dashboards for business stakeholders.
TECHNICAL SKILLS
Languages & Frameworks: Python, PySpark, SQL, Spark-SQL, PL/SQL
Big Data & Cloud: AWS (S3, Glue, Redshift, EMR, Athena, IAM), Azure (ADF, Synapse, Stream Analytics), Databricks, Kafka, Hadoop
Data Warehousing: Snowflake, Redshift, Hive, Azure SQL DW, PostgreSQL, MySQL, Oracle
Orchestration & CI/CD: Airflow, Jenkins, Terraform, Git, GitLab, Bitbucket
Data Governance & Quality: Apache Atlas, Collibra, Informatica DQ, Talend DQ
Visualization: Power BI, Tableau, Looker
Python Libraries: Pandas, NumPy, SQLAlchemy, PyTest
PROFESSIONAL EXPERIENCE
BNY MELLON TX, USA
Data Engineer Feb 2024 – Present
Engineered and automated ETL pipelines with AWS Glue, S3, Kinesis, and PySpark to process 2TB+ of daily transaction data, enabling real-time fraud detection and 30% faster risk analysis.
Optimized Snowflake warehouse structures, reducing query execution time by 25% during critical trading periods.
Built ML pipelines with Python, R, TensorFlow, PyTorch, and Pandas, improving forecasting accuracy by 35%, and deployed scalable models within Java-based real-time ETL workflows.
Led migration of 50GB+ daily transactional data from on-prem systems to AWS Redshift, enhancing analytics and regulatory reporting with Athena.
Deployed Apache Airflow for ETL orchestration, reducing pipeline failures by 30% and ensuring SLA compliance.
Enforced data governance and regulatory compliance (SOX, Basel III, CCAR, and GDPR) across Snowflake and SQL environments.
Collaborated with product managers, data scientists, and engineers to ensure ETL pipelines support AI/ML workflows and business analytics at scale.
KPMG India
Data Engineer Jul 2020 – Jul 2022
Automated data warehouse operations and ETL workflows using Apache Spark, SparkSQL, AWS S3, and Hive, improving reliability and accessibility.
Designed and maintained AWS infrastructure including EC2, S3, IAM, and Elastic File System, ensuring scalability and data security.
Transformed and migrated large datasets using AWS Glue and EMR, integrating structured and semi-structured data into the enterprise data lake.
Implemented CI/CD pipelines with Jenkins, Terraform, and AWS, enabling efficient integration and deployment cycles.
Created and optimized SQL stored procedures, functions, and triggers across Redshift and relational databases to improve performance.
Utilized AWS Step Functions, Data Pipeline, and Glue Workflows for orchestration, cataloging data with Crawlers, and automating ETL scheduling.
Cipla India
Data Analyst Sep 2019 – Jun 2020
Developed Python-based ETL workflows using AWS Glue to ingest and transform datasets across multiple business domains for analytics reporting.
Built Power BI dashboards to visualize mobility and operational KPIs, improving data-driven decision-making.
Designed and validated predictive models using SAS, R, and Scikit-learn to identify high-risk patients and forecast claim denials.
Led data cleaning, validation, and standardization to maintain high-quality and reliable datasets across analytics pipelines.
Supported data governance initiatives by implementing access controls and metadata documentation to ensure compliance.
PROJECTS
Financial Data Lake Integration for Regulatory Analytics
Integrated audit and financial data from 6 departments into Delta Lake on AWS S3, enabling traceable lineage and time-travel queries.
Modeled star schemas in ER Studio and automated data refreshes with Airflow for real-time dashboards.
Metadata Catalog and Validation Automation
Built a metadata discovery framework with Informatica EDC for lineage tracking and impact analysis across 10+ enterprise data systems.
Automated schema validations with PL/SQL and Python, reducing QA dependency and speeding up data delivery.
EDUCATION
University of North Texas, Denton, TX | Aug 2020 – May 2023
Master's in Advanced Data Analytics