HARSHA VARDHAN YELURI
DATA ENGINEER
+1-313-***-**** **************@*****.***
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 4+ years of experience designing, building, and optimizing scalable data infrastructure, cloud-native ETL workflows, and real-time streaming applications. Skilled in Python, SQL, Apache Spark, Kafka, and cloud platforms including AWS, Azure, and GCP. Adept at turning raw, messy data into reliable, structured datasets that power BI tools and ML models. Proven track record of reducing processing time, automating workflows, and supporting cross-functional teams through secure, high-availability pipelines.
TECHNICAL SKILLS
Programming & Scripting: Python, SQL, Scala, Java, Bash, R
Big Data & Processing: Apache Spark, Hadoop, Hive, Kafka, Flink, Dask, Presto, Delta Lake
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, PostgreSQL, MySQL, Oracle, MongoDB, Cassandra
ETL & Orchestration: Apache Airflow, dbt, AWS Glue, Informatica, Talend, Alteryx, Azure Data Factory, NiFi
Cloud & DevOps: AWS (S3, Lambda, Glue, EMR), Azure (ADF, Synapse), GCP (BigQuery, Dataflow), Docker, Kubernetes, Terraform, Jenkins, GitLab CI/CD
Data Quality & Governance: Great Expectations, Monte Carlo, Collibra, Alation, Apache Atlas, Deequ, DataDog, Evidently AI
Visualization & Collaboration: Tableau, Power BI, Looker, Superset, Jira, Confluence, Trello, Slack, Microsoft Teams, Notion
PROFESSIONAL EXPERIENCE
INTEL CORPORATION
Data Engineer May 2024 – Present
• Developed a Customer Behavior Insights pipeline using Kafka and Spark Streaming to ingest and enrich clickstream and CRM logs, driving behavioral analytics for 15M+ monthly active users.
• Built a Data Validation module in Python integrated with Airflow DAGs to monitor and alert on schema drifts, nulls, and data anomalies, reducing undetected issues by 70%.
• Constructed and optimized a Redshift Data Mart for forecasting sales performance by region and product, cutting Tableau dashboard load time by 40%.
• Created an ML Feature Store Loader to feed curated behavioral signals into Amazon SageMaker pipelines for predictive analytics.
• Automated SLA tracking for batch pipelines through a custom Airflow plugin with alerting via Slack and PagerDuty, improving SLA adherence by 60%.
• Enabled deployment of Spark jobs via Docker and Kubernetes, integrated into GitLab CI/CD for zero-downtime releases and rollback control.
• Implemented enterprise-wide PII masking policy using dynamic data masking and encryption at rest, ensuring HIPAA and GDPR compliance.
• Partnered with finance teams to build ingestion pipelines for SAP invoice data, integrating with Snowflake to support financial reporting.
MICROLAND
Data Engineer Sep 2019 – Jul 2022
• Led a Data Warehouse Migration project, moving 1.2 TB from Oracle to AWS Redshift using Glue and reducing storage and query costs by 20% through columnar compression and optimized distribution keys.
• Designed a Real-Time Infra Monitoring Pipeline streaming logs through Kafka, processed in Snowflake, and visualized in Power BI for cloud ops teams.
• Created a Product Analytics Engine with PySpark to compute usage KPIs (DAU, MAU, retention) across mobile and web platforms with daily loads of 2+ TB.
• Built a Customer Support Analytics pipeline that classified ticket types using NLP models in Python, improving root cause analysis in the support dashboard.
• Developed a Reusable ETL Framework using Talend for ingesting partner data feeds, with schema validation, data profiling, and change detection modules.
• Tuned complex SQL workloads in Snowflake using clustering and result cache optimization, reducing refresh times of dashboards by over 25%.
• Managed data archival in S3 using lifecycle rules and Glacier policies, achieving a 30% cost reduction on cold storage.
• Deployed and scheduled Airflow workflows via ECS Fargate for improved resource elasticity and isolation.
TRIGENT SOFTWARE INC.
Data Analyst Intern Jan 2019 – Aug 2019
• Developed and automated 10+ recurring reports in Tableau and Power BI, improving reporting efficiency by 25% for internal stakeholders.
• Conducted exploratory data analysis (EDA) to identify trends, anomalies, and business opportunities, leading to actionable insights for client projects.
• Collaborated with analysts and developers to validate datasets and ensure adherence to data governance and quality standards.
• Utilized Excel (VLOOKUP, PivotTables, Macros) to perform ad-hoc data analysis and generate custom datasets for business units.
• Documented data processes, workflow diagrams, and reporting standards, ensuring smooth handover and continuity for future projects.
• Assisted in migrating datasets to cloud storage on AWS S3, ensuring secure, scalable, and organized data management for analytical use.
EDUCATION
Master of Science in Information Systems
Central Michigan University – Mount Pleasant, MI
Aug 2022 – May 2024
Bachelor of Technology in Computer Science Engineering
Amrita Vishwa Vidyapeetham – Coimbatore, India
Jun 2017 – May 2021
CERTIFICATIONS
• AWS Certified: Data Engineer – Associate
• HTML, CSS, and JavaScript for Web Developers
PROJECT HIGHLIGHT
Real-Time Analytics Pipeline for E-Commerce
• Designed a Kafka + Spark Streaming pipeline for live tracking of user interactions across e-commerce web and mobile platforms.
• Implemented resource-optimized partitioning and schema evolution logic, reducing processing time by 30%.
• Stored enriched data in Redshift and exposed it via Power BI dashboards, enabling near real-time monitoring of product engagement.
• Integrated Lambda and CloudWatch for automated orchestration, error recovery, and performance alerting.