HARIKA PRASANNA BARLA
San Jose, CA +1-203-***-**** *****************@*****.***
SUMMARY
Data Engineer with 6+ years of experience designing and optimizing end-to-end data pipelines, databases, and ETL workflows across cloud and on-prem environments. Skilled in translating business and analytical requirements into performant SQL queries, database models, and automation solutions. Expert in Python, SQL, and Spark with a strong foundation in relational modeling, query optimization, and large-scale data transformations. Experienced in Azure, AWS, and GCP, with working knowledge of Kafka, MongoDB, and trade lifecycle data flows. Adept at building scalable, reliable data systems supporting analytics and reporting for enterprise clients in fast-paced, data-driven environments.
TECHNICAL SKILLS
Databases & Modeling: DB2, PostgreSQL, SQL Server, Sybase, Snowflake, Azure SQL, MongoDB, Redshift, BigQuery
ETL & Orchestration: Azure Data Factory (ADF), Apache Spark, Databricks, Airflow, SSIS, Informatica, dbt
Programming & Automation: Python, SQL, Shell/K-Shell, PowerShell
Big Data & Messaging: Kafka, Hadoop, Delta Lake
Performance Optimization: Query tuning, Indexing, Caching, Partitioning, Database statistics, Archiving
DevOps & CI/CD: Jenkins, Azure DevOps, Docker, Kubernetes, GitHub Actions
Visualization & Reporting: Power BI, Tableau, Looker Studio
Modeling Tools: PowerDesigner, ERwin, Visio
Agile Collaboration: JIRA, Confluence, Scrum Methodology
PROFESSIONAL EXPERIENCE
Cloudera Technologies Inc USA — Data Engineer Jan 2025 – Present
• Designed and developed performant databases and ETL frameworks using ADF, Databricks, and SQL, integrating multi-source data into Azure Synapse and Snowflake.
• Translated complex business requirements into optimized SQL and stored procedures, delivering accurate and timely reporting for operations and analytics teams.
• Built automation scripts in Python for data quality validation, schema reconciliation, and ingestion monitoring, reducing manual oversight by 30%.
• Implemented incremental and CDC patterns for near-real-time processing; fine-tuned pipelines for partitioning, indexing, and parallel execution.
• Collaborated with architects and data modelers to design normalized and denormalized schemas using PowerDesigner, improving query performance by 40%.
• Integrated Kafka streaming data into batch pipelines for downstream analytics and dashboards.
• Followed Agile methodology with sprint planning, code reviews, and CI/CD releases.
ZoomInfo Technologies USA — Data Engineering Intern May 2024 – Nov 2024
• Developed and scheduled ADF pipelines for ingestion and transformation of enterprise data into Azure Synapse and PostgreSQL environments.
• Built parameterized data flows with schema drift support; optimized pipeline concurrency and runtime by 35%.
• Created SQL validation scripts for data completeness and accuracy checks.
• Automated deployment workflows using Azure DevOps and Git, ensuring consistent releases across development and production.
• Enhanced monitoring through custom Python scripts and email alerts, improving SLA adherence by 25%.
Best Buy India — Sr. Data Analyst Jan 2020 – Dec 2022
• Re-engineered ETL jobs from legacy SSIS to dbt + Snowflake, reducing data latency by 50%.
• Designed dimensional data models (star/snowflake) to support financial and product analytics, improving query performance by 65%.
• Created Python automation scripts for metadata-driven ingestion and validation.
• Implemented stored procedures and complex SQL for business reporting, query optimization, and archiving strategies.
• Partnered with BI teams to deliver data-ready layers powering 25+ dashboards in Power BI and Tableau.
Liberty Mutual India — Associate Data Analyst Aug 2017 – Dec 2019
• Built SQL and Python-based ETL for claims and policy data ingestion from flat files and APIs into DB2 and AWS Redshift.
• Designed indexes, partitioning, and query plans to optimize DB2 performance, cutting report runtime by 45%.
• Developed stored procedures and automated batch jobs via Unix shell scripting for data validation and reconciliation.
• Created Tableau dashboards for underwriting KPIs, improving visibility into financial performance.
• Collaborated with cross-functional Agile teams to meet delivery timelines and enhance system reliability.
PROJECT HIGHLIGHTS
• Real-Time Fraud Detection System: Built Kafka + Spark pipeline to detect suspicious activity in real time; reduced fraud losses by 25%.
• Cloud Data Warehouse Migration: Migrated 3TB+ of enterprise data to a cloud data warehouse using Azure Data Factory; enabled 50% faster analytics for 200+ users.
• Trade Lifecycle Data Integration: Built ADF and Databricks pipelines to integrate trading, position, and risk datasets; implemented relational modeling in Synapse with optimized indexing for faster portfolio reporting.
• Consumer & Healthcare Analytics Platform: Integrated survey, social, and healthcare datasets via APIs; built Delta Lake pipelines on Databricks and dashboards in Tableau and Power BI; improved decision-making for 200+ users.
• Enterprise Data Lakehouse Modernization: Built ADF + Databricks + Synapse pipelines implementing medallion architecture, cutting pipeline runtimes by 60% and enabling daily refresh for analytics dashboards.
CERTIFICATIONS
• AWS Certified Solutions Architect – Associate
• Google Data Analytics Professional Certificate
• SnowPro Advanced: Data Engineer
EDUCATION
• Master of Science in Business Analytics — St. Francis College, Brooklyn, NY (Dec 2024)
• B.Tech in Electronics & Communications Engineering — JNTUH, Hyderabad (May 2017)