Data Engineer Power BI

Location:

Karnataka, India

Posted:

October 15, 2025


Resume:

Shambhavi Pandala Data Engineer

+1-913-***-**** *****************@*****.*** LinkedIn

SUMMARY

Data Engineer with 3+ years of experience designing and optimizing scalable ETL and streaming pipelines across Azure and AWS. Experienced in Databricks, PySpark, Airflow, and data modeling, with a strong track record in data governance and compliance in fintech and healthcare. Skilled at resolving complex data quality issues, enabling data-driven decisions, and delivering actionable insights through Power BI and Databricks SQL dashboards. Thrives in Agile environments, collaborating effectively to turn business requirements into robust, high-performance data solutions.

SKILLS

Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Bash

Big Data & Distributed Computing: Apache Spark, Delta Lake, SQL, DLT, Notebooks, Hadoop, Structured Streaming

ETL/ELT & Workflow Orchestration: Apache Airflow, dbt, Azure Data Factory, Delta Live Tables, CDC (Change Data Capture)

Data Warehousing & Modeling: Azure Synapse Analytics, Snowflake, Kimball Dimensional Modeling, Data Vault 2.0

Cloud Platforms & Services: Microsoft Azure (ADF, Synapse, Blob, Purview), AWS (S3, Glue, EMR), Databricks on Azure

Streaming & Messaging: Apache Kafka, Structured Streaming

Data Governance & Security: Unity Catalog, Microsoft Purview, SOC 2, GDPR, HIPAA Compliance, Role-Based Access Control

Data Visualization & Reporting: Power BI, Excel (PivotTables, VLOOKUP), Databricks SQL Dashboards

CI/CD & DevOps: Azure DevOps, Git, GitHub Actions, CI/CD Pipelines, YAML

Version Control & Collaboration: Git, GitHub, Jira, Confluence, Agile (Scrum)

EXPERIENCE

Data Engineer Databricks Jan 2025 – Present Remote, USA

Enabled real-time fraud detection by helping design and deploy streaming data pipelines with Delta Live Tables and Structured Streaming, reducing risk by ensuring that transactions were processed within seconds.

Improved batch processing performance by 30% through modular, scalable ETL workflows in PySpark and SQL that handle diverse datasets from Amazon S3 and Kafka, making data available faster and more reliably.

Eliminated repetitive manual work by automating pipeline orchestration with Apache Airflow and dbt, improving deployment consistency and strengthening data quality.

Contributed to cloud migration initiatives by modernizing legacy ETL processes and assisting in the transition to Databricks on Azure, which simplified infrastructure and improved scalability.

Strengthened compliance and data governance by implementing column-level lineage and fine-grained access controls with Unity Catalog and Microsoft Purview, supporting SOC 2 and GDPR standards.

Improved executive reporting accuracy by 12% through in-depth root-cause analysis in SQL and enhancements to Power BI dashboards, giving leadership more trustworthy insights.

Optimized Databricks workloads by applying Z-ordering, dynamic partitioning, and caching, significantly reducing query execution time and compute usage.

Accelerated delivery of analytics solutions by 20% through close collaboration with data scientists and analysts, translating complex business needs into scalable lakehouse architectures.

Developed dimensional data models (star schemas) in the Databricks Lakehouse, providing a reliable single source of truth for BI and machine learning teams.

Data Engineer Accenture Jun 2021 – Dec 2023 India

Built batch, streaming, and CDC pipelines on Databricks (PySpark, Delta Lake, Structured Streaming) and Azure Data Factory, delivering near real-time analytics for regulatory compliance and population health monitoring.

Reduced clinical dashboard refresh time by 40% through scalable ETL pipelines designed and deployed in Python, Apache Airflow, and Azure Data Factory, improving timely access to healthcare data.

Improved data quality and reliability by implementing anomaly detection frameworks in Python and SQL, resolving recurring upstream issues and enhancing clinical dataset accuracy.

Cut reporting latency by 35% by modeling and transforming healthcare datasets in Azure Synapse with Kimball dimensional design, enabling faster operational and trend analysis.

Contributed to HIPAA/GDPR compliance efforts by implementing metadata lineage tracking and role-based access controls via Azure Purview, supporting governance initiatives.

Developed patient-centric dashboards in Power BI and Databricks SQL, providing clinicians and administrators with actionable insights, reducing manual reporting effort, and improving operational visibility.

Conducted root-cause analysis and implemented monitoring solutions using Power BI, proactively identifying pipeline issues and significantly reducing system downtime to ensure smoother healthcare operations.

Data Analyst Intern Infosys Jan 2021 – May 2021 India

Assisted in collecting, cleaning, and preparing large datasets from multiple business sources for reporting and analysis.

Built interactive dashboards in Power BI and Excel to track key performance metrics and trends.

Conducted exploratory data analysis (EDA) using Python (Pandas, NumPy) and SQL to uncover actionable insights.

Collaborated with senior analysts to document business requirements and deliver ad-hoc reports for client stakeholders.

Automated recurring data validation checks and reporting tasks, improving accuracy and saving manual effort.

Supported the migration of legacy reports to modern BI tools, improving accessibility and reducing manual reporting time.

EDUCATION

Master of Science in Computer Science Jan 2024 – May 2025 University of Central Missouri, USA

CERTIFICATIONS

Google Professional Data Engineer

Microsoft Certified Azure Data Engineer Associate

AWS Certified Data Engineer Associate
