
Data Engineer – Power BI

Location: Hillsboro, OR
Salary: $75,000
Posted: September 10, 2025

Resume:

HARSHA NARASIMHA MURTHY

Location: Dallas, TX | Phone: +1-469-***-**** | Email: ***************@*******.*** | LinkedIn

Professional Summary

Data Engineer with 4+ years of hands-on experience designing and maintaining scalable ETL/ELT pipelines, data lakes, and cloud-based data architectures across industries including telecom, energy, and public transit.

Proficient in Python, SQL, Apache Airflow, Power BI, MySQL, Snowflake, and AWS/Azure ecosystems, with deep expertise in data modeling, pipeline orchestration, and business intelligence solutions.

Built batch and streaming data workflows using AWS Glue, Databricks, and Kafka, enabling near real-time insights and reducing reporting latency by up to 80%.

Strong background in data validation and testing using tools like Great Expectations, ensuring data integrity, schema consistency, and production readiness across ingestion zones.

Delivered multiple end-to-end analytics projects, including a transit reliability dashboard and a GitHub activity lakehouse, leveraging modern data engineering practices and cloud-first design.

Excellent communicator and collaborator with experience working cross-functionally with DevOps, QA, and domain experts, aligning data delivery with strategic business needs.

Experience

Uber, TX

Data Engineer Jan 2025 – Apr 2025

Built ELT pipelines using Python and SQL to process structured and semi-structured data from internal APIs and flat files into Snowflake.
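
A minimal sketch of this kind of load using snowflake-connector-python's pandas helper; the connection parameters, source file, and ORDERS_STG staging table are assumptions, not the production code:

    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    # Read one batch from a flat-file source (hypothetical file name).
    df = pd.read_json("daily_orders.json")
    df["LOAD_TS"] = pd.Timestamp.utcnow()

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="...",  # placeholders
        warehouse="LOAD_WH", database="RAW", schema="ORDERS",
    )
    # Bulk-load the frame into a staging table for downstream transformation;
    # auto_create_table creates it on first run (recent connector versions).
    write_pandas(conn, df, "ORDERS_STG", auto_create_table=True)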

Designed schema models in Snowflake and implemented dbt transformations for dimensional tables supporting business metrics.

Developed batch ingestion workflows in AWS Glue and automated scheduling using Airflow DAGs for daily data loads.
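
A daily batch DAG of this shape might look as follows (Airflow 2.4+ syntax; the Glue job name and region are hypothetical):

    from datetime import datetime, timedelta

    import boto3
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def trigger_glue_job(**_):
        # Kick off the (hypothetical) Glue job that performs the daily load.
        glue = boto3.client("glue", region_name="us-east-1")
        glue.start_job_run(JobName="daily_ingest_job")

    with DAG(
        dag_id="daily_batch_ingest",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",  # one run per day
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ):
        PythonOperator(task_id="trigger_glue_job", python_callable=trigger_glue_job)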

Used Apache Spark on Databricks to clean and transform raw datasets from S3, improving query efficiency for reporting layers.

Configured data validation logic using Great Expectations to enforce schema consistency across ingestion zones.
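
Illustratively, a schema gate in the classic (pre-1.0) Great Expectations pandas API could look like this; the S3 path and column names are assumptions, and API details vary by version:

    import great_expectations as ge
    import pandas as pd

    raw = pd.read_parquet("s3://landing-zone/orders/2025-01-01.parquet")
    batch = ge.from_pandas(raw)

    # Enforce schema consistency before promoting data out of the landing zone.
    batch.expect_column_to_exist("order_id")
    batch.expect_column_values_to_not_be_null("order_id")
    batch.expect_column_values_to_be_of_type("amount", "float64")

    result = batch.validate()
    if not result.success:
        raise ValueError("Validation failed; batch stays in quarantine.")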

Monitored data pipelines using CloudWatch and implemented retry logic with Lambda functions to handle ingestion failures.
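
The retry idea, reduced to a sketch (the event shape, Glue job name, and attempt limit are assumptions):

    import time

    import boto3

    glue = boto3.client("glue")
    MAX_ATTEMPTS = 3

    def handler(event, context):
        # An EventBridge rule forwards the failed batch key (assumed shape).
        batch_key = event["detail"]["batch_key"]
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                glue.start_job_run(JobName="daily_ingest_job",
                                   Arguments={"--batch_key": batch_key})
                return {"status": "resubmitted", "attempts": attempt}
            except Exception:
                if attempt == MAX_ATTEMPTS:
                    raise  # surface the failure back to CloudWatch
                time.sleep(2 ** attempt)  # exponential backoff between retries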

Infosys, India

Data Engineer Dec 2021 – Jul 2023

Built scalable ETL pipelines using Python, SQL, and Azure Data Factory to ingest call detail records (CDRs), billing logs, and CRM feeds into Azure Synapse for downstream analytics.

Developed and deployed PySpark jobs in Databricks to standardize high-volume usage data from multiple vendor systems, improving consistency across KPIs.
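
The standardization pattern, roughly (vendor column names, units, and mount paths are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("standardize_usage").getOrCreate()

    a = spark.read.parquet("/mnt/raw/vendor_a/usage/")
    b = spark.read.parquet("/mnt/raw/vendor_b/usage/")

    # Map each vendor's fields onto one canonical schema and unit (MB).
    a = a.select(F.col("msisdn").alias("subscriber_id"),
                 F.col("mb_used").alias("usage_mb"),
                 F.to_date("event_dt").alias("usage_date"))
    b = b.select(F.col("subscriber").alias("subscriber_id"),
                 (F.col("kb_used") / 1024).alias("usage_mb"),  # KB -> MB
                 F.to_date("date").alias("usage_date"))

    a.unionByName(b).write.mode("overwrite").parquet("/mnt/curated/usage/")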

Designed star and snowflake schemas in Snowflake to support high-concurrency reporting on customer churn, usage trends, and billing anomalies.

Integrated real-time Kafka streams into Azure Data Lake Gen2, implementing checkpointing and watermarking to manage out-of-order events and late-arriving data.
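
A condensed Structured Streaming version of that pattern (broker address, topic, schema, and lake paths are placeholders):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

    spark = SparkSession.builder.appName("cdr_stream").getOrCreate()

    schema = (StructType()
              .add("call_id", StringType())
              .add("event_time", TimestampType())
              .add("duration_sec", DoubleType()))

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "cdr-events")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # The watermark tolerates events arriving up to 15 minutes out of order;
    # anything later is dropped from the windowed aggregate.
    agg = (events.withWatermark("event_time", "15 minutes")
           .groupBy(F.window("event_time", "5 minutes"))
           .agg(F.sum("duration_sec").alias("total_duration")))

    (agg.writeStream.format("delta")
        .option("checkpointLocation", "abfss://lake@acct.dfs.core.windows.net/chk/cdr")
        .outputMode("append")  # checkpointing makes restarts recoverable
        .start("abfss://lake@acct.dfs.core.windows.net/silver/cdr_agg"))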

Implemented Delta Lake architecture with Bronze, Silver, and Gold layers to enforce data lineage and enable reprocessing of failed batches.
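
In outline, the medallion flow (paths, columns, and cleaning rules here are illustrative):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: raw landing data, append-only, retained for lineage and replay.
    spark.read.json("/mnt/landing/billing/") \
         .write.format("delta").mode("append").save("/mnt/bronze/billing")

    # Silver: cleaned and deduplicated; failed batches can be rebuilt from Bronze.
    bronze = spark.read.format("delta").load("/mnt/bronze/billing")
    (bronze.dropDuplicates(["invoice_id"])
           .filter(F.col("amount") > 0)
           .write.format("delta").mode("overwrite").save("/mnt/silver/billing"))

    # Gold: business-level aggregates consumed by reporting.
    silver = spark.read.format("delta").load("/mnt/silver/billing")
    (silver.groupBy("customer_id")
           .agg(F.sum("amount").alias("total_billed"))
           .write.format("delta").mode("overwrite").save("/mnt/gold/billing_summary"))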

Built reusable CI/CD pipelines using Azure DevOps, automating deployment of notebooks, Spark jobs, and infrastructure templates.

Authored modular dbt models for customer segmentation, recharge behavior, and tariff plan mapping, enabling analysts to self-serve insights.

Created operational dashboards in Power BI to monitor pipeline health, row-level latency, and file arrival metrics, reducing downtime across critical loads.

Tuned Spark job configurations and optimized partitioning logic to reduce data transformation latency by over 30%.
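
Representative of the tuning applied; the exact values were workload-specific, so the numbers below are placeholders:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             # Match shuffle parallelism to cluster cores instead of the 200 default.
             .config("spark.sql.shuffle.partitions", "64")
             .config("spark.sql.adaptive.enabled", "true")  # AQE coalesces small/skewed shuffles
             .getOrCreate())

    df = spark.read.parquet("/mnt/silver/usage/")

    # Partition output by the column reports filter on, so queries prune files.
    (df.repartition("usage_date")
       .write.mode("overwrite")
       .partitionBy("usage_date")
       .parquet("/mnt/gold/usage/"))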

Worked cross-functionally with DevOps, QA, and domain SMEs to align data delivery with business requirements and release cycles.

KPMG, India

Data Analyst Jul 2019 – Nov 2021

Collected, cleaned, and transformed large-scale operational data using MySQL to identify inefficiencies and boost production performance.

Developed interactive Power BI dashboards to visualize key metrics, enabling faster data-driven decision-making for plant leadership.

Implemented Apache Airflow pipelines to automate ETL workflows, streamlining data ingestion from ERP and IoT sources.

Collaborated with cross-functional teams to perform root cause analysis on downtime events, reducing machine failure rates by leveraging data mining techniques.

Utilized Python and SQL to perform statistical analysis, trend forecasting, and anomaly detection in smelting and mining operations.
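
For instance, a rolling z-score check of the kind used for anomaly detection (the file, column names, and 3-sigma threshold are illustrative):

    import pandas as pd

    # Hypothetical hourly furnace-temperature readings exported from SCADA/MySQL.
    df = pd.read_csv("smelter_temps.csv", parse_dates=["ts"]).set_index("ts")

    # Flag readings more than 3 sigma from the trailing 24-hour mean.
    window = df["temp_c"].rolling("24h")
    df["zscore"] = (df["temp_c"] - window.mean()) / window.std()
    anomalies = df[df["zscore"].abs() > 3]

    print(anomalies.head())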

Maintained version control of analytics scripts and reports using Git, ensuring smooth collaboration and code reproducibility.

Created consolidated reports combining SCADA and ERP data, enhancing operational visibility and supporting predictive maintenance strategies.

Delivered weekly insights to stakeholders, supporting strategic planning through detailed reports and KPI tracking dashboards.

Technical Skills

Core Analytics: Data Cleaning, Data Wrangling, Trend Analysis, Statistical Analysis, Predictive Modeling, Data Mining, Report Automation

Programming & Scripting: Python (Pandas, NumPy, Matplotlib, Seaborn, TensorFlow, PyTorch), SQL (Joins, CTEs, Subqueries), R, VBA, Shell Scripting

Visualization Tools: Power BI, Tableau, Google Data Studio, Excel (Pivot Tables, VLOOKUP, Power Query)

Databases & Data Management: MySQL, PostgreSQL, Oracle, MongoDB, Google BigQuery, Snowflake, Redshift, SQL, NoSQL, Data Modeling, Data Warehousing

ETL, Integration & Orchestration: Apache Airflow, Talend, SSIS, Azure Data Factory, AWS Glue, Power Query, Apache Kafka, AWS Step Functions

Cloud Platforms: AWS (S3, RDS, Redshift, EC2, VPC, Lambda), Azure (Data Lake, Data Factory, VMs, Functions), GCP (BigQuery)

Model Deployment: Flask, FastAPI, Docker

Testing & Validation: A/B Testing, Cross-Validation, Data Validation, ROC-AUC

Operating Systems & Virtualization: Linux (Ubuntu, CentOS), Windows Server, macOS, VMware, Hyper-V

Networking & Security: TCP/IP, DNS, VPNs, Firewalls, IDS/IPS, Network Troubleshooting

Automation & Configuration Management: Ansible, Puppet, Terraform, Kubernetes, Docker

DevOps & CI/CD Tools: Jenkins, Git, GitLab, JIRA, CI/CD Pipelines

Monitoring & Performance: Nagios, Zabbix, Prometheus, Grafana, ELK Stack

Education

University of Texas at Dallas, TX
M.S. in Information Technology and Management, Aug 2023 – May 2025

M.S. Ramaiah Institute of Technology, India
Bachelor of Technology in Mechanical Engineering, Aug 2017 – Aug 2021

Projects

Public Transit Reliability Dashboard: End-to-End Data Pipeline Mar 2025 - Present

Developed a real-time pipeline using Python, Kafka, Spark, and Airflow to ingest GTFS feeds into AWS Redshift, processing 50K+ daily records for transit reliability insights.
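
The ingestion edge of this pipeline, sketched with kafka-python and the official GTFS-realtime protobuf bindings (the feed URL, broker, and topic are placeholders):

    import requests
    from google.transit import gtfs_realtime_pb2  # gtfs-realtime-bindings package
    from kafka import KafkaProducer               # kafka-python package

    producer = KafkaProducer(bootstrap_servers="broker:9092")

    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(
        requests.get("https://example.org/gtfs-rt/trip_updates").content)

    # Publish one message per trip update; Spark consumes the topic downstream.
    for entity in feed.entity:
        if entity.HasField("trip_update"):
            producer.send("gtfs-trip-updates", entity.SerializeToString())
    producer.flush()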

Enabled Power BI dashboards tracking on-time rates and delays across 120+ bus routes, reducing reporting lag by 80% and improving transit planning accuracy.

GitHub Activity Lakehouse: End-to-End Data Engineering Pipeline Jan 2025 – Mar 2025

Engineered a data lake on AWS S3 using AWS Glue and Airflow, storing and transforming GitHub event logs (~10GB/day) into partitioned Parquet format.

Modeled and loaded curated data into Snowflake, enabling Tableau dashboards for 100+ repo insights on contributor activity, commit velocity, and issue frequency.

Building a Scalable ETL Pipeline for Sales Data Analytics Sep 2024 – Dec 2024

Built an ELT pipeline using SQL, dbt, and Apache Airflow to consolidate multi-source sales data into Snowflake, processing 2M+ records/month with 99.9% reliability.

Automated Power BI reporting, cutting manual data prep by 40 hours/month and improving sales performance insights across 6 regional teams by 30%.


