
Senior Data Engineer - Spark, DW, and BI Platform Lead

Location:
Ottawa, ON, Canada
Posted:
May 06, 2026


Manisha Nampally

Email: *****************@*****.*** Phone: 514-***-****

LinkedIn: www.linkedin.com/in/manisha-nampally-92ab213a4

Professional Summary

Data Engineer with 6+ years of experience building scalable big data infrastructure and analytics platforms using Apache Spark (PySpark, Spark SQL) and distributed data processing frameworks. Proven expertise in designing batch and near real-time pipelines processing high-volume datasets (20M–500M+ daily records) to support compliance monitoring, revenue analytics, KPI reporting, and product performance insights. Experienced in designing high-performance data structures and semantic data layers to support business intelligence analytics and multi-dimensional reporting.

Strong background in dimensional data modeling, data warehousing, and performance optimization to ensure high availability, resiliency, and SLA adherence. Experienced in partnering with Product, Finance, Risk, and Business stakeholders to translate complex requirements into reliable data solutions, dashboards, and actionable insights.

Hands-on expertise in Python, SQL, workflow orchestration (Airflow), CI/CD automation, and BI reporting tools, with a strong focus on scalable system design, data quality, observability, and long-term platform reliability.

Tools & Technologies

Programming & Query Languages

Python

SQL (Advanced)

PySpark

Spark SQL

Big Data & Distributed Processing

Apache Spark

Distributed Data Processing

Batch & Near Real-Time Processing

Data Partitioning & Performance Optimization

Delta Lake

Data Warehousing & Modeling

Dimensional Modeling (Star Schema, SCD Type 2)

Data Warehousing Concepts

Snowflake

Azure Synapse

Workflow Orchestration

Apache Airflow

Azure Data Factory

Cloud Platforms

Microsoft Azure (Databricks, ADLS Gen2)

AWS (S3, Glue; working knowledge)

BI & Analytics

Power BI

KPI & Funnel Analytics

Data Mining & Ad-hoc Analysis

DevOps & CI/CD

Azure DevOps

Git

CI/CD Pipelines

Infrastructure Automation

Monitoring & Data Quality

Data Validation Frameworks

Logging & Observability

SLA Monitoring

Educational Background

- Bachelor's in Information Technology, JNTUH College of Engineering, Hyderabad, Telangana, India.

Professional Experience

TELUS Health-ON

Azure Data Engineer

April 2024 – Present

Project: Telecom Compliance & Revenue Analytics Platform

Description:

Designed and scaled distributed Spark-based data pipelines powering telecom subscriber analytics, revenue compliance monitoring, and churn analysis across 500M+ daily billing and usage records.

- Architected scalable Spark (PySpark + Spark SQL) pipelines processing 20M–500M+ telecom billing and subscriber records daily, improving data reliability and reducing failure rates by 35%.

- Designed dimensional models (Star Schema, SCD Type 2) to support compliance reporting, revenue assurance, and subscriber lifecycle analytics, improving query performance by 40% (see the SCD Type 2 sketch after this list).

- Built batch and near real-time data workflows to support telecom revenue leakage detection and campaign performance tracking, enabling business teams to reduce revenue discrepancies by $1M+ annually.

- Partnered with Product Managers and Finance stakeholders to define KPIs including ARPU, churn rate, subscriber growth, and conversion metrics, translating business needs into scalable data architecture.

- Automated CI/CD deployment pipelines using Git and Azure DevOps, reducing release deployment time by 50% and improving production stability to 99.9% SLA adherence.

- Implemented data quality, monitoring, and observability frameworks, reducing data incidents by 30% across distributed billing systems.

- Developed interactive Power BI dashboards enabling stakeholders to analyse subscriber funnel metrics, segmentation, campaign performance, and revenue growth trends.

- Conducted ad hoc data mining and analysis using Spark SQL and notebooks, identifying churn-driving patterns that improved targeted retention campaigns by 15%.

- Optimized partitioning and file compaction strategies in Delta Lake, improving large-scale query performance by 45% (a partitioning and compaction sketch follows the Environment list below).

- Designed denormalized and aggregated data structures optimized for OLAP-style reporting, reducing dashboard load times from 20s to under 5s.
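
The SCD Type 2 pattern referenced in the bullets above can be illustrated with a minimal PySpark/Delta Lake sketch. All paths, table names, and columns (dim_subscriber, subscriber_id, plan_code) are hypothetical stand-ins rather than the actual TELUS schema, and the staging feed is assumed to already contain only new or changed records:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed CDC feed: only new or changed subscriber records arrive here.
changes = spark.read.format("delta").load("/mnt/staging/subscriber_changes")
dim = DeltaTable.forPath(spark, "/mnt/warehouse/dim_subscriber")

# Step 1: expire the current version of any subscriber whose tracked
# attribute (here, plan_code) changed.
(dim.alias("d")
 .merge(changes.alias("c"),
        "d.subscriber_id = c.subscriber_id AND d.is_current = true")
 .whenMatchedUpdate(
     condition="d.plan_code <> c.plan_code",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the new current version for changed and brand-new keys.
(changes
 .withColumn("start_date", F.current_date())
 .withColumn("end_date", F.lit(None).cast("date"))
 .withColumn("is_current", F.lit(True))
 .write.format("delta").mode("append").save("/mnt/warehouse/dim_subscriber"))

Keeping the expiry and the append as two separate steps avoids the limitation that a single MERGE cannot both update a matched row and insert its replacement version.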

Environment:

Big Data & Distributed Systems: Apache Spark, PySpark, Spark SQL, Delta Lake, Azure Databricks

Data Warehousing & Analytics: Snowflake, Azure Synapse Analytics, Star Schema/SCD Type 2 modeling

Workflow Orchestration & ETL: Apache Airflow, Azure Data Factory

Cloud Platforms: Microsoft Azure (Databricks, ADLS Gen2), AWS (S3, Glue; working knowledge)

Programming & Scripting: Python, SQL, Scala (working knowledge)

BI & Reporting Tools: Power BI, Tableau

CI/CD & Version Control: Azure DevOps, Git, Automated Pipeline Integration

Monitoring & Quality Frameworks: Data validation & reconciliation frameworks, SLA monitoring, logging, root cause analysis
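
As a rough illustration of the Delta Lake partitioning and compaction work above, the sketch below shows the general pattern. The paths and columns are assumptions, and OPTIMIZE ... ZORDER BY requires Databricks or a recent open-source Delta Lake release:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Partition billing events by date so date-bounded queries prune whole
# partitions instead of scanning the full history.
(spark.read.format("delta").load("/mnt/raw/billing_events")
 .write.format("delta")
 .partitionBy("event_date")
 .mode("overwrite")
 .save("/mnt/curated/billing_events"))

# Compact small files and co-locate rows on a frequent filter column
# (the Z-order column must not be the partition column).
spark.sql(
    "OPTIMIZE delta.`/mnt/curated/billing_events` ZORDER BY (subscriber_id)")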

---

RBC Bank-ON

BI Analyst/Power BI Developer

October 2021 – May 2024

Description: Designed, developed, and optimized enterprise-scale financial data pipelines and analytical solutions, processing 500M+ banking records daily across accounts, loans, and credit portfolios. Implemented cloud-first data orchestration, automated reconciliation workflows, and robust governance frameworks to ensure accuracy, regulatory compliance, and operational efficiency.

- Developed automated reconciliation and anomaly detection pipelines in Python and SQL for daily transaction feeds, reducing manual validation effort by 65% and preventing $1M+ in potential reconciliation discrepancies (a simplified reconciliation sketch follows this list).

- Implemented dynamic risk assessment dashboards in Power BI, integrating credit exposure, delinquency, and portfolio metrics, improving decision-making speed for senior management by 40%.

- Built automated ETL monitoring and alerting using Azure Data Factory and Airflow, proactively detecting failed loads and ensuring 99.9% reporting SLA compliance.

- Optimized high-volume SQL transformations and aggregations for month-end close, cutting report generation time from 8 hours to 3 hours (63% faster).

- Developed metadata-driven ETL frameworks enabling consistent handling of new financial data sources, reducing onboarding time from 2 weeks to 4 days.

- Designed role-based data access and PII masking policies for 2000+ internal users, ensuring compliance with privacy standards and audit requirements.

- Conducted lineage tracking and impact analysis across ETL pipelines, enabling faster root cause resolution for data discrepancies and reducing downtime by 30%.

- Implemented historical data archival and partitioning strategies, improving query performance by 50% while reducing storage costs by 20%.

- Collaborated with finance, risk, and audit teams to develop automated reporting templates aligned with Basel III and IFRS, improving compliance review efficiency by 35%.

- Developed self-service analytics datasets for branch and corporate finance teams, cutting dependency on IT for ad hoc queries by 30% and enabling faster data-driven decisions.

- Led knowledge-sharing sessions on SQL optimization, ETL automation, and dashboard design, improving team skill adoption and reporting quality.
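
A simplified PySpark sketch of the reconciliation idea from the first bullet above; the table names, columns, and one-cent tolerance are hypothetical, and the production pipelines (anomaly scoring, alert routing) were more involved:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source = spark.table("staging.daily_transactions")   # incoming feed
target = spark.table("warehouse.fact_transactions")  # loaded warehouse rows

# Compare per-account daily counts and totals between feed and warehouse.
src_agg = source.groupBy("account_id", "txn_date").agg(
    F.count("*").alias("src_count"), F.sum("amount").alias("src_amount"))
tgt_agg = target.groupBy("account_id", "txn_date").agg(
    F.count("*").alias("tgt_count"), F.sum("amount").alias("tgt_amount"))

breaks = (
    src_agg.join(tgt_agg, ["account_id", "txn_date"], "full_outer")
    .withColumn("count_diff",
                F.coalesce("src_count", F.lit(0)) - F.coalesce("tgt_count", F.lit(0)))
    .withColumn("amount_diff",
                F.coalesce("src_amount", F.lit(0)) - F.coalesce("tgt_amount", F.lit(0)))
    # Flag any count mismatch, or amount drift beyond a one-cent tolerance.
    .filter((F.col("count_diff") != 0) | (F.abs("amount_diff") > 0.01)))

# Persisted exceptions feed review queues and downstream alerting.
breaks.write.mode("append").saveAsTable("audit.reconciliation_breaks")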

Environment: Azure Data Factory, Azure SQL Data Warehouse, SQL Server, Oracle, Databricks, Python, Power BI, Airflow, Azure DevOps, Git

---

Maxso Technologies – India

Client: DHL

ETL Analyst/Developer

June 2019 – August 2021

Description: Designed, developed, and maintained high-volume ETL pipelines and data warehouse solutions for DHL's operational and financial logistics data, processing 10M+ shipment and billing records daily to ensure accurate reporting, KPI tracking, and SLA compliance. Focused on automation, reconciliation, and analytics to improve operational decision making and reduce manual intervention.

- Developed batch ETL workflows processing 10M+ shipment and billing records daily into SQL Server and Teradata data warehouses, supporting operational and financial reporting.

- Implemented automated Python/SQL reconciliation frameworks for shipment, delivery, and billing data, reducing manual verification by 60% and preventing $500K+ potential billing errors.

- Developed real-time exception monitoring and alerting for ETL pipelines, improving SLA adherence from 95% to 99% for daily reporting workflows.

- Designed metadata-driven ETL frameworks to handle schema changes and source evolution automatically, reducing manual updates by 50% and improving team productivity.

- Built dynamic Tableau dashboards integrating logistics, billing, and delivery KPIs, enabling operations teams to identify delays and route inefficiencies, improving delivery performance by 25%.

- Optimized batch and incremental ETL strategies, reducing pipeline runtime by 50% for daily shipment and billing processing.

- Developed cost-efficient data storage strategies, including partitioning and delta loading, reducing storage costs by 15% while maintaining performance.

- Conducted data lineage and audit tracking for all ETL processes, enabling end-to-end traceability and SLA compliance for operational reporting.

- Collaborated with cross-functional finance, logistics, and operations teams to standardize KPIs and deliver actionable insights for invoicing, route efficiency, and hub throughput metrics.

- Implemented data quality and anomaly detection rules in PySpark and SQL, reducing reporting errors by 35% and ensuring trusted operational metrics (see the sketch after this list).
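
The rule-based quality checks in the last bullet followed roughly the pattern below; feed paths, column names, and rules are illustrative assumptions rather than the actual DHL logic:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
shipments = spark.read.parquet("/data/staging/shipments_daily")

# Each rule maps a name to a boolean violation condition (hypothetical).
rules = {
    "null_tracking_id": F.col("tracking_id").isNull(),
    "negative_weight": F.col("weight_kg") < 0,
    "future_ship_date": F.col("ship_date") > F.current_date(),
}

# Tag each row with the first rule it violates; clean rows stay null.
flagged = shipments.withColumn(
    "dq_violation",
    F.coalesce(*[F.when(cond, F.lit(name)) for name, cond in rules.items()]))

clean = flagged.filter(F.col("dq_violation").isNull()).drop("dq_violation")
rejects = flagged.filter(F.col("dq_violation").isNotNull())

clean.write.mode("overwrite").parquet("/data/curated/shipments_daily")
rejects.write.mode("append").parquet("/data/quarantine/shipments_daily")

Quarantining rejects instead of dropping them keeps the curated feed trusted while preserving the failed rows for root cause analysis.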

Environment: SQL Server, Teradata, Oracle, SSRS, Tableau, Python, PySpark, ETL workflows, Data Warehousing, Batch Processing, Airflow, Delta Lake


