Bharathkumar M
*****************@*****.*** • 567-***-**** • linkedin.com/in/bharath520
SUMMARY
Senior Data Engineer with 10+ years of experience delivering enterprise data platforms across insurance, financial services,
private credit, HVAC services, and banking. Career started in SQL Server / SSIS / SSRS / Power BI development at HSBC and
Conduent, grew into data engineering and BI delivery at CNA Insurance and Byte Link, and progressed into cloud-native data
engineering leadership at Antares Capital, Service Experts, and AAA Hoosier Motor Club. Equally comfortable owning the full
Azure stack end-to-end and dropping back into deep on-prem SQL Server and MSBI work when the role calls for it.
Hands-on expert across Azure Data Factory (metadata-driven ControlTable frameworks, ForEach + Lookup, dynamic stored
procedures), Azure Synapse Analytics, Azure Databricks (PySpark, Spark SQL, Delta Lake, Structured Streaming), Azure SQL
Database, ADLS Gen2, and Self-Hosted Integration Runtimes; secondary experience with AWS (S3, Glue, Redshift, RDS,
SageMaker) and Google BigQuery. Sole engineer responsible for the current AAA HMC modernization, where I provisioned
the full Azure ingestion stack from scratch (Linked Services, HMCSHIR Self-Hosted IR, Key Vault, RBAC), built a
metadata-driven framework loading PostgreSQL via SSH tunnel, on-prem SQL Server via ODBC, and BigQuery into Azure
SQL, designed a per-table self-healing watchdog, and authored 100+ pages of handoff documentation.
Strong data modeling background covering star and snowflake schemas, SCD Type 2, dimensional modeling, medallion
architecture on Delta Lake, and metadata-driven design patterns. Power BI specialist with advanced DAX (CALCULATE,
USERELATIONSHIP, semi-additive time intelligence), row-level security, and dashboards spanning financial reporting,
regulatory analytics, portfolio performance (IRR, MOIC), Customer 360, and operational KPIs. Experienced with CI/CD (Azure
DevOps, GitHub Actions), monitoring (Azure Monitor, Log Analytics, Application Insights, Splunk), and cloud security
(RBAC, Managed Identity, Key Vault), driving business value through secure, reliable, well-documented data platforms
aligned with Agile and SDLC practices.
SKILLS
Category Skills & Tools
Cloud Platforms Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Databricks (Unity
Catalog, External Locations, Instance Pools), Azure Machine Learning Studio, Azure
Data Factory, Azure SQL Database (Managed Identity), Azure SQL Managed
Instance, Azure Functions, Azure Logic Apps, Azure Stream Analytics, Azure DevOps,
Event Hub, Key Vault, AWS S3, AWS Glue, AWS Redshift, AWS RDS, AWS SageMaker,
Google BigQuery
Data Engineering & ETL SSIS (Lookup, Conditional Split, Derived Column, Foreach Loop, SCD Type 2), Azure
Data Factory (metadata-driven ControlTable framework, ForEach + Lookup, dynamic
stored procedure invocation, Mapping Data Flows), Python (Pandas, NumPy,
requests), PySpark, Spark SQL, Delta Lake, Structured Streaming, Kafka, Batch &
Streaming Pipelines, Watermark-based Incremental Loads, ROW_NUMBER
Deduplication, Self-Healing Watchdog Frameworks, Polybase, OPENROWSET, ODBC
integration via Self-Hosted Integration Runtime, SSH-tunneled connectivity
Infrastructure & Integration Azure Linked Services (Managed Identity for Azure SQL, Service Account for
BigQuery, ODBC for on-prem SQL Server, ADLS Gen2), Self-Hosted Integration
Runtime (HMCSHIR install, registration, multi-node management),
AutoResolveIntegrationRuntime configuration, SSH tunnel infrastructure on Windows
service accounts (NSSM service pattern), ODBC driver setup, ADF Triggers (schedule
and tumbling window)
Data Modeling & Design Star Schema, Snowflake Schema, Slowly Changing Dimensions (SCD Type 1 / Type 2),
Partitioning (range, hash, list), Logical & Physical Design, Normalization (3NF),
Medallion Architecture (bronze / silver / gold on Delta Lake), Control Table /
Metadata-Driven Architecture
Monitoring & Logging Azure Monitor, Diagnostic Settings, Log Analytics (Kusto Query Language),
Application Insights, Action Groups + ADF Failure Alerts, Splunk (log monitoring,
alerting, dashboards), AWS CloudWatch, custom audit tables with action-code
taxonomies
Databases & Warehousing Azure Synapse (dedicated + serverless SQL pools), Azure SQL Database, SQL Server
(on-prem and cloud, 2016 / 2019 / 2022), PostgreSQL, AWS Redshift, Google
BigQuery
Visualization & BI Power BI (advanced DAX with CALCULATE, USERELATIONSHIP, semi-additive YTD /
QTD / MTD time intelligence, KPIs, row-level security, drill-through, bookmarks), SSRS
(parameterized, sub-reports, subscriptions), SSAS (Tabular & Multidimensional),
Tableau, Qlik Sense, Power Apps, Power Automate
Machine Learning & AI Azure ML Studio, Databricks MLlib, Scikit-learn, Forecasting (ARIMA, Prophet),
Classification, Anomaly Detection, AWS SageMaker
CI/CD & DevOps Azure DevOps (Pipelines, Repos, YAML build / release), GitHub Actions, Git
(feature-branch and trunk-based workflows), CI/CD for ADF / Synapse / Databricks
artifacts
API & Integration REST APIs, SOAP APIs, Postman, ODBC drivers, SSH Tunnels, Self-Hosted Integration
Runtime, Azure Functions, Azure Logic Apps, SAP HANA, SharePoint, third-party API
integrations
Languages Python, T-SQL, PL/SQL, PySpark, MDX (SSAS Multidimensional)
EDUCATION
Master of Science 2024
Indiana Institute of Technology – Fort Wayne, IN
Bachelor of Science 2015
Nagarjuna University, Guntur, India
CERTIFICATIONS
• Microsoft Certified: Azure Data Engineer Associate
• Career Essentials in Data Analysis by Microsoft and LinkedIn Education
• Career Essentials in Project Management by Microsoft and LinkedIn Education
EXPERIENCE
Azure Data Engineer / Lead Data Engineer
AAA Hoosier Motor Club (HMC) March 2026 – Present, Indianapolis, IN
Sole data engineer on the data modernization initiative, architecting an end-to-end ETL framework that migrates enterprise data
from AWS-hosted PostgreSQL, on-prem SQL Server (via ODBC), and Google BigQuery into Azure SQL Database. Designed
for incremental loads, schema drift tolerance, composite primary keys, and self-healing recovery from source resync events.
• Provisioned full Azure ingestion stack from scratch: configured Linked Services for Azure SQL Database
(LS_AzureSQL_AAATest_MI with Managed Identity authentication), Google BigQuery (service account authentication),
on-prem SQL Server via ODBC, and ADLS Gen2; installed and registered the HMCSHIR Self-Hosted Integration Runtime
on the on-prem server with multi-node support and configured AutoResolveIntegrationRuntime for cloud-only
activities.
• Implemented security and access controls using Azure Key Vault and RBAC: stored BigQuery service account JSON,
PostgreSQL credentials, SSH private keys, and ODBC connection secrets in Key Vault; granted the Data Factory
Managed Identity Key Vault Secrets User role for runtime secret retrieval; provisioned database-level permissions on
Azure SQL for the ADF Managed Identity and SHIR service account, removing all plaintext credentials from pipeline and
dataset definitions.
• Configured Azure Monitor end-to-end for pipeline observability: enabled Diagnostic Settings on the Data Factory to
stream Pipeline / Trigger / Activity runs and AllMetrics to a Log Analytics workspace; created an Action Group
(HMC-DataPipeline-Alerts) with email and Teams webhook recipients; built ADF_PipelineFailure_Alert with
metric-based conditions across six production pipelines and per-pipeline severity tuning for proactive failure
notification.
• Built coordinated ADF trigger orchestration on Eastern Time: TR_CPL_Warmup_Daily (12:15 AM) wakes the Azure SQL
instance and Databricks pool, TR_CPL_ResyncDetector_Daily (12:30 AM) runs the per-table watchdog, TR_CPL_Daily
(1:30 AM) runs the main load, TR_bigquery_to_azure_sql (Saturday 2:30 AM) runs the weekly BigQuery pipeline, and
Globalware runs Saturday 10:30 PM, with deliberate gaps to avoid resource contention and enable same-night anomaly
detection plus selective full reload.
• Set up the Azure Databricks workspace for the ETL framework: configured Unity Catalog External Locations
(hmcsaiteporting-bigquery, connectplus-raw) bound to a managed storage credential, provisioned the
HMC_Pipeline_pool instance pool with pre-warm logic, and authored two parameterized PySpark notebooks
(HMC_member_incremental_merge for CPL composite-PK merges and HMC_generic_incremental_merge reused across
BigQuery and future sources) with schema-tolerant unionByName history accumulation.
• Designed custom audit and operational logging schema in Azure SQL: cpl.ResyncLog with an action-code taxonomy
(BASELINE_SET / COUNT_OK / WATERMARK_RESET / SKIPPED) capturing every watchdog decision, cpl.WatchdogState
holding per-table baselines and thresholds, and cpl.SchemaChangeLog tracking auto-ALTER events for downstream
review.
• Built dynamic stored procedure framework in T-SQL with ROW_NUMBER deduplication,
INFORMATION_SCHEMA-based schema drift handling (auto-ALTER for new columns, type-drift detection), and
composite primary key support, replacing source-specific procs with reusable per-schema templates.
• Designed watermark-driven incremental load pattern in Azure Data Factory using ForEach + Lookup over a central
ControlTable, orchestrating data flows across ADLS Gen2 staging and Azure SQL targets with auto-create staging tables
and idempotent merge operations.
• Implemented PostgreSQL connectivity via SSH tunnel on the Self-Hosted Integration Runtime, diagnosing standby
replica WAL recovery conflicts (error 40001) and converting parallel ForEach to sequential to ensure stable nightly
extracts.
• Migrated on-prem SQL Server data integration off a retiring server by consolidating legacy SSIS jobs into a single ADF
pipeline using ODBC connectivity through the Self-Hosted Integration Runtime, with parquet staging and
TRUNCATE+INSERT loads.
• Built Google BigQuery to Azure SQL incremental and reference pipelines using service account authentication,
composite primary key handling, and NULL-preserving load logic to retain legitimate source rows without sacrificing
constraint integrity.
• Developed parameterized PySpark notebooks in Azure Databricks performing schema-tolerant unionByName history
merge with primary key deduplication on watermark DESC, supporting both PostgreSQL and BigQuery source patterns
through one reusable notebook.
• Designed self-healing watchdog system using T-SQL stored procedures with dynamic source row-count detection that
compares against per-table thresholds and selectively resets watermarks on anomaly events, replacing a
global-threshold design that caused unnecessary full reloads.
• Rebuilt brittle SSH tunnel infrastructure by replacing three coordinated scheduled tasks with a single
service-account-owned task surviving RDP disconnects, with structured logging and a watchdog restart mechanism.
• Developing Python automation in Azure Functions to integrate third-party API data for business KPI reporting, replacing
manual weekly script execution with scheduled deployment, Key Vault-based authentication, and ADLS / Azure SQL
persistence.
• Built coordinated trigger orchestration (warmup → watchdog → main pipeline) on Eastern Time schedules, enabling
same-night anomaly detection and selective watermark reset for only affected tables.
• Authored a full set of handoff documentation (six documents totaling 100+ pages) covering CPL pipeline architecture,
BigQuery pipeline architecture, Globalware ODBC integration, watchdog logic and threshold tuning, SSH tunnel
infrastructure (v1 to v2 incident lessons), and Databricks workspace setup; established design decisions, trade-offs,
troubleshooting playbooks, and operational runbooks for downstream maintainers.
Reduced pipeline runtime from ~2h 25m to ~45m through a Databricks pool warmup pattern. Resolved 32GB storage
exhaustion via a TRUNCATE-first staging design. Delivered three production pipelines covering ~50GB of data across AWS, GCP,
on-prem, and Azure source systems, with metadata-driven design enabling new tables to be onboarded via a single
ControlTable INSERT.
Azure Data Engineer / Power BI Developer
Service Experts March 2024 – February 2026, Richardson, TX
Team member on the data engineering squad building unified customer and operational data integration across Azure
Synapse Analytics, Azure Databricks, Delta Lake, and Snowflake. Delivered real-time and batch reporting solutions for
HVAC service operations supporting analytics, ML forecasting, and field operations.
• Designed scalable data integration solutions in Azure Synapse Analytics with dedicated and serverless SQL pools,
materialized views, Polybase external tables, OPENROWSET queries, and partitioning strategies for large-scale
analytical workloads.
• Built and maintained Azure Databricks notebooks using PySpark, Spark SQL, and Python libraries (Pandas, NumPy,
Scikit-learn, MLlib) for transformation, cleansing, schema evolution, aggregation, and ML model development.
• Leveraged Delta Lake with Structured Streaming and Kafka ingestion for managing batch and streaming datasets with
ACID compliance, enabling real-time analytics via Azure Event Hub and Databricks pipelines.
• Developed metadata-driven, parameterized pipelines in Azure Data Factory orchestrating flows across Synapse,
Databricks, ADLS, Blob Storage, Snowflake, and sources including SAP HANA, SharePoint, on-prem SQL Server (via
Self-Hosted IR), and REST APIs tested via Postman.
• Designed star and snowflake schema models in Synapse implementing medallion architecture (bronze / silver / gold) on
Delta Lake; built SSAS Tabular models with hierarchies, partitions, and DAX expressions for enterprise BI reporting.
• Built Power BI dashboards sourced from Synapse, Databricks, and Snowflake with advanced DAX, KPIs, slicers,
drill-downs, and row-level security delivering churn probability, revenue prediction, and technician utilization insights.
• Built Logic Apps and Power Automate workflows to trigger pipelines on file arrival and integrate Power BI / SharePoint
alerts with email and Teams notifications, reducing manual handoffs between systems.
• Implemented CI/CD using Azure DevOps Pipelines and GitHub Actions for automated deployments of ADF, Synapse,
and Databricks components, with Azure Monitor and Application Insights for runtime telemetry and alerting.
• Integrated Azure Key Vault, Managed Identity, Azure SQL Managed Instance, and RBAC for secure secrets
management and access control across ADF, Databricks, and Synapse, removing plaintext credentials from pipeline
definitions.
• Collaborated with data science, BI, and field operations teams in Agile sprints, contributing to architecture reviews,
sprint planning, code reviews, and stakeholder demos.
Delivered unified Customer 360 data platform and operational forecasting models supporting HVAC service operations
across multiple regions. Streamlined deployments through DevOps automation and reduced manual data preparation work
via metadata-driven pipeline design.
Senior Data Engineer / Power BI Developer
Byte Link Systems September 2023 – March 2024, Katy, TX
Senior data engineer on a client services team building ETL pipelines, data integration, and reporting solutions. Focused on
core data engineering work: T-SQL development, SSIS migration, relational modeling, query tuning, and Power BI
dashboards, with light use of Azure Data Factory and Databricks where projects called for cloud destinations.
• Developed and tuned complex T-SQL stored procedures, views, UDFs, and CTEs in SQL Server, optimizing execution
plans through covering indexes, statistics maintenance, and query refactoring to resolve long-running joins and
parameter-sniffing issues.
• Designed logical and physical data models for client warehouses and reporting marts, applying normalization (3NF) for
OLTP-style data and star schema with conformed dimensions and SCD Type 2 for analytical workloads.
• Built and maintained SSIS packages for batch ETL across SQL Server, PostgreSQL, flat files, and Excel sources, using
Lookup, Conditional Split, Derived Column, and SCD transformations with package-level configurations and
parameterized connections.
• Wrote Python scripts (Pandas, NumPy, requests) for data profiling, CSV / Excel cleansing, REST API ingestion, and
lightweight transformation jobs that fed downstream SQL workflows.
• Used Azure Data Factory selectively for cloud copy workloads (Mapping Data Flows, ForEach, Lookup, Web, Wait
activities) and PySpark / Spark SQL in Azure Databricks for the occasional large transformation, but kept the bulk of
logic in T-SQL and SSIS to match the team's stack.
• Integrated REST and SOAP APIs into ETL workflows with retry logic, exponential backoff, and pagination handling for
third-party data feeds.
• Built Power BI dashboards with star-schema data models, advanced DAX measures, time intelligence (YTD, MoM, QoQ,
rolling averages), drill-throughs, and bookmarks for client business reviews.
• Developed SSRS reports (ad-hoc and parameterized) and migrated legacy Excel and Access reports into Power BI as part
of client modernization engagements.
• Collaborated with analysts and client stakeholders to gather requirements, design source-to-target mappings, and
document data lineage for audit and handoff.
Delivered multiple client-facing ETL and reporting projects with a focus on SQL performance, data quality, and maintainable
code, balancing on-prem SQL Server / SSIS work with selective cloud usage where it added real value.
Senior Data Engineer / Power BI Developer
Antares Capital February 2020 – August 2022, Chicago, IL
Senior engineer on the data engineering team at a private credit and direct lending firm, building data pipelines and Power
BI reporting solutions for financial reporting, regulatory submissions, and investment performance / portfolio management.
Worked primarily on the Azure stack (Synapse, Databricks, ADF) with AWS Redshift, Glue, and S3 as secondary sources,
with a strong BI focus on Power BI dashboards and DAX modeling for portfolio managers and finance leadership.
• Designed and developed interactive Power BI dashboards for loan portfolio analytics, fund performance, and regulatory
reporting, with advanced DAX measures (CALCULATE filter context, USERELATIONSHIP for role-playing dates,
semi-additive YTD / QTD / MTD measures), KPIs, drill-throughs, bookmarks, and row-level security for
portfolio-manager and fund-level access control.
• Built scalable data integration workflows using Azure Synapse Analytics (dedicated and serverless SQL pools), Azure
Data Factory, and Databricks with SQL, PySpark, and Spark SQL; pulled secondary sources from AWS Redshift and AWS
Glue catalogs via cross-cloud copy patterns.
• Designed star, snowflake, and hybrid schemas in Azure Synapse and AWS Redshift; created optimized stored
procedures, views, and UDFs in T-SQL with indexing, partitioning, and statistics tuning for analytics and ETL
performance on loan-level and transaction-level fact tables.
• Delivered regulatory and financial reporting datasets (loan tape, exposures, covenants, fund-level performance) with
audit-ready lineage, reconciliations against source systems of record, and DAX measures tied to finance-approved
calculation logic.
• Built portfolio-management dashboards covering IRR, MOIC, weighted-average spread, default and recovery metrics,
and concentration limits, sourced from Synapse and Databricks gold-layer tables curated for finance and investment
teams.
• Developed and deployed machine learning models in Databricks and Azure ML Studio using Python, Pandas, and MLlib
for forecasting and classification (default risk, prepayment likelihood), with secondary experimentation on AWS
SageMaker pipelines.
• Enabled real-time and batch data processing using Delta Lake, Azure Stream Analytics, Kafka, and AWS S3, ensuring
ACID compliance, schema enforcement, and data versioning on portfolio-update feeds.
• Orchestrated Databricks jobs with retry logic, autoscaling, and monitoring via Azure Monitor, AWS CloudWatch, and
Splunk dashboards for observability and SLA tracking on critical end-of-day and month-end reporting jobs.
• Collaborated with portfolio managers, finance, risk, and compliance stakeholders to gather requirements, prototype
dashboards, and deliver scalable BI / data solutions with CI/CD automation using GitHub and Azure DevOps.
Delivered a Power BI reporting layer covering portfolio performance, regulatory reporting, and fund analytics across a
private credit / direct lending portfolio, with the underlying data engineering platform on Azure Synapse + Databricks and
selective AWS integration for legacy data sources.
Data Engineer / Power BI Developer and Analyst
CNA Insurance March 2018 – January 2020, Chicago, IL
Data engineer on a BI and analytics team at a commercial property and casualty insurance carrier, building SQL Server data
models, SSIS ETL pipelines, and SSRS / Power BI / Tableau reporting for Sales, Underwriting, Claims, and Finance. Focused
on core data engineering: T-SQL development, SSIS package design, SSAS modeling, performance tuning, and dashboard
delivery.
• Designed and implemented logical and physical data models in SQL Server with normalization (3NF) and referential
integrity for transactional systems, and dimensional star / snowflake models for analytical workloads in Underwriting
and Claims.
• Built and optimized complex T-SQL stored procedures, triggers, views, functions, and indexes supporting ETL,
reconciliations, and business processes; tuned SQL performance using DMVs, execution plans, and Database Engine Tuning Advisor
to resolve bottlenecks through covering indexes, statistics maintenance, and parameter-sniffing mitigations.
• Developed and maintained SSIS ETL pipelines with SCD Type 2, Conditional Splits, Lookups, Derived Columns, and
Foreach Loop containers, integrating flat files, mainframe extracts, legacy databases, and cloud sources into the
enterprise data warehouse.
• Migrated on-prem SQL Server and MySQL databases to Amazon RDS leveraging S3 backup / restore, validating row
counts, identity reseed, and downstream SSIS package connections after cutover.
• Built SSAS Tabular and Multidimensional models with hierarchies, partitions, calculated measures (DAX and MDX), and
role-based security to support enterprise BI for Sales, Underwriting, and Finance.
• Designed and deployed SSRS reports with parameterized expressions, drill-throughs, subscriptions, and scheduled
deliveries; modernized legacy SSRS and Excel reporting by migrating into Tableau and Power BI with DAX models and
time-intelligence measures.
• Delivered BI solutions for Sales, Underwriting, Claims, and Finance teams, collaborating with stakeholders to provide
insights into claims development, policy trends, loss ratios, and financial performance.
Built and maintained a SQL Server + SSIS + SSAS reporting foundation supporting multiple business lines across the
insurance organization, modernizing legacy SSRS and Excel reporting into Power BI and Tableau dashboards used daily by
Sales, Underwriting, and Finance leadership.
Data Analyst / SQL Developer (SSIS, SSRS, SSAS)
Conduent April 2016 – February 2018, Somerset, NJ
• Wrote and tuned complex T-SQL queries (CTEs, window functions, PIVOT / UNPIVOT, recursive queries) for ad-hoc
business analysis, monthly operational reporting, and reconciliations across client datasets.
• Developed and maintained SSIS packages for batch ETL, integrating flat files, Excel, SharePoint lists, and SQL Server
sources with Lookup, Conditional Split, Derived Column, Foreach Loop, and SCD transformations; scheduled and
monitored jobs via SQL Server Agent.
• Built SSRS reports (ad-hoc, parameterized, sub-reports, drill-down) with expressions, custom code, and scheduled
subscriptions for delivery to business users.
• Designed SSAS Tabular models with hierarchies, calculated measures, KPIs, and role-based security to support
self-service analytics.
• Performed data analysis and profiling on operational datasets to identify trends, anomalies, and data quality issues,
presenting findings to business stakeholders in regular review meetings.
• Created interactive dashboards in Power BI and Tableau on top of SQL Server and SSAS sources, with slicers,
drill-throughs, and DAX measures for KPIs and time intelligence.
• Migrated legacy SSRS, QlikView, and Excel reports into Power BI and Tableau as part of internal modernization
initiatives.
• Contributed to small Azure migration projects using Azure Data Factory and Azure SQL Database to copy on-prem
datasets into the cloud, gaining initial exposure to cloud-based ETL patterns.
• Collaborated with analysts, project managers, and client stakeholders to gather reporting requirements, document
source-to-target mappings, and deliver validated datasets and dashboards.
Junior Data Analyst / MSBI Developer
HSBC April 2015 – March 2016, Bangalore, India
• Wrote SQL queries against SQL Server databases for ad-hoc business questions, learned query tuning fundamentals
(indexes, execution plans), and supported senior analysts on monthly reporting tasks.
• Assisted in designing and maintaining SSIS ETL workflows ingesting data from SQL Server, Excel, and XML sources into
reporting systems, learning common transformations (Lookup, Conditional Split, Derived Column) and SSIS package
patterns.
• Developed SSRS reports (ad-hoc and parameterized) and supported the BI team with report deployments,
subscriptions, and user troubleshooting.
• Built interactive dashboards in Power BI and Qlik Sense to visualize KPIs and business metrics, picking up DAX basics
and data modeling concepts.
• Gained hands-on experience with Power Apps and Power Automate flows on SharePoint to streamline simple approval
workflows across Microsoft 365.
• Supported governance and deployment activities by reusing templates, following team standards, and learning
enterprise rollout practices in a regulated banking environment.