Jhansi Bhargavi
Data Engineer
Contact: 437-***-**** Email: **********@*****.***
Accomplishments:
7+ years of experience in designing, building, and optimizing scalable data pipelines in cloud and hybrid environments.
Hands-on experience developing real-time and batch processing pipelines using Spark, Kafka, and Airflow.
Proficient in relational, NoSQL, and analytical databases, including PostgreSQL, MongoDB, and BigQuery.
Strong expertise in cloud platforms: AWS, Azure, and GCP, with hands-on experience in S3, EMR, Redshift, Azure Data Lake Storage, and Pub/Sub.
Built and maintained ETL/ELT workflows using Apache Airflow, Azure Data Factory, and Google Dataflow.
Designed data lake architectures and implemented zone-based layering: raw, processed, curated.
Experience with PySpark, Python, and Scala for data transformation and validation logic.
Implemented data quality frameworks using Great Expectations and Delta Lake.
Developed cost-optimized pipelines and query patterns using BigQuery, Athena, and Redshift Spectrum.
Built CI/CD pipelines for data jobs using GitHub Actions, Azure DevOps, and Terraform.
Integrated with BI tools like Looker and Power BI for downstream analytics consumption.
Worked with product, analytics, and business teams to translate data requirements into scalable solutions.
Applied advanced partitioning, bucketing, and clustering techniques to improve data access performance.
Participated in data governance initiatives, metadata tracking, and compliance (GDPR, HIPAA).
Mentored junior engineers on data engineering best practices, pipeline optimization, and code reviews.
Experienced in designing scalable CDC (Change Data Capture) pipelines using Debezium and Kafka Connect (see the sketch after this list).
Automated metadata extraction and lineage tracking using OpenLineage and DataHub.
Built feature stores using Redis and Feast to support ML pipeline integration.
Conducted POCs on Lakehouse architecture using Apache Hudi and Delta Lake.
Integrated APIs and external vendor data feeds securely using OAuth and REST interfaces.
Experience in data modeling (3NF, star schema, snowflake schema) for enterprise analytics warehouses.
Set up end-to-end data CI/CD processes with validation, linting, and unit tests using pytest and dbt test.
Led root cause analysis (RCA) efforts during critical data outages and implemented long-term fixes.
Migrated legacy ETL scripts from shell and PL/SQL to modular Python-based pipelines with logging and alerting.
Strong understanding of cost optimization strategies across AWS, Azure, and GCP cloud billing metrics.
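Illustrative sketch (not tied to a specific project): a minimal Python consumer for the Debezium-on-Kafka CDC pattern noted above, assuming the default Debezium JSON envelope and the kafka-python client; the topic name, broker address, and downstream actions are hypothetical placeholders.

    import json
    from kafka import KafkaConsumer  # kafka-python

    # Placeholder topic (<server>.<schema>.<table>) and broker address.
    consumer = KafkaConsumer(
        "dbserver1.inventory.customers",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
        auto_offset_reset="earliest",
        enable_auto_commit=True,
    )

    for message in consumer:
        if message.value is None:              # tombstone record emitted after a delete
            continue
        payload = message.value.get("payload", {})
        op = payload.get("op")                 # c=create, u=update, d=delete, r=snapshot read
        if op in ("c", "r", "u"):
            row = payload["after"]             # new state of the row
            # upsert `row` into the target store (e.g. a staging table)
        elif op == "d":
            row = payload["before"]            # last known state before deletion
            # mark the corresponding record as deleted downstream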
Education:
Bachelor's Degree in Computer Science and Engineering from Andhra University - 2017
Technical Skills:
Programming: Python, PySpark, Scala, SQL, Bash
Cloud Platforms: AWS (S3, Lambda, Redshift), Azure (ADF, Databricks), GCP (BigQuery, Pub/Sub)
Data Processing: Apache Spark, Kafka, Beam, Kinesis, Dataflow
Orchestration: Airflow, Azure Data Factory, Cloud Composer
Data Warehousing: Redshift, BigQuery, Snowflake, Azure Synapse
ETL/ELT Tools: dbt, Informatica, Glue, SSIS
Databases: PostgreSQL, Oracle, MongoDB, MySQL
DevOps & Infra: Terraform, Docker, GitHub Actions, Azure DevOps
Monitoring & Logging: CloudWatch, Datadog, Azure Monitor, Stackdriver
Reporting & BI: Power BI, Looker, Tableau
Work Experience:
Interac, Toronto Dates: June 2024 – Present
Senior Data Engineer
Project Overview:
Designed and implemented real-time data ingestion pipelines using Apache Kafka and AWS Kinesis for retail inventory updates.
Utilized AWS Lambda to process events in near real-time and store structured data in Amazon S3 and Redshift.
Built Spark Structured Streaming jobs on EMR to transform streaming data for downstream analytics.
Used Glue Catalog to manage metadata across S3 and enable schema evolution.
Created materialized views in Redshift over Spectrum external tables to support Power BI dashboards for executives.
Implemented data deduplication and watermarking logic to ensure clean event handling (see the sketch at the end of this section).
Designed partitioning and bucketing strategies in S3 to optimize query performance.
Applied Delta Lake for handling late-arriving data and supporting ACID operations.
Tuned Spark jobs using broadcast joins, caching, and checkpointing.
Built custom alerting system using SNS and CloudWatch to detect data anomalies in ingestion.
Developed reusable Terraform modules to provision and manage infrastructure as code.
Secured data pipelines using IAM roles, KMS encryption, and S3 bucket policies.
Integrated Great Expectations to enforce data quality validation on landing and transformed zones.
Enabled cost tracking by tagging ETL jobs and analyzing Athena usage reports.
Collaborated closely with data analysts and product managers to prioritize data features.
Participated in agile ceremonies, sprint reviews, and handled production on-call rotations.
Environment & Technologies: AWS S3, Kinesis, Lambda, Redshift, EMR, Apache Kafka, Spark Structured Streaming, Glue Catalog, Delta Lake, PostgreSQL, Terraform, Maven, GitHub, CloudWatch, SNS, Great Expectations, Python
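Condensed sketch of the streaming deduplication/watermarking pattern described in this section, assuming a Kafka source, a Delta Lake sink, and PySpark; the event schema, topic, and S3 paths are hypothetical and the production jobs differed in detail.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("inventory-stream").getOrCreate()

    # Hypothetical schema for retail inventory update events
    schema = (StructType()
              .add("event_id", StringType())
              .add("sku", StringType())
              .add("qty", StringType())
              .add("event_ts", TimestampType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
           .option("subscribe", "inventory-updates")           # placeholder topic
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(F.from_json("json", schema).alias("e"))
                 .select("e.*")
                 .withWatermark("event_ts", "15 minutes")       # tolerate late-arriving events
                 .dropDuplicates(["event_id", "event_ts"]))     # dedupe within the watermark

    (events.writeStream
           .format("delta")                                     # assumes Delta Lake is available
           .option("checkpointLocation", "s3://bucket/checkpoints/inventory")  # placeholder
           .outputMode("append")
           .start("s3://bucket/delta/inventory"))               # placeholder path

Including the event-time column in dropDuplicates keeps the deduplication state bounded by the watermark instead of growing indefinitely.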
Equitable Bank, ON Dates: Nov 2022 – May 2024
Data Engineer – Cloud Migration Specialist
Project Overview:
Migrated 10+ years of legacy financial data from on-prem SQL Server and Oracle to Azure Data Lake Storage Gen2.
Built reusable and parameterized ADF pipelines to orchestrate batch and incremental ingestion.
Used Azure Databricks with PySpark for data transformation, cleansing, and schema enforcement.
Implemented Slowly Changing Dimensions (SCD Type 2) logic to support historical tracking.
Applied data validation logic using DataFrame APIs and Delta expectations.
Leveraged Mount Points and DBFS to read/write securely between ADLS and Databricks.
Enabled data lineage and governance using Azure Purview, capturing metadata across layers.
Developed Dev/Test/Prod CI/CD workflows using Azure DevOps YAML pipelines.
Integrated Azure Monitor and Log Analytics to track pipeline failures and performance metrics.
Created role-based access using Azure Active Directory groups and Key Vault secrets.
Used Databricks Auto Loader to incrementally load new files with schema evolution (see the sketch at the end of this section).
Shared curated Delta tables as data products to analytics teams via Unity Catalog.
Helped build a data quality dashboard in Power BI showing freshness and completeness metrics.
Mentored junior engineers on Databricks optimization and ADF best practices.
Environment & Technologies: Azure Data Lake Gen2, Azure Data Factory, Azure Databricks, Delta Lake, PySpark, SQL Server, Oracle, Azure Key Vault, Azure Monitor, Azure DevOps, DBFS, Auto Loader, Azure Purview, Power BI, Git, YAML Pipelines, Python
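Simplified sketch of the Auto Loader ingestion pattern described in this section, using Databricks-specific cloudFiles options; the storage paths and file format are hypothetical placeholders.

    # Runs inside a Databricks notebook/job where `spark` is already provided.
    landing = "abfss://landing@storageacct.dfs.core.windows.net/transactions/"     # placeholder
    bronze = "abfss://bronze@storageacct.dfs.core.windows.net/transactions/"       # placeholder
    schema_loc = "abfss://meta@storageacct.dfs.core.windows.net/schemas/transactions/"
    checkpoint = "abfss://meta@storageacct.dfs.core.windows.net/checkpoints/transactions/"

    df = (spark.readStream
          .format("cloudFiles")                                  # Auto Loader source
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", schema_loc)       # inferred schema tracked here
          .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
          .load(landing))

    (df.writeStream
       .format("delta")
       .option("checkpointLocation", checkpoint)
       .option("mergeSchema", "true")                            # allow new columns in the Delta sink
       .trigger(availableNow=True)                               # process the backlog, then stop
       .start(bronze))

The availableNow trigger processes whatever has landed and then stops, which suits scheduled incremental batch loads.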
Uber Canada, Toronto Dates: April 2021 – Oct 2022
Data Engineer – Analytics & Attribution
Project Overview:
Built an event-driven data pipeline to track multi-touch user journeys from websites, CRM, and ad platforms.
Used Google Pub/Sub for capturing real-time events and Dataflow (Apache Beam) for stream transformations.
Designed BigQuery schemas with partitioning and clustering for faster attribution queries.
Implemented Airflow DAGs to orchestrate batch workflows for enrichment and scoring logic.
Developed SQL-based attribution models (first-touch, last-touch, linear, time-decay); see the sketch at the end of this section.
Used Looker dashboards to provide insights to marketing, product, and finance teams.
Enabled data export to Google Sheets and external APIs for operational workflows.
Built custom UDFs and stored procedures in BigQuery to handle scoring logic and window functions.
Used Cloud Composer to manage Airflow pipelines with environment isolation.
Enforced GDPR-compliant data masking using BigQuery data policies and DLP API.
Integrated campaign metadata from Salesforce and HubSpot to link ad spend with conversion.
Provided ongoing optimization and refactoring of SQL models to improve performance and cost efficiency.
Environment & Technologies: Google BigQuery, Google Pub/Sub, Cloud Dataflow, Cloud Composer (Airflow), Looker, GCS, Google DLP API, Salesforce, HubSpot, UDFs, SQL, Python, DAGs, Stackdriver, API Integrations
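Abbreviated sketch of how a daily attribution model run could be orchestrated, assuming Airflow's Google provider and its BigQueryInsertJobOperator; the dataset, table, and DAG names are hypothetical and the real first-touch model was more involved.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    # First-touch attribution: credit the earliest touchpoint per user.
    FIRST_TOUCH_SQL = """
    CREATE OR REPLACE TABLE analytics.first_touch_attribution AS
    SELECT user_id, channel, event_ts
    FROM (
      SELECT user_id, channel, event_ts,
             ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts) AS rn
      FROM analytics.touchpoints
    )
    WHERE rn = 1
    """

    with DAG(
        dag_id="attribution_daily",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        first_touch = BigQueryInsertJobOperator(
            task_id="first_touch_model",
            configuration={"query": {"query": FIRST_TOUCH_SQL, "useLegacySql": False}},
            location="US",
        )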
TCS Global Dates: August 2017 – March 2021
Data Engineer – Customer Data Platform (CDP)
Project Overview:
Built an enterprise-wide Customer 360 platform by integrating data from Salesforce, Shopify, Zendesk, and Marketo.
Leveraged Fivetran for ELT ingestion into Snowflake with automatic schema evolution.
Created dbt models for transformations including joins, deduplication, and identity resolution logic (see the sketch at the end of this section).
Defined data contracts and applied versioning for schema stability across teams.
Applied SCD Type 2 logic in dbt for tracking customer lifecycle changes.
Implemented row-level security policies and masking using Snowflake access controls.
Automated daily and hourly model runs using dbt Cloud scheduler with Slack alerts.
Built unified data marts for marketing, sales, and support teams to enable self-service analytics.
Monitored freshness and model failures via dbt metadata and integrated with Airflow DAGs.
Conducted code reviews and performance tuning for complex SQL-based dbt models.
Environment & Technologies: Snowflake, Fivetran, dbt (Cloud + CLI), dbt Cloud scheduler, Slack Integration, SQL, Jinja, Airflow, Salesforce, Zendesk, Marketo, Shopify, Looker, GitHub, dbt tests, Snowflake Access Controls
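Small sketch of the kind of deduplication/identity-resolution logic the dbt models above encapsulated, shown here as a direct query through the Snowflake Python connector rather than as a dbt model; table, column, and connection values are hypothetical.

    import snowflake.connector

    # Keep the most recent record per resolved customer identity
    # (lower-cased email used as a simplistic stand-in resolution key).
    DEDUP_SQL = """
    CREATE OR REPLACE TABLE analytics.customer_unified AS
    SELECT *
    FROM raw.customer_events
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY LOWER(email)
        ORDER BY updated_at DESC
    ) = 1
    """

    conn = snowflake.connector.connect(
        account="xy12345",                  # placeholder account locator
        user="svc_transform",               # placeholder service user
        authenticator="externalbrowser",    # key-pair or password auth in practice
        warehouse="TRANSFORM_WH",
        database="ANALYTICS_DB",
    )
    try:
        conn.cursor().execute(DEDUP_SQL)
    finally:
        conn.close()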