Data Engineer Azure

Location: Irvine, CA

Salary: 90000

Posted: September 10, 2025

Resume:

VENKATA SAI PAVAN DEMA

+1-949-***-**** *********@*****.***

PROFESSIONAL SUMMARY

Results-driven Data Engineer with 4+ years of experience designing, building, and optimizing large-scale data pipelines, data lakes, and cloud data platforms across banking and financial services. Proven expertise in Azure Data Engineering (ADF, Synapse, Databricks, Event Hubs, Purview, Cosmos DB) and hands-on experience in multi-cloud ecosystems (AWS, GCP). Strong background in batch and real-time data processing, streaming architectures, dimensional modeling, and regulatory compliance reporting (Basel, AML, KYC, GDPR, PCI DSS). Skilled in data governance, data quality frameworks, and CI/CD automation to deliver secure, scalable, and analytics-ready datasets supporting fraud detection, credit risk scoring, and customer analytics. Adept at collaborating with business, compliance, and risk teams in Agile/Scrum environments to deliver high-impact, data-driven solutions.

PROFESSIONAL EXPERIENCE

Client: HSBC, New York

Role: Azure Data Engineer

Date: Sep 2023 – Present

Designed and implemented end-to-end Azure Data Factory (ADF) pipelines to ingest customer transactions, credit card swipes, and loan applications from multiple core banking systems into ADLS Gen2, enabling unified reporting.

Developed real-time fraud detection pipelines by integrating Event Hubs, Stream Analytics, and Databricks (PySpark), improving fraud response times from minutes to seconds.

Built regulatory compliance models (Basel III, AML, KYC) in Synapse Analytics with star and snowflake schemas, ensuring audit-ready submissions.

Optimized T-SQL stored procedures and partitioned Synapse tables, cutting reconciliation report runtimes by 30%.

Leveraged Delta Lake on Databricks to transform billions of transaction records into analytics-ready datasets for credit risk scoring and customer 360 analytics.

Configured Cosmos DB for storing mobile banking events and logs, supporting real-time fraud monitoring dashboards.

Implemented data governance with Purview to capture lineage and enforce GDPR/PCI DSS compliance across banking operations.

Built data quality frameworks on ADF and Databricks (null/duplicate checks, schema validation), achieving 99.9% dataset accuracy before publishing to Synapse.

Delivered Power BI dashboards integrated with curated datasets for executives to track fraud trends, loan defaults, and revenue forecasts.

Automated deployments with Azure DevOps CI/CD pipelines using ARM templates and Terraform, achieving consistent, secure environment provisioning.

Enforced PII data security via Key Vault, RBAC, and encryption, reducing compliance risk.

Developed incremental CDC pipelines from core systems, reducing load times by 40%.

Set up Azure Monitor & Log Analytics alerts to proactively detect pipeline failures affecting financial reporting.

Partnered with compliance officers and risk teams to translate regulatory requirements into automated data solutions, accelerating delivery timelines.

Contributed to Agile/Scrum teams, delivering iterative features for migration, fraud detection, and analytics.

Client: DXC Technology, India

Role: Data Engineer

Date: May 2020 – Jul 2022

Designed and developed ETL pipelines in Python to ingest structured and unstructured data from enterprise systems into cloud data warehouses.

Built and optimized SQL queries, stored procedures, and indexing strategies, improving query performance for large transactional datasets.

Developed Spark jobs (PySpark/Scala/Java) for batch and streaming data processing, supporting both historical and real-time analytics.

Designed and maintained PostgreSQL/MySQL databases with normalization and partitioning for high-volume workloads.

Implemented NoSQL databases (MongoDB, Cassandra, DynamoDB) to handle semi-structured and real-time workloads.

Created warehouse models (Snowflake, Redshift, BigQuery) with star/snowflake schemas, improving BI reporting capabilities.

Automated ETL/ELT workflows using NiFi, Talend, Airbyte, and dbt, reducing manual intervention.

Processed petabyte-scale data using Hadoop HDFS, Hive, and Spark to enable enterprise-scale reporting.

Integrated Kafka pipelines to capture real-time data streams from APIs and transactional systems.

Built multi-cloud data pipelines (AWS, Azure, GCP), ensuring portability and resilience.

Orchestrated pipelines with Airflow DAGs, handling scheduling, dependencies, and monitoring for hundreds of jobs daily.

Developed ingestion frameworks for REST APIs, GraphQL, Salesforce, Google Analytics, and SAP, speeding up integration by 50%.

Processed JSON, Avro, Parquet, and ORC data formats, making them analytics-ready.

Implemented data validation, profiling, and cleansing rules, improving reliability of datasets used for business dashboards.

Applied MDM and governance practices with Collibra, Amundsen, and Alation, ensuring compliance with GDPR/HIPAA/SOC2.

Containerized pipelines with Docker & Kubernetes, enabling scalable, portable workloads.

Built CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI), reducing deployment time by 25%.

Automated provisioning with Terraform & CloudFormation, cutting environment setup time.

Secured datasets with RBAC, encryption at rest/in transit, and data masking, reducing compliance risks.

Partnered with stakeholders to translate business requirements into pipelines and models, ensuring adoption of data solutions.

Delivered BI dashboards (Tableau, Power BI, Looker) to executives, influencing business strategy.

Supported data science teams with feature-rich datasets, accelerating ML projects.

Contributed to Agile ceremonies, shared knowledge in team sessions, and mentored juniors.

TECHNICAL SKILLS

Cloud Platforms: Azure (ADF, Synapse, Databricks, Event Hubs, Purview, Cosmos DB), AWS (S3, Glue, Redshift, EMR), GCP (BigQuery, Dataflow, Pub/Sub)

Data Engineering: ETL/ELT, Apache Spark (PySpark/Scala), Kafka, Airflow, NiFi, dbt

Databases: SQL (PostgreSQL, MySQL, T-SQL), NoSQL (MongoDB, Cassandra, DynamoDB), Delta Lake, ADLS Gen2

Data Modeling & Warehousing: Star/Snowflake schemas, Synapse, Snowflake, Redshift, BigQuery

Governance & Security: Purview, Collibra, GDPR, PCI DSS, Data Quality, MDM

DevOps & Automation: Azure DevOps, Jenkins, Terraform, Docker, Kubernetes

BI & Analytics: Power BI, Tableau

Programming: Python, SQL, Scala, Java

CERTIFICATIONS

Microsoft Certified: Azure Data Engineer Associate (DP-203)

AWS Certified Data Analytics – Specialty

EDUCATION

University of the Cumberlands, Kentucky, USA

Master of Science (MS) in Information Systems and Technology

Christ University, Bangalore, India

Bachelor of Technology (B.Tech) in Information Technology