Jitendra Kotapati
Azure Data Engineer
************@*****.*** 510-***-****
Summary
• Experienced Azure Data Engineer with 7 years of end-to-end expertise designing scalable data platforms, real-time processing systems, and cloud-native solutions that transform raw, complex data into trusted, analytics-ready assets for business, operational, and ML teams.
• Highly skilled in Python, PySpark, SQL, Airflow, Kafka, Snowflake, Databricks, Delta Lake, and building distributed data pipelines across multi-cloud environments including AWS, Azure, and GCP.
• Hands-on experience building and operating large-scale Big Data platforms on Azure, including the Azure Data Stack (ADLS Gen2, HDInsight), Spark, Scala, and Cosmos DB (MapReduce-style workloads).
• Experience implementing and maintaining CI/CD pipelines using Azure DevOps, supporting automated build, test, and deployment of data solutions.
• Proficient in Docker, Kubernetes, Terraform, GitHub Actions, Jenkins, and implementing CI/CD pipelines for automated deployments, environment provisioning, and pipeline versioning.
• Strong command of SQL and high-level programming languages including Scala, Java, Python, and C#, used for data transformation, analytics, and backend data services.
• Extensive experience with CI/CD pipelines, production deployments, debugging, monitoring, and troubleshooting across the full engineering lifecycle.
• Experienced in data governance, security compliance (HIPAA, PCI, and SOC2), RBAC, encryption, and implementing policy-driven access to ensure data trust, privacy, and regulatory readiness.
• Strong collaborator with engineering managers and FTE teams, following strict engineering practices, deployment standards, and compliance requirements in pre-prod and prod environments.
Technical Skills
Programming Languages
Python (Advanced), SQL, PySpark, Scala (Advanced – Apache Spark, Distributed Data Processing), Shell/Bash, JavaScript (for automation/REST integration)
Python Data & ML Libraries
Pandas, NumPy, PySpark, Dask, Polars, FastAPI / Flask, Scikit-Learn, TensorFlow/Keras (basic)
Big Data Ecosystem
Apache Spark (PySpark), Hadoop (HDFS, YARN, MapReduce), Hive, Impala, Presto, Trino, HBase, Kafka, Databricks, Snowflake
Data Security & Governance
RBAC, IAM, Encryption, HIPAA, PCI, SOC2, Secure Data Access, Audit Logging, Data Lineage
Cloud Platforms
AWS, Azure, GCP
Streaming & Real-Time Processing
Apache Kafka, Kinesis Streams
DevOps / CI-CD
Git, GitHub, GitLab, Bitbucket, Jenkins, Docker, Kubernetes
Data Engineering Tools
Delta Lake, Iceberg, Parquet/ORC/Avro
API & Microservices
REST APIs using FastAPI / Flask, GraphQL (basic)
Professional Experience
Client: NVIDIA, Seattle, WA Mar 2024 - Present
Position: Cloud Data Engineer
Responsibilities:
• Developed unified data models using Delta Lake, Snowflake, and BigQuery for customers, accounts, transactions, rewards, and fraud signals to power analytics and real-time decisioning.
• Designed, developed, and optimized large-scale distributed data pipelines using Apache Spark (Scala) to process high-volume, high-velocity datasets supporting AI/ML and analytics workloads.
• Built and maintained Azure-based Big Data platforms, leveraging ADLS Gen2, HDInsight, and Azure Data Factory, ensuring scalable, fault-tolerant, and high-throughput data processing.
• Implemented Cosmos DB (MapReduce-style workloads) for distributed data ingestion, transformation, and aggregation across petabyte-scale datasets.
• Built and operated Hadoop-based data processing pipelines using HDFS, YARN, and Spark on Azure HDInsight for large-scale batch analytics.
• Developed complex SQL queries and data models to support downstream analytics, reporting, and feature engineering use cases.
• Tuned Spark on Azure for optimal performance by configuring executors, memory management, partitioning, and shuffle behavior across HDInsight clusters.
• Applied CI/CD best practices using Azure DevOps pipelines for automated build, test, deployment, and rollback of Spark jobs, data pipelines, data services, and infrastructure configurations.
• Led security hardening initiatives, including remediation of code vulnerabilities, secret and identity management, secure endpoint configuration, and compliance with enterprise security standards.
• Supported data governance and compliance requirements, implementing data partitioning strategies, access controls, and physical/logical security restrictions.
• Performed deep debugging and troubleshooting of production issues across Spark jobs, distributed services, and Azure infrastructure, including Spark failures, data latency, and access issues, ensuring high availability and reliability.
• Engineered batch and streaming pipelines using Spark Structured Streaming, Kafka, and Event Hubs to track the full lifecycle of card activity, from activation to collections (see the streaming sketch after this list).
• Implemented secure data engineering practices including encryption at rest and in transit, IAM-based access controls, and RBAC to protect sensitive financial and customer data.
• Implemented monitoring, alerting, and SLA tracking for Databricks jobs using Datadog, CloudWatch, and Spark logs.
• Integrated Cosmos DB with Spark on Azure (HDInsight) for distributed processing and advanced analytics workflows.
• Implemented Azure security best practices, including Managed Identities, Key Vault integration, RBAC, private endpoints, and network isolation.
• Continuously optimized cost and performance of Azure storage and Spark compute resources.
• Built real-time BNPL and instant credit decision pipelines using Kafka and Spark Streaming.
• Engineered fraud prevention datasets using Kafka, Spark, Python, and real-time rules engines for device intelligence, synthetic identity scoring, and chargeback patterns.
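Illustrative sketch of the Kafka-to-Delta streaming pattern described above, in PySpark; the broker, topic name, event schema, and paths are assumed placeholders rather than production values, and the Spark Kafka connector is assumed to be available:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("card-activity-stream").getOrCreate()

    # Assumed event shape for card-activity messages (activation, purchase, collections, ...).
    event_schema = StructType([
        StructField("card_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the raw Kafka stream and parse the JSON payload into typed columns.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "card-activity")              # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # Append parsed events to a Delta table, with checkpointing for restartability.
    (events.writeStream
        .format("delta")
        .option("checkpointLocation", "/chk/card_activity")  # placeholder path
        .outputMode("append")
        .start("/delta/card_activity"))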
Client: Bread Financials, Charlotte, NC Feb 2022 – Feb 2024
Position: ETL / Big Data Engineer
Responsibilities:
• Worked on daily credit card and payment transaction data, building ETL jobs in Spark and SQL to load data from source systems into the data warehouse.
• Handled large transaction files and streaming feeds from digital payment and BNPL systems, making sure data was processed on time for reporting and analytics.
• Developed Spark jobs to calculate balances, fees, interest, and installment amounts for credit card and financing products.
• Integrated data from core banking, POS systems, and external APIs to support savings accounts, deposits, and payment history tracking.
• Added data validation and reconciliation checks to compare source data with target tables and catch data issues early (see the reconciliation sketch after this list).
• Tuned Spark jobs and SQL queries to improve performance and reduce processing time, especially during month-end and peak loads.
• Worked closely with business analysts and product teams to understand requirements and make changes based on real business scenarios.
• Supported analytics and dashboard teams by providing trusted datasets for customer behavior, credit usage, and financial wellness insights.
• Followed security and compliance standards by masking sensitive data, controlling access, and supporting audit and compliance requests.
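Illustrative sketch of the source-vs-target reconciliation check described above, in PySpark; the table names, load-date filter, and amount column are assumptions for the example:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("recon-check").getOrCreate()

    # Compare the staged daily feed against what actually landed in the warehouse.
    src = spark.table("staging.daily_transactions")              # placeholder table
    tgt = spark.table("warehouse.fact_transactions").filter(
        F.col("load_date") == "2024-01-31")                      # placeholder partition

    src_stats = src.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()
    tgt_stats = tgt.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()

    # Fail the load early on drift so bad data never reaches reporting;
    # in practice a small tolerance on the sum is usually allowed.
    if src_stats["rows"] != tgt_stats["rows"] or src_stats["total"] != tgt_stats["total"]:
        raise ValueError(f"Reconciliation failed: source {src_stats} vs target {tgt_stats}")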
Client: Merck, Kenilworth, NJ Sep 2020 – Jan 2022
Position: Data Engineer
Responsibilities:
• Built end-to-end data pipelines, healthcare domain models, and cloud-native data products that unify medical, pharmacy, behavioral, and employer claims into a single analytics-ready platform.
• Designed healthcare domain data models (ICD, CPT, NDC, DRG, NPI, BlueCard) to support cross-service analytics across medical, Rx, and out-of-state claims.
• Designed and implemented enterprise-scale, Azure-native data platforms and pipelines using Azure Data Factory, ADLS Gen2, HDInsight, and Azure Compute, supporting clinical, manufacturing, and research data workloads.
• Built and maintained distributed processing frameworks and end-to-end pipelines with Apache Spark (Scala) for ingestion, transformation, and storage of high-volume structured and semi-structured datasets in regulated environments.
• Implemented Cosmos DB for scalable ingestion and aggregation of high-volume datasets across regulated environments.
• Optimized Spark on Azure by tuning executors, memory allocation, partitioning strategies, and shuffle performance to meet enterprise throughput and reliability requirements.
• Authored complex SQL queries and data models to support analytics, reporting, and downstream data consumption across R&D and business teams.
• Tuned Cosmos DB RU/s provisioning, autoscale settings, and throughput limits to balance performance, cost efficiency, and compliance requirements.
• Ensured HIPAA-compliant data processing by implementing encryption, role-based access controls, audit logging, and secure data handling across healthcare claims pipelines.
• Implemented automated data quality validation, including duplicate claim checks, eligibility mismatches, PA validations, and provider network mapping (see the duplicate-claim sketch after this list).
• Developed preventive-care and wellness data products such as screening-due lists, wellness-reward metrics, and population-health outreach datasets.
• Built and optimized SQL-based analytical queries and transformations in BigQuery alongside Snowflake and Databricks to support healthcare claims reporting, regulatory analytics, and operational insights.
• Engineered behavioral/mental-health integration pipelines combining therapy sessions, authorizations, diagnosis codes, and claims with HIPAA safeguards.
• Supported GxP, SOX, and data governance compliance, implementing data partitioning, access controls, audit logging, and validation documentation.
• Enforced Azure security best practices, including Managed Identities, Azure Key Vault integration, RBAC, private endpoints, and encryption at rest and in transit.
• Led security and data protection initiatives covering identity and access management (IAM), secret rotation, and secure endpoint configuration, aligned with pharma security standards.
• Created BlueCard out-of-state claims workflows, mapping external identifiers and producing consolidated datasets for finance and actuarial teams.
• Delivered REST APIs and data marts supporting member portals, provider portals, claims history dashboards, and mobile app insights.
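Illustrative sketch of the duplicate-claim validation described above, in PySpark; all table and column names are assumed placeholders, not the actual claims schema:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claim-dedupe").getOrCreate()
    claims = spark.table("staging.medical_claims")  # placeholder table

    # A claim is treated as a duplicate if it repeats the same member, provider,
    # service date, and procedure code; the earliest received claim wins.
    key = ["member_id", "provider_npi", "service_date", "cpt_code"]
    w = Window.partitionBy(*key).orderBy(F.col("received_ts").asc())

    ranked = claims.withColumn("rn", F.row_number().over(w))
    ranked.filter("rn = 1").drop("rn").write.mode("overwrite").saveAsTable("curated.medical_claims")
    ranked.filter("rn > 1").drop("rn").write.mode("append").saveAsTable("audit.duplicate_claims")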
Client: Root Insurance, Columbus, OH Nov 2018 – Jul 2020
Position: Data Engineer
Responsibilities:
• Designed and developed end-to-end ETL/ELT pipelines using Python, SQL, PySpark, Airflow, and Databricks to process high-volume transactional, behavioral, and market datasets.
• Built real-time streaming pipelines using Kafka and Spark Streaming to process credit card transactions, payments, fraud alerts, trade events, and online banking activity with millisecond latency.
• Developed MapReduce-style data processing workflows using Cosmos DB for scalable aggregation, transformation, and parallel execution across large data partitions.
• Engineered Python microservices and APIs (FastAPI/Flask) that integrate data from retail banking systems, payment networks, trading platforms, wealth management applications, and fraud detection tools (see the FastAPI sketch after this list).
• Monitored and troubleshot Cosmos DB performance issues, including throttling, hot partitions, and query inefficiencies.
• Developed Customer 360 datasets by ingesting data from multiple channels (mobile app, ATM, cards, deposits, call center, CRM) to support personalization, segmentation, and behavioral analysis.
• Streamlined commercial banking data flows by building ingestion pipelines for loan portfolios, treasury transactions, credit line utilization, and customer onboarding.
• Processed and enriched market data (equities, bonds, FX, derivatives) to support portfolio analytics, wealth advisory tools, and risk scoring models for high-net-worth clients.
• Implemented secure, real-time pipelines for ACH, wire, SWIFT, RTP, Zelle, and corporate treasury transactions, ensuring high availability and zero data loss.
• Built real-time fraud detection data pipelines, integrating card transactions, login attempts, device fingerprints, and behavioral biometrics with ML scoring engines.
• Automated risk and compliance data workflows supporting AML monitoring, credit risk, market risk, liquidity risk, cybersecurity alerts, and regulatory reporting.
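Illustrative sketch of a FastAPI data service of the kind described above; the endpoint, response model, and in-memory lookup are stand-ins for the real banking integrations:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="customer-360-api")

    class CustomerProfile(BaseModel):
        customer_id: str
        segment: str
        risk_score: float

    # Placeholder store; in practice this would be backed by the Customer 360 tables.
    PROFILES = {"c-1001": CustomerProfile(customer_id="c-1001", segment="retail", risk_score=0.12)}

    @app.get("/customers/{customer_id}", response_model=CustomerProfile)
    def get_profile(customer_id: str) -> CustomerProfile:
        profile = PROFILES.get(customer_id)
        if profile is None:
            raise HTTPException(status_code=404, detail="customer not found")
        return profile

Served locally with, for example, uvicorn app:app --reload.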
Client: Somerset Savings Bank, NJ Sep 2017 – Oct 2018
Position: Associate Software Developer
Responsibilities:
• Modeled customer, account, transaction, and loan data to support cross-product analytics covering checking, savings, CDs, mobile activity, loan performance, and debit-card usage.
• Integrated online and mobile banking data (logins, transactions, bill payments, alerts, mobile deposits) to build real-time behavioral monitoring and digital engagement dashboards.
• Engineered business-banking data pipelines to process ACH files, wire transfers, merchant transactions, and small-business loan applications with automated validation rules.
• Developed lending-data workflows to process loan applications, credit reports, repayment schedules, interest calculations, and delinquency metrics for retail and business credit products.
• Built ATM & debit-card transaction pipelines, normalizing switch logs, fraud-alerts, denied transactions, and surcharge activity for fraud detection and reconciliation.
• Automated KYC/AML checks by integrating internal banking data with third-party sources and applying rule-based validation for high-risk and unusual activity patterns (see the rule-based screen after this list).
• Implemented automated data quality checks for account mismatches, duplicate transactions, loan-balance inconsistencies, and card-network reconciliation issues.
• Developed secure data marts powering online banking dashboards, branch performance reports, loan-risk models, and customer financial-health insights.
• Ensured regulatory compliance (FDIC, FFIEC, AML, KYC) through encrypted pipelines, audit logs, role-based access, anonymization, and automated monitoring.
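Illustrative sketch of a rule-based KYC/AML screen like the one described above, in Python/pandas; the thresholds, file paths, and columns are assumptions for the example:

    import pandas as pd

    THRESHOLD_AMOUNT = 10_000          # illustrative structuring threshold
    MAX_COUNTERPARTIES_PER_DAY = 15    # illustrative velocity rule

    txns = pd.read_parquet("daily_transactions.parquet")  # placeholder input

    # Aggregate each account's daily cash total and distinct counterparties.
    daily = txns.groupby(["account_id", "txn_date"]).agg(
        total_cash=("amount", "sum"),
        counterparties=("counterparty_id", "nunique"),
    ).reset_index()

    # Route anything tripping a rule to a manual-review queue.
    flags = daily[
        (daily["total_cash"] > THRESHOLD_AMOUNT)
        | (daily["counterparties"] > MAX_COUNTERPARTIES_PER_DAY)
    ]
    flags.to_parquet("aml_review_queue.parquet")          # placeholder output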
EDUCATION:
• Bachelor's in Electrical and Electronics Engineering from JNTU Kakinada
• Master's in Computer Science from Northwestern Polytechnic University