Jitendra Kotapati
Azure Data Engineer
************@*****.*** 510-***-****
Summary
• Experienced Azure Data Engineer with 7 years of end-to-end expertise designing scalable data platforms, real-time processing systems, and cloud-native solutions that transform raw, complex data into trusted, analytics-ready assets for business, operational, and ML teams.
• Highly skilled in Python, PySpark, SQL, Airflow, Kafka, Snowflake, Databricks, Delta Lake, and building distributed data pipelines across multi-cloud environments including AWS, Azure, and GCP.
• Hands-on experience building and operating large-scale Big Data platforms on Azure, including the Azure Data Stack (ADLS Gen2, HDInsight), Spark, Scala, and Cosmos DB (MapReduce-style workloads).
• Experience implementing and maintaining CI/CD pipelines using Azure DevOps, supporting automated build, test, and deployment of data solutions.
• Proficient in Docker, Kubernetes, Terraform, GitHub Actions, Jenkins, and implementing CI/CD pipelines for automated deployments, environment provisioning, and pipeline versioning.
• Strong command of SQL and high-level programming languages including Scala, Java, Python, and C#, used for data transformation, analytics, and backend data services.
• Extensive experience with CI/CD pipelines, production deployments, debugging, monitoring, and troubleshooting across the full engineering lifecycle.
• Experienced in data governance, security compliance (HIPAA, PCI, and SOC2), RBAC, encryption, and implementing policy-driven access to ensure data trust, privacy, and regulatory readiness.
• Strong collaborator with engineering managers and FTE teams, following strict engineering practices, deployment standards, and compliance requirements in pre-prod and prod environments.
Technical Skills
Programming Languages
Python (Advanced), SQL, PySpark, Scala (Advanced – Apache Spark, Distributed Data Processing), Shell/Bash, JavaScript (for automation/REST integration)
Python Data & ML Libraries
Pandas, NumPy, PySpark, Dask, Polars, FastAPI / Flask, Scikit-Learn, TensorFlow/Keras (basic)
Big Data Ecosystem
Apache Spark (PySpark), Hadoop (HDFS, YARN, MapReduce), Hive, Impala, Presto, Trino, HBase, Kafka, Databricks, Snowflake
Data Security & Governance
RBAC, IAM, Encryption, HIPAA, PCI, SOC2, Secure Data Access, Audit Logging, Data Lineage
Cloud Platforms
AWS, Azure, GCP
Streaming & Real-Time Processing
Apache Kafka, Kinesis Streams
DevOps / CI-CD
Git, GitHub, GitLab, Bitbucket, Jenkins, Docker, Kubernetes
Data Engineering Tools
Delta Lake, Iceberg, Parquet/ORC/Avro
API & Microservices
REST APIs using FastAPI / Flask, GraphQL (basic)
Professional Experience
Client: NVIDIA, Seattle, WA Mar 2024 - Present
Position: Cloud Data Engineer
Responsibilities:
• Developed unified data models using Delta Lake, Snowflake, and BigQuery for customers, accounts, transactions, rewards, and fraud signals to power analytics and real-time decisioning.
• Designed, developed, and optimized large-scale distributed data pipelines using Apache Spark (Scala) to process high-volume, high-velocity datasets supporting AI/ML and analytics workloads.
• Built and maintained Azure-based Big Data platforms, leveraging ADLS Gen2, HDInsight, and Azure Data Factory, ensuring scalable, fault-tolerant, and high-throughput data processing.
• Implemented Cosmos DB (MapReduce-style workloads) for distributed data ingestion, transformation, and aggregation across petabyte-scale datasets.
• Built and operated Hadoop-based data processing pipelines using HDFS, YARN, and Spark on Azure HDInsight for large-scale batch analytics.
• Developed complex SQL queries and data models to support downstream analytics, reporting, and feature engineering use cases.
• Tuned Spark on Azure for optimal performance by configuring executors, memory management, partitioning, and shuffle behavior across HDInsight clusters.
• Applied CI/CD best practices using Azure DevOps pipelines for automated build, test, deployment, and rollback of Spark jobs, data pipelines, data services, and infrastructure configurations.
• Led security hardening initiatives, including remediation of code vulnerabilities, secret and identity management, secure endpoint configuration, and compliance with enterprise security standards.
• Supported data governance and compliance requirements, implementing data partitioning strategies, access controls, and physical/logical security restrictions.
• Performed deep debugging and troubleshooting of production issues across Spark jobs, distributed services, and Azure infrastructure, including Spark failures, data latency, and access issues, ensuring high availability and reliability.
• Engineered batch and streaming pipelines using Spark Structured Streaming, Kafka, and Event Hubs to track the full lifecycle of card activity, from activation to collections (see the streaming sketch after this list).
• Implemented secure data engineering practices including encryption at rest and in transit, IAM-based access controls, and RBAC to protect sensitive financial and customer data.
• Implemented monitoring, alerting, and SLA tracking for Databricks jobs using Datadog, CloudWatch, and Spark logs.
• Integrated Cosmos DB with Spark on Azure (HDInsight) for distributed processing and advanced analytics workflows.
• Implemented Azure security best practices, including Managed Identities, Key Vault integration, RBAC, private endpoints, and network isolation.
• Continuously optimized cost and performance of Azure storage and Spark compute resources.
• Built real-time BNPL and instant credit decision pipelines using Kafka and Spark Streaming.
• Engineered fraud prevention datasets using Kafka, Spark, Python, and real-time rules engines for device intelligence, synthetic identity scoring, and chargeback patterns.
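Illustrative sketch of the Kafka-to-Delta streaming pattern described above, in PySpark; the broker, topic name, event schema, and paths are assumed placeholders rather than production values, and the Spark Kafka connector is assumed to be available:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("card-activity-stream").getOrCreate()

    # Assumed event shape for card-activity messages (activation, purchase, collections, ...).
    event_schema = StructType([
        StructField("card_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the raw Kafka stream and parse the JSON payload into typed columns.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "card-activity")              # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # Append parsed events to a Delta table, with checkpointing for restartability.
    (events.writeStream
        .format("delta")
        .option("checkpointLocation", "/chk/card_activity")  # placeholder path
        .outputMode("append")
        .start("/delta/card_activity"))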
Client: Bread Financials, Charlotte, NC Feb 2022 – Feb 2024
Position: ETL / Big Data Engineer
Responsibilities:
• Worked on daily credit card and payment transaction data, building ETL jobs in Spark and SQL to load data from source systems into the data warehouse.
• Handled large transaction files and streaming feeds from digital payment and BNPL systems, making sure data was processed on time for reporting and analytics.
• Developed Spark jobs to calculate balances, fees, interest, and installment amounts for credit card and financing products.
• Integrated data from core banking, POS systems, and external APIs to support savings accounts, deposits, and payment history tracking.
• Added data validation and reconciliation checks to compare source data with target tables and catch data issues early (see the reconciliation sketch after this list).
• Tuned Spark jobs and SQL queries to improve performance and reduce processing time, especially during month-end and peak loads.
• Worked closely with business analysts and product teams to understand requirements and make changes based on real business scenarios.
• Supported analytics and dashboard teams by providing trusted datasets for customer behavior, credit usage, and financial wellness insights.
• Followed security and compliance standards by masking sensitive data, controlling access, and supporting audit and compliance requests.
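Illustrative sketch of the source-vs-target reconciliation check described above, in PySpark; the table names, load-date filter, and amount column are assumptions for the example:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("recon-check").getOrCreate()

    # Compare the staged daily feed against what actually landed in the warehouse.
    src = spark.table("staging.daily_transactions")              # placeholder table
    tgt = spark.table("warehouse.fact_transactions").filter(
        F.col("load_date") == "2024-01-31")                      # placeholder partition

    src_stats = src.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()
    tgt_stats = tgt.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()

    # Fail the load early on drift so bad data never reaches reporting;
    # in practice a small tolerance on the sum is usually allowed.
    if src_stats["rows"] != tgt_stats["rows"] or src_stats["total"] != tgt_stats["total"]:
        raise ValueError(f"Reconciliation failed: source {src_stats} vs target {tgt_stats}")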
Client: Merck, Kenilworth, NJ Sep 2020 – Jan 2022
Position: Data Engineer
Responsibilities:
• Built end-to-end data pipelines, healthcare domain models, and cloud-native data products that unify medical, pharmacy, behavioral, and employer claims into a single analytics-ready platform.
• Designed healthcare domain data models (ICD, CPT, NDC, DRG, NPI, BlueCard) to support cross-service analytics across medical, Rx, and out-of-state claims.
• Designed and implemented enterprise-scale, Azure-native data platforms and pipelines using Azure Data Factory, ADLS Gen2, HDInsight, and Azure Compute, supporting clinical, manufacturing, and research data workloads.
• Built and maintained distributed processing frameworks and end-to-end pipelines with Apache Spark (Scala) for ingestion, transformation, and storage of high-volume structured and semi-structured datasets in regulated environments.
• Implemented Cosmos DB for scalable ingestion and aggregation of high-volume datasets across regulated environments.
• Optimized Spark on Azure by tuning executors, memory allocation, partitioning strategies, and shuffle performance to meet enterprise throughput and reliability requirements.
• Authored complex SQL queries and data models to support analytics, reporting, and downstream data consumption across R&D and business teams.
• Tuned Cosmos DB RU/s provisioning, autoscale settings, and throughput limits to balance performance, cost efficiency, and compliance requirements.
• Ensured HIPAA-compliant data processing by implementing encryption, role-based access controls, audit logging, and secure data handling across healthcare claims pipelines.
• Implemented automated data quality validation, including duplicate claim checks, eligibility mismatches, PA validations, and provider network mapping (see the duplicate-claim sketch after this list).
• Developed preventive-care and wellness data products such as screening-due lists, wellness-reward metrics, and population-health outreach datasets.
• Built and optimized SQL-based analytical queries and transformations in BigQuery alongside Snowflake and Databricks to support healthcare claims reporting, regulatory analytics, and operational insights.
• Engineered behavioral/mental-health integration pipelines combining therapy sessions, authorizations, diagnosis codes, and claims with HIPAA safeguards.
• Supported GxP, SOX, and data governance compliance, implementing data partitioning, access controls, audit logging, and validation documentation.
• Enforced Azure security best practices, including Managed Identities, Azure Key Vault integration, RBAC, private endpoints, and encryption at rest and in transit.
• Led security and data protection initiatives covering identity and access management (IAM), secret rotation, and secure endpoint configuration, aligned with pharma security standards.
• Created BlueCard out-of-state claims workflows, mapping external identifiers and producing consolidated datasets for finance and actuarial teams.
• Delivered REST APIs and data marts supporting member portals, provider portals, claims history dashboards, and mobile app insights.
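Illustrative sketch of the duplicate-claim validation described above, in PySpark; all table and column names are assumed placeholders, not the actual claims schema:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claim-dedupe").getOrCreate()
    claims = spark.table("staging.medical_claims")  # placeholder table

    # A claim is treated as a duplicate if it repeats the same member, provider,
    # service date, and procedure code; the earliest received claim wins.
    key = ["member_id", "provider_npi", "service_date", "cpt_code"]
    w = Window.partitionBy(*key).orderBy(F.col("received_ts").asc())

    ranked = claims.withColumn("rn", F.row_number().over(w))
    ranked.filter("rn = 1").drop("rn").write.mode("overwrite").saveAsTable("curated.medical_claims")
    ranked.filter("rn > 1").drop("rn").write.mode("append").saveAsTable("audit.duplicate_claims")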
Client: Root Insurance, Columbus, OH Nov 2018 – Jul 2020
Position: Data Engineer
Responsibilities:
• Designed and developed end-to-end ETL/ELT pipelines using Python, SQL, PySpark, Airflow, and Databricks to process high-volume transactional, behavioral, and market datasets.
• Built real-time streaming pipelines using Kafka and Spark Streaming to process credit card transactions, payments, fraud alerts, trade events, and online banking activity with millisecond latency.
• Developed MapReduce-style data processing workflows using Cosmos DB for scalable aggregation, transformation, and parallel execution across large data partitions.
• Engineered Python microservices and APIs (FastAPI/Flask) that integrate data from retail banking systems, payment networks, trading platforms, wealth management applications, and fraud detection tools (see the FastAPI sketch after this list).
• Monitored and troubleshot Cosmos DB performance issues, including throttling, hot partitions, and query inefficiencies.
• Developed Customer 360 datasets by ingesting data from multiple channels (mobile app, ATM, cards, deposits, call center, CRM) to support personalization, segmentation, and behavioral analysis.
• Streamlined commercial banking data flows by building ingestion pipelines for loan portfolios, treasury transactions, credit line utilization, and customer onboarding.
• Processed and enriched market data (equities, bonds, FX, derivatives) to support portfolio analytics, wealth advisory tools, and risk scoring models for high-net-worth clients.
• Implemented secure, real-time pipelines for ACH, wire, SWIFT, RTP, Zelle, and corporate treasury transactions, ensuring high availability and zero data loss.
• Built real-time fraud detection data pipelines, integrating card transactions, login attempts, device fingerprints, and behavioral biometrics with ML scoring engines.
• Automated risk and compliance data workflows supporting AML monitoring, credit risk, market risk, liquidity risk, cybersecurity alerts, and regulatory reporting.
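Illustrative sketch of a FastAPI data service of the kind described above; the endpoint, response model, and in-memory lookup are stand-ins for the real banking integrations:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="customer-360-api")

    class CustomerProfile(BaseModel):
        customer_id: str
        segment: str
        risk_score: float

    # Placeholder store; in practice this would be backed by the Customer 360 tables.
    PROFILES = {"c-1001": CustomerProfile(customer_id="c-1001", segment="retail", risk_score=0.12)}

    @app.get("/customers/{customer_id}", response_model=CustomerProfile)
    def get_profile(customer_id: str) -> CustomerProfile:
        profile = PROFILES.get(customer_id)
        if profile is None:
            raise HTTPException(status_code=404, detail="customer not found")
        return profile

Served locally with, for example, uvicorn app:app --reload.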
Client: Somerset Savings Bank, NJ Sep 2017 – Oct 2018
Position: Associate Software Developer
Responsibilities:
• Modeled customer, account, transaction, and loan data to support cross-product analytics covering checking, savings, CDs, mobile activity, loan performance, and debit-card usage.
• Integrated online and mobile banking data (logins, transactions, bill payments, alerts, mobile deposits) to build real-time behavioral monitoring and digital engagement dashboards.
• Engineered business-banking data pipelines to process ACH files, wire transfers, merchant transactions, and small-business loan applications with automated validation rules.
• Developed lending-data workflows to process loan applications, credit reports, repayment schedules, interest calculations, and delinquency metrics for retail and business credit products.
• Built ATM & debit-card transaction pipelines, normalizing switch logs, fraud-alerts, denied transactions, and surcharge activity for fraud detection and reconciliation.
• Automated KYC/AML checks by integrating internal banking data with third-party sources and applying rule-based validation for high-risk and unusual activity patterns (see the rule-based screen after this list).
• Implemented automated data quality checks for account mismatches, duplicate transactions, loan-balance inconsistencies, and card-network reconciliation issues.
• Developed secure data marts powering online banking dashboards, branch performance reports, loan-risk models, and customer financial-health insights.
• Ensured regulatory compliance (FDIC, FFIEC, AML, KYC) through encrypted pipelines, audit logs, role-based access, anonymization, and automated monitoring.
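Illustrative sketch of a rule-based KYC/AML screen like the one described above, in Python/pandas; the thresholds, file paths, and columns are assumptions for the example:

    import pandas as pd

    THRESHOLD_AMOUNT = 10_000          # illustrative structuring threshold
    MAX_COUNTERPARTIES_PER_DAY = 15    # illustrative velocity rule

    txns = pd.read_parquet("daily_transactions.parquet")  # placeholder input

    # Aggregate each account's daily cash total and distinct counterparties.
    daily = txns.groupby(["account_id", "txn_date"]).agg(
        total_cash=("amount", "sum"),
        counterparties=("counterparty_id", "nunique"),
    ).reset_index()

    # Route anything tripping a rule to a manual-review queue.
    flags = daily[
        (daily["total_cash"] > THRESHOLD_AMOUNT)
        | (daily["counterparties"] > MAX_COUNTERPARTIES_PER_DAY)
    ]
    flags.to_parquet("aml_review_queue.parquet")          # placeholder output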
EDUCATION:
• Bachelor's in Electrical and Electronics Engineering from JNTU Kakinada
• Master's in Computer Science from Northwestern Polytechnic University