Name: Anjala Masadi
Senior Data Engineer
Contact: +1-571-***-****
Email: ****************@*****.***
Professional Summary
Accomplished Senior Data Engineer with 12+ years of experience designing and optimizing enterprise-scale data platforms across AWS, Azure, and GCP. Skilled in Python, Pandas, Apache Spark, Kafka, BigQuery, Amazon Redshift, Snowflake, and Data Vault/Dimensional Modeling, with strong expertise in ETL/ELT orchestration using AWS Glue, Cloud Dataflow, Apache Airflow, Apache NiFi, and Databricks. Proficient in CI/CD, Terraform, Docker, and Kubernetes, with proven ability to embed compliance standards (PCI-DSS, SOX, HIPAA, Basel III/IV, GDPR) into secure, governed, and high-performance data ecosystems.
Proficient in Scrum and Kanban, with expertise in using Azure Boards to drive Agile-based project execution while embedding compliance requirements from PCI-DSS, SOX, GLBA, HIPAA, Basel III/IV, and GDPR.
Expert in Python programming, applying Pandas and R for data analysis, and incorporating testing frameworks like Pytest and unittest to ensure accuracy and reliability of pipelines.
Specialized in large-scale ETL/ELT orchestration using Cloud Dataflow, Apache Beam, AWS Glue (Studio, Workflows, Crawlers), Azure Data Factory, Mapping Data Flows, Apache NiFi, Apache Sqoop, and Informatica PowerCenter.
Strong knowledge of relational and cloud-native databases including PostgreSQL, Cloud SQL, Amazon RDS, SQL Server, MySQL, Amazon Aurora, Azure SQL Database, and Azure Cosmos DB.
Adept at advanced modeling approaches such as Data Vault, Dimensional Modeling, and 3NF, supported by seamless integration through PyODBC and ODBC connectors.
Experienced in developing and exposing APIs using FastAPI and Flask, secured with API Gateway, Azure API Management, and Amazon Cognito for authentication and authorization.
Hands-on expertise with modern compute platforms including Cloud Run, AWS Lambda, AWS Step Functions, Amazon ECS, Fargate, Amazon EKS, GKE, AKS, and Docker for containerization and serverless processing.
Advanced knowledge of cloud analytics platforms such as BigQuery, Amazon Redshift, Redshift Spectrum, Amazon Athena, and Azure Synapse Analytics, with experience implementing Delta Lake in Azure Databricks environments.
Skilled in big data processing using HDFS, Dataproc, Amazon EMR, EMR Serverless, HDInsight, and Apache Spark, including Spark Structured Streaming for real-time workloads.
Strong exposure to event-driven and streaming technologies like Kafka, Amazon MSK, Amazon Kinesis Data Streams, Kinesis Data Analytics, Azure Event Hubs, and Google Pub/Sub.
Proficient in DAG-based workflow management using Cloud Composer (Airflow), Apache Oozie, and Integration Runtimes to orchestrate multi-cloud pipelines.
Experienced in enterprise reporting and BI using Looker, Amazon QuickSight, Tableau, QlikView, Power BI, and Crystal Reports for business insights.
Advanced skills in ML platforms including Vertex AI, Azure Machine Learning, Amazon SageMaker, and SageMaker Model Monitor, as well as LLM integration, incorporating MLOps into data workflows.
Adept at observability and monitoring through Prometheus, Grafana, Cloud Monitoring, Cloud Logging, Amazon CloudWatch, AWS X-Ray, AWS CloudTrail, Azure Monitor, Log Analytics, and Application Insights.
Specialized in data governance and cataloging with Google Data Catalog, Azure Purview, AWS Glue Data Catalog, and IBM Guardium to ensure compliance and metadata control.
Skilled in infrastructure automation with Terraform, AWS CloudFormation, Azure ARM Templates, and Bicep, combined with security practices in VPC, IAM, GCP IAM, RBAC, Security Groups, Subnets, KMS, AWS KMS, Azure Key Vault, and Customer Managed Keys (CMK).
Experienced with CI/CD integration and DevOps pipelines using GitHub, GitHub Actions, Jenkins, AWS CodePipeline, CodeBuild, CodeDeploy, CodeCommit, Azure DevOps, and SonarQube.
Familiar with additional big data and transformation tools such as Pig, U-SQL in Azure Data Lake Analytics, and Alteryx for specialized workflows.
Recognized for combining technical depth with leadership, enabling effective delivery of Agile and compliance projects, mentoring teams, and optimizing solutions across multi-cloud data engineering platforms.
Certifications
Google Cloud Certified: Professional Data Engineer.
AWS Certified Data Engineer - Associate.
Microsoft Certified: Azure Data Engineer Associate.
Professional Experience
Fifth Third Bank, Washington, D.C.
Apr 2022 – Present
Senior Data Engineer
Project Description:
At Fifth Third Bank, I modernized risk and fraud detection platforms on Google Cloud Platform (GCP) by engineering scalable ETL/ELT pipelines with Python, Pandas, Cloud Dataflow (Apache Beam), and Dataproc (Apache Spark). I designed relational and analytical models in PostgreSQL, Cloud SQL, and BigQuery using Data Vault and Dimensional Modeling, ensuring auditability and compliance with PCI-DSS, SOX, and GLBA. The platform integrated Google Pub/Sub, Kafka, Vertex AI, and Looker, with Docker, Cloud Run, GKE, Terraform, Jenkins, and Data Catalog enabling real-time fraud detection, governance, and audit readiness.
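As a hedged illustration of the streaming ingestion pattern described above, the sketch below reads card-transaction events from a hypothetical Pub/Sub topic and appends parsed records to a hypothetical BigQuery table; the project, topic, and table names are assumptions, not the bank's actual resources.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(msg_bytes):
    # Pub/Sub delivers raw bytes; decode and parse JSON, dropping bad payloads
    # (a real pipeline would route them to a dead-letter sink instead).
    try:
        return [json.loads(msg_bytes.decode("utf-8"))]
    except (UnicodeDecodeError, json.JSONDecodeError):
        return []


options = PipelineOptions(streaming=True, project="example-project")  # hypothetical project

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadTransactions" >> beam.io.ReadFromPubSub(
            topic="projects/example-project/topics/card-transactions")  # hypothetical topic
        | "ParseJson" >> beam.FlatMap(parse_event)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:risk.transactions",                        # hypothetical table
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )

The same pipeline shape extends to the CDC-aware batch feeds covered in the responsibilities below.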
Responsibilities:
Directed Agile delivery using Scrum, aligning engineering, finance, and compliance teams to deliver enterprise-grade data platforms that adhered to PCI-DSS, SOX, and GLBA standards, while ensuring sprint deliverables matched evolving regulatory and business needs.
Engineered scalable ETL/ELT pipelines using Python, Pandas, and Cloud Dataflow (Apache Beam), processing transactions, loan applications, and payment events at scale, and optimizing for near real-time credit and risk analytics.
Designed and optimized relational schemas in PostgreSQL on Cloud SQL, implementing 3NF, Star Schema, and Dimensional Models, and applied Data Vault modeling for lineage tracking and audit-ready financial data structures.
Enabled secure hybrid integration by bridging on-premises systems with Google Cloud via PyODBC connectors, enforcing encryption, tokenization, and governance-aligned data contracts.
Developed and exposed secure RESTful APIs using FastAPI on Cloud Run, publishing customer profiles, risk metrics, and transaction histories for fraud detection dashboards and partner integrations.
Structured datasets in PostgreSQL and Cloud SQL to support delinquency prediction, portfolio analysis, and customer segmentation, applying surrogate keys and partition strategies for performance tuning.
Leveraged BigQuery for historical and streaming transaction analysis, improving fraud detection and credit approval workflows through targeted anomaly detection and ML-based alerts.
Streamlined ingestion pipelines using Cloud Dataflow (Apache Beam) to process dynamic, schema-evolving third-party data feeds, integrating change data capture (CDC) patterns for resiliency.
Modernized legacy data platforms by migrating from HDFS to Google Cloud Storage (GCS) and Dataproc (Apache Spark), enabling scalable big data transformations and reducing operational overhead.
Delivered high-throughput data processing frameworks on Cloud Dataflow and Apache Spark (Dataproc), optimizing DAG execution, caching, and shuffle strategies to enhance fraud alert precision and latency.
Implemented real-time streaming with Google Pub/Sub and Kafka for automated anomaly detection, over-limit monitoring, and fraud rule enforcement, scaling throughput with topic partitions and consumer groups.
Built executive dashboards in Looker, embedding drill-down reports on loan growth, delinquency trends, fraud KPIs, and compliance thresholds, aligned with Basel III/IV stress testing metrics.
Redeveloped risk-scoring models using Apache Beam and Dataproc Spark pipelines, improving default prediction accuracy and enabling regulatory stress simulations.
Automated orchestration of ETL pipelines with Cloud Composer (Airflow), applying task retries, SLA monitoring, and dependency chaining for mission-critical compliance workflows (a minimal DAG sketch follows this list).
Operationalized ML models in Vertex AI, deploying fraud detection, churn prediction, and credit scoring models, and integrated prompt engineering with LLMs for unstructured data insights from contracts, transcripts, and audit notes.
Provisioned secure infrastructure with Terraform, GCP IAM, VPC, and KMS, embedding secrets management, network segmentation, and encryption-at-rest to safeguard customer and payment datasets.
Containerized lightweight ETL and API microservices using Docker, deployed to Cloud Run and GKE, enabling horizontal scaling, blue/green rollouts, and fault-tolerant reconciliation jobs.
Established CI/CD frameworks with GitHub Actions and Jenkins, integrating automated pytest and unittest test suites, code linting, and validation gates for reliable pipeline releases.
Integrated Cloud Monitoring, Cloud Logging, and Prometheus with pipelines and ML workflows, building observability dashboards for anomaly trends, error tracking, SLA compliance, and proactive incident management.
Administered data lakes and warehouses on BigQuery and GCS, implementing tiered storage policies, fine-grained ACLs, and metadata management with Google Data Catalog for audit-ready governance.
Retired monolithic ETL systems by rebuilding pipelines with Apache Beam on GCS, Pig, and Dataproc Spark, increasing throughput, compliance traceability, and long-term maintainability.
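Referenced from the orchestration bullet above, a minimal Cloud Composer (Airflow) DAG sketch showing retries, an SLA hook, and dependency chaining; the dag_id, schedule, and task commands are hypothetical placeholders rather than the production workflow.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                         # task retries, as noted above
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),            # SLA monitoring hook
}

with DAG(
    dag_id="daily_risk_feed",             # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> transform >> load          # dependency chaining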
Cigna, Blue Ridge, GA
Senior Data Engineer
Feb 2021 – Mar 2022
Project Description:
At Cigna, I engineered scalable ETL/ELT pipelines using Python, AWS Glue, DMS, and Glue Workflows, consolidating patient, encounter, and claims data into Amazon RDS, Aurora, Redshift, and Athena while applying Data Vault and Dimensional Modeling for compliance with HIPAA, SOX, and PCI-DSS. I enabled real-time streaming with Amazon Kinesis and MSK (Kafka), and built secure microservices using AWS Lambda, API Gateway, and Cognito to power claims validation, eligibility checks, and provider dashboards. The platform was strengthened with Terraform, CloudFormation, SageMaker, QuickSight, Lake Formation, GitHub, Jenkins, CloudWatch, Prometheus, and Grafana, improving anomaly detection, predictive healthcare analytics, and audit-ready governance.
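A minimal AWS Glue (PySpark) job sketch of the batch-curation pattern described above, assuming a hypothetical Glue Data Catalog database, table, and S3 path; it is an illustrative shape, not the production job.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw claims feed from the Glue Data Catalog (hypothetical names).
claims = glue_context.create_dynamic_frame.from_catalog(
    database="healthcare_raw", table_name="claims")

# Keep and type a small set of columns; real mappings would be far broader.
curated = ApplyMapping.apply(
    frame=claims,
    mappings=[("claim_id", "string", "claim_id", "string"),
              ("member_id", "string", "member_id", "string"),
              ("billed_amount", "double", "billed_amount", "double")])

# Write curated Parquet to a hypothetical S3 prefix.
glue_context.write_dynamic_frame.from_options(
    frame=curated,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/claims/"},
    format="parquet")

job.commit()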
Responsibilities:
Spearheaded Agile workflows using Scrum to streamline data delivery across provider networks, EMRs, and claims platforms, accelerating the release of analytical insights across regional centers while ensuring compliance with HIPAA, SOX, and PCI-DSS mandates.
Designed scalable ingestion and transformation frameworks in Python, embedding Pytest and unittest for automated quality checks across EHR, lab, and claims pipelines, minimizing data drift and ensuring accuracy of clinical and operational feeds.
Modeled healthcare datasets using Data Vault, Dimensional Modeling, and 3NF techniques in Amazon RDS (SQL Server, PostgreSQL, MySQL) and Amazon Aurora, consolidating patient histories, encounters, and care events into performant query layers for analytics.
Enabled integration of decentralized clinical sources via AWS Glue connectors and AWS Data Migration Service (DMS), streamlining ingestion from 300+ EMRs and laboratory systems with schema validation and lineage enforcement.
Developed secure, tokenized microservices with AWS Lambda, API Gateway, and Amazon Cognito, exposing APIs for eligibility checks, claims validation, and care dashboards used by providers and payer systems (see the handler sketch following this list).
Managed large-scale analytics marts in Amazon Redshift, integrating Redshift Spectrum and Amazon Athena to accelerate exploratory queries and support population health, readmission, and utilization KPIs.
Automated billing, pharmacy, and clinical note ingestion with AWS Glue Studio and AWS Glue Workflows, reducing manual intervention and standardizing data quality across hundreds of feeds.
Streamlined real-time patient event capture with Amazon Kinesis Data Streams and Amazon MSK (Kafka), integrating anomaly detection pipelines for alerts on adverse events, denied claims, and utilization anomalies.
Delivered operational and clinical dashboards in Amazon QuickSight, secured with AWS Lake Formation, while leveraging Prometheus and Grafana to build observability layers for real-time clinical system monitoring.
Executed high-volume Apache Spark workloads on Amazon EMR and EMR Serverless, generating cohorts, analyzing episodes of care, and enabling predictive analytics for follow-up and readmission risk.
Coordinated multi-stage workflows with AWS Step Functions, Amazon EventBridge, and Apache Airflow, aligning claims, eligibility, and compliance pipelines with SLA tracking and failure recovery.
Operationalized predictive ML models in Amazon SageMaker, monitored with SageMaker Model Monitor, producing risk scores, no-show predictions, and care pathway optimizations integrated into underwriting and provider systems.
Deployed containerized ETL and API workloads with Amazon ECS (Fargate) and Amazon EKS (Kubernetes), implementing Git-driven CI/CD practices to support versioning, code reviews, and collaborative delivery.
Provisioned HIPAA-compliant infrastructure with Terraform and AWS CloudFormation, embedding VPC isolation, IAM governance, and encryption with AWS KMS to meet strict regulatory controls.
Implemented event-driven notifications via Amazon SNS and queued workflows using Amazon SQS, delivering timely escalations to clinicians and compliance teams on urgent patient and claims events.
Administered metadata lineage with AWS Glue Data Catalog and AWS Lake Formation, ensuring schema traceability, governance, and readiness for internal and external compliance audits.
Hardened security posture with AWS Secrets Manager, IAM roles, and KMS, preventing credential leakage and ensuring encryption-at-rest/in-transit for sensitive PHI and claims data.
Established CI/CD automation with AWS CodePipeline, CodeBuild, CodeDeploy, CodeCommit, GitHub, and Jenkins, enabling rollback strategies, unit test coverage, and streamlined production releases.
Monitored enterprise pipelines and applications with Amazon CloudWatch, AWS X-Ray, AWS CloudTrail, alongside Prometheus and Grafana, providing full-stack observability, SLA monitoring, and compliance tracking.
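Referenced from the microservices bullet above, a minimal AWS Lambda handler sketch for an API Gateway proxy integration; the field names and eligibility logic are illustrative assumptions, not the deployed service.

import json


def handler(event, context):
    # API Gateway proxy integration passes query parameters in the event.
    member_id = (event.get("queryStringParameters") or {}).get("member_id")
    if not member_id:
        return {"statusCode": 400,
                "body": json.dumps({"error": "member_id is required"})}
    # A real implementation would check coverage in RDS/Aurora and honor
    # Cognito-issued scopes; this stub only echoes a mock response.
    return {"statusCode": 200,
            "body": json.dumps({"member_id": member_id, "eligible": True})}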
Federal Financial, Topeka, KS / Louisville, KY
Senior Data Engineer
Nov 2018 – Jan 2021
Project Description:
At Federal Financial, I engineered Python ETL pipelines with Azure Data Factory and modeled data in Azure SQL, Cosmos DB, and Data Lake Gen2 using Data Vault and Dimensional Modeling under SOX, Basel III, HIPAA, and PCI-DSS compliance. I migrated legacy workloads to Azure Databricks Delta Lake, enabled real-time analytics with Event Hubs/Stream Analytics, and deployed ML models in Azure Machine Learning with MLOps in Azure DevOps. The platform was secured with Azure Purview, Key Vault, ARM/Bicep, automated through CI/CD (DevOps, GitHub Actions, Jenkins), and visualized in Power BI with full observability via Azure Monitor and Grafana.
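A hedged PySpark sketch of the Delta Lake upsert pattern underpinning the Databricks migration described above; the ADLS paths and the transaction_id join key are assumptions for illustration only.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming batch of transactions landed as Parquet (hypothetical ADLS path).
incoming = spark.read.parquet(
    "abfss://raw@examplelake.dfs.core.windows.net/transactions/")

# Existing curated Delta table (hypothetical path).
target = DeltaTable.forPath(
    spark, "abfss://curated@examplelake.dfs.core.windows.net/transactions_delta/")

# ACID upsert keyed on a hypothetical transaction_id column.
(target.alias("t")
    .merge(incoming.alias("s"), "t.transaction_id = s.transaction_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())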
Responsibilities:
Directed SDLC initiatives with Kanban on Azure Boards, aligning legacy modernization with SOX, Basel III, HIPAA, and PCI-DSS compliance cycles to enable coordinated delivery across risk, data, and IT teams.
Engineered resilient ETL frameworks in Python with Azure Data Factory Mapping Data Flows, embedding Pytest and unittest validations to ensure anomaly detection and high data quality for loan origination, transactions, KYC, and account logs.
Modeled and optimized schemas using Data Vault, Dimensional Modeling, and 3NF techniques in Azure SQL Database and Azure Cosmos DB, accelerating regulatory reporting, credit scoring, and KPI tracking across banking workflows.
Integrated data from core banking, CRM, and third-party providers via ODBC connectors into Azure Data Lake Storage Gen2, enabling secure ingestion of millions of customer profiles and transactions with lineage traceability.
Deployed secure RESTful APIs through Azure API Management, protected with Azure Active Directory (AAD), OAuth2, and Managed Identities, exposing fraud metrics and risk indicators for downstream compliance tools.
Constructed high-volume financial marts in Azure Synapse Analytics, leveraging PolyBase and Serverless SQL Pools to deliver profitability, capital adequacy, and portfolio insights across global regions.
Automated ingestion and reconciliation pipelines with Azure Data Factory Event Triggers and Integration Runtimes, processing incremental data feeds from credit bureaus, payment gateways, and regulatory reporting agencies.
Migrated legacy ETL workloads from Pig and HDInsight into Azure Databricks Delta Lake, ensuring ACID compliance, scalability, and high-performance financial extracts for large-scale datasets.
Enabled real-time streaming of transactions with Azure Event Hubs and Azure Stream Analytics, providing fraud alerts, anomaly detection, and compliance-driven triggers for settlement workflows.
Delivered executive dashboards in Power BI, integrated with Azure Analysis Services, surfacing KPIs on loan defaults, portfolio churn, SLA breaches, and operational risk to senior executives.
Leveraged Azure HDInsight and Azure Data Lake Analytics (U-SQL) to conduct historical trend analysis, portfolio segmentation, and reconciliation of long-term risk and profitability data.
Orchestrated end-to-end pipelines with Apache Airflow on AKS and Azure Data Factory, enforcing SLAs and dependency chains for reconciliations, stress tests, and regulatory filings.
Operationalized ML pipelines through Azure Machine Learning with MLOps in Azure DevOps, deploying fraud detection and churn prediction models with automated retraining and monitoring cycles.
Deployed containerized microservices using Docker on AKS, secured with Azure Application Gateway (WAF), enabling scalable and protected workloads for transaction scoring and compliance checks.
Provisioned infrastructure with Azure ARM Templates and Bicep, embedding governance, security, and audit policies into deployments across Dev/QA/Prod environments.
Implemented CI/CD workflows with Azure DevOps Pipelines, GitHub Actions, Jenkins, and SonarQube, integrating unit and regression tests to enforce coding, security, and compliance standards.
Optimized compute and operational costs using Azure Virtual Machines, VM Scale Sets, and Azure Cost Management/Advisor, aligning infrastructure spend with enterprise financial governance.
Secured sensitive PII using Azure Key Vault, Customer Managed Keys (CMK), Azure Confidential Computing, and RBAC, enforcing encryption-in-transit and role-based access for regulatory compliance (a short Key Vault sketch follows this list).
Established governance with Azure Purview and Azure Data Catalog, integrating lineage, metadata, and business glossaries, while monitoring pipeline health with Azure Monitor, Log Analytics, and Application Insights to ensure SLA adherence and audit readiness.
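Referenced from the PII-security bullet above, a minimal sketch of retrieving a secret from Azure Key Vault with a managed identity; the vault URL and secret name are placeholders, not the bank's resources.

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Uses the workload's managed identity when running in Azure.
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://example-vault.vault.azure.net",  # hypothetical vault
    credential=credential)

db_password = client.get_secret("sql-db-password").value  # hypothetical secret name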
State of Kentucky, KY
Data Engineer
Jul 2015 – Sep 2018
Project Description:
At the State of Kentucky, I engineered ETL pipelines using Python, Informatica, and Apache NiFi to integrate legacy and healthcare data into SQL Server, Oracle, RDS, and Redshift with governance via AWS Glue Catalog and Informatica MDM. I enabled secure data exchange through APIs, ODBC, and SFTP, developed analytical models in Redshift and SSAS, and delivered dashboards with Tableau and Power BI under HIPAA and PCI-DSS compliance. The platform was automated with Terraform and CloudFormation, monitored with CloudWatch/CloudTrail, and secured with IAM and KMS, improving compliance tracking and decision-making.
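As a hedged illustration of the extract-and-load pattern described above, the sketch below pulls a slice of claims data over ODBC, stages it in S3 as Parquet, and prepares a Redshift COPY; the DSN, credentials, bucket, table, and IAM role are all assumptions.

import boto3
import pandas as pd
import pyodbc

# Extract a slice of claims from the source SQL Server (hypothetical DSN).
src = pyodbc.connect("DSN=ClaimsWarehouse;UID=etl_user;PWD=placeholder")
df = pd.read_sql("SELECT claim_id, member_id, paid_amount FROM dbo.claims", src)

# Stage as Parquet in S3 (hypothetical bucket and key).
df.to_parquet("/tmp/claims.parquet", index=False)
boto3.client("s3").upload_file(
    "/tmp/claims.parquet", "example-staging-bucket", "claims/claims.parquet")

# COPY into Redshift from the staged file; run via any Redshift connection.
copy_sql = """
    COPY analytics.claims
    FROM 's3://example-staging-bucket/claims/claims.parquet'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""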
Responsibilities:
Led subscriber onboarding and provisioning pipelines with Apache NiFi, integrated into AWS Glue and AWS DMS, embedding Pytest/unittest validations to ensure high data quality and accelerating insight delivery during promotional launches and SIM activation cycles.
Architected scalable batch workflows using Apache Spark on Amazon EMR, processing large volumes of CDRs, device telemetry, and session logs into structured datasets designed with Data Vault and Dimensional Modeling for regulatory compliance and SLA-driven analytics.
Developed and deployed secure RESTful APIs with Flask, exposed via AWS API Gateway and AWS Lambda, enabling internal OSS, CRM, and provisioning systems to access real-time SIM lifecycle and subscriber usage metadata.
Synchronized subscriber data across CRM, OSS, and provisioning platforms using Apache Sqoop with Amazon RDS and Aurora PostgreSQL, ensuring entitlement accuracy, timely plan updates, and reconciliation of provisioning workflows.
Designed and optimized analytical marts in Amazon Redshift, incorporating dimensional schemas and fine-tuned queries through Amazon Athena, delivering real-time insights into dropped calls, throughput bottlenecks, and service availability trends.
Consolidated multi-source telecom feeds via Informatica into an Amazon S3 Data Lake, while cataloging and governing metadata with AWS Glue Data Catalog, enabling tower management, billing, and interconnect provider analytics for fraud detection and compliance.
Established real-time alerting pipelines using Apache Kafka on Amazon MSK and Amazon Kinesis Data Streams, configured with compliance triggers to detect SIM swaps, outages, and degraded service events, supporting PCI-DSS and SOX-aligned monitoring.
Enhanced streaming analytics by tuning Spark Structured Streaming on EMR with Kinesis Data Analytics, applying time-series and anomaly detection models to strengthen subscriber segmentation, fraud detection, and churn prediction (see the streaming sketch after this list).
Automated infrastructure provisioning using Terraform and AWS CloudFormation, deploying secure environments with IAM, VPC, Security Groups, Subnets, and EC2, embedding audit policies for GDPR and PCI-DSS adherence.
Managed compute and storage workloads on Amazon EC2, S3, EFS, and FSx, ensuring high availability for analytics pipelines and shared data zones, while applying lifecycle policies for cost control.
Delivered KPI-rich dashboards in Tableau, integrated with Amazon Redshift, Athena, and QuickSight, visualizing metrics such as ARPU, SLA adherence, utilization heatmaps, and fraud detection trends for executives and regulators.
Streamlined release cycles by implementing Jenkins with AWS CodePipeline and CodeBuild, embedding regression test suites, SonarQube scans, and automated rollbacks to support continuous delivery of 20+ telecom services.
Re-engineered legacy batch jobs from Apache Oozie into AWS Step Functions, orchestrating contract refreshes, pricing updates, and cross-carrier reconciliations while improving transparency and audit traceability.
Migrated and containerized workloads with Docker, deploying microservices on Amazon ECS with Fargate and Amazon EKS, delivering highly available provisioning APIs and real-time subscriber identity resolution services.
Monitored full-stack workloads with Amazon CloudWatch, CloudTrail, and AWS X-Ray, while extending observability with custom dashboards in Prometheus and Grafana to track SLA adherence and root-cause analytics.
Secured subscriber data through AWS KMS, Secrets Manager, and IAM, implementing credential rotation, encryption in transit/at rest, and role-based access controls aligned with PCI-DSS and GDPR standards.
Optimized cloud expenditures with AWS Cost Explorer and Trusted Advisor, aligning compute, storage, and network spend with enterprise governance frameworks and reducing operational costs across telecom workloads.
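Referenced from the streaming-analytics bullet above, a minimal Spark Structured Streaming sketch that consumes a hypothetical Kafka/MSK topic of network events and produces windowed counts per cell site; the brokers, topic, and schema are assumptions, and the job requires the Spark Kafka connector package.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("network-event-counts").getOrCreate()

schema = StructType([
    StructField("cell_site", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "b-1.example-msk:9092")  # hypothetical brokers
          .option("subscribe", "network-events")                      # hypothetical topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Five-minute event counts per cell site with a late-data watermark.
counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "cell_site")
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")          # a real job would sink to S3/Redshift
         .start())
query.awaitTermination()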
Clarify Health Solutions San Francisco, CA
ETL Developer
Dec 2012 – Jun 2015
Responsibilities:
Automated complex data transformation workflows using Alteryx, significantly reducing manual intervention and improving processing accuracy across cross-functional business use cases.
Performed exploratory data analysis with Pandas, uncovering trends and correlations from diverse datasets related to sales, finance and operational performance (a brief analysis sketch follows this list).
Built statistical models and segmentation frameworks using R, supporting strategic planning through variance forecasting and customer behavior analysis.
Designed and published interactive dashboards using QlikView, enabling near real-time KPI tracking for leadership across departments.
Delivered executive-level visualizations with Tableau, streamlining performance reporting and supporting high-impact business decisions.
Produced detailed financial and compliance reports using Crystal Reports, aligning outputs with regulatory standards and corporate policy requirements.
Tuned advanced SQL queries in MySQL, reducing data retrieval times and enhancing the responsiveness of reporting systems.
Developed dynamic, user-friendly UI components with JavaScript, enabling interactive filters and embedded analytics within internal dashboards.
Automated routine ETL operations using Python, improving script modularity, maintainability and execution efficiency across daily reporting cycles.
Contributed to the design and deployment of enterprise-grade pipelines using Informatica PowerCenter, supporting validated and timely data delivery to downstream systems.
Applied data access policies and compliance controls via IBM Guardium, conducting regular audits and producing detailed reports on sensitive data usage.
Centralized documentation management using Microsoft SharePoint, ensuring consistent access to ETL definitions, SQL scripts and business metadata.
Tracked project milestones, deliverables and cross-functional dependencies using Microsoft Project, maintaining alignment with business timelines.
Facilitated Agile-based delivery using Trello, coordinating backlog grooming, sprint tracking and cross-team collaboration for ongoing development.
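Referenced from the exploratory-analysis bullet above, a brief pandas sketch that summarizes revenue by region and checks a simple correlation; the file path and column names are assumptions for illustration.

import pandas as pd

# Load a hypothetical sales extract with an order_date column.
sales = pd.read_csv("sales_extract.csv", parse_dates=["order_date"])

# Total revenue by region, largest first.
by_region = (sales.groupby("region", as_index=False)["revenue"]
                  .sum()
                  .sort_values("revenue", ascending=False))
print(by_region.head())

# Simple correlation between discount depth and order revenue.
print(sales[["discount_pct", "revenue"]].corr())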