
Senior Data Engineer with Cloud & ETL Expertise

Location:
Jersey City, NJ
Posted:
January 16, 2026



Sravya Rao

New Jersey | 201-***-**** | ***********@*****.*** | https://www.linkedin.com/in/sravya-rao-0092892ab/

Professional Summary

Results-driven Senior Data Engineer with 9+ years of experience designing and implementing robust, scalable data solutions across the financial services, healthcare, telecommunications, and banking sectors. Proven track record in end-to-end data management, cloud migrations, and building enterprise-grade data pipelines. Expertise in GCP, AWS, and the modern data stack, with a strong focus on performance optimization, real-time processing, and delivering actionable insights for business intelligence.

Certifications

AWS Certified Solutions Architect – Professional

GCP Cloud Engineer - Associate

Professional Experience

Bank of America January 2025 – Present

Senior Data Engineer – Application Developer New Jersey, United States

•Built PySpark pipelines on Databricks to ingest and process recorded audio files (mobile, Teams, Skype, Webex), converting them into MP3 format, applying transformations, and securely transferring grouped recordings into AWS S3.

•Developed and maintained AWS Glue ETL jobs (PySpark and Python shell) for ingesting, transforming, and curating voice-related datasets.

•Integrated Airflow (MWAA) to orchestrate Glue pipelines, manage task dependencies, automate daily refreshes, and monitor end-to-end pipeline health (see the DAG sketch following this role).

•Integrated real-time streaming data for fraud event monitoring using AWS Kinesis and Spark Structured Streaming.

•Designed and maintained complex SQL queries, stored procedures, and data models to support ETL processes, reporting, and analytics.

•Architected cloud-native solutions on AWS to modernize legacy data infrastructure, reducing processing time by 40% and operational costs by 30%.

•Contributed to platform reliability by designing multi-AZ architectures, data backup strategies, and disaster recovery setups for core data components.

•Developed data transformation scripts and automation workflows using Python for cleansing, validation, and enrichment of datasets.

•Applied multiple architectural components (across the data, application, and business layers) in the design and development of solutions for client requirements.

•Implemented data governance by managing the Glue Data Catalog, maintaining metadata consistency, and aligning access policies with Lake Formation security standards.
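A minimal sketch of the MWAA-to-Glue orchestration pattern referenced above, assuming Airflow 2.4+ with the apache-airflow-providers-amazon package; the DAG name, Glue job name, and S3 prefix are hypothetical placeholders, not the actual Bank of America configuration.

```python
# Illustrative MWAA DAG that triggers a Glue ETL job once a day.
# All names and paths below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="voice_recordings_daily_refresh",  # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    curate_voice_data = GlueJobOperator(
        task_id="curate_voice_data",
        job_name="glue-voice-curation",  # hypothetical Glue job name
        script_args={"--source_prefix": "s3://bucket/raw/recordings/"},
        wait_for_completion=True,  # surfaces Glue failures as Airflow task failures
    )
```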

Centene Corporation September 2023 – January 2025

Senior Data Engineer – AI/ML Engineer St. Louis, Missouri

•Engineered scalable data pipelines on GCP using Dataflow, Dataproc, and BigQuery to process 10+ million healthcare records daily for predictive analytics.

•Built feature engineering pipelines using PySpark and Spark SQL to support machine learning models for member risk stratification and cost prediction.

•Designed and implemented a medallion-style data lake architecture (bronze, silver, gold layers) on GCP Cloud Storage for healthcare data.

•Developed real-time data ingestion pipelines using Pub/Sub and Cloud Functions to capture and process claims data with sub-second latency.

•Created automated data quality frameworks using Great Expectations and custom Python scripts, reducing data anomalies by 60%.

•Implemented Snowflake data warehouses for clinical datasets, integrating with GCP storage and Databricks pipelines; improved performance through materialized views and query optimization, reducing costs by 25%.

•Built an HL7 v2 ingestion pipeline, parsing OBR/OBX segments to persist raw and curated layers in Delta (see the parsing sketch following this role).

•Designed secure data ingestion frameworks for streaming HL7 healthcare feeds, enabling HIPAA-compliant data delivery.

•Integrated machine learning pipelines within Databricks for clinical data predictions and anomaly detection using scikit-learn and XGBoost.

•Designed FHIR R4 normalization pipelines to materialize Patient, Encounter, Observation, Condition, MedicationRequest, and Claim/ClaimResponse resources from HL7 and Epic Clarity, validated using the FHIR Validator as a CI step.

•Used Java MapReduce jobs for ingestion into BigQuery and managed transformations for clinical datasets.

•Monitored and maintained Hadoop cluster connectivity and security with Zookeeper, Hive, and Java tools.

•Managed Hadoop log files using Java-based parsers and established backup/disaster recovery strategies to safeguard data integrity.

•Used Terraform, Cloud Shell SDK, and Java APIs for infrastructure management, automation, and resource provisioning.

•Implemented a secure FHIR service behind API Management; enforced RBAC, audience-scoped OAuth2, and Key Vault–backed client secrets, and enabled audit logging to Log Analytics for HIPAA traceability.
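A minimal PySpark sketch of the HL7 v2 OBR/OBX parsing described above, assuming messages have already landed as text and that Delta Lake is available (as on Databricks); paths and column names are illustrative, and segment handling is simplified.

```python
# Illustrative parsing of raw HL7 v2 text into curated OBX rows; all paths
# and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hl7_obx_curation").getOrCreate()

raw = spark.read.text("s3://bucket/raw/hl7/")  # hypothetical raw-layer path

# HL7 v2 segments are carriage-return delimited; fields use the | separator.
segments = (
    raw.withColumn("segment", F.explode(F.split(F.col("value"), r"[\r\n]+")))
       .withColumn("fields", F.split(F.col("segment"), r"\|"))
)

obx = (
    segments.filter(F.col("fields")[0] == "OBX")
            .select(
                F.col("fields")[3].alias("observation_code"),   # OBX-3
                F.col("fields")[5].alias("observation_value"),  # OBX-5
            )
)

obx.write.format("delta").mode("append").save("s3://bucket/curated/obx/")
```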

IBM, India

Client: Comcast June 2021 – August 2022

Senior Data Engineer

•Led the migration of OTT data solutions from traditional data centers to GCP, ensuring scalability, reliability, and efficient resource management.

•Developed specifications and comprehensive data solutions using SQL and Java for extraction, transformation, and reporting.

•Developed and optimized ETL pipelines in Snowflake for structured and semi-structured data (100 TB+), integrating linear TV ratings, digital streaming data, and Nielsen audience measurement datasets.

•Leveraged Google Cloud Platform (GCP) for data processing and storage, including Composer (Apache Airflow), Cloud Storage, and BigQuery.

•Created and managed Apache Airflow DAGs in Composer to automate workflows for data ingestion and transformation of OTT content metadata, reducing manual intervention and improving data quality.

•Employed Google Cloud Functions (Java) for data loading and transformation, automating ingestion of CSV/Parquet files from GCS into BigQuery (a Python sketch of the equivalent load follows this role).

•Implemented Snowflake tasks and streams to automate incremental data loading, ensuring real-time analytics alongside BigQuery and Pub/Sub data streams.

•Processed and streamed real-time data from Google Pub/Sub into BigQuery using Cloud Dataflow with Java, enabling continuous analytics of viewership and performance.

•Developed SQL and Java-based scripts to validate data integrity and ensure seamless data flow across multiple applications.

•Configured and utilized Databricks for scalable data transformations and analysis, integrating with GCP and AWS services, including Java UDFs for data processing.

•Implemented monitoring and optimization of Apache Spark jobs within the GCP environment for efficient data processing.

•Managed access and security policies in GCP to maintain data protection, compliance, and role-based access controls.
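The GCS-to-BigQuery loading above was implemented as Java Cloud Functions; the following is a hedged Python sketch of the equivalent Parquet load, shown for illustration only, with the bucket, dataset, and table names as hypothetical placeholders.

```python
# Illustrative Python equivalent of the GCS -> BigQuery Parquet load;
# URIs and table identifiers are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://ott-metadata-bucket/exports/*.parquet",  # hypothetical GCS path
    "my-project.ott_analytics.content_metadata",   # hypothetical destination table
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```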

Accenture, India

Client: Silicon Valley Bank September 2020 – June 2021

Data Engineer / Big Data Solutions Engineer

•Designed and deployed big data solutions using Hadoop, Spark, and HBase to process financial transaction data at scale.

•Developed PySpark applications for data transformation and aggregation, processing 5TB+ of daily transaction data.

•Implemented Apache NiFi for automated data ingestion from multiple banking systems into a centralized data lake.

•Built data quality validation framework using Python and SQL, ensuring 99.9% data accuracy for regulatory compliance.

•Optimized Hive queries and Spark jobs through partitioning, caching, and broadcast joins, reducing execution time by 50% (see the broadcast-join sketch following this role).

•Built real-time data processing pipelines using Apache Spark, Kafka, and AWS Glue, enabling dynamic and event-driven analytics.

•Integrated AWS Glue and Databricks ETL pipelines with Snowflake, ensuring seamless ingestion of curated datasets.

•Built analytical views and data marts in Snowflake to support Tableau dashboards and business reporting.

•Built and optimized data pipelines within Databricks (Scala), focusing on data cleansing, event enrichment, aggregation, and preparation for reporting and downstream analytics.

•Designed and implemented complex Hive queries to extract and process data from multiple sources into HDFS, enabling efficient transformations and reporting.

•Developed and optimized SQL queries for data extraction, transformation, aggregation, and reporting across HDFS, Hive, and cloud data stores.

•Developed analytics dashboards in Databricks and Tableau, enabling stakeholders to visualize KPIs and operational metrics.

•Managed and optimized Hadoop clusters, ensuring data security, integrity, and compliance with governance standards.
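A minimal PySpark sketch of the broadcast-join optimization mentioned above; the table names, join key, and partition column are illustrative assumptions rather than the actual Silicon Valley Bank schema.

```python
# Illustrative broadcast join of a large transaction fact table against a
# small dimension; all dataset names and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("txn_enrichment").getOrCreate()

transactions = spark.read.parquet("hdfs:///data/transactions/")  # large fact table
branches = spark.read.parquet("hdfs:///data/branch_dim/")        # small dimension

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = transactions.join(broadcast(branches), on="branch_id", how="left")

# Partitioning the output by date lets downstream Hive queries prune reads.
enriched.write.mode("overwrite").partitionBy("txn_date").parquet("hdfs:///data/enriched/")
```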

Accenture, India

Client: HSBC Bank May 2019 – September 2020

AWS Data Engineer / Hadoop Developer India

•Migrated on-premises data warehouse to AWS Redshift, managing 20TB+ of financial data with zero downtime using Talend jobs.

•Developed scalable ETL pipelines using AWS Glue, Lambda, and Step Functions for automated data processing workflows.

•Built real-time data streaming solutions using AWS Kinesis to process banking transactions for risk management.

•Implemented data security and encryption standards using AWS KMS and IAM for compliance with financial regulations.

•Optimized S3 data lake structure with intelligent tiering and lifecycle policies, reducing storage costs by 40% (see the lifecycle-policy sketch following this role).

•Designed and implemented Oozie workflows for scheduling multiple Hive and Pig jobs.

•Worked with different file formats and converted Hive/SQL queries into Spark transformations.

•Handled data in batches through ETL processes using Talend and Unix shell scripting.

•Worked with an on-premises MongoDB database and other non-relational databases, alongside SQL Server workloads implemented using SSMS and T-SQL.

•Automated data refreshes in Power BI and scheduled and maintained SQL Server Agent jobs for Talend orchestration.

•Created automated data reconciliation processes using Python and SQL, ensuring data integrity across systems.
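A hedged boto3 sketch of the S3 lifecycle-policy pattern referenced above; the bucket name, prefix, transition windows, and expiration are hypothetical values, not the actual HSBC configuration.

```python
# Illustrative S3 lifecycle configuration: transition to Intelligent-Tiering,
# then Glacier, then expire; all values are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-raw-zone",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 730},  # drop raw objects after two years
            }
        ]
    },
)
```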

Sutherland Global Services LLC May 2016 – April 2019

AWS Data Engineer India

•Designed and implemented end-to-end ETL pipelines using AWS services (S3, Glue, EMR) to process customer collection data.

•Built Hadoop-based data processing frameworks using MapReduce and Hive for large-scale batch analytics, operating on a 105-node cluster.

•Developed Python and Shell scripts for data extraction, transformation, and loading from multiple source systems.

•Implemented data warehousing solutions on AWS Redshift with star schema design for optimized query performance.

•Developed four executive dashboards in Power BI to analyze Key Performance Indicators (KPIs) such as Accounts Receivable, Overdue Receivables, and disputes across multiple regions and Lines of Business (LOBs), enhancing collections team performance.

•Engaged in real-time data processing using Kafka, Spark Streaming, and Spark Structured Streaming (see the streaming sketch following this role).

•Managed large datasets by utilizing partitions, Spark In-Memory capabilities, Broadcast variables in Spark, and efficient joins and transformations during the ingestion process.

•Utilized AWS Glue for schema extraction, ensuring compatibility with Parquet and Avro file formats in Hive.

•Oversaw the import of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.

•Utilized Hive to create tables, load data from the local file system to HDFS, and perform transformations, event joins, and pre-aggregations before storing the data in HDFS.

•Created automated reporting solutions using SQL and Python, reducing manual reporting effort by 70%.

•Established data backup and disaster recovery procedures on AWS, ensuring business continuity.
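A minimal Spark Structured Streaming sketch of the Kafka ingestion pattern mentioned above, assuming the spark-sql-kafka connector is on the classpath; broker addresses, topic, and checkpoint paths are hypothetical.

```python
# Illustrative Kafka -> Parquet streaming pipeline; endpoints and paths are
# hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("collections_events_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
         .option("subscribe", "collections-events")  # hypothetical topic
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as binary; cast the payload before parsing downstream.
decoded = events.select(F.col("value").cast("string").alias("payload"))

query = (
    decoded.writeStream.format("parquet")
           .option("path", "s3://bucket/streaming/collections/")
           .option("checkpointLocation", "s3://bucket/checkpoints/collections/")
           .start()
)
query.awaitTermination()
```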

Technical Skills

Programming: Python, SQL, Scala, Shell Scripting, PySpark

Data Warehousing: BigQuery, Snowflake, Amazon Redshift, Teradata

Big Data Technologies: Apache Spark, Hadoop, HDFS, Apache Kafka, PySpark, Spark SQL

ETL/ELT Tools: Apache Airflow, Informatica PowerCenter, Matillion, Talend, Apache NiFi

Databases: SQL Server, PostgreSQL, MySQL, Hive, HBase, MongoDB, Cassandra

Data Modeling: Star Schema, Snowflake Schema, Data Vault, Dimensional Modeling

Orchestration: Apache Airflow, Cloud Composer, Oozie

Streaming: Apache Kafka, AWS Kinesis, GCP Pub/Sub

Version Control: Git, GitHub, Bitbucket

Visualization: Tableau, Power BI, Looker

Methodologies: Agile, Scrum, CI/CD, DevOps practices

Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS)

GCP Services: Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud Functions, Cloud Composer

AWS Services: EMR, S3, Glue, Lambda, Redshift, Kinesis, Athena, EC2, IAM, AWS Security

Education

DePaul University Chicago, Illinois

Master of Science in Information Technology (Nov 2023)

Acharya Nagarjuna University India

B.Sc. in Computer Science (May 2016)


