
Senior Data Engineer

Location:
Dayton, OH
Salary:
$110,000
Posted:
October 15, 2025


Resume:

PRANAY B

Senior Data Engineer

**********@*****.*** +1-940-***-****

PROFESSIONAL SUMMARY

Results-driven Data Engineer with over 5 years of hands-on experience designing, developing, and optimizing scalable ETL pipelines and data solutions across cloud platforms including AWS, Azure, and GCP. Specialized in managing complex data ecosystems in finance, insurance, and healthcare, with a strong focus on data migration, real-time streaming, and cloud modernization.

Proficient in big data technologies such as Hadoop, Spark, Kafka, and Kinesis, with proven success in building high-throughput, low-latency data pipelines to support both batch and streaming analytics. Skilled in cloud-based data migrations, including transitioning on-premises SQL Server environments to platforms like Amazon Redshift and Snowflake, reducing infrastructure costs while improving performance and scalability.

Hands-on experience in implementing ML pipelines and predictive analytics using TensorFlow, DBT, and Google Cloud AI Platform, enabling business teams to make faster, data-driven decisions. Adept at designing star schema and dimensional data models, optimizing Redshift performance for actuarial and underwriting teams, and boosting reporting speed by 40%.

Strong command over infrastructure as code (IaC) practices using Terraform, container orchestration with Docker and Kubernetes, and building CI/CD pipelines for seamless data deployment and management. Delivered consistent infrastructure provisioning, automated deployment of AWS services (S3, Lambda, Glue), and minimized human errors by 30%.

A collaborative team player with excellent problem-solving and communication skills, committed to delivering high-impact data solutions that align with business goals and regulatory standards.

TECHNICAL SKILLS

Programming Languages: Python, SQL, PySpark, Scala, Java, Bash (Shell Scripting)

Amazon Web Services (AWS): AWS Glue, Lambda, Redshift, S3, Step Functions, EMR, Kinesis, IAM, KMS, Lake Formation, Athena, CloudWatch, CodePipeline, SageMaker, Glue Catalog

Microsoft Azure Services: Azure Data Factory (ADF), Synapse Analytics, Azure SQL, Cosmos DB, Databricks, Event Hub, Stream Analytics, DevOps, Purview, AKS (Azure Kubernetes Service)

Google Cloud Platform Services: Google BigQuery, Google Dataflow, Pub/Sub, Google Cloud Storage, Google Cloud Dataproc, Google Data Catalog

Methodologies: Agile, Scrum, DevOps, Infrastructure as Code (IaC)

ETL & Data Warehousing: Dimensional Modeling, Change Data Capture (CDC), DBT, AWS Glue, Azure Data Factory, Azure Synapse Analytics, Apache Airflow, Amazon Redshift, Snowflake, Google BigQuery

Databases: Amazon Redshift, Azure SQL Database, SQL Server, PostgreSQL, MySQL, NoSQL, Cosmos DB, AuroraDB

Big Data: Apache Spark, PySpark, Hadoop, AWS EMR, Azure Databricks, Delta Lake, Parquet, Apache Airflow

Business Intelligence Tools: Power BI, Tableau, QuickSight

Containerization & Deployment: Docker, Kubernetes (AKS, EKS), Jenkins, Git, GitHub, GitHub Actions, CI/CD, AWS CodePipeline

PROFESSIONAL EXPERIENCE

Discover, Senior Data Engineer (Riverwoods, IL) Sep 2023 – Present

Designed and developed scalable ETL/ELT pipelines using AWS Glue and Apache Spark on AWS EMR, processing over 5 TB of structured and unstructured data daily to support real-time loan risk analysis.
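
As a hedged illustration of this kind of batch ETL step, the PySpark sketch below reads raw JSON from S3, applies basic cleansing, and writes partitioned Parquet. The bucket paths, column names, and cleansing rule are hypothetical placeholders, and a real AWS Glue job would wrap the session in Glue's GlueContext rather than a bare SparkSession.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("loan-risk-etl").getOrCreate()

    # Read raw loan events from S3 (hypothetical bucket and layout).
    raw = spark.read.json("s3://example-raw-bucket/loans/")

    # Basic cleansing plus an audit column before publishing curated data.
    cleaned = (raw
               .filter(F.col("loan_amount") > 0)
               .withColumn("ingest_date", F.current_date()))

    # Partitioned Parquet keeps downstream Athena/Redshift scans cheap.
    (cleaned.write.mode("overwrite")
            .partitionBy("ingest_date")
            .parquet("s3://example-curated-bucket/loans/"))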

Optimized data models in Amazon Redshift and Snowflake, implementing star/snowflake schemas, materialized views, and stored procedures to reduce query execution time by 30% and improve reporting speed for financial datasets.

Implemented serverless querying with Amazon Athena on data stored in S3 (Parquet format), reducing operational costs and accelerating insurance data queries by 40%.
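
A minimal sketch of such a serverless query through the boto3 Athena client appears below; the database, table, and result bucket are invented for illustration.

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Athena scans the Parquet files in place; only results land in S3.
    response = athena.start_query_execution(
        QueryString=("SELECT policy_id, premium "
                     "FROM insurance_db.policies WHERE policy_year = 2023"),
        QueryExecutionContext={"Database": "insurance_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    print(response["QueryExecutionId"])  # poll get_query_execution for status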

Automated data transformations with DBT, streamlining raw-to-curated data conversion and ensuring consistency and reusability across analytics pipelines.

Improved dashboard performance in Power BI and Tableau by fine-tuning SQL logic and indexing strategies, reducing load times by 25% for real-time reporting.

Developed real-time streaming workflows using AWS Lambda, Glue, and Kinesis, processing 10M+ records/day to deliver dynamic loan risk scores with near-instant insights.
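
The shape of such a Kinesis-triggered Lambda, in Python, might look like the sketch below; the scoring helper is a hypothetical stand-in for the actual risk model.

    import base64
    import json

    def handler(event, context):
        # Kinesis delivers each record's payload base64-encoded in the event.
        for record in event["Records"]:
            loan = json.loads(base64.b64decode(record["kinesis"]["data"]))
            print(f"loan={loan.get('loan_id')} risk={score_loan(loan)}")

    def score_loan(loan):
        # Placeholder logic standing in for the real scoring model.
        return min(1.0, loan.get("debt_to_income", 0.0) * 0.5)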

Built and deployed machine learning models using AWS SageMaker, enabling predictive loan risk analytics and integrating model outputs into downstream dashboards and alerts.
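
Serving such a model typically means calling a SageMaker endpoint from downstream code; a hedged sketch using the boto3 runtime client follows, with the endpoint name and CSV feature layout invented.

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # One CSV row of features: income, debt-to-income ratio, credit score.
    response = runtime.invoke_endpoint(
        EndpointName="loan-risk-model",   # hypothetical endpoint name
        ContentType="text/csv",
        Body="52000,0.34,712\n",
    )
    print(response["Body"].read().decode())  # model's risk prediction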

Containerized Spark jobs using Docker and Kubernetes, deployed on EMR with CI/CD pipelines through Git, Jenkins, and AWS CodePipeline, increasing processing throughput by 40% and reducing manual deployment overhead.

Secured sensitive financial datasets using AWS IAM (role-based access), AWS KMS (encryption), and Lake Formation (data governance) to ensure regulatory compliance across 100+ datasets.

Implemented Change Data Capture (CDC) strategies using AWS DMS, Lambda, Kinesis, and Kafka, enabling real-time ingestion and synchronization of over 1M records/minute for immediate risk updates.
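
The apply side of a CDC stream reduces to upserts and deletes keyed on the source primary key. The sketch below mirrors the JSON envelope AWS DMS emits (a data record plus a metadata.operation field), with simplified field names and an in-memory dict standing in for the real target store.

    def apply_cdc_event(event, target):
        # DMS-style envelope: payload in "data", verb in "metadata.operation".
        op = event["metadata"]["operation"]
        row = event["data"]
        if op in ("load", "insert", "update"):
            target[row["id"]] = row          # upsert keeps target in sync
        elif op == "delete":
            target.pop(row["id"], None)

    target = {}
    apply_cdc_event({"metadata": {"operation": "insert"},
                     "data": {"id": 1, "risk": "low"}}, target)
    print(target)  # {1: {'id': 1, 'risk': 'low'}}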

Automated cloud infrastructure provisioning with AWS CloudFormation, creating repeatable, auditable environments that improved deployment consistency across projects.

Integrated cross-cloud architectures, leveraging AWS for compute/storage and orchestrating real-time sync between systems with CDC pipelines, boosting data flow efficiency by 35%.

Engineered MapReduce workflows using Apache Hadoop, optimizing batch data processing pipelines and reducing processing time by 60% for historical financial transactions.
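
With Hadoop Streaming, the mapper and reducer stay plain Python scripts reading stdin; the sketch below sums transaction amounts per merchant, with the CSV column layout assumed for illustration.

    #!/usr/bin/env python3
    # mapper.py: emit (merchant, amount) pairs from CSV transaction lines.
    import sys
    for line in sys.stdin:
        parts = line.strip().split(",")
        if len(parts) >= 3:
            print(f"{parts[1]}\t{parts[2]}")

    #!/usr/bin/env python3
    # reducer.py: Hadoop sorts by key, so totals accumulate per merchant.
    import sys
    current, total = None, 0.0
    for line in sys.stdin:
        key, value = line.strip().split("\t")
        if key != current and current is not None:
            print(f"{current}\t{total}")
            total = 0.0
        current = key
        total += float(value)
    if current is not None:
        print(f"{current}\t{total}")

Both scripts would be submitted through the standard hadoop-streaming jar via its -mapper and -reducer flags.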

Reduced cloud processing costs by 25% by configuring cost-optimized EC2 spot instances while maintaining high availability during traffic spikes.

Implemented failover and self-healing mechanisms for real-time data pipelines, ensuring 99.9% uptime and reliability of streaming applications.

Used AWS Migration Hub to coordinate and track hybrid cloud data migrations, minimizing service interruptions and accelerating modernization timelines.

Monitored job health, system metrics, and SLA adherence using AWS CloudWatch and GCP Operations Suite, establishing real-time observability and proactive alerting.

Optimized PySpark workloads on EMR, enhancing performance by 40% and reducing compute resource usage by 15% on high-volume transaction datasets.
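
Two of the standard levers for this kind of tuning are broadcasting small dimension tables to avoid shuffling the large fact table, and right-sizing partitions before writes; a sketch with hypothetical paths follows.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("txn-tuning").getOrCreate()

    txns = spark.read.parquet("s3://example-bucket/transactions/")    # large fact
    merchants = spark.read.parquet("s3://example-bucket/merchants/")  # small dim

    # Broadcasting the small table skips the shuffle of the big one.
    enriched = txns.join(broadcast(merchants), "merchant_id")

    # Repartition before writing to avoid a swarm of tiny output files.
    (enriched.repartition(200, "merchant_id")
             .write.mode("overwrite")
             .parquet("s3://example-bucket/enriched/"))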

Deployed secure and scalable APIs via GCP Cloud CDN, enabling fast, reliable access to curated data products and services for internal stakeholders and applications.

Developed custom User-Defined Functions (UDFs) in Spark for advanced data transformation logic, improving efficiency and modularity of pipeline processes.

USAA, Data Engineer (San Antonio, TX) Aug 2022 - Sep 2023

Designed and developed robust ETL pipelines using Azure Data Factory (ADF), Apache Airflow, and SQL Server to automate 100+ daily workflows, ensuring high availability, data integrity, and scalability across diverse structured and semi-structured data sources.
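
In Airflow, such a workflow is declared as a DAG of Python tasks; the sketch below (assuming Airflow 2.x) wires a placeholder extract step to a placeholder load step on a daily schedule, with the DAG id and callables invented.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull from source systems")        # stand-in for real extract

    def load():
        print("write to Synapse / SQL Server")   # stand-in for real load

    with DAG(
        dag_id="daily_claims_etl",               # hypothetical workflow name
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task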

Built optimized data models in Azure SQL Database, Azure Synapse Analytics, Cosmos DB, and SQL Server, managing over 2 TB of data and applying best practices for normalization, indexing, and partitioning to accelerate analytics workloads.

Integrated on-premise and cloud data sources using SQL Server Integration Services (SSIS), developing complex transformation logic to support compliance and regulatory reporting for healthcare and finance use cases.

Developed secure, modular APIs using FastAPI integrated with Azure Kubernetes Service (AKS), enabling scalable microservice-based architecture and reducing backend integration effort by 40%.
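
A minimal FastAPI service of that shape is sketched below; the route, model, and validation rule are illustrative only, and in production it would sit behind AKS with uvicorn workers.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Claim(BaseModel):
        claim_id: str
        amount: float

    @app.post("/claims/validate")
    def validate_claim(claim: Claim) -> dict:
        # Placeholder rule; real validation logic is far richer.
        return {"claim_id": claim.claim_id, "valid": claim.amount > 0}

Run locally with, e.g., uvicorn main:app --reload, assuming the file is saved as main.py.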

Created multidimensional analytical models using SQL Server Analysis Services (SSAS), supporting advanced data slicing, dicing, and drill-downs for business analysts and financial reporting teams.

Automated report generation and distribution using SQL Server Reporting Services (SSRS), reducing manual reporting time by 70% and increasing stakeholder visibility through operational and executive dashboards.

Performed database development and tuning via SQL Server Management Studio (SSMS), enhancing stored procedure efficiency and optimizing queries in Azure SQL and Synapse Analytics, resulting in a 70% reduction in execution time.

Delivered high-throughput data processing solutions with Azure Databricks, Apache Spark, and Hadoop, cutting batch processing times by 60% while leveraging Apache Kafka and Azure Event Hubs for real-time ingestion.

Implemented Role-Based Access Control (RBAC) policies within Azure Active Directory, securing access to over 100 datasets and maintaining compliance with regulations and standards such as HIPAA and SOC 2.

Built containerized API solutions using Docker, FastAPI, and AKS, reducing deployment cycles by 50% and implementing GitHub Actions CI/CD pipelines to ensure seamless version control and release automation.

Maintained data governance and lineage tracking using Azure Purview, improving auditability and regulatory compliance for 50+ pipelines by clearly documenting data flow and transformation logic.

Standardized data transformation logic using DBT, enhancing collaboration between engineering and analytics teams while ensuring consistency and reusability in Azure Synapse and SQL Server environments.

Developed real-time ingestion pipelines using Azure Functions and Apache Kafka to support continuous data flow and near-real-time analytics, processing millions of healthcare records daily.

Deployed low-latency streaming architecture with Azure Event Hubs and Kafka for healthcare applications, enabling cross-cloud data ingestion and improving decision-making with real-time event processing.

HealthPlix, Data Engineer (Hyderabad, India) Apr 2019 - Dec 2021

Designed and built scalable ETL pipelines using AWS Glue and PySpark, processing 5+ TB of structured and semi-structured healthcare data from Amazon S3, AuroraDB, and RDS to support downstream analytics and compliance reporting.

Optimized complex SQL queries in Amazon Aurora and Redshift, applying indexing, partitioning, and query refactoring techniques to reduce query time from 15 minutes to under 3 minutes, significantly improving data throughput and performance.

Built cross-cloud data workflows by integrating AWS Glue with Google BigQuery and Google Dataflow, enabling seamless data movement and advanced analytical processing for large-scale healthcare datasets.

Developed real-time ingestion pipelines using GCP Pub/Sub to capture data from medical IoT devices and healthcare systems, enabling real-time monitoring, alerting, and decision-making.
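
Consuming such a stream with the google-cloud-pubsub client looks roughly like the sketch below; the project and subscription IDs are placeholders.

    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    sub_path = subscriber.subscription_path("example-project",
                                            "device-telemetry-sub")

    def callback(message):
        print(f"received: {message.data.decode()}")  # e.g. a vitals reading
        message.ack()                                # ack to stop redelivery

    future = subscriber.subscribe(sub_path, callback=callback)
    try:
        future.result(timeout=30)   # production code would block indefinitely
    except TimeoutError:
        future.cancel()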

Implemented a hybrid cloud architecture, orchestrating ETL in AWS and advanced analytics in GCP to optimize cost and performance, ensuring cross-cloud interoperability and secure healthcare data processing.

Reduced data lag by 25% by optimizing real-time streaming pipelines with GCP Pub/Sub, processing over 1M records per minute for near-instant insight generation.

Enabled real-time synchronization across 5+ heterogeneous data sources using Change Data Capture (CDC) techniques, ensuring zero data loss and consistency across transactional systems.

Improved data traceability and compliance by implementing GCP Data Catalog to manage metadata, data lineage, and governance for 50+ enterprise data assets.

Built and maintained real-time dashboards in Looker, powered by BigQuery, delivering live patient insights and operational KPIs to over 50 internal stakeholders.
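
Behind such dashboards sits ordinary BigQuery SQL; a hedged example of querying it from Python follows, with the project, dataset, and columns invented.

    from google.cloud import bigquery

    client = bigquery.Client()  # picks up default GCP credentials

    sql = """
        SELECT department, COUNT(*) AS visits
        FROM `example-project.care.visits`
        WHERE visit_date = CURRENT_DATE()
        GROUP BY department
    """
    for row in client.query(sql).result():
        print(row.department, row.visits)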

Automated CI/CD pipelines for data workflows using Google Cloud Build, streamlining deployment of data transformation jobs and reducing manual errors and downtime.

Migrated legacy analytics workloads using GCP Migrate for Compute Engine, modernizing infrastructure and enhancing system uptime and scalability in cloud-native environments.

Engineered live data ingestion pipelines from connected healthcare devices using GCP Pub/Sub, reducing time-to-insight by 40% and enabling real-time care delivery.

Implemented HiveQL-based data transformations within Hadoop and Spark jobs, ensuring compatibility with legacy systems while increasing batch processing speed and efficiency.
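
Running HiveQL inside Spark only requires enabling Hive support on the session; the sketch below assumes placeholder legacy table names.

    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark resolve and run HiveQL against the
    # existing Hive metastore, keeping legacy jobs compatible.
    spark = (SparkSession.builder
             .appName("hive-compat")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        INSERT OVERWRITE TABLE curated.patient_events
        SELECT patient_id, event_type, to_date(event_ts) AS event_date
        FROM raw.hl7_events
        WHERE event_ts IS NOT NULL
    """)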

Designed robust cross-cloud data pipelines to facilitate data movement between AWS and GCP, enabling seamless integration, transformation, and analysis across environments.

Built a multi-cloud data processing framework using AWS Glue for ingestion and Google Dataflow for enrichment, resulting in enhanced scalability and streamlined workflow orchestration.

Reduced reporting latency and cost by optimizing Amazon Athena queries on S3-stored insurance data, delivering 40% faster insights for actuarial and BI teams.

CERTIFICATIONS

AWS Cloud Solutions Architect, April 2025

EDUCATION

Campbellsville University

Master's in Computer Science


