
Senior Data Engineer with Cloud & Spark Expertise

Location:
Valentine, MO, 64111
Salary:
85000
Posted:
April 30, 2026


Resume:

ArunKumar G

Email: *****.******@*****.***

Mobile: 913-***-****

LinkedIn: www.linkedin.com/in/arunkumar-gaddala-73347a185

Senior Data Engineer

PROFESSIONAL SUMMARY

Data Engineer with 5 years of experience designing and delivering scalable data pipelines and data platforms across cloud and distributed environments, supporting analytics, reporting, and real-time use cases.

Strong expertise in building batch and streaming data solutions using Python, SQL, Apache Spark, and Kafka, with a focus on reliable data ingestion, transformation, and orchestration.

Hands-on experience with cloud ecosystems including AWS, Azure, and Snowflake, developing modern data architectures such as data lakes, lakehouses, and dimensional data warehouses.

Proficient in implementing data modeling techniques, ETL/ELT frameworks, and data quality practices using tools like dbt, Airflow, and Great Expectations to ensure consistency, governance, and usability of data.

Collaborative team player experienced in working with cross-functional stakeholders to translate business requirements into technical solutions, with a strong focus on performance optimization, data security, and compliance.

Facilitated team meetings, leveraging strong written and oral communication skills to enhance collaboration and project outcomes.

Implemented innovative solutions with a passion for automation and continual process improvement, boosting operational efficiency by 20%.

Enhanced software development processes by implementing Perl within an Agile methodology, resulting in a 30% increase in project delivery speed and improved team collaboration.

TECHNICAL SKILLS

Programming & Querying - Python (Pandas, PySpark), SQL (Advanced Joins, CTEs, Window Functions), Scala, Shell Scripting, Perl

Big Data & Streaming - Apache Spark, PySpark, Spark SQL, Apache Kafka, Spark Structured Streaming, Hadoop

Cloud Platforms - AWS (S3, EMR, Redshift, Lambda, Glue, MWAA), Azure (Data Factory, ADLS Gen2, Synapse Analytics, Blob Storage), GCP (BigQuery)

Data Warehousing & Storage - Snowflake, Amazon Redshift, Azure Synapse Analytics, Google BigQuery, SQL Server, Oracle Exadata

Data Modeling & Architecture - Dimensional Modeling, Star & Snowflake Schema, Kimball Methodology, Data Lake, Lakehouse, Lambda Architecture

Data Transformation Tools - dbt (Data Build Tool), Informatica IICS (CDI, CAI), SSIS, Talend

Orchestration & Scheduling - Apache Airflow (MWAA), Azure Data Factory Pipelines, Job Scheduling, SLA Monitoring

Data Quality, Governance & Monitoring - Great Expectations, Apache Atlas, Data Validation, Data Lineage, Metadata Management, AWS CloudWatch, DataDog

Databases - PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, HBase

API & Integration - REST APIs, SOAP APIs, Microservices-based Integration

Security & Compliance - HIPAA Compliance, PII Masking, Data Encryption (AES-256), Access Control

DevOps & CI/CD - Git, Docker, Kubernetes, Jenkins, Terraform, Azure DevOps, CI/CD Pipelines

Performance Optimization - Query Optimization, Partitioning, Clustering Keys, Indexing, Workload Management

Visualization & BI - Power BI, Tableau

System Administration & Infrastructure - Linux environment setup, Unix file systems, mount types, permissions, standard tools, pipes

PROFESSIONAL EXPERIENCE

JPMorgan Chase & Co January 2025 – Present

Data Engineer

Built a real-time data ingestion pipeline using Apache Kafka and PySpark Structured Streaming to process 15M+ daily transactions, enabling near real-time fraud signal detection with sub-second latency.
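The Spark code itself is not part of this resume, but the core pattern, grouping streamed transactions into time windows and flagging anomalous counts, can be sketched in plain Python (field names like ts and card_id are hypothetical, not taken from the pipeline described above):

```python
from collections import defaultdict

def window_counts(events, window_secs=60):
    """Bucket transaction events into fixed tumbling windows and count
    transactions per card per window -- a simplified stand-in for a
    Structured Streaming groupBy(window(...), "card_id").count().
    Each event is a dict with 'ts' (epoch seconds) and 'card_id'."""
    counts = defaultdict(int)
    for e in events:
        window_start = e["ts"] - (e["ts"] % window_secs)
        counts[(window_start, e["card_id"])] += 1
    return dict(counts)

def flag_bursts(counts, threshold=3):
    """Flag (window, card) pairs whose count exceeds a threshold --
    a toy stand-in for a fraud-signal rule."""
    return {k for k, v in counts.items() if v > threshold}
```

In real Structured Streaming the windowing and state management are handled by the engine; this sketch only shows the aggregation logic such a job expresses.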

Designed a scalable Lambda architecture on AWS (S3, EMR, Redshift), reducing data availability lag for risk dashboards from 4 hours to under 10 minutes.

Developed reusable PySpark-based transformation frameworks with parameterized configurations, cutting development time for new pipelines by 40% and improving consistency across teams.
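A config-driven transformation framework of this kind typically maps step names in a configuration file to registered functions. A minimal plain-Python sketch of the pattern (the resume's version is PySpark-based; lists of dicts stand in for DataFrames, and the transform names are illustrative):

```python
# Registry mapping config-referenced names to transformation functions.
TRANSFORMS = {}

def transform(name):
    """Decorator registering a reusable transformation under a name."""
    def register(fn):
        TRANSFORMS[name] = fn
        return fn
    return register

@transform("drop_nulls")
def drop_nulls(rows, column):
    # Remove rows where the given column is missing or null.
    return [r for r in rows if r.get(column) is not None]

@transform("rename")
def rename(rows, src, dst):
    # Rename column src to dst in every row.
    return [{**{k: v for k, v in r.items() if k != src}, dst: r[src]}
            for r in rows]

def run_pipeline(rows, steps):
    """Apply a list of {'name': ..., 'params': ...} steps from a config."""
    for step in steps:
        rows = TRANSFORMS[step["name"]](rows, **step.get("params", {}))
    return rows
```

The same registry idea carries over to PySpark: each registered function takes and returns a DataFrame, and the pipeline definition lives in configuration rather than code.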

Implemented automated schema management using AWS Glue Data Catalog, allowing downstream systems to handle schema changes seamlessly and eliminating recurring manual fixes.

Integrated data quality checks using Great Expectations within Airflow workflows, enforcing 25+ validation rules to ensure accuracy before loading data into Redshift.
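The gating idea, run declarative validation rules and block the load if any fail, can be sketched without Great Expectations' actual API (this is a hedged plain-Python stand-in; rule and field names are hypothetical):

```python
def expect_not_null(rows, column):
    # Fail if any row has a null in the given column.
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"rule": f"not_null:{column}", "passed": not bad, "failing_rows": bad}

def expect_between(rows, column, lo, hi):
    # Fail if any non-null value falls outside [lo, hi].
    bad = [i for i, r in enumerate(rows)
           if r.get(column) is not None and not (lo <= r[column] <= hi)]
    return {"rule": f"between:{column}", "passed": not bad, "failing_rows": bad}

def run_checks(rows, checks):
    """Run every (check_fn, params) pair; return per-rule results and an
    overall pass flag, mirroring how a validation task can gate the
    downstream load task in an orchestrated workflow."""
    results = [fn(rows, **params) for fn, params in checks]
    return results, all(r["passed"] for r in results)
```

In an Airflow DAG the equivalent validation task would raise on failure so the load task never runs; Great Expectations packages the rules, reporting, and documentation around that same contract.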

Partnered with compliance and reporting teams to build CCAR and Basel III-aligned data marts in Redshift, supporting risk-weighted asset calculations with full data lineage and auditability.

Managed pipeline orchestration using Apache Airflow (MWAA), with SLA monitoring, retry mechanisms, and alerting, maintaining 99.6% SLA adherence.

Improved Redshift performance for reporting workloads by tuning distribution/sort keys and using materialized views, reducing report runtime from 22 minutes to under 4 minutes.

Enhanced system reliability by optimizing Linux-based processes and implementing system/architecture improvements using Perl, resulting in a 20% reduction in latency.

Streamlined deployment by configuring Linux environment setup and managing Unix file systems with orchestration tools, achieving a 30% increase in scalability.

Improved data integrity by optimizing Oracle Exadata and managing relational databases, enhancing data flows and reducing error rates by 15%.

Cognizant (Client: Anthem, Inc.) August 2021 – December 2023

Data Engineer

Designed a Snowflake-based Member 360 data warehouse, integrating claims, eligibility, pharmacy, and provider data from OLTP systems to deliver a unified view for 40M+ members.

Built and maintained dbt pipelines using a layered architecture (staging, intermediate, mart), developing 30+ modular SQL models with testing and automated documentation.

Built incremental dbt models leveraging Snowflake MERGE strategy for large-scale claims datasets (200M+ rows), reducing transformation runtime by 65% while ensuring data accuracy and consistency.
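The incremental pattern rests on MERGE (upsert) semantics: rows whose key already exists in the target are updated, new keys are inserted, and untouched rows are left alone, so only the increment is reprocessed. A plain-Python sketch of those semantics (the real implementation is a dbt incremental model compiled to a Snowflake MERGE; the key name here is illustrative):

```python
def incremental_merge(target, increment, key):
    """Upsert semantics of a warehouse MERGE: rows in `increment` whose
    key exists in `target` replace the old row; new keys are inserted.
    Both inputs are lists of dicts."""
    merged = {r[key]: r for r in target}
    for r in increment:
        merged[r[key]] = r
    return sorted(merged.values(), key=lambda r: r[key])
```

The runtime win comes from `increment` being only the new or changed claims since the last run, rather than the full 200M-row history.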

Engineered HIPAA-compliant ETL pipelines using Python, Apache Airflow, and Azure Data Factory to ingest HL7 FHIR-based eligibility data, implementing PII masking and AES-256 encryption for secure PHI handling before loading into Snowflake.

Implemented data recovery and retention strategies using Snowflake Time Travel and Fail-safe, along with Azure Blob Storage backups, reducing data incident resolution time from 6 hours to under 45 minutes.

Designed a Star Schema data model (FactClaims, DimProvider, DimMember, DimDate) for provider performance analytics, enabling Power BI dashboards used by 200+ clinical and operations stakeholders.

Developed data reconciliation frameworks using Python and Azure DevOps, validating post-load data (counts, aggregates, rejections) and automating exception reporting, improving data quality audit scores by 30%.
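A post-load reconciliation check of this kind usually compares row counts, a summed measure, and key coverage between source and target. A minimal sketch under those assumptions (column names are illustrative, not taken from the framework described above):

```python
def reconcile(source_rows, target_rows, key, measure):
    """Compare counts, a summed measure, and key coverage between a
    source extract and the loaded target; return a report suitable
    for automated exception reporting."""
    report = {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "source_sum": sum(r[measure] for r in source_rows),
        "target_sum": sum(r[measure] for r in target_rows),
    }
    src_keys = {r[key] for r in source_rows}
    tgt_keys = {r[key] for r in target_rows}
    report["missing_in_target"] = sorted(src_keys - tgt_keys)
    report["matched"] = (report["source_count"] == report["target_count"]
                         and report["source_sum"] == report["target_sum"]
                         and not report["missing_in_target"])
    return report
```

Wiring a check like this into a CI/CD pipeline (e.g. as a post-deployment gate in Azure DevOps) turns reconciliation from a manual audit into an automated pass/fail signal.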

Optimized Snowflake performance by analyzing query profiles, implementing clustering keys, and configuring warehouse auto-suspend/resume policies, reducing monthly compute costs by 28%.

Automated data handling by developing load/extract processes with scripts, leading to a 25% increase in processing speed.

Standardized system operations by managing mount types with standard tools, enhancing system uptime by 10%.

SoftAge Group January 2020 – June 2021

ETL Developer

Designed and implemented ETL pipelines using Informatica Intelligent Cloud Services (IICS – CDI & CAI), integrating data from Teradata, PeopleSoft, and flat files into Oracle HCM Cloud and SQL Server for HR, payroll, and workforce analytics.

Developed scalable integration workflows using REST and SOAP APIs, along with parameterized mappings and reusable taskflows, reducing manual intervention by 35% and improving batch processing efficiency.

Optimized complex SQL queries and transformation logic to handle high-volume datasets, improving job performance, data throughput, and overall system reliability in production.

Built and maintained SQL Server database objects including tables, views, stored procedures, triggers, and indexes to support reporting, reconciliation, and downstream data consumption.

Implemented data validation and reconciliation frameworks across SIT, UAT, and production environments, ensuring data accuracy, audit readiness, and smooth release migrations.

Partnered with business and functional teams to translate HR and finance requirements into technical design documents (HLD/LLD), aligning integration solutions with enterprise data standards.

Provided production support and troubleshooting, performing root cause analysis for batch and real-time integrations, reducing SLA breaches and improving system stability.

Optimized resource allocation by coordinating jobs and processes, resulting in a 20% improvement in system efficiency.

Enhanced system security by managing permissions and utilizing pipes, reducing unauthorized access incidents by 30%.

CERTIFICATIONS

AWS Certified Solutions Architect – Associate

DP-700 - Microsoft Fabric Data Engineer Associate

EDUCATION

Master's in Computer Science - University of Central Missouri

Bachelor’s in Electronics and Communication Engineering - Sreenidhi Institute of Science and Technology
