
Senior Data Engineer - PySpark, Snowflake, Databricks

Location:
Plano, TX, 75024
Salary:
$110,000
Posted:
April 30, 2026

Contact this candidate

Resume:

Rishika Kanugula — Senior Data Engineer

386-***-**** ***************@*****.***

PROFESSIONAL SUMMARY:

Senior Data Engineer with 5 years of experience architecting, developing, and optimizing scalable data solutions on modern cloud platforms.

Expert in Python and PySpark, including advanced Spark concepts, performance tuning, and large-scale data processing techniques.

Proficient in designing and implementing robust data pipelines on modern data platforms such as Snowflake and Databricks.

Strong command of data pipeline design, development, orchestration, and monitoring for both batch and real-time streaming data.

Hands-on experience applying Medallion Architecture principles, structuring bronze, silver, and gold data layers according to best practices.

Skilled in autoscaling, cluster optimization, and cost-efficient data processing in distributed cloud environments.

Adept at establishing CI/CD pipelines, integrating DevOps practices, and applying version control, with exposure to Terraform.

Proven ability to collaborate with business, analytics, and engineering teams to deliver high-quality, reusable, and scalable data solutions.

WORK EXPERIENCE:

Senior Data Engineer @ Aetna, Hartford, CT | Sep 2024 – Present

Designed and implemented scalable data pipelines using PySpark on AWS EMR to process complex healthcare datasets.

Engineered data ingestion for structured and semi-structured sources into an Amazon S3 data lake.

Developed Spark-based ETL frameworks to cleanse, transform, and aggregate healthcare claims and member data for analytics.

Applied Medallion Architecture principles, establishing distinct bronze, silver, and gold layers to improve data quality and accessibility.

Integrated cleansed data into Snowflake for advanced analytics and business intelligence reporting.

Optimized Spark job performance by tuning partitions, memory utilization, and execution plans in distributed environments.

Implemented data quality checks and reconciliation reports to ensure accuracy and completeness across all data layers.

Enabled downstream consumption through Athena queries and Tableau dashboards for business insights.

Secured data environments with encryption and granular IAM-based access controls for sensitive patient information.

Orchestrated end-to-end workflows with Apache Airflow, ensuring reliable scheduling and proactive pipeline monitoring.

Participated in Agile ceremonies, including sprint planning and daily stand-ups, and contributed to continuous improvement of data solutions.

Collaborated with business stakeholders and analysts to gather requirements and deliver scalable data solutions.

Technologies Used: Python, PySpark, Spark SQL, AWS (S3, EMR, Glue, Athena, IAM), Snowflake, Apache Airflow, Tableau, GitHub, Jenkins, Medallion Architecture
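As an illustration of the bronze/silver/gold flow described above, here is a minimal sketch in plain Python standing in for the PySpark/EMR jobs; the record fields and filtering rules are hypothetical:

```python
# Simplified medallion-style flow: raw "bronze" claim records are cleansed
# into "silver", then aggregated into a "gold" per-member summary.
# Plain-Python stand-in for the PySpark jobs; fields are hypothetical.

bronze = [
    {"member_id": "M1", "claim_amount": "120.50", "status": "PAID"},
    {"member_id": "M1", "claim_amount": "80.00",  "status": "PAID"},
    {"member_id": "M2", "claim_amount": "bad",    "status": "PAID"},    # malformed
    {"member_id": "M2", "claim_amount": "40.25",  "status": "DENIED"},  # filtered
]

def to_silver(records):
    """Cleanse: keep paid claims whose amounts parse as numbers."""
    silver = []
    for r in records:
        try:
            amount = float(r["claim_amount"])
        except ValueError:
            continue  # a real pipeline would quarantine malformed rows
        if r["status"] == "PAID":
            silver.append({"member_id": r["member_id"], "claim_amount": amount})
    return silver

def to_gold(records):
    """Aggregate: total paid amount per member."""
    totals = {}
    for r in records:
        totals[r["member_id"]] = totals.get(r["member_id"], 0.0) + r["claim_amount"]
    return totals

print(to_gold(to_silver(bronze)))  # {'M1': 200.5}
```

In the actual pipelines these steps would be DataFrame transformations persisted as separate bronze/silver/gold tables.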

Data Engineer @ U.S. Bank, Minneapolis, MN | May 2022 – Jun 2023

Designed and developed data ingestion pipelines to migrate critical financial data from on-premise Oracle databases to AWS S3.

Built scalable Spark-based batch jobs in PySpark to transform large volumes of transactional and customer data.

Implemented data pipeline design, development, and orchestration, ensuring reliable data flow to cloud-based platforms.

Used AWS Glue Crawlers to catalog diverse datasets and maintain consistent schema definitions across the enterprise data lake.

Performed data aggregation and enrichment to support risk, compliance, and regulatory reporting requirements.

Developed SQL queries for data validation, reconciliation, and reporting across financial systems.

Implemented CI/CD pipelines in Jenkins to automate the build, test, and deployment of Spark jobs.

Enforced data security and compliance through role-based access control with AWS IAM and Ranger policies.

Contributed to the design of cost-efficient data processing strategies in the AWS cloud environment.

Worked with QA and business teams to identify and resolve complex data discrepancies.

Participated in planning and executing on-premise-to-cloud data migrations with minimal operational disruption.

Applied version control best practices in GitHub for all data engineering code, ensuring maintainability and collaborative development.

Technologies Used: Python, PySpark, Apache Spark, AWS (S3, Glue, EC2, IAM), Oracle, Jenkins, GitHub, Apache Airflow, Data Pipeline Orchestration, CI/CD
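A sketch of the kind of source-vs-target reconciliation check used to validate migrated financial data (columns are hypothetical; in practice the counts and totals would come from Oracle and S3/Athena queries):

```python
# Compare row counts and amount totals between a source extract and the
# migrated target, producing a small reconciliation report.
# Hypothetical columns; a stand-in for the SQL-based validation described above.

def reconcile(source_rows, target_rows, amount_key="amount"):
    """Return a report dict comparing counts and summed amounts."""
    src_count, tgt_count = len(source_rows), len(target_rows)
    src_sum = round(sum(r[amount_key] for r in source_rows), 2)
    tgt_sum = round(sum(r[amount_key] for r in target_rows), 2)
    return {
        "count_match": src_count == tgt_count,
        "sum_match": src_sum == tgt_sum,
        "count_delta": tgt_count - src_count,
        "sum_delta": round(tgt_sum - src_sum, 2),
    }

source = [{"amount": 10.00}, {"amount": 25.50}]
target = [{"amount": 10.00}, {"amount": 25.50}]
print(reconcile(source, target))
# {'count_match': True, 'sum_match': True, 'count_delta': 0, 'sum_delta': 0.0}
```

Mismatched deltas would feed the reconciliation reports flagged for QA review.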

Junior Data Engineer @ Sam’s Club, Bentonville, AR | Nov 2019 – Apr 2022

Designed and developed ETL workflows in Informatica to process high volumes of retail sales and inventory data.

Extracted data from Oracle and MySQL databases and loaded cleansed data into the on-premise Hadoop environment.

Developed SQL and PL/SQL scripts for complex data transformations and validation processes.

Built and optimized Hive tables, implementing partitioning strategies that improved query performance for analytics.

Assisted in migrating legacy ETL processes to scalable Spark-based frameworks in a distributed environment.

Implemented data ingestion for flat-file formats, including CSV and TXT, while maintaining data integrity.

Supported reporting and analytics teams with curated, reliable datasets for business insights and decision-making.

Implemented foundational data quality checks and error handling to maintain data reliability.

Used GitHub for version control of all development artifacts, enabling collaborative and organized code management.

Participated in Agile development cycles, contributing to rapid iteration and continuous delivery of data solutions.

Helped develop and maintain scalable data solutions, laying the groundwork for future cloud migrations.

Collaborated with engineering teams to ensure smooth integration and deployment of data processing applications.

Technologies Used: Informatica, Oracle, MySQL, Hadoop, Hive, Apache Spark, SQL, GitHub, Jenkins, Data Pipeline Development, Distributed Data Processing
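To illustrate the date-based partitioning used for the Hive tables above, here is a small Python sketch that routes records into Hive-style dt=YYYY-MM-DD partition directories, so queries filtering on date scan only the matching partitions (paths and fields are hypothetical):

```python
# Group records into Hive-style date partition paths.
# Hypothetical layout standing in for the partitioned Hive tables above.

from collections import defaultdict

def partition_paths(records, base="/warehouse/sales"):
    """Map each record to its dt=<sale_date> partition directory."""
    parts = defaultdict(list)
    for r in records:
        parts[f"{base}/dt={r['sale_date']}"].append(r)
    return dict(parts)

rows = [
    {"sale_date": "2021-07-01", "sku": "A", "qty": 3},
    {"sale_date": "2021-07-01", "sku": "B", "qty": 1},
    {"sale_date": "2021-07-02", "sku": "A", "qty": 2},
]
layout = partition_paths(rows)
print(sorted(layout))
# ['/warehouse/sales/dt=2021-07-01', '/warehouse/sales/dt=2021-07-02']
```

A query restricted to one date then reads a single directory instead of the full table, which is the partition-pruning benefit behind the performance improvements noted above.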

TECHNICAL SKILLS:

Programming Languages: Python, PySpark, SQL, PL/SQL

Data Platforms: Snowflake, Databricks, Apache Spark, Apache Hive, Hadoop

Cloud Technologies: AWS (S3, EMR, Glue, Athena, Lambda, Redshift, DynamoDB, EC2, IAM)

Data Orchestration & Management: Apache Airflow, AWS Step Functions, AWS Glue, Data Pipeline Design

Database Management: Oracle, MySQL, PostgreSQL

DevOps & CI/CD: Jenkins, Docker, Kubernetes, GitHub, Version Control, Terraform Concepts

Data Visualization: Tableau

Methodologies: Agile, SDLC, Medallion Architecture

Other Tools: Confluence, JIRA

EDUCATION:

Master of Science in Computer Science @ University of Central Florida


