
Deepika Santhanakrishnan

Houston, TX – ***** 346-***-**** ******.***@*****.***

Summary

Passionate Cloud Data Engineer with 8+ years of experience delivering scalable, automated data solutions across AWS, Databricks, and GCP. Expert in building ETL pipelines and Delta Lake-based Lakehouse architectures and in managing big data analytics with PySpark, SQL, and Python. Skilled in CI/CD and Agile delivery with a strong focus on performance, reliability, and business-driven insights.

Software Skills

• Cloud Platforms: Databricks, AWS (S3, Lambda, Glue, RDS, Athena, SNS, EventBridge, CloudWatch, IAM), Google Cloud Platform (BigQuery, Cloud Composer)

• Programming: Python, PySpark, SQL, HQL, Shell Scripting

• Big Data Technologies: Spark, Hadoop, Hive, YARN, HDFS, ORC, Parquet, Snappy

• Databases: PostgreSQL, MySQL, Oracle

• Data Modeling: Normalization, Slowly Changing Dimensions, Star Schema, Snowflake Schema, OLTP, OLAP

• Python Frameworks & Libs: Pandas, PySpark, Boto3, Airflow, Psycopg2, Requests, Pytest, Logging, Openpyxl, SendGrid

• ETL Tools: AWS Glue, Informatica PowerCenter

• Agile Tools: Jira, Rally

• Version Control & CI/CD: Git, Bitbucket, Jenkins, Jules, Terraform, Liquibase

• Orchestration Tools: Airflow, AutoSys

• Monitoring & Logging: CloudWatch, Python logging

• Domains: Wealth Management, Retail E-Commerce, Retail Banking, HR Data Warehouse

Professional Experience

Deloitte / JPMorgan Chase Apr 2024 - Jun 2025

Tech Lead / Consultant

Project: Migration from Mainframe to AWS Cloud

• Led a team in migrating a legacy Mainframe system to a modern data Lakehouse architecture using Databricks Delta Lake and AWS services including S3, Glue, Lambda, Aurora PostgreSQL, and Athena, enabling a scalable, high-performance, and cost-effective data platform.

• Built Delta Lake-based ETL pipelines on Databricks (PySpark) to handle large-scale transformations with ACID transaction support, schema evolution, and time travel, ensuring data consistency and regulatory compliance.
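
As an illustration of the Delta write/read pattern described in the bullet above (not the project's actual code), a minimal sketch; table, column, and path names such as bronze.transactions and raw_txns are assumptions:

    # Minimal Delta Lake pattern on Databricks: ACID append with schema
    # evolution, plus a time-travel read. All names are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

    raw = spark.read.json("s3://example-bucket/landing/raw_txns/")  # hypothetical path

    cleaned = (raw
               .withColumn("txn_date", F.to_date("txn_ts"))
               .dropDuplicates(["txn_id"]))

    # ACID append; mergeSchema lets newly arriving columns evolve the table schema
    (cleaned.write.format("delta")
            .mode("append")
            .option("mergeSchema", "true")
            .saveAsTable("bronze.transactions"))

    # Time travel: read an earlier version of the table for audit/compliance checks
    snapshot = (spark.read.format("delta")
                     .option("versionAsOf", 42)  # or timestampAsOf
                     .table("bronze.transactions"))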

• Designed modular and reusable ETL frameworks using PySpark in Databricks, AWS Glue, and Lambda to process and transform millions of records; data was written to Delta tables for analytical workloads and to Aurora PostgreSQL for operational use cases.

• Utilized S3 as the data lake storage layer integrated with Delta Lake, optimizing storage using ORC format and Snappy compression, reducing storage cost and improving Athena query performance.

• Developed automated data quality checks and data reconciliation scripts in PySpark (Databricks), ensuring end-to-end data integrity and reducing manual validation by 80%.
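
A hedged sketch of what such a reconciliation check can look like in PySpark; the table names, key column, and amount column are assumptions for illustration:

    # Compare source vs. migrated target per load date: row counts plus an
    # amount checksum, then surface any breaks. Names are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    src_df = spark.table("staging.transactions")   # assumed mainframe extract
    tgt_df = spark.table("bronze.transactions")    # assumed migrated Delta table

    src_stats = src_df.groupBy("load_date").agg(
        F.count("*").alias("src_rows"), F.sum("amount").alias("src_amt"))
    tgt_stats = tgt_df.groupBy("load_date").agg(
        F.count("*").alias("tgt_rows"), F.sum("amount").alias("tgt_amt"))

    breaks = (src_stats.join(tgt_stats, "load_date", "full_outer")
              .where((F.col("src_rows") != F.col("tgt_rows")) |
                     (F.col("src_amt") != F.col("tgt_amt"))))

    assert breaks.count() == 0, "Reconciliation break detected"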

• Enabled self-service analytics by integrating the Athena client with Glue jobs to query data directly from S3.

• Implemented event-driven data workflows: Lambda functions triggered by S3 events launched Glue jobs and Databricks notebooks, orchestrating the pipeline end-to-end.

• Integrated Amazon SNS with Lambda and Glue to provide real-time job alerts on success/failure, enhancing pipeline observability.
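
The two bullets above describe an S3-event → Lambda → Glue → SNS flow; below is a minimal illustrative Lambda handler for that pattern (the Glue job name, SNS topic ARN, and job arguments are placeholders, not from the project):

    # Lambda fired by an S3 put event: start a Glue job for the new object
    # and publish a success/failure alert to SNS. Names are placeholders.
    import boto3

    glue = boto3.client("glue")
    sns = boto3.client("sns")

    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-alerts"  # assumed

    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            try:
                run = glue.start_job_run(
                    JobName="mainframe-ingest",  # hypothetical job name
                    Arguments={"--source_path": f"s3://{bucket}/{key}"})
                sns.publish(TopicArn=TOPIC_ARN,
                            Subject="Glue job started",
                            Message=f"JobRunId={run['JobRunId']} for {key}")
            except Exception as exc:
                sns.publish(TopicArn=TOPIC_ARN,
                            Subject="Glue trigger failed",
                            Message=f"{key}: {exc}")
                raise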

• Used Liquibase, Jules, and Bitbucket for managing database schema versioning, automated deployments, and source control.

• Conducted extensive Spark job performance tuning on Databricks and AWS Glue by refining data partitioning, caching strategies, and selecting appropriate worker node types, resulting in cost savings and faster runtimes.

Deloitte / Macy's Dec 2023 - Mar 2024

Python Tech Lead

• Designed and developed a Python script to extract data from Google BigQuery using its Python API, facilitating automated email reporting with SendGrid, which improved customer follow-up efficiency for delinquent accounts.
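
As an illustration (not the actual report code), a small sketch of this BigQuery-to-SendGrid flow using the google-cloud-bigquery and sendgrid client libraries; the dataset, query, and email addresses are assumptions:

    # Query delinquent accounts from BigQuery and email a plain-text summary
    # via SendGrid. Table and addresses are illustrative placeholders.
    import os
    from google.cloud import bigquery
    from sendgrid import SendGridAPIClient
    from sendgrid.helpers.mail import Mail

    bq = bigquery.Client()
    rows = bq.query(
        "SELECT account_id, days_past_due "
        "FROM `billing.delinquent_accounts` "   # hypothetical table
        "WHERE days_past_due > 30").result()

    body = "\n".join(f"{r.account_id}: {r.days_past_due} days" for r in rows)

    message = Mail(from_email="reports@example.com",       # placeholder addresses
                   to_emails="collections@example.com",
                   subject="Daily delinquency report",
                   plain_text_content=body or "No delinquent accounts today.")
    SendGridAPIClient(os.environ["SENDGRID_API_KEY"]).send(message)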

• Automated data workflows using Google Cloud Composer (managed Airflow) to orchestrate data processing and send notifications based on delinquent user activity.

• Enhanced reports by dynamically formatting Excel sheets with the Openpyxl library, providing detailed summaries and improved readability.
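
A brief Openpyxl sketch of the kind of dynamic formatting described above; the headers, colors, and sample rows are illustrative assumptions:

    # Build a styled Excel summary: bold colored headers and columns
    # auto-sized to their longest value. All content is illustrative.
    from openpyxl import Workbook
    from openpyxl.styles import Font, PatternFill
    from openpyxl.utils import get_column_letter

    wb = Workbook()
    ws = wb.active
    ws.title = "Delinquency Summary"

    headers = ["Account", "Days Past Due", "Balance"]
    ws.append(headers)
    for cell in ws[1]:
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = PatternFill("solid", fgColor="4F81BD")

    for row in [("A-1001", 45, 1250.00), ("A-1002", 62, 980.50)]:  # sample data
        ws.append(row)

    # Widen each column to fit its longest value for readability
    for idx, _ in enumerate(headers, start=1):
        col = get_column_letter(idx)
        width = max(len(str(c.value)) for c in ws[col]) + 2
        ws.column_dimensions[col].width = width

    wb.save("delinquency_summary.xlsx")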

• Reduced manual reporting efforts by 90% and improved communication timeliness through automated email workflows.

FIS Global Apr 2020 - Dec 2023

Tech Lead / Senior Software Engineer

Project: Modern Banking Platform Product Development

• Collaborated closely with business stakeholders to gather retail deposit requirements and supported development, testing, deployment, and post-deployment activities.

• Designed and implemented scalable ETL pipelines using Python and PySpark to process millions of daily transactions.

• Performed data cleansing, transformation, and aggregation tasks on large datasets with PySpark.

• Optimized existing HQL queries and migrated them to Spark, reducing job execution times by 30%.

• Created Hive managed tables and implemented dynamic partitions with ORC file format and Snappy compression to reduce storage costs and improve processing times.
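
A hedged sketch of this partitioned, ORC-plus-Snappy table pattern, expressed through spark.sql; the database, table, and column names are illustrative:

    # Create a Hive managed table partitioned by date, stored as ORC with
    # Snappy compression, then load it with a dynamic-partition insert.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS bank.account_txns (
            account_id STRING,
            amount     DECIMAL(18,2)
        )
        PARTITIONED BY (txn_date DATE)
        STORED AS ORC
        TBLPROPERTIES ('orc.compress' = 'SNAPPY')
    """)

    # Dynamic partitioning routes each row to its txn_date partition
    spark.sql("""
        INSERT INTO TABLE bank.account_txns PARTITION (txn_date)
        SELECT account_id, amount, txn_date FROM staging.raw_txns
    """)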

• Automated the generation of customer account statements using Hive to ensure timely and accurate reporting.

• Developed a Python script to encrypt customer data using an anonymization API.

• Utilized Jules and Bitbucket to automate deployment processes and maintain source code integrity.

• Provided production support to implementation teams and clients by resolving critical issues.

Cognizant / Hartford Feb 2017 - Apr 2020

Programmer Analyst

• Analyzed and interpreted business requirements to deliver effective data warehousing solutions that improved HR processes.

• Extracted data from multiple sources, applied complex transformations, and loaded data into the HR data warehouse using Informatica PowerCenter.

• Created comprehensive code review checklists and unit test cases to ensure quality in ETL jobs.

• Developed and maintained datasheets to ensure consistency and accuracy from source to destination.

• Conducted rigorous unit testing and maintained detailed logs for all ETL mappings, ensuring seamless data flow.

• Collaborated with cross-functional teams for knowledge sharing and alignment on data integration tasks.

• Leveraged Jenkins to automate deployment processes and maintain source code integrity.

• Completed development and configuration tasks within stipulated timelines, ensuring project milestones were met.

Educational Qualification

Anna University, India Sep 2012 - May 2016

Bachelor of Engineering, Computer Science and Engineering

Certification

• Big Data by Trendy Tech

• SQL (HackerRank)
