
Data Engineer Lake

Location:
San Jose, CA
Posted:
July 17, 2025


SUSHMITHA J K

****************@*****.*** +1-408-***-****

WORK EXPERIENCE

SENIOR ASSOCIATE – DATA ENGINEER, PwC Service Delivery Center, Bengaluru (DEC 2021 – JAN 2024)

Designed, developed, and maintained data pipelines for large datasets. Implemented ETL processes to structure raw data for analysis. Migrated on-premises data to the cloud, including stored procedure migration. Implemented proactive monitoring and issue resolution for data pipelines. Developed custom data connectors and ingestion processes.

TBG Data Processing and Framework – built data jobs to extract, transform, and process data from diverse sources (Luminate, s4) using customized Databricks scripts, storing the results in Azure Synapse Data Lake for advanced analytics.

FOP Data Framework – executed complex transformations and enrichment with AWS Glue jobs orchestrated via Jupyter Notebooks. Loaded refined datasets into AWS Glue Data Catalog for streamlined analysis, automating the workflow with AWS Step Functions and applying advanced SQL transformations to align with business logic.
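The Step Functions orchestration described above can be sketched as an Amazon States Language definition built in Python. The job names and parameters here are hypothetical placeholders, not the project's actual ones; the `glue:startJobRun.sync` service integration is a standard Step Functions resource for running Glue jobs to completion.

```python
import json

# Hypothetical state machine chaining two Glue jobs: extract, then transform.
# The .sync suffix makes each state wait for the Glue job run to finish.
state_machine = {
    "Comment": "Orchestrate extract -> transform Glue jobs",
    "StartAt": "ExtractJob",
    "States": {
        "ExtractJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "fop-extract"},  # placeholder job name
            "Next": "TransformJob",
        },
        "TransformJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "fop-transform"},  # placeholder job name
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```

In practice the definition would be registered with `aws stepfunctions create-state-machine` and triggered on a schedule or by an upstream event.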

Decommission and Support – managed stored procedure and view migration from on-premises databases to Amazon Redshift using AWS Schema Conversion Tool (SCT). Employed custom Python scripts for precise column replacements and schema adjustments, ensuring seamless integration with Redshift architecture.
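The column-replacement scripting mentioned above might look like the following minimal sketch. The column mapping and SQL snippet are invented for illustration; the real migration combined AWS SCT output with project-specific mappings.

```python
import re

# Hypothetical mapping of legacy column names to their Redshift equivalents.
COLUMN_MAP = {
    "CREATED_DT": "created_at",
    "UPD_USR": "updated_by",
}

def rewrite_columns(sql: str, column_map: dict) -> str:
    """Replace whole-word column references, leaving substrings intact."""
    for old, new in column_map.items():
        sql = re.sub(rf"\b{re.escape(old)}\b", new, sql)
    return sql

legacy = "SELECT CREATED_DT, UPD_USR FROM orders WHERE CREATED_DT > '2023-01-01'"
print(rewrite_columns(legacy, COLUMN_MAP))
# SELECT created_at, updated_by FROM orders WHERE created_at > '2023-01-01'
```

The `\b` word boundaries prevent a mapping like `ID -> order_id` from corrupting longer identifiers that merely contain the old name.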

Data Lake Spend Analysis – created an API data-processing framework using Azure Synapse Notebooks and Azure Data Factory for extraction and transformation, with Logic Apps automating delivery to Azure Data Lake storage.

Technology – Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, Azure Logic Apps, Azure Synapse Analytics, AWS Redshift, AWS Glue, AWS Step Functions, RDS, Jupyter Notebooks with PySpark, AWS SCT, Python.

SOFTWARE DEVELOPER, EVRY India Pvt Limited, Bengaluru (OCT 2018 – DEC 2021)

Migrated on-premises ETLs to the cloud. Designed and developed ETL integration patterns. Created a data extraction framework. Implemented data quality checks and validation routines to ensure data accuracy and consistency. Developed a Big Data Refinery (BDR) framework. Implemented a CI/CD pipeline for code deployment.

Big Data Refinery – developed a metadata-driven framework using Spark on Azure HDInsight and Azure Data Lake, enabling data ingestion, cleansing, standardization, storage, and validation. The standardized data is accessible via external tables in Azure SQL Warehouse for dashboard creation.
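The metadata-driven pattern described above can be illustrated with a simplified pure-Python sketch: each metadata record declares which stages to run, and the framework applies them in order. The stage functions and metadata shape are illustrative assumptions; the actual framework ran these stages as Spark jobs on Azure HDInsight.

```python
def cleanse(rows):
    # Drop records missing a primary key (illustrative rule).
    return [r for r in rows if r.get("id") is not None]

def standardize(rows):
    # Normalize string fields to trimmed lowercase (illustrative rule).
    return [{k: v.strip().lower() if isinstance(v, str) else v
             for k, v in r.items()} for r in rows]

def validate(rows):
    # Fail fast if a mandatory field is absent after standardization.
    for r in rows:
        assert "id" in r, "validation failed: missing id"
    return rows

STAGES = {"cleanse": cleanse, "standardize": standardize, "validate": validate}

def run_pipeline(rows, metadata):
    """Apply the stages named in the metadata record, in order."""
    for stage_name in metadata["stages"]:
        rows = STAGES[stage_name](rows)
    return rows

meta = {"dataset": "customers", "stages": ["cleanse", "standardize", "validate"]}
raw = [{"id": 1, "name": "  Alice "}, {"id": None, "name": "Bob"}]
print(run_pipeline(raw, meta))  # [{'id': 1, 'name': 'alice'}]
```

Because the stage list lives in metadata rather than code, onboarding a new dataset means adding a record, not writing a new pipeline.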

SMB BI and Analytics – ingested semi-structured cloud data (AWS S3) into RDS (MySQL) after normalization and transformation. Processed data in RDS for BI and Analytics, generating reports for business users and downstream systems.

Information Lake – developed a metadata-driven framework using AWS EMR and Data Vault modeling, enabling data ingestion, cleansing, standardization, storage, and validation. The standardized data is accessible in Redshift (Hubs, Satellites, and Links) for dashboard creation.

Technology – Hadoop, Apache Hive, Apache Spark, Python, Oozie, Azure Databricks; AWS services: EMR, Lambda, SNS, RDS, S3, AWS Glue, Redshift. Databases: MS SQL, Oracle, Azure.

EDUCATION

BACHELOR OF ENGINEERING IN COMPUTER SCIENCE

Mangalore Institute of Technology and Engineering, Moodbidri, India (SEP 2014 – JUN 2018)

TECHNICAL SKILLS

Technologies: Big Data, Hadoop, Databricks, Apache Spark, Apache Hive.

Azure Services: Azure Data Factory, Azure Databricks, Azure HDInsight, Azure SQL Data Warehouse, Azure Logic Apps, Azure Synapse Analytics.

AWS Services: AWS Glue, AWS Lambda, Amazon S3, AWS Step Functions, Amazon Redshift, Amazon RDS, Amazon EMR, AWS SNS.

Programming Languages: Spark SQL, Python (PySpark, Pandas), SQL, Shell Scripting (Bash).


