Sushma Mothukuri

Data Engineer

Email: *****************@*****.*** Phone: +1-469-***-**** LinkedIn

Professional Summary

Data Engineering professional with 7 years of experience in Data Integration, Data Modeling, Data Warehousing, and Big Data solutions.

Skilled in designing and building data solutions using cloud technologies such as Azure (ADF, Databricks, Synapse), AWS (Glue, Redshift, Lambda, S3), and Snowflake.

Experienced in advanced SQL, specializing in query optimization, performance tuning, and efficient data retrieval across large datasets.

Proficient in using PySpark on Databricks to design and implement big data pipelines, optimizing the processing of large datasets through distributed computing and advanced data transformations.

Created dashboards using Power BI and Tableau for real-time data visualization, with experience in Google Tag Manager for tag management and Google Analytics for data analysis.

Technical Skills

Programming Languages: Python, PySpark, SQL, Unix/Linux Scripting, T-SQL

Database Technologies: Oracle, MS SQL, MySQL, DB2, Cosmos DB, MongoDB, SSMS

Operating Systems: Linux, Unix, Windows

ETL: Azure (Data Factory, Databricks, Synapse Analytics, Fabric), AWS (Glue, Lambda, Data Pipeline, Redshift), Informatica, SSIS, DBT

Big Data Technologies: HDFS, Hadoop, Hive, Kafka, Apache Spark, HBase, NoSQL

Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Snowflake

Cloud Data Platforms: Azure (Data Lake Storage, Lakehouse, SQL, Cosmos DB), Amazon (S3, Redshift, RDS, DynamoDB), Snowflake

Data Visualization and Analytical tools: Tableau, Power BI, SSRS, Google Analytics, Google Tag Manager

Development Methodologies: Agile, Waterfall

Professional Experience

McAfee Dallas, Texas Jul’2024 – Present

Sr. Data Engineer

Improved customer retention and renewal rates by analyzing license usage and customer data, providing the sales team with insights to target at-risk accounts.

Developed robust ETL/ELT pipelines in SSIS to ingest, transform, and load customer license utilization data, enabling quarter-over-quarter and year-over-year comparisons.

Created tags in Google Tag Manager (GTM) to track customer actions, such as license activation and subscription upgrades, providing granular insights into product engagement.

Integrated Google Analytics data with PostgreSQL to measure customer interactions and trends, enhancing the effectiveness of marketing campaigns.
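
A minimal sketch of the kind of load step such an integration can involve, assuming the Google Analytics events have already been exported as rows; the table name, columns, and connection details below are hypothetical:

import psycopg2

# Hypothetical daily event counts exported from Google Analytics
ga_rows = [
    ("2025-01-15", "license_activation", 342),
    ("2025-01-15", "subscription_upgrade", 87),
]

conn = psycopg2.connect(host="db-host", dbname="marketing", user="etl_user", password="...")
with conn, conn.cursor() as cur:
    # Hypothetical target table holding GA event counts for campaign reporting
    cur.executemany(
        "INSERT INTO ga_daily_events (event_date, event_name, event_count) VALUES (%s, %s, %s)",
        ga_rows,
    )
conn.close()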

Tracked license utilization rates and churn risk, with quarterly and yearly comparisons, through automated Power BI dashboards and reports to support proactive decision-making.

DuPont Hyderabad, India Jun’2019 – July’2022

Sr. Data Engineer

Built ETL pipelines in Azure Data Factory to process and load 40+ TB of customer, material, and transactional data from SAP ERP, CRM, and Salesforce into Azure Data Lake Storage (ADLS).

Optimized customer, material, and transactional data, including open orders, billing details, and rebates, to enhance efficiency and support decision-making at DuPont.

Worked on the Medallion Architecture in Databricks for data lake optimization, organizing raw, cleaned, and aggregated data into bronze, silver, and gold layers to enhance data quality, access speed, and governance.
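
A minimal PySpark sketch of the bronze/silver/gold layering described above, assuming a Databricks notebook where spark is already defined; paths and column names are hypothetical:

from pyspark.sql import functions as F

# Bronze: land raw JSON as-is
raw = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/orders/")
raw.write.format("delta").mode("append").save("/mnt/lake/bronze/orders")

# Silver: deduplicate and clean
silver = (spark.read.format("delta").load("/mnt/lake/bronze/orders")
          .dropDuplicates(["order_id"])
          .filter(F.col("order_amount").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/lake/silver/orders")

# Gold: aggregate for reporting
gold = silver.groupBy("customer_id").agg(F.sum("order_amount").alias("total_revenue"))
gold.write.format("delta").mode("overwrite").save("/mnt/lake/gold/customer_revenue")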

Ingested data from multiple source files and databases in CSV, JSON, Parquet, and Avro formats, as well as database tables, ensuring efficient storage and processing for diverse data needs.

Transformed large-scale transactional data from Parquet files in Databricks using PySpark ETL processes, performing multi-way joins, data normalization, and performance optimization.

Created Delta tables to handle data versioning, ensure ACID transactions, and enable time travel, registering them in the Hive metastore with incremental loading to optimize data processing.
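
A minimal sketch of incremental loading into a Delta table and reading an earlier version, assuming Databricks with the Delta Lake APIs available; the table, path, and column names are hypothetical:

from delta.tables import DeltaTable

# Latest batch of changes (hypothetical path)
incremental_df = spark.read.format("delta").load("/mnt/lake/bronze/transactions_increment")

# Upsert the batch into a Delta table registered in the Hive metastore
target = DeltaTable.forName(spark, "silver.transactions")
(target.alias("t")
 .merge(incremental_df.alias("s"), "t.txn_id = s.txn_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Time travel: query an earlier table version for audits or comparisons
previous = spark.sql("SELECT * FROM silver.transactions VERSION AS OF 5")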

Leveraged Hadoop HDFS and Apache Hive in Azure Data Lake for scalable storage and data querying, along with Apache Spark in Databricks for large-scale data processing.

Developed pipelines in Azure Synapse Analytics for large-scale data processing, utilizing SQL pools and Spark for efficient querying and transformation.

Built real-time data solutions for CRM, ERP, and POS systems with Apache Kafka, Azure Event Hubs, and Spark in Databricks, delivering immediate and scalable insights.
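
A minimal Spark Structured Streaming sketch for this kind of pipeline, assuming Databricks and the Kafka-compatible endpoint of Azure Event Hubs (authentication options omitted); the topic, schema, and paths are hypothetical:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

# Hypothetical schema for POS events arriving as JSON
event_schema = (StructType()
                .add("store_id", StringType())
                .add("sku", StringType())
                .add("amount", DoubleType()))

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
          .option("subscribe", "pos-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

(events.writeStream.format("delta")
 .option("checkpointLocation", "/mnt/lake/checkpoints/pos_events")
 .outputMode("append")
 .start("/mnt/lake/silver/pos_events"))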

Designed document models in Azure Cosmos DB to efficiently store and retrieve JSON data, optimizing query performance and ensuring seamless integration with analytical workloads.
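
A minimal sketch of such a document model using the azure-cosmos Python SDK; the account, container, partition key, and document shape are hypothetical:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists("sales")
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),  # high-cardinality key to spread load
)

# One JSON document per order, with line items embedded for single-read access
container.upsert_item({
    "id": "order-1001",
    "customerId": "C-42",
    "status": "open",
    "lines": [{"sku": "N-100", "qty": 2}],
})

# Scoping the query to one partition keeps request-unit cost low
orders = container.query_items(
    query="SELECT * FROM c WHERE c.customerId = @cid",
    parameters=[{"name": "@cid", "value": "C-42"}],
    partition_key="C-42",
)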

Implemented CI/CD pipelines with Azure DevOps and Git for automated ADF and Databricks deployments, improving version control and deployment efficiency.

Monitored data pipelines using Azure Application Insights, Log Analytics, and Azure Monitor for performance tracking and alerting.

The Home Depot Hyderabad, India Feb’2017 – May’2019

Data Engineer

Migrated data to AWS S3 using AWS Data Pipeline and Database Migration Service, enhancing pipeline execution by 40%.

Optimized ETL/ELT processes with AWS Glue and Databricks, applying advanced transformations and reducing processing times by 30%.
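
A minimal AWS Glue job skeleton illustrating this kind of PySpark transformation; the catalog database, table, and S3 path are hypothetical:

import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Glue Data Catalog, clean, and write curated Parquet back to S3
orders = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
).toDF()

cleaned = (orders.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_date")))

cleaned.write.mode("overwrite").parquet("s3://curated-bucket/orders/")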

Managed EMR clusters and leveraged PySpark/Python in Glue for complex transformations.

Leveraged Redshift external tables with partitioning and clustering, improving query performance by 25%.

Developed end-to-end pipelines by integrating AWS S3 for raw data and Snowflake for the data warehouse, improving data processing efficiency.
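
A minimal sketch of the S3-to-Snowflake load step using the Snowflake Python connector, assuming an external stage already points at the S3 bucket; the stage, table, and connection details are hypothetical:

import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)
with conn.cursor() as cur:
    # Copy raw Parquet files from the S3 stage into the warehouse table
    cur.execute("""
        COPY INTO raw_orders
        FROM @s3_raw_stage/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
conn.close()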

Automated SQL transformations in Snowflake using DBT, enabling real-time analytics and improving accuracy.

Built Power BI dashboards and reports with custom KPIs and advanced filtering, increasing actionable insights by 20%.

Automated data pipeline orchestration and monitoring using Airflow for timely execution of workflows.
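
A minimal Airflow DAG sketch for this kind of orchestration; the DAG id, schedule, and task callables are hypothetical placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...
def transform(): ...
def load(): ...

with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_transform >> t_load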

JPMC Hyderabad, India Oct’2016 – Jan’2017

Developer

Optimized ETL processes with parallel processing, partitioning, and incremental load, reducing processing time by 85%.

Automated Unix scripts for job monitoring and data integrity, cutting manual tasks by 90% and improving system performance.

Automated reconciliation in ETL pipelines, reducing manual validation by 80% and enhancing data quality.

Developed complex SQL queries and stored procedures for data extraction and transformation across various sources.

Implemented snowflake and star schema designs, optimizing data storage and retrieval efficiency.

Created advanced mappings in Informatica PowerCenter utilizing a range of complex transformations.

Implemented SCD Type 1 and Type 2 to streamline data updates and maintain historical data with versioned records and effective date ranges for accurate tracking and analysis.
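
The mappings themselves were built in Informatica PowerCenter; as a language-neutral illustration of the SCD Type 2 pattern described above, a minimal PySpark sketch (dim and updates are assumed DataFrames; column names are hypothetical):

from pyspark.sql import functions as F

# Rows whose tracked attribute changed between the current dimension and the new load
current = dim.filter(F.col("is_current"))
joined = current.alias("d").join(updates.alias("u"), "customer_id")
changed = joined.filter(F.col("d.address") != F.col("u.address"))

# Type 2: close out the prior version with an end date and clear its current flag
expired = (changed.select("customer_id", F.col("d.address").alias("address"), "d.effective_start")
           .withColumn("effective_end", F.current_date())
           .withColumn("is_current", F.lit(False)))

# ...and insert the new version with an open-ended effective range
new_rows = (changed.select("customer_id", F.col("u.address").alias("address"))
            .withColumn("effective_start", F.current_date())
            .withColumn("effective_end", F.lit(None).cast("date"))
            .withColumn("is_current", F.lit(True)))

# The refreshed dimension is the union of unchanged rows, expired rows, and new_rows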

Extensive experience with SQL, PL/SQL, query tuning, DDL scripts, and database objects, along with process improvements, data extraction, cleansing, and manipulation.

Education

Trine University, MS in Information Technology May’24 GPA: 3.9

JNTU Hyderabad, B. Tech in Computer Science Engineering May’16 GPA: 3.6


