Sushma Mothukuri
Data Engineer
Email: *****************@*****.*** Phone: +1-469-***-**** LinkedIn
Professional Summary
Data Engineering professional with 7 years of experience in Data Integration, Data Modeling, Data Warehousing, and Big Data solutions.
Skilled in designing and building data solutions using cloud technologies such as Azure (ADF, Databricks, Synapse), AWS (Glue, Redshift, Lambda, S3), and Snowflake.
Experienced in advanced SQL, specializing in query optimization, performance tuning, and efficient data retrieval across large datasets.
Proficient in using PySpark on Databricks to design and implement big data pipelines, optimizing the processing of large datasets through distributed computing and advanced data transformations.
Adept at building dashboards in Power BI and Tableau for real-time data visualization, with experience in Google Tag Manager for tag management and Google Analytics for data analysis.
Technical Skills
Programming Languages: Python, PySpark, SQL, Unix/Linux Scripting, T-SQL
Database Technologies: Oracle, MS SQL Server, MySQL, DB2, Cosmos DB, MongoDB, SSMS
Operating Systems: Linux, Unix, Windows
ETL: Azure (Data Factory, Databricks, Synapse Analytics, Fabric), AWS (Glue, Lambda, Data Pipeline, Redshift), Informatica, SSIS, dbt
Big Data Technologies: HDFS, Hadoop, Hive, Kafka, Apache Spark, HBase, NoSQL
Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Snowflake
Cloud Data Platforms: Azure (Data Lake Storage, Lakehouse, SQL, Cosmos DB), Amazon (S3, Redshift, RDS, DynamoDB), Snowflake
Data Visualization and Analytical tools: Tableau, Power BI, SSRS, Google Analytics, Google Tag Manager
Development Methodologies: Agile, Waterfall
Professional Experience
McAfee, Dallas, Texas Jul 2024 – Present
Sr. Data Engineer
Improved customer retention and renewal rates by analyzing license usage and customer data, equipping the sales team with insights to target at-risk accounts.
Developed robust ETL/ELT pipelines in SSIS to ingest, transform, and load customer license utilization data, enabling quarter-over-quarter and year-over-year trend comparisons.
Created tags in Google Tag Manager (GTM) to track customer actions, such as license activation and subscription upgrades, providing granular insights into product engagement.
Integrated Google Analytics data with PostgreSQL to measure customer interactions and trends, improving the effectiveness of marketing campaigns.
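Illustrative sketch of this GA-to-PostgreSQL load, assuming a CSV export; the connection string, file name, and ga_sessions table are placeholders, not the production setup:

    import pandas as pd
    from sqlalchemy import create_engine

    # Load a daily Google Analytics export into PostgreSQL for trend analysis
    engine = create_engine("postgresql://user:password@host:5432/marketing")
    ga = pd.read_csv("ga_sessions_export.csv", parse_dates=["date"])
    ga.to_sql("ga_sessions", engine, if_exists="append", index=False)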
Built automated Power BI dashboards and reports tracking license utilization rates and churn risk, with quarterly and yearly comparisons, to support proactive decision-making.
DuPont, Hyderabad, India Jun 2019 – Jul 2022
Sr. Data Engineer
Built ETL pipelines in Azure Data Factory to process and load 40+ TB of customer, material, and transactional data from SAP ERP, CRM, and Salesforce into Azure Data Lake Storage (ADLS).
Optimized customer, material, and transactional data, including open orders, billing details, and rebates, to enhance efficiency and support decision-making at DuPont.
Implemented the Medallion Architecture in Databricks for data lake optimization, organizing raw, cleaned, and aggregated data into bronze, silver, and gold layers to enhance data quality, access speed, and governance.
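A minimal PySpark sketch of this bronze/silver/gold flow; paths, file formats, and column names are assumptions for illustration, not the production code:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("medallion").getOrCreate()

    # Bronze: land raw CSV files as-is in Delta format
    raw = spark.read.option("header", True).csv("/mnt/adls/raw/orders/")
    raw.write.format("delta").mode("append").save("/mnt/adls/bronze/orders")

    # Silver: deduplicate and standardize types for downstream use
    bronze = spark.read.format("delta").load("/mnt/adls/bronze/orders")
    silver = (bronze.dropDuplicates(["order_id"])
                    .withColumn("order_date", F.to_date("order_date")))
    silver.write.format("delta").mode("overwrite").save("/mnt/adls/silver/orders")

    # Gold: business-level aggregates for reporting
    gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))
    gold.write.format("delta").mode("overwrite").save("/mnt/adls/gold/customer_sales")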
Ingested data in CSV, JSON, Parquet, and Avro formats, along with database tables, from multiple source files and databases, ensuring efficient storage and processing for diverse data needs.
Transformed large-scale transactional data from Parquet files in Databricks using PySpark ETL processes, performing multi-way joins and data normalization while optimizing performance.
Created Delta tables to handle data versioning, ensure ACID transactions, and enable time travel, with incremental loads registered in the Hive metastore to optimize data processing.
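Sketch of the incremental upsert and time-travel pattern, assuming spark is an active session and updates_df holds the latest batch; the table path, key column, and version number are hypothetical:

    from delta.tables import DeltaTable

    # Incremental load: upsert the latest batch into the Delta table (ACID merge)
    target = DeltaTable.forPath(spark, "/mnt/adls/silver/transactions")
    (target.alias("t")
           .merge(updates_df.alias("u"), "t.txn_id = u.txn_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

    # Time travel: query the table as of an earlier version for audits
    previous = (spark.read.format("delta")
                     .option("versionAsOf", 12)
                     .load("/mnt/adls/silver/transactions"))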
Leveraged Hadoop HDFS and Apache Hive in Azure Data Lake for scalable storage and data querying, along with Apache Spark in Databricks for large-scale data processing.
Developed pipelines in Azure Synapse Analytics for large-scale data processing, utilizing SQL pools and Spark for efficient querying and transformation.
Built real-time data solutions for CRM, ERP, and POS systems using Apache Kafka, Azure Event Hubs, and Spark Structured Streaming in Databricks, delivering immediate, scalable insights.
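A minimal Spark Structured Streaming sketch of the Kafka ingestion path, assuming an active spark session; the broker address, topic, event schema, and paths are placeholders (Azure Event Hubs exposes a Kafka-compatible endpoint):

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    schema = (StructType()
              .add("order_id", StringType())
              .add("store_id", StringType())
              .add("amount", DoubleType()))

    # Read POS events from Kafka and parse the JSON payload
    stream = (spark.readStream.format("kafka")
                   .option("kafka.bootstrap.servers", "broker:9092")
                   .option("subscribe", "pos-events")
                   .load()
                   .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                   .select("e.*"))

    # Write continuously to a Delta table, checkpointing for fault tolerance
    (stream.writeStream.format("delta")
           .option("checkpointLocation", "/mnt/adls/checkpoints/pos")
           .start("/mnt/adls/bronze/pos_events"))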
Designed document models in Azure Cosmos DB to efficiently store and retrieve JSON data, optimizing query performance and ensuring seamless integration with analytical workloads.
Implemented CI/CD pipelines with Azure DevOps and Git for automated ADF and Databricks deployments, improving version control and deployment efficiency.
Monitored data pipelines with Azure Application Insights, Log Analytics, and Azure Monitor for performance tracking and alerting.
The Home Depot, Hyderabad, India Feb 2017 – May 2019
Data Engineer
Migrated data to AWS S3 using AWS Data Pipeline and AWS Database Migration Service, speeding up pipeline execution by 40%.
Optimized ETL/ELT processes with AWS Glue and Databricks, applying advanced transformations and reducing processing times by 30%.
Managed EMR clusters and leveraged PySpark/Python in Glue for complex transformations.
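Skeleton of a Glue PySpark job of this kind; the catalog database, table name, filter, and S3 path are hypothetical:

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, transform with plain Spark, write back to S3
    dyf = glue_context.create_dynamic_frame.from_catalog(database="retail", table_name="orders")
    df = dyf.toDF().filter("order_status = 'OPEN'").dropDuplicates(["order_id"])
    df.write.mode("overwrite").parquet("s3://example-bucket/curated/open_orders/")

    job.commit()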
Leveraged Redshift external tables (Spectrum) with partitioning, along with sort and distribution keys, improving query performance by 25%.
Developed end-to-end pipelines by integrating AWS S3 for raw data and Snowflake for the data warehouse, improving data processing efficiency.
Automated SQL transformations in Snowflake using dbt, enabling near-real-time analytics and improving accuracy.
Built Power BI dashboards and reports with custom KPIs and advanced filtering, increasing actionable insights by 20%.
Automated data pipeline orchestration and monitoring using Airflow for timely execution of workflows.
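A minimal Airflow sketch of this orchestration pattern; the DAG id, schedule, and task callables are placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():   # placeholder task bodies
        pass

    def load():
        pass

    with DAG(dag_id="daily_sales_pipeline", start_date=datetime(2019, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        # Extract then load, with Airflow handling scheduling, retries, and monitoring
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2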
JPMC, Hyderabad, India Oct 2016 – Jan 2017
Developer
Optimized ETL processes with parallel processing, partitioning, and incremental load, reducing processing time by 85%.
Automated Unix scripts for job monitoring and data integrity, cutting manual tasks by 90% and improving system performance.
Automated reconciliation in ETL pipelines, reducing manual validation by 80% and enhancing data quality.
Developed complex SQL queries and stored procedures for data extraction and transformation across various sources.
Implemented snowflake and star schema designs, optimizing data storage and retrieval efficiency.
Created advanced mappings in Informatica PowerCenter utilizing a range of complex transformations.
Implemented SCD Type 1 and Type 2 to streamline data updates and maintain historical data with versioned records and effective date ranges for accurate tracking and analysis.
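The SCD Type 2 logic, shown here as a PySpark/Delta sketch purely for illustration (the original was implemented as Informatica PowerCenter mappings); the table path and columns are hypothetical, and changes_df is assumed to hold only new or changed customer records:

    from pyspark.sql import functions as F
    from delta.tables import DeltaTable

    dim = DeltaTable.forPath(spark, "/warehouse/dim_customer")

    # Step 1: expire the current version of each changed customer
    (dim.alias("d")
        .merge(changes_df.alias("c"),
               "d.customer_id = c.customer_id AND d.is_current = true")
        .whenMatchedUpdate(set={"is_current": "false",
                                "end_date": "current_date()"})
        .execute())

    # Step 2: insert the new versions with open-ended effective dates
    (changes_df
        .withColumn("start_date", F.current_date())
        .withColumn("end_date", F.lit(None).cast("date"))
        .withColumn("is_current", F.lit(True))
        .write.format("delta").mode("append").save("/warehouse/dim_customer"))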
Applied extensive SQL and PL/SQL expertise to query tuning, DDL scripting, and database object development, driving process improvements in data extraction, cleansing, and manipulation.
Education
Trine University, MS in Information Technology, May 2024, GPA: 3.9
JNTU Hyderabad, B.Tech in Computer Science Engineering, May 2016, GPA: 3.6