Mamatha Uppu
Houston, Texas +1-346-***-****
************@*****.***
Professional Summary:
Passionate Data Engineer with over 5 years of hands-on experience designing, building, and optimizing data solutions. Skilled in data modeling, data warehousing, and building efficient ETL pipelines. Proficient in AWS technologies such as Redshift, S3, Glue, EMR, Firehose, Lambda, IAM, Athena, RDS, DynamoDB, and Elasticsearch. Extensive experience with big data technologies such as Hadoop, Hive, and Spark, and with operating large data warehouses. Adept in Python, Java, Scala, and Node.js, with strong skills in relational and non-relational databases, REST API development, and cross-functional collaboration on AI/ML projects.
Proficient in data modeling, warehousing, and building ETL pipelines to streamline data processing and integration across cloud environments.
Proficient in AWS technologies such as Redshift, S3, AWS Glue, EMR, Firehose, Lambda, IAM roles and permissions, Athena, RDS, DynamoDB, and Elasticsearch for scalable and secure cloud-based data solutions.
Hands-on experience with big data technologies including Hadoop, Hive, Spark, and AWS EMR for processing large-scale datasets in both real-time and batch processing environments.
Expertise in operating large data warehouses, ensuring high performance, efficient storage, and seamless data retrieval using SQL and other querying tools.
Strong programming skills in Python, Java, Scala, and TypeScript, applied to building scalable data pipelines and performing data analysis.
Extensive experience mentoring team members on best practices in data engineering, coding standards, and system architecture.
Skilled in working with relational and non-relational databases, including object storage, key-value stores, graph databases, and column-family stores, ensuring efficient data storage and retrieval across various data models.
Proficient in Python, SQL, TypeScript, and Java for automation, data processing, and analysis.
Experience in developing and implementing REST APIs for seamless system integrations and data access.
Worked closely with cross-functional engineering teams to develop AI, machine learning, and robotics solutions, providing data support and automation for these projects.
Skills & Abilities:
AWS Technologies: Redshift, S3, Glue, EMR, Firehose, Lambda, IAM roles, Athena, RDS, DynamoDB, Elasticsearch
Big Data: Hadoop, Hive, Spark, AWS EMR, HDFS, Pig, Airflow
Programming Languages: Python, Scala, SQL, Java
Cloud Technologies: AWS (S3, Redshift, EMR, Glue), Azure Databricks, Azure Data Lake
Data Processing: ETL pipelines, data ingestion, data transformation, real-time data streaming
Data Warehousing: Redshift, Hive, HDFS, Snowflake
Scripting: Python (Pandas, NumPy), shell scripting for automation
Operating Systems: Linux, Windows
Analytical Tools: SAS 9.4 and 9.1.3, SAS Base 9.4, SAS Macros, SAS Management Console 9.4 and 9.1, SAS Data Integration Studio 3.4 & 7.1, SAS OLAP Cube Studio 9.1, SAS Information Map Studio 3.1, SAS Web Report Studio, SAS Information Delivery Portal, SAS Customer Intelligence
CI/CD Tools: Jenkins, Git, version control
Experience:
Client: SP Plus Corporation, Chicago, IL    January 2023 – Present
Role: Data Engineer
Responsibilities:
Designed and implemented data models and optimized warehousing solutions using AWS Redshift, S3, and RDS.
Built robust ETL pipelines using AWS Glue, Lambda, and Firehose to process and transform large datasets efficiently.
Collaborated with cross-functional teams to build ETL pipelines, integrating various data sources into a centralized data warehouse on AWS Redshift.
Optimized and managed Spark jobs on AWS EMR clusters, ensuring efficient processing of large data volumes.
Worked extensively with big data technologies including Hadoop, Hive, and Spark, leveraging AWS EMR for scalable data processing.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation, and tuned Spark applications for optimal batch intervals, parallelism, and memory usage.
Implemented data transformation and aggregation using Hive and Spark SQL for analytics and reporting.
Managed large data warehouses, writing complex SQL queries to ensure efficient data retrieval and storage.
Developed SQL scripts for automation and managed build and release processes for multiple projects using Visual Studio Team Services (VSTS).
Automated data workflows using Apache Airflow to orchestrate tasks and ensure the timely execution of jobs.
Mentored team members on best practices for data modeling, SQL optimization, and ETL development.
Collaborated with cross-functional teams on AI and machine learning initiatives, providing data pipelines to support model development.
Client: Data Recovery Centre, Australia    October 2020 – July 2022
Role: AWS Data Engineer
Responsibilities:
Operated large-scale data warehouses using AWS technologies such as Redshift, Athena, and DynamoDB.
Developed and maintained ETL pipelines with AWS Glue, EMR, and Lambda to handle data ingestion and processing.
Worked extensively with Python and Scala to build scalable ETL pipelines, processing large datasets efficiently.
Implemented data processing workflows using Hadoop, Spark, and EMR for real-time and batch data ingestion.
Led initiatives to integrate both relational and non-relational databases, including object storage and key-value stores.
Created REST APIs to enable seamless interaction between different data systems and support real-time analytics.
Played a key role in mentoring junior team members on AWS infrastructure, data processing, and automation best practices.
Supported AI and machine learning projects by collaborating with cross-functional teams to provide data models and pipelines.
Client: Qualinsoft Technologies, India    July 2015 – July 2017
Role: Big Data Engineer
Responsibilities:
Developed real-time and batch data processing solutions using Hadoop, Hive, and Spark to handle large-scale data ingestion and processing.
Designed and implemented data pipelines on AWS EMR to automate the ingestion, transformation, and storage of high-volume datasets.
Managed and optimized data storage in HDFS and AWS S3, ensuring data integrity and efficient retrieval for analytics.
Developed custom UDFs in Python and Scala for data transformations, enhancing data quality and processing efficiency.
Collaborated with teams to ensure data pipelines integrated seamlessly with machine learning workflows and AI models.
Implemented best practices for data partitioning, compression, and performance optimization on distributed systems.
Education:
Master of Science in Business Analytics
August 2022 – May 2024
Trine University, Angola, Indiana
Bachelor of Science in Electronics & Communication Engineering
June 2011 – May 2015
JNTUH, Hyderabad, India