Srimugi Mohan
Data Engineer / Big Data Developer

Prospect Heights, IL | https://www.linkedin.com/in/sri-mugi-m-98104779 | ad2ltq@r.postjobfree.com | +1-224-***-****

PROFESSIONAL SUMMARY

• Data engineering professional with over 7 years of experience developing, integrating, and deploying Big Data and Hadoop applications.

• Proficient in Agile methodologies, specializing in ETL processes, data migration, data pipelines, and data quality enhancement.

• In-depth understanding and experience across SDLC phases, especially in the Retail, Finance, and Healthcare domains.

SKILL SET

Big Data & Hadoop: Hadoop, HDFS, MapReduce, Yarn, Hive, HBase, Sqoop, Oozie, Hortonworks Data Platform, Cloudera Data Platform

Languages & Scripting: SQL, Python, Spark, PySpark, Linux shell scripting

Databases & Warehouses: MySQL, PostgreSQL, Cassandra, Snowflake

Cloud: AWS, Microsoft Azure

ETL Tools: Talend, IBM InfoSphere DataStage

DevOps & Collaboration: Jenkins, Kubernetes, Jira, GitHub, SVN

WORK EXPERIENCE

Senior Consultant, Deloitte Touche Tohmatsu Limited Jul 2021 – Mar 2023
PROJECT: Data Engineer, PayPal Holdings, Inc.

• Developed and maintained Hadoop-based solutions using HDFS, MapReduce, Hive, and HBase to process and analyze large-scale datasets efficiently.

• Designed and optimized Hive, MapReduce, and PySpark jobs for enhanced performance and scalability (a PySpark sketch appears at the end of this role).

• Utilized Apache Sqoop for seamless data import and export between RDBMS and Hadoop ecosystems (a Spark JDBC equivalent is sketched at the end of this role).

• Optimized SQL queries, cutting execution time by 50% and markedly speeding up data retrieval.

• Integrated and processed data from different sources to ensure seamless data flow into Hadoop clusters.

• Developed Spark and Python scripts that significantly improved data processing times.

• Actively engaged in tech communities, staying updated with trends and experimenting with emerging technologies, while mentoring peers to foster a culture of continuous learning.

• Collaborated with cross-functional Agile teams, contributing to the design, testing, and successful implementation of robust cloud-based solutions.

• Designed, implemented, and optimized data pipelines on on-premises infrastructure, ensuring efficient data processing and storage.
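
The bullets above describe Spark/PySpark job tuning only in general terms; the following is a minimal, illustrative PySpark sketch of that kind of optimization. All table names, paths, and tuning values are hypothetical assumptions, not details from the actual engagement.

    # Illustrative PySpark job: join a large fact table to a small dimension
    # table and aggregate; names and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("txn-aggregation")  # hypothetical job name
        .config("spark.sql.shuffle.partitions", "200")  # tuned per data volume
        .enableHiveSupport()
        .getOrCreate()
    )

    txns = spark.table("payments.transactions")    # assumed large Hive fact table
    merchants = spark.table("payments.merchants")  # assumed small dimension table

    daily = (
        txns
        .filter(F.col("ds") == "2023-01-15")          # partition-pruned read
        .join(F.broadcast(merchants), "merchant_id")  # broadcast join avoids a shuffle
        .groupBy("merchant_id", "country")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("txn_count"))
    )

    daily.write.mode("overwrite").parquet("/data/agg/daily_merchant_totals")
    spark.stop()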
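
Sqoop itself is driven from the command line; as a hedged Python stand-in, this sketch shows the equivalent RDBMS-to-HDFS transfer through Spark's JDBC reader. The connection URL, credentials, and table names are placeholders.

    # Sqoop-style import expressed with Spark's JDBC reader.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdbms-import").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/sales")  # placeholder source
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "***")
        .option("partitionColumn", "order_id")  # parallel reads, like Sqoop mappers
        .option("numPartitions", "4")
        .option("lowerBound", "1")
        .option("upperBound", "10000000")
        .load()
    )

    # Land the data on HDFS for downstream Hive/Spark consumption.
    orders.write.mode("overwrite").parquet("hdfs:///landing/sales/orders")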

Associate, Cognizant Technology Solutions Oct 2019 – Jul 2021
PROJECT: Data Engineer, CVS Health Corporation

• Created and managed Hive tables to categorize and store health and wellness metrics.

• Scheduled Hive jobs using the TIDAL scheduler for timely execution.

• Collaborated with data scientists and analysts to create data pipelines for advanced analytics.

• Developed an aggregation pipeline using HQL and Linux shell scripting to consolidate daily metrics data into a final HBase table.

• Leveraged Hive's partitioning and bucketing features to logically organize and segregate data, enhancing parallelism and significantly reducing query processing times (sketched at the end of this role).

• Managed code deployment through GitHub for efficient version control and collaboration.

• Worked closely with DevOps teams to integrate data engineering processes into the Jenkins- and Kubernetes-based CI/CD pipeline, enhancing code quality and minimizing errors.
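
A minimal sketch of the partitioned/bucketed layout and daily aggregation referenced above, expressed in Spark SQL from PySpark. The schema, table names, and bucket count are assumptions, and a plain Hive table stands in for the HBase-backed target.

    # Partitioned, bucketed metrics table plus the daily aggregation step.
    # All names and values are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("wellness-metrics")
             .enableHiveSupport().getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS health.wellness_metrics (
            member_id    BIGINT,
            metric_name  STRING,
            metric_value DOUBLE,
            ds           STRING
        )
        USING ORC
        PARTITIONED BY (ds)                       -- one partition per day
        CLUSTERED BY (member_id) INTO 32 BUCKETS  -- bucketing for parallelism
    """)

    # Consolidate one day's metrics; in the original pipeline the result fed
    # a final HBase table, so a plain Hive table is the stand-in target here.
    daily = spark.sql("""
        SELECT member_id,
               metric_name,
               AVG(metric_value) AS avg_value,
               COUNT(*)          AS sample_count
        FROM health.wellness_metrics
        WHERE ds = '2021-01-15'
        GROUP BY member_id, metric_name
    """)
    daily.write.mode("overwrite").saveAsTable("health.daily_metrics_agg")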

Systems Engineer, Tata Consultancy Services Mar 2016 – Sep 2019
PROJECT 1: Data Engineer, Office Depot Inc.

• Designed and implemented both managed and external tables in Hive, and integrated Hive and HBase tables to facilitate the seamless exchange and storage of historical and delta data.

• Developed and optimized ETL workflows in Talend to ensure seamless data extraction, transformation, and loading.

• Managed PostgreSQL databases to organize and store small-scale data tables, while deploying Hadoop for distributed processing and storage of large, complex datasets.

• Implemented and maintained data storage infrastructure in Snowflake, ensuring optimal organization for analytics use.

• Orchestrated ETL workflows, automating data extraction and loading from Snowflake to Azure Blob storage and HDFS for efficient downstream processing (sketched at the end of this role).

• Utilized AWS services such as Amazon S3 to acquire competitor data, ensuring a robust and timely pipeline for analytics and reporting (see the boto3 sketch at the end of this role).

• Managed the scheduling and oversight of daily Oozie jobs to support data analytics teams, guaranteeing the availability of required data for analysis.
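
A hedged sketch of the Snowflake extraction step described above, using the Snowflake Spark connector from PySpark; every connection option, table name, and storage path below is a placeholder assumption.

    # Extract a table from Snowflake and land it in Azure Blob storage and HDFS.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("snowflake-extract").getOrCreate()

    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",  # placeholder account
        "sfUser": "etl_user",
        "sfPassword": "***",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "ETL_WH",
    }

    df = (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "PRODUCT_SALES")  # assumed source table
        .load()
    )

    # Write the same extract to both downstream targets.
    df.write.mode("overwrite").parquet(
        "wasbs://raw@storageacct.blob.core.windows.net/product_sales")  # Azure Blob
    df.write.mode("overwrite").parquet("hdfs:///data/raw/product_sales")  # HDFS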
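
And a small boto3 sketch of the S3-based competitor-data acquisition; the bucket, prefix, and local landing path are illustrative, not the real feed.

    # Pull a day's competitor-data files from S3 for ingestion.
    import boto3

    s3 = boto3.client("s3")  # credentials resolved from environment/IAM role

    bucket = "competitor-feeds"     # hypothetical bucket
    prefix = "pricing/2019-01-15/"  # hypothetical daily prefix

    # List the day's objects and download each for the analytics pipeline.
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in resp.get("Contents", []):
        key = obj["Key"]
        s3.download_file(bucket, key, "/data/landing/" + key.replace("/", "_"))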

PROJECT 2: DataStage Developer, The ODP Corporation

• Played a key role in the early stages of project development by actively participating in requirements gathering sessions.

• Designed and implemented ETL jobs using the IBM InfoSphere DataStage tool.

• Designed and constructed sequence jobs to create structured workflows that streamlined the execution of ETL tasks, ensuring data moved efficiently from source to target systems.

• Contributed to documentation efforts, maintaining comprehensive records of job designs, mappings, and transformation rules to support knowledge sharing and future reference.

• Utilized the ESP scheduler to automate and schedule data migration jobs, guaranteeing timely execution and minimal manual intervention.

EDUCATION

Kongu Engineering College, affiliated to Anna University, India Jun 2011 – Jun 2015
Bachelor of Engineering in Electronics and Communication Engineering (B.E., ECE)


