Big Data Engineering

Posted:

October 24, 2023

Resume:

Santhosh Guntupalli

Springfield, IL • ********.************@*****.*** • 217-***-**** • linkedin.com/in/santhoshguntupalli

PROFESSIONAL EXPERIENCE

LTI MINDTREE Hyderabad, INDIA

Data Engineer SEPTEMBER 2019 - JUNE 2022

Achievements:

· Managed and executed 10 complex data engineering projects, enhancing data quality and supporting data-informed decision-making.

· Developed data processing applications in Java and Scala, ensuring efficient and scalable data pipelines.

· Implemented build automation using Maven, Gradle, and SBT, resulting in a 20% reduction in project delivery time.

· Proficiently managed SQL databases, optimizing performance, data modeling, and security, contributing to a 30% improvement in query efficiency.

· Spearheaded effective management of data storage in HDFS and S3 environments, reducing costs by 15% and improving data accessibility.

· Leveraged Apache Spark and Databricks for big data processing, leading to a 25% increase in data processing speed and efficiency.

· Worked in Unix/Linux environments, scripting for automation, resulting in a 40% reduction in manual tasks.

· Designed, developed, and optimized ETL/ELT processes in data pipelines, reducing data processing time by 20%.

· Used Docker and Kubernetes to containerize and scale data applications, resulting in a 30% improvement in deployment efficiency.

· Utilized Apache Airflow, Control-M, and Arrow for workflow orchestration, ensuring timely and reliable data pipeline execution.

· Applied Apache Kafka and Confluent for real-time data streaming, contributing to the establishment of a robust real-time data processing infrastructure.

Responsibilities:

· Collaborated cross-functionally to gather data requirements and design data models, aligning data solutions with business objectives.

· Designed and maintained data pipelines for batch and real-time processing, ensuring data availability and timeliness.

· Conducted data profiling, cleansing, and validation to uphold data quality standards, reducing data errors by 20%.

· Proactively monitored and tuned data pipelines for optimal performance and scalability.

· Maintained comprehensive documentation for data engineering processes, facilitating knowledge sharing and compliance with best practices.

· Assisted in evaluating and implementing NoSQL solutions, expanding data storage options.

· Actively participated in code reviews, mentored junior team members, and contributed to the development of best practices.

· Worked within an agile development environment, ensuring the timely delivery of projects and efficient collaboration.

· Provided on-call support for data-related issues, minimizing downtime and ensuring data integrity.

· Stayed updated with emerging data technologies and trends, continually improving data engineering practices.

SUNTECH Bengaluru, INDIA

Data Engineering Intern APRIL 2019 - AUG 2019

Internship Project: ETL Automation for Sales Data Processing

Achievements:

· Achieved a significant 30% reduction in data processing time, resulting in more timely analytics and reporting.

· Significantly improved data quality, leading to a 25% reduction in data-related errors.

· Gained practical experience in ETL automation and data engineering, reinforcing my commitment to learning and growth as a Data Engineer.

Key Responsibilities:

ETL Pipeline Development:

· Collaborated with senior Data Engineers to design and implement ETL pipelines using Python and Apache Spark.

· Automated data extraction from various sources, including CSV files and relational databases.

· Enhanced data transformation processes to ensure data accuracy and quality.

Data Quality and Validation:

· Implemented data quality checks and validation rules to identify and address missing or inconsistent data.

· Improved data accuracy, reducing data-related errors by 25%.

Monitoring and Issue Resolution:

· Established monitoring using Python scripts and email alerts for data pipeline failures.

· Assisted in resolving pipeline issues promptly, minimizing downtime and ensuring data availability.

Documentation and Knowledge Sharing:

· Maintained project documentation, including ETL process flow diagrams and code documentation.

· Contributed to knowledge sharing by assisting in training junior team members on ETL best practices.

EDUCATION

UNIVERSITY OF ILLINOIS AT SPRINGFIELD Springfield, IL

Master’s in Management Information Systems 2022-2023

SKILLS

· Technical Skills: R, Python, SQL, PL/SQL, NoSQL, C, C++, SAS, MATLAB.

· Machine learning frameworks: Scikit-learn, TensorFlow, PyTorch

· Data Visualization: Matplotlib, Seaborn, Power BI, Tableau.

· Data warehousing: Amazon Redshift

· Big Data Processing: Apache Spark, Hadoop

· Notebooks: Jupyter Notebook.

· Data Extraction and Transformation: Apache NiFi, Apache Airflow

· Cloud Services: AWS, Azure and Google Cloud

· ETL tools: Talend, Informatica

· Other: Scala, SLA, Data warehousing, Data Modelling, Data governance, Data Architecture, Product Analysis

· Languages: Fluent in English

CERTIFICATIONS

· Meta Data Engineer Professional Certificate

· IBM Introduction to Data Engineering

· Database Structures and Management with MySQL

· Cisco Data Analytics Essentials

· Meta Version Control

· Meta Advanced MySQL

· IBM Exploratory Data Analysis for Machine Learning
