SAIRAGHU ADEPU
Data Engineer 302-***-**** ***********@*****.*** Newark, Delaware 19713
LinkedIn: Sairaghu Adepu
Professional Summary
●Professional, results-oriented Data Engineer with 5+ years in IT, specializing in the analysis, design, development, deployment, and maintenance of big data applications, with hands-on experience in MapReduce, YARN, HDFS, Cassandra, HBase, Oozie, Hive, Sqoop, Pig, and ZooKeeper.
●In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
●Experienced in converting MapReduce programs to Spark transformations using Spark RDDs.
●Proficient in Spark Core, Spark SQL, DataFrames, and Spark Streaming.
●Configured Spark Streaming for real-time data flows from Kafka to HDFS using Scala (a minimal PySpark sketch follows this list).
●Executed real-time event processing using Spark Streaming and Kafka.
●Extensive experience with AWS (EC2, S3, VPC, IAM, DynamoDB, Redshift, Lambda, EventBridge, CloudWatch, Auto Scaling, Security Groups, CloudFormation, Kinesis, SQS, SNS).
●Knowledge of Microsoft Azure for big data processing.
●Experienced with Cloudera, Hortonworks, MapR, and Apache distributions.
●Installed, configured, supported, and managed Hadoop clusters on Apache, Cloudera, and AWS.
●Extensive experience with Spark tools including RDD transformations, DataFrames, and Spark SQL.
●Hands-on with HiveQL, Pig Latin, custom MapReduce programs in Python.
●Proficient with HBase, MongoDB, and Cassandra, with experience in Cassandra CQL, the HBase shell, and the HBase client API.
●Experienced in writing stored procedures and complex SQL queries in Oracle, SQL Server, and MySQL.
●Skilled in using Python for data cleaning techniques, ensuring data accuracy and consistency.
●Experienced in data validation to maintain high-quality, reliable data for analysis and machine learning models.
●Extensive experience building efficient, seamless data pipelines, using Python for data analysis, transformation, and integration to enhance ETL processes across flat files, XML, and databases.
●Strong knowledge of Waterfall and Agile methodologies.
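For illustration, a minimal sketch of the Kafka-to-HDFS streaming pattern referenced above, written here in PySpark (the original work used Scala). The broker address, topic, and paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath:

```python
# Minimal sketch: stream events from Kafka into HDFS as Parquet.
# Broker, topic, and output paths are hypothetical; requires the
# spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs")
         .getOrCreate())

# Read a stream of records from a Kafka topic; key/value arrive as
# binary and are cast to strings here for simplicity.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

# Persist micro-batches to HDFS with a checkpoint for fault recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())

query.awaitTermination()
```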
Education
Master’s in Information Systems - Wilmington University, 2023.
Bachelor of Technology - JNTU Hyderabad, 2017.
Technical Skills
Big Data Frameworks: Apache Spark, Hadoop, Spark RDD, DataFrame API, Dataset API, Spark Streaming
Database & Cluster Management: Hive, Pig, Azure SQL Database, HDInsight, Databricks
Data Streaming & Processing: Spark Streaming, Kafka, Event Hubs
Data Management & Querying: Cassandra, MongoDB, HiveQL
Data Visualization & Reporting: Tableau, Power BI, Cognos
Version Control & Collaboration: GitHub, Git
Programming Languages: Python, Scala, SQL
Cloud Platforms and Services: AWS (EC2, S3, EMR, IAM, Redshift, Lambda, Glue, CloudWatch), Azure (Data Lake, Azure Databricks)
Experience
DATA ENGINEER Jan 2024 – Present
BIG LOTS – COLUMBUS, OH
●Managed and processed large datasets using Apache NiFi, AWS Glue, and Sqoop for efficient ETL operations and data integration.
●Wrote and optimized complex SQL queries for accurate, timely reporting, and tuned Hive queries using PySpark.
●Developed and deployed scalable data pipelines using Apache Spark, reducing processing time by 30%, and streamlined data storage and retrieval, improving overall system performance by 20%.
●Configured AWS IAM groups and users for login authentication, maintained the Hadoop cluster on AWS EMR, and loaded data into S3 buckets using AWS Glue and PySpark.
●Optimized Spark applications in PySpark to execute complex ETL operations, improving data processing efficiency.
●Managed diverse file formats, including JSON, Avro, and Parquet, and applied compression techniques such as Snappy within the NiFi ecosystem to streamline processing of extensive datasets (a PySpark sketch of this flow follows this list).
●Applied expertise in Hadoop and Spark/Scala applications; led the migration of on-premises data infrastructure to the cloud, increasing efficiency and reducing costs by 25%.
●Troubleshot production issues related to large-scale data processing to maintain system stability.
●Used Git and CI/CD pipelines for DAG versioning and deployment, ensuring consistency across environments.
●Developed scripts and batch jobs to orchestrate Hadoop programs in automated data processing workflows.
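A minimal PySpark sketch of the JSON-to-Snappy-Parquet flow described above; the bucket names, paths, and the event_id/event_date columns are hypothetical placeholders, not the actual production schema:

```python
# Sketch: read raw JSON from S3, clean it, and write Snappy-compressed
# Parquet back to S3. All names and paths below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-parquet")
         .getOrCreate())

# Read raw JSON records (e.g., landed by NiFi) from a raw-zone bucket.
raw = spark.read.json("s3://example-raw-bucket/events/")

# Light cleanup: drop rows missing a key field and deduplicate on it.
clean = raw.dropna(subset=["event_id"]).dropDuplicates(["event_id"])

# Write Snappy-compressed Parquet, partitioned by date, to a curated bucket.
(clean.write
 .mode("overwrite")
 .option("compression", "snappy")
 .partitionBy("event_date")
 .parquet("s3://example-curated-bucket/events/"))
```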
HADOOP DEVELOPER (AZURE) Sep 2018 – Aug 2022
ANTHEM, INDIA
●Utilized Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce to develop ETL, batch processing, and data storage functionality.
●Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
●Analyzed large datasets to determine optimal aggregation and reporting methods.
●Loaded all tables from the reference source database schema into Hadoop using Sqoop.
●Managed Hadoop jobs with the Oozie workflow scheduler, employing Directed Acyclic Graphs (DAGs) of actions with control flows.
●Managed and reviewed Hadoop log files.
●Loaded and transformed structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
●Extracted files from MySQL through Sqoop, placed them into HDFS, and processed them.
●Developed Python scripts to automate ETL processes and data analysis, integrating SQL for efficient data manipulation.
●Developed data pipelines using Event Hubs, Spark, Hive, Pig, and Azure SQL Database for customer behavior and financial data analysis on HDInsight.
●Created HDInsight clusters in the Microsoft Azure Portal, along with Event Hubs and Azure SQL Databases.
●Worked with clustered Hadoop on Microsoft Azure using HDInsight and the Hortonworks Data Platform.
●Used Spark Streaming for data ingestion into the Spark engine; developed Spark code using Scala and Spark SQL/Streaming, importing data from sources such as Event Hubs and Cosmos DB into Spark RDDs.
●Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a DataFrame-based PySpark sketch follows this list).
●Worked on Spark SQL and Spark Streaming, using Scala for code development and the DataFrame API for data conversion.
●Applied expertise in Hadoop ecosystem tools including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper.
●Used AzCopy, Livy, Windows PowerShell, and Curl to submit Spark jobs on the HDInsight cluster.
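A hedged PySpark sketch of the Hive-to-Spark conversion pattern noted above (the original work used Scala); the table and column names are hypothetical:

```python
# Sketch: the same aggregation expressed first as HiveQL, then as an
# equivalent DataFrame transformation chain. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe")
         .enableHiveSupport()  # lets Spark read Hive metastore tables
         .getOrCreate())

# HiveQL form, run through Spark SQL.
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM transactions
    GROUP BY customer_id
""")

# Equivalent DataFrame API form.
df_result = (spark.table("transactions")
             .groupBy("customer_id")
             .agg(F.sum("amount").alias("total_spend")))
```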
SOFTWARE DEVELOPER June 2017 – Aug 2018
EXELON, HYDERABAD, INDIA
●Utilized Java and Python to build and support software applications.
●Assisted in writing clean, efficient, and maintainable code based on specifications.
●Participated in testing and debugging to ensure software functionality and reliability.
●Created and enhanced SQL queries to manage and retrieve data efficiently.
●Continuously improved coding skills by learning new programming languages, tools, and techniques.
●Sought feedback from senior developers and mentors to refine coding practices.
●Assisted in the creation and maintenance of technical documentation for software applications.
●Documented code and development processes for knowledge sharing and future maintenance.
●Employed Jenkins and Git for continuous integration and continuous deployment (CI/CD).
●Conducted thorough code reviews and unit testing to ensure high-quality code and adherence to best practices.
Certifications
AWS Certified Cloud Practitioner