SAIRAGHU ADEPU
Data Engineer 302-***-**** ***********@*****.*** Newark, Delaware 19713
LinkedIn: Sairaghu Adepu
Professional Summary
●Professional, results-oriented Data Engineer with 5+ years in IT, specializing in the analysis, design, development, deployment, and maintenance of big data applications, with hands-on experience in MapReduce, YARN, HDFS, Cassandra, HBase, Oozie, Hive, Sqoop, Pig, and ZooKeeper.
●In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
●Experienced in converting MapReduce programs to Spark transformations using Spark RDDs.
●Proficient in Spark Core, Spark SQL, DataFrames, and Spark Streaming.
●Configured Spark Streaming for real-time data flows from Kafka to HDFS using Scala (a minimal PySpark sketch follows this list).
●Executed real-time event processing using Spark Streaming and Kafka.
●Extensive experience with AWS (EC2, S3, VPC, IAM, DynamoDB, Redshift, Lambda, EventBridge, CloudWatch, Auto Scaling, Security Groups, CloudFormation, Kinesis, SQS, SNS).
●Knowledge of Microsoft Azure for big data processing.
●Experienced with Cloudera, Hortonworks, MapR, and Apache distributions.
●Installed, configured, supported, and managed Hadoop clusters on Apache, Cloudera, and AWS.
●Extensive experience with Spark tools including RDD transformations, DataFrames, and Spark SQL.
●Hands-on with HiveQL, Pig Latin, custom MapReduce programs in Python.
●Proficient with HBase, MongoDB, and Cassandra, with experience in Cassandra CQL, the HBase shell, and the HBase client API.
●Experienced in writing stored procedures and complex SQL queries in Oracle, SQL Server, and MySQL.
●Skilled in using Python for data cleaning techniques, ensuring data accuracy and consistency.
●Experienced in data validation to maintain high-quality, reliable data for analysis and machine learning models.
●Extensive experience building efficient, seamless data pipelines, using Python for data analysis, transformation, and integration to enhance ETL processes across flat files, XML, and databases.
●Strong knowledge of Waterfall and Agile methodologies.
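For illustration, a minimal sketch of the Kafka-to-HDFS streaming pattern referenced above, written here in PySpark (the original work used Scala). The broker address, topic, and paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath:

```python
# Minimal sketch: stream events from Kafka into HDFS as Parquet.
# Broker, topic, and output paths are hypothetical; requires the
# spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs")
         .getOrCreate())

# Read a stream of records from a Kafka topic; key/value arrive as
# binary and are cast to strings here for simplicity.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

# Persist micro-batches to HDFS with a checkpoint for fault recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())

query.awaitTermination()
```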
Education
Master’s in Information Systems - Wilmington University, 2023.
Bachelor of Technology - JNTU Hyderabad, 2017.
Technical Skills
Big Data Frameworks: Apache Spark, Hadoop, Spark RDD, DataFrame API, Dataset API, Spark Streaming
Database & Cluster Management: Hive, Pig, Azure SQL Database, HDInsight, Databricks
Data Streaming & Processing: Spark Streaming, Kafka, Event Hubs
Data Management & Querying: Cassandra, MongoDB, HiveQL
Data Visualization & Reporting: Tableau, Power BI, Cognos
Version Control & Collaboration: GitHub, Git
Programming Languages: Python, Scala, SQL
Cloud Platforms and Services: AWS (EC2, S3, EMR, IAM, Redshift, Lambda, Glue, CloudWatch), Azure (Data Lake, Azure Databricks)
Experience
DATA ENGINEER Jan 2024 – Present
BIG LOTS – COLUMBUS, OH
●Managed and processed large datasets using Apache NiFi, AWS Glue, and Sqoop for efficient ETL operations and data integration.
●Wrote and optimized complex SQL queries for accurate, timely reporting, and tuned Hive queries using PySpark.
●Developed and deployed scalable data pipelines using Apache Spark, reducing processing time by 30%, and streamlined data storage and retrieval, improving overall system performance by 20%.
●Configured AWS IAM groups and users for login authentication, maintained the Hadoop cluster on AWS EMR, and loaded data into S3 buckets using AWS Glue and PySpark.
●Optimized Spark applications in PySpark to execute complex ETL operations, improving data processing efficiency.
●Managed diverse file formats, including JSON, Avro, and Parquet, and applied compression techniques such as Snappy within the NiFi ecosystem to streamline processing of extensive datasets (a PySpark sketch of this flow follows this list).
●Applied expertise in Hadoop and Spark/Scala applications; led the migration of on-premises data infrastructure to the cloud, increasing efficiency and reducing costs by 25%.
●Troubleshot production issues related to large-scale data processing to maintain system stability.
●Used Git and CI/CD pipelines for DAG versioning and deployment, ensuring consistency across environments.
●Developed scripts and batch jobs to orchestrate Hadoop programs in automated data processing workflows.
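A minimal PySpark sketch of the JSON-to-Snappy-Parquet flow described above; the bucket names, paths, and the event_id/event_date columns are hypothetical placeholders, not the actual production schema:

```python
# Sketch: read raw JSON from S3, clean it, and write Snappy-compressed
# Parquet back to S3. All names and paths below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-parquet")
         .getOrCreate())

# Read raw JSON records (e.g., landed by NiFi) from a raw-zone bucket.
raw = spark.read.json("s3://example-raw-bucket/events/")

# Light cleanup: drop rows missing a key field and deduplicate on it.
clean = raw.dropna(subset=["event_id"]).dropDuplicates(["event_id"])

# Write Snappy-compressed Parquet, partitioned by date, to a curated bucket.
(clean.write
 .mode("overwrite")
 .option("compression", "snappy")
 .partitionBy("event_date")
 .parquet("s3://example-curated-bucket/events/"))
```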
HADOOP DEVELOPER (AZURE) Sep 2018 – Aug 2022
ANTHEM, INDIA
●Utilized Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce to develop ETL, batch processing, and data storage functionality.
●Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
●Analyzed large datasets to determine optimal aggregation and reporting methods.
●Loaded all tables from the reference source database schema into Hadoop using Sqoop.
●Managed Hadoop jobs with the Oozie workflow scheduler, employing Directed Acyclic Graphs (DAGs) of actions with control flows.
●Managed and reviewed Hadoop log files.
●Loaded and transformed structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
●Extracted files from MySQL through Sqoop, placed them into HDFS, and processed them.
●Developed Python scripts to automate ETL processes and data analysis, integrating SQL for efficient data manipulation.
●Developed data pipelines using Event Hubs, Spark, Hive, Pig, and Azure SQL Database for customer behavior and financial data analysis on HDInsight.
●Created HDInsight clusters in the Microsoft Azure Portal, along with Event Hubs and Azure SQL Databases.
●Worked with clustered Hadoop on Microsoft Azure using HDInsight and the Hortonworks Data Platform.
●Used Spark Streaming for data ingestion into the Spark engine; developed Spark code using Scala and Spark SQL/Streaming, importing data from sources such as Event Hubs and Cosmos DB into Spark RDDs.
●Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a DataFrame-based PySpark sketch follows this list).
●Worked on Spark SQL and Spark Streaming, using Scala for code development and the DataFrame API for data conversion.
●Applied expertise in Hadoop ecosystem tools including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper.
●Used AzCopy, Livy, Windows PowerShell, and Curl to submit Spark jobs on the HDInsight cluster.
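A hedged PySpark sketch of the Hive-to-Spark conversion pattern noted above (the original work used Scala); the table and column names are hypothetical:

```python
# Sketch: the same aggregation expressed first as HiveQL, then as an
# equivalent DataFrame transformation chain. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe")
         .enableHiveSupport()  # lets Spark read Hive metastore tables
         .getOrCreate())

# HiveQL form, run through Spark SQL.
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM transactions
    GROUP BY customer_id
""")

# Equivalent DataFrame API form.
df_result = (spark.table("transactions")
             .groupBy("customer_id")
             .agg(F.sum("amount").alias("total_spend")))
```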
SOFTWARE DEVELOPER June 2017 – Aug 2018
EXELON, HYDERABAD, INDIA
●Utilized Java and Python to build and support software applications.
●Assisted in writing clean, efficient, and maintainable code based on specifications.
●Participated in testing and debugging to ensure software functionality and reliability.
●Created and enhanced SQL queries to manage and retrieve data efficiently.
●Continuously improved coding skills by learning new programming languages, tools, and techniques.
●Sought feedback from senior developers and mentors to refine coding practices.
●Assisted in the creation and maintenance of technical documentation for software applications.
●Documented code and development processes for knowledge sharing and future maintenance.
●Employed Jenkins and Git for continuous integration and continuous deployment (CI/CD).
●Conducted thorough code reviews and unit testing to ensure high-quality code and adherence to best practices.
Certifications
AWS Certified Cloud Practitioner