Naveen Raju
Data Engineer
*************@*****.***
Professional Summary:
• Accomplished Big Data Engineer with 8 years of hands-on experience in designing, implementing, and optimizing data-intensive applications within the Hadoop Ecosystem and Amazon Web Services (AWS). Specialized in crafting robust solutions for Big Data Analytics, Cloud Data Engineering, Data Warehousing, Data Visualization, Reporting, and Data Quality assurance.
• Demonstrates a comprehensive understanding of Hadoop architecture and AWS cloud services, including YARN, HDFS, MapReduce, Spark, EMR, S3, Redshift, Glue, and Lambda, adeptly integrating them to meet diverse business needs.
• Proven expertise in developing scalable and efficient enterprise solutions, leveraging a wide array of Hadoop components such as Apache Spark, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, and YARN.
• Proficient in data ingestion and processing methodologies, adept at performing complex transformations, enrichments, and aggregations while ensuring data integrity and quality. Possesses a strong foundation in distributed systems architecture, parallel processing, and the Spark execution framework.
• Skilled in fine-tuning and optimizing algorithms within the Hadoop ecosystem using Spark Context, Spark-SQL, DataFrames API, Spark Streaming, MLlib, and Pair RDDs, with proficiency in both PySpark and Scala programming languages.
• Experienced in architecting and implementing end-to-end data pipelines within AWS, ensuring seamless compatibility across diverse data sources and destinations. Proficient in managing complex data integration pipelines using Spark and AWS services like EMR and Glue for efficient data ingestion, transformation, and loading.
• Demonstrated proficiency in managing data ingestion from various sources into HDFS using tools like Sqoop and Flume, and executing transformations with Hive and MapReduce. Skilled in managing Sqoop jobs for incremental loads to populate Hive external tables.
• Adept in leveraging AWS ecosystem components and Spark for ETL processes using Spark Core, Spark-SQL, and real-time data processing with Spark Streaming. Proficient in integrating Kafka as middleware for real-time data pipelines.
• Skilled in developing custom User Defined Functions (UDFs) and seamlessly integrating them with Hive and Pig using Java. Experienced in creating, debugging, scheduling, and monitoring workflows using Airflow and Oozie in both Hadoop and AWS environments.
• Hands-on experience in managing SQL and NoSQL databases, including MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL. Proficient in database design, creation, migration, and transformation processes, ensuring optimal performance and data integrity.
Technical Skills:
Big Data Tools
• Hadoop/Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper, Splunk, Hortonworks, Cloudera
• AWS: EMR, S3, Redshift, Glue, Data Pipeline, Lambda
Programming Languages
• Hadoop/Big Data: SQL, Python, R, Scala, PySpark, Linux shell scripting
• AWS: SQL, Python, Scala
Databases
• Hadoop/Big Data: RDBMS (MySQL, DB2, MS SQL Server, Teradata, PostgreSQL), NoSQL (MongoDB, HBase, Cassandra), Snowflake Virtual Warehouse
• AWS: RDS, DynamoDB, Redshift, DocumentDB, Neptune
OLAP & ETL Tools
• Hadoop/Big Data: Tableau, Tableau Server, Power BI, Spyder, SSIS, Informatica, Spark, Pentaho, Talend
• AWS: Glue, Data Pipeline, Lambda
Data Modelling Tools
• Microsoft Visio, ER Studio, Erwin
Python and R Libraries
• Hadoop/Big Data: R (tidyr, tidyverse, dplyr, reshape, lubridate); Python (Beautiful Soup, NumPy, SciPy, Matplotlib, python-twitter, pandas, scikit-learn, Keras, boto3)
• AWS: NumPy, pandas, scikit-learn, boto3
Machine Learning
• Hadoop/Big Data: Regression, Clustering, MLlib, Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means, Random Forest, Gradient Boost & AdaBoost, Neural Networks, Time Series Analysis
• AWS: SageMaker, Comprehend, Forecast
Data Analysis Tools
• Hadoop/Big Data: Machine Learning, Deep Learning, Data Warehousing, Data Mining, Data Analysis, Big Data, Data Visualization, Data Munging, Data Modelling
• AWS: QuickSight, Athena, Glue, Kinesis
Cloud Computing Tools
• Hadoop/Big Data: Snowflake, SnowSQL, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)
• AWS: AWS Snow Family (Snowcone, Snowball, Snowmobile); AWS services (EMR, EC2, S3, RDS, CloudSearch, Redshift, Glue, Data Pipeline, Lambda)
Reporting Tools
• Hadoop/Big Data: JIRA, MS Excel, Tableau, Power BI, QlikView, Qlik Sense, D3, SSRS, SSIS
• AWS: QuickSight, Athena
IDEs
• Hadoop/Big Data: PyCharm, Anaconda, Jupyter Notebook, IntelliJ
• AWS: AWS Cloud9, SageMaker Studio, IntelliJ
Development Methodologies
• Agile, Waterfall
Professional Experience:
CIGNA, Morristown, NJ Sep 2022 – Present
Sr Data Engineer
Responsibilities:
• Proficient in Agile Scrum Development, collaborating across teams to efficiently meet client requirements.
• Leveraged Amazon Web Services (AWS) tools such as Amazon EMR, Amazon S3, Amazon RDS, Amazon Redshift, AWS Glue, and AWS Lambda to tackle complex analytical challenges with large datasets.
• Designed and executed data pipelines in Apache NiFi and Apache Spark, ensuring seamless compatibility across diverse data sources and destinations within the Hadoop ecosystem.
• Orchestrated complex data integration pipelines, utilizing Apache NiFi and Apache Spark for efficient data ingestion, transformation, and loading into target systems.
• Streamlined ETL workflows, integrating with AWS services like Amazon S3 and Amazon Redshift, resulting in significant time and cost savings.
• Automated deployment and monitoring of data pipelines using CI/CD pipelines and AWS CloudWatch, enhancing reliability and reducing manual intervention.
• Demonstrated proficiency in SQL and Python for database programming, ensuring functional code and conducting thorough unit testing.
• Accountable for end-to-end solution development, from analyzing business requirements to deploying solutions and obtaining stakeholder signoff.
• Conducted essential validations on data ingestion for both incremental and full data loads to ensure data accuracy and completeness.
• Expertise in parsing large XML data using Apache Spark operations on Hadoop clusters, optimizing performance through transformations and actions.
• Skilled in developing and fine-tuning complex SQL scripts to optimize performance as needed.
• Utilized Apache Airflow for workflow orchestration, creating modular data transformations, and scheduling Directed Acyclic Graphs (DAGs) for daily activities.
• Conducted proof of concept (POC) projects to evaluate various cloud offerings, including migrating projects from on-prem Hadoop systems to AWS.
• Compared self-hosted Hadoop with AWS EMR, exploring use cases and evaluating relative performance.
• Leveraged AWS CLI to configure services such as EMR, S3, and Redshift.
• Fine-tuned Spark applications to improve overall processing time for pipelines and enhance efficiency.
Vanguard, Malvern, PA Jul 2021 – Aug 2022
Data Engineer
Responsibilities:
• Developed ETL pipelines on S3 Parquet files in a data lake using AWS Glue.
• Programmed ETL functions to transfer data between Oracle and Amazon Redshift.
• Conducted data analytics on the data lake utilizing PySpark on the Databricks platform.
• Assessed and enhanced the quality of customer data.
• Utilized various AWS cloud services including EC2, S3, EMR, RDS, Athena, and Glue.
• Analyzed data quality issues through exploratory data analysis (EDA) using SQL, Python, and pandas.
• Created automation scripts using Python libraries to perform accuracy checks from diverse sources to target databases.
• Developed Python scripts to generate heatmaps for issue and root cause analysis of data quality report failures.
• Performed data analysis and predictive data modeling.
• Collaborated with stakeholders to deliver regulatory reports and recommend remediation strategies, building analytical dashboards using Excel and Python plotting libraries.
• Designed and implemented a REST API for accessing the Snowflake DB platform.
• Managed data warehouses in Snowflake and implemented star schemas.
• Participated in the code migration of a quality monitoring tool from AWS EC2 to AWS Lambda and developed logical datasets.
• Handled various data feeds such as JSON, CSV, XML, and DAT, implementing the data lake concept.
Environment: Python, Spark SQL, PySpark, pandas, NumPy, Excel, Power BI, AWS EC2, AWS S3, AWS Lambda, Athena, Glue, Linux shell scripting, Snowflake, Git, DynamoDB, Redshift.
Citi Bank, New York, NY Apr 2020 – Jun 2021
Data Engineer
Responsibilities:
• Extracted and analyzed extensive data sets exceeding 800k records from the Hadoop Distributed File System (HDFS) and Amazon S3 using SQL queries.
• Conducted exploratory data analysis (EDA) in Python, utilizing Seaborn and Matplotlib to evaluate data quality on both Hadoop clusters and AWS.
• Developed efficient Spark applications with PySpark and Spark SQL in Databricks and Amazon EMR, enabling seamless data extraction, transformation, and aggregation from diverse file formats across HDFS and Amazon S3.
• Played a key role in migrating data from on-premises SQL Servers to cloud databases such as Amazon Redshift and Amazon RDS, ensuring a smooth transition and data integrity.
• Established robust data ingestion pipelines by connecting with Amazon S3, facilitating end-to-end processing of raw files through Databricks and AWS Glue.
• Implemented advanced data cleaning techniques using pandas and NumPy in Jupyter Notebook, effectively handling missing values and enhancing data preparation workflows on both Hadoop and AWS platforms.
• Designed and implemented custom input adapters leveraging Spark, Hive, and Sqoop for seamless ingestion and analysis of data into HDFS and Amazon S3 from sources like Redshift and MySQL.
• Performed text analytics and processing using Apache Spark written in Scala, contributing to enhanced data insights and quality assurance in both Hadoop and AWS environments.
• Implemented Spark functionality using Python and Spark SQL, streamlining data processing tasks and ensuring efficient management of diverse data sources across Hadoop and AWS setups.
• Contributed to real-time data processing initiatives by implementing Kafka clusters for data ingestion, along with extracting and exporting data from Teradata into Amazon DynamoDB, enhancing data accessibility and utilization.
Fidelity, Boston, MA Nov 2018 – Mar 2020
Data Engineer
Responsibilities:
• Configured and managed Hadoop ecosystem components including Hive, Pig, HBase, and Sqoop on both Hadoop clusters and AWS EMR instances.
• Implemented real-time data processing by configuring Spark Streaming to ingest data from Apache Kafka and store it in HDFS using Scala in both Hadoop and AWS environments.
• Developed Sqoop scripts to enable seamless data transfer between Hive and Vertica database systems across Hadoop clusters and AWS infrastructure.
• Utilized MapReduce, Pig, and Hive for data analysis and processing, storing processed data in HDFS and Google Cloud Storage (GCS) for downstream applications on both Hadoop and cloud platforms.
• Established multi-terabyte data warehouse infrastructure on AWS Redshift and Google BigQuery, handling extensive data volumes daily.
• Led the migration of an on-premises application to AWS and Google Cloud Platform, configuring virtual data centers with services like EC2, S3, and Google Cloud Storage (GCS) to support Enterprise Data Warehouse hosting.
Unisys Global Services Pvt Ltd, India Jan 2016 – Jul 2017
Hadoop Engineer
Responsibilities:
• Developed highly optimized Spark applications for data cleansing, validation, and transformation.
• Implemented data pipelines consisting of Spark, Hive, and Sqoop, alongside custom-built input adapters.
• Created Spark and Hive jobs for data summarization and transformation purposes.
• Utilized Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases.
• Transformed Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Smart Software Tech. Dev. Pvt. Ltd., India Apr 2014 – Dec 2015
Hadoop Engineer
Responsibilities:
• Collaborated with the business requirements and design teams, preparing low-level and high-level design documents.
• Provided comprehensive technical and business knowledge to ensure efficient design, programming, implementation, and ongoing support for applications.
• Identified potential avenues to enhance system efficiency.
• Executed logical implementations and interacted with HBase effectively.
• Developed MapReduce jobs to automate data transfer and efficiently store and retrieve data to and from HBase.
Education:
• Master of Computer Science from New York Institute of Technology, New York, NY, 2018
• Bachelor's in Computer Science from JNTUH, India, 2014