
Data Engineer Hadoop Developer

Location:
Kalamazoo, MI
Posted:
May 08, 2023

Contact this candidate

Resume:

Mani Nagubandi

Phone: +1-732-***-****

Email: *.*************@*****.***

Sr. Data Engineer

https://www.linkedin.com/in/mani-nagubandi-a5757b23b/

Profile Summary:

8+ years of IT experience in Software Development with 5+ years’ work experience as a Data Engineer.

Proven experience working with Apache Hadoop, Apache Spark and Apache Flink Frameworks.

Strong experience in building Batch and Streaming Data Pipelines.

Experience handling gigabytes to terabytes of data and deriving insights from it.

Strong acumen in choosing the right tools and technologies for building scalable architectures and platform solutions.

Strong experience working in e-commerce, networking and telecommunication domains.

5+ years of experience designing large-scale Big Data platforms for Data Ingestion, Data Processing and Data Analytics.

Experience using Test-Driven development at every stage of software development.

Hands-on experience with Google Cloud Platform (GCP) across its big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer (Airflow as a service).

Experience building data models on relational databases such as MySQL.

Experience in using message queues like Apache Kafka.

Experience in building Cloud Solutions using AWS.

Experience in building Data-warehouses and Data-lakes using Snowflake.

Experience in Automating the CI/CD using Jenkins, Docker, and Kubernetes.

Experience working in teams following Agile and Scrum methodologies.

Technical Skills:

Big Data : Apache Hadoop (Cloudera, MapR), Apache Spark

Cloud Technologies : AWS Lambda, EMR, EC2, SQS, S3, Glue, Athena, CloudWatch, Databricks, GCP

DBMS : Amazon Redshift, Postgres, MySQL

ETL Tools : Talend, Informatica Power Center

Deployment Tools : Git, Jenkins and CloudFormation

Programming Languages : Java, Python, Scala

Scripting : Shell Scripting

Education:

Bachelor's in Computer Science from Sathyabama University, 2014

Work Experience:

AT&T, Troy, MI Apr 2021 to Present

Role: Sr. Cloud Data Engineer

Responsibilities:

Design, develop, implement, test, document, and operate large-scale, high-volume, high-performance big data structures for business intelligence analytics.

Designed, built, and managed ELT data pipelines leveraging Airflow, Java, dbt, Stitch Data, and GCP solutions.

Performed data visualization for different modules using Tableau and the ONE Click method; worked with different teams on defining and representing key parameters.

Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators.
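
As an illustration of the Airflow-on-GCP pattern described above, a minimal DAG sketch in Python follows; it assumes the apache-airflow-providers-google package, and the project, bucket, cluster, and table names are placeholders rather than actual production values.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="daily_usage_etl",              # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Run the PySpark transformation on an existing Dataproc cluster
        transform = DataprocSubmitJobOperator(
            task_id="transform_usage_data",
            project_id="example-gcp-project",  # placeholder project
            region="us-central1",
            job={
                "placement": {"cluster_name": "etl-cluster"},
                "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/transform.py"},
            },
        )

        # Refresh the reporting table in BigQuery after the Spark job succeeds
        load = BigQueryInsertJobOperator(
            task_id="refresh_reporting_table",
            configuration={
                "query": {
                    "query": "SELECT * FROM staging.usage_daily",  # placeholder query
                    "destinationTable": {
                        "projectId": "example-gcp-project",
                        "datasetId": "reporting",
                        "tableId": "usage_daily",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                },
            },
        )

        transform >> load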

Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
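
A minimal PySpark sketch of the kind of multi-format extraction and aggregation described above; the mount paths, column names, and target table are illustrative placeholders, not the actual pipeline.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Read the same feed delivered in two formats and align the schemas
    json_events = spark.read.json("dbfs:/mnt/raw/events/json/")          # placeholder path
    parquet_events = spark.read.parquet("dbfs:/mnt/raw/events/parquet/") # placeholder path
    events = json_events.unionByName(parquet_events, allowMissingColumns=True)

    # Aggregate usage per customer and day to surface usage patterns
    daily_usage = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("customer_id", "event_date")
        .agg(F.count("*").alias("events"), F.sum("bytes_used").alias("total_bytes"))
    )

    daily_usage.write.mode("overwrite").saveAsTable("analytics.daily_usage")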

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Used a test-driven approach to develop the application and implemented unit tests with a Java unit-testing framework.

Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.

Parsed and transformed complex daily data feeds for multi-destination delivery. Provided analytics support to the leadership team by proactively asking questions and rigorously analyzing some of the most important issues impacting the future of the Consumer business.

Experienced in writing Spark Applications in Scala and Java.

Written Programs in Spark using Scala for Data quality check.

Converted several Informatica workflows to Hadoop and Spark using Scala.

Developed a POC for project migration from the on-premises Hadoop MapR system to GCP.

Developed Cloud Functions in Java to process JSON files from source systems and load them into BigQuery.
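
The Cloud Functions in this role were written in Java; purely as a sketch of the same flow, a Python version using the google-cloud-bigquery client and the Functions Framework is shown below, with the table ID and trigger wiring as assumptions.

    import functions_framework
    from google.cloud import bigquery

    bq = bigquery.Client()
    TABLE_ID = "example-gcp-project.staging.json_feed"   # placeholder table

    @functions_framework.cloud_event
    def load_json_to_bq(cloud_event):
        # Triggered by a JSON file landing in GCS; loads that file into BigQuery
        data = cloud_event.data
        uri = f"gs://{data['bucket']}/{data['name']}"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        bq.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()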

Developed Java code to gather data from HBase (Cornerstone) and designed the solution for implementation in PySpark.

Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.

Used Scala to convert SQL queries into RDD transformations in Apache Spark.

Used the Kafka consumer API in Scala to consume data from Kafka topics.

Other responsibilities included designing and implementing ETL configurations, working with huge and complex datasets, enhancing query performance, and migrating jobs to HDFS to improve scaling and performance.

Developed a data quality framework to automate quality checks at different layers of data transformation.
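
A minimal PySpark sketch of such a quality-check layer, assuming a simple rule set (non-empty data, required columns, key integrity); the function and column names are illustrative only.

    from pyspark.sql import DataFrame, functions as F

    def run_quality_checks(df: DataFrame, required_cols, key_col):
        # Collect human-readable failures instead of raising on the first problem
        failures = []

        if df.rdd.isEmpty():
            failures.append("dataset is empty")

        missing = [c for c in required_cols if c not in df.columns]
        if missing:
            failures.append(f"missing columns: {missing}")

        if key_col in df.columns:
            null_keys = df.filter(F.col(key_col).isNull()).count()
            dup_keys = df.count() - df.dropDuplicates([key_col]).count()
            if null_keys:
                failures.append(f"{null_keys} null values in {key_col}")
            if dup_keys:
                failures.append(f"{dup_keys} duplicate values in {key_col}")

        return failures

    # Example usage at one transformation layer (names are placeholders):
    # issues = run_quality_checks(daily_usage, ["customer_id", "event_date"], "customer_id")
    # if issues:
    #     raise ValueError(f"Data quality checks failed: {issues}")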

Built a framework to extract data files from the mainframe system and perform ETL using shell scripts and PySpark.

Developed deployment architecture and scripts for automated system deployment in Jenkins.

Environment: Spark, PySpark, Spark SQL, Java, Cloud, GCP, EMR, Lambda, Glue, HDFS, Hive, Apache Kafka, Sqoop, Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Git, Oozie, SOAP, NiFi, Cassandra, and Agile

Client: Cisco, San Jose, CA Nov 2019 to Mar 2021

Role: Data Engineer

Responsibilities:

Worked on migrating the on-premises Big Data project to AWS cloud.

Developed real-time data processing applications in Scala and Python and implemented Apache Spark Streaming from various streaming sources such as Kafka and JMS.

Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS.

Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.

Developed the ETL data pipeline using Spark to load data from the centralized data lake on AWS S3 into PostgreSQL (RDBMS).
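
A minimal PySpark sketch of that load path, assuming Parquet files in the S3 data lake and the PostgreSQL JDBC driver on the Spark classpath; the paths, table, and credentials are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-to-postgres").getOrCreate()

    # Read curated data from the S3 data lake (placeholder path)
    orders = spark.read.parquet("s3a://example-data-lake/curated/orders/")

    # Write to PostgreSQL over JDBC; credentials would come from a secrets store
    (orders.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/analytics")
        .option("dbtable", "public.orders")
        .option("user", "etl_user")
        .option("password", "****")
        .mode("append")
        .save())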

Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
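
One way to realize this, sketched below under assumptions: an S3 put event triggers a Python Lambda that starts a Glue crawler (named here hypothetically), which in turn creates or updates the catalog tables over the new files.

    import boto3

    glue = boto3.client("glue")
    CRAWLER_NAME = "s3-ondemand-tables"   # hypothetical crawler name

    def lambda_handler(event, context):
        # Log the objects that triggered the invocation, then refresh the catalog
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object s3://{bucket}/{key}; starting crawler {CRAWLER_NAME}")

        try:
            glue.start_crawler(Name=CRAWLER_NAME)
        except glue.exceptions.CrawlerRunningException:
            # A crawl is already in progress; new files are picked up on the next run
            pass
        return {"status": "ok"}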

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and text files into AWS Redshift.

Extracted required medical data from data lake storage into PySpark DataFrames for further exploration and visualization of the data to find insights and build a prediction model.

Created AWS Lambda functions in Python for deployment management in AWS; designed, investigated, and implemented public-facing websites on AWS and integrated them with other application infrastructure.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Used tools such as Tableau and SQL for inspecting data, preparing dashboards and stories, and making recommendations.

Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.

Developed serverless Python AWS Lambda functions with concurrency and multi-threading to speed up processing and execute callables asynchronously.
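
A minimal sketch of that concurrency pattern in a Python Lambda, using a thread pool so the S3 I/O waits overlap; the bucket name, event shape, and processing step are assumptions.

    import concurrent.futures
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-campaign-data"   # placeholder bucket

    def process_object(key):
        # Placeholder for the real transformation applied to each object
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        return key, len(body)

    def lambda_handler(event, context):
        keys = event.get("keys", [])
        # Run the per-object work in parallel threads instead of serially
        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
            results = list(pool.map(process_object, keys))
        return {"processed": len(results)}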

Integrated AWS DynamoDB with AWS Lambda to store item values and back up the DynamoDB streams.

Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming for streaming analytics in Databricks.
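
A minimal Structured Streaming sketch of that mini-batch pattern, using foreachBatch so each micro-batch arrives as a DataFrame to transform and persist; the Kafka broker, topic, and target table are placeholders, and the original work may equally have used the older DStream API.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-mini-batches").getOrCreate()

    # Read events from Kafka as a stream (placeholder broker and topic)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "device-events")
           .load())

    events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    def process_batch(batch_df, batch_id):
        # Each micro-batch is a plain DataFrame: aggregate it and append the result
        (batch_df.groupBy(F.window("timestamp", "5 minutes"))
                 .count()
                 .write.mode("append").saveAsTable("analytics.event_counts"))

    query = events.writeStream.foreachBatch(process_batch).start()
    query.awaitTermination()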

Mentored and guided analysts on building purposeful analytics tables in dbt for cleaner schemas.

Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other data.

Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow.
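
A minimal DAG sketch for that load, assuming the apache-airflow-providers-snowflake package and an external Snowflake stage already pointing at the S3 bucket; the connection ID, stage, table, and file format are placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    with DAG(
        dag_id="s3_to_snowflake_load",      # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # COPY from the external stage over the S3 bucket into the target table
        copy_orders = SnowflakeOperator(
            task_id="copy_orders",
            snowflake_conn_id="snowflake_default",
            sql="""
                COPY INTO analytics.orders
                FROM @raw_stage/orders/
                FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
            """,
        )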

Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling groups, EBS, IAM, CloudFormation, Route 53, and CloudWatch.

Used AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs.

Created an ETL framework using Spark on AWS EMR in Scala and Python.

Created and maintained tables and views in Snowflake.

Worked with PL/SQL procedures and used them in Stored Procedure Transformations.

Used Spark SQL with Scala to create DataFrames and performed transformations on them.

Built and architected multiple data pipelines and end-to-end ETL processes for data ingestion and transformation.

Built an AWS CI/CD data pipeline and AWS data lake using EC2, AWS Glue, and AWS Lambda.

Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

During the migration, upgraded the code to newer Spark and Python versions.

Created Airflow scheduling scripts in Python.

Worked on Informatica tools such as PowerCenter, MDM, Repository Manager, and Workflow Monitor.

Performed data ingestion from multiple RDBMS and vendor files using Qlik and stored the data in ADLS Gen2 (blob containers).

Developed Spark applications using Scala for easy Hadoop transitions.

Good knowledge of Cloudera distributions and of Amazon Simple Storage Service (Amazon S3), AWS Redshift, Lambda, Amazon EC2, and Amazon EMR.

After transformation, data is stored in data marts and used by the reporting layer; used Snowflake to store the transformed data, which is consumed by data scientists, the reporting layer, and other consumers.

Environment: Spark, PySpark, Scala, Python, Cloud, Tableau, AWS, Lambda, Glue, HDFS, Hive, Apache Kafka, Sqoop, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Git, Oozie, SOAP, NiFi

Role: Java/Hadoop Developer

Client: India Mart, Bangalore, India Jun 2016 to Aug 2019

Responsibilities:

Spearheaded the Big Data project end to end.

Hands-on experience working with AWS services such as Lambda, Athena, and S3.

Installed, configured, and maintained Apache Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.

Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets for handling the business logic.

Developed applications using Hadoop big data technologies: Pig, Hive, MapReduce, HBase, and Oozie.

Created Elastic MapReduce (EMR) clusters and configured the data pipeline with EMR clusters for scheduling the task runner.

Developed Scala programs with Spark for data in Hadoop ecosystem.

Extensively involved in installation and configuration of the Cloudera Hadoop distribution (versions 2 and 3), including NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.

Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.

Provisioned EC2 instances on both Windows and Linux and worked on AWS Relational Database Service, AWS security groups, and their rules.

Implemented Reporting, Notification services using AWS API.

Developed MapReduce jobs using Apache Commons components.
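
The MapReduce jobs here were written in Java; purely to illustrate the map/reduce aggregation pattern in the one language used for the sketches in this document, a Hadoop Streaming style mapper and reducer in Python are shown below, counting log events per status code (the field positions are assumptions).

    import sys

    def mapper():
        # Emit (status_code, 1) for each log line; field position is an assumption
        for line in sys.stdin:
            fields = line.strip().split("\t")
            if len(fields) > 2:
                print(f"{fields[2]}\t1")

    def reducer():
        # Input arrives sorted by key, so counts can be accumulated per status code
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t")
            if key != current_key:
                if current_key is not None:
                    print(f"{current_key}\t{count}")
                current_key, count = key, 0
            count += int(value)
        if current_key is not None:
            print(f"{current_key}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()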

Used Service Oriented Architecture (SOA) based SOAP and REST Web Services (JAX-RS) for integration with other systems.

Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

Involved in designing and developing the application using JSTL, JSP, JavaScript, AJAX, HTML, CSS, and collections.

Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.

Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.

Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.

Developed UDFs in Java as needed for use in Pig and Hive queries.

Developed Java web applications using JSP, Servlets, Struts, Hibernate, Spring, REST web services, and SOAP.

Involved in gathering requirements and developing a project plan; involved in understanding requirements, functional specifications, design documentation, and testing strategies.

Involved in UI design, coding, and database handling, as well as unit testing and bug fixing. Worked across the entire Software Development Life Cycle (SDLC) both as part of a team and independently.

Wrote SQL queries against the database and provided data extracts to users on request.

Environment: Hadoop, AWS, Glue, MapReduce, Cloud, HDFS, Hive, Pig, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala.

Role: Software Developer

Client: First Sources, Mumbai, India Oct 2014 to May 2016

Responsibilities:

Experience in importing and exporting data from different RDBMS like MySQL, Oracle and SQL Server into HDFS and Hive using Sqoop.

Developed the web tier using the Spring MVC framework.

Performed database operations on the consumer portal using the Spring JdbcTemplate.

Implemented design patterns in Scala for the application.

Set up infrastructure, implementing, configuring, and externalizing HTTPD with mod_jk, mod_rewrite, mod_proxy, JNDI, SSL, etc.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Java.

Developed traits, case classes, etc. in Scala.

Developed quality code adhering to Java coding standards and best practices.

Wrote complex SQL queries.

Developed the GUI using jQuery, JSON, and JavaScript.

Followed agile methodology (Stand up meetings, retrospective meetings, sprint development and Pair programming).

Developed application code using Eclipse IDE and configured with Maven, Glassfish server and JUnit.

Developed Use Case Diagrams, Sequence Diagrams and Class Diagrams using Rational Rose.

Developed the controller servlet to handle the requests and responses.

Developed JSP pages with MVC architecture using Spring MVC, Servlets and Simple tags.

Configured Maven dependencies for application building processes.

Used Spring Dependency Injection to set up dependencies between the objects.

Optimized the source code and queries to improve performance using Hibernate.

Assisted other team members with various technical issues including JavaScript, CSS, JSP and Server related issues.


