
Madan Govindu

Spark/Hadoop Developer

+1-214-***-**** | ac6j8i@r.postjobfree.com

PROFESSIONAL SUMMARY:

•Spark/Hadoop developer with 3+ years of IT experience, including 2+ years on Hadoop, and strong programming experience in Scala and Java.

•Experience with the Big Data/Hadoop ecosystem: Spark, Hive, Sqoop, Kafka, Oozie, HBase, MapReduce, NiFi, Tableau.

•In-depth understanding of Spark architecture; performed batch and real-time stream processing using Spark Core, Spark SQL, and Spark Streaming.

•Experienced in handling large datasets using Spark's in-memory capabilities, partitioning, broadcast variables, accumulators, and efficient join strategies; developed Spark applications in Scala (a short sketch follows this summary).

•Tested and optimized Spark applications.

•Performed Hive operations on large datasets; proficient in writing HiveQL queries that use transactional and performance-oriented features such as upserts, partitioning, bucketing, and windowing.

•Wrote custom UDFs, UDAFs, and UDTFs, and analyzed execution plans to improve query performance.

•Imported data from relational databases into HDFS/Hive with Sqoop, transformed it, and exported the results back.

•Wrote custom Kafka consumer programs in Java and implemented a Kafka -> Spark -> HDFS/S3 pipeline.

•Implemented NiFi data workflows in production, performing streaming and batch processing via micro-batches from multiple data sources; controlled and monitored the flows through the NiFi web UI.

•Scheduled jobs and automated workflows using Oozie.

•Experienced with AWS, primarily EMR; worked with EC2 instances, S3 buckets, RDS, Lambda, and Redshift for analytics.

•Used HBase to work with large sets of structured, semi-structured and unstructured data coming from a variety of sources.

•Used Tableau to generate reports and created visualization dashboards.

•Experienced working with file formats such as Parquet, Avro, CSV, JSON, and plain text.

•Worked with Big Data Hadoop distributions: AWS EMR, Cloudera.

•Developed MapReduce jobs in Java, decomposing processing problems to fit the MapReduce programming paradigm.

•Followed Agile-Scrum model and used DevOps tools like GitLab, JIRA, Confluence, Jenkins.
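
The broadcast-variable and accumulator bullet above corresponds to a pattern like the following. This is a minimal sketch, assuming a hypothetical lookup map and HDFS paths; it is illustrative, not code from these projects.

import org.apache.spark.sql.SparkSession

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-accumulator-sketch")
      .getOrCreate()

    // Hypothetical small lookup table, broadcast to every executor
    // so the enrichment below avoids shuffling the large side.
    val lookup: Map[String, String] = Map("A1" -> "valid", "B2" -> "invalid")
    val lookupBc = spark.sparkContext.broadcast(lookup)

    // Accumulator counting records that have no lookup entry.
    val missing = spark.sparkContext.longAccumulator("missing-codes")

    val events = spark.sparkContext.textFile("hdfs:///data/events") // hypothetical path
    val enriched = events.map { line =>
      val code = line.split(",")(0)
      lookupBc.value.get(code) match {
        case Some(status) => s"$line,$status"
        case None         => missing.add(1); s"$line,unknown"
      }
    }

    enriched.saveAsTextFile("hdfs:///data/events_enriched") // hypothetical path
    println(s"records with missing codes: ${missing.value}")
    spark.stop()
  }
}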

AREAS OF EXPERTISE:

Hadoop/Big Data: Spark, Hive, Sqoop, Kafka, YARN, NiFi, HBase, Oozie, MapReduce, Zookeeper

Programming: Scala, Java, SQL

Hadoop Distributions: Cloudera, Amazon EMR

Databases/Data Warehouses: Oracle, MySQL, HBase

Amazon Web Services: EMR, EC2, S3, Lambda, RDS, Redshift, IAM

Other Tools & SDLC: Tableau, IntelliJ IDEA, Eclipse, SBT, Maven, PuTTY, JIRA, Confluence, Agile (Scrum)

PROFESSIONAL EXPERIENCE:

Client: Change Healthcare, Nashville, TN April 2018 – Current

Employer: Vintech Solutions Inc., St. Louis, MO

Role: Spark/Hadoop Engineer

Description: Change Healthcare is a healthcare technology company that offers software, analytics, network solutions, and technology-enabled services to help create a stronger, more collaborative healthcare system. Working on the "Patient Identity" project, which recognizes identification points from sending systems and uses them to determine whether multiple patient records roll up to the same person.

Responsibilities:

•Developed new Spark applications and modified existing ones in Scala to process large datasets, applying transformations and actions over DataFrames and RDDs (see the Spark SQL sketch following this list).

•Made changes to the existing identity scoring algorithm and JSON configuration files to suit our needs.

•Developed Spark SQL applications to perform complex data operations on structured and semi-structured data stored as Parquet, JSON, XML files in S3 buckets.

•Developed Scala scripts and UDFs over Spark DataFrames/Datasets for aggregation, querying, and similarity matching across different types of datasets.

•Tuned Spark application performance by setting an appropriate level of parallelism, adjusting memory settings, and choosing efficient operations (see the tuning sketch following this list).

•Implemented schema extraction for Parquet and Avro file formats when creating Hive tables.

•Used Sqoop to transfer data from EMR to MySQL (S3 -> Sqoop -> Hive (EMR staging) -> Sqoop -> MySQL).

•Performed unit and integration testing with mocked data.

•Experienced working with file formats like Parquet, Avro, JSON, and XML, and compression codecs like Snappy, for efficient storage, retrieval, and processing of files.

•Involved in a POC developing a Kafka pipeline that subscribed to the relevant topics as clients made changes in the UI served through Apache Tomcat.

•Performed UPSERTS to the data in the data lake (Linking/ Unlinking patient records).

•Created Entity-Relationship diagrams for the relational database.

•Worked on AWS with EMR; used EC2 instances, S3 storage, RDS, Lambda, and Redshift for analytics.

•Involved in client meetings: understanding business needs, gathering and analyzing functional requirements, participating in tool-selection discussions, and attending on/offshore meetings.

•Used the Agile Scrum methodology with GitLab, IntelliJ IDEA, Confluence, JIRA, and Jenkins on the project.
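
A minimal sketch of the kind of Spark SQL work described in this list. The bucket names, paths, column names, and the name-similarity UDF are hypothetical stand-ins, not the project's actual matching logic.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object PatientMatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("patient-match-sketch")
      .getOrCreate()

    // Structured/semi-structured inputs from S3 (hypothetical buckets/paths).
    val records = spark.read.parquet("s3://example-bucket/patients/records/")
    val updates = spark.read.json("s3://example-bucket/patients/updates/")
      .select(col("ssn_hash"), col("last_name").as("upd_last_name"))

    // Hypothetical UDF: crude name-similarity check used while matching records.
    val sameName = udf { (a: String, b: String) =>
      a != null && b != null && a.trim.equalsIgnoreCase(b.trim)
    }

    // Candidate links: same hashed identifier and similar last name.
    val candidates = records
      .join(updates, Seq("ssn_hash"))
      .filter(sameName(col("last_name"), col("upd_last_name")))

    candidates.write.mode("overwrite")
      .parquet("s3://example-bucket/patients/match-candidates/")
    spark.stop()
  }
}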
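
And a sketch of the tuning levers mentioned above. The specific values are illustrative only; real settings depend on cluster size and data volume.

import org.apache.spark.sql.SparkSession

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      .config("spark.sql.shuffle.partitions", "400")  // level of parallelism for shuffles
      .config("spark.executor.memory", "8g")          // executor heap size
      .config("spark.memory.fraction", "0.6")         // execution/storage share of heap
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Repartition a large dataset on the join key before an expensive wide operation.
    val df = spark.read.parquet("s3://example-bucket/patients/records/") // hypothetical path
    df.repartition(400, df("ssn_hash")).createOrReplaceTempView("patients")
    spark.stop()
  }
}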

Environment: Spark 2.2.0, Scala 2.11.8, Sqoop, Kafka, AWS (EMR, S3, RDS, Lambda, Redshift), IntelliJ IDEA, GitLab, Confluence, JIRA, Jenkins, Agile(Scrum).

Client: People Health Services Pvt. Ltd., Dallas, TX May 2017 – Mar 2018

Role: Spark/Hadoop Developer

Description: People Health provides onsite health checks for employers such as Yahoo, Capital One, Walmart, and Google. The aim is to provide members with health-promoting wellness plans and access to preferred centers with assured quality of service. It is inspired by professionally managed global health delivery systems that provide easy-to-access, quality healthcare for the entire community.

Responsibilities:

•Developed Spark applications using Scala.

•Used DataFrames/Datasets with Spark SQL to write SQL-style queries over the data.

•Ran real-time streaming jobs with Spark Streaming to analyze data from Kafka over regular window intervals (see the streaming sketch following this list).

•Created the Kafka -> Spark -> HDFS data pipeline along with the team.

•Collaborated with architects to design Spark equivalents of the existing MapReduce jobs and migrated them to Spark modules in Scala.

•Tested and optimized Spark applications.

•Created Hive tables and had extensive experience with HiveQL.

•Executed Hive queries against Parquet-backed tables to perform data analysis for business requirements.

•Extended Hive functionality by writing custom UDFs, UDAFs, and UDTFs to process large data.

•Performed Hive upserts, partitioning, bucketing, and windowing operations, writing efficient queries for faster data access (see the Hive sketch following this list).

•Imported and exported data between relational database systems and HDFS/Hive using Sqoop.

•Wrote custom Kafka consumer code and modified existing producer code in Java to push data to Spark Streaming jobs.

•Scheduled jobs and automated workflows using Oozie.

•Automated data movement using the NiFi dataflow framework, performing streaming and batch processing via micro-batches; controlled and monitored the data flows through the web UI.

•Worked with HBase database to perform operations with large sets of structured, semi-structured and unstructured data coming from different data sources.

•Exported analytical results to MS SQL Server and used Tableau to generate reports and visualization dashboards.
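
A minimal sketch of a Kafka -> Spark Streaming -> HDFS job of the kind described in this list, against the Spark 2.0-era spark-streaming-kafka-0-10 integration. The broker address, topic name, and output path are hypothetical.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "health-events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("health-events"), kafkaParams)
    )

    // Persist each non-empty micro-batch to HDFS (hypothetical path).
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(
        s"hdfs:///data/health-events/batch-${System.currentTimeMillis}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}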
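
And a sketch of the partitioned Hive table work and windowing queries mentioned above, expressed through spark.sql to keep these sketches in one language. The table and column names are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-window-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned visit table; in Hive proper, bucketing would add
    // "CLUSTERED BY (member_id) INTO 32 BUCKETS" to this DDL.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS visits (
        member_id STRING,
        center_id STRING,
        score     DOUBLE
      )
      PARTITIONED BY (visit_date STRING)
      STORED AS PARQUET
    """)

    // Windowing query: keep only the latest visit per member.
    val latest = spark.sql("""
      SELECT member_id, center_id, score, visit_date
      FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY member_id
                                  ORDER BY visit_date DESC) AS rn
        FROM visits
      ) t
      WHERE rn = 1
    """)
    latest.show()
    spark.stop()
  }
}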

Environment: Cloudera, Spark 2.0, Hive, Hadoop, Java, Scala, Kafka, Sqoop, MapReduce, Oozie, Zookeeper, Tableau, Agile, Eclipse.

Client: People Health Services Pvt. Ltd., Bengaluru, India Jan 2014 – Dec 2015

Role: Hadoop/Java Developer

Responsibilities:

•Created Hive tables, loaded data, executed HQL queries and developed MapReduce programs to perform analytical operations on data and to generate reports.

•Created Hive internal and external tables, with table schemas kept in a MySQL-backed metastore. Wrote custom UDFs in Java.

•Moved data between MySQL and HDFS using Sqoop.

•Developed MapReduce jobs in Java for log analysis, analytics, and data cleaning (see the MapReduce sketch following this list).

•Wrote complex MapReduce programs that extract, transform, and aggregate terabytes of data.

•Designed E-R diagrams to work with different tables.

•Wrote SQL queries, stored procedures, PL/SQL blocks, triggers, and views on Oracle.

•Developed the application using Core Java, multithreading, collections, JMS, JSP, Servlets, and Maven.

•Developed a multithreaded archival job in Java using an ExecutorService for thread pooling, with Callable tasks and Futures (see the thread-pool sketch following this list).

•Redesigned and improved tracking functionality using Java multithreading with Servlets, a concurrent queue, and worker threads.

•Developed JUnit and mocking-based tests for various modules.

•Developed a RESTful web service to fetch database data for the UI.

•Deployed the application on Apache Tomcat, applying OOP principles and design patterns throughout.

•Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation, and Maintenance Support.
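
A minimal word-count-style sketch of the MapReduce pattern described in this list, written in Scala against the Hadoop Java API to keep these sketches in one language (the original jobs were in Java); the input is hypothetical log data.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Map phase: emit (token, 1) for each whitespace-separated token in a log line.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      ctx.write(word, one)
    }
}

// Reduce phase: sum the counts emitted for each token.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object LogTokenCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log-token-count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // safe combiner: summing is associative
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}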
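
And a sketch of the ExecutorService / Callable / Future pattern from the archival job, also in Scala rather than the original Java; the archive step and file list are hypothetical.

import java.util.concurrent.{Callable, Executors, Future, TimeUnit}

object ArchivalSketch {
  // Hypothetical unit of work: archive one file and report its name.
  final class ArchiveTask(path: String) extends Callable[String] {
    override def call(): String = {
      // ... compress/move the file here (omitted) ...
      s"archived: $path"
    }
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(4) // bounded thread pool
    val files = Seq("/logs/a.log", "/logs/b.log", "/logs/c.log") // hypothetical

    // Submit Callable jobs; each returns a Future for its result.
    val futures: Seq[Future[String]] = files.map(f => pool.submit(new ArchiveTask(f)))

    futures.foreach(f => println(f.get())) // block until each task completes
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
  }
}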

Environment: Java, Hive, Sqoop, MySQL, Multi-threading, JDK, JSP, JMS, Servlet, HTML, CSS, Eclipse, Tomcat, REST.

EDUCATION:

Master of Science in Applied Computer Science Dec 2017

Northwest Missouri State University, Maryville, MO GPA: 3.45

Bachelor of Technology in Computer Science & Engineering Mar 2014

Jawaharlal Nehru Technological University, Anantapur, India GPA: 3.10


