Sumit Gupta
************@*******.***
PROFESSIONAL SUMMARY:
Hadoop Developer with 7 years of professional IT experience, including four-plus years in the Big Data ecosystem, related technologies and advanced analytics, with extensive work experience with HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Kafka, Storm, Spark, Hortonworks, Flume, HBase, Zookeeper and Cassandra.
Hands-on experience in configuring and using ecosystem components such as HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Hortonworks, Flume, HBase, Zookeeper, Kafka, Storm, Spark, Ganglia and Cassandra.
Experience in AWS cloud environment.
Excellent IT experience in requirements gathering, design, development, testing, implementation and maintenance. Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC).
In-depth knowledge of Statistics, Machine Learning and Data Mining.
Experienced with supervised learning techniques such as Multi-Linear Regression, Nonlinear Regression, Logistic Regression, Artificial Neural Networks, Support Vector Machines, Decision Trees and Random Forests, as well as the main unsupervised learning techniques.
Experience in setting up standards and processes for Hadoop-based application design and implementation.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE Design Patterns and Core Java Design Patterns.
Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming paradigm.
Hands on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Kafka, Storm, Spark, Pig, Oozie and Flume.
Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications and HDFS.
Good knowledge and understanding of MongoDB.
Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between core database engines and the Hadoop ecosystem.
Worked on NoSQL databases including HBase.
Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
Diverse experience utilizing Java tools in business, Web, and client-server environments including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
Solid background in Object-Oriented Analysis and Design (OOAD); very good with design patterns, UML and Enterprise Application Integration (EAI).
Major strengths include familiarity with multiple software systems and the ability to quickly learn new technologies and adapt to new environments; a self-motivated, focused team player and quick learner with excellent interpersonal, technical and communication skills.
TECHNICAL SKILLS:
Big Data/Hadoop: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Hue, Oozie, Storm, Kafka, Zookeeper, Elasticsearch, Redis, Flume.
Java Technologies: Core Java, I18N, JFC, Swing, Beans.
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).
Programming Languages: C++, Java, Linux shell scripts.
Databases: Oracle 11g, MySQL, MS SQL Server, MongoDB, Cassandra, Teradata.
Web Servers: WebLogic, WebSphere, Apache Tomcat.
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL.
Microsoft Office Tools: Word, Excel, PowerPoint, Access, Project.
Operating Systems: Microsoft Windows family, Solaris (9x, 10x), Red Hat Linux.
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer / Administrator Feb 2017 - Till Date
UnitedHealth Group, Brentwood, TN
Responsibilities:
Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing (an illustrative sketch follows this list).
Built data pipelines using Pig and Java MapReduce to store data onto HDFS.
Loaded customer profile, customer spending and credit data from legacy warehouses onto HDFS using Sqoop.
Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS; also extracted data from MySQL into HDFS using Sqoop.
Used pattern-matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive and stored the results in HBase.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
Experience in using Avro, Parquet, RCFile and JSON file formats, and developed UDFs for Hive and Pig (see the Hive UDF sketch after this section).
Developed simple to complex MapReduce jobs using Hive and Pig.
Worked on Apache Spark for in-memory data processing on Hadoop.
Responsible for building scalable distributed data solutions using MongoDB and Cassandra.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Provided support to data analysts in running Pig and Hive queries.
Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
Applied transformations and filtered traffic using Pig.
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Responsible for building scalable distributed data solutions on a cluster using Cloudera Distribution.
Installed Oozie workflow engine to run multiple Hive and Pig jobs.
Setup and benchmarked Hadoop and HBase clusters for internal use.
Installed Storm and Kafka on a four-node cluster.
Hands-on experience working with Amazon SQS.
Responsible for building scalable distributed data solutions using Hadoop.
Performed unit testing using MRUnit (see the test in the sketch below).
Worked with HiveQL and Pig Latin; imported and exported data between MySQL/Oracle and Hive/HDFS using Sqoop.
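A minimal sketch of the kind of data-cleaning MapReduce job and MRUnit test described in the bullets above (the CleanRecordMapper class, pipe-delimited layout and field count are hypothetical, shown only to illustrate the approach):

// CleanRecordMapper.java - hypothetical data-cleaning mapper (Hadoop new API, JDK 1.6 era)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanRecordMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        // Drop records with the wrong number of fields or an empty customer id.
        if (fields.length != EXPECTED_FIELDS || fields[0].trim().isEmpty()) {
            context.getCounter("DataCleaning", "BAD_RECORDS").increment(1);
            return;
        }
        context.write(NullWritable.get(), new Text(value.toString().trim()));
    }
}

// CleanRecordMapperTest.java - MRUnit test for the mapper above
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class CleanRecordMapperTest {

    @Test
    public void goodRecordPassesThrough() throws IOException {
        new MapDriver<LongWritable, Text, NullWritable, Text>()
                .withMapper(new CleanRecordMapper())
                .withInput(new LongWritable(1), new Text("c1|2015-01-01|120.50|VISA|TN"))
                .withOutput(NullWritable.get(), new Text("c1|2015-01-01|120.50|VISA|TN"))
                .runTest();
    }

    @Test
    public void malformedRecordIsDropped() throws IOException {
        // No expected output: the record should be filtered out by the mapper.
        new MapDriver<LongWritable, Text, NullWritable, Text>()
                .withMapper(new CleanRecordMapper())
                .withInput(new LongWritable(2), new Text("bad-record"))
                .runTest();
    }
}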
Environment: Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HBase, Hue, Oozie, Spark, Storm, Kafka, Redis, Flume, JUnit, Oracle/Informix, MongoDB, Cassandra, AWS (Amazon Web Services), HDFS, DB2 and Cloudera.
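A minimal sketch of a Hive UDF in Java of the kind referenced in this section (the MaskAccountNumber name and masking rule are hypothetical, for illustration only):

// MaskAccountNumber.java - hypothetical Hive UDF using the classic simple UDF API
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class MaskAccountNumber extends UDF {

    // Masks all but the last four characters of an account number.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - 4));
        return new Text(masked.toString());
    }
}

Once packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.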
Cerner, Kansas City, Missouri Mar 2016 - Feb 2017
Hadoop Administrator/Developer
Responsibilities:
Experienced with loading data from the UNIX file system and Teradata onto HDFS.
Experienced in loading and transforming large sets of structured, semi-structured and unstructured data from HBase through Sqoop and placing it in HDFS for further processing.
Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
Involved in creating Hive tables, loading data and running hive queries on the data.
Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
Worked with NoSQL database, HBase to create tables and store data.
Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
Developed Pig scripts for the analysis of semi structured data.
Developed custom Pig Load and Store functions.
Developed Java MapReduce programs on log data to transform it into a structured form and derive user location, age group and time spent.
Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased products on the website.
Developed industry-specific UDFs (User Defined Functions); see the Pig UDF sketch after this list.
Involved in developing optimized Pig scripts and testing Pig Latin scripts.
Experience working with Apache Solr for indexing and querying.
Written JUnit test cases for Storm Topology.
Configured the Kafka MirrorMaker cross-cluster replication service.
Monitored multiple Hadoop clusters environments using Ganglia.
Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
Monitored workload, job performance and capacity planning using Cloudera Manager.
Managed and scheduled jobs on a Hadoop cluster using Oozie (see the Oozie client sketch after this section).
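A minimal sketch of a Pig UDF in Java of the kind referenced above (the AgeGroup name and bucket boundaries are hypothetical):

// AgeGroup.java - hypothetical Pig EvalFunc mapping a numeric age to a coarse age group
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class AgeGroup extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        int age = Integer.parseInt(input.get(0).toString());
        if (age < 18) return "UNDER_18";
        if (age < 35) return "18_34";
        if (age < 55) return "35_54";
        return "55_PLUS";
    }
}

In a Pig script the JAR would be registered with REGISTER and the function invoked by its fully qualified class name (or aliased with DEFINE).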
Environment: Amazon EC2, Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS 6.3, HBase, Kafka, Elastic Search, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Cloudera, Eclipse, Sqoop, AWS (Amazon Web Services), MongoDB, Cassandra and Ganglia.
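A minimal sketch of submitting and monitoring an Oozie workflow from Java with the Oozie client API, as in the Oozie scheduling work above (the host, HDFS paths and property values are hypothetical):

// SubmitWorkflow.java - hypothetical Oozie workflow submission via the Java client API
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = client.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/hadoop/workflows/weblog-wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker-host:8021");
        conf.setProperty("queueName", "default");

        // Submit and start the workflow, then poll until it leaves the RUNNING state.
        String jobId = client.run(conf);
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow " + jobId + " finished with status "
                + client.getJobInfo(jobId).getStatus());
    }
}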
Agilent Technologies - Santa Clara, CA Jan 2015-Mar 2016
Hadoop Developer
Responsibilities:
Responsible for coding Java batch jobs, RESTful services and MapReduce programs, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
Performed requirements study, software development specification, development and unit testing using MRUnit and JUnit.
Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
Designed and developed Oozie workflows for automating jobs.
Created HBase tables to store data in variable formats coming from different portfolios (see the HBase client sketch after this list).
Wrote Hadoop MapReduce programs to collect the logs and feed them into Cassandra for analytics.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Implemented best income logic using Pig scripts.
Moved data from Oracle to HDFS and vice versa using Sqoop.
Developed Pig scripts to convert the data from Avro to Text file format.
Developed Hive scripts for implementing control tables logic in HDFS.
Developed Hive queries and UDFs to analyze and transform the data in HDFS.
Designed and implemented static and dynamic partitioning and bucketing in Hive (see the Hive partitioning sketch after this section).
Worked with different file formats and compression techniques to determine standards.
Installed and configured Hive and wrote Hive UDFs.
Developed Oozie workflows that were scheduled through a scheduler on a monthly basis.
Designed and developed read lock capability in HDFS.
Involved in End-to-End implementation of ETL logic.
Involved in effective coordination with the offshore team and managed project deliverables on time.
Worked on QA support activities, test data creation and Unit testing activities.
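A minimal sketch of the HBase table creation and writes referenced above, using the HBase 0.9x-era Java client API (the table name, column family and row-key layout are hypothetical):

// PortfolioTableLoader.java - hypothetical HBase table creation and write
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PortfolioTableLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Create the table with a single column family if it does not exist yet.
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("portfolio_events")) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("portfolio_events"));
            desc.addFamily(new HColumnDescriptor("d"));
            admin.createTable(desc);
        }
        admin.close();

        // Write one event row; the row key combines the portfolio id and a date.
        HTable table = new HTable(conf, "portfolio_events");
        Put put = new Put(Bytes.toBytes("PF1001-20150115"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("format"), Bytes.toBytes("json"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("{\"amount\":120.5}"));
        table.put(put);
        table.close();
    }
}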
Environment: Java, Java Batch, RESTful Services, JAXB, Hadoop MapReduce, Pig, HBase, Sqoop, Oozie, Flume, JUnit and Oracle.
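A minimal sketch of the static/dynamic partitioning and bucketing work noted above, issuing HiveQL from Java over the Hive JDBC driver (the host, table, columns and bucket count are hypothetical):

// HivePartitionDemo.java - hypothetical partitioned-table DDL and dynamic-partition insert via Hive JDBC
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "hadoop", "");
        Statement stmt = con.createStatement();

        // Partitioned, bucketed target table.
        stmt.execute("CREATE TABLE IF NOT EXISTS claims_part (claim_id STRING, amount DOUBLE) "
                + "PARTITIONED BY (claim_date STRING) "
                + "CLUSTERED BY (claim_id) INTO 8 BUCKETS "
                + "STORED AS ORC");

        // Allow dynamic partitions and enforce bucketing, then load from a staging table.
        stmt.execute("SET hive.exec.dynamic.partition=true");
        stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
        stmt.execute("SET hive.enforce.bucketing=true");
        stmt.execute("INSERT OVERWRITE TABLE claims_part PARTITION (claim_date) "
                + "SELECT claim_id, amount, claim_date FROM claims_staging");

        stmt.close();
        con.close();
    }
}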
Med Plus HealthCare Inc - Hyderabad, India Oct 2014 - Dec 2015
Java Developer
Responsibilities:
Responsible for requirement gathering and analysis through interaction with end users.
Involved in designing use-case diagrams, class diagram, interaction using UML model.
Designed and developed the application using various design patterns, such as Session Facade, Business Delegate and Service Locator.
Worked with the Maven build tool.
Involved in developing JSP pages using Struts custom tags, jQuery and the Tiles framework.
Used JavaScript to perform client-side validations and the Struts Validator framework for server-side validation.
Good experience in Mule development.
Developed Web applications with Rich Internet applications using Java applets, Silverlight, JavaFX.
Involved in creating database SQL and PL/SQL queries and stored procedures.
Implemented singleton classes for property loading and caching static data from the database.
Debugged and developed applications using Rational Application Developer (RAD).
Developed a web service to communicate with the database using SOAP (see the SOAP endpoint sketch after this section).
Developed DAOs (Data Access Objects) using Spring Framework 3 (see the DAO sketch after this list).
Deployed the components onto WebSphere Application Server 7.
Actively involved in backend tuning SQL queries/DB script.
Worked in writing commands using UNIX Shell scripting.
Involved in developing other subsystems and server-side components.
Provided production support using IBM ClearQuest for fixing bugs.
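A minimal sketch of a Spring 3 DAO of the kind referenced above, built on JdbcTemplate (the PatientDao name, table and columns are hypothetical):

// PatientDao.java - hypothetical Spring JdbcTemplate-based DAO
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.stereotype.Repository;

@Repository
public class PatientDao {

    private JdbcTemplate jdbcTemplate;

    // The DataSource is expected to be injected by the Spring container.
    public void setDataSource(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    public List<String> findPatientNamesByCity(String city) {
        return jdbcTemplate.query(
                "SELECT name FROM patients WHERE city = ?",
                new Object[] { city },
                new RowMapper<String>() {
                    public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                        return rs.getString("name");
                    }
                });
    }

    public int insertPatient(String id, String name, String city) {
        return jdbcTemplate.update(
                "INSERT INTO patients (id, name, city) VALUES (?, ?, ?)", id, name, city);
    }
}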
Environment: Java EE 6, IBM WebSphere Application Server 7, Apache Struts 2.0, EJB 3, Spring 3.2, JSP 2.0, Web Services, jQuery 1.7, Servlet 3.0, Struts Validator, Struts Tiles, Tag Libraries, ANT 1.5, JDBC, Oracle 11g/SQL, JUnit 3.8, CVS 1.2, Rational ClearCase, Eclipse 4.2, JSTL, DHTML.
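A minimal sketch of a SOAP web service endpoint of the kind referenced above, written with JAX-WS annotations (the service name and operation are hypothetical; in the project the service was deployed on WebSphere):

// PatientLookupService.java - hypothetical JAX-WS SOAP endpoint
import javax.jws.WebMethod;
import javax.jws.WebService;

@WebService(serviceName = "PatientLookupService")
public class PatientLookupService {

    // In the real application this would delegate to a DAO/database lookup.
    @WebMethod
    public String getPatientStatus(String patientId) {
        if (patientId == null || patientId.trim().isEmpty()) {
            return "UNKNOWN";
        }
        return "ACTIVE";
    }
}

For quick local testing such an endpoint can be published standalone with javax.xml.ws.Endpoint.publish("http://localhost:8080/patients", new PatientLookupService()); on the application server it is deployed as part of the web module instead.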
CUBE IT INNOVATIONS, Hyderabad, India July 2013 - Sep 2014
Java Developer
Responsibilities:
Coded end to end (i.e., from the client-side GUI through the middleware to the database, connecting the back-end systems) on a subset of sub-modules.
Worked extensively on Swing.
Most of the business logic was provided in session beans, and the database transactions were performed using container-managed entity beans.
Worked on parsing XML using DOM and SAX (see the DOM parsing sketch after this list).
Implemented EJB Transactions.
Used JMS for messaging with IBM MQSeries; wrote stored procedures.
Developed the presentation layer, built using Servlets and JSP with MVC architecture, on WebSphere Studio Application Developer (WSAD).
Mentored other programmers and studied the implementation of Struts.
Implemented security access control on both the client and server sides; performed applet signing, including JAR signing.
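A minimal sketch of DOM-based XML parsing of the kind mentioned above (the file name and element names are hypothetical):

// OrderXmlReader.java - hypothetical DOM parsing example
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OrderXmlReader {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("orders.xml"));

        // Iterate over <order> elements and print the id attribute and the <amount> child text.
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            String id = order.getAttribute("id");
            String amount = order.getElementsByTagName("amount").item(0).getTextContent();
            System.out.println("order " + id + " amount " + amount);
        }
    }
}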
Environment: Java, Java Swing, JSP, Servlets, JDBC, Applets, JCE 1.2, RMI, EJB, XML/XSL, VisualAge for Java (VAJ).