Sachin N ****@************.***
Sr. Hadoop Developer 972-***-****
PROFESSIONAL SUMMARY
Versatile, dynamic, and technically competent problem solver with over 7 years of experience in Hadoop & Big Data (4 years) and Java/J2EE (3+ years) technologies.
Expertise in Design and Implementation of Big Data solutions in Retail, Finance and E-commerce domains.
Hands-on experience in Installation, Configuration, Support and Management of Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
Sound knowledge of Hadoop Architecture, Administration, HDFS Federation & High Availability and the Streaming API, along with Data Warehousing concepts.
Experienced in understanding complex Big Data processing needs and developing MapReduce jobs (in Java), Scala code and modules to address those needs.
Experience with handling Data accuracy, Scalability and Integrity of Hadoop platforms.
Experience with complex data processing pipelines, including ETL and Data Ingestion dealing with unstructured and semi-structured data.
Knowledgeable in Apache Spark and Scala, mainly in framework evaluation for transitioning from Hadoop MapReduce to Spark.
Knowledge of designing and implementing ETL processes to load data from various data sources to HDFS using Flume and Sqoop, performing transformation logic using Hive and Pig, and integrating with BI tools for visualization/reporting.
Solid understanding of NoSQL databases like MongoDB, HBase and Cassandra.
Expertise in performing Large-scale Web crawling with Apache Nutch using a Hadoop/HBase cluster.
Knowledge in Job Workflow Scheduling and Monitoring tools like Oozie and Zookeeper.
Experience with working on the AWS cloud environment.
Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, JDBC.
Experienced in working with various frameworks like Struts, Spring, Hibernate, EJB and JSF.
Professional knowledge of UNIX, Shell and Perl scripting.
Knowledge of Data Warehousing and ETL Tools like Informatica and Pentaho.
Hands on knowledge of writing code in Scala.
Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology.
Experienced in Agile Scrum, RUP and TDD software development methodologies.
Possess strong commitment to team environment dynamics with the ability to lead, contribute expertise and follow leadership directives at appropriate times.
Effectively used Oozie to develop automatic workflows for Sqoop, MapReduce and Hive jobs.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark, Splunk, Impala, Kafka, Talend, Oozie, Zookeeper, Flume, Storm, AWS, EC2, EMR.
Programming Languages: Java, Scala, Python, C/C++, PL/SQL.
Scripting Languages: PHP, jQuery, JavaScript, XML, HTML, Bash, Ajax and CSS.
UNIX Tools: Apache, YUM, RPM.
J2EE Technologies: Servlets, JSP, JDBC, EJB, & JMS.
Databases: NoSQL (MongoDB, Cassandra), Oracle.
Data Integration Tools: Informatica, Pentaho.
Methodologies: Agile, Scrum, SDLC, UML, Design Patterns.
IDEs: Eclipse, NetBeans, WSAD, RAD.
Platforms: Windows, Linux, Solaris, AIX, HP-UX, CentOS.
Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss 4.0
Frameworks: Spring, MVC, Hibernate, Struts, Log4J, JUnit, Web Services
PROFESSIONAL EXPERIENCE
Client: Panasonic Automotive, Peachtree City, GA Aug’15 – Present
Role: Hadoop Developer
Description: The main purpose of this project is archiving, querying and analyzing the data in Hadoop for making real-time decisions and customizing the promotional content.
Responsibilities:
Working extensively on creating MapReduce jobs for search and analytics to identify various trends across the data for the Infotainment product line.
Working on data analytics using Pig and Hive; Hive made it easier to extract information from very old data.
Designing the adaptive ecosystem to ensure that the archived data was accessible using third party BI tools.
Using Oozie for workflow orchestration in the automation of MapReduce, Pig and Hive jobs.
Installing and configuring the Hadoop cluster and developing multiple MapReduce jobs in Java for data cleaning and pre-processing.
Analyzing information from the in-vehicle Dedicated Short Range Communication (DSRC) unit using Pig and Hive, making it easier to monitor vehicles and road status.
Responsible for optimizing data across network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitions and Buckets.
Writing jobs in Scala for the company’s nearby parallel data processing center.
Moving large datasets hourly in the Avro file format and running Hive and Impala queries on them.
Working on importing data into HBase using the HBase Shell and the HBase Client API (a minimal sketch follows this list).
Capturing archived data from the existing relational database into HDFS using Sqoop.
Installing and configuring a remote Hive Metastore for both development and production jobs as required.
Coordinating the cluster services using Zookeeper.
Improving system performance by working with the development team to analyze, identify and resolve issues quickly.
Storing the geographically pre-distributed datasets in Cassandra.
Capturing the data logs from web server into HDFS using Flume for analysis.
Writing Pig scripts and implementing business logic using Pig UDFs to pre-process the data for analysis.
Managing and reviewing Hadoop log files, thereby keeping track of nodes’ health.
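A minimal, illustrative sketch of ingesting a record through the HBase Client API referenced above (HBase 0.9x-era API as shipped with CDH4); the table name, column family, row-key layout and values are assumptions for illustration, not the project's actual schema:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseImportSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();        // reads hbase-site.xml from the classpath
        HTable table = new HTable(conf, "vehicle_events");       // hypothetical table name
        Put put = new Put(Bytes.toBytes("vin123|20160101T1200"));// hypothetical row key: VIN + timestamp
        put.add(Bytes.toBytes("d"), Bytes.toBytes("speed"), Bytes.toBytes("62"));
        table.put(put);                                          // write the cell to HBase
        table.close();
    }
}
The same put could be issued interactively from the HBase Shell; the Client API variant is what a bulk-import job would call in a loop.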
Environment: CDH4 with Hadoop 2.x, HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Scala, Zookeeper, HBase, Cassandra, Flume, Servlets, JSPs, JSTL, HTML, JavaScript.
Client: Apple, Cupertino, CA Feb’14 - Jul’15
Role: Hadoop Consultant
Description: The goal of the project was to analyze and filter the data coming in from disparate sources to make important business decisions. The iAds team used this data to fine-tune the promotional content for the users.
Responsibilities:
Used Cloudera Manager for Hadoop cluster environment administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
Developed efficient MapReduce programs for data cleaning and structuring using Java and Python.
Supported the team in Code/Design analysis, Strategy development and Project planning.
Modeled the data and made it queryable using a unified query service.
Developed Hive queries for data sampling and pre-analysis before submitting to the analysts.
Implemented Kafka and Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies with Esper to filter and process that data across multiple clusters for complex event processing.
Registered, ingested, validated, stored and archived the data in its native form.
Used Oozie to automate and schedule business workflows invoking Sqoop, MapReduce and Pig jobs as per the requirements.
Used Cassandra to store the majority of the data that needed to be partitioned regionally.
Worked on Splunk to leverage the archived data and specialized analytics in Hadoop.
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Cleansed, enriched, transformed, and analyzed the data through hosted compute engines.
Used Apache Spark for Performance optimization and Parallel data processing.
Developed Sqoop scripts to import and export data from relational sources and handled incremental loading on the customer and transaction data by date.
Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources; also developed Pig UDFs for pre-processing (a minimal UDF sketch follows this list).
Created HBase tables to load large, disparate datasets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Responsible for automating the addition of data nodes as needed.
Worked with various HDFS file formats like Avro and SequenceFile, and various compression codecs like Snappy and bzip2.
Identified several PL/SQL batch applications in General Ledger processing and conducted a performance comparison to demonstrate the benefits of migrating to Hadoop.
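A minimal, illustrative sketch of the kind of Pig Latin UDF in Java mentioned above; the class name and the normalization logic are assumptions chosen only to show the UDF structure:
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical pre-processing UDF: trims and lower-cases a field before analysis.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                                   // pass missing values through unchanged
        }
        return ((String) input.get(0)).trim().toLowerCase();
    }
}
Such a UDF is packaged in a jar, registered in the Pig script with REGISTER, and then invoked like any built-in function inside a FOREACH ... GENERATE statement.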
Environment: Hadoop 2.0, MapReduce, HDFS, Hive, Java, Cloudera, Pig, HBase, Kafka Storm, Splunk, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*PLUS.
Client: State Farm Insurance, Bloomington, IL Oct’12 - Jan’14
Role: Hadoop/Big data Developer
Description: State Farm provides Insurance and Financial services to their customers in the United States. The main purpose of this project was to store and analyze the data in Hadoop for making real-time decisions and improving the promotional content for the customers.
Responsibilities:
Designed, developed and supported a MapReduce-based data processing pipeline to process a growing number of events per day from log files and messages (a representative mapper is sketched after this list).
Worked closely with client development staff to perform ad-hoc queries and data analysis on newly created cross-platform datasets using Apache Hive and Pig.
Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data on HDFS.
Used Hive partitioning and bucketing to segregate and analyze the data.
Incorporated various job flow mechanisms in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
Implemented the open-source monitoring tool Ganglia to monitor the various services across the cluster.
Collaborated with the administration team to set up a monitoring infrastructure for supporting and optimizing the Hadoop infrastructure.
Responsible for writing complex SQL-queries involving multiple inner and outer joins.
Developed and supported a Scala-based data processing pipeline for one of the processing centers located in Sacramento.
Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
Worked with the applications team to install the operating systems, Hadoop updates, patches and version upgrades as required.
Worked directly with stakeholders in order to closely understand the business needs.
Computed various metrics and loaded the aggregated data onto DB2 for reporting on the dashboard.
Designed the shell script for backing up important metadata and rotating the logs on a monthly basis.
Greatly sharpened business acumen with knowledge of health insurance, claims processing, fraud suspect identification, the appeals process and other domains.
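A minimal, illustrative sketch of a mapper for the log-processing pipeline mentioned above (Hadoop new-API Mapper); the tab-delimited format and the event-type column position are assumptions, not the actual log layout:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: emits (eventType, 1) for each well-formed log line so a
// combiner/reducer can count events per type per day.
public class LogEventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text eventType = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");   // assumed tab-delimited log line
        if (fields.length > 2) {
            eventType.set(fields[2]);                      // assumed event-type column
            context.write(eventType, ONE);
        }
    }
}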
Environment: CDH4 with Hadoop 1.x, HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, Servlets, JSPs, JSTL, HTML, JavaScript, jQuery, CSS.
Client: InfoTech Enterprises, Hyderabad, India Jul’10 - Aug’12
Role: Sr. Java/J2EE developer
Description: InfoTech Enterprises provides leading-edge engineering solutions, including product development and life-cycle support, process, network design and optimization, and data management solutions to major organizations worldwide.
Responsibilities:
Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which was used successfully in a number of production systems (a minimal sketch follows this list).
Spearheaded the “Quick Wins” project, working very closely with the business and end users to improve the website’s ranking from 23rd to 6th in just 3 months.
Normalized Oracle database, conforming to design concepts and best practices.
Resolved product complications at customer sites and funneled the insights to the development and deployment teams to shape a long-term product development strategy with minimal roadblocks.
Built the front-end UI using JSP, Servlets, HTML and JavaScript to create a user-friendly and appealing interface.
Used JSTL and built custom tags whenever necessary.
Used Expression Language to tie beans to UI components.
Convinced business users and analysts to adopt alternative solutions that were more robust and simpler to implement from a technical perspective while still satisfying the functional requirements of the business.
Applied design patterns and OO design concepts to improve the existing Java/JEE based code base.
Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.
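A minimal, illustrative sketch of the front-controller pattern mentioned above; the servlet, action names and JSP paths are hypothetical and only show how a single entry point dispatches requests to handlers and forwards to views:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical front controller: maps an "action" request parameter to a handler
// and forwards to the JSP view that the handler selects.
public class FrontControllerServlet extends HttpServlet {
    private final Map<String, Action> actions = new HashMap<String, Action>();

    public void init() {
        actions.put("listOrders", new ListOrdersAction());   // illustrative handler registration
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Action action = actions.get(req.getParameter("action"));
        String view = (action != null) ? action.execute(req) : "/error.jsp";
        req.getRequestDispatcher(view).forward(req, resp);   // forward to the selected view
    }

    interface Action {
        String execute(HttpServletRequest req);
    }

    static class ListOrdersAction implements Action {
        public String execute(HttpServletRequest req) {
            return "/orders.jsp";                             // illustrative view path
        }
    }
}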
Environment: Java 1.2/1.3, Swing, Applets, Servlets, JSP, custom tags, JNDI, JDBC, XML, XSL, DTD, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, WebLogic, JUnit, Log4J and CVS.
Client: CMC, Chennai, India Dec’08 - Jun’10
Role: Java/J2EE developer
Description: The application allows online ordering through the web, which saves time for customers. Various operations on their orders, such as expediting, canceling and splitting order lines, can be performed through the web as well. On request, the status of an order can be displayed.
Responsibilities:
Developed user interface using JSP, HTML, CSS and JavaScript.
Responsible for gathering and analyzing the requirements for the project.
Created various Unified Modeling Language (UML) diagrams, such as use case and ER diagrams, for the project.
Used dependency injection in Spring for the service and DAO layers (a minimal DAO sketch follows this list).
Implemented the J2EE architecture using Struts, based on the MVC 2 pattern.
Wrote Servlets and deployed them on WebSphere Application server.
Created the user validations on client side as well as server side.
Developed the Java classes to be used in JSP and Servlets.
Extensively used JavaScript for client side validations.
Improved the coding standards, code reuse and participated in code-reviews.
Worked with PL/SQL scripts to gather data and perform data manipulations.
Used JDBC for Database transactions.
Involved in unit testing of the application.
Developed stored procedures in Oracle.
Used a Test-Driven Development approach and wrote many unit and integration tests.
Involved in analyzing how the requirements related to and depended on each other.
Provided on-site coordination for developing various modules.
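A minimal, illustrative sketch of a Spring-injected DAO using plain JDBC, as referenced above; the class, table and column names are assumptions, and the DataSource would be supplied by setter injection from the Spring context:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical DAO: receives its DataSource from the Spring container and
// looks up an order's status over plain JDBC.
public class OrderDao {
    private DataSource dataSource;

    public void setDataSource(DataSource dataSource) {    // called by Spring (setter injection)
        this.dataSource = dataSource;
    }

    public String findOrderStatus(long orderId) throws SQLException {
        Connection con = dataSource.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement(
                "SELECT status FROM orders WHERE order_id = ?");   // illustrative table/column names
            ps.setLong(1, orderId);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString("status") : null;
        } finally {
            con.close();                                   // return the connection to the pool
        }
    }
}
The service layer would receive this DAO through the same dependency-injection mechanism, keeping JDBC details out of the business logic.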
Environment: Java 1.4, JSP 2.0, Servlets 2.4, JDBC, HTML, CSS, JavaScript, WebSphere 3.5.6, Eclipse, Oracle 9i.