Anji M Email Id: ****.********@*****.***
Hadoop Developer Mobile# 415-***-****
Professional Summary:
Proactive IT developer with 8 years of experience in the design and development of scalable systems using Hadoop technologies across various environments.
Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and the HDFS framework.
Extensive experience in analyzing data using Hadoop ecosystem components including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr and ZooKeeper.
Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
Extensive knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
Configured ZooKeeper, Cassandra and Flume on the existing Hadoop cluster.
Experienced in using Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and relational database systems.
Expertise in writing Hadoop jobs for analyzing data using HiveQL queries, Pig Latin (a data flow language) and custom MapReduce programs in Java.
Created custom UDFs for Pig and Hive to incorporate the methods and functionality of Python/Java into Pig Latin and HiveQL (an illustrative sketch follows this summary).
Hands-on experience troubleshooting errors in the HBase shell, Pig, Hive and MapReduce.
Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
Experience with NoSQL column-oriented databases such as HBase and Cassandra, and their integration with Hadoop clusters.
Implemented clusters for the NoSQL tools Cassandra and MongoDB as part of a POC to address HBase limitations.
Designed and built solutions for real-time data ingestion using Kafka, Storm, Spark Streaming and various NoSQL databases.
Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
Solid experience with Oozie for scheduling and ZooKeeper for coordinating cluster resources.
Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
Developed automated Unix shell scripts for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT and other database-related activities.
Experience in analyzing, designing and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
Extensive experience working with Oracle, DB2, SQL Server, PL/SQL and MySQL databases, and with core Java concepts such as OOP, multithreading, collections and I/O.
Good working knowledge on Object Oriented Programming.
Experienced in designing web applications using HTML5, CSS3, JavaScript, JSON, jQuery, AngularJS, Bootstrap and Ajax on the Windows operating system.
Experience in Service-Oriented Architecture using web services such as SOAP and RESTful services.
Knowledge of service-oriented architecture (SOA), workflows and web services using XML, SOAP and WSDL.
Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
Good experience in working with Tableau Visualization tool using Tableau Desktop, Tableau Server and Tableau Reader.
Good interpersonal and communication skills, strong problem-solving ability, ease in exploring new technologies, and a good team player.
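As an illustration of the custom Hive UDF work noted above, the following is a minimal sketch; the class name, function name and masking rule are hypothetical, and it assumes Hive's classic org.apache.hadoop.hive.ql.exec.UDF base class (Hive 1.x style).

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: masks all but the last four characters of a string column.
// Usage after packaging into a JAR (names are illustrative):
//   ADD JAR mask_udf.jar;
//   CREATE TEMPORARY FUNCTION mask_tail AS 'MaskTailUDF';
//   SELECT mask_tail(card_number) FROM payments;
@Description(name = "mask_tail", value = "_FUNC_(str) - masks all but the last 4 characters")
public class MaskTailUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // pass NULLs through
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);               // nothing to mask
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - 4));
        return new Text(masked.toString());
    }
}
```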
Technical Skills:
Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, NiFi, Apache Solr, ActiveMQ
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, Python
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery, AngularJS
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application Servers: Apache Tomcat, WebLogic, JBoss
Version Control: SVN, CVS
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Tableau, QlikView, Pentaho, IBM Cognos Intelligence
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Cloud Technologies: Amazon Web Services (AWS), CDH3, CDH4, CDH5, Hortonworks, Mahout, Microsoft Azure Insight, Amazon Redshift
Professional Experience:
Client: Williams-Sonoma, Inc., San Francisco, CA Nov '15 – Present
Role: Hadoop Developer
Williams-Sonoma, Inc. is an American publicly traded consumer retail company and a multi-channel specialty retailer of high-quality products for the home. In this project I managed large volumes of customer and product data using Hadoop ecosystem and Big Data components.
Responsibilities:
Involved in managing nodes on the Hadoop cluster and monitoring Hadoop cluster job performance using Cloudera Manager.
Developed optimal strategies for distributing web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
Involved in loading data from edge node to HDFS using shell scripting.
Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
Developed Pig Latin and HiveQL scripts for data analysis and ETL, and extended the default functionality by writing User Defined Functions (UDFs) for data-specific processing.
Worked with the latest BI tools: Tableau, QlikView dashboard design, and Spotfire.
Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
Used the Pig loader for loading tables from Hadoop to various clusters.
Worked on migrating HiveQL queries to Impala to minimize query response time.
Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
Connected to Tableau Server to publish dashboards to a central location for portal integration.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Used Spark stream processing to bring data into memory, and implemented RDD transformations and actions to process it (an illustrative sketch follows this project entry).
Used MRUnit for unit testing and Continuum for integration testing.
Implemented Spark RDD transformations to map business analysis logic and applied actions on top of those transformations.
Worked with different compression techniques (LZO, Snappy, Bzip2, Gzip) to save storage and optimize data transfer over the network, using Avro, Parquet and ORC file formats.
Used Maven to build and deploy the JARs for MapReduce, Pig and Hive UDFs.
Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
Developed Spark scripts using Python shell commands as per the requirements.
Coordinated the team and resolved team issues both technically and functionally.
Environment: Hadoop, Scala, MapReduce, HDFS, Spark, AWS, Hive, Cassandra, Maven, Jenkins, Pig, UNIX, Python, MRUnit, Git.
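The Spark processing in this project was implemented in Scala; purely as an illustration, the following minimal sketch expresses the same read-transform-write pattern through the Spark Java API (Spark 2.x assumed). The HDFS paths, the "status" column and the aggregation are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

// Hypothetical job: read semi-structured JSON web logs from HDFS, count hits per
// status code, and write the result back as Snappy-compressed Parquet.
public class WebLogAggregationSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("web-log-aggregation-sketch")
                .getOrCreate();

        // Assumed input location and schema (the "status" field is made up).
        Dataset<Row> logs = spark.read().json("hdfs:///data/weblogs/raw/");

        Dataset<Row> hitsPerStatus = logs
                .filter(col("status").isNotNull())
                .groupBy(col("status"))
                .count();

        // Snappy compression and Parquet output, as described in the bullets above.
        hitsPerStatus.write()
                .mode("overwrite")
                .option("compression", "snappy")
                .parquet("hdfs:///data/weblogs/hits_per_status/");

        spark.stop();
    }
}
```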
Client: Veritas Technologies LLC, Mountain View, CA Nov '14 – Oct '15
Role: Hadoop Developer
Veritas Technologies LLC specializes in the storage management software industry. The purpose of this project was to maintain large amounts of data using Big Data ecosystem tools; I analyzed how data is imported to and exported from RDBMS and HDFS, and I was responsible for adding cluster nodes, troubleshooting, and managing data backups and log files.
Responsibilities:
Responsible for building scalable distributed data solutions using Hadoop.
Worked on joining raw data with reference data using Pig scripting.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
Analyzed data using Hadoop components Hive and Pig.
Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
Developed Oozie workflows for scheduling and orchestrating the ETL process.
Created a high-level design approach for building a data lake that incorporates existing historical data and also meets the need to process transactional data.
Created and ran Sqoop jobs with incremental load to populate Hive external tables.
Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
Extracted data from Teradata into HDFS using Sqoop.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
Managed real-time data processing and real-time data ingestion into HBase and Hive using Storm.
Developed large-scale streaming data analytics using Storm (an illustrative topology sketch follows this project entry).
Environment: Hadoop, HDFS, Pig, Hive, Oozie, HBase, MapReduce, Sqoop, Storm, Linux, Cloudera, Maven, Jenkins, Java, SQL.
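A minimal sketch of the kind of Storm topology referenced in the streaming bullets above. The spout, bolt and field names are hypothetical, the trivial test spout stands in for the real ingestion source, and Storm 1.x package names (org.apache.storm.*) are assumed; the bolt that would write to HBase/Hive is not shown.

```java
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class StreamingIngestTopologySketch {

    // Trivial stand-in spout; the real source (e.g. a message queue) is not shown.
    public static class TestRecordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values("host-01|2015-05-01T10:15:00Z|backup-ok"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("record"));
        }
    }

    // Bolt that splits the raw record into fields for a downstream writer bolt.
    public static class ParseRecordBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String[] parts = input.getStringByField("record").split("\\|");
            collector.emit(new Values(parts[0], parts[1], parts[2]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("host", "timestamp", "status"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("records", new TestRecordSpout(), 1);
        builder.setBolt("parse", new ParseRecordBolt(), 2).shuffleGrouping("records");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("streaming-ingest-sketch", conf, builder.createTopology());
    }
}
```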
Client: WellCare, Tampa, Florida Mar '13 – Oct '14
Role: Hadoop Developer
In this project we used the core Hadoop ecosystem to build a data lake and retrieve claims and policy-related information. The data is used to keep track of claims and to decommission a few legacy systems, and is then analyzed to see how competitors can make use of similar keywords to make their ads visible.
Responsibilities:
Imported data from DB2 into HDFS using Sqoop and developed MapReduce jobs using the Java API (an illustrative sketch follows this project entry).
Installed and configured Pig and wrote Pig Latin scripts.
Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
Developed workflows using Oozie for running MapReduce jobs and Hive queries.
Implemented various advanced join operations using Pig Latin.
Imported and exported data into and out of HDFS, and assisted in exporting analyzed data to relational databases using Sqoop.
Involved in developing monitoring and performance metrics for Hadoop clusters.
Worked with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN).
Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Configured Hadoop system files to accommodate new data sources and updated the existing Hadoop cluster configuration.
Actively participated in code reviews and meetings, and resolved technical issues.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.
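A minimal sketch of the kind of Java-API MapReduce job mentioned above. The record layout (pipe-delimited claim rows landed in HDFS by Sqoop) and the field positions are hypothetical; it simply counts claims per status code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: count claims per status from pipe-delimited records
// (claim_id|member_id|status|amount) landed in HDFS by Sqoop.
public class ClaimStatusCountSketch {

    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length >= 3) {            // skip malformed records
                status.set(fields[2]);
                context.write(status, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "claim-status-count-sketch");
        job.setJarByClass(ClaimStatusCountSketch.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```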
Client: Hexacorp Technical Services, NJ Oct '11 – Feb '13
Role: Java Developer
Hexacorp is a process-oriented IT services organization focused on management systems, system integration and system design, and it provides software services to the airline, banking and financial services sectors. The main aim of the project was to maintain customer details, and I developed applications using Java technologies.
Responsibilities:
Effectively interacted with team members and business users for requirements gathering.
Involved in analysis, design and implementation phases of the software development lifecycle (SDLC).
Implemented the presentation layer using JSP, JSP Tag Libraries (JSTL), HTML/HTML5, CSS/CSS3, JavaScript, jQuery and AngularJS.
Implemented Spring core J2EE patterns such as MVC, Dependency Injection (DI) and Inversion of Control (IoC).
Implemented REST web services with the Jersey API to handle customer requests (an illustrative sketch follows this project entry).
Developed test cases using JUnit and used Log4j as the logging framework.
Worked with HQL and the Criteria API for retrieving data elements from the database.
Developed user interfaces using HTML, Spring tags, JavaScript, jQuery and CSS.
Developed the application using Eclipse IDE and worked under Agile Environment.
Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax and Struts.
Used the Eclipse IDE as the development environment to design, develop and deploy Spring components on WebLogic.
Worked cooperatively with others and took the necessary steps to ensure successful project execution, using strong verbal communication skills.
Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3.
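A minimal sketch of the kind of Jersey (JAX-RS) REST resource described above. The resource path and the hard-coded JSON response standing in for the service/DAO layer are hypothetical.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical JAX-RS resource exposed through Jersey for customer lookups.
@Path("/customers")
public class CustomerResourceSketch {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getCustomer(@PathParam("id") String id) {
        // In the real application this would delegate to the service/DAO layer (Hibernate, HQL).
        String body = "{\"id\": \"" + id + "\", \"status\": \"ACTIVE\"}";
        return Response.ok(body).build();
    }
}
```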
Client: Choice Software Limited, Hyderabad Aug '08 – Sep '11
Role: Java Developer
Choice Software Limited is a leading IT infrastructure and data center solution provider. In this project I developed the application architecture of new systems to support the customer sales process, built web-based applications to manage product details using JavaScript, HTML and CSS, and maintained applications based on J2EE technologies such as JDBC, Servlets and Struts.
Responsibilities:
Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
Created use case, class and sequence diagrams for the analysis and design of the application.
Developed web-based user interfaces using the Struts framework.
Developed and maintained Java/J2EE code required for the web application.
Handled client-side validations using JavaScript and was involved in integrating various Struts actions into the framework.
Involved in the development of the User Interfaces using HTML, JSP, CSS and JavaScript.
Developed, tested and debugged Java, JSP and EJB components using Eclipse (a brief servlet sketch follows this project entry).
Environment: Java (JDK 1.5), J2EE, Servlets, Struts, JSP, HTML, CSS, JavaScript, EJB, Eclipse, WebLogic 8.1, Windows.
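A minimal sketch of the servlet-plus-JSP pattern used for the user interfaces above. The request parameter, attribute and JSP name are hypothetical, and the web.xml URL mapping is not shown.

```java
import java.io.IOException;

import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: reads a product id from the request and forwards to a JSP view.
public class ProductDetailsServletSketch extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String productId = request.getParameter("id");

        // In the real application the details would come from the business/EJB layer.
        request.setAttribute("productId", productId);

        RequestDispatcher dispatcher = request.getRequestDispatcher("/productDetails.jsp");
        dispatcher.forward(request, response);
    }
}
```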