•7+ years of experience in software design, development, testing, and troubleshooting of applications, data analysis, and Hadoop development using HDFS, MapReduce, Hive, and HBase.
•Responsible for designing, coding, testing, and documenting programs; developing specifications for new and existing systems; and working with clients to determine needs and requirements.
•Strong core knowledge of Hadoop architecture and components in the Hadoop ecosystem.
•Understanding of RDBMS concepts and SQL programming skills with Oracle, MySQL, and PostgreSQL.
•Strong working experience using Hadoop Ecosystem tools such as Hive, Sqoop, Flume, Oozie, HBase, Hue and Impala.
•Proficient in creating Hive DDLs and writing complex Hive queries.
•Experience in importing and exporting data using Sqoop from source systems to HDFS and from HDFS to source systems.
•Experience in using Flume and Kafka to load log data and MQ messages from multiple sources into HDFS.
•Experience in analyzing data using HiveQL.
•Delivered high-quality code, including proper design review, unit testing, and integration testing.
•Strong experience with Hive performance optimization techniques such as the ORC file format, distribution keys, partitions, and buckets.
•Experience in using different file formats like CSV, SequenceFile, Avro, ORC, JSON, and Parquet, and different compression techniques like Gzip, Bzip2, LZO, and Snappy.
•Familiar with Java virtual machine (JVM) and multi-threaded processing.
•Knowledge of job workflow scheduling and monitoring using Oozie.
•Experienced in performance tuning and analysis in both relational databases and NoSQL databases (HBase).
•Strong experience and knowledge of real-time data analytics using Spark.
•Worked with business and technology client representatives to gather functional and technical requirements.
•Assisted in requirements-gathering activities and liaised with IT operational staff and business owners as needed.
•Contributed as a POC for implementing applications on the Spark framework, which provides high-level APIs in Scala, Java, and Python.
•Established the scope of the project, led internal communication with stakeholders, and ensured delivery of the project per commitment.
•Excellent organizational, analytical, written, and oral communication skills.
Big Data Technologies: HDFS, MapReduce, Hive, HBase, Sqoop, Flume, Oozie, Kafka, Impala, Apache Spark, Apache Storm, YARN.
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks Data Platform 2.3.6.
Languages: Java, SQL, PL/SQL, Linux shell scripting.
Operating Systems: Windows (XP, 7, 8), UNIX, RHEL 6.8/6.5.
Tools: Adobe, SQL Developer, Flume, Sqoop, and Storm.
J2EE Technologies: JSP, JavaBeans, Servlets, JPA 1.0, EJB 3.0, JDBC.
Databases: Oracle, MySQL, DB2, PostgreSQL.
Conduent State Health Care LLC Sept 2015 to Present
Client: NH MMIS
Senior Hadoop Developer
New Hampshire (NH) Medicaid Management Information System (MMIS) is an enterprise application for the Medicaid program, a state-administered health insurance program financed and run jointly by the federal and state governments for low-income people of all ages who do not have the money or insurance to pay for health care. The goal of the Medicaid program is to provide medical and other health care services to eligible individuals so that they can remain as self-sufficient as possible.
Expertise in designing and deploying Hadoop clusters and Big Data analytics tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra.
Developed multiple MapReduce jobs in Pig and Hive for data cleansing and pre-processing.
Worked on reading multiple data formats on HDFS using Scala.
Developed Spark scripts by using Scala shell commands as per the requirement.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.3 for data aggregation, queries, and writing data back into the OLTP system directly or through Sqoop.
Explored using Spark to improve the performance and optimization of existing algorithms in Hadoop, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Experience implementing Hive-HBase integration by creating Hive external tables and using the HBase storage handler.
Developed Scala scripts and UDFs using DataFrames/SQL and RDDs in Spark for data aggregation and queries.
Executed queries using Hive and developed MapReduce jobs to analyze data.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Developed Hive queries for the analysts.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Developed a data pipeline using Kafka and Storm to store data into HDFS.
Performed real-time analysis on the incoming data.
Expertise in data modeling and in data warehouse design and development.
Environment: Hortonworks Data Platform 2.3.6, Ambari, HDFS, MapReduce, Hive, SQL, Hue, Sqoop, Flume, MySQL, HBase, Oozie.
Client: Ancestry.com, New York, NY May 2014 to Aug 2015
•Worked on the data ingestion process from various source systems into HDFS using Sqoop, Flume, and WebHDFS.
•Applied business logic, built data marts, and loaded data from the RAW zone to the Processing zone.
•Involved in loading data from the Linux file system to HDFS.
•Created HBase tables to store variable data formats from millions of data rows.
•Implemented partitioning, dynamic partitions, and buckets in Pig and Hive.
•Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
•Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
•Created Hive dynamic partitions to load time-series data.
•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using Spark shell and Spark Streaming.
•Handled different types of joins in Scala, such as map joins, bucket map joins, and sorted bucket map joins.
•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL and Scala.
•Transformed files from one format to another using Spark with Scala.
•Developed Spark code using Scala and Spark SQL for faster testing and data processing.
Environment: Hadoop, MapReduce, HDFS, Hive, Flume, SQL, Oracle, XML, Eclipse.
ICICI Bank Limited – Hyderabad, India May 2010 to Dec 2013
ICICI is the second largest bank in India in terms of assets and third in terms of market capitalization. It offers a wide range of banking products and financial services for corporate and retail customers through a variety of delivery channels and specialized subsidiaries in the areas of investment banking, life and non-life insurance, venture capital, and asset management.
•Involved in the Software Development Life Cycle (SDLC) of the application: requirements gathering, design analysis, and code development.
•Implemented Struts framework based on the Model View Controller design paradigm.
•Implemented the MVC architecture using Struts MVC.
•Created the Struts-Config XML file and defined the Action mappings.
•Designed the application by implementing Struts based on the MVC architecture, with simple JavaBeans as the Model, JSP UI components as the View, and the Action Servlet as the Controller.
•Designed and developed business components using Session and Entity Beans in EJB.
•Used JDBC for data access from Oracle tables.
•Worked on triggers and stored procedures in the Oracle database.
•Apache Ant was used for the entire build process.
•JUnit was used to implement test cases for beans.
•Worked on Eclipse IDE to write the code and integrate the application.
•Application was deployed on WebSphere Application Server.
•Used SVN repositories for version control and Log4j for logging errors and exceptions.
•Coordinated with the testing team for timely release of the product.