REVATHI V
Email: *********.******@*****.***
HADOOP DEVELOPER
Phone: 832-***-****
Experience Summary:
Around 8 years of professional IT experience, including around 5 years in the Big Data environment and Hadoop ecosystem, with good experience in Spark, NoSQL and Java development.
•Hands-on experience across the Hadoop ecosystem, including extensive experience with Big Data technologies such as HDFS, MapReduce, YARN, Spark, Sqoop, Hive, Pig, Impala, Oozie, Oozie Coordinator, ZooKeeper, Apache Cassandra and HBase.
•Experience using tools such as Sqoop, Flume, Kafka, NiFi and Pig to ingest structured, semi-structured and unstructured data into the cluster.
•Designed both time-driven and data-driven automated workflows using Oozie, and used ZooKeeper for cluster coordination.
•Experience with Hadoop clusters on Cloudera CDH and Hortonworks HDP.
•Experience working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
•Expertise in writing MapReduce jobs in Java and Python to process large structured, semi-structured and unstructured data sets and store the results in HDFS.
•Experienced in using Pig scripts to perform transformations, event joins, filtering and pre-aggregation before storing data in HDFS.
•Imported data into HBase using the HBase shell and the HBase Client API. Experience designing and developing HBase tables and storing aggregated data from Hive tables.
•Experience working with Python, UNIX and shell scripting.
•Experience with data formats such as JSON, Avro, Parquet and ORC, and compression codecs such as Snappy and bzip2.
•Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources such as flat files and databases.
•Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing and ETL tools such as IBM DataStage, Informatica and Talend.
•Good knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, as well as with Microsoft Azure.
•Good knowledge of creating Google Cloud Dataproc clusters and testing with in-memory emulators.
•Experience using IDEs and tools such as Eclipse, IntelliJ, NetBeans, GitHub, Maven, SBT and CBT.
•Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD) and Java components such as the Collections Framework, exception handling, the I/O system and multithreading.
•Hands-on experience with MVC architecture and Java EE frameworks such as Struts 2, Spring MVC and Hibernate.
•Experience with the complete Software Development Life Cycle (SDLC), including requirement gathering, analysis, design, development, testing, implementation and documentation.
•Worked with Waterfall and Agile methodologies.
•Good team player with excellent communication skills and a strong interest in learning new technologies.
Spark & Real-Time Streaming
•Hands-on experience with Spark architecture and its components, such as Spark SQL and the DataFrame and Dataset APIs.
•Used Spark to improve the performance of existing Hadoop processing, utilizing SparkContext, Spark SQL, DataFrames and RDDs.
•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL and Python.
•Hands-on experience accessing Hive tables from Spark, creating DataFrames on Hive tables and performing transformations on them.
•Used Spark Structured Streaming to perform the necessary transformations.
•Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
Technical Skills:
Hadoop: HDFS, MapReduce, Hive, Beeline, Sqoop, Flume, Oozie, Impala, Pig, Kafka, ZooKeeper, NiFi, Cloudera Manager, Hortonworks
Spark Components: Spark Core, Spark SQL (DataFrames and Datasets), Scala, Python
Programming Languages: Core Java, Scala, Shell, HiveQL, Python
Web Technologies: HTML, jQuery, Ajax, CSS, JSON, JavaScript
Operating Systems: Linux, Ubuntu, Windows 10/8/7
Databases: Oracle, MySQL, SQL Server
NoSQL Databases: HBase, Cassandra, MongoDB
Cloud: AWS CloudFormation, Azure
Version Control and Tools: Git, Maven, SBT, CBT
Methodologies: Agile, Waterfall
IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ
Professional Experience:
Hadoop Developer
Axa Equitable, Charlotte, NC Sep’16-Present
This project replaces existing legacy applications by storing and processing the data from the Billing, Payments and Disbursements application databases entirely in HDFS. All processing is implemented using Hadoop stack technologies.
Responsibilities:
•Developed an enterprise data warehouse (EDW) solution: a cloud-based EDW and data lake that supports data asset management, data integration and continuous analytic discovery workloads.
•Developed and implemented real-time data pipelines with Spark Streaming, Kafka and Cassandra to replace the existing Lambda architecture without losing its fault-tolerance capabilities.
•Created a Spark Streaming application to consume real-time data from Kafka sources and apply real-time analysis models that can be updated with new data as it arrives in the stream.
•Imported and transformed large sets of structured, semi-structured and unstructured data.
•Used Spark Structured Streaming to perform the necessary transformations and build a data model that reads data from Kafka in real time and persists it to HDFS.
•Implemented workflows using the Apache Oozie framework to automate tasks, and used ZooKeeper to coordinate cluster services.
•Created various Hive external tables and staging tables, and joined them as required.
•Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables. Used map-side joins and parallel execution to optimize Hive queries.
•Developed and implemented custom Hive and Spark UDFs for date transformations, such as date formatting and age calculation, per business requirements.
•Wrote Spark programs in Scala and Python for data quality checks.
•Wrote transformations and actions on DataFrames, and used Spark SQL to access Hive tables from Spark for faster data processing.
•Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
•Used Google Cloud Dataproc to create Hadoop clusters of one or more Compute Engine instances that connect to Cloud Bigtable instances, ran Hadoop jobs on them, and used in-memory emulators with filters for testing.
•Used Spark optimization techniques such as caching/refreshing tables, broadcast variables, coalescing/repartitioning, increasing memory overhead limits, tuning parallelism and modifying Spark's default configuration for performance tuning.
•Benchmarked Spark jobs to optimize their performance and improve overall processing.
•Worked in an Agile environment, delivering the agreed user stories within each sprint.
Environment: Hadoop, HDFS, Hive, Sqoop, Oozie, Spark, Scala, Kafka, Python, Cloudera, Linux.
Hadoop Developer
Dropit, Miami, FL Dec’14-Aug’16
The goal of this project was to implement a data lake on Hadoop that can handle transactional processing operations. A portion of the RDBMS tasks was migrated to Hadoop for faster, more efficient in-memory computation.
Responsibilities:
•Worked with product owners, designers, QA and other engineers in an Agile development environment to deliver timely solutions per customer requirements.
•Transferred data from different data sources into HDFS using Kafka producers, consumers and brokers.
•Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
•Involved in creating Hive tables, loading data, and writing Hive queries and views using HiveQL.
•Optimized Hive queries using map-side joins, dynamic partitions and bucketing.
•Used Hive queries on HBase-backed SerDe tables to perform data analysis and meet the data requirements of downstream applications.
•Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
•Implemented secondary sorting in MapReduce programs to improve the performance of sorting results.
•Loaded and transformed large sets of structured and semi-structured data, including Avro and sequence files.
•Migrated all existing jobs to Spark to improve performance and reduce execution time.
•Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
•Used Hive join queries to join multiple tables from a source system and load the results into Elasticsearch.
•Responsible for developing a data pipeline on AWS to extract data from web logs and store it in Amazon EMR.
•Used Microsoft Azure for building, testing and deploying applications.
•Used the ELK Stack to build fast search and visualization capabilities for data.
•Good knowledge of using NiFi to automate data movement between different Hadoop systems.
•Worked with data formats such as JSON, Avro, Parquet and ORC, and compression codecs such as Snappy and bzip2.
•Coordinated with the testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
Environment: Hadoop, Big Data, HDFS, Scala, Python, Oozie, Hive, HBase, NiFi, Impala, Spark, AWS, Linux.
Hadoop Developer
Unisys, Irvine, California Sep’13-Nov’14
Unisys offers its services to consumers and businesses under the branding spectrum. This project pulled data from various structured and unstructured sources into the Hadoop platform and standardized it through a series of master data management processes, helping the client operate effectively and efficiently across different geographical locations.
Responsibilities:
•Loaded data from the UNIX file system into HDFS using shell scripts.
•Imported and exported data between relational database systems and HDFS using Sqoop, and implemented incremental imports into HDFS.
•Involved in creating Hive tables, loading them with data and writing Hive queries that run as MapReduce jobs in the backend, and generated Tableau reports on top of them.
•Worked extensively with Impala in Hue to analyze the processed data and generate the final reports.
•Developed Hive queries and Pig scripts for data analysis, automated through Oozie workflows.
•Experience with both SQLContext and SparkSession.
•Developed custom UDFs for Pig scripts to clean unstructured data, and used different joins and groupings where needed to optimize the scripts.
•Integrated MapReduce with HBase to bulk-import data using MapReduce programs.
•Used Flume to collect, aggregate and store web log data from sources such as web servers and push it to HDFS.
•Worked with different file formats such as sequence files, XML files and map files in MapReduce programs.
•Implemented workflows using Oozie to automate tasks, and used ZooKeeper to coordinate cluster services.
•Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Hadoop, HDFS, Pig, Hive, Sqoop, HBase, Flume, Spark, Shell Scripting, Scala.
Java Developer
Krish Techno Labs, Bangalore, Karnataka Apr’12-Aug’13
Krish Techno Labs is an e-commerce agency. Involved in requirements analysis and design, developing new features, graphical user interface components and security pages, integrating with other tools for external data, unit testing and bug fixing.
Responsibilities:
•Involved in various phases of the Software Development Life Cycle, such as requirement gathering, design, analysis and code development.
•Developed Use Cases, Class Diagrams, Activity Diagrams and Sequence Diagrams.
•Developed JavaServer Pages (JSP) for the front end and servlets for handling HTTP requests, and used the Tomcat server for deployment.
•Developed graphical user interfaces using XML and used JSPs for user interaction.
•Used JSP custom tags and Stored Procedures in the web tier to dynamically generate web pages.
•Used SVN as version control and Ant to build the J2EE application.
•Worked on Oracle to perform DML and DDL operations.
•Involved in unit and integration testing, pre-production testing, client acceptance tests and approvals.
Environment: Java, J2EE, Eclipse IDE, JavaScript, JSON, MySQL, PL/SQL, Web service
Jr. Java Developer
Erudex Pvt Ltd, Hyderabad, Telangana Aug’10-Mar’12
Erudex provides a multimedia repository that supports an interactive education system, where students and teachers can log in and complete assigned tasks, submissions, explanations, etc. Involved in the design and development of backend modules, creating login pages, and saving and retrieving data from the SQL database.
Responsibilities:
•Involved in different SDLC phases, including requirement gathering, design and analysis, development and customization of the application.
•Designed new pages using HTML, CSS, jQuery, and JavaScript.
•Wrote database queries using SQL and PL/SQL for accessing, manipulating and updating Oracle database.
•Designed the database for new tables and forms with the help of the technical architect.
•Worked with managers to identify user needs and troubleshoot issues as they arose.
•Performed unit testing once the basic implementation was done.
Environment: Java, J2EE, Eclipse IDE, JavaScript, JSON, MySQL, PL/SQL, Web service
Education
Bachelor’s in Computer Science and Engineering, GITAM University, India, 2010.