Sujatha
***@******************.***
Sr. Hadoop Developer
PROFESSIONAL SUMMARY
7+ years of experience across IT sectors such as banking and telecom services, including hands-on experience in Big Data technologies.
5+ years of Big Data Ecosystem experience in ingestion, storage, querying, processing and analysis of big data.
In-depth understanding of Hadoop architecture and its components such as HDFS, YARN, NameNode, DataNode, ResourceManager, and NodeManager.
Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations.
Built a data pipeline from scratch with Spark on Hadoop, using YARN as the cluster management service.
Worked in Real time analytics projects with Apache Storm and Spark Streaming.
Expertise in Kafka and data pipelines; worked with Kafka producers and consumers and with advanced Kafka features like Streams and Connect.
Worked on NoSQL databases like HBase, Cassandra and MongoDB.
Worked on data modeling for HBase and Cassandra in a previous application.
Used tools like Ganglia and Nagios to monitor Big Data applications.
Actions using Spring and JBoss Application Server.
Worked on microservices using Spring Boot and Kafka.
Expertise in Big Data technologies and the Hadoop ecosystem: HDFS, YARN, NameNode, DataNode, and the MapReduce programming paradigm.
Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation, and Redshift, which provide fast and efficient processing of Big Data.
Extensive experience in data ingestion, big data storage planning, complex transformations, data integration, analysis for Pharmaceutical, Healthcare and Retail sectors.
Experience with scripting languages (SQL, Scala, Java, Pig, Bash/Python) to manipulate data.
Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
Expertise with using Oracle, MySQL, DB2 databases and writing highly complex SQL queries.
Experienced in using Integrated Development environments like Eclipse, NetBeans, IntelliJ, Spring Tool Suite.
Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
Developed simple to complex MapReduce jobs using Java language.
Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
Experience in data warehousing and in extracting, transforming, and loading (ETL) data from various sources like Oracle, Teradata, DB2, Microsoft Excel, and flat files into data warehouses and data marts using Informatica PowerCenter.
Excellent working knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
Experience in Agile Engineering practices.
Good experience working with Hortonworks Distribution and Cloudera Distribution.
Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features. Hands-on experience in implementing custom MapReduce file formats, custom Writables, and custom Partitioners.
Experience with messaging and collection frameworks like Kafka and Storm.
Experience using the Splunk Archive Bucket Reader with Pig.
Experience using streaming technologies.
Strong knowledge in Hadoop cluster installation, capacity planning and performance tuning, benchmarking, disaster recovery plan and application deployment in production cluster.
Experience using the data integration software Talend to provide real-time solutions.
Experience with enterprise NoSQL databases like MarkLogic.
Strong knowledge in internals of HDFS and MapReduce framework.
Experience with data formats like SequenceFile, Avro, and Parquet.
Basic knowledge in application design using Unified Modeling Language (UML).
Good exposure to databases like MySQL.
Experience developing dashboards.
Experience using software development methodologies like Agile to deliver solutions.
TECHNICAL SKILLS:
Bigdata/Hadoop Ecosystem
HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, Impala
Java / J2EE Technologies
Core Java, Servlets, JSP, JDBC, JNI, XML, REST, SOAP, WSDL
Programming Languages
C, C++, Java, Scala, Python, SQL, PL/SQL, Linux shell scripts
NoSQL Databases
MongoDB, Cassandra, HBase
Database
Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata
Web Technologies
HTML, XML, JDBC, JSP, CSS, JavaScript, AJAX, SOAP, AngularJS
Frameworks
MVC, Hibernate 3, Spring 3/2/2.5
Tools Used
Eclipse, IntelliJ, PuTTY, NetBeans, Tableau
Operating System
Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, RedHat
Professional Experience:
Best Western, Phoenix, AZ Apr 2019 - Present
Sr. Hadoop Developer
Responsibilities:
Reviewed business requirements documents with the test team to provide insights into the data scenarios and test cases.
Analyzed the business requirements and verified the business requirements document and technical design document against them.
Experience in Extract, Transform, and Load (ETL) design, development, and testing.
Experience in scheduling and monitoring workflows. Provided proactive production support after go-live.
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
Extracted and processed the data from Legacy systems and stored it on HDFS.
Imported and exported data into HDFS and Hive using Sqoop.
Involved in creating Hive tables and writing complex Hive queries to populate them.
Generated user reports using HQL on the data stored on HDFS.
Experience in tuning HQL queries to improve performance.
Experienced in managing and reviewing Hadoop log files.
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Supported MapReduce programs running on the cluster.
Involved in loading data from UNIX file system to HDFS.
Used Oozie as an automation tool for running the jobs.
Experience working with Hadoop and utilities like HDFS, MapReduce, Sqoop, Hive, Oozie, Kafka, Impala, and Hue.
Developed Phase I to collect data for real-time analytics, moving it through Kafka and Storm into Cassandra.
Worked on writing the topologies and saving the records in Cassandra.
Used Kafka Connect to move Kafka data to HDFS for historical data processing.
Performed hands-on data manipulation, transformation, hypothesis testing and predictive modeling.
Developed a robust code base that is tested, automated, structured, and efficient.
Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models using Netezza.
Extensively worked with all kinds of Un-Structured, Semi-Structured and Structured data.
Developed Scala and SQL code to extract data from various databases.
Championed innovative ideas around Data Science and Advanced Analytics practices.
Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
Uploaded data to Hadoop Hive and combined new tables with existing databases. Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
Experience in accessing Kafka cluster to consume data to Hadoop and analyzing the data by performing Hive queries and running Pig scripts.
Created, modified, and executed DDL and ETL scripts for denormalized tables to load data into Hive and AWS Redshift tables.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Analyzed the performance of Spark Streaming and batch jobs using Spark tuning parameters.
Analyzed the SQL scripts and designed the solution to implement them using Scala.
Used Zookeeper for various types of centralized configuration. Wrote applications that connect through the Zookeeper client and create services and jobs, with each job assigned to a service for processing.
Designed and developed ETL workflows using Oozie and automated them using AutoSys.
Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
Environment: Apache Hadoop, Apache Kafka, Apache Zeppelin, AWS, Spark RDD, Scala, HBase, Hive, Pig, Oozie, Redshift, Zookeeper, Cloudera CDH 4/5 Distribution, Spark Streaming, Eclipse, SQL, J2EE.
Highmark Inc., Pittsburgh, PA Aug 2017 – Mar 2019
Hadoop Developer
Responsibilities:
Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
Handled the data exchange between HDFS and RDBMS using Sqoop.
Responsible for building scalable distributed data solutions using Hadoop Hortonworks Distribution.
Developed several advanced Map Reduce programs to process data files received.
Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files into Hadoop.
Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
Imported data from different sources like HDFS and HBase into Spark RDDs.
Bulk loaded data into Cassandra using sstableloader.
Created Cassandra tables using CQL to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Developed a data pipeline using Kafka and Storm to store data into HDFS. Performed real time analysis on the incoming data.
Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
Successfully integrated Hive tables and MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI.
Extensively worked on Hive for ETL Transformations and optimized Hive Queries.
Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
Generated data cubes using Hive, Pig, and Java MapReduce on a Hadoop cluster provisioned in AWS.
Developed Scala and SQL code to extract data from various databases.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
Created HBase tables to store various data formats of data coming from different sources.
Worked with cloud services like Amazon Web Services (AWS).
Streamed data in real time using Spark with Kafka and stored the stream data in HDFS using Scala.
Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
Implemented a POC using Apache Impala for data processing on top of Hive.
Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingest as required.
Prepared Oozie workflows to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
Migrated ETL jobs to Pig scripts that perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
Developed an analytical component using Scala, Spark, and Spark Streaming.
Tested Hadoop components on sample datasets in local pseudo-distributed mode.
Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
Environment: HDFS, Pig, Hive, Spark, Scala, Map Reduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Yarn, Shell scripting.
TracFone, Miami, FL May 2015 – Aug 2017
Hadoop Developer
Responsibilities:
Developed solutions to ingest data into HDFS (Hadoop Distributed File System), process it within Hadoop, and emit summary results from Hadoop to downstream systems.
Developed Map Reduce Programs for data analysis and data cleaning.
Developed Pig Latin scripts for the analysis of semi-structured data.
Created Hive tables and was involved in loading data and writing Hive UDFs.
Used Sqoop extensively to ingest data from various source systems into HDFS.
Developed Pig UDFs for required functionality, such as a custom Pig loader known as the timestamp loader.
Extensively used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster respectively.
Used REST-compliant web services.
Worked with the ETL tool Informatica to extract data.
Performed active and passive transformations in Informatica.
Experience in transactional processing of data using a data warehouse.
Used Talend to integrate Big Data, simplifying development, integration, and management.
Developed shell scripts that act as wrappers to start Hadoop jobs and set configuration parameters.
Used Spark SQL for faster testing and processing of data.
Wrote test cases to test software throughout development cycles, including functional testing, unit testing, and continuous integration.
Tested the performance of the data sets on various NoSQL databases.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Created shell scripts and updated the scripts as per the requirement.
Developed scripts for inbound and outbound data transfers on servers.
Supported all the UNIX requests for various applications. Designed performance optimization involving data transmission, data extraction, business validations, service logic and job scheduling.
Wrote Hive queries for data analysis to meet the business requirements.
Used MongoDB for batch aggregation. Pulled data from MongoDB and processed it in Hadoop using MapReduce jobs.
Created an Amazon EC2 instance and used it as an application server in the cloud.
Created Hive tables and worked on them using HiveQL.
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
Involved in loading data from the Linux file system to HDFS.
Responsible for managing data from multiple sources.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Developed shell scripts to automate routine tasks.
Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Scala, Sqoop, Kerberos, Java Eclipse, SQL Server, Oozie, Zookeeper, Shell Scripting.
Eli Lilly, Indianapolis, IN Dec 2013 – Apr 2015
Hadoop Developer
Responsibilities:
Analyzed the Big Data business requirements and translated them into Hadoop-centric solutions.
Worked on importing and exporting data from Oracle and Teradata into HDFS and Hive using Sqoop.
Implemented custom Hive UDFs to achieve comprehensive data analysis.
Developed custom Pig UDFs for custom input formats and for performing various levels of optimization.
Worked on streaming log data into HDFS from web servers using Flume.
Implemented custom interceptors for Flume to filter data as per requirements.
Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
Developed Pig Latin scripts for running advanced analytics on the data collected.
Configured daily workflow for extraction, processing and analysis of data using Oozie Scheduler.
Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
Gained good experience with NoSQL databases.
Designed and implemented MapReduce-based large-scale parallel relation-learning system.
Installed and benchmarked Hadoop/HBase clusters for internal use.
Wrote HBase client programs in Java and web services.
Supported post-production enhancements.
Experience with data modeling concepts: star schema, dimensional modeling, and relational (ER) design.
Extensively used Pig to communicate with Hive via HCatalog and with HBase via handlers.
Created Hive tables to store the processed results in a tabular format.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Sqoop, Oozie, Unix, Linux
Axis Bank, India Jan 2013 – Nov 2013
Java Developer
Responsibilities:
Developed Java classes and helper classes in the business layer and tested them using JUnit.
Used Eclipse Workbench views, editors, perspectives, wizards as rich client platform.
Used JDBC extensively for database transactions.
Used Rational ClearCase as the version repository.
Involved in the development of interfaces for the application using JSP, Servlets, and JavaScript.
Created Java validation classes and utility classes.
Actively participated in Stress Testing of the existing business components using WebLogic Application Server.
Created class diagrams and sequence diagrams using Violet integrated with Eclipse.
Used Jalopy for code formatting.
Made extensive use of RESTful web services across modules to communicate with external systems.
Developed wrapper classes and DAO classes used for validation and data lookup.
Implemented a web service over HTTP to expose the package/product details to the front-end systems.
Responsible for developing the full stack (back-end development from the Markup, JavaScript, Application Services, Database, and Build Scripts).
Used a SAX parser to read XML and propagate the values to the business validation layer.
Involved in creating various reusable Helper and Utility classes which are used across all the modules of the application.
Implemented rich authentication and authorization features to ensure application is fully controlled with sophisticated and dependable security.
Used the Hibernate ORM for database operations. Wrote HQL (Hibernate Query Language) queries as required.
Developed the SOAP based web service using the contract first principle by defining the XML schema.
Used Maven with Jenkins for builds and SVN for version control.
Wrote build and deployment scripts as shell scripts; analyzed application performance, fixed problems, and suggested solutions.
Participated in reviewing the functional, business and high level design requirements. Developed the Use Case diagrams and Class diagrams.
Environment: J2EE, JDBC, Java 1.5, Servlets, JSP, Web services, SOAP, WSDL, UML, MVC, HTML, JavaScript 1.2, XML, MyEclipse.