SONAL SAXENA
acugsb@r.postjobfree.com
PROFESSIONAL SUMMARY
Over 7 years of extensive experience in the analysis, design, development, testing and maintenance of Hadoop / Java applications
Experience in developing big data applications using the Hadoop framework and its ecosystem components such as MapReduce, HDFS, Hive, Pig, HBase, Oozie, Sqoop and Flume
Experience in processing large sets of structured, semi-structured and unstructured data.
Experience in developing MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
Experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets. Good experience in writing MapReduce jobs in native Java, Pig and Hive for various business use cases
Worked on streaming the data into HDFS from web servers using Flume
Worked extensively with Sqoop for importing/exporting data between relational sources like SQL Server/MySQL and HDFS/Hive
Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading and storing of data
Experience in fine-tuning performance of Hive queries and Pig scripts
Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data
Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs
Implemented the HBase MapReduce paradigm in Java to load raw data into HBase, and bulk-imported data into HBase via Sqoop
Capable of building and deploying batch jobs using shell scripts
Worked with different sources like Flat files, XML files, JSON, AVRO, Parquet and relational tables
Developed and reviewed Java code to identify and fix errors
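The Python streaming jobs mentioned above follow the standard Hadoop Streaming mapper/reducer pattern. The sketch below is illustrative only, not project code: the tab-separated input layout and the choice of keying on the first column are assumptions.

```python
# Minimal sketch of a Hadoop Streaming job in Python (mapper + reducer).
# The tab-separated record layout keyed on the first column is an
# illustrative assumption, not taken from any actual project.
from itertools import groupby

def mapper(lines):
    """Emit (key, 1) for each well-formed tab-separated record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            yield fields[0], 1

def reducer(pairs):
    """Sum counts per key; Hadoop Streaming delivers mapper output sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(n for _, n in group)

# Local smoke test; in production the two functions would live in separate
# mapper.py / reducer.py scripts passed to hadoop-streaming.jar.
sample = ["web\t/index", "api\t/v1/users", "web\t/about"]
counts = dict(reducer(sorted(mapper(sample))))
print(counts)   # {'api': 1, 'web': 2}
```

On a real cluster the sort between the two phases is performed by Hadoop's shuffle, which is why the reducer can rely on key-grouped input.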
Education and Certification
Bachelor of Technology in Computer Engineering
Microsoft SQL Server 2008 Database Development, certified in 2013
Oracle Certified Professional Java SE 6 Programmer in 2012
TECHNICAL SKILLS
Big Data Ecosystem
Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Flume
Tools
Eclipse, ANT, SharePoint, Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy, PuTTY, WinSCP, FileZilla, ServiceNow
Languages
Java, Python, Unix Shell Scripting, SQL, JavaScript, XML, HTML, AWK
Environment
UNIX, LINUX, Windows
Databases
MS SQL Server, MS Access, IBM DB2
EXPERIENCE SUMMARY
Laureate Education, Baltimore, MD Jan 2015 – Present
Hadoop Data Engineer
Responsibilities:
• Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
• Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
• Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
• Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
• Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
• Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
• Used Impala to read, write and query Hadoop data in HDFS, HBase or Cassandra
• Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources
• Continuously monitored and managed the Hadoop cluster using Cloudera Manager
• Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required
• Installed Oozie workflow engine to run multiple Hive and Pig jobs
• Used Mahout to apply machine learning algorithms for efficient data processing
• Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
• Loaded and transformed large sets of structured, semi-structured and unstructured data.
• Analyzed large data sets to determine the optimal way to aggregate and report on them.
• Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats
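The extract-transform-aggregate pattern behind the XML/JSON/CSV MapReduce jobs above can be sketched as follows. This is written in Python for brevity (the production jobs were in Java), and the record fields `region` and `amount` are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: normalize JSON-lines or CSV input, then
# group-and-sum, mirroring the map (parse) and reduce (aggregate) steps
# of the jobs described above. Field names are assumptions.
import csv
import io
import json
from collections import defaultdict

def parse_records(raw, fmt):
    """Normalize JSON-lines or CSV text into a list of dicts."""
    if fmt == "json":
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    return list(csv.DictReader(io.StringIO(raw)))

def aggregate(records, key="region", value="amount"):
    """Sum `value` per `key` — the reduce step of the pattern."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += float(rec[value])
    return dict(totals)

json_raw = '{"region": "east", "amount": 10}\n{"region": "east", "amount": 5}'
csv_raw = "region,amount\nwest,7\neast,3"
print(aggregate(parse_records(json_raw, "json")))  # {'east': 15.0}
print(aggregate(parse_records(csv_raw, "csv")))    # {'west': 7.0, 'east': 3.0}
```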
Environment:
Hadoop – CDH 5.0.2, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster with Linux (Ubuntu).
Java, Shell scripting, Python
Duke University Health System, Durham, NC Feb 2013 to Jan 2015
Hadoop Developer
Responsibilities:
• Migrated data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
• Proposed an automated system using shell scripts to Sqoop the data.
• Worked in Agile development approach.
• Created the estimates and defined the sprint stages.
• Developed a strategy for Full load and incremental load using Sqoop.
• Mainly worked on Hive queries to categorize data of different claims.
• Integrated the Hive warehouse with HBase
• Wrote customized Hive UDFs in Java where the functionality was too complex.
• Implemented partitioning, dynamic partitions and buckets in Hive.
• Helped generate Tableau reports, testing connectivity to the corresponding Hive tables using the Hive ODBC connector.
• Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
• Monitored system health and logs and responded to any warning or failure conditions.
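The full-load vs. incremental-load Sqoop strategy described above typically hinges on a check column and a stored watermark. The sketch below builds the corresponding `sqoop import` command line in Python; the JDBC connection string, table name and check column are hypothetical examples, and a real run would hand the command to the shell on the cluster.

```python
# Sketch of the full vs. incremental Sqoop load strategy described above.
# The connection string, table and check column are hypothetical; real runs
# would execute the returned command via subprocess on the cluster.
def build_sqoop_import(table, last_value=None,
                       connect="jdbc:mysql://db-host/claims",
                       check_column="updated_at"):
    """Return a sqoop import command: full load when last_value is None,
    otherwise an incremental 'lastmodified' load from that watermark."""
    cmd = ["sqoop", "import",
           "--connect", connect,
           "--table", table,
           "--target-dir", f"/data/raw/{table}"]
    if last_value is not None:
        cmd += ["--incremental", "lastmodified",
                "--check-column", check_column,
                "--last-value", last_value]
    return cmd

full = build_sqoop_import("claims")
incr = build_sqoop_import("claims", last_value="2014-06-01 00:00:00")
print(" ".join(full))
print(" ".join(incr))
```

After each incremental run the new high-water mark would be persisted so the next invocation picks up where the last one ended.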
Environment:
CDH4, HDFS, Hive, Java, Sqoop, Cloudera Manager, MySQL, Tableau
Investor Online Network, Englewood Cliffs, NJ Jun 2012 to Jan 2013
Java & Hadoop Developer
Responsibilities:
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Supported code/design analysis, strategy development and project planning.
Created reports for the BI team, using Sqoop to export data from HDFS and Hive into relational databases.
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Administered Pig, Hive and HBase, installing updates.
Wrote Hive and Pig UDFs in Python for evaluation of data.
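An evaluation UDF of the kind used with Pig and Hive above can be sketched as a plain Python function. With Pig this would be registered via `REGISTER 'udfs.py' USING jython` and carry an `@outputSchema` decorator; both are omitted here so the sketch runs standalone, and the ticker-normalization rule is a hypothetical example.

```python
# Sketch of a Python evaluation UDF for Pig/Hive. Standalone for clarity:
# the Pig registration boilerplate and @outputSchema decorator are omitted,
# and the ticker-cleanup logic is a hypothetical example.
def normalize_ticker(symbol):
    """Clean up a portfolio ticker symbol (assumed field): trim whitespace,
    upper-case, and replace class-share dots with dashes."""
    if symbol is None:
        return None   # Pig/Hive pass NULL fields through as None
    return symbol.strip().upper().replace(".", "-")

print(normalize_ticker("  brk.b "))   # BRK-B
```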
Environment:
Java, Apache Hadoop, Hive, PIG, SQOOP, HBase
Eclipse, Microsoft SQL Server Management Studio 2008, ServiceNow, LINUX, SharePoint
Legal and General, Redhill, UK Mar 2009 to May 2012
Java Developer
Responsibilities:
Worked in an Agile Test driven Development environment
Implemented server-side tasks using servlets and XML.
Developed page templates using Struts Tiles framework.
Implemented Struts Action classes using Struts controller component
Created and deployed web pages using HTML, JSP, JavaScript and CSS.
Led the migration of monthly statements from UNIX platform to MVC Web-based Windows application using Java, JSP, and Struts technology.
Developed SQL statements to improve back-end communications.
Used ANT automated build scripts to compile and package the application
Incorporated a custom logging mechanism for tracing errors, resolving all issues and bugs before deploying the application to the WebSphere server.
Prepared use cases, designed and developed object models and class diagrams.
Developed and reviewed Java code to identify and fix errors.
Administered Quality Center and Subversion.
Set up application environments and reported status.
Worked with different sources like flat files, XML files, DB2, MS SQL Server.
Extracted data from the Sales department to flat files and loaded the data into the target database.
Wrote stored procedures, triggers and cursors.
Environment:
Java, J2EE, JavaScript, Microsoft SQL Server Management Studio 2008
Eclipse, XML, HTML, SharePoint, FileZilla, PuTTY, UNIX, SQL, ANT
Fidelity Information Services, Tampa, FL Aug 2006 to Jan 2009
SQL Developer
Responsibilities:
Maintained records of employees enrolled in different benefits for various clients.
Imported election information for different vendors during open enrollment with precision.
Wrote SQL scripts to generate various reports per client requirements, e.g. medical election and plan change reports.
Implemented triggers and stored procedures in T-SQL to maintain data integrity among different systems.
Automated manual tasks to enhance system performance.
Created functions to provide custom functionality as required.
Stayed aware of potential blocking and deadlocking, and wrote code to avoid those situations.
Translated business requirements into software applications and models.
Participated in discussions on application creation to understand requirements and provide back-end functionality for the applications.
Gained experience in project quality assurance.
Instantiated applications in different databases for development, testing, education and production deployment
Environment:
Windows XP, SQL Server Management Studio, BMC Remedy, PuTTY, FileZilla, Eclipse