Supriya
******.************@*****.***
Summary:
Over 8 years of overall IT experience, including hands-on experience in Big Data technologies.
* ***** ** ********** ** installing, configuring, and using Hadoop ecosystem components like MapReduce, HDFS, Hive, Sqoop, Pig, Zookeeper, Oozie and Flume.
Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (see sketch below).
Good understanding of HDFS design, daemons, and HDFS high availability (HA).
Expertise in data transformation and analysis using Pig, Hive, and Sqoop.
Experienced in implementing Spark using Scala and Python.
Configured ZooKeeper, Cassandra, and Flume on the existing Hadoop cluster.
Strong understanding of Cassandra database.
Experience in importing and exporting data using Sqoop between HDFS/Hive and relational database systems.
Worked on NoSQL databases including HBase and MongoDB.
Experienced in coding SQL and PL/SQL procedures/functions, triggers, and packages on RDBMS platforms such as Oracle.
Experience implementing open-source frameworks such as Struts, Spring, and Hibernate, as well as web services.
Extensive experience working with Oracle and MySQL databases.
Familiar with data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
Good knowledge of Spark, Storm, and HBase for real-time streaming.
In-depth knowledge of statistics, machine learning, and data mining.
Experienced with big data machine learning in Mahout and Spark MLlib.
Experienced with data cleansing through MapReduce and Spark jobs.
Experienced with statistical tools such as MATLAB, R, and SAS.
Experienced with supervised learning techniques such as Multiple Linear Regression, Nonlinear Regression, Logistic Regression, Artificial Neural Networks, Support Vector Machines, Decision Trees, and Random Forests.
Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
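Illustrative sample (not project code): a minimal sketch of the kind of custom Hive UDF noted above, assuming a simple string-normalization use case; the package, class, and function names are hypothetical.

package com.example.hive.udf;                       // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Null-safe UDF that trims and lower-cases a string column.
public final class NormalizeText extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}

// Registered in Hive roughly as:
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'com.example.hive.udf.NormalizeText';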
Technical Experience:
Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Spark, Sqoop, Oozie, Zookeeper, Impala, Flume
IDE Tools: Eclipse, NetBeans
Java Technologies: Core Java, Servlets, JSP, JDBC, Collections
Web Technologies: XML, HTML, JSON, JavaScript, AJAX, Web services
Programming Languages: C, C++, Python, Core Java, JavaScript, Shell Script, Scala
Databases: Oracle, MySQL, DB2, PostgreSQL, MongoDB
Operating Systems: Windows XP/Vista/7, Mac OS X, Linux, Unix
Logging Tools: Log4j
Version Control Systems: SVN, CVS, Git
Other Tools: PuTTY, WinSCP
Education:
Bachelor of Science in Computer Science.
Professional Experience:
Client: Goldman Sachs, New York, NY Jan 2015 – Present
Role: Hadoop Developer
Responsibilities:
Built a suite of Linux scripts as a framework for easily streaming data feeds from various sources onto HDFS.
Wrote Interface specifications to ingest structured data into appropriate schemas and tables to support the rules and analytics.
Extracted the data from Teradata into HDFS using Sqoop and exported the patterns analyzed back into Teradata using Sqoop.
Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
Used Hive schemas to create relations in Pig via HCatalog.
Developed Java MapReduce and Pig cleansers for data cleansing (see sketch below).
Analyzed and visualized data in Teradata using Datameer.
Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
Implemented machine learning models such as K-means clustering using PySpark.
Used Spark to create reports for analysis of the data coming from various sources like transaction logs.
Involved in the migration from the Hadoop MapReduce system to Spark.
Refactored existing Hive queries to Spark SQL.
Moved log files generated from various sources to HDFS using Flume for further processing.
Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
Used Maven extensively to build JAR files for MapReduce programs and deployed them to the cluster.
Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.
Environment: Hadoop, HDFS, Hive, Pig, MapReduce, YARN, Datameer, Flume, Oozie, Linux, Teradata, HCatalog, Java, Eclipse IDE, Git.
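Illustrative sample (not project code): a minimal sketch of the kind of map-only Java MapReduce cleanser mentioned above; the pipe delimiter, expected field count, and class name are assumptions.

package com.example.mr;                             // hypothetical package

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleanser: drops malformed records, trims fields, and counts rejects.
public class RecordCleanserMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 12;  // assumed schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);  // assumed pipe-delimited input
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleansing", "malformed").increment(1);
            return;                                 // skip malformed record
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append('|');
            }
            out.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(out.toString()));
    }
}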
Client: Emblem Health, New York, NY Apr 2014 - Dec 2014
Role: Hadoop Developer
Responsibilities:
Worked on writing various Linux scripts to stream data from multiple data sources like Oracle and Teradata onto the data lake.
Built the infrastructure for securing all data in transit on the data lake, helping ensure the utmost security of customer data.
Extended the Hive framework with custom UDFs to meet requirements.
Worked closely with the Hadoop administration team to debug slow-running MR jobs and apply the necessary optimizations.
Took the initiative to learn eCDW, a custom AT&T scheduler, to schedule periodic runs of various scripts for initial and delta loads of various datasets.
Played a key role in mentoring the team on developing MapReduce jobs and custom UDFs.
Played an instrumental role in working with the team to leverage Sqoop for extracting data from Teradata.
Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
Developed job flows in TWS to automate the workflow for extraction of data from Teradata and Oracle.
Actively involved in building a generic Hadoop framework that enables various teams to reuse best practices.
Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms (see sketch below).
Helped the team in optimizing Hive queries.
Environment:
A massive cluster with NameNode high availability (HA).
Cluster sized to support 400 TB of data, taking the growth rate into consideration.
Secured cluster with Kerberos.
Used LDAP for role-based access.
Apache Hadoop, MapReduce, HDFS, Hive, Sqoop, Linux, JSON, Oracle11g, PL/SQL, Eclipse, SVN, Teradata Client, TWS (Tivoli Work Scheduler).
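Illustrative sample (not project code): a minimal driver-side sketch of the MapReduce compression tuning mentioned above, using the standard Hadoop 2.x property names; the Snappy codec choice, class name, and use of the default identity map/reduce are assumptions.

package com.example.mr;                             // hypothetical package

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver showing map-output and final-output compression; mapper/reducer
// classes are omitted here, so Hadoop's identity map/reduce is used.
public class CompressedJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output to cut shuffle I/O.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed-output-job");
        job.setJarByClass(CompressedJobDriver.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Compress the final job output as well.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}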
Client: YUM Brands, Louisville, KY. Feb 2013 - Mar 2014
Role: Hadoop Developer
Responsibilities:
Responsible for building scalable distributed data solutions using Hadoop.
Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Setup and benchmarked Hadoop, HBase clusters for internal use.
Developed simple to complex MapReduce jobs using Hive and Pig.
Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
Used UDFs to implement business logic in Hadoop (see sketch below).
Used Hive for managing customer and franchise information in the tables.
Imported data using Sqoop into Hive and HBase from an existing DB2 database.
Used Ganglia to monitor the distributed cluster network.
Used Jira for bug tracking.
Used Git to check in and check out code changes.
Environment: Java, Hadoop, HDFS, MapReduce, Pig, Sqoop, Hive, HBase, DB2, Eclipse.
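Illustrative sample (not project code): a minimal sketch of a Pig EvalFunc UDF of the kind used for business logic above; the banding thresholds, field position, and package/class names are assumptions.

package com.example.pig.udf;                        // hypothetical package

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Classifies an order amount into a band; thresholds are assumptions.
public class AmountBand extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        final double amount;
        try {
            amount = Double.parseDouble(input.get(0).toString());
        } catch (NumberFormatException e) {
            return null;                            // non-numeric input is skipped
        }
        if (amount < 10.0) {
            return "LOW";
        }
        return amount < 50.0 ? "MEDIUM" : "HIGH";
    }
}

// Used in a Pig script roughly as:
//   REGISTER amount-band-udf.jar;
//   DEFINE AmountBand com.example.pig.udf.AmountBand();
//   banded = FOREACH orders GENERATE order_id, AmountBand(amount);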
Client: CSX, Jacksonville, FL. Nov 2009 – Jan 2013
Role: Java Developer
Responsibilities:
Involved in requirements gathering and creating functional specifications by interacting with business users.
Responsible for analysis, design, development and unit testing.
Implemented the application using Spring MVC; used Spring Batch to handle parallel processing of batch jobs (see sketch below).
Followed the MVC and DAO design patterns for developing the application; all database calls are handled only through DAOs.
Improved application performance by using Hibernate for retrieving, manipulating, and storing millions of records.
Created web pages using HTML, JSP, JavaScript, and Ajax.
Performed unit and integration testing before checking in code for QA builds.
Involved in Production Support.
Used JavaScript for client-side validations.
Used Log4j and commons-logging frameworks for logging the application flow.
Involved in developing build scripts using Ant.
Used CVS for version control.
Developed JavaScript functions for the front-end validations.
Environment: Apache Tomcat, JSP, Servlets, Ajax, Eclipse, PL/SQL, Oracle, HTML, JavaScript, UML, Windows XP.
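Illustrative sample (not project code): a minimal Spring MVC controller sketch following the controller-to-service-to-DAO layering described above; the URL, class, service, and view names are hypothetical.

package com.example.web;                            // hypothetical package

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

// Hypothetical service-layer contract (normally its own file); it delegates
// all persistence to a DAO, per the layering described above.
interface ShipmentService {
    Object findById(Long id);
}

@Controller
@RequestMapping("/shipments")
public class ShipmentController {

    @Autowired
    private ShipmentService shipmentService;

    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    public String getShipment(@PathVariable("id") Long id, Model model) {
        model.addAttribute("shipment", shipmentService.findById(id));
        return "shipmentDetail";                    // logical view name resolved to a JSP
    }
}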
Client: Liberty Mutual, Dublin, OH. May 2008 – Oct 2009
Role: Jr. J2EE Developer
Responsibilities:
Developed user interface screens using JSP and HTML; used JavaScript for client-side validation.
Worked with Onsite and Offshore team to coordinate the knowledge transfer and work.
Developed session beans using EJBs for business logic in the middle tier.
Wrote SQL and stored procedures to extract data from the Oracle enterprise data model.
Wrote various Java classes for user registration.
Used the JDBC API to access the database (see sketch below).
Involved in the generation of reports.
Performed build release planning and coordination for QA testing and deployment.
Participated in unit testing (using JUnit) and integration testing.
Environment: Java, JDBC, XML, log4j, Ant, Oracle 9i, TOAD, Solaris, AIX, Windows.
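Illustrative sample (not project code): a minimal JDBC DAO sketch of the kind of database access described above; the table, column, and class names are hypothetical, and modern try-with-resources is shown for brevity.

package com.example.dao;                            // hypothetical package

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Looks up a user name by id with a parameterized query.
public class UserDao {

    private final String url;
    private final String user;
    private final String password;

    public UserDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    public String findUserName(long userId) throws SQLException {
        String sql = "SELECT user_name FROM users WHERE user_id = ?";  // assumed table/column names
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("user_name") : null;
            }
        }
    }
}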