Big data developer

Location:
Chicago, IL
Salary:
75+ USD per hour
Posted:
June 21, 2017

Resume:

Farzana M

SUMMARY

* ***** ** ***** ******** development experience with Hadoop Ecosystem, Big Data and Data Science Analytical Platforms, Java/J2EE Technologies, Database Management Systems and Enterprise-level Cloud Base Computing and Applications.

Around 3 years of experience in the design and implementation of Big Data applications using the Hadoop stack: MapReduce, Hive, Pig, Oozie, Sqoop, Flume, HBase and NoSQL databases.

Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts and Hive data models.

Good experience creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark and Flume.

Hands-on experience doing analytics using Apache Spark.

Hands-on experience and in-depth understanding of the Hadoop architecture and its components, such as HDFS and MapReduce, involving daemons like JobTracker, TaskTracker, NameNode and DataNode.

Hands-on experience with major Big Data components such as Apache Spark and ZooKeeper.

Worked on using the MapReduce programming model for analyzing data stored in Hadoop.

Experience and in-depth understanding of analyzing data using HiveQL and Pig Latin.

Worked extensively with Hive DDL and the Hive Query Language (HQL); developed UDF, UDAF and UDTF functions and implemented them in Hive queries.

In-depth understanding of NoSQL databases such as HBase.

Sound knowledge of map-side joins, reduce-side joins, shuffle & sort, distributed cache, compression techniques, and multiple inputs & outputs.

Proficient knowledge and hands-on experience in writing shell scripts in Linux.

Adequate knowledge and working experience in Agile & Waterfall methodologies.

Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.

Have a fairly good understanding of Storm and Kafka.

Hands-on experience with Sequence Files, Combiners, Counters, Dynamic Partitions and Bucketing for best practices and performance improvement.

Experienced in job workflow scheduling and monitoring tools like Oozie.

Experience with Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.

Experience using various Hadoop distributions (Pivotal, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.

Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC) and Java/J2EE technologies.

Experience in development of client/server technologies and systems software design and development using Java/JDK, JavaBeans and J2EE technologies such as Spring, Struts, Hibernate, Servlets, JSP, JBoss, JavaScript and JDBC, as well as web technologies like HTML, CSS, PHP and XML.

Experienced in backend development using SQL and stored procedures on Oracle 9i, 10g and 11g.

Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, Apache Ant (build tool), MS Office, PL/SQL Developer and SQL*Plus.

Experienced working with BEA WebLogic Server and IBM WebSphere Application Server.

Expertise in full life cycle system development: requirements elicitation and creating Use Case, Class and Sequence diagrams.

Conscientious team player and motivated to learn and apply new concepts. Always aspires to exceed client expectations and to effectively collaborate with several cross-functional teams.

Worked with geographically distributed and culturally diverse teams, including roles that involved interaction with clients and team members.

SKILLS:

Big Data / Hadoop Technologies: MapReduce, Pig, Hive, Sqoop, Flume, HDFS, Oozie

NoSQL Databases: HBase

Real-Time / Stream Processing: Apache Storm, Apache Spark

Programming Languages: Java, C++, C, SQL, PL/SQL

Java Technologies: Servlets, JavaBeans, JDBC, JNDI, JTA, JPA, EJB 3.0

Frameworks: JUnit, JTest, LDAP

Databases: Oracle 8i/9i, MySQL, MS SQL Server

IDEs & Utilities: Eclipse, NetBeans

Web Dev. Technologies: HTML, XML

Protocols: TCP/IP, HTTP and HTTPS

Operating Systems: Linux, macOS, Windows 8/7/Vista/XP/2000/95, MS-DOS

EDUCATION:

Bachelor of Technology

Certification:

Program with PL/SQL

Test Id: OC1440517

Exam Number: 1Z0-147

PROFESSIONAL EXPERIENCE:

Waddell & Reed, KS October 2015 - Present Hadoop Developer

Client: Waddell & Reed

Working on a project involving migration of data from mainframes to an HDFS data lake and creating reports by performing transformations on the data placed in the Hadoop data lake.

Built a Python script to extract data from HAWQ tables and generate a ".dat" file for the downstream application.

Built a generic framework to parse fixed-length raw data using Python, which takes a JSON layout describing the fixed string positions and loads the data into HAWQ tables.
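
A minimal sketch of the layout-driven, fixed-width parsing approach described above; the layout keys, field names and file paths below are hypothetical, and the step that loads the parsed records into HAWQ is omitted.

    # Illustrative layout-driven fixed-width parser (hypothetical layout keys
    # and file names; not the production framework).
    import json

    def load_layout(layout_path):
        # Layout example: [{"name": "acct_id", "start": 0, "length": 10}, ...]
        with open(layout_path) as f:
            return json.load(f)

    def parse_fixed_width(data_path, layout):
        records = []
        with open(data_path) as f:
            for line in f:
                record = {}
                for field in layout:
                    start = field["start"]
                    end = start + field["length"]
                    record[field["name"]] = line[start:end].strip()
                records.append(record)
        return records

    if __name__ == "__main__":
        layout = load_layout("layout.json")            # hypothetical path
        rows = parse_fixed_width("input.dat", layout)  # hypothetical path
        print(len(rows), "records parsed")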

Built a generic framework that transforms two or more datasets in HDFS using Python.

Built generic frameworks for Sqoop/HAWQ to load data from SQL Server to HDFS and from HDFS to HAWQ using Python.

Performed extensive data validation using HAWQ partitions for efficient data access.

Built a generic framework that allows us to update data in HAWQ tables using Python.

Created external tables pointing to HBase to access tables with a very large number of columns.

Wrote Python code using the happybase library to connect to HBase, as well as to query through HAWQ.
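
A minimal sketch of the happybase usage described above, assuming an HBase Thrift gateway is running; the host, table and column family names are hypothetical.

    # Read rows from HBase through the Thrift gateway using happybase.
    import happybase

    connection = happybase.Connection(host='hbase-thrift-host', port=9090)
    table = connection.table('call_records')   # hypothetical table name

    # Fetch a single row by key.
    row = table.row(b'20150701|12345')
    print(row.get(b'cf:status'))                # 'cf' is a hypothetical column family

    # Scan a key range and pull a couple of columns.
    for key, data in table.scan(row_prefix=b'20150701',
                                columns=[b'cf:status', b'cf:duration']):
        print(key, data)

    connection.close()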

Coordinated in all testing phases and worked closely with Performance testing team to create a baseline for the new application.

Created automated workflows that schedule daily jobs for data loading and other transformations using Cisco Tidal.

Created PostgreSQL functions (stored procedures) to populate data into the tables on a daily basis.

Developed functions using PL/Python for various use cases.
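
A hedged sketch of defining and calling a simple PL/Python function from a client script with psycopg2; the connection details and the function body are illustrative rather than the project's actual code, and it assumes the plpythonu language is installed on the database.

    # Register and call a PL/Python function (illustrative only).
    import psycopg2

    conn = psycopg2.connect("host=hawq-master dbname=analytics user=etl")  # hypothetical DSN
    cur = conn.cursor()

    cur.execute("""
        CREATE OR REPLACE FUNCTION normalize_amount(raw text)
        RETURNS numeric AS $$
            # Runs inside PL/Python: strip currency symbols/commas before casting.
            cleaned = raw.replace('$', '').replace(',', '').strip()
            return cleaned or None
        $$ LANGUAGE plpythonu;
    """)
    conn.commit()

    cur.execute("SELECT normalize_amount(%s)", ('$1,234.50',))
    print(cur.fetchone()[0])

    cur.close()
    conn.close()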

Documented technical design documents and production support documents.

Worked on SSIS and SSRS tools to aid in decommissioning data from SQL Server to the distributed environment.

Wrote python scripts to create automated workflows.

Technology Platforms: PHD 2.0, HAWQ 1.2, Sqoop 1.4, Python 2.6, SQL

Comcast, Philadelphia, PA July 2015 - October 2015 Hadoop Developer

Project specific skills: MapReduce, Scala, HBase, Sqoop, Play Framework, Akka Cluster, shell scripting.

Project Description: One of Comcast's products is its internal statistics generation tool, Touchpoints. It is used to monitor customer paths after a ticket is logged with a customer service representative, and it provides a visual interface in a more robust and easy-to-understand form. The solution uses the Play Framework and Scala for frontend development. I also wrote MapReduce jobs to process data from HDFS and put it in HBase.

Responsibilities:

Pulled data from the data warehouse using Sqoop and placed it in HDFS.
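
As an illustration of the Sqoop pull described above, a hedged sketch of invoking a Sqoop import from Python; the JDBC URL, credentials, table and target directory are placeholders, and in practice the command may well have run directly from a shell script or scheduler instead.

    # Launch a Sqoop import of one warehouse table into HDFS.
    import subprocess

    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dw-host:3306/warehouse",   # hypothetical source
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",            # password kept in HDFS
        "--table", "TICKET_EVENTS",                           # hypothetical table
        "--target-dir", "/data/raw/ticket_events",
        "--num-mappers", "4",
        "--fields-terminated-by", ",",
    ]
    subprocess.check_call(cmd)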

Wrote MapReduce jobs to join data from multiple tables and convert it to CSV files.

Worked with Play Framework to design the frontend of the application.

Wrote programs in Scala to support the Play Framework and act as code-behind for the frontend application.

Wrote programs in Java, and at times Scala, to implement intermediate functionalities such as event or record counts from HBase.

Configured multiple remote Akka worker nodes and master nodes from scratch as per the software requirements specification.

Also wrote some Pig scripts to do ETL transformations on the MapReduce-processed data.

Involved in review of functional and non-functional requirements.

Responsible for managing data coming from different sources.

Wrote shell scripts to pull the necessary fields from huge files generated by MapReduce jobs.

Converted ORC data from Hive into flat files using MapReduce jobs.

Created Hive tables and worked on them using HiveQL.

Supported the existing MapReduce programs running on the cluster.

Followed agile methodology for the entire project.

Prepared technical design documents and detailed design documents.

Environment: Linux (Ubuntu), Hadoop 1.2.1 (pseudo-distributed mode), HDFS, Hive, Hortonworks, Flume.

DirecTV, Los Angeles, CA Oct 2013 - July 2015 Hadoop Developer

Project specific skills: Hive, Linux, MySQL, Sqoop, Flume, Spark

Project Description: One of the crucial products of DirecTV is its Call Monitoring Engine for event tracking and processing. The volume of call detail records (CDRs) is huge, and on top of that, analyses of dropped calls, irregular sound quality and similar measures on those CDRs add further difficulty to the existing problem of sheer volume. As a solution, we use Apache Flume to ingest millions of CDRs per second into Hadoop, and Apache Spark to process them in real time and identify any troubling patterns, continuously improving call quality, customer satisfaction and servicing margins using HiveQL. We use HDFS for long-term data retention to support future root cause analysis. Flume agents are set up along the network proxy servers to gather the data and store it in HDFS, and the gathered data is analyzed using Spark jobs. After processing with Spark, Hive tables are created using partitioning and bucketing to hold intermediate results, and we also created some UDFs in Hive to categorize different types of calls.

Responsibilities:

Converted the existing relational database model to the Hadoop ecosystem.

Generated datasets and loaded them into the Hadoop ecosystem.

Worked with Linux systems and RDBMS databases on a regular basis in order to ingest data using Sqoop.

Worked with Spark to create structured data from the pool of unstructured data received.

Managed and reviewed Hadoop and HBase log files.

Involved in review of functional and non-functional requirements.

Responsible for managing data coming from different sources.

Loaded the CDRs from the relational DB using Sqoop, and from other sources to the Hadoop cluster using Flume.

Involved in loading data from the UNIX file system and FTP to HDFS.

Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.

Created Hive tables and worked on them using HiveQL.

Wrote Spark code to convert unstructured data to structured data.
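
The resume does not specify whether this Spark code was written in Scala or Python; below is a hedged PySpark sketch of turning raw, delimited CDR lines into structured records and filtering dropped calls, using a hypothetical field layout and HDFS path.

    # Parse pipe-delimited CDR lines into structured records and count dropped calls.
    from pyspark import SparkContext

    sc = SparkContext(appName="cdr-structuring")

    def parse_cdr(line):
        # Assumed layout: caller|callee|start_ts|duration_sec|disposition
        parts = line.split("|")
        if len(parts) != 5:
            return None
        return {
            "caller": parts[0],
            "callee": parts[1],
            "start_ts": parts[2],
            "duration_sec": int(parts[3]),
            "disposition": parts[4],
        }

    raw = sc.textFile("hdfs:///data/cdr/raw/2015-06-01")        # hypothetical path
    cdrs = raw.map(parse_cdr).filter(lambda r: r is not None)

    dropped = cdrs.filter(lambda r: r["disposition"] == "DROPPED")
    print(dropped.count())

    sc.stop()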

Developed Hive queries to analyze the output data.

Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.

Handled cluster coordination services through ZooKeeper.

Collected log data from web servers and integrated it into HDFS using Flume.

Used Hive to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
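
A hedged sketch of the kind of pre-aggregation described above, written as HiveQL and driven from Python through the hive CLI; the table and column names are hypothetical and not taken from the project.

    # Build a daily call-quality aggregate with a join and dynamic partitioning.
    import subprocess

    hql = """
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE call_quality_daily PARTITION (call_date)
    SELECT  s.region,
            c.disposition,
            COUNT(*)            AS calls,
            AVG(c.duration_sec) AS avg_duration,
            c.call_date
    FROM    cdr_structured c
    JOIN    subscribers   s ON c.caller = s.phone_number
    GROUP BY s.region, c.disposition, c.call_date;
    """

    subprocess.check_call(["hive", "-e", hql])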

Designed and implemented Spark jobs to support distributed data processing.

Supported the existing MapReduce programs running on the cluster.

Wrote shell scripts to monitor the health of the Hadoop daemon services and respond accordingly to any warning or failure conditions.
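
The health checks described above were shell scripts; the following is an illustrative Python rendering of the same idea, assuming the Hadoop 1.x daemon set and the standard jps tool on the node.

    # Flag any missing Hadoop 1.x daemon by inspecting `jps` output.
    import subprocess
    import sys

    EXPECTED = {"NameNode", "SecondaryNameNode", "JobTracker", "DataNode", "TaskTracker"}

    def running_daemons():
        out = subprocess.check_output(["jps"]).decode()
        return {parts[1] for parts in (line.split() for line in out.splitlines())
                if len(parts) == 2}

    missing = EXPECTED - running_daemons()
    if missing:
        # The real script would raise an alert or restart the daemon here.
        sys.stderr.write("WARNING: daemons not running: %s\n" % ", ".join(sorted(missing)))
        sys.exit(1)
    print("All expected Hadoop daemons are running")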

Involved in Hadoop cluster tasks like adding and removing nodes without affecting running jobs and data.

Followed agile methodology for the entire project.

Installed and configured the Apache Hadoop, Hive and Pig environments.

Prepared technical design documents and detailed design documents.

Environment: Linux (Ubuntu), Hadoop 1.2.1 (pseudo-distributed mode), HDFS, Hive 0.12, Flume, Hortonworks, Spark.

NetCracker, Atlanta, GA Jan 2013 - Oct 2013

Hadoop Developer

Project specific skills: Hive, Pig, Linux, MySQL, Sqoop, MapReduce and Flume.

Project Description: NetCracker uses employees' browsing habits at work to draw reports on employee productivity. Hence we collected the data and transmitted it into HBase/HDFS. We used the Hadoop ecosystem to collect the Big Data and analyze the information. Flume agents were established along the network proxy servers to gather the data and store it into HDFS. The gathered data was analyzed using MapReduce jobs and presented for further processing.

Responsibilities:

Utilized Flume to filter the input data and retrieve only the data needed for analytics by implementing Flume interceptors.

Used Flume to transport logs to HDFS.

Worked on a Pig script to count the number of times a particular URL was opened in a particular duration. A later comparison against the counts of various other URLs shows the relative popularity of that particular website among employees.

Hive was used to pull out additional analytical information.

Worked on Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance.

Involved in moving all log files generated from various sources to HDFS for further processing through Flume.

Worked on the Hue interface for querying the data.

Involved in writing MapReduce programs for analytics.
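
The resume does not state which language these MapReduce programs used; as a hedged illustration, here is the URL-count analytic expressed as a Hadoop Streaming job in Python, with a hypothetical proxy-log layout.

    # url_count.py -- run as both mapper and reducer, e.g.:
    #   hadoop jar hadoop-streaming.jar \
    #     -input /data/proxy_logs -output /data/url_counts \
    #     -mapper "python url_count.py map" -reducer "python url_count.py reduce" \
    #     -file url_count.py
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")   # hypothetical tab-delimited proxy log
            if len(fields) > 2:
                print("%s\t1" % fields[2])           # fields[2] assumed to be the URL

    def reducer():
        current_url, count = None, 0
        for line in sys.stdin:
            url, n = line.rstrip("\n").split("\t")
            if url != current_url:
                if current_url is not None:
                    print("%s\t%d" % (current_url, count))
                current_url, count = url, 0
            count += int(n)
        if current_url is not None:
            print("%s\t%d" % (current_url, count))

    if __name__ == "__main__":
        (mapper if sys.argv[1] == "map" else reducer)()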

Also used MapReduce for structuring the data coming from Flume sinks.

Managed and scheduled jobs on a Hadoop cluster using Oozie.

Generated datasets and loaded them into the Hadoop ecosystem.

Performed the installation and configuration of, and used, Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Flume and HBase.

Environment: Hadoop, Cloudera Manager, MapReduce, Hive, Flume, Pig.

Cognosante, Houston, TX Feb 2010 - Dec 2012

Java Developer

Project specific skills: JSP, JSF, HTML, CSS, PL/SQL, UML

Project Description: The Office of Cognosante is at the forefront of the administration's health IT efforts and is a resource to the entire health system, supporting the adoption of health information technology and the promotion of nationwide health information exchange to improve health care.

Responsibilities:

Worked with several clients with day-to-day requests and responsibilities.

Involved in analyzing system failures, identifying root causes and recommended course of actions.

Integrated Struts, Hibernate and JBoss Application Server to provide efficient data access.

Involved in HTML page development using CSS and JavaScript.

Developed the presentation layer with JSF, JSP and JavaScript technologies.

Designed table structures and coded scripts to create tables, indexes, views, sequences, synonyms and database triggers. Involved in writing database procedures, triggers and PL/SQL statements for data retrieval.

Developed the UI components using jQuery and JavaScript functionalities.

Designed database and coded PL/SQL stored Procedures, triggers required for the project.

Used the Session and FacesContext JSF objects for passing content from one bean to another.

Designed and developed Session Beans to implement business logic.

Tuned SQL statements and the WebSphere application server to improve performance and consequently meet the SLAs.

Created the EAR and WAR files and deployed the application in different environments.

Engaged in analyzing requirements, identifying various individual logical components, expressing the system design through UML diagrams using Rational Rose.

Involved in running shell scripts for regression testing.

Extensively used HTML and CSS in developing the front-end.

Designed and Developed JSP pages to store and retrieve information.

Environment: Java, J2EE, JSP, JavaScript, JSF, Spring, XML, XHTML, Oracle 9i, PL/SQL, SOAP web services, WebSphere, JUnit, SVN.

AVIVA, Hyderabad, India Apr 2007 - Dec 2009

Graduate Trainee/Programmer Analyst

Project specific skills: PL/SQL, Linux, SDLC, ANT, Java (core)

Project Description: Workforce Management System (WMS) is a web-based application used by AVIVA employees to update their weekly timesheets and submit them either weekly or monthly to the historical database (HDB) for approval. Managers access the WMS to approve or revert timesheets through the web application. The HDB thus contains the timesheet data filled in by employees, which acts as a source for measuring the productivity of a particular employee for that particular month. This data is extracted using a PL/SQL program that runs weekly. The extraction program extracts the timesheet and shift-related data from the HDB, applies transformation rules and inserts the data into the integration database (IDB).

Responsibilities:

Prepared program Specification for the development of PL/SQL procedures and functions.

Created Custom Staging Tables to handle import data.

Created custom triggers, stored procedures, packages and functions to populate different databases.

Developed SQL*Loader scripts to load data into the custom tables.

Ran batch files for loading database tables from flat files using SQL*Loader.

Created UNIX Shell Scripts for automating the execution process.

Developed PL/SQL code for updating payment terms.

Created indexes on tables and optimized stored procedure queries.

Designed, developed and tested reports using SQL*Plus.

Modified existing code and developed PL/SQL packages to perform certain specialized functions/enhancements on the Oracle application.

Created Indexes and partitioned the tables to improve the performance of the query.

Involved in preparing documentation and user support documents.

Involved in preparing test plans, unit testing, System integration testing, implementation and maintenance.

Environment: Oracle 9i/10g, PL/SQL, SQL*Loader, SQL Navigator, SQL*Plus, UNIX, Windows NT, Windows 2000.


