
Data Developer

Location:
San Mateo, CA
Posted:
July 19, 2017


Prudhvi Krishna

Senior Hadoop Developer

****************@*****.*** 415-***-****

PROFESSIONAL SUMMARY:

7+ years of IT experience, including 5 years as a Hadoop consultant on big-data conversion projects, gathering and analyzing customers' technical requirements.

Working experience with the Cloudera and Hortonworks Hadoop distributions.

Good domain knowledge of Insurance, Banking, and E-commerce.

In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, Hive, Sqoop, HBase, Flume, Oozie, NameNode, DataNode, and MapReduce concepts.

Experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.

Hands-on experience with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie. Strong knowledge of Pig and Hive analytical functions and of writing custom UDFs.

Experience importing and exporting data between HDFS and relational database systems using Sqoop.

Good knowledge of Spark and its components, such as Spark Core and Spark SQL.

Experienced in developing simple to complex MapReduce, Hive, and Pig jobs to handle files in multiple formats (JSON, text, XML, Avro, SequenceFile, etc.).

Expertise in J2EE frameworks, Servlets, JSP, JDBC, and XML. Familiar with systems programming in C and C++.

In-depth knowledge of OOAD concepts, multithreading, and UML activity, sequence, and class diagrams.

Experience using design patterns (Singleton, Factory, Builder) and MVC architecture.

Have very good exposure to the entire Software Development Life Cycle.

Excellent organizational and interpersonal skills with a strong technical background.

Quick learner, self-motivated, and able to work in challenging and versatile environments, with excellent written and verbal communication skills.

Good experience performing and supporting unit testing, system integration testing (SIT), UAT, and production support for issues raised by application users.

Education:

Master’s in Computer Science

GPA: 3.4

TECHNICAL SUMMARY:

Languages/Scripting: Java, Python, Pig Latin, Scala, HiveQL, SQL, Linux shell scripts, JavaScript

Big Data Framework/Stack: Hadoop HDFS, MapReduce, YARN, Hive, Hue, Impala, Sqoop, Pig, HBase, Spark, Kafka, Flume, Oozie, ZooKeeper, KNIME

Hadoop Distributions: Cloudera CDH5, Hortonworks HDP 2.x

RDBMS: Oracle, DB2, SQL Server, MySQL

NoSQL Databases: HBase, MongoDB

Software Methodologies: SDLC (Waterfall, Agile/Scrum), JIRA

Operating Systems: Windows XP/NT/7/8, Red Hat, CentOS, Mac OS

IDEs: NetBeans, Eclipse

File Formats: XML, text, SequenceFile, JSON, ORC, Avro, Parquet

PROFESSIONAL EXPERIENCE:

Wells Fargo - New York, NY August 2016 – Present

Hadoop/Spark Developer

Description:

This project analyzes the sentiment of people toward Wells Fargo and its products. The data is stored in HDFS and processed using Spark; Hive is used for the batch-processing pipelines, and HBase makes the data available to users for random reads and writes.

Responsibilities:

Used the Cloudera distribution for the Hadoop ecosystem.

Converted MapReduce jobs into Spark transformations and actions using Spark RDDs in Python (see the sketch at the end of this section).

Wrote Spark jobs in Python to analyze customer data and sales history.

Used Kafka to ingest data from multiple sources into HDFS.

Involved in designing HBase row keys to store text and JSON values in HBase tables.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis. Worked with large Distributed systems for data storage.

Created Hive external tables to perform ETL on data generated on a daily basis.

Created HBase tables for random lookups as per requirement of business logic.

Performed transformations using Spark and loaded the data into HBase tables.

Performed validation on the data ingested to filter and cleanse the data in Hive.

Created Sqoop jobs to handle incremental loads from RDBMS into HDFS (see the Sqoop sketch at the end of this section).

Imported data as Parquet files for some use cases using Sqoop to speed up downstream analytics.

Collected log data from web servers and pushed it to HDFS using Flume.

Worked in an Agile Scrum methodology; used JIRA for bug tracking.
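
As a minimal illustration of the MapReduce-to-Spark conversion mentioned above, the sketch below shows an RDD-style aggregation in PySpark; the file paths, record layout, and application name are assumptions made for the example, not details from the actual project.

    from pyspark import SparkContext

    sc = SparkContext(appName="sales-by-customer")

    # Raw records landed in HDFS, assumed here to look like "customer_id|product|amount"
    lines = sc.textFile("hdfs:///data/sales/raw/*")

    sales_by_customer = (
        lines
        .map(lambda line: line.split("|"))                   # parse each record
        .filter(lambda fields: len(fields) == 3)             # drop malformed rows
        .map(lambda fields: (fields[0], float(fields[2])))   # (customer_id, amount)
        .reduceByKey(lambda a, b: a + b)                      # total spend per customer
    )

    # Action: write the aggregate back to HDFS for downstream Hive/HBase loads
    sales_by_customer.saveAsTextFile("hdfs:///data/sales/by_customer")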
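
Similarly, a sketch of how the incremental Sqoop loads into Parquet could be driven from a small Python wrapper; the JDBC URL, table, check column, and last-value bookkeeping are hypothetical (in practice the last value would come from a saved Sqoop job or a metadata store).

    import subprocess

    last_value = "104732"  # highest TRANSACTION_ID already ingested (assumed bookkeeping)

    subprocess.check_call([
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/SALES",   # assumed source database
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "TRANSACTIONS",
        "--incremental", "append",             # pull only rows newer than the last run
        "--check-column", "TRANSACTION_ID",
        "--last-value", last_value,
        "--as-parquetfile",                    # land as Parquet for faster analytics
        "--target-dir", "/data/raw/transactions",
    ])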

Environment: Hadoop, Hive, Flume, Red Hat 6.x, shell scripting, Java, Eclipse, HBase, Kafka, Spark, Python, Oozie, ZooKeeper, CDH 5.x, HQL/SQL, Oracle 11g.

Reyes Holdings, Rosemont, IL May 2014 – June 2016

Hadoop Developer

Description:

Reyes Holdings, aligned with leading brewers and foodservice providers, delivers the best-known brands and the widest variety of food and beverage items to retailers around the world. I worked as a Hadoop Developer on the Data Insights team, where I analyzed big data sets on Hadoop clusters and helped the organization gain an advantage by identifying customer trends, which informed market targeting, region-wise brand popularity, and advertisement investment allocation.

Responsibilities:

Worked on the POC for initiating the Apache Hadoop framework.

Installed and configured Hadoop 1.x MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Imported and exported data into HDFS and Hive using Sqoop.

Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch at the end of this section).

Responsible for managing data coming from different sources; worked with large distributed systems to store the data.

Monitored running MapReduce programs on the cluster.

Responsible for loading data from UNIX file systems to HDFS.

Installed and configured Hive, and wrote Hive UDFs.

Implemented workflows using the Apache Oozie framework to automate tasks.

Developed scripts and automated end-to-end data management and synchronization between all the clusters.

Managed IT and business stakeholders; conducted assessment interviews and solution review sessions.

Reviewed the developed code and flagged any issues with respect to customer data.

Used SQL queries and other tools to perform data analysis and profiling.

Mentored and trained the engineering team in the use of the Hadoop platform, analytical software, and development technologies; followed Agile methodology, with JIRA used for tracking.
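
As a rough sketch of the Hive partitioning and bucketing mentioned earlier in this section, the snippet below drives illustrative HiveQL through the hive CLI from a Python wrapper; the table name, columns, bucket count, and storage format are assumptions, not the project's actual schema.

    import subprocess

    # HiveQL: a partitioned, bucketed target table plus a dynamic-partition load.
    hiveql = """
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;

    CREATE TABLE IF NOT EXISTS sales_by_region (
      order_id     BIGINT,
      customer_id  BIGINT,
      amount       DOUBLE
    )
    PARTITIONED BY (region STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- Dynamic partitioning: each row is routed to its region partition.
    INSERT OVERWRITE TABLE sales_by_region PARTITION (region)
    SELECT order_id, customer_id, amount, region
    FROM   staging_sales;
    """

    # `hive -e` executes the quoted statements on the cluster.
    subprocess.check_call(["hive", "-e", hiveql])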

Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, CentOS, Sqoop, Hive, Oozie.

Kaiser Permanente – Oakland, California Sept 2013 – April 2014

Hadoop Developer

Description:

Kaiser Permanente comprises physician-owned organizations that provide medical care for Health Plan members in each respective region. This project is for a pharmacy application named ‘PIMS’, whose Clinical Data Repository (CDR) contains all the historical and batch data. It is used to identify insights such as the maximum wait time of a patient at the pharmacy.

Responsibilities:

Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.

Imported and exported data into HDFS using Sqoop and Kafka.

Created Hive tables and worked on them using HiveQL.

Created partitioned tables in Hive for best performance and faster querying.

Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.

Worked on Hive UDFs using data from HDFS.

Performed extensive data analysis using Hive.

Executed different types of joins on Hive tables.

Used Impala for faster querying purposes.

Created indexes and tuned the SQL queries in Hive.

Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.

Developed HiveQL scripts to perform the incremental loads (see the sketch at the end of this section).

Worked on different big data file formats such as text, SequenceFile, Avro, and Parquet, along with Snappy compression.

Involved in identifying possible ways to improve the efficiency of the system.

Involved in generating data cubes for visualization. Worked in an Agile process and followed the Scrum stand-up process.
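
For the incremental HiveQL loads referenced above, a minimal sketch of appending one day's data into a date-partitioned table, wrapped in Python; the table, columns, and date handling are hypothetical (in practice the load date would be supplied by the Oozie coordinator).

    import subprocess

    load_date = "2014-01-15"  # assumed; normally passed in by the Oozie coordinator

    hiveql = """
    -- Load only the new day's records into the matching partition.
    INSERT OVERWRITE TABLE pharmacy_events PARTITION (load_date='{d}')
    SELECT event_id, member_id, wait_minutes
    FROM   pharmacy_events_staging
    WHERE  to_date(event_ts) = '{d}';
    """.format(d=load_date)

    subprocess.check_call(["hive", "-e", hiveql])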

Environment: Hadoop, Hive, Pig, Sqoop, Kafka, Oozie, Impala, Flume, MySQL, ZooKeeper, HBase, Cloudera Manager, MapReduce.

Citibank – Tampa, Florida June 2012 – July 2013

Hadoop Developer

Description:

Citibank is an American international banking and financial services holding company with "hub quarters" throughout the country. This project focuses on understanding market needs by analyzing unstructured web-log data and providing accurate insights to help make the right decisions on offers, benefits, and schemes.

Responsibilities:

Responsible for managing data coming from different sources.

Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.

Developed Pig UDFs in Python for preprocessing the data (see the sketch at the end of this section).

Worked extensively on flat files.

Performed join, grouping, and count operations on the tables using Impala.

Developed Pig Latin scripts for validating different query modes.

Worked on creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.

Created Sqoop jobs to export analyzed data to a relational database.

Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.

Implemented bucketing, partitioning and other query performance tuning techniques.

Generated various reports using Tableau with Hadoop as a source for data.
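
As a minimal sketch of the Python Pig UDFs mentioned above for preprocessing web-log fields, the module below normalizes a URL; the function, schema, and module names are illustrative, and the pig_util import is only available when the file is registered through Pig's streaming_python engine, so a no-op fallback is included for standalone testing.

    # clean_udfs.py -- illustrative preprocessing UDF for raw web-log fields
    try:
        from pig_util import outputSchema   # provided when run via Pig's streaming_python
    except ImportError:
        def outputSchema(schema):           # no-op stand-in for standalone testing
            def decorator(func):
                return func
            return decorator

    @outputSchema("page:chararray")
    def normalize_url(raw_url):
        # Strip the query string and lower-case the path, e.g. '/Offers?id=7' -> '/offers'
        if raw_url is None:
            return None
        return raw_url.split("?")[0].strip().lower()

In a Pig script this would be registered with something like REGISTER 'clean_udfs.py' USING streaming_python AS udfs; and applied inside a FOREACH ... GENERATE expression.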

Environment: Hadoop, MapReduce, Hive, Pig, Tableau, Python, Sqoop, Oozie, Impala, Flume, MySQL, ZooKeeper, HBase, Cloudera Manager.

Century National Insurance, NYC, NY Dec 2010 – May 2012

Java Developer

Description:

The New York Motor Vehicle Commission (MVC) is the government agency responsible for titling and registering vehicles, providing plates, and licensing drivers in the U.S. state of New York. It also provides online support for renewing titles, registrations, licenses, etc.

Responsibilities:

Involved in the full Software Development Life Cycle (SDLC) of the tracking system, including requirement gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance

Worked in Agile Scrum methodology

Involved in writing exception and validation classes using core java

Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS and AJAX

Developed framework using Java, MySQL and web server technologies.

Developed and performed unit testing using the JUnit framework in a test-driven development (TDD) environment.

Validated the XML documents with XSD validation and transformed to XHTML using XSLT

Implemented cross-cutting concerns as aspects at the service layer using Spring AOP, and implemented DAO objects using Spring ORM

Spring beans were used for controlling the flow between UI and Hibernate

Developed web services using SOAP, WSDL, UDDI, and XML with the Apache CXF framework and Apache Commons

Worked on the database interaction layer for insert, update, and retrieval operations using queries and stored procedures

Wrote stored procedures and complex queries for IBM DB2. Implemented an SOA architecture with web services

Used Eclipse IDE for development and JBoss Application Server for deploying the web application

Used Apache Camel for creating routes to web services

Used JReport for the generation of reports of the application

Used WebLogic as the application server and Log4j for application logging and debugging

Used CVS as the version control tool and ANT as the project build tool

Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss, JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux, Log4j, JUnit, ANT, CVS

Intercom, Hyderabad, Andhra Pradesh Jun 2009 – Oct 2010

Java Developer

Responsibilities:

Involved in designing and developing enhancements per business requirements with respect to front end JSP development using Struts.

Implemented the project using JSP- and Servlet-based tag libraries.

Conducted client side validations using JavaScript.

Coded JDBC calls in the Servlets to access the Oracle database tables.

Generated SQL scripts to update the parsed messages in the database.

Worked on parsing the RSS Feeds (XML) files using SAX parsers.

Designed and coded the Java class that handles errors and logs them to a file.

Developed graphical user interfaces using Struts, Tiles, and JavaScript. Used JSP, JavaScript, and JDBC to create web servlets.

Utilized the mail merge techniques in MS Word for time reduction in sending certificates.

Involved in documentation, review, and analysis, and fixed post-production issues.

Worked on bug fixing and enhancements on change requests.

Designed various animations with different graphics using Macromedia Flash MX with ActionScript 1.0, PhotoImpact, and GIF Animator.

Understood customer requirements, mapped them to functional requirements, and created requirement specifications.

Developed web pages to display account transactions and created the application UI using GWT, Java, JSP, CSS, and web standards, improving application usability while consistently meeting tight deadlines

Responsible for the configuration of the Struts web-based application using struts-config.xml and web.xml

Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user account information details, using JSP, DHTML, Spring Web Flow, and CSS.

Environment: HTML/CSS/JavaScript/JSON, JDK 1.3, J2EE, Servlets, JavaBeans, MDB, JDBC, MS SQL Server, JBoss, UI frameworks and libraries (Struts, Spring MVC, jQuery), MVC concepts, XML, SVN.


