Krishna Polavarapu
Sr. Hadoop Developer
Email: ************@*****.***
Ph #: 832-***-****
Professional Summary
7+ years of extensive IT experience in all phases of Software Development Life Cycle (SDLC) with skills in data analysis, design, development, testing and deployment of software systems.
3+ years of strong experience working on Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Oozie, Zookeeper, and Flume with CDH4 and CDH5 distributions, as well as AWS EC2 cloud computing.
Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server environments, object-oriented technology, and web-based applications.
Strong in developing MapReduce applications, configuring the development environment, tuning jobs, and creating MapReduce workflows.
Experience in performing data enrichment, cleansing, analytics, and aggregations using Hive and Pig.
Knowledge of the Cloudera Hadoop distribution and other widely used distributions such as Hortonworks and MapR.
Hands-on experience working with Cloudera CDH3 and CDH4 platforms.
Proficient in big data ingestion and streaming tools like Flume, Sqoop, Kafka, and Storm.
Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
Experienced in analyzing data using HiveQL and Pig Latin and in extending Hive and Pig core functionality with custom UDFs (an illustrative UDF sketch follows this summary).
Good understanding of NoSQL databases, with hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
Good knowledge of scripting languages such as Linux/Unix shell scripting and Python.
Good knowledge of data warehousing concepts and ETL processes.
Involved in importing streaming data into HDFS using Flume and analyzing it with Pig and Hive.
Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data in flight using Flume interceptors.
Experienced in installing, configuring, supporting, and managing Hadoop clusters and the underlying big data infrastructure.
Configured ZooKeeper to coordinate servers in the cluster and maintain data consistency.
Used the Oozie and Control-M workflow engines for managing and scheduling Hadoop jobs.
Diverse experience working with a variety of databases, including Teradata, Oracle, MySQL, IBM DB2, and Netezza.
Good knowledge of Core Java and J2EE technologies such as Hibernate, JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring.
Experienced with IDEs and tools such as Eclipse, NetBeans, GitHub, Jenkins, Maven, and IntelliJ.
Implemented a POC to migrate MapReduce programs to Spark transformations using Spark and Scala.
Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
Strong team player with the ability to work both independently and in a team, adapt to a rapidly changing environment, and keep learning; excellent communication, project management, documentation, and interpersonal skills.
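An illustrative Hive UDF sketch (assumed example, not code from any of the projects below) of the kind of custom UDF referenced above: it trims and upper-cases a string column. The package and class names are hypothetical, and the hive-exec dependency is assumed to be on the classpath.

    // Illustrative Hive UDF (assumed example): trims and upper-cases a string column.
    package com.example.udf;   // hypothetical package name

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class CleanUpper extends UDF {
        // Hive calls evaluate() once per row; returning null preserves SQL NULL semantics.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION clean_upper AS 'com.example.udf.CleanUpper' before being used in a HiveQL query.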
Technical Skills:
Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Kafka, Spark
Databases: SQL Server, MySQL, Oracle, HBase, Netezza
Languages: SQL, PL/SQL, HTML, Java, J2EE, JSP, Servlets, Hibernate, JDBC, UNIX Shell Scripting, Python
Tools: Eclipse, NetBeans, IntelliJ, Maven, Anthill, SQL Explorer, TOAD
Version Control: GitHub, SVN
Operating Systems: Windows Server 2008/2012, UNIX, Linux
Packages: MS Office Suite, MS Visio, MS Project Professional
Other Tools: PuTTY, WinSCP, EDI (Gentran), StreamWeaver, Amazon AWS
Technologies interested in learning:
Elasticsearch, Splunk, Kibana.
Professional Experience:
Client: AXA Insurance, Charlotte, NC Aug 2015 – Present
Role: Hadoop Developer
Project Description:
AXA Insurance is a leading financial protection company, established in 1859, that offers a variety of financial planning solutions, including life insurance and annuities for retirement. This project mainly supports the Anti-Money Laundering (AML) program, in which transactions are monitored. Data from different source systems is loaded into the Hadoop data lake and shared with the AML team.
Responsibilities:
Worked extensively with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
Analyzed data that needed to be loaded into Hadoop and contacted the respective source teams to obtain table information and connection details.
Used Sqoop to import data from RDBMS systems such as Oracle, DB2, and Netezza and loaded it into HDFS.
Created Hive tables and partitioned data for better performance; implemented Hive UDFs and performed tuning for better results.
Developed MapReduce programs to clean and aggregate the data (an illustrative sketch follows this section).
Implemented optimized map joins to get data from different sources to perform cleaning operations before applying algorithms.
Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
Coordinated continuously with the QA, production support, and deployment teams.
Implemented test scripts to support test driven development and continuous integration.
Created analysis documents to capture table types (truncate-and-load or incremental load), frequency of updates, source database connection details, etc.
Worked on documenting all tables created to ensure all transactions are drafted properly.
Analyzed data by running Hive queries and Pig scripts to study the transactional behavior of policies and plans.
Developed shell scripts to move files (received through SFTP) from the landing zone server to HDFS, update the file tracker, and send emails after execution completes.
Implemented a POC to introduce Spark transformations.
Participated in design and implementation discussions for the Cloudera CDH5 Hadoop ecosystem and supported the team through Cloudera version updates.
Worked in an Agile development environment following the Kanban methodology; participated in daily scrums and other design-related meetings.
Environment: Hadoop, CDH, MapReduce, Hive, Pig, Sqoop, HBase, Java, Spark, Oozie, Linux, Python, DB2, Oracle
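An illustrative MapReduce sketch (assumed example, not the project's actual code) of the kind of cleaning-and-aggregation job described above: it counts records per account from comma-delimited input, dropping malformed rows. The account_id,amount input layout and all class names are assumptions.

    // Illustrative MapReduce job (assumed example): cleans delimited records and
    // aggregates a count per account id.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TransactionCount {

        public static class CleanMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text accountId = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Assumed layout: account_id,amount
                String[] fields = value.toString().split(",");
                // Basic cleansing: drop rows that do not have the expected two fields.
                if (fields.length < 2 || fields[0].trim().isEmpty()) {
                    return;
                }
                accountId.set(fields[0].trim());
                context.write(accountId, ONE);
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "transaction-count");
            job.setJarByClass(TransactionCount.class);
            job.setMapperClass(CleanMapper.class);
            job.setCombinerClass(SumReducer.class);   // combiner is safe for a pure sum
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }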
Client: The Gordian Group, Greenville, SC Nov 2013 – Jul 2015
Role: Hadoop Developer
Project Description:
The Gordian Group is a construction data management company specializing in providing construction procurement and complete cost data to its clients through books, CDs, and a web application. Project IO is an analytics initiative from TGG to improve the way data is provided to current and future clients.
Responsibilities:
Worked on analyzing data in the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Developed Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
Involved in loading data from UNIX file system to HDFS.
Worked on analyzing data with Hive and Pig and on real-time analytical operations using HBase.
Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
Worked with the HBase NoSQL database (an illustrative client sketch follows this section).
Helped and directed the testing team to get up to speed on Hadoop data testing.
Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
Implemented MapReduce secondary sorting to improve the performance of sorted output in MapReduce programs.
Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions across multiple rows.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Explored Spark to improve performance and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Coordinated with testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
Created various UDFs and UDAFs to analyze partitioned and bucketed data and compute metrics for dashboard reporting, storing the results in summary tables.
Used Oozie Workflow engine to run multiple Hive and Pig jobs.
Created stored procedures, triggers and functions to operate on report data in MySQL.
Wrote backend code in Java to interact with the database using JDBC.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, Java, Oozie, Linux, UNIX.
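An illustrative HBase client sketch (assumed example, not the project's actual code) of the kind of HBase access described above, using the HBase 1.x Java API: it writes one row and reads it back. The table name, column family, qualifier, and row-key design are assumptions.

    // Illustrative HBase client (assumed example): one put and one point lookup.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CostDataClient {

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("cost_data"))) {

                // Write a single cost record keyed by item id (row-key design is assumed).
                Put put = new Put(Bytes.toBytes("item-001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("unit_cost"), Bytes.toBytes("125.40"));
                table.put(put);

                // Read it back with a point lookup.
                Get get = new Get(Bytes.toBytes("item-001"));
                Result result = table.get(get);
                String unitCost = Bytes.toString(
                        result.getValue(Bytes.toBytes("d"), Bytes.toBytes("unit_cost")));
                System.out.println("unit_cost=" + unitCost);
            }
        }
    }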
Client: Huntington Bank, Columbus, OH Mar 2013 – Oct 2013
Role: Hadoop Developer
Project Description:
Developed a web application to consume incoming market trading data on a daily basis and perform batch processing on trades to arrive at a risk-weighted average, considering factors such as collateral, exposure, and counterparty credit rating values. The web interface generates reports and provides a breakdown analysis of the calculations and rules applied to arrive at the risk-weighted average.
Responsibilities:
Involved in analyzing requirements and establishing development capabilities to support future opportunities.
Involved in sharing data with teams that analyze and prepare risk management reports.
Handled importing data from various data sources, performed transformations using Pig and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
Worked on exporting the analyzed data to the existing relational databases using Sqoop, making it available to the BI team for visualization and report generation.
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
Performed various optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (an illustrative map-side join sketch follows this section).
Involved in End to End implementation of ETL logic.
Coordinated effectively with the offshore team and managed project deliverables on time.
Worked on QA support activities, test data creation, and unit testing activities.
Developed Oozie workflows and scheduled them through a scheduler on a monthly basis.
Designed and developed read lock capability in HDFS.
Created Pig Latin scripts to sort, group, join and filter the enterprise wise data.
Analyzed web server log data collected using Apache Flume.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, SQL, Oozie, Linux, UNIX.
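An illustrative map-side join sketch (assumed example, not the project's actual code) of the technique referenced above: a small reference file added to the job with job.addCacheFile(...) is loaded into memory in setup(), then each incoming trade record is joined against it in map(), so no reduce phase is needed. File layouts, field positions, and class names are assumptions.

    // Illustrative map-side join mapper (assumed example).
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TradeRatingJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private final Map<String, String> ratingsByCounterparty = new HashMap<>();
        private final Text joined = new Text();

        @Override
        protected void setup(Context context) throws IOException {
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles == null || cacheFiles.length == 0) {
                return;
            }
            // Cached files are symlinked into the task's working directory under their own names.
            // Assumed layout of the small side: counterparty_id,credit_rating
            String localName = new Path(cacheFiles[0].getPath()).getName();
            try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",");
                    if (parts.length == 2) {
                        ratingsByCounterparty.put(parts[0].trim(), parts[1].trim());
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed layout of the big side: trade_id,counterparty_id,exposure
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return;   // drop malformed records
            }
            String rating = ratingsByCounterparty.getOrDefault(fields[1].trim(), "UNRATED");
            joined.set(value.toString() + "," + rating);
            context.write(joined, NullWritable.get());
        }
    }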
Client: Hitachi Solutions, India Dec 2012 – Feb 2013
Role: SQL Developer
Project Description:
Hitachi Data Systems is a company that provides modular mid-range and high-end computer data storage systems, software, and services. The project was to design and build a service and web application to manage orders and products and to generate reports.
Responsibilities:
Involved in the Software Development Life Cycle (SDLC) and created UML diagrams such as use case, class, and sequence diagrams to represent the detailed design phase.
Created new tables, views, indexes, and user-defined functions.
Performed daily database backups and restorations and monitored the performance of the database server.
Actively redesigned database objects to speed up certain daily jobs and stored procedures.
Optimized query performance by creating indexes.
Developed stored procedures and views to supply data for all reports (an illustrative JDBC read sketch follows this section); complex formulas were used to show derived fields and to format data based on specific conditions.
Involved in SQL Server administration by creating users and login IDs with appropriate roles and granting privileges to users and roles; worked on authentication modules to provide controlled access to users across various modules.
Created joins and subqueries for complex queries involving multiple tables.
Developed stored procedures and triggers using PL/SQL to calculate and update tables implementing business logic.
Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements.
Developed complex SQL queries to perform efficient data retrieval operations including stored procedures, triggers etc.
Designed and Implemented tables and indexes using SQL Server.
Environment: Eclipse, Java/J2EE, Oracle, HTML, PL/SQL, XML, SQL.
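An illustrative JDBC sketch (assumed example, not the project's actual code) of reading report data from a reporting view like those described above. The connection URL, credentials, view name, and columns are all assumptions.

    // Illustrative JDBC reader (assumed example): queries an assumed reporting view.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class OrderReportReader {

        public static void main(String[] args) throws SQLException {
            String url = "jdbc:oracle:thin:@//db-host:1521/ORDERS";   // assumed connection string
            String sql = "SELECT order_id, customer_name, order_total "
                       + "FROM v_order_report WHERE order_status = ?"; // assumed reporting view

            try (Connection conn = DriverManager.getConnection(url, "report_user", "secret");
                 PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, "OPEN");
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%s\t%s\t%.2f%n",
                                rs.getString("order_id"),
                                rs.getString("customer_name"),
                                rs.getDouble("order_total"));
                    }
                }
            }
        }
    }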
Client: Lloyds Banking Group, India Mar 2010 – Nov 2012
Role: Programmer Analyst/ SQL Developer
Project Description:
This project mainly aimed to reduce the manual account-closure process for almost four million accounts. The banker and the user can easily access personal loans, mortgages, debit, credit, and all other accounts related to the user and settle them automatically before a checking or savings account is closed. Our team supported them by sorting data and pulling updated records on a weekly and monthly basis.
Responsibilities:
Developed SQL scripts to perform joins, subqueries, nested queries, and insert, update, and delete operations on MS SQL database tables.
Experience in writing PL/SQL and in developing and implementing stored procedures, packages, and triggers.
Experience with modeling principles, database design and programming, and creating E-R diagrams and data relationships to design a database.
Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
Built data connections to the database using MS SQL Server.
Worked on project to extract data from XML file to SQL table and generate data file reporting using SQL Server 2008.
Used the Tomcat web server for development purposes.
Involved in creation of Test Cases for Unit Testing.
Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.
Client: Farmers Insurance, India Jun 2009 – Mar 2010
Role: Java Developer
Project Description:
FIC is an American multinational financial services corporation with data centers spread across Oklahoma and Chicago, United States. It facilitates insurance policies and electronic funds transfers throughout the world through its payment gateways. FIC has a vast database of customers whose policies need to be tracked and maintained every day.
Responsibilities:
Responsible for understanding the scope of project and requirement gathering.
Developed the web tier using JSP, Struts MVC to show account details and summary.
Created and maintained the configuration of the Spring Application Framework.
Implemented various design patterns - Singleton, Business Delegate, Value Object and Spring DAO.
Used Spring JDBC to write DAO classes that interact with the database to access account information (an illustrative DAO sketch follows this section).
Mapped business objects to database using Hibernate.
Involved in writing Spring configuration XML files containing bean declarations and declarations of other dependent objects.
Used the Tomcat web server for development purposes.
Involved in creation of Test Cases for Unit Testing.
Used Oracle as the database and Toad for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
Developed the application using Eclipse and used Maven as the build and deployment tool.
Used Log4j to print debug, warning, and info log messages to the server console.
Environment: Java, J2EE, JSON, Linux, XML, XSL, CSS, JavaScript, Eclipse
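An illustrative Spring JDBC DAO sketch (assumed example, not the project's actual code) of the kind of DAO class described above, using JdbcTemplate with a row mapper. The table, columns, and class names are assumptions; the DataSource would normally be injected through the Spring configuration XML.

    // Illustrative Spring JDBC DAO (assumed example): reads account summary rows.
    import java.util.List;

    import javax.sql.DataSource;

    import org.springframework.jdbc.core.JdbcTemplate;

    public class AccountDao {

        private final JdbcTemplate jdbcTemplate;

        public AccountDao(DataSource dataSource) {
            // The DataSource is assumed to be wired in through the Spring configuration.
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        /** Returns the summaries of all accounts owned by the given customer. */
        public List<AccountSummary> findByCustomerId(String customerId) {
            String sql = "SELECT account_id, account_type, balance "
                       + "FROM accounts WHERE customer_id = ?";   // assumed table layout
            return jdbcTemplate.query(sql,
                    (rs, rowNum) -> new AccountSummary(
                            rs.getString("account_id"),
                            rs.getString("account_type"),
                            rs.getDouble("balance")),
                    customerId);
        }

        /** Simple value object carrying one row of the account summary. */
        public static class AccountSummary {
            private final String accountId;
            private final String accountType;
            private final double balance;

            public AccountSummary(String accountId, String accountType, double balance) {
                this.accountId = accountId;
                this.accountType = accountType;
                this.balance = balance;
            }

            public String getAccountId() { return accountId; }
            public String getAccountType() { return accountType; }
            public double getBalance() { return balance; }
        }
    }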