
Data Developer

Location:
Cincinnati, OH
Posted:
January 15, 2019


Resume:

BHARATH SUDHINI

+1-682-***-**** /*******.*********@*****.***

Hadoop (Big data)/Spark Developer

Overall 9 years of IT experience, including Hadoop ecosystem components and Java development.

4.5 years of experience with the Big Data Hadoop ecosystem: Hive, Impala, MapReduce, Sqoop, Oozie, Spark SQL, Spark, Python, Pig, HBase, and Kafka.

2 years of experience in Python and PySpark.

One year of experience with AWS.

Worked on data operations to ingest data into data lakes.

Experience with the Amazon AWS cloud, including services such as EC2, S3, and EMR.

Experience in deployment of Hadoop ecosystem components such as MapReduce, YARN, Sqoop, Flume, PySpark, Pig, Hive, HBase, Spark, Scala, Cassandra, ZooKeeper, Storm, Impala, and Kafka.

Drawing on experience in all aspects of analytics/data warehousing solutions (database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting solutions), I have been key in delivering innovative database/data warehousing solutions to the retail, pharma, and finance industries.

Worked on migrating an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets; experienced in maintaining the Hadoop cluster on AWS EMR.

Expertise in NoSQL and relational databases (MySQL, HBase, Oracle) and in integrating data with Hive.

Worked on creating receivers and topics for Kafka streaming.

Experience with reporting tools such as SAP BO and Tableau for visualization and for integration with Hadoop ecosystems.

Experienced with shell scripting and Linux architecture.

Good experience with the Python scripting language for use throughout the Spark development life cycle.

Developed multiple programs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.

Good experience with Git and Maven.

Worked with Spark SQL to create data frames from data in HDFS in different file formats such as ORC, JSON, Parquet, and Avro, and stored the data back to HDFS (see the sketch at the end of this summary).

Worked on Spark RDDs to apply transformations to the data.

Experience importing data from sources such as HDFS and HBase into Spark RDDs.

Exported the required data to relational databases using Sqoop for visualization and to generate reports for the BI team.

Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.

Solid understanding of the Hadoop Distributed File System (HDFS) and of handling data that lands in HDFS from other sources.

Worked on data processing, transformations, and actions in Spark using Python (PySpark).

Flexible and versatile to adapt to any new environment
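
A minimal PySpark sketch of the multi-format read/write described above; the paths, the join key, and the Avro package setup are assumptions, not details from an actual project:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-format-io").getOrCreate()

orc_df = spark.read.orc("hdfs:///data/raw/orders_orc")            # ORC input
json_df = spark.read.json("hdfs:///data/raw/events_json")         # JSON input
parquet_df = spark.read.parquet("hdfs:///data/raw/clients_pq")    # Parquet input
# Avro assumes the spark-avro package is available on the cluster (Spark 2.4+)
avro_df = spark.read.format("avro").load("hdfs:///data/raw/txns_avro")

# Join two of the frames on a hypothetical key and store the result back to HDFS
result = orc_df.join(parquet_df, "customer_id")
result.write.mode("overwrite").parquet("hdfs:///data/curated/orders_enriched")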

Education Summary:

B.Tech from Raja Mahindra College, JNTU University.

Technical Skills

Environments

Hadoop/Big Data, Hortonworks, Cloudera, AWS

Programming Languages

Python, Scala, Shell Scripting, SQL, Java.

Reporting Tools

Business Objects, Tableau

Database/NO SQL

MySQL, Oracle 11i, SQL Server

Hadoop

PySpark, Spark Streaming, Hive, Pig, HBase, Impala, Scala, Spark, Sqoop, Oozie, Storm, Kafka, TeamCity, RabbitMQ, Java services, PCF

Operating Systems

Linux, Ubuntu

84.51 (Kroger), Cincinnati, OH Aug 2018 to Present

PySpark/Hadoop Developer

Responsibilities:

Migrated legacy applications to the Big Data cluster and its ecosystem.

Transformed and analyzed data using PySpark and Hive based on ETL mappings.

Used RabbitMQ and MongoDB for receiving messages and storing documents.

Application performance tuning to optimize resource and time utilization.

Designed the application flow and implemented it end to end: gathering requirements, building code, performing testing, and deploying to production.

Developed PySpark programs, created data frames, and worked on transformations (a sketch follows at the end of this section).

Applied Spark transformations to source files to load the data into HDFS.

Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer.

Automated scripts using Oozie and implemented them in production.

Developed Automic scripts for scheduling Oozie and Sqoop jobs on a daily or weekly basis.

Worked in an agile environment with Jira, GitHub version control, and TeamCity for continuous builds.

Environment: Spark, Python, HDFS, Sqoop, Cloudera, PySpark, Jira, GitHub, TeamCity, RabbitMQ, Java services, PCF.
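
A hypothetical PySpark sketch of the file-to-HDFS flow described in this section; the source path, schema, business rules, and partition count are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("source-to-hdfs").getOrCreate()

src = spark.read.option("header", "true").csv("hdfs:///landing/source_system/feed.csv")

cleaned = (src
           .filter(F.col("record_status") == "ACTIVE")   # drop inactive records
           .withColumn("load_date", F.current_date())    # audit column
           .dropDuplicates(["record_id"]))               # dedupe on the key

# Repartition before the write to keep HDFS file sizes reasonable (a tuning choice)
cleaned.repartition(50).write.mode("append").parquet("hdfs:///curated/source_system/")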

Vantiv, Cincinnati, OH Feb 2018 to Aug 2018

Spark/Hadoop Developer

Responsibilities:

Ingested data from different sources into the BDA to build an enterprise Big Data warehouse.

Migrated legacy applications into Informatica IDQ, leveraging the Big Data cluster and its ecosystem.

Transformed and analyzed data using Spark, Hive, and Pig based on ETL mappings.

Used DataStage, Informatica BDM, and Exadata to perform ETL and prepare data lakes for various domains.

Extracted data from Teradata/Exadata to HDFS using Sqoop for the settlement and billing domains.

Application performance tuning to optimize resource and time utilization.

Designed the application flow and implemented it end to end: gathering requirements, building code, and performing testing.

Performed functional and regression testing in support of the quality of IT products for business users.

Developed Spark programs and created data frames for Hive tables (a sketch follows at the end of this section).

Applied Spark transformations to source tables to load the data into harmonized tables in Hive.

Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer.

Automated scripts using Oozie and implemented them in production.

Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark.
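
A sketch of how the harmonized Hive load described in this section might look in PySpark, assuming Hive support is enabled; the database, table, and column names are invented for illustration:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("harmonize-billing")
         .enableHiveSupport()
         .getOrCreate())

src = spark.table("staging.billing_raw")   # source table landed via Sqoop

harmonized = (src
              .withColumn("amount_usd", F.col("amount").cast("decimal(18,2)"))
              .withColumn("load_dt", F.current_date()))

# Append into the harmonized Hive table, stored as Parquet
(harmonized.write
 .mode("append")
 .format("parquet")
 .saveAsTable("harmonized.billing"))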

VISA, Foster City June 2017 to Feb 2018

Spark/Hadoop Developer

Responsibilities:

• Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.

• Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop and Spark with Cloudera distribution.

• Hands-on experience with Cloudera Hue to import data through the graphical user interface.

• Developed Spark Applications by using Scala and Implemented Apache Spark data processing project to handle data from various RDBMS sources.

• Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Datasets.

• Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (a sketch follows at the end of this section).

• Used HiveQL to analyze the partitioned and bucketed data, Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business specification logic.

• Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig.

• Worked with Log4j framework for logging debug, info & error data.

• Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.

• Developed Oozie coordinators to schedule Sqoop, Hive scripts to create Data pipelines.

• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

• Generated various kinds of reports using Power BI and Tableau based on Client specification.

• Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.

• Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Spark, Spark SQL, HDFS, Hive, Sqoop, Scala, Python, PySpark, Shell scripting, Linux, Oozie, Cloudera, Tableau.
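
A possible shape of the JSON-flattening job mentioned above; the nested field names (customer.id, items, event_ts) are made up for the example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

docs = spark.read.json("hdfs:///landing/events/*.json")

flat = (docs
        .withColumn("item", explode(col("items")))   # one row per array element
        .select(col("customer.id").alias("customer_id"),
                col("event_ts"),
                col("item.sku"),
                col("item.qty")))

# Emit a delimited flat file for downstream consumers
flat.write.mode("overwrite").option("header", "true").csv("hdfs:///curated/events_flat")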

National Oilwell Varco Feb 2016 to May 2017

Spark/Hadoop Lead Developer

Responsibilities:

Brought structured data from MySQL into HDFS using Sqoop.

Wrote code in Scala for different transformations in Spark and created RDDs from the results.

Integrated Spark with Kafka to get messages from the Kafka servers.

Worked on consuming Kafka messages in Spark and created DStreams for data aggregations (a sketch follows at the end of this section).

Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.

Applied Hive views to obtain the different types of transactions as required.

Created RDDs to transform the data arriving from different data sources.

Developed Spark accumulators to analyze the data recursions written into the data transformations.

Created Kafka producers to get data from different servers and publish it to topics.

Maintained the Kafka streaming flow.

Worked with broadcast variables to hold shared data in a single variable used across the entire process.

Used persist and cache to keep the required RDDs in memory.

Developed shell script to pull data from HDFS and apply the incremental and full load to the Hive tables.

Responsible for preparing and presenting data metrics on users, based on age, demographics, and other user criteria, as input to senior management.

Experienced with Shell Scripting and Linux Architecture

Set up monitoring shell scripts that helped prevent financial loss for the client.

Created shell Scripts to log failed transactions and find their root cause.

Uploaded the processed data into SAP BO for report generation.

Analyzed customer buying patterns using customer logs in JSON format.

Sent the analyzed data to HDFS for further use.

Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.

Developed data requirements, performed database queries to identify test data, and to create data procedures with expected results.

Environment: Hive, Spark, Scala, Spark SQL, HDFS, SAP BO, Sqoop, Kafka, Spark Streaming, Cloudera
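
A rough sketch of the direct Kafka DStream flow described in this section, assuming Spark 2.x with the spark-streaming-kafka-0-8 package; broker addresses, topic name, and record layout are placeholders:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="txn-stream")
ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["transactions"],
    kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"})

# Each record arrives as a (key, value) pair; count transactions per branch per batch,
# assuming the branch id is the third field of a comma-separated value
counts = (stream
          .map(lambda kv: (kv[1].split(",")[2], 1))
          .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()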

Client: UHG, India Jul 2014 to Dec 2015

Spark/Hadoop Lead developer

Responsibilities:

Responsible for writing Spark jobs to handle files in multiple formats (JSON, Text, and Avro)

Created external and managed tables as per the requirements.

Extensively used Korn shell scripts to manipulate the flat files provided by the share brokers.

Prepared ETL standards and naming conventions and wrote ETL flow documentation for Stage, ODS, and Mart.

Wrote custom support modules for upgrade implementation using PL/SQL and Unix shell scripts.

Developed Sqoop scripts to import and export data from and to relational sources by handling incremental data loading on the customer transaction data by date.

Worked with Spark accumulators, which are variables that are only “added” to, such as counters and sums, to analyze the data transformations (a sketch follows at the end of this section).

Wrote code in Python for different transformations in Spark and created RDDs from the results.

Used persist and cache to keep the required RDDs in memory for reuse in later transformations.

Worked on different RDDs to transform the data coming from different sources into the required format.

Developed different actions in Spark to retrieve the results from the data sources in the required transformed format.

Developed Spark code using Python scripting to analyze the data coming from different sources.

Worked on Scala code for different transformations and actions.

Created data frames in Spark SQL from data in HDFS, applied transformations, analyzed the data, and stored it back into HDFS.

Developed transformations and actions using Python.

Integrated Spark with the Hadoop ecosystem and stored the data in the Hadoop Distributed File System (HDFS).

Worked extensively on creating combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.

Worked with broadcast variables to hold shared data in a single variable used across the entire process.

Involved in loading and transforming large sets of structured and semi-structured data from databases into HDFS using Sqoop imports.

Worked on data serialization formats for converting complex objects into sequences of bits using Avro, JSON, and CSV formats.

Stayed current with developments in Hadoop and worked with multiple data sources.

Imported and exported different kinds of data (incremental, updated, and column-based) between RDBMS and Hive.

Environment: Spark, Python, HDFS, Spark SQL, Oracle, PySpark, Kafka, Tableau, Impala, Hive, Hortonworks
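
An illustrative PySpark sketch combining an accumulator, a broadcast variable, and persist(), as described in this section; the lookup map, file path, and record layout are assumptions:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("acc-bcast-demo").getOrCreate()
sc = spark.sparkContext

bad_records = sc.accumulator(0)                               # counter that is only "added" to
state_lookup = sc.broadcast({"TX": "Texas", "OH": "Ohio"})    # small shared lookup map

lines = sc.textFile("hdfs:///landing/members/*.txt").persist(StorageLevel.MEMORY_AND_DISK)

def parse(line):
    parts = line.split("|")
    if len(parts) < 3:
        bad_records.add(1)                                    # track malformed rows
        return None
    member_id, state_code, plan = parts[:3]
    return (member_id, state_lookup.value.get(state_code, "UNKNOWN"), plan)

parsed = lines.map(parse).filter(lambda r: r is not None)
print("rows:", parsed.count(), "bad records:", bad_records.value)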

Infor India Private Limited Nov 2012 to Jun 2014

Java Developer

Responsibilities:

Involved in the development and maintenance of the product and fixed issues.

Implemented a new functional module using J2EE and a customized framework (OA).

Developed new screens using JSP and Servlets.

Customized the business module using EJB and Java.

Worked on database table creation and transformations using SQL.

Good experience with SQL/PL-SQL.

Involved in SCRUM meetings; developed and fixed issues.

Developed Framework Manager models and analyzed them in Analysis Studio.

Fixed standard issues and client-generated issues.

Involved in maintaining and developing the metadata model using Framework Manager.

Installed and configured applications.

Environment: Java, Open Architecture, Linux, Core Java (OOPs and collections), J2EE Framework, JSP, Servlets, ANT, Maven, Git, JavaScript, Shell scripting, Oracle SQL

Principia SFP, India Jan 2011 to Oct 2012

Java Developer

Responsibilities:

Analysis of the specifications provided by the clients.

Involved in bug fixes as well as enhancements to the existing project.

Prepared the high-level design as per the requirements.

Worked on exception handling using Core Java

Worked on multithreading in Java

Worked on SQL queries for database development and alterations.

Worked on stored procedures using SQL.

Proficiency in SQL/PL-SQL.

Developed the application and performed unit testing.

Planned and scheduled various releases based on customer requirements.

Environment: Core Java, C++, SQL, Tortoise SVN, JDBC, Hibernate, J2EE

Luensen Technologies, India Jan 2009 to Oct 2010

PL/SQL Programmer

Responsibilities:

Worked on building up the database in Oracle

Created data structures, i.e., tables and views, and applied referential integrity.

Worked as an administrator and assigned rights to the users, groups for accessing the database.

Responsible for creating and modifying the PL/SQL Procedure, Function, Triggers according to the business requirement.

Created Indexes, Sequences and Constraints.

Created Materialized views for summary tables for better Query performance.

Identified source systems, their connectivity, and related tables and fields, and ensured data consistency for mapping.

Worked closely with users, decision makers to develop the Transformation Logic to be used in Informatica Power Center.

Converted the business rules into technical specifications for ETL process for populating fact and dimension table of data warehouse.

Created mappings, transformations using Designer, and created sessions using Workflow Manager.

Created staging tables to do validations against data before loading data into original fact and dimension tables.

Involved in loading large amounts of data using utilities such as SQL Loader.

Designed and developed Oracle Reports for the analysis of the data.

Environment: Visual Basic 6.0, Oracle 8i, PL/SQL, Crystal Reports 6, Erwin, Windows NT


