Data Developer

Location:
Hyderabad, Telangana, India
Salary:
$15000
Posted:
April 06, 2020

Resume:

Karthik Potharaju

Sr. Hadoop/Big data Developer

Email: adcofj@r.postjobfree.com

Ph: 601-***-****

LinkedIn:

Professional Summary

Over 8 years of overall IT experience across a variety of industries, including around 5 years of hands-on experience with Big Data technologies (Hadoop 1.0 and 2.0) and in designing and implementing MapReduce (MR1 and MR2) architectures.

Well versed in installing, configuring, supporting, and managing Big Data workloads and the underlying infrastructure of a Hadoop cluster.

Good knowledge of Hadoop development and core components such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.

Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.

Experience installing, configuring, managing, supporting, and monitoring Hadoop clusters on distributions such as Apache Hadoop and Cloudera.

Experience analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.

Involved in project planning and in setting implementation and design standards for Hadoop-based applications.

Wrote MapReduce programs with custom logic and developed custom UDFs in Pig and Hive based on user requirements.
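
The Pig and Hive UDFs mentioned above were presumably written against their Java UDF APIs; as an illustrative analogue only, here is a minimal PySpark UDF sketch, where the column name and normalization rule are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Hypothetical rule: normalize free-text phone numbers to digits only.
def normalize_phone(raw):
    return "".join(ch for ch in (raw or "") if ch.isdigit())

normalize_phone_udf = udf(normalize_phone, StringType())

df = spark.createDataFrame([("601-555-0100",), ("(601) 555 0101",)], ["phone"])
df.withColumn("phone_clean", normalize_phone_udf("phone")).show()
```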

Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).

Implemented NoSQL databases such as HBase, Cassandra, and MongoDB for storing and processing data in different formats.

Implemented Oozie for writing workflows and scheduling jobs; wrote Hive queries for data analysis and to prepare data for visualization.

Installed Spark and analyzed HDFS data, caching datasets in memory to perform a wide variety of complex computations interactively.

Experience importing data in different formats from various RDBMS databases into HDFS and HBase, and exporting it back.

Developed applications using Spark for data processing.

Knowledge of performance tuning for PySpark scripts.

Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis.

Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.

Experience developing ETL data pipelines using PySpark.
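
A minimal sketch of the kind of PySpark ETL pipeline described above; the paths, column names, and cleansing rules are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV files landed on HDFS (path is illustrative).
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/transactions")

# Transform: basic cleansing and typing (hypothetical columns).
clean = (raw
         .dropna(subset=["txn_id"])
         .withColumn("amount", F.col("amount").cast("double"))
         .filter(F.col("amount") > 0))

# Load: write curated data back to HDFS as Parquet, partitioned by date.
(clean.write
 .mode("overwrite")
 .partitionBy("txn_date")
 .parquet("hdfs:///data/curated/transactions"))
```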

Experience in Java, Python, and other languages; experienced in installing and setting up Hadoop environments in the cloud through Amazon Web Services (AWS) offerings such as EMR and EC2 for efficient data processing.

Strong experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.

Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.

Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.

Experienced with Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities with code written in Scala.

Experience with middleware architectures built on Java technologies such as J2EE and Servlets, and application servers such as WebSphere and WebLogic.

Used different Spark modules such as Spark Core, RDDs, DataFrames, and Spark SQL.

Converted various Hive queries into the required Spark transformations and actions.
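
As an illustration of converting a Hive query into Spark DataFrame transformations and actions, here is a hedged sketch; the table and column names are hypothetical, and Hive support is assumed to be enabled on the SparkSession:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe")
         .enableHiveSupport()
         .getOrCreate())

# HiveQL being replaced (illustrative):
#   SELECT customer_id, SUM(amount) AS total
#   FROM orders GROUP BY customer_id HAVING SUM(amount) > 1000;
orders = spark.table("orders")                      # hypothetical Hive table
totals = (orders                                    # transformations are lazy
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total"))
          .filter(F.col("total") > 1000))
totals.show()                                       # the action triggers execution
```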

Experience working on the open-source Apache Hadoop distribution with technologies including HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB, and Mesos.

Technical Skills

Hadoop Components

HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark

Spark Components

Apache Spark, DataFrames, Spark SQL, Spark on YARN, pair RDDs

Web Technologies / Other components

J2EE, XML, Log4j, HTML, CSS, JavaScript

Server Side Scripting

UNIX Shell Scripting.

Databases

Oracle 10g, Microsoft SQL Server, MySQL, DB2, Teradata

Programming Languages

Java, C, C++, Scala, Impala, Python.

Web Servers

Apache Tomcat, BEA WebLogic.

IDE

Eclipse, Dreamweaver

OS/Platforms

Windows 2005/2008, Linux (All major distributions), Unix.

NoSQL Databases

HBase, MongoDB.

Methodologies

Agile (Scrum), Waterfall, UML, Design Patterns, SDLC.

Currently Exploring

Apache Flink, Drill, Tachyon.

Conduent, Madison, MS May 2017 – Till date

Sr. Hadoop / Spark Developer

Responsibilities:

Developed simple to complex MapReduce streaming jobs in Java for processing and validating data.

Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.

Developed MapReduce and Spark jobs to discover trends in data usage by users.

Implemented Spark using Python and Spark SQL for faster processing of data.
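
A minimal sketch of using Spark SQL from Python as described above; the dataset path, view name, and query are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-sketch").getOrCreate()

# Load semi-structured data from HDFS (path is illustrative).
events = spark.read.json("hdfs:///data/events")
events.createOrReplaceTempView("events")

# Run a SQL query over the registered temporary view.
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")
daily.show()
```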

Developed functional programs in Scala to connect to the streaming data application, gather web data in JSON and XML, and pass it to Flume.

Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.

Used the Spark-Cassandra Connector to load data to and from Cassandra.
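
A hedged sketch of reading from and writing to Cassandra with the Spark-Cassandra Connector DataFrame API; the host, keyspace, and table names are assumptions, and the connector package (e.g. com.datastax.spark:spark-cassandra-connector) must be available on the cluster:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cassandra-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # illustrative host
         .getOrCreate())

# Read a Cassandra table into a DataFrame.
users = (spark.read
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="analytics", table="users")   # hypothetical keyspace/table
         .load())

# Write results back to another Cassandra table.
(users.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="analytics", table="users_enriched")
 .mode("append")
 .save())
```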

Streamed data in real time using Spark with Kafka.
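
The Kafka work above may have used Spark Streaming (DStreams) or Structured Streaming; below is a Structured Streaming sketch with illustrative broker, topic, and checkpoint values:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Read a stream of messages from a Kafka topic (broker and topic names are illustrative).
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers key/value as binary; cast the payload to string before downstream parsing.
messages = stream.select(F.col("value").cast("string").alias("json_payload"))

# Write the stream out, here simply to the console for a quick check.
query = (messages.writeStream
         .format("console")
         .outputMode("append")
         .option("checkpointLocation", "/tmp/checkpoints/events")  # illustrative path
         .start())
query.awaitTermination()
```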

Experienced in working with Amazon Web Services (AWS) EC2 and S3 from Spark RDDs.

Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the transformed data back into HDFS.

Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.

Configured other ecosystem components such as Hive, Sqoop, Flume, Pig, and Oozie.

Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.

Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
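
A sketch of the kind of metric computation over a partitioned Hive table described above, run through Spark SQL; the table, columns, and partition value are hypothetical:

```python
from pyspark.sql import SparkSession

# Hive support is required so the partitioned/bucketed table metadata is visible.
spark = (SparkSession.builder
         .appName("hive-metrics-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Filtering on the partition column lets the engine prune partitions.
metrics = spark.sql("""
    SELECT region,
           COUNT(DISTINCT user_id) AS active_users,
           AVG(session_secs)       AS avg_session
    FROM user_events
    WHERE event_date = '2017-06-01'      -- partition column, illustrative value
    GROUP BY region
""")
metrics.show()
```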

Developed Pig Latin scripts that execute as MapReduce jobs.

Developed product profiles using Pig and commodity UDFs.

Worked on scalable distributed data system using Hadoop ecosystem.

Developed Hive scripts in HiveQL to de-normalize and aggregate the data.

Created HBase tables and column families to store the user event data.

Wrote automated HBase test cases for data quality checks using HBase command-line tools.

Created UDFs to store specialized data structures in HBase and Cassandra.

Created and configured AWS RDS/Redshift to work with the Hadoop ecosystem on AWS infrastructure.

Scheduled and executed workflows in Oozie to run Hive and Pig jobs.

Used Impala to read, write, and query Hadoop data stored in HDFS, HBase, or Cassandra.

Used Tez framework for building high performance jobs in Pig and Hive.

Configured Kafka to read and write messages from external programs.

Configured Kafka to handle real time data.

Developed end-to-end data processing pipelines, from receiving data through the distributed messaging system Kafka to persisting it in HBase.

Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.

Wrote a Storm topology to emit data into Cassandra.

Wrote a Storm topology to accept data from a Kafka producer and process it.

Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Worked extensively on importing metadata into Hive and migrating existing tables and applications to Hive and the AWS cloud.

Developed interactive shell scripts for scheduling various data cleansing and data loading process.

Performed data validation on ingested data using MapReduce by building a custom model to filter out invalid records and cleanse the data.

Experience with data wrangling and creating workable datasets.

Developed schemas to handle reporting requirements using Jaspersoft.

Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Zookeeper, Kafka, Flume, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera manager, MySQL, Jaspersoft, Multi-node cluster with Linux-Ubuntu, Windows, Unix.

ADT, Boca Raton, FL Aug 2016 – May 2017

Sr. Hadoop Developer/Big data developer

Responsibilities:

Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis

Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and Sqoop.

Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.

Designed and developed Informatica BDE applications and Hive queries to ingest data into the landing/raw zone, transform it with business logic into the refined zone, and load it into Greenplum data marts for the reporting layer, consumed through Tableau.

Installed, configured, and maintained big data technologies and systems. Maintained documentation and troubleshooting playbooks.

Automated the installation and maintenance of Kafka, Storm, ZooKeeper, and Elasticsearch using SaltStack.

Developed connectors for Elasticsearch and Greenplum to transfer data from a Kafka topic. Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStreams) applications in Java for real-time data processing.

Responded to and resolved access and performance issues. Used Spark API over Hadoop to perform analytics on data in Hive.

Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Spark on YARN.

Imported and exported data into HDFS and Hive using Sqoop, and developed a POC on Apache Spark and Kafka. Proactively monitored performance and assisted in capacity planning.

Worked on the Oozie workflow engine for job scheduling. Imported and exported data into MapReduce and Hive using Sqoop.

Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS. Good understanding of performance tuning with NoSQL and SQL technologies.

Knowledge of performance tuning for PySpark scripts.
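
A few common PySpark tuning techniques, sketched under assumed paths and column names: caching a reused DataFrame, broadcasting a small join side, and controlling partitioning before a write.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

facts = spark.read.parquet("hdfs:///data/facts")   # illustrative paths
dims = spark.read.parquet("hdfs:///data/dims")

# Cache a DataFrame that several downstream actions will reuse.
facts.cache()

# Broadcast the small dimension table to avoid a shuffle-heavy join.
joined = facts.join(F.broadcast(dims), "dim_id")

# Control partition count and layout before writing to even out file sizes.
(joined.repartition(200, "dim_id")
 .write
 .mode("overwrite")
 .parquet("hdfs:///data/joined"))
```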

Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.

Worked on data transformation pipelines such as Storm. Worked with operational analytics and log management using ELK and Splunk. Assisted teams with SQL and MPP databases such as Greenplum.

Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS).

Environment: Java, Confluent Kafka, HDFS, Storm, Elasticsearch, Salt scripting, Greenplum, Kafka Streams (KStreams/KTables), Splunk, Hadoop, PySpark.

Continental North American, Chicago, IL Jan 2015 – Aug 2016

Hadoop Developer

Responsibilities:

Involved in loading data from UNIX file system to HDFS. Imported and exported data into HDFS and Hive using Sqoop.

Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.

Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.

Analyzed large amounts of data sets to determine optimal way to aggregate and report on it. Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.

Responsible for building scalable distributed data solutions using Hadoop. Worked hands on with ETL process.

Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.

Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

Extracted data from Teradata into HDFS using Sqoop. Analyzed the data by running Hive queries and Pig scripts to understand user behavior segments such as shopping enthusiasts, travelers, and music lovers.

Exported the analyzed patterns back into Teradata using Sqoop. Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Installed the Oozie workflow engine to run multiple Hive jobs. Developed Hive queries to process the data and generate data cubes for visualization.

Environment: Hive, Pig, Apache Hadoop, Cassandra, Sqoop, Big Data, HBase, ZooKeeper, Cloudera, CentOS, NoSQL, Sencha Ext JS, JavaScript, Ajax, Hibernate, JMS, WebLogic Application Server, Eclipse, Web Services, Azure, Project Server, Unix, Windows.

Intense Technologies, India June 2012 – Dec 2014

Java Developer

Responsibilities:

Individually worked on all the stages of a Software Development Life Cycle (SDLC).

Used JavaScript code, HTML and CSS style declarations to enrich websites.

Implemented the application using Spring MVC Framework which is based on MVC design pattern.

Developed application service components and configured beans using Spring IoC (applicationContext.xml).

Designed User Interface and the business logic for customer registration and maintenance.

Integrating Web services and working with data in different servers.

Involved in designing and Development of SOA services using Web Services.

Understanding the requirements from business users and end users.

Working with XML/XSLT files.

Experience creating UML class and sequence diagrams.

Experience creating tables, views, triggers, indexes, constraints, and functions in SQL Server 2005.

Worked in content management for versioning and notifications.

Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.


