G.PRAVEENKR
HADOOP/SPARK DEVELOPER
*************@*****.*** +1-779-***-****
OBJECTIVE:
oTo apply my technical and management skills toward achieving targets and delivering the best performance, and to bring innovative ideas, skills, and creativity to accomplishing projects.
SUMMARY
o3+ years of overall experience in the IT industry, including Hadoop administration, Big Data technologies, and multi-tiered web applications using Hadoop, Spark, Hive, HBase, Pig, Sqoop, and Kafka.
oStrong technical skills and working knowledge of Core Java.
oHands-on experience in programming languages such as C and C++ for building new applications.
oC++ developer with experience in object-oriented analysis and design (OOAD).
oExperience in managing and reviewing Hadoop log files.
oBuilt a Data Quality framework consisting of a common set of model components and patterns that can be extended to implement complex process controls and data quality measurements using Hadoop.
oImplemented Spark Scala code for data validation in Hive (a minimal sketch follows this summary).
oWorking experience with the Hortonworks distribution and the Cloudera Hadoop distribution versions CDH4 and CDH5 for executing the respective scripts.
oExtracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the output in Parquet format in HDFS.
oExperience in setting up Hive, Pig, HBase, and Sqoop on the Linux operating system.
oExperience with Apache Oozie for scheduling and managing Hadoop jobs. Extensive experience with Amazon Web Services (AWS).
oWorked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
oManaged and scheduled jobs on a Hadoop cluster using Airflow DAGs.
oLoaded data into Spark RDDs and performed in-memory computation to generate the output response.
oWorked on extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
oHands-on experience in loading data from the UNIX file system to HDFS, including parallel data loads.
oWorked with Agile methodology, participating in daily Scrum meetings and Sprint planning, and used development process tools such as Jira.
oExperience with procedures, functions, packages, views, materialized views, function-based indexes, triggers, dynamic SQL, and ad-hoc reporting using SQL.
oInvolved in all Software Development Life Cycle (SDLC) phases of the project, from domain knowledge sharing and requirement analysis to system design, implementation, and deployment.
oGood experience working on Linux and Windows operating systems.
oProvided high-level customer support to remote clients using a support e-ticketing system.
oCollected, organized, and documented infrastructure project attributes, data, and project metrics.
oPerformed testing and installed, configured, and troubleshot various software programs. Wrote, modified, and maintained software documentation and specifications.
oProcessed data load requests, manually entered data, reconciled data conflicts, and created data extracts and reports.
oExperience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
oExperienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
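To illustrate the Hive data-validation work noted above, the following is a minimal Spark Scala sketch only; the table name customer_db.customers and the columns customer_id and order_amount are hypothetical placeholders, not the actual project schema.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveDataValidation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled Spark session; assumes Hive configuration is available on the classpath
    val spark = SparkSession.builder()
      .appName("HiveDataValidation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table; replace with the real database.table
    val df = spark.table("customer_db.customers")

    // Simple validation rules: non-null keys and sane value ranges
    val nullKeys   = df.filter(col("customer_id").isNull).count()
    val badAmounts = df.filter(col("order_amount") < 0).count()

    println(s"Rows with null customer_id: $nullKeys")
    println(s"Rows with negative order_amount: $badAmounts")

    spark.stop()
  }
}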
TECHNICAL SKILLS:
Programming Languages : Java, Scala, Unix Shell Scripting
Big Data Ecosystem : HDFS, HBase, MapReduce, Hive, Pig, Spark, Kafka, Sqoop, Impala, Cassandra, Oozie, ZooKeeper, Flume
DBMS : Oracle 11g, MySQL
Modeling Tools : UML on Rational Rose 4.0
Web Technologies : HTML5, CSS3
IDEs : Eclipse, NetBeans, WinSCP, Visual Studio, and IntelliJ
Operating Systems : Windows, UNIX, Linux (Ubuntu), Solaris, CentOS
Servers : Apache Tomcat
Frameworks : MVC, Maven, ANT
PROFESSIONAL EXPERIENCE
Client: TSC, Brentwood, TN
Role: Internship
Project Description:
Tractor Supply is a retail chain of stores that offers products for home improvement, agriculture, lawn and garden maintenance, and livestock, equine, and pet care. It is the leading USA retailer in its market. Founded in 1939, it operates 1,700 stores in 49 states and has revenue of US $1.7 billion.
Responsibilities:
oDeveloped solutions to ingest data into HDFS (Hadoop Distributed File System), process it within Hadoop, and emit summary results from Hadoop to downstream systems.
oManaged and reviewed Hadoop log files.
oInvolved in loading data from UNIX/LINUX file system to HDFS.
oImported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
oDeveloped Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
oUsed different Spark modules such as Spark Core, Spark SQL, Spark Streaming, Datasets, and DataFrames.
oUsed Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
oWorked on different file formats such as text files, Parquet, SequenceFiles, Avro, Record Columnar (RC), and ORC files.
oExperienced in implementing Spark RDD transformations and actions to perform business analysis.
oUnderstood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
oInvolved in Agile methodologies, daily Scrum meetings, Sprint planning.
oDeveloped Kafka producer and consumer components for real time data processing.
oExtracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the output in Parquet format in HDFS (see the sketch after this project section).
oConsumed data from Kafka queues using Spark. Configured different topologies for the Spark cluster and deployed them on a regular basis.
oTested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
oUsed Maven for building and deployment purpose.
Environment: Hadoop, MapReduce, HDFS, Spark, AWS, Hive, Java, Scala, Kafka, SQL, Pig, Sqoop, HBase, Zookeeper, MySQL, Jenkins, Git, Agile.
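The Kafka-to-Parquet bullet above could look roughly like the following Spark Scala sketch using the DStream API; the broker address, topic name, group id, and HDFS path are hypothetical placeholders rather than the actual project configuration.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()
    val ssc   = new StreamingContext(spark.sparkContext, Seconds(30))

    // Hypothetical broker and topic; substitute the real values
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "feed-consumer",
      "auto.offset.reset"  -> "latest"
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("sensor-feed"), kafkaParams))

    // For each micro-batch: RDD -> DataFrame -> Parquet in HDFS
    stream.foreachRDD { rdd =>
      import spark.implicits._
      val df = rdd.map(record => record.value()).toDF("raw_event")
      df.write.mode(SaveMode.Append).parquet("hdfs:///data/feed/parquet")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}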
Client: Rythmos, India
Role: Hadoop Developer
Project Description:
Rythmos is a business and technology consulting firm specializing in data, integration, loyalty, marketing, and big data. Its experts in the collective tools, technologies, and processes can transform an organization into an agile, real-time, data-driven enterprise built on a scalable big data technology stack, leveraging an adaptive integration layer to enable quick and easy multi-channel solutions.
Responsibilities:
oResponsible for building scalable distributed data solutions using Hadoop.
oThis project downloads the data generated by sensors from car activities; the data is collected into the HDFS system from online aggregators by Kafka.
oExperience in creating Kafka producers and consumers for Spark Streaming, which collects data from the patients' different learning systems.
oSpark Streaming collects this data from Kafka in near-real-time and performs necessary transformations and aggregation on the fly to build the common learner data model.
oUsed Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
oExperience in AWS, spinning up EMR clusters to process huge volumes of data stored in S3 and pushing it to HDFS. Implemented automation and related integration technologies.
oImplemented Spark SQL to access Hive tables from Spark for faster data processing (a minimal sketch follows this project section).
oInvolved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
oUsed Apache Oozie for scheduling and managing the Hadoop Jobs. Extensive experience with Amazon Web Services (AWS).
oDeveloped and updated social media analytics dashboards on regular basis.
oMonitored workload, job performance and capacity planning using Cloudera Manager.
oWorked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance. Involved in moving all log files generated from various sources to HDFS through Flume for further processing, and processed the files using Piggybank functions.
oUsed Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it into HDFS. Used Flume to stream log data from various sources.
oUsed the Avro file format compressed with Snappy for intermediate tables for faster data processing. Used the Parquet file format for published tables and created views on the tables.
oCreated Sentry policy files to give business users access to the required databases and tables in Impala across the dev, test, and prod environments.
Environment: Hadoop, MapReduce, Cloudera, Spark, Kafka, HDFS, Hive, Pig, Oozie, Scala, Eclipse, Flume, Oracle, UNIX Shell Scripting.
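As a rough illustration of the Spark SQL / Hive work described above, this sketch reads a Hive table in Spark and expresses the same aggregation both as a Spark SQL query and as DataFrame transformations; the database, table, and column names (logs_db.web_logs, event_date) are hypothetical placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryToSpark")
      .enableHiveSupport()   // read Hive tables directly from Spark
      .getOrCreate()

    // Original Hive-style query run through Spark SQL
    val byDaySql = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM logs_db.web_logs
        |GROUP BY event_date""".stripMargin)

    // Equivalent expressed as DataFrame transformations
    val byDayDf = spark.table("logs_db.web_logs")
      .groupBy("event_date")
      .agg(count(lit(1)).as("events"))

    byDaySql.show()
    byDayDf.show()
    spark.stop()
  }
}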
EDUCATION:
oMaster of Science in Computer Technology, Eastern Illinois University, Charleston, IL.
oBachelor of Technology in Computer Science from Jawaharlal Nehru Technological University, India.