
Hadoop Developer

Location: Montreal, QC, Canada
Posted: October 07, 2021


Krishna Prathi, Big Data Developer

adoydx@r.postjobfree.com

647-***-****

Professional Summary:

5+ years of professional experience in project development, implementation, deployment, and maintenance of complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, Scala, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, and HBase. System architect with expertise in coding, application design, defining architectures, and successful project leadership.

Research, experiment with, and utilize leading Big Data technologies such as Hadoop, Apache Spark, Redshift, Netezza, Pig, Hive, and Microsoft Azure.

Experience working with various Hadoop distributions, including Cloudera (CDH 4/CDH 5), MapR, and Hortonworks, and knowledge of Amazon EMR.

Hands-on experience with Enterprise Data Lakes supporting use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing structured and unstructured data.

Good knowledge of Scala's functional programming techniques such as anonymous functions (closures), currying, higher-order functions, and pattern matching.
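A minimal, self-contained sketch of these features; all names and data are hypothetical and chosen only for illustration:

```scala
object FunctionalBasics extends App {

  // Anonymous function that closes over `threshold` from the enclosing scope (a closure).
  val threshold = 10
  val aboveThreshold: Int => Boolean = n => n > threshold

  // Curried function: supplying the first argument list returns another function.
  def scale(factor: Int)(n: Int): Int = factor * n
  val double: Int => Int = scale(2)

  // Higher-order function: `transform` takes a function as a parameter.
  def transform(xs: List[Int], f: Int => Int): List[Int] = xs.map(f)

  // Pattern matching on a sealed hierarchy.
  sealed trait Event
  case class Click(page: String)      extends Event
  case class Purchase(amount: Double) extends Event

  def describe(e: Event): String = e match {
    case Click(page)              => s"click on $page"
    case Purchase(a) if a > 100.0 => "large purchase"
    case Purchase(_)              => "purchase"
  }

  println(transform(List(3, 7, 12), double).filter(aboveThreshold)) // List(14, 24)
  println(describe(Purchase(250.0)))                                // large purchase
}
```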

Developed Python and PySpark programs for data analysis on MapR, Cloudera, and Hortonworks Hadoop clusters.

Worked on Jenkins for continuous integration and end-to-end automation of builds and deployments, managing plugins such as Maven and Ant.

Strong working experience in planning and carrying out Teradata extraction using Informatica, loading processes, data warehousing, large-scale database management, and re-engineering.

Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
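As a brief illustration of how the MapReduce paradigm is expressed through Spark's RDD API, a classic word count; the HDFS paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Word count: the map/reduce paradigm expressed through Spark's RDD API.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("word-count").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///data/sample/input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))                                  // map phase: emit words
      .map(word => (word, 1))                                    // key/value pairs, as in MapReduce
      .reduceByKey(_ + _)                                        // reduce phase: sum per key

    counts.saveAsTextFile("hdfs:///data/sample/word_counts")     // hypothetical output path
    spark.stop()
  }
}
```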

Experienced in data architecture, including data-ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.

Strong experience working with databases such as Oracle 10g, DB2, SQL Server 2016, and MySQL, and proficiency in writing complex SQL queries.

Experienced in designing, building, and deploying a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.

Technical Skills:

Bigdata/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper, and Oozie

Languages: Python (Scikit-learn, NumPy, Pandas, Seaborn), R (ggplot2), Hive, Impala, Spark

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView

Core Skills: Data Analysis, Web Analytics, Data Visualization, Business Analysis, Business Intelligence, Hadoop, Big Data

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0

Databases: Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza

EDUCATION:

Bachelor’s in Electronics and Communication Engineering, May 2010 – April 2014

Pondicherry University, Pondicherry, India.

PROFESSIONAL EXPERIENCE:

Client: Aviva Insurance Canada, Toronto, ON May 2020 – Present
Role: Big Data Developer

Roles & Responsibilities:

Responsible for creating scripts/jobs to migrate data from Amazon S3 to the Hadoop platform and vice versa.
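A sketch of how such a migration job could look in Spark; the bucket, paths, and format are illustrative, and s3a credentials are assumed to come from the cluster configuration:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Illustrative Spark job that copies a dataset from S3 into HDFS and back.
object S3HdfsMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-hdfs-migration").getOrCreate()

    // S3 -> HDFS
    spark.read.parquet("s3a://example-bucket/claims/2020/")
      .write.mode(SaveMode.Overwrite)
      .parquet("hdfs:///data/raw/claims/2020/")

    // HDFS -> S3 (the reverse direction mentioned above)
    spark.read.parquet("hdfs:///data/curated/claims/2020/")
      .write.mode(SaveMode.Overwrite)
      .parquet("s3a://example-bucket/curated/claims/2020/")

    spark.stop()
  }
}
```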

Responsible for developing and implementing end-to-end solutions using Hadoop, Spark, Flume, Hive, Pig, Sqoop, Cassandra, HBase, MongoDB, ZooKeeper, and AWS.

Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis.

Monitored the cluster for performance, networking, and data integrity issues, and troubleshot MapReduce job failures by inspecting and reviewing log files.

Developed ETL applications using Hive, Spark, Impala, and Sqoop, automated with Oozie. Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data on HDFS.
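The join-and-pre-aggregate stage described above follows a common shape; it is sketched here in Spark/Scala rather than Pig Latin to keep the examples in one language, with hypothetical paths, schemas, and column names:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

// Event join plus pre-aggregation before landing the result on HDFS.
object EventEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-etl").getOrCreate()

    val events   = spark.read.json("hdfs:///data/raw/events/")       // e.g. click/quote events
    val policies = spark.read.parquet("hdfs:///data/raw/policies/")  // reference data

    val daily = events
      .join(policies, Seq("policy_id"))                    // event join on a shared key
      .withColumn("event_date", to_date(col("event_ts")))  // light transformation
      .groupBy("event_date", "policy_type")                // pre-aggregation
      .agg(count("*").as("event_count"))

    daily.write.mode(SaveMode.Overwrite)
      .partitionBy("event_date")
      .parquet("hdfs:///data/curated/daily_event_counts/")

    spark.stop()
  }
}
```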

Hands-on experience in Python and PySpark programming on Cloudera, Hortonworks, and MapR Hadoop clusters, AWS EMR clusters, AWS Lambda functions, and AWS CloudFormation templates (CFTs).

Created detailed AWS security groups, which act as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.

Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm, and ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
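A sketch of a streaming ingest path using Spark Structured Streaming (one of the stacks named above); the broker list, topic, and output/checkpoint locations are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Streaming ingest of Kafka events into HDFS with Spark Structured Streaming.
object StreamingIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-ingest").getOrCreate()

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "web-events")
      .load()

    // Keep the key/value payload and event timestamp as strings for downstream parsing.
    val events = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/web_events/")
      .option("checkpointLocation", "hdfs:///checkpoints/web_events/")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    query.awaitTermination()
  }
}
```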

Created Hive external tables, loaded data into them, and queried the data using HQL; worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
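A sketch of the external-table pattern, with the HQL issued through spark.sql so the examples stay in one language; the database, schema, and HDFS location are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Creating and querying a Hive external table over data already landed on HDFS.
object HiveExternalTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS claims_db.claims_raw (
        claim_id     STRING,
        policy_id    STRING,
        claim_amount DOUBLE,
        claim_date   STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 'hdfs:///data/raw/claims/'
    """)

    // Query the external table with HQL.
    spark.sql("""
      SELECT policy_id, SUM(claim_amount) AS total_claims
      FROM claims_db.claims_raw
      GROUP BY policy_id
    """).show(20)

    spark.stop()
  }
}
```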

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Spark, Spark Streaming, R, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Python, Scala, PySpark, MapR, Zookeeper, Cloudera, Oracle, Kerberos, and RedHat 6.5.

Client: Scotiabank, Toronto, ON Mar 2019 – Mar 2020
Role: Big Data Developer

Roles & Responsibilities:

Designed and implemented workflow jobs using Talend and Unix/Linux scripting to perform ETL on the Hadoop platform.

Developed Java MapReduce programs using core concepts such as OOP, multithreading, collections, and I/O.

Gathered requirements, then developed and deployed a big data solution using BigInsights (IBM's Hadoop distribution).

Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.

Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Helped the team increase the cluster size from 35 to 118 nodes; the configuration of the additional data nodes was managed using Puppet.

Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.

Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.

Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux, NoSQL, and a variety of portfolios.

Involved in creating data models for customer data using Cassandra Query Language.
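A small illustration of a query-driven customer data model in CQL, executed here through the DataStax Java driver (3.x API); the contact point, keyspace, and columns are assumptions for the sketch:

```scala
import com.datastax.driver.core.Cluster

// Create a keyspace and a table designed around the query pattern:
// look up a customer's accounts by customer_id.
object CustomerModel {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS customer_ks
        |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}""".stripMargin)

    session.execute(
      """CREATE TABLE IF NOT EXISTS customer_ks.customer_accounts (
        |  customer_id uuid,
        |  account_id  uuid,
        |  account_type text,
        |  opened_on   date,
        |  PRIMARY KEY (customer_id, account_id)
        |)""".stripMargin)

    session.close()
    cluster.close()
  }
}
```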

Hands-on writing MapReduce code to convert semi-structured data into structured data and to insert data into HBase from HDFS.
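The per-record parse-and-insert step might look like the following, shown with the HBase client API from Scala rather than inside a full MapReduce job; the table, column family, and row-key layout are illustrative:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoader {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("user_events"))

    // One semi-structured log line, e.g. "u123|2018-02-01T10:15:00|login|mobile"
    val line = "u123|2018-02-01T10:15:00|login|mobile"
    val Array(userId, ts, action, channel) = line.split("\\|")

    // Structure the record into column-qualified cells keyed by user and timestamp.
    val put = new Put(Bytes.toBytes(s"$userId-$ts"))
    put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("action"),  Bytes.toBytes(action))
    put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("channel"), Bytes.toBytes(channel))
    table.put(put)

    table.close()
    connection.close()
  }
}
```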

Environment: Apache Hadoop 2.3, Hive, Pig, HDFS, Zookeeper, Kafka, Java, UNIX, MySQL, Eclipse, Oozie, Sqoop, Storm, MapReduce, YARN, Ambari.

Client: Wells Fargo, India Apr 2015 – May 2018

Role: Big Data/Hadoop Developer

Roles & Responsibilities:

Maintained data pipeline uptime while ingesting streaming and transactional data sources using Spark, Redshift, S3, and Python.

Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployments, configuration management, backup, and disaster recovery systems and procedures.

Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.

Used Flume to collect, aggregate, and store weblog data from sources such as web servers and network devices, and pushed it to HDFS.

Supported MapReduce programs and distributed applications running on the Hadoop cluster, and scripted Hadoop package installation and configuration to support fully automated deployments.

Wrote Java MapReduce programs to transform log data into a structured form for finding user location, age group, and time spent.

Exported the analyzed data to relational databases using Sqoop for visualization and for report generation by the BI team.

Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (Java MapReduce, streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs (Java programs and shell scripts).

Wrote Hive queries to analyze the data and generate the end reports used by business users. Experience with streaming toolsets such as Kafka, Flink, Spark Streaming, and NiFi/StreamSets.

Created Autosys JIL code to schedule jobs and to create dependencies on upstream applications in production.

Created Hive external tables, loaded data into them, and queried the data using HQL; worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Used Sqoop to import data into Cassandra tables from databases, and imported data from various sources into the Cassandra cluster using the Java API.
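Alongside Sqoop's command-line imports, loading a prepared dataset into Cassandra can also be sketched from Spark with the DataStax spark-cassandra-connector (assumed to be on the classpath); the host, paths, keyspace, and table names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Write a DataFrame into a Cassandra table via the spark-cassandra-connector.
object CassandraLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-load")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()

    val customers = spark.read.parquet("hdfs:///data/curated/customers/")

    customers.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "customer_ks", "table" -> "customer_accounts"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```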

Environment: Hortonworks platform, Cloudera platform, AWS stack, Java, Linux, Hadoop, HDFS, Kafka, Eclipse, Hive, Pig, Zookeeper, MySQL, Oozie, Sqoop, Storm, MapReduce.


