Data Project

Location:
Montreal, QC, Canada
Posted:
August 21, 2017

Lakhwinder Singh Email: ac1xak@r.postjobfree.com

#*, **** *** **. ******* Mobile: +1-438-***-****

Montreal, Quebec, CA

H3C-6L7

Summary:

Implementation experience with Hadoop HDFS, Hive, Sqoop, Pig, Spark, Flume, Oozie, MapReduce, Kafka, and HBase.

Comprehensive knowledge of Hadoop architecture and its components: HDFS, MapReduce, NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode, and YARN.

Ingestion of near-real-time data using Flume.

Deep understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.

Worked on reading flat files and Hive tables into Spark RDDs (Resilient Distributed Datasets) and performed various transformations and actions for data analysis (see the sketch after this summary).

Transferred data between relational databases and HDFS, in both directions, using Sqoop.

Knowledge of Flume, ZooKeeper, and other components of the Hadoop ecosystem.

Experience in analyzing data using HiveQL and Pig Latin.

Good knowledge of collecting and storing streaming data, such as log data, in HDFS using Apache Flume.

Good understanding of shell scripting with Bash.

Familiarity with Amazon Web Services: data warehousing with Amazon Redshift, storage and database services (S3, DynamoDB), Amazon EC2, Amazon CloudFront, Amazon EMR, Jaspersoft, and Docker containers.

Hands-on experience with MS SQL Server Integration Services (SSIS) for writing ETL packages, MS SQL Server Analysis Services (SSAS), and MS SQL Server Reporting Services (SSRS).

Proficient with core Java and C++.
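As a hedged illustration of the Spark point above, the following Java sketch reads a flat file into an RDD and a Hive table into a DataFrame, then applies a simple transformation and action; the file path, database, table, and column names are placeholders, not taken from any project below.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkReadSketch {
    public static void main(String[] args) {
        // Hypothetical app name; Hive support assumes a configured metastore.
        SparkSession spark = SparkSession.builder()
                .appName("read-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Flat file -> RDD of lines, then a simple transformation and action.
        JavaRDD<String> lines = spark.read().textFile("/data/input/sample.txt").javaRDD();
        long nonEmpty = lines.filter(l -> !l.trim().isEmpty()).count();
        System.out.println("Non-empty lines: " + nonEmpty);

        // Hive table -> DataFrame, then a filter and an action.
        Dataset<Row> rows = spark.sql("SELECT * FROM some_db.some_table");
        rows.filter("some_column IS NOT NULL").show(10);

        spark.stop();
    }
}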

Academics:

Master's in Electrical & Computer Engineering, Concordia University, Montreal (January 2015 - August 2016)

B.Tech (ECE), Rayat and Bahra Institute of Engineering and Nano-Technology, affiliated to PTU, Punjab, with 72% (January 2010 - August 2014)

Technical Skills:

•Big Data/Hadoop ecosystem: HDFS, Hive, Pig, Sqoop, HBase, Flume, Spark, Oozie.

•Databases: MySQL.

•Operating Systems: Windows, Linux, macOS.

•Programming languages: Java, C++, VHDL.

•Software: Eclipse, IntelliJ, NetBeans, Formality, Cadence, Xilinx, Microsoft Office (Excel, PowerPoint, Outlook), MS Visual Studio, SQL Server 2008, shell scripting (Bash).

Project Profile:

Project: Uber Data Analysis (UberX), Absolute Integration Tech. (Feb - July 2017)

Platforms: Apache HDFS, MapReduce, Bash (shell scripting), Oozie.

Roles & Responsibilities:

•Performed analysis on the Uber dataset in Hadoop using MapReduce.

•Wrote the Java code for the mapper and reducer classes (a minimal sketch follows this list).

•Involved in Java coding for the project.

•Built a JAR file for the Java program.

•Moved the dataset files from the local system to a remote system (virtual machine) using FileZilla.

•Transferred the dataset from the remote system's local path into HDFS.

•Ran the JAR as a standard Hadoop job, passing the input dataset path and the output file path.
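A minimal sketch of the mapper, reducer, and driver pattern used for this kind of analysis appears below; the CSV layout (base-station code in the first field, trip count in the last field, header row starting with "dispatching") is an assumption, so treat the field indexes as placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UberTripCount {

    // Mapper: emits (base station, trips) per input record.
    public static class TripMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            // Assumed layout: base code in fields[0], trip count in the last field.
            if (fields.length >= 2 && !fields[0].startsWith("dispatching")) {
                try {
                    int trips = Integer.parseInt(fields[fields.length - 1].trim());
                    context.write(new Text(fields[0]), new IntWritable(trips));
                } catch (NumberFormatException ignored) {
                    // Skip malformed rows.
                }
            }
        }
    }

    // Reducer: sums the trips per base station.
    public static class TripReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: run as "hadoop jar ubercount.jar UberTripCount <input> <output>".
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "uber trip count");
        job.setJarByClass(UberTripCount.class);
        job.setMapperClass(TripMapper.class);
        job.setCombinerClass(TripReducer.class);
        job.setReducerClass(TripReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}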

Project: Sentiment Analysis on Demonetization (CNBC News), Absolute Integration Tech. (Aug 2016 - Dec 2016)

Platforms: Apache HDFS, Apache Pig, Oozie, Bash (shell scripting).

Roles & Responsibilities:

•Wrote Apache Pig scripts to process the HDFS data.

•Loaded the data into Pig using PigStorage.

•Loaded the sentiment dictionary into Pig.

•Performed a map-side join between the tokenized tweet text and the dictionary contents (sketched below).

•Calculated the average rating of each tweet from the ratings of its words.

•Filtered the tweets into positive and negative sets (i.e., tweets in favour of or against demonetization).
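The project used standalone Pig scripts; the sketch below expresses equivalent Pig Latin through Pig's Java embedding (PigServer). The input paths, schemas, and the zero threshold for splitting positive and negative tweets are all assumptions.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class SentimentPigSketch {
    public static void main(String[] args) throws Exception {
        // Local mode for illustration; the real job would run on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Hypothetical input paths and schemas.
        pig.registerQuery("tweets = LOAD '/data/demonetization_tweets.csv' USING PigStorage(',') "
                + "AS (id:chararray, tweet_text:chararray);");
        pig.registerQuery("dict = LOAD '/data/sentiment_dictionary.tsv' USING PigStorage('\\t') "
                + "AS (word:chararray, rating:int);");

        // Tokenize tweets, then a map-side (replicated) join with the small dictionary.
        pig.registerQuery("words = FOREACH tweets GENERATE id, FLATTEN(TOKENIZE(tweet_text)) AS word;");
        pig.registerQuery("joined = JOIN words BY word, dict BY word USING 'replicated';");

        // Average the word ratings per tweet and split into positive and negative sets.
        pig.registerQuery("rated = FOREACH joined GENERATE words::id AS id, dict::rating AS rating;");
        pig.registerQuery("scores = FOREACH (GROUP rated BY id) "
                + "GENERATE group AS id, AVG(rated.rating) AS score;");
        pig.registerQuery("positive = FILTER scores BY score >= 0.0;");
        pig.registerQuery("negative = FILTER scores BY score < 0.0;");

        pig.store("positive", "/output/positive_tweets");
        pig.store("negative", "/output/negative_tweets");
        pig.shutdown();
    }
}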

Project: Aviation (Indigo) Data Analysis, Absolute Integration Tech. (Feb 2016 - July 2016)

Platforms: Apache HDFS, Apache Pig, Bash, MS SSIS, MS SSRS.

Roles & Responsibilities:

•Extracted data using SSIS and cleansed it.

•Analyzed the aviation data to understand its contents and structure.

•Developed Pig queries to find the top 5 most visited destinations (see the sketch after this list).

•Found the month with the highest number of cancellations due to bad weather.

•Identified the top ten origins with the highest average departure delay.

•Determined the route (origin and destination) with the most diversions.

•Generated reports using SSRS.
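As a hedged sketch of the top-5 destinations query only, the following Java program embeds Pig Latin through PigServer; the input path and column layout are assumptions.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class AviationTopDestinations {
    public static void main(String[] args) throws Exception {
        // Local mode for illustration; hypothetical path and column layout.
        PigServer pig = new PigServer(ExecType.LOCAL);

        pig.registerQuery("flights = LOAD '/data/aviation/flights.csv' USING PigStorage(',') "
                + "AS (flight_date:chararray, origin:chararray, dest:chararray, "
                + "dep_delay:int, cancelled:int, diverted:int);");

        // Top 5 most visited destinations: group, count, order, limit.
        pig.registerQuery("by_dest = GROUP flights BY dest;");
        pig.registerQuery("dest_counts = FOREACH by_dest GENERATE group AS dest, "
                + "COUNT(flights) AS visits;");
        pig.registerQuery("ranked = ORDER dest_counts BY visits DESC;");
        pig.registerQuery("top5 = LIMIT ranked 5;");

        pig.store("top5", "/output/top5_destinations");
        pig.shutdown();
    }
}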

Project: Hive Real Estate Analysis (DB Reality), Absolute Integration Tech. (Sept 2015 - Jan 2016)

Platforms: Apache HDFS, Apache Hive, Bash, AWS.

Roles & Responsibilities:

•Created external Hive tables to store the processed results in tabular format.

•Loaded data from an S3 path into the external Hive table (sketched below).

•Copied data from the local database to DynamoDB and Redshift for analysis.

•Ran analysis queries, some of which are listed below.

•Produced a city-wise list of all the condos priced at not less than ten thousand.

•Found the cheapest condo in Bangalore, reporting its city, street, and price.

•Used Jaspersoft for reporting.
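A minimal sketch of the external-table-over-S3 pattern and the city-wise condo query, issued through the Hive JDBC driver from Java; the HiveServer2 endpoint, bucket, table name, and schema are all hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RealEstateHiveSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint, bucket, table, and schema.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // External table over data already sitting in S3.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS condos ("
                    + "city STRING, street STRING, property_type STRING, price DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "LOCATION 's3a://some-bucket/real-estate/'");

            // City-wise list of condos priced at not less than ten thousand.
            ResultSet rs = stmt.executeQuery(
                    "SELECT city, street, price FROM condos "
                    + "WHERE property_type = 'Condo' AND price >= 10000 "
                    + "ORDER BY city");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2) + "\t" + rs.getDouble(3));
            }
        }
    }
}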

Project: Hardware Design Verification, Concordia University (April - Aug 2016)

Tools: Formality & Cadence.

Removed bugs and errors at the logic level and gate level using formal and informal hardware verification techniques with the Cadence and Formality tools.

Project: 32-bit MIPS Processor with a 5-Stage Pipeline, Concordia University (Jan - May 2015)

Tools: Xilinx (VHDL).

Designed a 32-bit MIPS processor in VHDL using a 5-stage pipelined architecture. Applied the control signals and resolved hazards such as RAW (read after write), WAR (write after read), and RAR (read after read).

Project: Citizen Card (Aadhaar Card) System, Software Developer (Java), TCIL-IT (Jan 2014 - Nov 2014)

Platforms: Java, MySQL.

Roles & Responsibilities:

This was a Java-based project in which every citizen had a unique ID to which gas, electricity, hydro, cell phone, driver's licence, and car and house loan and registration records were linked. Data entries were managed on the database side with MySQL (a small JDBC sketch follows).
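A small, hypothetical sketch of the lookup pattern described above, using JDBC against MySQL; the connection details, table, and column names are illustrative only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CitizenCardLookup {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table layout.
        String url = "jdbc:mysql://localhost:3306/citizen_db";
        try (Connection con = DriverManager.getConnection(url, "app_user", "secret")) {

            // Fetch all services linked to one citizen's unique ID.
            String sql = "SELECT service_type, account_number, status "
                    + "FROM linked_services WHERE citizen_id = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "CIT-0001");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("service_type") + " | "
                                + rs.getString("account_number") + " | "
                                + rs.getString("status"));
                    }
                }
            }
        }
    }
}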

Languages: English, Beginner French, Hindi, Punjabi.

References available on request.


