Vivek Ganjave
** ******* ***, ****** ****, NJ 07307
Email: ac2f13@r.postjobfree.com | Phone: 201-***-****
www.linkedin.com/in/vivekganjave
SUMMARY:
• Master’s candidate with 4+ years of work experience in Data Warehousing/ETL and BI using SQL and Tableau.
• Expertise in applying analytical, efficient solutions to business problems as clients' requirements change.
• Received the Teradata Employee of the Quarter award (Q4 2014) for sales data performance optimization.
• Led an 8-member team and gathered requirements for strategic solutions to business problems.
• Certified Teradata 12 SQL Technical Specialist.
EDUCATION:
Master of Science in Computer Science, Stevens Institute of Technology, New Jersey, May 2017
Coursework: Big Data, Data Warehousing, Text Mining, Business Intelligence, Data Mining, Web Analytics. GPA: 3.8/4.0
Bachelor of Computer Engineering, University of Mumbai, India, June 2012
Coursework: Data Warehousing, Data Mining, Distributed Computing, DBMS, Data Structures. GPA: 3.8/4.0
SKILLS:
• Databases: Teradata, MySQL, SQL Server, Oracle • Languages: Python, SQL, Shell Scripting.
• Big Data: Hadoop HDFS, Spark, Hive, Sqoop, Pig • Operating Systems: Linux, Windows, Unix.
• ETL/BI tools: Teradata utilities, SSIS, Informatica, Tableau • Tools: MySQL Workbench, Toad.
WORK EXPERIENCE:
Automatic Data Processing (ADP) R&D Analyst June 2017 – August 2017
• Built a Sqoop-based ingestion utility to migrate data from the existing EDW to a Hadoop data lake.
• Designed and implemented a Sqoop incremental ingestion utility from the EDW to the data lake to handle deltas in the source.
• Designed workflows and scheduling using Oozie to create pipelines for the Hadoop data lake.
• Analyzed payroll data using Python and Tableau, following an agile methodology.
• Integrated data from source systems using Informatica, performing source-to-target mapping and transformations.
Stevens Institute of Technology Big Data Engineer Graduate Assistant August 2016 – May 2017
• Designed and maintained data and metadata in partitioned Hive tables.
• Analyzed data dependencies, relationships and data flow for multiple datasets in Hadoop environment.
• Used SQL to extract data from multiple sources, cleansed & transformed data for further analysis.
• Designed a dashboard for stock market data visualization using Tableau.
Teradata Corporation Data Engineer June 2012 – December 2015
Projects: Pfizer, State Street Bank, American Express, Grupo Bimbo
• Worked onsite in Mexico to implement a data model using a denormalized snowflake schema for efficient reporting.
• Automated processes using Unix shell scripts and stored procedures, saving 40 hours of manual work per week.
• Integrated data sets involving complex mappings from multiple systems and designed cross-functional ETL solutions.
• Analyzed end-to-end ETL failures and provided quick resolutions to keep dependent jobs running smoothly.
• Built Tableau reports for the AMEX client to review progress weekly, monthly, and quarterly.
• Analyzed high-CPU SQL queries and tuned them for efficient retail business reporting during the holiday period.
• Performed index and skew analysis, monitored poorly running jobs from Viewpoint, and provided recommendations to the client.
• Received an Excellence Performance Award from the client IT Manager in 2014 for developing technical solutions to business problems.
ACADEMIC PROJECTS: January 2016 – May 2017
Predict a Cardiac Event using Dobutamine Stress Echocardiography - R
• Implemented Principal Component Analysis for reducing dimensionality and correlation in the data.
• Performed logistic regression and LDA using clean, relevant variables.
Data Mining Analysis and Prediction on Finance Credit Risk - Python
• Analyzed credit history data and predicted risk factors for a bank using Python.
• Cleaned variables for normality, then analyzed and predicted credit risk using the KNN algorithm.
Performance Comparison: Hive vs Impala vs Spark-SQL
• Benchmarked query response times for the Hive, Impala, and Spark-SQL execution engines.
• Optimized the Hive ecosystem using partitioning, the ORC storage format, and parameters to force map joins.