Vivek Ganjave
** ******* ***, ****** ****, NJ 07307
Email: ac2f13@r.postjobfree.com | Phone: 201-***-****
www.linkedin.com/in/vivekganjave
SUMMARY:
• Master’s candidate with 4+ years of work experience in Data Warehousing/ETL and BI using SQL and Tableau.
• Expertise in applying analytical, efficient solutions to business problems as clients' requirements change.
• Received the Teradata Employee of the Quarter award (Q4 2014) for sales data performance optimization.
• Led an 8-member team and gathered requirements for strategic solutions to business problems.
• Certified Teradata 12 SQL Technical Specialist.
EDUCATION:
Master of Science in Computer Science, Stevens Institute of Technology, New Jersey, May 2017
Coursework: Big Data, Data Warehousing, Text Mining, Business Intelligence, Data Mining, Web Analytics. GPA: 3.8/4.0
Bachelor of Computer Engineering, University of Mumbai, India, June 2012
Coursework: Data Warehousing, Data Mining, Distributed Computing, DBMS, Data Structures. GPA: 3.8/4.0
SKILLS:
• Databases: Teradata, MySQL, SQL Server, Oracle • Languages: Python, SQL, Shell Scripting.
• Big Data: Hadoop HDFS, Spark, Hive, Sqoop, Pig • Operating Systems: Linux, Windows, Unix.
• ETL/BI tools: Teradata utilities, SSIS, Informatica, Tableau • Tools: MySQL Workbench, Toad.
WORK EXPERIENCE:
Automatic Data Processing (ADP) R&D Analyst June 2017 – August 2017
• Built a Sqoop-based ingestion utility to migrate data from the existing EDW to a Hadoop data lake.
• Designed and implemented a Sqoop incremental ingestion utility from the EDW to the data lake to handle deltas in the source.
• Designed workflows and scheduling using Oozie to create pipelines for the Hadoop data lake.
• Analyzed payroll data using Python and Tableau, following an agile methodology.
• Integrated data from source systems using Informatica, performing source-to-target mapping and transformations.
Stevens Institute of Technology Big Data Engineer Graduate Assistant August 2016 – May 2017
• Designed and maintained data and metadata in partitioned Hive tables.
• Analyzed data dependencies, relationships and data flow for multiple datasets in Hadoop environment.
• Used SQL to extract data from multiple sources, cleansed & transformed data for further analysis.
• Designed a dashboard for stock market data visualization using Tableau.
Teradata Corporation Data Engineer June 2012 – December 2015
Projects: Pfizer, State Street Bank, American Express, Grupo Bimbo
• Worked onsite in Mexico to implement a data model using a denormalized snowflake schema for efficient reporting.
• Automated processes using Unix shell scripts and stored procedures, saving 40 hours of manual work per week.
• Integrated data sets involving complex mappings from multiple systems and designed cross-functional ETL solutions.
• Analyzed end-to-end ETL failures and provided quick resolutions to keep dependent jobs running smoothly.
• Built Tableau reports for the AMEX client to review progress weekly, monthly, and quarterly.
• Analyzed high-CPU SQL queries and tuned them for efficient retail business reporting during the holiday period.
• Performed index and skew analysis, monitored poorly running jobs from Viewpoint, and provided recommendations to the client.
• Received an Excellence Performance Award from the client IT Manager in 2014 for developing technical solutions to business problems.
ACADEMIC PROJECTS: January 2016 – May 2017
Predict a Cardiac Event using Dobutamine Stress Echocardiography - R
• Implemented Principal Component Analysis for reducing dimensionality and correlation in the data.
• Performed logistic regression and LDA using clean, relevant variables.
Data Mining Analysis and Prediction on Finance Credit Risk - Python
• Analyzed credit history data and predicted risk factors for a bank using Python.
• Cleaned variables for normality, then analyzed and predicted credit risk using the KNN algorithm.
Performance Comparison: Hive vs Impala vs Spark-SQL
• Benchmarked query response times for the Hive, Impala, and Spark-SQL execution engines.
• Optimized the Hive ecosystem using partitioning, the ORC storage format, and parameters to force map joins.