Big Data Engineer

Location:

Buffalo, NY

Posted:

March 29, 2020

Resume:

Tushar Tanwar linkedin.com/in/tushartanwar • 716-***-**** • ************@*******.***

EDUCATION

State University of New York, Buffalo June 2020

Master of Science in Management Information Systems GPA: 3.97/4.0

Courses: Distributed Computing & Big Data, DBMS, Data Visualization using Tableau, Data driven analysis using Python

ITM University, Gurgaon, India June 2015

Bachelor of Technology in Mechanical Engineering GPA: 7.42/10

TECHNICAL SKILLS

Big Data & Reporting Tools: Spark (PySpark), Hive, Sqoop, MapReduce, Pig, HBase (Phoenix), Tableau, MS Excel, SSIS, SSAS

Databases: Teradata, Oracle, SQLite3

Languages: Shell scripting, Python, Core Java

Familiar With: AWS, Pandas, Scrapy, Matplotlib, Seaborn, Data Structures & Algorithms

CERTIFICATIONS

AWS Certified Solutions Architect – Associate Amazon Web Services March 2020

Databricks Certified Associate Developer for Apache Spark 2.4 (CRT020) Apache Spark January 2020

Tableau Desktop Specialist Certified Tableau Software December 2019

MapR Certified Hadoop Developer (MCHD) Apache Hadoop Distribution February 2016

WORK EXPERIENCE 4 Years

Programmer Analyst – Big Data American Express Gurgaon, India May 2018 – July 2019

Tech Stack – Hive, Sqoop, Spark, Tableau, Python, Shell Scripting, Teradata, SSIS, SSAS, Agile, Git

Built, optimized, and automated Big Data ETL pipelines on credit and fraud risk data to deliver insights to business partners through Tableau dashboards and Excel pivot reports.

Collaborated with business partners to gather requirements and design the complete big data pipeline.

Worked with Sqoop to ingest and retrieve data from Teradata into HDFS and Hive tables.

Improved the performance of existing Hive processes by 50% using partitioning, bucketing, and better file formats.

Migrated Hive processes to Spark in Python, using Spark SQL and DataFrames for data aggregations and transformations.

Developed purge scripts that free space on HDFS by automatically purging irrelevant Hive tables.

Designed a modular and scalable framework in Python to handle restartability and logging for an ETL pipeline.

System Engineer – Big Data Tata Consultancy Services Noida, India Sept 2015 – April 2018

Tech Stack – Hive, Sqoop, Spark, Pig, MapReduce, Teradata, HBase/Phoenix, Java, Shell Scripting, Python, Git, Jira, Agile

Collaborated with a global team of 20+ to migrate a US-based pharmaceutical client's EDW systems, built on Teradata to handle large volumes of pharmacy and retail pharmacy data, to a Hadoop data lake.

Devised data quality and validation scripts in Java and Pig to clean and validate the ingested data.

Created Hive and Spark scripts to perform data compaction of delta records and data transformation.

Designed and developed shell scripts to monitor the availability of files on HDFS for further processing.

Developed Pig scripts to run queries over HBase Phoenix tables and load the data into a file.

Served as the single point of contact through the full SDLC for a customer segment targeting project with CTO-level visibility; gathered requirements from the lead architect, then designed, developed, and deployed the system within a stringent deadline.

Led a team to design and implement the next phase and provided knowledge transfer to the support team and engineers.

ACADEMIC PROJECTS

Hadoop Streaming API to filter out non-English words from a dataset of folk songs Nov 2019

Wrote a MapReduce program in Python to filter out non-English words from the data on HDFS.

Segregated the words per file and used techniques to handle special characters and punctuation marks in the text.

Predicting the Best Playing XI Football Team through Data Science Techniques Nov 2019

Developed Python code to scrape data from a football website using the Scrapy module.

Cleaned and normalized the data using Pandas and stored it in SQLite3 tables.

Formulated the best playing XI by predicting the playing position of every player using a KNN classifier.

Visualized the predicted team using Matplotlib, Seaborn, and the Python image-processing module PIL.
