
Data Engineer

Location:
New York City, NY
Posted:
September 14, 2020


PRANAV PATEL

201-***-**** adf28m@r.postjobfree.com

linkedin.com/in/pranav-patel-5182a013a

EDUCATION

New York University Tandon School of Engineering, New York, NY
Master of Science in Electrical and Computer Engineering 05/2019

Gujarat Technological University, Gujarat, India
Bachelor of Engineering in Instrumentation and Control Engineering 06/2017

TECHNICAL SKILLS

• Programming Languages: Python, C, C++, MATLAB

• Platforms and Tools: Windows, macOS, Linux

• Databases: Oracle SQL, MySQL

• Frameworks and Platforms: Azure, Snowflake, AWS, PySpark, Airflow, Alteryx, Hadoop ecosystem, Hive

• Trainings and Certifications: Cloudera University Developer Training for Hadoop and Spark, Microsoft AZ-900, SnowPro Core

PROFESSIONAL EXPERIENCE

Data Engineer, 84.51/Kroger, Cincinnati, OH 12/2019 - 08/2020

• Engaged with platform owners and operators at an early stage to influence the platform's form and features across the expected user base.

• Gathered requirements from stakeholders and designed solutions aligned with those requirements, the platform architecture, and the infrastructure.

• Implemented PySpark and Spark SQL for faster testing and processing of data; migrated Oracle queries and Alteryx workflows from the on-premises environment into PySpark transformations in Azure.
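The kind of SQL-to-DataFrame migration described above can be sketched as follows. This is a minimal, hypothetical illustration: the table and column names (sales, store_id, amount) are made up, and sqlite3 stands in for Oracle so the example is self-contained.

```python
import sqlite3

# Hypothetical source table standing in for an Oracle table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

# Original SQL-style aggregation (the "before" side of the migration).
rows = conn.execute(
    "SELECT store_id, SUM(amount) FROM sales "
    "GROUP BY store_id ORDER BY store_id"
).fetchall()

# After migration, the same logic in PySpark would read roughly:
#   df.groupBy("store_id").agg(F.sum("amount").alias("total"))
print(rows)  # [(1, 15.0), (2, 7.5)]
```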

• Engaged with principal engineers and architects to cross-pollinate and leverage design patterns and practices for big data engineering.

• Developed, tested, and enhanced data stores (tables, views, files) fitting the design and architecture of the data solution.

• Designed data quality checks, investigated performance issues, and identified optimization measures including, but not limited to, coalescing fragmented data, optimizing resource usage, repartitioning, indexing, and bucketing/distribution; developed data quality checks for daily aggregated Kroger customer data.
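A daily data-quality check of the sort mentioned above might look like the sketch below. The field names (customer_id, spend, day) and the specific rules are illustrative assumptions, not the original system's schema.

```python
# Hypothetical data-quality check: flag missing required fields and
# obviously invalid values in a daily extract.
def quality_report(rows, required_fields=("customer_id", "spend", "day")):
    issues = []
    if not rows:
        issues.append("empty extract")
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                issues.append(f"row {i}: null {field}")
        spend = row.get("spend")
        if isinstance(spend, (int, float)) and spend < 0:
            issues.append(f"row {i}: negative spend")
    return issues

sample = [
    {"customer_id": 1, "spend": 42.0, "day": "2020-08-01"},
    {"customer_id": 2, "spend": None, "day": "2020-08-01"},
]
print(quality_report(sample))  # ['row 1: null spend']
```

In a Spark pipeline the same rules would typically be expressed as DataFrame filters so they run distributed rather than row by row in the driver.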

• Developed PySpark applications using the DataFrame and Spark SQL APIs for faster data processing.

• Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization according to requirements.

• Worked closely with a team of data scientists to improve and deliver a sales forecasting model, and established an automated data delivery workflow for components of the daily sales forecasting model using Airflow.

• Scheduled multiple Spark jobs with the Airflow workflow engine using Python. Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.
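The essence of Airflow scheduling is a dependency graph over tasks. The sketch below illustrates that ordering with the standard library's graphlib; the task names are made up, and in real Airflow code they would be operators wired together (e.g. `extract >> transform >> [load, report]`) inside a DAG definition.

```python
# Illustrative dependency ordering, mirroring what an Airflow DAG encodes.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
deps = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # 'extract' first, then 'transform', then 'load'/'report'
```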

• Used version control tools such as GitHub to pull changes from upstream into local branches, check for conflicts, and clean up and review other developers' code.

Graduate Assistant, NYU IT, New York, NY 07/2018- 05/2019

• Imported data from different data sources into HDFS using Sqoop and performed transformations using Hive.

• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

• Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
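The aggregation pattern behind such HiveQL scripts can be sketched in plain Python. The table and column names below (sessions, dept, duration) are hypothetical; the HiveQL analogue would be something like `SELECT dept, COUNT(*), AVG(duration) FROM sessions GROUP BY dept`.

```python
# Pure-Python illustration of a GROUP BY aggregation as done in HiveQL.
from collections import defaultdict

sessions = [
    {"dept": "lib", "duration": 30},
    {"dept": "lib", "duration": 50},
    {"dept": "lab", "duration": 20},
]

groups = defaultdict(list)
for s in sessions:
    groups[s["dept"]].append(s["duration"])

# Per-group (count, average), mirroring COUNT(*) and AVG(duration).
metrics = {d: (len(v), sum(v) / len(v)) for d, v in groups.items()}
print(metrics)  # {'lib': (2, 40.0), 'lab': (1, 20.0)}
```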

• Scheduled and executed workflows in Oozie to run various jobs.

Data Science Engineer, 7Span Technologies, Ahmedabad, GJ, India 05/2015 - 07/2017

• Extracted files from MySQL, Oracle and Teradata through Sqoop and placed in HDFS in the Cloudera Environment.

• Loaded data into the cluster from dynamically generated files using Flume and from relational database systems using Sqoop.

• Transformed data from various sources, organized it, and extracted features from raw data; imported data from various sources, performed transformations using Hive and MapReduce, and loaded the results into HDFS.

• Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to analyze the logs and identify issues and behavioral patterns.

ACADEMIC PROJECTS

Restaurant Recommender Chat Bot (AWS, Python) 05/2018

• Built a chatbot that provides restaurant recommendations by fetching data from the Yelp API; integrated AWS Lex to interact with users and store their preferences, and developed code on the AWS Lambda console to call the Yelp API.

• Used AWS SQS to queue multiple requests and AWS CloudWatch to trigger the Lambda functions, and integrated AWS SNS to send recommendations to users via email.
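The SQS buffering pattern above can be sketched with an in-process queue. This is an assumption-laden stand-in: the preference payloads and recommendation strings are invented, and real code would use boto3's SQS client (send_message/receive_message) rather than queue.Queue.

```python
# In-process sketch of the SQS pattern: a producer (the Lex/Lambda front
# end) enqueues user preferences; a consumer (the worker Lambda) drains
# the queue and produces recommendations.
import queue

requests = queue.Queue()
for user_pref in ({"cuisine": "thai"}, {"cuisine": "sushi"}):
    requests.put(user_pref)            # producer side

recommendations = []
while not requests.empty():            # consumer side
    pref = requests.get()
    recommendations.append(f"top {pref['cuisine']} spot")

print(recommendations)  # ['top thai spot', 'top sushi spot']
```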
