
Information Technology - Data Engineer

Location:
Tempe, AZ
Posted:
February 12, 2024


Bhakti Santosh Patil

602-***-**** | ad3kzv@r.postjobfree.com | bhakti10 | BhaktiSP10

SUMMARY

Enthusiastic problem solver with over 4 years of experience in the IT industry, seeking full-time opportunities.

EDUCATION

Arizona State University, Tempe, AZ

Master's degree in Information Technology (GPA: 4.0/4.0) Aug ’22 - May ’24

• Relevant Coursework: AWS Cloud Architecture, Big Data, NLP, Statistics

Vidyalankar Institute of Technology, University of Mumbai, Mumbai, India

Bachelor of Engineering (Hons.) in Information Technology (GPA: 8.32/10) Aug ’14 - May ’18

TECHNICAL SKILLS

• Programming: Python, Scala, Bash/Shell Scripting, Groovy, Java, JavaScript, R, PySpark

• Tools, Technologies & Public Cloud: Apache NiFi, Kafka, Robo 3T, Databricks, Tableau, Postman, Jenkins, Docker, Kubernetes, Hadoop, ZooKeeper, Pandas, Matplotlib, Elastic Stack (ELK), Prometheus, Git, JIRA, Grafana, Gradle, AWS, Google Cloud Platform, Google Colab, Azure Data Factory, Azure Synapse, ER tools, Jupyter Notebook, scikit-learn, MLlib, CMAK, PuTTY, Linux-based systems

• Skills: ETL pipelining, Data Cleaning/Pre-processing, Data Wrangling, MLflow, Data Mining, Data Streaming, Big Data Integration, CI/CD, Solution Architecture, Data Analytics/Visualization

• Web & Database Technologies: SQL, RDBMS, NoSQL, Cosmos DB, MySQL, PostgreSQL, Microsoft SQL Server, SAP HANA, MariaDB, MongoDB, Couchbase, Cassandra, Impala, HiveQL, HBase, AWS RDS, HTML5, CSS3

• File Systems and File Types: HDFS, JSON, CSV, ORC, Parquet, Avro, PIFF

WORK EXPERIENCE

NXP Semiconductors, Chandler, AZ, United States

Analytics & Data Engineer Intern May ’23 - Aug ’23

• Designed and developed data pipelines for storing data retrieved from source systems in the Data Lake and Data Warehouse, and built analytical solutions for better data visibility.

• Cleaned and processed over 1.37 GB of semiconductor manufacturing data so that Data Science teams could use it to find anomalies and outliers. Monitored 7 data flows to avoid discrepancies and keep the developed systems running smoothly.

• Collaborated with IT Operations/Admin, Analytics, Data Engineering and Data Science associates to create, design and process data for value-adding insights.

Reliance Industries, Mumbai, India

Data Engineer Aug ’18 - June ’22

• Developed project concepts, created data pipelines, delivered analytical solutions, and maintained optimal workflows through automation while building business applications.

• Defined and developed optimized programming logic for managing 70% real-time e-commerce data and 30% offline store data by building complex ETL pipelines and monitoring operational and data-level issues.

• Worked as a senior developer on the data lake and data science team, creating end-to-end data visualization solutions for the Customer, E-commerce, SCM and Finance business analyst teams.

Asmita Solutions Pvt. Limited, Mumbai, India

Software Development Engineer Intern Sep ’17 - Mar ’18

• Created an application for dispensaries using the MEAN stack.

• Interacted with a local dispensary and gathered real-world data on over 1,000 patients and doctors.

PROJECT HIGHLIGHTS

Class Probe Flows and Kafka Monitoring May ’23 - Aug ’23

• Worked with NiFi and Groovy scripting to create Class Probe data workflows processing JSON, XML and CSV file formats, and developed raw, temporary and transactional tables in Impala, including wafer-testing data insights.

• Monitored 3 FDC flows containing 0.19 GB of data files processed with Spark Streaming and Kafka to identify data loss at the Kafka-topic and HDFS levels.

Customer 360 2021 - 2022

• Achieved a 360-degree view of each customer by sourcing, cleaning and processing data according to the KPIs for predicting customers' future shopping. Used MongoDB and HANA stored procedures to land data in HDFS and prepared aggregated datasets using Spark with Scala and Python.

• Processed raw customer data in 120+ formats using star and snowflake schemas, covering over 170 million records generated by offline and e-commerce channels, and stored the resulting JSON in Elasticsearch, contributing to a 30% increase in company revenue.

Deal of the Day 2021 - 2022

• Worked as the Data Engineering lead. Extracted 3 million customer records from HANA and joined them against user-requested real-time JSON requests from Kafka to create a consolidated view of customers buying products at the time of sale, and reduced the result set from 70-100 generated tickets to 1 record per customer by tuning queries.

• The output was a consolidated, single-granularity CSV file obtained from 5 different schemas.

Jiomart Dashboard 2019 - 2021

• Extracted and aggregated 42 tables with over 500 million near-real-time records from MySQL, PostgreSQL VPN servers, Kafka topics and other data sources using NiFi, Spark (Scala) and Presto for user-level visualization in Tableau/Power BI dashboards. Scheduled the jobs with Airflow to run every 15 minutes. 70% of the application data was used for visualization, helping increase revenue across the grocery, digital, fashion and lifestyle business formats.

Dispensary Management System 2017 - 2018

• Collected real-time data on over 0.5 million patients and doctors and used the MEAN stack to store it in MongoDB in extended JSON format; built the UI with AngularJS and PrimeNG and the back-end processing with ExpressJS.


