
Data Sales

Location:
Waukesha, WI
Posted:
October 15, 2020


Resume:

BRIJESH GOHARIA

*****************@*****.*** +1-703-***-**** linkedin.com/in/bkg318 github.com/brijeshgoharia

Data-driven engineer with 3+ years of experience in analytics and a solid background in Python, SQL, R & cloud engineering. Proficient in data analytics techniques for quantitative analysis such as data pipelining, data modeling and data warehousing. In-depth knowledge of IT business processes and implementation of the software development lifecycle using agile methodology.

EDUCATION

Lehigh University, Master’s in Industrial & Systems Engineering, USA (GPA: 3.8) Aug’18 – May’20
(E-Business Enterprise Application, Predictive Analytics, Optimization Models and Applications, Mining of Large Datasets)

Udacity, Data Engineering Nanodegree
(Data Modeling, Cloud Data Warehouse, Data Lakes with Spark, Data Pipelines with Airflow)

University of Mumbai, Bachelor’s in Mechanical Engineering, India (GPA: 3.7) Aug’12 – May’16

TECHNICAL SKILLS

Programming Skills: Python, R, SQL, VBA, SAS, Cassandra
AWS Services: Redshift, Airflow scheduler, S3, EC2, Lambda, EMR, RDS
Big Data: Spark, MapReduce
Analytics/ETL: Tableau, MS Excel, Qlik Sense
ERP Systems: SAP S/4 HANA, Salesforce, ZOHO
Optimization Modeling: AMPL, Excel Solver

EXPERIENCE

Associate Intern, Business Intelligence, KMK Consulting Inc, USA July’20 – Present

● Developed in-depth understanding of the pharmaceutical industry including sales measurement, field force activities, alignments, deciling & targeting, incentive compensation

● Explored data to create valuable reports for senior stakeholders to improve sales planning operations & sales force effectiveness

● Analyzing massive data sets to identify growth opportunities & measure performance that drive business decisions

Data Analytics Consultant, Wacker Chemical Corporation (Lehigh ESC), USA Jan’19 – May’20
Predictive Analytics for Barge Scheduling

● Developed a predictive model to optimize Barge Scheduling of raw materials from Houston to the Calvert City plant

● Enhanced scheduling by implementing a classification model & incorporating anomaly detection, with projected savings of $60,000 for Q1 2020

● Queried historical data using SQL and performed EDA to find seasonality & trends, improving Barge scheduling by 21%

● Compared weather data with Barge transit data using Pandas DataFrames to find geological anomalies impacting transit delays

Process Automation

● Developed dashboards and macros in Excel using VBA to improve the Job Scheduling system, reducing lead time by 12%

● Achieved 93% on-time delivery by tracking inventory through dashboard visualization of backorders and missed orders

● Collected data from SAP sales order database by coding SQL queries to build data pipelines for delivery scheduling application

● Pre-processed data using Spark & Python applications on an EMR cluster, which reduced compute time from hours to minutes

Sales & Marketing Analyst, Think and Learn Pvt. Ltd., India Mar’17 – May’18

● Developed a sales model to penetrate new areas of opportunity, resulting in $40,000 quarterly revenue generation

● Created a comprehensive competitor analysis to implement new lead generation strategies, resulting in a 2-fold increase in leads

● Utilized customer relationship management software (Lead Squared CRM) to effectively manage and analyze customer interactions

● Analyzed and prepared sales reports and identified sales patterns using Tableau dashboards

ACADEMIC PROJECTS

Financial Institution Marketing Data Analysis – Recommendation Engine Sept’19-Dec’19

● Preprocessed the Bank data and performed EDA to find the best strategies for future marketing campaigns using Apache Spark

● Performed feature selection by visualizing different aspects (seasonality, occupation, Loans, etc.) using matplotlib, seaborn

● Built classification models (Clustering, Logistic Regression) to help offer the right products to customers of Santander Bank

Analysis and Prediction of Consumer Behavior Aug’18 – Nov’18

● Developed a model to predict whether a customer would buy organic foods through data pre-processing & model assessment in R

● Analyzed anomalies in the data using ggplot2 visualizations and eliminated those with the least impact on the prediction model

● Achieved an accuracy of 82% using Logistic Regression, Multiple Linear Regression, KNN & Neural Network models

DATA ENGINEERING PROJECTS

Project 1: Relational & NoSQL Databases - Data Modeling with PostgreSQL and Apache Cassandra
Developed a relational database using PostgreSQL to model user activity data for a music streaming app:

● Created database using both PostgreSQL and Apache Cassandra

● Developed a Star Schema database using optimized definitions of Fact and Dimension tables (Normalization) for Postgres

● Developed denormalized tables optimized for a specific set of queries and business needs (Apache Cassandra)

● Built an ETL pipeline to optimize queries about songs played by users
Proficiencies include: Python, PostgreSQL, Apache Cassandra, Star Schema, ETL pipelines, Normalization, Denormalization

Project 2: Cloud Data Warehousing on AWS

Created a data warehouse utilizing Amazon Redshift:

● Created a Redshift Cluster, IAM Roles, Security groups

● Developed an ETL pipeline that copies data from S3 buckets and stages it in a warehouse hosted on Amazon Redshift, to be processed into a star schema

● Developed a star schema optimized for the specific queries required by the data analytics team
Technologies used: Python, Amazon Redshift, AWS CLI, Amazon SDK, SQL, PostgreSQL

Project 3: Data Lake - Spark

Scaled up the current ETL pipeline by moving the data warehouse to a data lake:

● Created an EMR Hadoop Cluster

● Developed an ETL pipeline that copies datasets from S3 buckets, processes the data using Spark, and writes back to S3 using efficient partitioning and Parquet formatting

● Fast-tracked the data lake buildout using (serverless) AWS Lambda and cataloging tables with AWS Glue Crawler
Technologies used: Spark, S3, AWS EMR, Athena, AWS Glue, Parquet

Project 4: Capstone - COVID-19 Data Pipeline and Visualization

● Processed COVID-19 data to analyze & visualize it, derive statistical information, and study patterns in COVID-19 cases in the USA & worldwide
Technologies used: REST API, Amazon Elasticsearch Service and Kibana, Python, Pandas

CERTIFICATIONS

AWS Solutions Architect Associate (Udemy): AWS and serverless fundamentals, real-world solution architecture
Analytics Edge (MITx): Data mining, Dynamic Optimization, Simulation
Marketing Analytics (BerkeleyX): Market Insight, Market Segmentation, Strategic Metrics, Price Analytics

LEADERSHIP SKILLS

● Teaching Assistant, Information Systems Analysis and Design Feb’19 - May’20

● Graduate Research Assistant, Department of Decision and Technology Analytics Aug’19 - May’20

● Graduate Student Mentor, Data Analytics and Design Club Aug’19 - May’20


