Data Analyst Python

Location: Chicago, IL
Posted: December 27, 2020

Resume:

HIMANSHU MISHRA

Chicago, IL ***** | +1-312-***-**** | ****************@*****.*** | linkedin.com/in/himanshumishra11 | Git: hmishra0

EDUCATION

Master of Data Science 3.54/4.0 GPA Aug 2018 - May 2020 Illinois Institute of Technology, Chicago, IL

Bachelor of Engineering in Computer Science 3.7/4.0 GPA Jul 2011 - Jun 2015 Rajiv Gandhi Technical University, Bhopal, India

SKILLS

Machine Learning: Statistical Analysis, Regression Analysis, Predictive Modeling, Data Analysis & Visualization, Hypothesis Testing, Association Rules, Decision Trees, Random Forest, Ensemble Learning, Neural Networks, Clustering, Natural Language Processing

Technical Skills: Python (Pandas, NumPy, Scikit-Learn, matplotlib, TensorFlow, NLTK), R, SQL, Advanced Excel

Tools/Platforms: Microsoft Azure, GCP, Apache Spark, GitHub, MS PowerPoint, Version Control (Rational ClearCase & ClearQuest), Tableau, Google Analytics, Jira, ETL.datastage, BigQuery, Oracle

PROFESSIONAL EXPERIENCE

Data Analyst III Accuity, Chicago, IL Dec 2020 – Present

• Identify discrepancies between legacy and new files by writing comparison scripts

• Trace the source of each discrepancy by mapping it to the business requirements

• Work with business and technology stakeholders to ensure that any gaps are assigned to the relevant owners

Data Scientist – Intern Center for Neighborhood Technology, Chicago, IL Sep 2019 – Dec 2019

• Description: Evaluating the disparity in utility bills among various communities of Chicago & identifying the key drivers of disparity

• Interacted with 10+ community organizers and two government departments for data collection, resulting in 27K records of utility bills

• Performed exploratory data analysis using the Pandas library (Python) and observed significant disparity among community bills. The key drivers of disparity were meter type and penalties; non-metered billing was 1.6 to 3 times higher than metered billing

Data Scientist – Intern Ricoh USA, Chicago, IL Jul 2019 – Aug 2019

• Aggregated data from various sources like Google Analytics & CRM systems using BigQuery and performed EDA on 110K records

• Wrote Standard SQL scripts to create Google Analytics segments in BigQuery for analysis and for developing reports in Power BI

• Developed a machine-learning-based customer scoring model in R to identify potential customers with a recall of 86% and deployed it on GCP

• Implemented an NLP-based matching algorithm in Python to match names across data files that lack a common company key, using n-grams, TF-IDF, and cosine similarity, which raised the matching rate from a baseline of 8% to 40%

• Cleaned and visualized in MS Excel 10K ad campaign responses for new products, collected via Google Forms

• Built standard and ad-hoc dashboards in Data Studio to provide key insights and monitor KPIs

Data Analyst IBM Global Business Services, Kolkata, India Apr 2016 – May 2018

Decision Support System for Retail

• Performed data cleaning and wrangling on 3M records and defined Recency, Frequency and Monetary variables using R & SQL

• Built an unsupervised machine learning model (K-medoids clustering) to segment customers into high-, medium-, and low-value clusters, and visualized the results using ggplot2

• Developed business-enhancement POCs in Python on the impact of weather on sales prediction and on the price sensitivity of demand

Business Intelligence Analytics for Retail

• Generated market sales reports and updated existing business intelligence reports using Tableau Server and Desktop

• Designed and tested ETL jobs in various data layers using ETL.datastage and deployed in production environment with 100% success rate

• Overhauled the existing code in data layers using Oracle SQL, reducing data load issues by 20% per quarter

• Performed root cause analysis and filed RCA reports for various development- and deployment-related failures in the data warehouse

Leadership & Automation

• Led the deployment team in preparing deployment strategies, analyzing the order of deployment, and assigning tasks to team members using Jira

• Established a 10-member team and coordinated with four stakeholders towards the successful launch of the monthly business newsletter

• Automated incident change requests in the BMC Remedy tool using IBM Bluemix to reduce manual work

Data Analyst Define InfoTech, Bhopal, India Jun 2015 – Mar 2016

• Evaluated demographics and survey data and extracted relevant attributes of the targeted student and educator/company population

• Performed exploratory data analysis in Python to identify key drivers of student inflow and to evaluate the performance gaps identified by the educators/companies

• Initiated client engagements by identifying problem spaces involving market research, data gathering and delivering a proof of concept with a low turnaround time

PROJECTS

• Lane Detection using Deep Learning: Built a CNN-based lane detector in Python, using HDFS & Kafka for data processing and real-time streaming, and deployed it on a Spark cluster

• Performed exploratory data analysis on 800K records and designed an interpretable machine learning model (logistic regression) with 81.57% accuracy and 99.47% sensitivity, for an expected profit of $1.9 million


