Data Scientist Machine Learning

Location:
New York, NY
Salary:
$25/hr
Posted:
February 19, 2024

Resume:

Shikhar Johri

646-***-**** New York, USA ad3rkc@r.postjobfree.com LinkedIn: @shikhar-johri GitHub: @johrilab

SUMMARY

Data Scientist with 3+ years of research and industry experience. Skilled in data infrastructure and data modeling, using statistical analysis, data visualization, AWS, SQL, Tableau and LLMs to drive revenue and innovation. Seeking a challenging data-driven role translating business problems into data-driven solutions.

SKILLS

• Programming Languages: Python, SQL, PySpark, R, D3, MATLAB, Shell/Bash Scripting

• Libraries & Frameworks: Pandas, NumPy, R Shiny, Matplotlib, PyTorch, TensorFlow, scikit-learn, NLTK, OpenCV

• Tools: AWS (Glue, S3, Athena, EC2, EMR, Airflow, Redshift), GCP, Docker, Tableau, Git, Jira, Confluence

WORK EXPERIENCE

Tata Consultancy Services Karnataka, IN

Data Scientist Python, Computer Vision, PyTorch, PyQt, Confluence Sep 2021 - Jul 2023

• Automated blood sample reporting with a pre-trained YOLOv4 model, making it 20x faster (800 ms/sample) and saving Roche $400K annually.

• Improved data quality with augmentation and pre-processing for Johnson & Johnson's catheter tube inspection system, achieving 10x faster processing with 25% higher precision via Facebook AI’s Detectron2 model.

• Created UI toolkit to convert 2D CT scans to 3D visualizations, enhancing medical diagnostics and implant development.

Data Engineer Python, PySpark, SQL, AWS, NLP, Recommender System Nov 2020 - Sep 2021

• Deployed AWS data lake with ETL pipeline for 40TB Parquet data using PySpark, SQL, and Pandas, centralizing data and creating Master Data Management for Genentech, utilized by client partners.

• Developed sales analysis utility with A/B testing, drawing 400+ new users and generating $2M+ within the first quarter.

• Reduced third-party storage costs by 90% and enhanced security via tiered data distribution across business channels.

• Conducted EDA on profile data, presented insights to develop Employee Recommender System, driving $1M+ revenue.

Thapar University Punjab, IN

Data Analysis Intern Python, Machine Learning, TensorFlow, Feature Analysis Jun 2019 - Oct 2019

• Published a novel Adaptive Multilevel Ensemble Machine Learning model for COVID-19 diagnosis (98% accuracy) and feature analysis study to discover genetic biomarkers for Multiple Sclerosis, validated via hypothesis testing.

• Introduced TensoPIT Bloom Filter with Kalman Filter-based space forecasting for efficient real-time data caching.

Northeast Big Data Innovation Hub New York, US

Graduate Student Assistant Python, RStudio, Excel, Tableau Sep 2023 - Present

• Led the 'Covid Information Commons' research student working group, guiding global cross-functional teams on data visualization, dashboards and exploratory data analysis to derive key data points for pandemic response policies.

PROJECTS & PUBLICATIONS

• Career Compass: Quarto book dashboard for exploratory data analysis of job market dynamics using R and D3.

• Explainable AI for Transaction Fraud Detection: Interpretable machine learning solution leveraging Decision Trees, Random Forest, SHAP and AutoML. Conducted in-depth financial risks/feature analysis to pinpoint crucial indicators.

• GenAIJudge for Harvard Hackathon: Implemented and fine-tuned GPT-4 to assess challenges and evaluate projects using dynamic scoring rubrics, streamlining judges' tasks for proposal selection.

• RealFlow Assist: Speech-to-text utility based on Google’s Gemini LLM for real-time chatbot suggestions and de-escalation.

• A novel ML-based analytical framework for automatic detection of COVID-19, IMA, 2021 (doi.org/10.1002/ima.22613)

• Serum and Cerebrospinal Fluid Cytokine biomarkers for diagnosis of Multiple Sclerosis, Mediators of Inflammation, 2020 (doi.org/10.1155/2020/2727042)

• Image Super-Resolution using GAN: 2-stage Semantic Information GAN to improve resolution. (pending publication)

EDUCATION

Columbia University New York, US

M.S. in Data Science, GPA- 3.7/4.0 Aug 2023 - Dec 2024 (Expected)

• Courses: Exploratory Data Analysis & Visualization, Probability & Statistics, Forecasting, Machine Learning, Finance

• Honors: R.G.S. Scholarship ($120K); Student Assistant at NE Big Data Hub; Data Science Student Council

Amity University Rajasthan, IN

B.Tech. in Computer Science & Engineering (Minor in Management), GPA- 3.86/4.0 Aug 2016 - Jun 2020

• Courses: Data Structures, Artificial Intelligence, Database Management Systems, C/C++, Image Processing

• Honors: President at Robotics Automation and IoT Lab; Mentor in NJACK & NWoC; Equitech Futures fellowship


