
Jyotirmoy Sundi

San Jose, CA

Phone: 650-***-****

adgp4d@r.postjobfree.com

Education

BTech, Computer Science, NIT Silchar India (8.14/10)

MS in Computer Science, Stony Brook University (SUNY), USA (3.71/4.0)

Udacity (Machine Learning Nanodegree in Progress)

Udacity (Deep Learning Course in Progress)

Relevant Coursework

Machine Learning, Artificial Intelligence, Web Technologies, Advanced Algorithms, Discrete Mathematics, TensorFlow, Optimization Solvers, and various Python/Machine Learning/Deep Learning courses on udemy.com

Programming Languages/Tools Used

Java, Scala, Spark, Spark-SQL, BigQuery, Python, Jupyter, Deep Learning, LSTM, CNNs, forecasting, XGBoost, Prophet, LightGBM, various clustering techniques, hierarchical probabilistic models like GMM, MINLP optimizers, CPLEX, Pandas, NumPy, Seaborn, AWS, MLlib, Hadoop, Cascading, Hive, HBase, JavaScript, R, Eclipse, IntelliJ, jQuery, Material Design, AngularJS, Scalding, scikit-learn, Storm, CDAP, MySQL, DynamoDB, Amazon EMR, Kafka, Backbone, Highcharts, Elasticsearch, Play framework, Nginx

Director [Current] / Manager / Principal Machine Learning Engineer, Walmart Labs (Dec 2017 - Current)

Grew a team from 1 to 20 data scientists, machine learning engineers, and data engineers; actively involved in hiring, university recruiting, interviewing, and screening candidates across all levels

Set the AI vision for omnichannel retail operations, working with business stakeholders to increase topline by +5% YoY on top of $550B yearly revenue

Engaged external AI experts from Google/Microsoft to invest in deep learning, reinforcement learning, and causal inference frameworks; set up weekly meetings to increase the overall technical & scientific capability of the entire team in effective applied DS practices. Partnered with MIT professors on problems around stochastic optimization and robust optimization under uncertainty with Lagrangian relaxations; through the MIT partnership we solved robust optimization problems under uncertainty that were infeasible before, unlocking new business revenue streams and speeding model deployments by 10x

Hands-on as required: tuning algorithm performance (accuracy, ROC) for classification, clustering, forecasting, and optimization solutions, and elastically scaling algorithms & data transformations in the cloud to remove blockers for ICs.

Built an item recommendation model [a deep learning model with multiple complex embeddings (customers, inventory, substitution groups, pricing), choice-model forecasts within substitution groups, and chance-constrained & combinatorial optimizers] for Walmart Stores' food & consumables business across 5,000 stores and 1M products. This algorithm increased topline by +$8B growth across multiple categories
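
A minimal sketch of the embedding-based scoring core that such a model typically builds on, written in Keras; the entity counts, dimensions, and layer choices are illustrative assumptions, not the production architecture (which layers choice-model forecasts and optimizers on top):

    import numpy as np
    from tensorflow import keras

    n_stores, n_items, dim = 5000, 100_000, 32  # illustrative sizes

    store_in = keras.Input(shape=(1,), dtype="int32")
    item_in = keras.Input(shape=(1,), dtype="int32")

    # One embedding table per entity; the real model combines many more
    # (customers, inventory, substitution groups, pricing, ...).
    store_vec = keras.layers.Flatten()(keras.layers.Embedding(n_stores, dim)(store_in))
    item_vec = keras.layers.Flatten()(keras.layers.Embedding(n_items, dim)(item_in))

    # Affinity score for a (store, item) pair is the dot product of embeddings.
    score = keras.layers.Dot(axes=1)([store_vec, item_vec])
    model = keras.Model([store_in, item_in], score)
    model.compile(optimizer="adam", loss="mse")

    # Train against a hypothetical demand signal for sampled pairs.
    stores = np.random.randint(0, n_stores, size=(1024, 1))
    items = np.random.randint(0, n_items, size=(1024, 1))
    demand = np.random.rand(1024, 1)
    model.fit([stores, items], demand, epochs=1, verbose=0)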

Built anomaly detection algorithms [an ensemble of DBSCAN and variational autoencoders] to detect inefficiencies in enterprise inventory, decreasing out-of-stock items by 12%
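
As one illustration, a minimal sketch of the DBSCAN half of such an ensemble using scikit-learn; the feature names, thresholds, and data are made-up assumptions:

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Hypothetical features per item-store record:
    # [days_of_supply, weekly_sales, on_hand_units]
    X = rng.normal(size=(500, 3))
    X[:5] += 8  # inject a few gross anomalies

    labels = DBSCAN(eps=0.9, min_samples=10).fit_predict(
        StandardScaler().fit_transform(X))

    # DBSCAN labels points outside every dense region as -1; those are the
    # candidate inventory anomalies to review.
    print("suspected anomalies:", np.where(labels == -1)[0])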

Built a causal inference algorithm to model incremental sales uplift within item substitutes for new items introduced to Walmart stores. The incrementality model, along with 80% forecast accuracy on a sparse dataset, increased confidence in item introductions and increased overall new-item sales by 1% in F&C categories, which represent +$150B yearly sales at Walmart

Introduced feature selection tools [tsfresh] and feature importance measures [SHAP values] to the team for modeling important features, improving explainability and increasing adoption of AI models by 30%
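
A minimal sketch of the SHAP feature-importance step, assuming an XGBoost model on synthetic data; the column names are illustrative only:

    import numpy as np
    import pandas as pd
    import shap
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(1000, 4)),
                     columns=["price", "promo_flag", "seasonality", "store_traffic"])
    y = 3 * X["price"] + 2 * X["store_traffic"] + rng.normal(size=1000)

    model = xgb.XGBRegressor(n_estimators=50).fit(X, y)

    # TreeExplainer attributes each prediction to the input features;
    # mean |SHAP| per column is a common global importance summary.
    shap_values = shap.TreeExplainer(model).shap_values(X)
    for name, val in sorted(zip(X.columns, np.abs(shap_values).mean(axis=0)),
                            key=lambda t: -t[1]):
        print(f"{name}: {val:.3f}")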

Built a streaming tech stack to detect anomalies in item-processing pipelines for 1P/3P marketplaces, helping save 400K weekly man-hours spent debugging & resolving issues

Built a scalable, self-serve, cost-effective machine learning training and deployment platform, decreasing time to market for building & deploying models by 60% and saving $3M per year on cloud infrastructure costs as models scaled up

Manage a team of high-caliber deep learning & machine learning engineers from Ivy League schools and top companies.

Awarded Role Model Performer at Walmart in 2018 and 2019, an award given to the top 2% of employees.

Big Data Engineer, Intuit (March 2016- Dec 2017)

Developed data pipelines in Spark Streaming on AWS for tax-file ingestion and clickstream data, handling 20TB of data per day and improving workflow performance severalfold.
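
A minimal sketch of a streaming clickstream aggregation of this kind, using the PySpark Structured Streaming API; the broker address, topic name, and window are hypothetical, and the original pipelines may have used the DStream API instead:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clickstream").getOrCreate()

    # Hypothetical Kafka source for clickstream events
    # (requires the spark-sql-kafka package on the classpath).
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "clickstream")
              .load())

    # Count events per user per 5-minute window (value assumed to hold the user id).
    counts = (events.selectExpr("CAST(value AS STRING) AS user", "timestamp")
              .groupBy(F.window("timestamp", "5 minutes"), "user")
              .count())

    query = counts.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()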

Developed and optimized Spark big data pipelines for user clickstream events, reducing infrastructure cost by 60% for 2TB of daily processing.

Developed an Akka-based ingestion monitoring service covering all pipeline types, batch and real-time, that reports issues proactively in real time. Increased visibility of all projects with a minimalist dashboard for management to help invest strategically in tech.

Developed AWS CloudFormation templates to build secure, immutable infrastructure end-to-end.

Won a machine learning hackathon for time series demand forecasting with BSTS models and deep learning LSTM RNN cells; participated in multiple other prototypes.
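
A minimal sketch of LSTM demand forecasting of the kind used in such an entry: sliding windows over a univariate series in Keras. The synthetic series, window size, and hyperparameters are illustrative assumptions:

    import numpy as np
    from tensorflow import keras

    series = np.sin(np.linspace(0, 40, 800)) + np.random.normal(0, 0.1, 800)

    window = 24  # use the last 24 points to predict the next one
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])[..., None]
    y = series[window:]

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(window, 1)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)

    # One-step-ahead forecast from the most recent window.
    print(model.predict(X[-1:], verbose=0))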

Prototyped real-time clickstream analysis with Apache Beam/Kafka/Elasticsearch/Kibana for sessionization and windowing triggers.
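
A minimal sketch of sessionization with the Apache Beam Python SDK; the prototype fed Kafka into Beam, but here a tiny in-memory collection with hypothetical event timestamps stands in:

    import apache_beam as beam
    from apache_beam.transforms import window

    events = [("user1", 10), ("user1", 25), ("user1", 500)]  # (user, epoch seconds)

    with beam.Pipeline() as p:
        (p
         | beam.Create(events)
         | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1]))
         # Events closer than 60s land in the same session window.
         | beam.WindowInto(window.Sessions(gap_size=60))
         | beam.GroupByKey()  # -> two sessions for user1: [10, 25] and [500]
         | beam.Map(print))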

Big Data Engineer, Ad-Tech Startup (Feb 2015- March 2016)

Built batch & real-time data pipelines in Hive/Spark and machine learning CTR/CPA optimization modeling workflows over Spark.

Analyzed real-time bidding logs on impressions/clicks/conversions and generated various client reports using Hive, with migration to Spark.

Big Data Engineer, Lotame (AdMobius) (Ad-tech) (Jun 2013 - Feb 2015)

Performed lookalike behavior modeling on profiles, devising lift/reach scoring mechanisms for siloed or correlated behaviors to improve targeting.

Implemented a real-time workflow on the CDAP platform for unique-count stats using HyperLogLog, saving the company $1M by avoiding running unique stats in the MapReduce framework.
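
A minimal sketch of the HyperLogLog idea behind that workflow, using the datasketch Python library rather than CDAP: approximate distinct counts in constant memory instead of an exact MapReduce count. The event stream here is synthetic:

    from datasketch import HyperLogLog

    hll = HyperLogLog(p=14)  # 2**14 registers, roughly sub-1% relative error
    for i in range(1_000_000):
        # 1M events but only 250K distinct profile IDs (synthetic data).
        hll.update(f"profile-{i % 250_000}".encode("utf8"))

    print(f"approximate uniques: {hll.count():,.0f} (true: 250,000)")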

Benchmarked both Storm and the CDAP platform on guaranteed message processing and performance to choose the right streaming platform, and positioned the platform for the future by enabling Spark/MLlib integration for processing workloads and exploratory work.

Worked with CDAP to understand the transaction layer architecture and improve the HBase layers as required while scaling up.

http://www.slideshare.net/sundi133/cascading-talk

Ongoing work on classifying URLs against a custom taxonomy using basic NLP techniques and scaling up the system with multithreaded processing per URL.
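
A minimal sketch of taxonomy-based URL classification with basic NLP features (URL text tokenized into TF-IDF, then a linear classifier); the URLs, labels, and taxonomy nodes are made up:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: URL text -> taxonomy node.
    urls = ["espn.com/nba/scores", "cnn.com/politics/election",
            "foodnetwork.com/recipes/pasta", "nba.com/games"]
    labels = ["sports", "news", "food", "sports"]

    # Split URLs on non-alphanumeric characters to get token features.
    clf = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+"),
                        LogisticRegression())
    clf.fit(urls, labels)

    print(clf.predict(["si.com/nba/playoffs"]))  # -> likely 'sports'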

Participated in Kaggle data science competitions, placing in the top 10-20% across different challenges (username sundi133).

Converted device-fingerprinting Hive workflows to Cascading, resulting in a 2x runtime improvement. Optimized the workflow, ran Cascading jobs under Oozie, and implemented efficient joins.

Wrote a combined-file input reader for faster processing and equal distribution of workload across mappers in Cascading.

Developed and optimized the graph processing logic, moving it from Hive to Apache Giraph for a mobile device graph handling 3B devices, and reduced the runtime from 10 hours to 15 minutes. The total graph size is ~500M devices.
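
For illustration, the core computation on a device graph like this is commonly connected components (grouping devices that share identifiers); a minimal single-machine union-find sketch of that logic, which Giraph would instead run as an iterative vertex-centric program at scale. The edges and device IDs are hypothetical:

    from collections import defaultdict

    # Hypothetical edges: pairs of device IDs linked by shared signals.
    edges = [("dev1", "dev2"), ("dev2", "dev3"), ("dev4", "dev5")]

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in edges:
        union(a, b)

    # Group devices by component root: {dev1,dev2,dev3} and {dev4,dev5}.
    components = defaultdict(list)
    for dev in parent:
        components[find(dev)].append(dev)
    print(dict(components))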

Developed a multidimensional query engine using bitmap logic for various segment/attribute explorers, enabling ad hoc queries across multiple segments in milliseconds compared to minutes/hours before. Used the Play framework for web services and javaewah for bitmaps; the bitmap files are generated by a MapReduce job, and the whole process is automated in Oozie in production.
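
A minimal sketch of the bitmap-index idea, with plain Python integers standing in for javaewah's compressed bitmaps: each segment stores one bit per profile, so multi-segment ad hoc queries reduce to bitwise AND/OR plus a popcount. Segment names and profile IDs are made up:

    segments = {"sports_fans": 0, "in_market_auto": 0}  # segment -> bitmap

    def add_profile(segment, profile_id):
        segments[segment] |= 1 << profile_id  # set that profile's bit

    for pid in (1, 5, 9, 42):
        add_profile("sports_fans", pid)
    for pid in (5, 9, 77):
        add_profile("in_market_auto", pid)

    # Overlap of two segments is a single AND; popcount gives the audience size.
    overlap = segments["sports_fans"] & segments["in_market_auto"]
    print("profiles in both segments:", bin(overlap).count("1"))  # -> 2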

Managed the Hadoop cluster using Cloudera Manager. Upgraded services from MR1 to YARN and ensured Cascading and Hive jobs were compatible and running smoothly in the YARN UI. Used Hive, Oozie, Cascading, and HBase for ETL data pipelines.

Analytics Engineer, Ngmoco (Jan 2013 - Jul 2013)

Built a cluster analyzer tool for log analysis of all services running in the cluster (Hadoop, HBase, Hive, Oozie). The framework helps assess cluster performance, finds correlations between nodes, and visualizes the aggregated data in a web UI. Technologies used: Cascading, Highcharts, pssh, and the Play web framework.

Used Cascading's Pattern library for churn modeling on game user data based on PMML files generated from R; the model was trained on 3 months of user data.

Involved in managing the Hadoop cluster using tools like Puppet, Chef, and Vagrant.

Software Engineer, Motorola Solutions (Home Network Division) – Push to Talk R&D (India) (Sep 2008 - July 2011)

Awarded the Bravo award at Motorola in 2009 and 2010 consecutively for writing the highest-quality, bug-free software and creating better end-to-end monitoring (Java/Postgres/Android/web programming).


