
Data Engineer

Location:
Dorchester, MA
Posted:
April 16, 2021



ANURAG KUCHE


https://anuragkuche.medium.com/ https://leetcode.com/Anuragkuche/ Highly Motivated and Self-taught Data science enthusiast with a flair towards Data Engineering and Deployment, coupled with good Statistical knowledge. Excellent communication with a business performance mindset experienced working in a production environment with a proven track record of Collaboratively working towards delivering valuable analytical solutions via Data Analytics. EDUCATION

Northeastern University, Boston, MA
Master of Science in Data Analytics [STEM] (GPA: 3.74) May 2021

IIIT-B, Bangalore, India
PG Diploma in Data Science May 2018

SIR MVIT, Bangalore, India
Bachelor of Engineering in Mechanical Engineering April 2016

TECHNICAL SKILLS

Programming skills: Python [scripting], SQL [relational databases], NoSQL, Spark, Scala, Core Java, R, C++, Shell
Big Data Technologies: Hadoop, HDFS, Hive, Apache Spark, AWS S3, EC2, SageMaker, EMR, Redshift, Distributed Systems
Business Intelligence reporting: MS Excel, Tableau, Talend, Matplotlib, Seaborn, Git, Docker, Flask, Streamlit, Kubernetes, CI/CD, Agile
Areas of Expertise: Computer Science, Software Development, Statistics, Data Warehousing, A/B Testing, Data Analysis, Data Structures, Algorithms, REST APIs, Data Mining, Quantitative Modelling
Applied Data Science: Regression, Classification, Clustering, Decision Trees, SVM, XGBoost, LGBM, AdaBoost, ETL
Deep Learning Frameworks: Neural Networks, CNN, RNN, Sequence Models, Hyperparameter and Performance Tuning
Machine Learning libraries: NumPy, Pandas, TensorFlow, Keras, PyTorch, NLTK, Scikit-learn, OpenCV, SciPy, GitHub Actions
Databases [DB]: Oracle, MySQL, PostgreSQL [Postgres], HBase, Redis, Cassandra, MongoDB, DynamoDB
Cloud computing: Amazon Web Services [AWS], Google Cloud Platform [GCP], Snowflake, Azure
Soft Skills: Communication, Problem Solving, Collaboration, Hands-on, Best Practices, Data-Driven, Impact, Presentation, Technical Writing, Passion, Fast-paced Environments, Critical Thinking

PROFESSIONAL EXPERIENCE

DTonomy, Boston, USA

Machine Learning Engineer Intern September 2020 – Present

• Built an end-to-end natural language processing chatbot and deployed it to GCP, integrated with Ngrok servers.

• Leveraged AWS Firehose and Kafka (JSON) to gather data from various large datasets and visualized it in Amazon QuickSight.

• Used Elasticsearch, Logstash, and Kibana (monitoring tools) to search, store, and perform complex aggregations and visualizations for log analysis across various use cases (data lifecycle).

India Tech, Bellary, India

Research Analyst - Junior Manager August 2017 - July 2019

• Built real-time Tableau visualizations of manufacturing and sales across a range of products and client requirements.

• Worked in the supply chain and logistics industry, analyzing (SQL) the requirements of more than 1,000 customers and predicting sales.

SELECTED PROJECTS

Transit Status Tracker (Apache Kafka, Kafka Connect, REST Proxy, Faust, KSQL) March 2021 - April 2021

• Built highly scalable transit-tracking pipelines for commuters using Kafka, REST Proxy, Faust, KSQL, and Kafka Connect.

• Built a data pipeline using REST Proxy to ingest weather data and a Kafka producer for train arrivals.

• Used a Kafka consumer to consume from other Kafka topics, and used Faust and KSQL to extract and transform the data.

Wikipedia Edit Log (PySpark, batch processing, streaming architectures, Spark SQL, Delta Lake) March 2021

• Designed a 10x more cost-effective approach combining batch processing and stream processing of logs from a Kafka broker.

• Used Delta Lake as a source and normalized tables as a sink, updated every second in Databricks.

• Built advanced geospatial aggregated tables using SQL queries for real-time visualizations such as line plots and map plots.

Website Recommender Based on User Reviews (Hive, Pig, Sqoop, MySQL, HDFS) November 2020 - December 2020

• Worked on large datasets to recommend the best websites for a particular genre, based on customer reviews.

• Performed sentiment analysis on unstructured user reviews in XML, using Pig and Hive to parse the data and HDFS to store it.

• Used Sqoop to write the results to a MySQL database and created a user interface showing the top-rated websites per category.

IBM Recommendation System (ranking, user-based and content-based recommendations) January 2021

• Analyzed user interactions to recommend content, providing personalized content by clustering users.

• Used NLP techniques to develop a content-based recommendation system for customer segmentation.

• Used matrix factorization to predict how much a new article would engage users, informing marketing.

IP Theft: Anomaly Detection (decision trees, XGBoost, K-means, scikit-learn; sponsor: General Electric) April 2020 - June 2020

• Performed data wrangling to check the quality of the training data, converted timestamps into date-time objects, and used different models for threshold and atomic instances; the approach would have saved $8.8 million compared to the traditional rule-based system.

• Used techniques such as SMOTE to counter the effect of class-imbalanced datasets and evaluated various clustering metrics.

RESEARCH PUBLICATIONS

Transliteration of Kannada Text to English Text, September 2018 - October 2018
Converted vowels and consonants into Unicode code points and stored them in hash maps; this maps each letter to its ASCII romanization and enables efficient retrieval of key-value pairs.
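The hash-map transliteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the publication's actual code; the mapping shown is a small assumed subset of a full Kannada-to-Latin table, and a complete implementation would also handle dependent vowel signs and the virama.

```python
# Sketch: Kannada-to-English transliteration via a hash map.
# Each Kannada letter (a Unicode code point in the U+0C80-U+0CFF block)
# maps to an ASCII romanization; dict lookup gives O(1) retrieval per letter.

# Assumed subset of the mapping table: a few vowels and consonants.
KANNADA_TO_ASCII = {
    "\u0C85": "a",   # ಅ  (KANNADA LETTER A)
    "\u0C86": "aa",  # ಆ  (KANNADA LETTER AA)
    "\u0C87": "i",   # ಇ  (KANNADA LETTER I)
    "\u0C95": "ka",  # ಕ  (KANNADA LETTER KA)
    "\u0CA8": "na",  # ನ  (KANNADA LETTER NA)
    "\u0CAE": "ma",  # ಮ  (KANNADA LETTER MA)
}

def transliterate(text: str) -> str:
    """Replace each known Kannada letter with its ASCII romanization;
    characters outside the map pass through unchanged."""
    return "".join(KANNADA_TO_ASCII.get(ch, ch) for ch in text)

print(transliterate("\u0C95\u0CA8"))  # kana
```

Storing the table as a dict keyed by code point is what makes per-letter lookup constant-time, which is the efficiency claim the publication entry refers to.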


