Afsan Abdulali Gujarati *****.********@*****.*** stackoverflow.com/afsangujarati
github.com/afsangujarati93 linkedin.com/in/afsangujarati
Summary
5+ years of experience building scalable data pipelines and deploying machine learning models to production environments. 5+ years of experience in Software Engineering. Extensive experience in scaling apps for increasing workloads. Exceptional ability to align technical requirements with business goals. Strong interpersonal skills. A big advocate of the KISS (Keep It Simple, Stupid) principle.
Relevant Work Experience
Propense Miami, USA (Remote)
Head of Data Science April 2023 - Present
Tools and Technologies: Snowflake, Fivetran, DBT, AWS (Athena, Glue, ECS, ECR, PostgreSQL, S3 Buckets), Docker, Python (Django, FastAPI, Pandas, PyTorch, and others), Terraform, GitHub Actions.
● Spearheaded the development and deployment of the inaugural recommendation model, leading to a pivotal $3M funding acquisition.
● Built a Market Triggers API using LLM and RAG to detect company changes and recommend services, saving $5,000 monthly and enabling near real-time updates.
● Played an integral role in the full-stack development of the product's first version, streamlining the integration of frontend, backend, data science, and data engineering components.
● Built a Dask-based recommendation engine that reduced processing time from days to an hour, demonstrating potential for further scalability and efficiency gains.
● Improved recommendation quality by analyzing user feedback and incorporating market and industry research.
Clearco Toronto, Canada (Contract)
Senior Data Engineer June 2023 - January 2024
Tools and Technologies: Snowflake, Fivetran, DBT, GCP (Buckets, Artifact Registry, PostgreSQL), Docker, Python (Flask, FastAPI, Pandas, PyTorch, and others), Terraform, GitHub Actions.
● Developed and executed a Reverse ETL process using Fivetran, Snowflake, and Census, drastically reducing the time required for new synchronizations from several days to minutes.
● Established staging data pipelines to enable pre-production testing of data-related artifacts, resulting in a reduction of data-related errors and issues by approximately 20%.
● Redesigned and implemented a more efficient data flow system to optimize Snowflake data warehouse costs, saving the company around $200,000 within the first six months of joining.
● Proposed and redesigned the storage strategy for the Data Science team, enhancing the efficiency of data retrieval for analytics and model refinement purposes.
Marpipe Inc New York, USA (Remote)
Senior Data Engineer September 2020 - March 2023
Tools and Technologies: AWS (SageMaker, S3, Lambda, ECS, ECR, Code Pipelines, EC2, PostgreSQL, Athena, Glue), Docker, Ruby (Ruby on Rails), Python (Flask, Django, Pandas, PyTorch and others), Vue.js.
● Redesigned the app's Insights section with caching strategies and optimized SQL queries, reducing response time from minutes to milliseconds.
● Engineered and deployed efficient inference pipelines for machine learning algorithms, focusing on minimizing response times and optimizing costs.
● Developed and launched a comprehensive reporting section for the app, delivering detailed analyses of Ads, Assets, and Categories within Ad Campaigns, integrating both frontend and backend components.
● Implemented scalable data pipelines utilizing serverless technology to expedite data delivery to the app, enhancing overall performance and scalability.
Introhive Fredericton, Canada
Data Scientist - Engineer (Scrum Master) September 2019 – September 2020
Tools and Technologies: AWS (EC2, Lambda, S3, Glue, Athena, ECS), Docker, Ruby (Rails), Python (Flask, Pandas, Scikit-Learn, NumPy, PyTorch, and others), PostgreSQL, JavaScript, HTML.
● Built scalable ETL pipelines and integrated engineering components end to end with machine learning models, handling up to a quarter million queries in under 5 minutes, down from an initial 15 minutes.
● Created scalable predictive models by applying mathematical techniques to business requirements in an Agile environment, for deployment behind the firewall on limited hardware.
● Researched state-of-the-art machine learning techniques, analyzing and reporting their strengths and weaknesses to select the approach best suited to the product specifications.
● Prepared a process to evaluate the effectiveness of predictive models for practical applications using quantitative and qualitative approaches.
Bluenode Inc Halifax, Canada
Software Engineer - Data January 2018 - September 2019
Tools and Technologies: Azure Services (App Services, Service Bus, PostgreSQL, Storage, ACR, Virtual Networks), Docker, Python (Django, Celery, Pandas, NLTK, Scikit-Learn, NumPy, and others), JavaScript, HTML (Bootstrap).
● Led the design and construction of a data ingestion and cleansing platform, capable of processing over half a million EDI transactions monthly through ETL processes, with a third undergoing automated cleaning.
● Developed and implemented experiments, evaluation methods, and scalable strategies for extracting HS Codes and matching descriptions using both heuristic and machine learning techniques.
● Created and deployed comprehensive reports and dashboards to track system performance and provide data insights through sophisticated visualizations.
● Collaborated closely with the product and business teams to gather requirements, devised strategic solutions through story creation, and spearheaded the iterative delivery of features in an Agile environment with a dedicated team of four.
BookMyShow Mumbai, India
Software Engineer August 2015 – September 2017
Tools and Technologies: C# .Net, MongoDB, MSSQL, Redis
● Spearheaded the design and implementation of the Movie Ticket Cancellation feature, translating business objectives into technical specifications, now processing over 250,000 cancellations monthly.
● Achieved faster query response times by migrating specific databases from SQL to Redis, optimizing system performance for users.
Education
Dalhousie University (GPA - 4.22/4.3) Halifax, Canada
Masters of E-Commerce (Data Science) September 2017 - July 2019
University of Mumbai (First Class) Mumbai, India
Bachelors of Engineering August 2010 - October 2014
Thesis
Authorship Attribution Halifax, Canada
Supervisor – Dr. Vlado Keselj September 2018 – July 2019
Examining Committee – Dr. Stan Matwin and Dr. Evangelos Milios
● Identified the author of an unseen document based on the documents they had written and read, using Natural Language Processing assisted by machine learning techniques.
● Training the model on the documents read by the author is a novel approach, the first of its kind, and boosted accuracy from 65.2% to 91.30%.