
Mounika Vakkanti

*******************@*****.*** +* (806) 401 - 7299

linkedin.com/in/mounika-v-520658237/

Dallas, TX

SUMMARY:

Data scientist with around 6 years of experience, specializing in building classification and regression models and delivering actionable business insights. I have worked extensively with OpenAI, Vertex AI, Anthropic, LangChain, and Haystack. Additionally, I developed user-friendly platforms for natural-language interaction with textual and SQL data and designed a chatbot for secondary research.

EDUCATION:

Master's in Computer Science, Texas Tech University, August 2022 – May 2024

Bachelor's in Electronics and Communication Engineering, JNTU Kakinada, August 2014 – July 2018

TECHNICAL SKILLS:

Machine Learning: Linear Regression, Logistic Regression, KNN, Decision Tree, Naive Bayes, XGBoost, Gradient Boosting, online learning, deep learning, scikit-learn, MLlib, Torch

Deep Learning: Neural Networks, Optimizers, Activation Functions, Early Stopping, Regularization, Loss Functions, Keras Tuner for hyperparameter tuning, PyTorch, TensorFlow, Hugging Face

Programming Languages: Python, SQL, R, Go, Java

Large Language Models (LLMs): OpenAI (proficient in the Assistant API, Chat Completions, and Batch API, with and without function calling; GenAI and LLMOps, i.e., Large Language Model Operations; NLTK, CoreNLP, Gensim, spaCy, OpenNLP, UIMA, GATE); Anthropic (experienced with most variants); Google (experienced with most variants)

Embedding Models: OpenAI and open-source models

LLM Frameworks: LangChain, Haystack, LlamaIndex, A/B testing

Image Processing: Computer Vision, PIL/Pillow, GraphicsMagick, FFmpeg, OpenCV, SimpleCV

Open-Source OCR: Pytesseract and related frameworks

Paid OCR: Google Vision, Amazon Textract, MS Azure Computer Vision

Data Visualization: Matplotlib, Vision, Tableau

Databases: MySQL, Cassandra, PostgreSQL, NoSQL, vector databases, MongoDB

Web Development: Flask, React, HTML, CSS, FastAPI, cloud-based endpoints, REST API, GraphQL, HTTPS

Source Code Management: Git, GitHub, Bitbucket, DevOps

Deployment/Production: AWS (EC2), Jenkins, Kubernetes, Docker, Dask, MLOps, CI/CD, Kubeflow, Airflow, SageMaker, Bedrock, Rekognition, Agile/Scrum

ETL Data Flows: Spark API, Spark Streaming, Spark Structured Streaming, Spark SQL, Kafka, Nextflow

PROFESSIONAL EXPERIENCE:

Client: Fannie Mae, Plano, Texas July 2024 - Current

Role: Data Scientist

Responsibilities:

1. Deep Learning:

Project: Image recognition for quality control in manufacturing, leveraging deep learning with CNNs and RNNs.

Approach: Designed convolutional and recurrent neural networks (CNN, RNN) using TensorFlow to detect defects in manufactured products. Preprocessed images and augmented data to improve model robustness, and built ML CI/CD pipelines for automated testing and deployment.

Outcome: Improved defect detection accuracy by 25%, significantly reducing manual inspection effort.
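A minimal TensorFlow/Keras sketch of the kind of defect classifier this describes; the layer sizes, input shape, and binary defect/no-defect framing are illustrative assumptions rather than the production model.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_defect_detector(input_shape=(128, 128, 3)):
    # Light augmentation improves robustness to lighting/orientation changes.
    augment = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ])
    model = models.Sequential([
        layers.Input(shape=input_shape),
        augment,
        layers.Rescaling(1.0 / 255),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # defect vs. no defect
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model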

2. Recommendation System:

Project: Personalized product recommendations for e-commerce.

Approach: Implemented collaborative filtering and content-based recommendation algorithms to provide personalized product suggestions, leveraging user behavior data, purchase history, and product metadata.

Outcome: Boosted average order value by 12% and improved customer satisfaction through tailored recommendations.
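A small illustrative sketch of item-based collaborative filtering with cosine similarity; the interaction matrix, user IDs, and product names are made up.

import numpy as np
import pandas as pd

# Rows = users, columns = products, values = implicit interaction counts.
interactions = pd.DataFrame(
    [[3, 0, 1, 0], [2, 1, 0, 0], [0, 4, 0, 2], [0, 1, 0, 3]],
    index=["u1", "u2", "u3", "u4"],
    columns=["p1", "p2", "p3", "p4"],
)

# Cosine similarity between item (column) vectors.
m = interactions.to_numpy(dtype=float)
norms = np.linalg.norm(m, axis=0, keepdims=True)
item_sim = (m.T @ m) / (norms.T @ norms + 1e-9)

def recommend(user, k=2):
    # Score unseen items by similarity to items the user interacted with.
    seen = interactions.loc[user].to_numpy()
    scores = item_sim @ seen
    scores[seen > 0] = -np.inf  # do not re-recommend seen items
    top = np.argsort(scores)[::-1][:k]
    return interactions.columns[top].tolist()

print(recommend("u1"))  # top-2 unseen products for user u1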

3. Generative AI (GenAI):

Project: Financial Q&A bot using BERT for contextual understanding and fine-tuned LLMs.

Approach: Developed a financial Q&A chatbot using LangChain and LLMs to answer user queries over business textual data. Implemented text summarization and semantic search for accurate responses.

Outcome: Improved user experience and response accuracy by integrating semantic search and text summarization with LangChain and GPT-4, reducing manual intervention and increasing efficiency by 40%. Currently working on the generative AI architecture.
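A hedged sketch of a retrieval-augmented Q&A flow in the LangChain style; exact imports vary by LangChain version, and the file path, model name, and chunking parameters are assumptions (an OpenAI API key and the faiss package are required).

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split business documents into chunks and index them for semantic search.
docs = open("financial_notes.txt").read()  # hypothetical source file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50).split_text(docs)
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

llm = ChatOpenAI(model="gpt-4")

def answer(question: str) -> str:
    # Retrieve the most relevant chunks, then answer grounded in them.
    context = "\n\n".join(
        d.page_content for d in store.similarity_search(question, k=3))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content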

4. Prepared large datasets for ML models, applied statistical methods to improve model performance, and worked with GCP cloud services to integrate end-to-end pipelines and model deployment, including preference-based fine-tuning with DPO and KTO.

5. Developed and deployed production-grade software for SLAM, geometric computer vision, and point cloud processing, enabling real-time scene reconstruction and mapping.

Client: Texas Tech University Health Sciences Center, Texas March 2023 – February 2024

Role: Software Developer

Responsibilities:

1. Java Desktop Application:

Project: Speech Diagnosis and Stress Testing Enhancements

Approach: Designed and developed a desktop application using J2EE, web containers, and Java for robust front-end UI and backend logic, integrating RESTful APIs for seamless communication. The application used React, HTML, and CSS for a dynamic, responsive user interface. Implemented a microservices architecture and design patterns to enable modularity and scalability, handling data processing efficiently. Followed Agile methodologies, conducted end-to-end testing, and optimized performance across environments.

Outcomes: Improved system efficiency by 40% through microservices integration and optimized RESTful API performance. Enhanced software robustness using TDD, code reviews, and stress testing, ensuring high reliability and scalability.

2. Data Cleaning and Processing:

Project: Data cleaning and preprocessing with Apache Spark on high-dimensional data.

Approach: Leveraged Apache Spark for data cleaning and preprocessing of large datasets, ensuring data quality and consistency. Managed missing data, outliers, and duplicates to maintain a clean data pipeline for further analysis. Used Python to develop applications incorporating collections, exceptions, and interfaces.

Outcomes: Improved data consistency, enabling more accurate downstream analysis, and enhanced processing speed for large datasets by optimizing Apache Spark operations.
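A minimal PySpark sketch of the cleaning steps described above; the input path, column name, and 3-sigma outlier threshold are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleaning").getOrCreate()
df = spark.read.csv("measurements.csv", header=True, inferSchema=True)

# Drop exact duplicates and fill missing numeric values with the column mean.
df = df.dropDuplicates()
mean_val = df.select(F.mean("value")).first()[0]
df = df.fillna({"value": mean_val})

# Filter outliers more than 3 standard deviations from the mean.
stats = df.select(F.mean("value").alias("mu"),
                  F.stddev("value").alias("sd")).first()
df = df.filter(F.abs(F.col("value") - stats.mu) <= 3 * stats.sd)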

3. Data Visualization for Clinical Data in R:

Project: Data Visualization for Clinical Data

Approach: Developed data visualizations for clinical datasets using ggplot2 in R and Seaborn and Matplotlib in Python, focusing on interactive, insightful charts for model evaluation and clinical data analysis.

Outcomes: Delivered actionable insights to stakeholders through intuitive, accurate, and visually engaging visualizations, improving decision-making in clinical environments.
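An illustrative Seaborn/Matplotlib sketch of a clinical-style visualization; the synthetic dataframe and column names are assumptions.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "visit_week": np.tile(np.arange(1, 13), 2),
    "biomarker": np.concatenate([rng.normal(5, 1, 12), rng.normal(4, 1, 12)]),
    "arm": ["treatment"] * 12 + ["control"] * 12,
})

# Trend of a biomarker over study visits, split by study arm.
sns.lineplot(data=df, x="visit_week", y="biomarker", hue="arm", marker="o")
plt.title("Biomarker trend by study arm")
plt.tight_layout()
plt.savefig("biomarker_trend.png")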

Client: Texas Tech University, Lubbock, Texas September 2022 – February 2023

Role: Software Developer

Responsibilities:

Project: Smart Grid and Energy Consumption Analysis with Federated Learning

Approach: Analyzed large-scale smart meter and smart grid data using Seaborn and Matplotlib for visualizations such as time-series plots and heatmaps. Applied deep learning (CNN, TCN) for image and sequence data analysis, and implemented federated learning with TensorFlow Federated for decentralized model training. Conducted data mining and fine-tuning on household electricity data, and used FedDetect for anomaly detection, applying probability, hypothesis testing, and model evaluation to improve accuracy.

Outcomes: Improved anomaly detection accuracy, optimized energy consumption analysis, and enhanced smart grid operations. Enabled decentralized, secure training of AI/ML models, identified actionable consumption patterns, and improved energy usage forecasting.
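A conceptual NumPy sketch of federated averaging (FedAvg); the actual project used TensorFlow Federated, so this only illustrates the aggregation idea on a toy linear model with synthetic client data.

import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    # Each household trains on its own data; raw data never leaves the client.
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(clients, rounds=10, dim=3):
    w_global = np.zeros(dim)
    for _ in range(rounds):
        # The server averages client weights, weighted by local data size.
        updates = [(local_update(w_global, X, y), len(y)) for X, y in clients]
        total = sum(n for _, n in updates)
        w_global = sum(w * n for w, n in updates) / total
    return w_global

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, size=50)))
print(fedavg(clients))  # converges toward true_w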

Client: FIS GLOBAL, Bangalore, Karnataka, India April 2021 – August 2022

Role: Data Scientist

Responsibilities:

1. Time Series Forecasting for a CPG Client:

Project: Beverage Product Demand Forecasting

Approach: Developed a time series forecasting model using ARIMA to predict future demand for a popular beverage product. Integrated external factors such as seasonality, promotions, and economic indicators to improve accuracy, and used scikit-learn for ML systems that handle high-throughput, real-time, and batch processing workloads efficiently.

Outcomes: Achieved a 15% improvement in forecast accuracy, enabling better inventory management and reduced stockouts.
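A hedged statsmodels sketch of the ARIMA approach; the synthetic demand series and the (p, d, q) order are illustrative assumptions.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic weekly demand with trend and yearly seasonality.
rng = np.random.default_rng(0)
t = np.arange(156)
demand = 200 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, 156)
series = pd.Series(demand,
                   index=pd.date_range("2020-01-05", periods=156, freq="W"))

model = ARIMA(series, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=12)  # next 12 weeks of demand
print(forecast.round(1))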

2. Marketing Mix Model (MMM):

Project: ROI optimization for marketing campaigns.

Approach: Developed an MMM using regression analysis to quantify the impact of various marketing channels on sales, incorporating variables such as TV, digital, and print advertising spend and performance.

Outcome: Provided actionable insights that optimized marketing spend, resulting in a 20% increase in ROI.
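A minimal scikit-learn sketch of a regression-based marketing mix model; the spend data and channel coefficients are fabricated for illustration.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
spend = pd.DataFrame({
    "tv": rng.uniform(0, 100, 104),
    "digital": rng.uniform(0, 80, 104),
    "print": rng.uniform(0, 30, 104),
})
# Assume sales respond roughly linearly to channel spend plus noise.
sales = (50 + 1.8 * spend["tv"] + 2.5 * spend["digital"]
         + 0.7 * spend["print"] + rng.normal(0, 10, 104))

mmm = LinearRegression().fit(spend, sales)
# Per-channel coefficients approximate incremental sales per unit of spend.
for channel, coef in zip(spend.columns, mmm.coef_):
    print(f"{channel}: {coef:.2f}")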

3. Customer Attrition in Banking:

Project: Customer Churn Prediction

Approach: Used classification algorithms such as Random Forest and Gradient Boosting to predict customer attrition, analyzing transaction history, customer demographics, and engagement metrics to identify at-risk customers.

Outcomes: Increased retention rate by 10% through targeted intervention strategies based on model insights.
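A hedged scikit-learn sketch of the churn classifier; the features and labels are synthetic stand-ins for the transaction, demographic, and engagement inputs described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.poisson(20, n),        # monthly transactions
    rng.uniform(20, 80, n),    # customer age
    rng.exponential(5, n),     # logins per month
])
# Toy rule: low transaction counts and few logins raise churn probability.
p = 1 / (1 + np.exp(0.15 * X[:, 0] + 0.4 * X[:, 2] - 4))
y = rng.random(n) < p

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))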

4. Natural Language Processing (NLP) for Safety Control:

Project: Sentiment Analysis for Customer Feedback

Approach: Employed NLP techniques using spaCy and transformers to analyze customer reviews and social media mentions. Implemented sentiment classification to gauge customer satisfaction, evaluated with NLP metrics and benchmarking utilities such as BERTScore, ROUGE, and METEOR, using frameworks such as spaCy and LangChain.

Outcome: Enabled real-time monitoring of customer sentiment, leading to proactive customer service improvements.
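A minimal Hugging Face transformers sketch of sentiment classification; the checkpoint shown is a common public model and the review texts are made up.

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The delivery was fast and the product works perfectly.",
    "Support never answered my ticket, very disappointed.",
]
for review, result in zip(reviews, classifier(reviews)):
    # Each result holds a POSITIVE/NEGATIVE label and a confidence score.
    print(f"{result['label']:8s} {result['score']:.2f}  {review}")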

ZENMONICS PVT LTD, Hyderabad, India February 2019 – March 2021

Role: Data Scientist

Responsibilities:

Cleaned and prepared data using Pandas, NumPy, SciPy, PySpark, PyTorch, and Hive, ensuring reliability by addressing missing values, outliers, and inconsistencies (see the sketch after this list).

Worked closely with data engineering and ML engineering teams to monitor and retrain pipelines and troubleshoot workflows, keeping models accurate over time.

Worked on Databricks, MLflow, AWS (EC2, S3, SageMaker), and containerization technologies such as Docker.

Developed, monitored, and maintained AWS cloud infrastructure for web development, microservices, and asynchronous processes using Django, Flask, and Tornado, integrating Django models for database connectivity.

Developed ETL processes and segregated data transformations using microservices and dbt, both on-prem and in AWS (Glue, S3, Redshift, Athena, Snowflake), with Terraform (IaC) for automatically creating cloud services.

Excel in fast-paced environments, with expertise in developer tooling across the SDLC and in-depth knowledge of data structures and algorithms.

Ingested data (Parquet and CSV files) from various sources into S3 through AWS Kinesis Data Streams and Firehose.

Experience with AWS cloud (EMR, EC2, RDS, EBS, S3, Lambda, Glue, Elasticsearch, Kinesis, SQL, DynamoDB, Redshift, ECS, PostgreSQL, and Microsoft SQL).

Hands-on experience with ETL/ELT tools such as AWS Glue and BigQuery, using AWS Data Pipeline to move data to Redshift.

Proven problem-solver in fast-paced startup environments with strong communication skills and effective team collaboration.

Developed scalable data pipelines and ETL processes in Databricks, optimizing data workflows, processing large datasets efficiently, and refactoring code.

Used Elasticsearch to filter data from S3 buckets, then loaded the data into external Hive tables.

Performed analysis on terabytes of data using Python, Spark SQL, S3, and Redshift to obtain customer insights.

Collaborated with the infrastructure team to ensure seamless external integrations, implementing scalable and maintainable IaC solutions using Terraform and AWS CloudFormation.

Built a custom REST API to support data scientists and applications with real-time customer analytics, alongside data integration tools like Airbyte.
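An illustrative pandas sketch of the cleaning steps from the first bullet above; the input file, column handling, and percentile thresholds are assumptions.

import numpy as np
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical input

# Handle missing values: numeric columns get the median, categoricals a flag.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df = df.fillna("unknown")

# Remove duplicates and clip outliers to the 1st/99th percentiles.
df = df.drop_duplicates()
low, high = df[num_cols].quantile(0.01), df[num_cols].quantile(0.99)
df[num_cols] = df[num_cols].clip(lower=low, upper=high, axis=1)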


