Post Job Free

Data Scientist & ML Engineer - NLP & MLOps Expert

Location:
Dallas, TX, 75215
Posted:
March 03, 2026


Resume:

Divya

Data Scientist / Machine Learning Engineer

New Jersey, United States +1-732-***-**** ************@*****.***

PROFESSIONAL SUMMARY

Data Scientist and Machine Learning Engineer with over 4 years of hands-on experience in end-to-end Machine Learning, Deep Learning, NLP, and Statistical Analysis. Proficient in building and deploying predictive models using Python, scikit-learn, TensorFlow, Keras, PyTorch, XGBoost, Random Forest, SVM, KNN, Linear/Logistic Regression, K-Means, DBSCAN, CNNs, RNNs, LSTMs, and Transformers. Expert in NLP techniques including Tokenization, Lemmatization, Stop Words Removal, TF-IDF, Word2Vec, BERT, n-grams, and Text Classification. Skilled across the full data science lifecycle: data extraction, cleaning, feature engineering, ETL pipeline design, statistical modeling, A/B testing, hypothesis testing, ANOVA, cross-validation, and regularization, using Pandas, NumPy, SciPy, Spark, PySpark, SQL, Hadoop, Hive, AWS (S3, Redshift, SageMaker, Glue, Lambda), MongoDB, Cassandra, and PostgreSQL. Experienced in creating interactive dashboards and reports with Tableau, Power BI, Streamlit, and R-Shiny. Strong expertise in Big Data processing, MLOps practices, Git, and Agile/Scrum, with a track record of delivering scalable, production-ready solutions that drive business impact and data-driven decision-making.

TECHNICAL SKILLS

Programming: Python, SQL, R, Scala, C/C++, JavaScript

Libraries: Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, NLTK, TensorFlow, Keras, MLlib, SciPy

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, KNN, XGBoost, SVC, K-Means, DBSCAN, Neural Networks (CNNs, RNNs)

NLP: Tokenization, Stop Words Removal, Lemmatization, Text Classification, TF-IDF, Word2Vec, n-grams

Databases: MySQL, Oracle, PostgreSQL, MongoDB, Cassandra

Tools: Power BI, Tableau, R-Shiny, AWS (S3, Redshift, SageMaker), Streamlit, Hive, Spark, PySpark, MapReduce

Big Data Technologies: Hadoop, HDFS, Spark, PySpark, Hive, MapReduce

Software: Microsoft Office (Advanced Excel), PowerPoint

Operating System: Windows, Linux

Other: Git, Agile/Scrum, Statistical Analysis, ETL Processes, Data Visualization, A/B Testing, Hypothesis Testing, ANOVA, Cross-Validation, Regularization

PROFESSIONAL EXPERIENCE

Capital One - McLean, VA Jul 2024 – Present

Role: Data Scientist/Machine Learning Engineer

Responsibilities:

Developed and implemented machine learning and deep learning models using Python, Pandas, NumPy, scikit-learn, TensorFlow, Keras, Matplotlib, and Seaborn for predictive analytics and decision-making.

Applied NLP techniques, including Tokenization, Stop Words Removal, Lemmatization, Text Classification, TF-IDF, Word2Vec, and n-grams, to analyze unstructured text data.
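A minimal sketch of the TF-IDF text-classification workflow described above; the documents and labels are made-up examples, not production data.

```python
# TF-IDF text classification sketch (hypothetical sentiment data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "great product, fast shipping",
    "terrible support, item arrived broken",
    "love it, works exactly as advertised",
    "refund requested, very disappointed",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TfidfVectorizer handles tokenization, lowercasing, stop-word removal,
# and n-gram extraction in a single step.
clf = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(docs, labels)
pred = clf.predict(["broken item, asking for refund"])[0]
```

In practice the same pipeline object can be pickled and served behind a Streamlit app, keeping vectorizer and model versions in sync.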

Built and optimized classification models using Logistic Regression, Decision Trees, Random Forest, KNN, XGBoost, and SVC, and clustering models using K-Means and DBSCAN.
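The clustering side of the bullet above can be illustrated with K-Means and DBSCAN on synthetic 2-D data (`make_blobs` is a stand-in for real customer features):

```python
# K-Means vs DBSCAN on synthetic blob data.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

n_km = len(set(km_labels))         # K-Means always yields exactly k clusters
n_db = len(set(db_labels) - {-1})  # DBSCAN infers the count; -1 marks noise
```

The contrast is the usual design choice: K-Means needs k up front and assumes convex clusters, while DBSCAN discovers cluster count and flags outliers via its density parameters.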

Created interactive dashboards and reports using Power BI and Tableau to visualize key performance metrics and trends for stakeholders.

Utilized SQL to query and transform data from MySQL, Oracle, and Cassandra databases, and integrated data using AWS services (e.g., S3, Redshift).

Developed Streamlit applications to deploy machine learning models and enable user interaction with data insights.

Leveraged Big Data technologies like Hadoop, HDFS, Spark, PySpark, and Hive for processing large-scale datasets.

Conducted A/B testing, hypothesis testing, and ANOVA to validate model performance and business strategies.
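A sketch of the significance tests named above (two-sample t-test for an A/B comparison, one-way ANOVA across groups) using SciPy; the metric samples are synthetic:

```python
# Hypothesis testing sketch with synthetic conversion-rate samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.10, scale=0.02, size=500)  # baseline metric
variant = rng.normal(loc=0.11, scale=0.02, size=500)  # treatment metric
segment = rng.normal(loc=0.12, scale=0.02, size=500)  # a third group

# A/B test: does the variant differ from control?
t_stat, p_ab = stats.ttest_ind(variant, control)

# One-way ANOVA: do any of the three groups differ?
f_stat, p_anova = stats.f_oneway(control, variant, segment)
```

With these effect sizes both p-values fall well below a 0.05 threshold; on real traffic the sample sizes would be set by a power analysis before the test runs.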

Performed data cleaning, feature engineering, and statistical analysis using Pandas, NumPy, and SciPy to enhance model accuracy.

Collaborated in Agile sprints, using Git for version control to deliver scalable data solutions.

Environment: Python, SQL, R, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, Power BI, Tableau, AWS, Streamlit, Hadoop, Spark, PySpark, Hive, MySQL, Oracle, Cassandra, Windows, Linux, Git, Agile.

Heera - India Sep 2022 – Dec 2023

Role: Machine Learning Engineer

Responsibilities:

Conducted data profiling and statistical analysis using Python, Matplotlib, Seaborn, and Tableau to identify patterns in large datasets.

Built predictive models using Linear Regression, Logistic Regression, Random Forest, XGBoost, SVC, and Neural Networks, improving performance with cross-validation, L1/L2 regularization, and hyperparameter tuning.
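The tuning loop described above (cross-validation plus L1/L2 regularization) can be sketched with scikit-learn's grid search; the dataset and grid values are hypothetical:

```python
# Cross-validated hyperparameter tuning over L1/L2 regularization.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=20, random_state=7)

# C is inverse regularization strength; penalty switches L1 vs L2.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10]},
    cv=5,           # 5-fold cross-validation
    scoring="f1",
)
grid.fit(X, y)
best_penalty = grid.best_params_["penalty"]
best_score = grid.best_score_
```

The same pattern extends to Random Forest or XGBoost by swapping the estimator and grid; `liblinear` is chosen here because it supports both penalties.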

Applied NLP techniques such as Tokenization, Lemmatization, Text Classification, and Word2Vec to analyze customer feedback data.

Created and visualized performance metrics (e.g., AUC, F1-score, confusion matrix) to evaluate model effectiveness.

Developed Power BI and Tableau dashboards to present actionable insights to business stakeholders.

Utilized AWS (e.g., SageMaker, Redshift) for data storage, processing, and model deployment.

Performed data cleaning and feature engineering using Pandas, NumPy, and SciPy to prepare data for modeling.

Designed ETL pipelines using Hive and Spark for efficient data integration and transformation.
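The extract-transform-load shape of such a pipeline, sketched in pandas since no Hive/Spark cluster is assumed here; the column names and aggregation are hypothetical, but the groupBy/agg pattern mirrors the PySpark equivalent:

```python
# ETL sketch: extract, clean, aggregate (pandas stand-in for Spark).
import pandas as pd

# Extract: in production, a Hive table or Spark DataFrame read.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "amount": [20.0, 35.5, None, 12.0, 80.0],
    "region": ["east", "east", "west", "west", "east"],
})

# Transform: drop null amounts, then aggregate per region.
clean = raw.dropna(subset=["amount"])
summary = (
    clean.groupby("region", as_index=False)
         .agg(total=("amount", "sum"), orders=("customer_id", "count"))
)

# Load: in production, a write back to Redshift or a Hive table.
east_total = summary.loc[summary.region == "east", "total"].iloc[0]
```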

Environment: Python, SQL, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, Power BI, Tableau, AWS, Spark, PySpark, Hive, Redshift, Windows, Agile.

Black & Veatch - India Aug 2021 – Jul 2022

Role: Data Scientist/Machine Learning Engineer

Responsibilities:

Extracted and analyzed data using SQL and Hive queries to support business decision-making.

Developed machine learning models using Python, R, scikit-learn, and Keras for clustering (K-Means, DBSCAN), classification (Decision Trees, Random Forest, SVC), and regression tasks.

Implemented NLP techniques like Tokenization, Text Classification, and n-grams to process unstructured data and identify patterns.

Created data visualizations using Tableau, Matplotlib, and R-Shiny to communicate insights to stakeholders.

Performed data cleaning, feature scaling, and feature engineering using Pandas and NumPy to improve model performance.

Utilized AWS and Hadoop ecosystems (HDFS, Spark, MapReduce) for processing large datasets.

Conducted statistical analysis using ANOVA, hypothesis testing, and cross-validation to validate findings.

Collaborated in an Agile environment, using Git for version control to deliver data-driven solutions.

Environment: Python, SQL, R, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Keras, Tableau, R-Shiny, AWS, Hadoop, Spark, PySpark, Hive, MySQL, Cassandra, Windows, Linux, Agile.


