
Rahul Vallapureddy

Sr. Data Scientist

******************@*****.***

+1-906-***-****

PROFESSIONAL SUMMARY:

9+ years of hands-on experience in data science, machine learning, and advanced analytics, applying data-driven strategies to solve complex business challenges across diverse industries.

Expertise in developing and deploying machine learning models, including supervised, unsupervised, and deep learning algorithms (e.g., CNNs, RNNs, Decision Trees, Random Forests, SVMs, and Transformers like BERT and GPT).

Skilled in data wrangling, feature engineering, and preprocessing using Python libraries such as Pandas, NumPy, Dask, PySpark, and Apache Beam, ensuring high-quality and actionable insights from structured and unstructured data.

Proficient in utilizing cloud platforms (AWS, Azure, GCP) for scalable and efficient model development and deployment, including services like S3, EC2, SageMaker, Data Factory, Databricks, and Google Cloud ML.

Extensive experience in building and optimizing data pipelines using Apache Kafka, Spark, Airflow, and other ETL tools (Informatica, SSIS) to ensure seamless data flow across systems and improve model performance.

Hands-on experience with both relational (MySQL, Oracle, AWS RDS) and NoSQL (MongoDB, DynamoDB) databases, as well as cloud-native data lakes (AWS S3, Azure Data Lake, Snowflake).

Demonstrated expertise in feature selection, dimensionality reduction (PCA), and advanced machine learning techniques such as boosted decision trees, K-means clustering, and ensemble models for high-accuracy predictions.

Extensive experience with Generative AI (GenAI), including Large Language Models (LLMs), Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and transformers, for tasks such as image synthesis, text generation, and anomaly detection.

Proficient in delivering clear and interactive visualizations and reports using BI tools like Tableau, Power BI, and Plotly, enabling data-driven decision-making at all organizational levels.

Strong command of Natural Language Processing (NLP) techniques for text classification, sentiment analysis, topic modeling, and language generation using tools like NLTK, spaCy, and Gensim.

Extensive experience in model monitoring, hyperparameter tuning, and performance evaluation using MLflow, SHAP, and other interpretability tools.

Experience with graph networks, including knowledge graphs and graph neural networks (GNNs), for modeling and analyzing complex relational data.

Excellent communicator who can translate complex data insights into clear, actionable recommendations for stakeholders, ensuring alignment with business goals and objectives.

Strong problem-solving mindset focusing on continuous learning and adaptability, thriving in fast-paced environments, and bringing creative solutions to challenging data-driven projects.

TECHNICAL SKILLS:

Programming Languages: Python, R, SQL, T-SQL, VBA, PowerShell

Databases: MySQL, Oracle, AWS RDS (relational); MongoDB, DynamoDB (NoSQL)

Data Warehousing & Data Lakes: Snowflake, Azure Data Lake, AWS S3

ETL Tools: Apache Airflow, Azure Data Factory, Informatica, SSIS

Cloud Platforms & Services: AWS (S3, EC2, RDS, DynamoDB, Lambda, SageMaker, Glue, EMR, CloudWatch), Azure (Azure ML Studio, Databricks, Data Factory, Data Lake, HDInsight, Azure DevOps, Key Vault), Google Cloud Platform (GCP)

Libraries & Frameworks: NumPy, Pandas, Scikit-learn, PyTorch, TensorFlow, NLTK, spaCy, Gensim, Dask, PySpark, Apache Beam, Apache Kafka, Apache Airflow, MLflow, SHAP, Plotly, Matplotlib, Seaborn, ggplot2, Caret, Transformers (e.g., BERT), OpenAI Gym

Big Data Technologies: Apache Kafka, Apache Spark, Apache Beam, Apache Hudi, Hadoop

CI/CD & Containers: Jenkins, Azure DevOps, Docker, Kubernetes

Data Visualization: Tableau, Power BI, Plotly, Matplotlib, Seaborn, ggplot2

Machine Learning Models: Boosted Decision Trees, CNNs, SVMs, RNNs, Transformers (e.g., GPT models), Logistic Regression, Decision Trees, Random Forests, K-means clustering, Principal Component Analysis (PCA)

NLP Techniques: Text Classification, Sentiment Analysis, Topic Modeling

Other Tools & Technologies: Selenium, Microsoft Word, Excel, PowerPoint, ServiceNow, Git, Jira, SPSS

WORK EXPERIENCE:

Fidelity Investments, Boston, MA July 2023 – Present

Sr. Data Scientist

Responsibilities:

Developed cloud-based data lakes leveraging AWS S3, AWS RDS, and DynamoDB, optimizing storage and retrieval of large-scale datasets for machine learning tasks.

Developed and maintained over 10 RESTful APIs that integrated machine learning models into business applications, improving system communication efficiency by 35%.

Led the end-to-end development and deployment of machine learning models using Python, NumPy, Pandas, and scikit-learn, driving predictive analytics initiatives and increasing business decision-making speed by 50%.
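
A minimal sketch of this kind of scikit-learn workflow; the input file, the "churn" label, and the hyperparameters are illustrative, not from a real project:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("customers.csv")              # hypothetical input file
    X, y = df.drop(columns=["churn"]), df["churn"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))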

Generated actionable insights and optimized business processes by applying advanced machine learning techniques, including Decision Trees, Random Forests, Rule Mining, Clustering, PCA, Support Vector Machines, and Ensemble techniques.

Leveraged distributed computing frameworks such as Dask and PySpark to process large-scale datasets efficiently, ensuring fast, scalable solutions for complex data problems.

Designed and implemented real-time data streaming solutions using Apache Beam and Apache Kafka to process and analyze continuous data flows in real-time.
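
A minimal consumer sketch using the kafka-python client; the topic name and the toy anomaly rule are placeholders for the actual streaming logic:

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "transactions",                            # hypothetical topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:                       # yields records as they arrive
        record = message.value
        if record.get("amount", 0) > 10_000:       # toy rule standing in for a model
            print("flagged:", record)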

Built and managed end-to-end data pipelines using Apache Airflow, automating data workflows and ensuring smooth integration across systems and platforms.
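
A skeletal DAG of the extract-then-transform shape described here, assuming Airflow 2.4+ and its schedule argument; the DAG id and task bodies are illustrative:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull raw data")

    def transform():
        print("clean and feature-engineer")

    with DAG(
        dag_id="daily_feature_pipeline",           # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t1 >> t2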

Conducted exploratory data analysis (EDA) with visualization tools (Plotly, Matplotlib), identifying trends and outliers and improving data insights by 30% to guide strategy development.

Performed statistical analysis and hypothesis testing to validate assumptions and support data-driven decision-making across business units.

Developed and deployed NLP models using NLTK, spaCy, and Transformer models (BERT), enhancing text classification and sentiment analysis accuracy by 20%.
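
For example, sentiment scoring with a BERT-family model via Hugging Face's pipeline API; pipeline() downloads a default checkpoint when no model is named:

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")    # default BERT-family checkpoint
    print(classifier("The claims process was fast and painless."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]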

Extensive experience working with text data, developing, fine-tuning, and deploying NLP and LLM models for various use cases, including text classification, entity recognition, and sentiment analysis.

Implemented explainable AI (XAI) techniques (SHAP) to ensure model transparency and interpretability, increasing stakeholder trust and reducing model skepticism by 40%.
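
A short SHAP sketch for a tree-based model, reusing the hypothetical model and X_test from the scikit-learn sketch above:

    import shap

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)         # global feature-importance view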

Utilized Apache Hudi for data versioning and change data capture, enabling efficient management of large datasets and real-time updates.

Managed model lifecycle using MLflow, ensuring proper tracking, versioning, and reproducibility of machine learning models.
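
A minimal MLflow tracking sketch; the experiment name, parameter, and metric values are illustrative:

    import mlflow
    import mlflow.sklearn

    mlflow.set_experiment("churn-model")           # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_param("n_estimators", 300)
        mlflow.log_metric("auc", 0.91)
        mlflow.sklearn.log_model(model, "model")   # `model` from the sketch above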

Designed and implemented machine learning workflows in AWS services such as SageMaker, Glue, EMR, Lambda, and EC2 to scale model training, deployment, and monitoring.

Extensive hands-on experience with AWS SageMaker, developing, training, and deploying machine learning and NLP models at scale.

Applied microservices architecture principles to build scalable and flexible machine learning applications that integrate seamlessly with existing business systems.

Conducted web scraping using Selenium to gather unstructured data from various sources, transforming it into structured datasets for further analysis.
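
A headless-scraping sketch assuming Selenium 4; the URL and CSS selector are placeholders, not real targets:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com/listings")     # hypothetical page
    rows = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".listing")]
    driver.quit()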

Utilized Docker and Kubernetes for containerizing machine learning models, improving model portability and scalability in cloud environments.

Implemented CI/CD pipelines with Jenkins and Git, automating model deployment, versioning, and testing to streamline development processes and increase operational efficiency.

Monitored model performance in real-time using AWS CloudWatch, ensuring models operate efficiently and providing insights on areas for improvement.

Ensured high data quality through rigorous data cleaning, validation, and preprocessing, optimizing datasets for accurate and effective machine learning model development.

Developed reinforcement learning models using OpenAI Gym, applying cutting-edge algorithms that simulated decision-making processes, resulting in a 15% increase in optimization efficiency.
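
A toy episode loop against the classic (pre-0.26) Gym API; a random policy stands in for the trained agent:

    import gym

    env = gym.make("CartPole-v1")
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()         # a learned policy would go here
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print("episode reward:", total_reward)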

Ensured compliance with data privacy regulations (GDPR, CCPA) by implementing robust data governance and security measures in machine learning pipelines.

Optimized data storage and retrieval processes, implementing best practices for efficient data storage and minimizing costs in cloud environments.

Utilized Snowflake and Oracle databases for efficient data storage, retrieval, and analytics, improving query performance and data management.

Led cross-functional teams in tackling complex data challenges, collaborating with business, engineering, and operations teams to deliver high-impact data solutions.

Applied Deep Learning techniques, including Neural Networks, Multilayer Perceptron, Word Embeddings, Categorical Embeddings, RNNs, LSTMs, Word2Vec, Encoder/Decoder Models, Attention and Transformer Models, Transfer Learning (ULMFiT), and Foundation Models from Azure OpenAI.

Worked with databases such as Snowflake, Oracle, and Graph Databases to manage, retrieve, and analyze structured and unstructured data.

Environment: Python (NumPy, Pandas, scikit-learn), AWS, Apache Kafka, Apache Beam, Dask, PySpark, Apache Hudi, NLTK, spaCy, BERT, MLflow, Plotly, Matplotlib, SHAP, Selenium, Docker, Kubernetes, Jenkins, Git, OpenAI Gym, Agile.

Ascension Health, St. Louis, MO November 2021 – June 2023

Data Scientist/ ML Engineer

Responsibilities:

Applied advanced machine learning techniques, including regression analysis, time series forecasting, and deep learning models, to drive data-driven decision-making in the healthcare domain.

Developed, optimized, and deployed machine learning models using Azure ML Studio, Databricks, and Snowflake, ensuring scalability, reliability, and efficiency in real-world applications.

Designed and implemented AI pipelines leveraging Azure AI Search, Azure OpenAI APIs, and Azure SQL Database to enhance data processing and AI-driven insights.

Integrated AI solutions into business applications such as Salesforce and Power BI, enabling seamless access to actionable insights and improving data-driven decision-making by 25%.

Worked extensively with healthcare data, ensuring compliance with HIPAA and FHIR standards, while handling sensitive data with advanced data masking and encryption techniques.

Deployed AI models into production environments, ensuring end-to-end deployment, monitoring, and integration into existing IT infrastructure, reducing model deployment time by 30%.

Utilized cloud-based platforms, including Azure Data Factory, Azure Data Lake, and Snowflake, to streamline ETL processes, automate data workflows, and improve data accessibility.

Designed and implemented RAG-based NLP solutions for text classification, sentiment analysis, named entity recognition, and summarization, leveraging libraries such as LangChain, TensorFlow, and spaCy.
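
A schematic of the retrieval-augmented flow: retrieve the best-matching passage, then ground the generator's prompt in it. TF-IDF stands in for a vector store, generate() is a stub for an LLM call (for example, an Azure OpenAI completion), and the documents are toy examples:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["Claim denials rose in Q2.", "Prior authorization takes 3 days."]
    vec = TfidfVectorizer().fit(docs)

    def retrieve(query: str) -> str:
        sims = cosine_similarity(vec.transform([query]), vec.transform(docs))
        return docs[sims.argmax()]

    def generate(prompt: str) -> str:              # stub: replace with an LLM call
        return f"[answer grounded in: {prompt}]"

    q = "How long does prior authorization take?"
    print(generate(f"Context: {retrieve(q)}\nQuestion: {q}"))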

Applied search algorithms and retrieval models using Elasticsearch and Azure AI Search to enhance information retrieval tasks.

Conducted exploratory data analysis (EDA) and feature engineering to improve model performance, increasing predictive accuracy by 20%.

Developed and maintained CI/CD pipelines using Azure DevOps to automate machine learning model deployment, improving iteration speed and operational efficiency.

Managed high-cardinality datasets, including ICD, CPT, and NDC codes, ensuring optimized model accuracy and performance.

Automated model deployment workflows through containerization and orchestration using Docker and Kubernetes.

Collaborated with cross-functional teams in Agile and Scrum environments, actively participating in sprint planning and delivering high-quality results.

Provided mentorship to junior data scientists, enhancing model development processes and improving overall project outcomes.

Conducted AI governance reviews and implemented model governance practices to ensure compliance and auditability of AI/ML systems.

Utilized Snowflake Notebooks and Snowpark for advanced analytics and AI-driven insights, enhancing decision-making across healthcare business units.

Ensured the confidentiality and security of PHI by implementing advanced data masking, encryption, and de-identification techniques, in full compliance with HIPAA regulations.

Collaborated with compliance teams to enforce strict data privacy measures, ensuring PHI was handled, stored, and processed securely throughout the AI model development and deployment lifecycle.

Ensured compliance with data security best practices in AI model development and deployment, reducing data security incidents by 15%.

Delivered strategic recommendations to leadership based on AI-driven insights, improving business performance and operational efficiency.

Environment: Python, SQL, Azure ML Studio, Databricks, Snowflake, Azure AI Search, Azure OpenAI APIs, Power BI, LangChain, TensorFlow, PyTorch, scikit-learn, spaCy, Docker, Kubernetes, Azure DevOps, CI/CD, Azure Data Factory, Azure Data Lake, Apache Kafka, Hadoop, SPSS, Elasticsearch, HIPAA, FHIR, ICD, CPT, NDC.

Robosoft Technologies, India. January 2018 – July 2021

Data Scientist

Responsibilities:

Conducted data preprocessing and exploratory data analysis (EDA) to identify patterns, trends, and outliers, driving key insights for business decisions.

Wrote complex SQL and T-SQL queries for data extraction and analysis from relational databases, ensuring data integrity and accurate reporting.

Employed a variety of machine learning algorithms, including Logistic Regression, Decision Trees, Random Forests, and K-means clustering to solve complex business problems.

Conducted regression analysis and used Principal Component Analysis (PCA) for dimensionality reduction, improving model interpretability and performance.
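
A PCA sketch on standardized features; n_components=0.95 keeps enough components to explain 95% of the variance, and the wine dataset is only a stand-in:

    from sklearn.datasets import load_wine
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_wine().data)
    X_reduced = PCA(n_components=0.95).fit_transform(X)
    print(X.shape, "->", X_reduced.shape)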

Utilized Pandas and NumPy for efficient data manipulation, cleaning, and preprocessing, ensuring datasets were well-structured and ready for analysis.

Applied feature engineering techniques to create meaningful features, improving performance and predictive capabilities of machine learning models.

Implemented strategies for handling missing values and performed data normalization to ensure consistency and accuracy across datasets.

Performed cross-validation and implemented overfitting prevention techniques to enhance the generalizability of predictive models.
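
A cross-validation sketch: k-fold scoring plus L2 regularization (a small C) as a simple overfitting guard; the dataset is a stand-in:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(C=0.1, max_iter=5000)   # stronger penalty via small C
    print("mean accuracy:", cross_val_score(model, X, y, cv=5).mean())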

Leveraged AWS services like S3, EC2, RDS, and Lambda to build scalable, cloud-based solutions for data processing and machine learning model deployment.

Developed and optimized machine learning models using Python libraries such as Scikit-learn, PyTorch, and TensorFlow, ensuring high model performance and accuracy.

Applied hypothesis testing and statistical analysis techniques to validate assumptions and support data-driven decision-making across the business.

Developed and automated data pipelines using tools like Informatica and Apache Airflow to streamline data ingestion, transformation, and processing workflows.

Worked with various databases, including Oracle and MongoDB, leveraging relational and NoSQL databases for diverse data storage and retrieval needs.

Evaluated model performance through various metrics and visualizations, iterating on models to improve results continuously.

Collaborated in Agile and Kanban environments, participating in sprint planning and regular updates to ensure alignment with business goals and deadlines.

Automated repetitive tasks and data workflows using VBA and Excel automation, improving efficiency and data accuracy.

Created interactive visualizations and dashboards in Tableau and used Seaborn for statistical data analysis and presenting insights to stakeholders.

Assisted in designing and developing data models to support various business functions, providing insights that guided effective decision-making.

Utilized data automation strategies to optimize data management processes across systems and platforms.

Delivered actionable business recommendations based on data analysis and predictive modeling, driving strategic decisions and improving operational efficiency.

Environment: Python (Pandas, NumPy, Scikit-learn, PyTorch, TensorFlow), SQL, T-SQL, AWS, Tableau, Seaborn, Informatica, Airflow, Oracle, MongoDB, Excel, Agile, Kanban.

GoodWorkLabs, India. October 2015 – December 2017

Jr. Data Analyst

Responsibilities:

Performed A/B testing and statistical analysis to evaluate business strategies and optimize outcomes based on data-driven results.
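
A typical significance check for an A/B test with a two-sample t-test; the conversion rates are simulated for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.binomial(1, 0.10, size=5000)     # simulated 10% conversion
    variant = rng.binomial(1, 0.12, size=5000)     # simulated 12% conversion
    t_stat, p_value = stats.ttest_ind(variant, control)
    print(f"p-value: {p_value:.4f}")               # < 0.05: significant at 5% level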

Developed and implemented ETL pipelines for seamless data integration.

Applied machine learning algorithms for predictive analytics, uncovering hidden patterns and trends to support actionable insights and strategic decision-making.

Managed Oracle DBMS and wrote advanced SQL queries for data analysis.

Conducted data validation, ensuring data accuracy, consistency, and completeness by implementing robust data cleaning processes and removing duplicates to maintain data integrity.

Utilized advanced Python libraries such as Pandas and NumPy for efficient data transformation, manipulation, and feature engineering to prepare datasets for analysis.

Developed interactive visualizations and dashboards using Plotly to present data trends and insights, enabling better understanding for stakeholders and decision-makers.

Created and maintained detailed reports and dashboards in Microsoft Excel, PowerPoint, and Word to provide stakeholders with actionable insights and support data-driven decisions.

Supported database operations and data management practices, ensuring compliance with organizational standards for data accuracy and security.

Streamlined SSIS data integration processes, ensuring seamless data flow across platforms and systems to improve operational efficiency.

Assisted with data extraction from various sources and systems, providing timely and relevant insights to drive business performance and enhance decision-making.

Collaborated with cross-functional teams to align data analysis with business objectives, contributing strategic insights that enhanced operational efficiency.

Continuously monitored data quality, performing regular audits and checks to guarantee the accuracy and integrity of data across the organization.

Environment: Python (Pandas, NumPy), Oracle DBMS, SQL, Plotly, Microsoft Excel, PowerPoint, Word, SSIS.

EDUCATION:

Bachelor's in Computers from Osmania University, India.


