Data Science Machine Learning

Location:

United States

Posted:

November 20, 2023

Resume:

Mehdi

AI/ML Associate Architect

Cell: 304-***-**** E-mail: ***********@*****.***

Summary

Data Science Engineer with 10+ years of experience in designing, developing, and implementing data-driven solutions. Proficient in various data analysis and machine learning tools and frameworks such as Python, R, TensorFlow, and scikit-learn.

Developed and deployed data pipelines using Apache Spark and Apache Kafka to process large-scale datasets, ensuring high throughput and low latency.

Created and maintained ETL (Extract, Transform, Load) processes to collect and preprocess raw data from various sources, including databases, REST APIs, and streaming platforms.

Leveraged Hadoop Distributed File System (HDFS) to store and manage petabytes of structured and unstructured data, enhancing data accessibility and scalability.

Implemented machine learning models using Python and libraries such as Scikit-Learn, TensorFlow, and PyTorch for predictive analytics and anomaly detection.

Utilized Jupyter Notebook for interactive data exploration, model development, and visualization, facilitating rapid experimentation and model fine-tuning.

Employed version control systems, such as Git, to track changes in code, collaborate with team members, and manage codebase history effectively.

Conducted data preprocessing tasks, including data cleaning, feature engineering, and data imputation, to enhance data quality and enable more accurate model training.

Collaborated with domain experts and stakeholders to define project objectives, KPIs, and success criteria, ensuring alignment with business goals.

Designed and developed RESTful APIs using Flask and FastAPI to serve machine learning models and enable seamless integration with external systems.

Built and maintained cloud-based data storage solutions, including Amazon S3 and Google Cloud Storage, for efficient data archiving and retrieval.

Conducted A/B testing and hypothesis testing using statistical tools like SciPy and performed rigorous statistical analysis to evaluate model performance and validate hypotheses.

Implemented real-time monitoring and alerting systems using Grafana and Prometheus to track data pipeline health, model accuracy, and system performance.

Leveraged containerization technologies, such as Docker and Kubernetes, to ensure consistent deployment and scalability of data science applications.

Designed and maintained data warehouses using technologies like Amazon Redshift and Google BigQuery for business intelligence and reporting.

Automated recurring tasks and job scheduling using Apache Airflow to ensure data pipelines run efficiently and reliably.

Employed time series analysis techniques, such as ARIMA and Prophet, to forecast trends and make data-driven decisions, particularly for demand forecasting and capacity planning.

Collaborated with data engineers to optimize SQL queries and database performance, ensuring efficient data retrieval and processing.

Conducted feature selection and dimensionality reduction using techniques like PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) to improve model efficiency and interpretability.

Deployed models to production environments using cloud-based services like AWS SageMaker and Google AI Platform for online inference and real-time decision support.

Designed data visualizations and dashboards using tools like Tableau and Power BI to communicate insights and findings to non-technical stakeholders.

Collaborated with DevOps teams to ensure seamless integration of data science solutions into CI/CD pipelines, ensuring robust, automated testing and deployment.

Implemented natural language processing (NLP) techniques, including sentiment analysis, entity recognition, and topic modeling, for text data analysis and content recommendation systems.

Conducted data privacy and security assessments to ensure compliance with GDPR, HIPAA, and other regulatory requirements, implementing encryption and access control measures.

Stayed updated with the latest advancements in data science and machine learning, regularly attending conferences and participating in online courses.

Conducted code reviews and provided mentorship to junior data scientists, fostering a culture of knowledge sharing and code quality.

Collaborated with data architects to design data models and schema for efficient data storage and retrieval.

Conducted performance tuning of machine learning models, optimizing hyperparameters and model architecture for better accuracy and efficiency.

Supported data-driven decision-making by creating interactive and user-friendly data exploration tools and dashboards for non-technical stakeholders.

Performed root cause analysis and debugging of data-related issues, taking a systematic and data-driven approach to problem-solving.

Collaborated with data engineers to implement and optimize data indexing and search capabilities, enabling faster and more accurate data retrieval.

Technical Skills

Versions / Tools / Software

Data Analysis

Python (v3.7 - v3.9), Pandas, NumPy, SciPy, Matplotlib, Seaborn, Jupyter Notebook, RStudio

Machine Learning Algorithms and Models

Scikit-Learn, TensorFlow, Keras, PyTorch, XGBoost, LightGBM

Data Visualization

Tableau, Power BI, Plotly, D3.js, ggplot2

Data Analysis and Visualization

Python, R

Time Series Analysis and Forecasting

ARIMA, Prophet

Data Engineering

Apache Kafka, Airflow, ETL Processes

Data Warehousing

Snowflake, Redshift, Teradata

Database Management Systems

MySQL, PostgreSQL

Cloud Computing

AWS, Azure, GCP

Data Wrangling

Pandas, dplyr, data.table

Version Control

Git (v2.x), SVN

Scripting Languages

Shell Scripting, Perl, Ruby

EMPLOYMENT EXPERIENCE

AbbVie

Remote (Dec’2021 – Present)

AI/ML Associate Architect

Actively involved in the design and architectural planning of AI/ML solutions. This includes creating detailed technical specifications for AI models, defining data pipelines, and ensuring compatibility with TensorFlow, emphasizing the use of neural networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs).

Development and training of AI/ML models using Python, leveraging libraries like NumPy, Pandas, and Scikit-Learn. This involves optimizing hyperparameters, choosing appropriate loss functions, and implementing early stopping strategies to improve model accuracy and convergence.

Implementation of deep learning techniques with Keras for various applications, including natural language processing (NLP), image recognition, and time series forecasting. These applications employ deep learning models such as LSTM, GRU, and CNN, achieving state-of-the-art results on complex tasks.

Assessment of the suitability of PyTorch for specific AI projects, comparing its performance and ease of use against other frameworks. This involved conducting benchmark tests and identifying scenarios where PyTorch proved advantageous.

Design and implementation of containerized solutions for deploying AI models to various environments. These containers ensure consistency in model deployment, enabling seamless integration into production systems.

Integration of AI/ML projects into CI/CD pipelines for automated testing, building, and deployment. This ensures that AI models are continuously validated and deployed efficiently.

Work on scalability and performance optimization for AI applications, utilizing Apache Spark for distributed data processing. This enables handling large datasets and improving model training speed.

Utilization of NLTK for text preprocessing, tokenization, and sentiment analysis. Implementation of custom NLP models for tasks like named entity recognition and document classification.

Use of OpenCV in computer vision applications for image processing, object detection, and feature extraction. This includes implementing techniques like Haar cascades and SIFT for various image-related tasks.

Setup of monitoring systems to track the performance and health of AI/ML models in production. This proactive approach allows for quick identification and resolution of issues.

Maintenance of detailed technical documentation using Confluence, providing comprehensive information on model architectures, data pipelines, and best practices. This documentation facilitates knowledge sharing within the team.

Use of JIRA for project management, task tracking, and collaboration with cross-functional teams. This ensures that AI/ML projects align with organizational goals and timelines.

Helping ensure that AI/ML solutions comply with ethical guidelines and regulations, including GDPR and fairness principles. This involves developing data anonymization techniques and model bias reduction strategies.

Employing TensorBoard for model visualization and performance tuning. Analyzing training metrics and visualizing model architectures helps in fine-tuning AI/ML models for better results.

Identification and resolution of issues using Bugzilla for efficient bug tracking and management throughout the AI/ML development lifecycle.

Environment: TensorFlow, Python, Keras, PyTorch, Docker, Jenkins, Apache Spark, NLTK, OpenCV, Prometheus, Confluence, JIRA, and Bugzilla.

Amdocs

New York City, NY (Apr’2019 – Nov’2021)

Sr. Data Scientist

Developed predictive models using Python and scikit-learn to improve demand forecasting accuracy.

Employed deep learning techniques with TensorFlow to create a recommendation engine, boosting user engagement on an e-commerce platform.

Collaborated with the DevOps team to deploy machine learning models using Kubernetes and Docker, ensuring scalability and reliability.

Conducted exploratory data analysis (EDA) using Pandas and NumPy to identify data patterns and anomalies, enhancing data quality and insights.

Developed and maintained data pipelines using Apache Airflow to automate data ingestion, transformation, and loading processes.

Implemented natural language processing (NLP) algorithms with NLTK and spaCy for sentiment analysis, improving sentiment score accuracy.

Collaborated with cross-functional teams to define key performance indicators (KPIs) and designed interactive Tableau dashboards for real-time monitoring and decision-making.

Optimized model hyperparameters using grid search and random search techniques, improving model performance by 10% and reducing training time.

Conducted A/B tests using Apache Spark for evaluating new features, leading to data-driven product improvements and increased user engagement.

Utilized Apache Hadoop and Hive for big data processing and analysis, handling large datasets efficiently.

Integrated machine learning models into production systems through RESTful APIs using Flask and Django, ensuring seamless deployment.

Collaborated with data engineers to maintain data lakes and data warehouses, utilizing HDFS and Amazon Redshift for data storage and retrieval.

Employed version control with Git and GitFlow for managing code repositories, promoting collaboration and code consistency within the team.

Conducted feature engineering and selection techniques, including Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE), to improve model accuracy.

Led a team of data scientists in cross-validation and time-series analysis, incorporating models like XGBoost and LightGBM for better predictive accuracy.

Collaborated on the development of anomaly detection algorithms using Isolation Forest and One-Class SVM, detecting anomalies in real-time sensor data with 95% accuracy.

Environment: Python, scikit-learn, TensorFlow, Kubernetes, Docker, Pandas, Tableau, Apache Airflow, Apache Spark, Apache Hadoop, and Hive.

Breezline

Bellevue, WA (Feb’2016 – Mar’2019)

Data Scientist

Utilized Python to conduct data analysis and modeling, employing libraries such as Pandas and NumPy for data manipulation and computation. Employed Jupyter Notebook for interactive data exploration and algorithm development.

Employed Scikit-learn for machine learning tasks, such as classification and regression. Applied statistical techniques and feature engineering to build predictive models and fine-tuned hyperparameters to enhance model performance.

Managed and transformed large datasets using Apache Hadoop and HDFS for distributed storage, as well as Apache Spark for distributed data processing. Wrote PySpark code to efficiently extract insights from big data.

Designed and developed deep learning models using TensorFlow and Keras for tasks like image recognition and natural language processing. Tuned neural network architectures and conducted training on GPUs to accelerate processing.

Utilized SQL (Structured Query Language) for data extraction, transformation, and loading (ETL) processes. Developed complex queries in PostgreSQL to retrieve and manipulate data from relational databases.

Constructed interactive and dynamic data visualizations using D3.js and Matplotlib for web-based dashboards and reports. Employed Plotly for creating interactive graphs within Jupyter notebooks.

Employed Git for version control, collaborating with cross-functional teams to manage code repositories and track changes to data science projects.

Developed automated data pipelines using Apache Airflow to schedule and orchestrate data extraction, transformation, and loading processes. Created custom Python operators for specific tasks.

Performed natural language processing (NLP) tasks using NLTK and spaCy libraries. Leveraged pre-trained word embeddings like Word2Vec and GloVe for sentiment analysis and text classification.

Implemented dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-SNE (t-Distributed Stochastic Neighbor Embedding) for feature engineering and data visualization.

Conducted A/B testing (split testing) using Python and tools like SciPy to assess the impact of data-driven decisions on product or process improvements.

Collaborated with the DevOps team to deploy machine learning models using Docker containers. Worked on model versioning and model serving through RESTful APIs.

Employed Linux as the primary operating system for development and production environments, leveraging shell scripting for automation and server maintenance.

Employed Elasticsearch for real-time data indexing and search functionality, enhancing data retrieval capabilities within applications.

Managed and secured data with Apache Kafka for real-time event streaming, processing, and analytics.

Conducted data cleaning and preprocessing using PyTorch and TensorFlow Transform for building robust machine learning pipelines.

Maintained documentation and knowledge sharing within the team using Confluence and integrated automated reporting with Slack and email notifications for project updates.

Leveraged Tableau for creating interactive data visualizations and dashboards for business stakeholders.

Environment: Python, Pandas, NumPy, Scikit-learn, Apache Hadoop, Apache Spark, TensorFlow, Matplotlib, Apache Kafka, TensorFlow Transform, Confluence, Slack, and Tableau.

Mitre Corporation

McLean, VA (Dec’2014 – Jan’2016)

AI/ML Engineer

Developed machine learning models using TensorFlow for tasks such as image classification, natural language processing, and recommendation systems. Implemented deep neural networks, including convolutional and recurrent networks, to improve model accuracy.

Cleaned and prepared large datasets for model training, ensuring data quality and consistency. Utilized Python and Pandas for data wrangling, feature engineering, and data augmentation.

Implemented machine learning algorithms for supervised and unsupervised learning tasks. Fine-tuned hyperparameters and optimized model performance through cross-validation.

Conducted feature selection to improve model efficiency and interpretability. Employed techniques like Recursive Feature Elimination (RFE) and feature importance analysis with decision trees.

Assessed model performance using Python, creating visualizations of metrics such as accuracy, precision, recall, and F1-score. Conducted A/B testing to evaluate model improvements.

Integrated machine learning models into production systems using Python and Flask, creating RESTful APIs for real-time predictions. Ensured seamless model deployment and maintained version control.

Leveraged NLTK to preprocess and analyze text data, including tokenization, part-of-speech tagging, and sentiment analysis. Developed chatbots and text classification models.

Utilized OpenCV for computer vision tasks, including image processing, object detection, and optical character recognition (OCR). Designed solutions for image-based automation.

Worked with Hadoop and Spark to handle and process large-scale datasets. Developed distributed data pipelines for parallel processing and feature extraction.

Implemented monitoring solutions in Python to track model performance in real-time. Set up alerts and triggers for model retraining.

Collaborated with cross-functional teams using JIRA for project management and Confluence for documentation. Documented model architectures, data flows, and best practices.

Automated model testing and deployment pipelines. Ensured consistent and reliable model updates and releases.

Managed codebase and model versions using Git. Collaborated with team members to resolve code conflicts and maintain a clean code repository.

Optimized deep learning algorithms for efficiency, leveraging NumPy for array manipulation.

Orchestrated and containerized machine learning workloads using Docker and managed container clusters for scalability and reliability.

Employed SHAP for model interpretability, explaining model predictions and ensuring compliance with regulatory requirements.

Addressed privacy concerns by implementing differential privacy techniques.

Regularly communicated project progress and results to stakeholders, translating technical findings into actionable business insights.

Environment: Scikit-Learn, Matplotlib, Flask, NLTK, OpenCV, JIRA, Confluence, Jenkins, Git, NumPy, Docker, and SHAP.

SoftSol

Hyderabad, India (Apr’2013 – Aug’2014)

Machine Learning Associate

Developed and maintained Java applications, ensuring compatibility and performance optimization.

Collaborated with the development team to design and implement software solutions, following Agile methodologies and version control with Git.

Worked with JavaEE for building robust and scalable web applications, implementing RESTful web services using JAX-RS.

Utilized the Spring Framework for building the application's core components, configuring Spring beans, and handling data access using Spring Data JPA.

Designed and implemented complex database schemas and queries using Oracle Database, optimizing performance through SQL tuning.

Integrated Hibernate for Object-Relational Mapping (ORM) to simplify data access and ensure data integrity.

Developed and maintained front-end components using JavaServer Pages (JSP) and HTML/CSS, enhancing the user interface functionality and user experience.

Implemented security measures using Spring Security to control access and protect sensitive data, including authentication and authorization.

Utilized Apache Maven for project build automation, managing dependencies, and creating executable JAR files.

Performed code reviews and collaborated with the Quality Assurance (QA) team to identify and fix bugs, ensuring high-quality software.

Created and maintained documentation for the codebase, APIs, and system architecture, enhancing team collaboration and knowledge sharing.

Utilized JUnit and Mockito for unit testing, ensuring the reliability and correctness of the codebase.

Investigated and resolved production issues, providing timely support and troubleshooting using logs and monitoring tools.

Optimized application performance using profiling tools like VisualVM and YourKit, identifying bottlenecks and memory leaks.

Deployed applications on Apache Tomcat, configuring the server for optimal performance and scalability.

Collaborated with system administrators to manage servers and their operating systems, ensuring seamless application deployment and maintenance.

Implemented Continuous Integration (CI) to automate the build, test, and deployment process, increasing development efficiency.

Managed application configurations, enabling dynamic configuration updates.

Integrated and optimized third-party libraries and APIs, such as Apache POI for Excel file manipulation and Log4j for logging.

Worked with Java Message Service (JMS) for asynchronous messaging and ensured reliability using Apache ActiveMQ.

Environment: Hibernate, Oracle Database, Java Server Pages, HTML/CSS, Spring Security, Apache Maven, JUnit, Mockito and Apache Tomcat.

EDUCATION AND PROFESSIONAL DEVELOPMENT

B.Sc. in Computer Science, JNTUH, India (2013)
