
Mahendra Vardhan Amilineni

Sr. Data Scientist / Data Analyst

Email: **************@*****.*** | Phone: 978-***-****

LinkedIn: linkedin.com/in/mahendra-vardhan-amilineni

PROFESSIONAL SUMMARY:

Highly efficient Data Scientist/Data Analyst with 9+ years of experience in data analysis, machine learning, data mining, and predictive modeling. Skilled in handling large structured and unstructured datasets.

Proficient in R, Python, and Big Data technologies like Hadoop, Hive, Spark, and PySpark. Expertise in statistical modeling, dimensionality reduction, and data visualization using tools like Tableau, Power BI, and Qlik.

Hands-on experience in text analytics, NLP, web scraping, and advanced machine learning models like Random Forest, SVM, and neural networks.

Extensive experience in leveraging AWS cloud services for building, deploying, and managing scalable machine learning models and data pipelines.

Proficient in utilizing AWS SageMaker for model training, tuning, and deployment, enhancing predictive analytics for business applications.

Hands-on experience in designing and implementing end-to-end data pipelines using AWS services such as S3, AWS Lambda, AWS Redshift, and AWS Glue.

Highly proficient in Python and R, leveraging them for advanced data analysis, modeling, and visualization to drive data-driven decision-making.

Expertise in using AWS EC2 for distributed computing, running big data frameworks like Apache Spark and Hadoop for large-scale data processing.

Skilled in utilizing AWS Athena and AWS Glue for querying and transforming large datasets, enabling seamless integration of structured and unstructured data.

Expertise in utilizing Azure Machine Learning (AML) for building, training, and deploying machine learning models at scale, leveraging Azure's cloud-based data science tools for end-to-end solutions.

Proficient in Azure Databricks for collaborative data science, processing large datasets, and running distributed analytics using Apache Spark, enabling faster and scalable model training.

Hands-on experience with Azure Synapse Analytics to perform data integration, data transformation, and analytics, driving data-driven decision-making and insights.

Expertise in SQL for database querying and management, optimizing data retrieval processes for large datasets in relational and NoSQL databases.

Strong background in machine learning algorithms, including supervised and unsupervised learning, to solve complex business problems.

Advanced experience with deep learning frameworks such as TensorFlow and Keras, building and optimizing neural networks for predictive analytics.

Proven ability to perform statistical analysis and hypothesis testing to uncover insights and trends in large-scale datasets.

Hands-on experience with big data technologies such as Hadoop and Spark, ensuring efficient data processing and storage at scale.

Skilled in data cleaning and preprocessing techniques, ensuring high-quality input for machine learning models and analytics.

Skilled in leveraging Azure SQL Database, Azure Cosmos DB, and Azure Data Lake for efficient data storage, management, and retrieval, supporting large-scale data science workflows.

Expertise in designing and implementing end-to-end data pipelines using Azure Data Factory, automating ETL workflows to streamline data extraction, transformation, and loading processes.

Expertise in data visualization tools and techniques, using libraries like Matplotlib and Seaborn to present complex data insights clearly.

Proficient in using NoSQL databases such as MongoDB and Cassandra for handling unstructured data and ensuring fast data retrieval.

Expertise in building and deploying machine learning models, with a focus on optimization for performance and scalability.

Experienced in Natural Language Processing (NLP), A/B testing, and experimental design, implementing these techniques to validate model effectiveness and business impact.

Familiar with advanced machine learning techniques, including reinforcement learning and ensemble methods, to improve model accuracy.

Proficient in developing predictive models and performing risk analysis using advanced machine learning and statistical techniques.

Hands-on experience with Scikit-learn and other Python libraries to implement and tune machine learning models for real-world applications.

Experienced in designing analytical models, transforming business requirements, and building scalable reporting solutions.

Skilled in building compelling visualizations with Tableau dashboards, R Shiny, ggplot2, and Flask.

Strong knowledge of data warehousing, data engineering, feature engineering, and database querying with SQL, Oracle, and DB2.

Innovative problem-solver with extensive industry knowledge and excellent communication and interpersonal skills.

TECHNICAL SKILLS:

Programming Languages

Python, R, SQL, .NET, Java, T-SQL, PyTorch

Natural Language Processing (NLP)

NLP Models, Sentiment Analysis, Tokenization, SpaCy, NLTK, Topic Modeling, Text Classification, Text Preprocessing

Data Analysis & Modeling

Predictive Analytics, Risk Analysis, A/B Testing, Statistical Analysis, Regression Models, Time Series Forecasting, Statistical Modeling, Regression Analysis

AI & Machine Learning

Machine Learning Models, Optimization, Scikit-learn, TensorFlow, Keras, XGBoost, Random Forests, SVM, KNN, Deep Learning, Natural Language Processing, Computer Vision Models, Neural Networks

Data Engineering

ETL, Data Pipelines, Azure Data Factory, Spark, Databricks, Hadoop, Data Lake, Data Warehouse, NoSQL, SSIS, Airflow

Cloud Technologies

AWS, Azure

Data Visualization

Tableau, Power BI, Matplotlib, ggplot2

Big Data Tools

Hadoop, Spark, Apache Kafka, Azure HDInsight, AWS Glue, AWS Lambda

Database Management

SQL, Oracle, DB2, PostgreSQL, MongoDB, Data Mapping, ETL Transformations

Deep Learning

Neural Networks, CNN, RNN, TensorFlow, Keras, Deep Learning Models

Data Quality & Validation

Data Quality Validation, Missing Values, Outliers, Data Cleaning, Feature Engineering, Data Integrity, Scrum

DevOps & Automation

Git, Jenkins, CI/CD, Azure DevOps, Docker, Kubernetes, Automation using Python, Jupyter Notebooks, Monitoring, Jira, Redshift

Data Integration

API Integration, RESTful Web Services, AWS API Gateway, SOAP, AWS CloudFormation, Snowflake

Version Control

Git, SVN, TFS, GitLab

PROFESSIONAL EXPERIENCE:

Client: Morgan Stanley, New York, NY. Feb 2023 – Present

Role: Sr. Data Scientist / Data Analyst

Responsibilities:

Championed the design & execution of machine learning projects to address specific business problems determined by consultation with business partners.

Applied machine learning algorithms such as linear regression, SVM, multivariate regression, fuzzy logic, Naive Bayes, random forests, K-means, and KNN for data analysis.

Created ETL packages using SSIS to transform data into the required format and join tables to assemble all features needed for modeling.

Cleaned and preprocessed large datasets, handling missing values, outliers, and noise to ensure high-quality data for analysis and modeling.

Led time series analysis projects, leveraging statistical models and machine learning techniques to forecast demand and optimize resource allocation.

Designed and implemented scalable data pipelines in Azure Data Factory, ensuring seamless data ingestion, transformation, and integration across diverse data sources.

Leveraged Azure Databricks for big data processing, utilizing Apache Spark for data transformation and machine learning model development.

Optimized cloud data storage solutions using Azure Data Lake and Azure Blob Storage, ensuring efficient data management and retrieval for analytical purposes.

Developed and deployed machine learning models using Azure Machine Learning, integrating with Azure Databricks and Azure Synapse Analytics for end-to-end model lifecycle management.

Developed machine learning models to predict customer behavior, utilizing algorithms such as random forests, XGBoost, and support vector machines.
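
For illustration only, a minimal scikit-learn sketch of this kind of customer-behavior classifier; the data here is synthetic, not client data:

    # Minimal random-forest sketch on synthetic data (illustrative, not client code).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for customer features (tenure, spend, activity, etc.).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))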

Developed NLP pipelines for text preprocessing, tokenization, stemming, and lemmatization to prepare unstructured data for analysis.
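
A minimal NLTK sketch of such a preprocessing pipeline (the sample sentence is invented; the NLTK data packages required can vary by version):

    # Tokenization, stemming, and lemmatization with NLTK (illustrative only).
    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)    # tokenizer models
    nltk.download("wordnet", quiet=True)  # lemmatizer lexicon

    text = "The analysts were analyzing unstructured documents."
    tokens = word_tokenize(text.lower())                          # tokenization
    stems = [PorterStemmer().stem(t) for t in tokens]             # stemming
    lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # lemmatization
    print("stems:", stems)
    print("lemmas:", lemmas)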

Led the design and implementation of deep learning models for image classification and text analysis, leveraging TensorFlow and Keras.

Built and optimized data pipelines using Python, SQL, and big data tools such as Hadoop and Spark, ensuring efficient data processing and storage.

Created and maintained interactive dashboards in Power BI, visualizing complex data insights to support business decision-making processes.

Applied statistical methods to identify trends and patterns in large datasets, supporting actionable insights for stakeholders across various departments.

Utilized Natural Language Processing (NLP) techniques for sentiment analysis and text summarization, improving customer experience through insights derived from unstructured data.

Collaborated with cross-functional teams to implement predictive models that enhanced sales forecasting, inventory management, and fraud detection.

Made extensive use of Python's Matplotlib package and Power BI to visualize and graphically analyze data; performed data preprocessing and split the identified datasets into training and test sets using Python libraries, as sketched below.
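
A minimal sketch of that visualize-then-split workflow, assuming synthetic data in place of the actual datasets:

    # Visualize features with Matplotlib and split into training/test sets.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    X = np.random.rand(300, 2)                # synthetic feature matrix
    y = (X[:, 0] + X[:, 1] > 1).astype(int)   # synthetic binary labels
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1)

    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
    plt.title("Training split (synthetic data)")
    plt.show()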

Applied predictive analytics and machine learning algorithms to forecast key metrics, delivered as purpose-built dashboards on Azure and the Django platform for the company's core business.

Worked with datasets of varying size and complexity, including both structured and unstructured data; piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis (ETL).

Oversaw use of Python's NumPy, SciPy, pandas, Matplotlib, and statistics packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering; built and analyzed datasets using R and Python.

Used data quality validation techniques to validate critical data elements (CDEs) and identified many anomalies; worked extensively with statistical analysis tools and wrote code in advanced Excel, R, and Python.

Enforced model validation using test and validation sets via K-fold cross-validation and statistical significance testing.
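
A minimal sketch of the K-fold validation step with scikit-learn, on synthetic data:

    # Five-fold cross-validation of a simple classifier (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print("Fold accuracies:", scores, "mean:", scores.mean())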

Built multi-layer neural networks in Python using the Scikit-learn, Theano, TensorFlow, and Keras packages to implement machine learning models.
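
For illustration, a minimal Keras multi-layer network of the kind described, trained on synthetic data:

    # Small feed-forward network for binary classification (illustrative only).
    import numpy as np
    from tensorflow import keras

    X = np.random.rand(1000, 20).astype("float32")  # synthetic features
    y = (X.sum(axis=1) > 10).astype("float32")      # synthetic binary target

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)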

Extracted data from Azure Data Lake into an HDInsight cluster, applied Spark transformations and actions, and loaded the results into HDFS.

Environment: Machine Learning Algorithms, Linear Regression, SVM, Multivariate Regression, Fuzzy Logic, Naive Bayes, Random Forests, K-means, KNN, SSIS, Data Cleaning and Preprocessing, Time Series Analysis, Statistical Modeling, Azure, XGBoost, NLP, TensorFlow, Keras, Python, SQL, Hadoop, Power BI, Data Visualization, Matplotlib, Django, R, HDFS.

Client: Ford Motor Company, Dearborn, Michigan. Aug 2019 – Jan 2023

Role: Data Scientist /Data Analyst

Responsibilities:

Applied predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS services such as S3 and EC2 and on the Django platform for the company's core business.

Developed a prototype pipeline in Spark to pre-process data and to make user recommendations based on an ensemble of machine learning models.

Implemented NLP models for sentiment analysis, topic modeling, and text classification using libraries such as spaCy and NLTK.
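
A minimal text-classification sketch in the same spirit (TF-IDF features plus Naive Bayes; the toy corpus and labels are invented):

    # Sentiment-style text classification with scikit-learn (illustrative only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = ["great car, smooth ride", "engine failed twice",
            "love the mileage", "terrible service"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy labels)

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(docs, labels)
    print(clf.predict(["smooth engine, great mileage"]))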

Utilized AWS services such as EC2, S3, Lambda, and RDS for building, training, and deploying machine learning models at scale, optimizing data processing workflows and performance.

Used AWS Kinesis and SNS to build real-time data streaming applications, enabling continuous data ingestion and processing for advanced analytics.
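
A hypothetical boto3 sketch of writing one record to a Kinesis stream; the stream name, region, and payload are placeholders, and AWS credentials are assumed to be configured:

    # Put a single JSON record onto a Kinesis data stream (names are hypothetical).
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    event = {"vehicle_id": "demo-123", "metric": "speed", "value": 88.2}
    kinesis.put_record(
        StreamName="example-telemetry-stream",  # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["vehicle_id"],
    )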

Implemented AWS IAM roles and policies to ensure secure, role-based access control across cloud resources, maintaining best practices in data security and governance.

Designed and developed Natural Language Processing models for sentiment analysis.

Led the migration of on-premises machine learning models to AWS cloud infrastructure, significantly improving scalability and reducing operational overhead.

Designed and implemented scalable data pipelines using AWS Glue and AWS Lambda for ETL processes, enabling seamless data transformation and ingestion into data lakes and warehouses.

Identified and resolved data inconsistencies by handling missing values, duplicates, and outliers to ensure high-quality datasets.

Conducted time series forecasting using ARIMA, Prophet, and LSTM models to predict business trends.
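
As a minimal sketch of the ARIMA piece (statsmodels, synthetic monthly series; the Prophet and LSTM variants follow the same fit/forecast pattern):

    # Fit ARIMA(1,1,1) to a synthetic monthly series and forecast six months ahead.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    idx = pd.date_range("2020-01-01", periods=48, freq="MS")
    series = pd.Series(100 + np.cumsum(np.random.randn(48)), index=idx)

    fit = ARIMA(series, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=6))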

Applied seasonal decomposition techniques to extract insights from historical data for resource optimization.

Conducted qualitative and quantitative research to gather data from the data mart; handled data identification, collection, exploration, and cleaning for modeling; participated in model development; and visualized, interpreted, and reported findings to develop strategic uses of data.

Developed and deployed machine learning models for predictive analytics, enhancing decision-making capabilities and operational efficiency.

Utilized Python and R for data analysis, model development, and automation of data workflows, ensuring scalability and reproducibility.

Conducted advanced statistical analysis to interpret complex datasets, driving actionable insights and improving business strategies.

Led the design and implementation of end-to-end data pipelines, integrating Big Data tools such as Hadoop and Spark to handle large datasets effectively.

Created interactive visualizations in Power BI and other tools, translating complex data into clear and actionable business insights for stakeholders.

Applied Natural Language Processing (NLP) techniques to analyze text data, extracting meaningful insights for customer sentiment analysis and automation.

Managed and cleaned large datasets, implementing data preprocessing techniques to ensure quality and integrity for downstream analysis and model training.

Built and deployed predictive models for customer churn and sales forecasting using machine learning algorithms.

Analyzed transaction data and developed analytics insights using statistical and machine learning models.

Used the R programming language to graphically critique data and perform data mining; interpreted business requirements and data mapping specifications, and visualized data per business requirements using R Shiny.

Environment: Predictive Analytics, Machine Learning, AWS S3, AWS EC2, AWS Lambda, AWS RDS, AWS Glue, AWS Kinesis, AWS SNS, Django, Spark, NLP, spaCy, NLTK, ARIMA, Prophet, LSTM, Time Series Forecasting, AWS IAM, Data Pipeline, Hadoop, Python, R, Power BI, R Shiny, Data Cleansing, Data Mining, Statistical Analysis, Data Visualization, Customer Churn Prediction, Sales Forecasting.

Client: Max Life Insurance Company Limited, Gurgaon, India. Oct 2017 – Jun 2019

Role: Data Analyst

Responsibilities:

Implemented Agile Methodology for building an internal application.

Worked on the development of data warehouse, data lake, and ETL systems using relational and non-relational technologies such as SQL and NoSQL.

Utilized Azure Data Factory and Databricks to design and manage ETL workflows for seamless data integration and transformation, optimizing data pipeline performance.

Analyzed large datasets using SQL, SSIS, and Azure SQL Database, providing actionable insights for business intelligence and decision-making.

Developed dynamic dashboards and reports with Tableau, delivering real-time visualizations and key performance indicators to stakeholders.

Automated data analysis and reporting tasks using Python, Jupyter Notebook, and Excel VBA, improving operational efficiency and reducing manual effort.

Designed and maintained scalable data storage solutions in Azure Synapse, Azure Blob Storage, and Azure Data Lake, ensuring high availability and data integrity.

Applied ggplot2, Matplotlib, and R Markdown for data visualization and statistical analysis, communicating complex results in a clear and concise manner.

Created a Tableau report to show the client how predictive modeling could benefit the business.

Executed data analysis and modeling using SAS, IBM SPSS, and Hadoop; leveraged Unix shell scripting and RESTful APIs to streamline data extraction processes and integrate third-party systems into the data pipeline.

Conducted thorough data analysis and reporting using SQL Server, PostgreSQL, Teradata SQL, and Toad, ensuring data quality and consistency across platforms.

Collaborated with cross-functional teams using tools such as JIRA and Microsoft Visio, translating business requirements into data-driven solutions and visual workflow diagrams.

Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Powerbase, and Smart View.

Programmed a utility in Python using multiple packages, including SciPy, NumPy, and pandas.

Environment: Agile, Hadoop, Data Warehouse, Data Lake, ETL Systems, SQL, NoSQL, SSIS, Azure SQL Database, Tableau, Python, Jupyter Notebook, Excel VBA, R Markdown, SAS, IBM SPSS, Unix Shell Scripting, RESTful APIs, SQL Server, PostgreSQL, Teradata SQL, Toad, JIRA, Microsoft Visio, Nexus, Business Objects, Powerbase, Smart View, SciPy, NumPy, pandas.

Client: Max Healthcare, New Delhi, India. Jul 2015 – Sep 2017

Role: Data Analyst

Responsibilities:

Designed and implemented data mappings and transformations for ETL pipelines using AWS Glue, AWS Lambda, and Spark JDBC to integrate and prepare data for analytical purposes.
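
A hypothetical PySpark sketch of the Spark JDBC ingestion step; the connection URL, table, and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be on the classpath:

    # Read a table over JDBC and land it as Parquet (names are placeholders).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://host:5432/warehouse")
        .option("dbtable", "public.claims")
        .option("user", "etl_user")
        .option("password", "***")
        .load()
    )
    df.write.mode("overwrite").parquet("/tmp/claims_parquet")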

Developed and maintained data dictionaries to ensure proper documentation and standardization of data across various business units.

Leveraged AWS S3 and AWS CloudFormation to manage and automate the deployment of data storage solutions, ensuring scalability and reliability in cloud environments.

Utilized Python and Jupyter Notebooks for advanced data analysis, scripting, and automation of data processing tasks across multiple data sources such as PostgreSQL.
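
A minimal sketch of that PostgreSQL-to-pandas workflow; the connection string, table, and columns are hypothetical:

    # Pull a sample of rows from PostgreSQL into pandas for analysis.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/analytics")
    df = pd.read_sql("SELECT visit_id, visit_date, charge FROM visits LIMIT 1000", engine)
    print(df.describe())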

Implemented data validation and integrity checks to ensure the quality and accuracy of the data used in reporting and decision-making processes.

Applied clustering algorithms (hierarchical and K-means) with the help of Scikit-learn and SciPy.
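
A minimal sketch of both clustering approaches on synthetic data:

    # K-means via scikit-learn and agglomerative (Ward) clustering via SciPy.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans

    X = np.random.rand(200, 4)  # synthetic observations
    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    hier_labels = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
    print(kmeans_labels[:10], hier_labels[:10])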

Developed visualizations and dashboards using Tableau.

Conducted data governance activities, including the definition and enforcement of data standards and policies, to ensure compliance and consistency across the organization.

Developed and optimized complex SQL queries for data extraction, reporting, and analysis across relational and NoSQL databases, including SQL Server.

Collaborated with business stakeholders to define data requirements, deliver actionable insights, and create automated reports using tools such as MicroStrategy and MS Access.

Built and analyzed datasets using R, SAS, MATLAB, and Python.

Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Environment: AWS Glue, AWS Lambda, Spark JDBC, ETL Pipelines, AWS S3, AWS CloudFormation, Python, Jupyter Notebooks, PostgreSQL, Data Validation, Scikit-learn, SciPy, Tableau, Data Governance, SQL, MicroStrategy, MS Access, R, SAS, MATLAB, Data Mining, Data Cleaning, Model Development, Gap Analysis.

EDUCATION: Jawaharlal Nehru Technological University, Hyderabad, TS, India

BTech in Computer Science and Engineering, June 2011 - May 2015


