Mahender
Data Scientist
Email: *************@*****.***
Ph #: 330-***-****
PROFESSIONAL SUMMARY:
Data Scientist & ML Engineer with 7+ years of experience delivering AI-powered solutions across banking, retail, and eCommerce domains.
Expert in Machine Learning, Deep Learning, and Generative AI (LLMs) using Python, TensorFlow, PyTorch, Hugging Face, and OpenAI GPT APIs.
Hands-on experience across the SDLC (Software Development Life Cycle), including gathering requirements, designing, developing, implementing, and testing projects before moving to production.
Experienced in building production-level Machine Learning models, with a strong background in Statistics and Software programming.
Skilled in end-to-end ML lifecycle from data engineering with Spark, Databricks, and AWS SageMaker to deployment via Docker, Kubernetes, and CI/CD pipelines.
Experience in Performance tuning and Query Optimization, Data Transformation Services, and Database Security.
Experience visualizing data using tools such as Azure Machine Learning, Python, and Power BI.
Hands-on experience manipulating data using Python and SAS.
Extensive experience in text analytics, generating data visualizations with Python, and creating dashboards using tools like Tableau.
Experience in designing and deploying supervised and unsupervised ML models using scikit-learn, XGBoost, LightGBM, and CatBoost.
Built deep learning models for NLP, time-series, and image tasks using TensorFlow, Keras, and PyTorch.
Specialized in NLP with spaCy, NLTK, Hugging Face Transformers, and GPT for classification and summarization.
Strong experience with Base SAS, SAS/STAT, SAS/ACCESS, SAS/GRAPH, SAS Macros, SAS/ODS, and SAS/SQL in Windows environments.
Proficient in programming languages including Python, R, and SQL, with extensive experience in data preprocessing, feature engineering, and data wrangling for clean and structured data.
Skilled in building and deploying predictive models, utilizing algorithms like Random Forest, Gradient Boosting Machines, SVM, KNN, and Neural Networks to address business challenges and improve decision-making processes.
Expertise in data visualization using tools like Tableau, Power BI, Matplotlib, Seaborn, and Plotly to communicate findings and insights through interactive dashboards and reports.
Strong understanding of statistical analysis and hypothesis testing, with proficiency in A/B testing, regression analysis, and time-series forecasting.
Hands-on experience working with NoSQL databases like MongoDB and Cassandra, and relational databases such as MySQL, PostgreSQL, and Oracle to manage and query structured and unstructured data.
Familiarity with Docker and Kubernetes for containerization and deployment of data science models and workflows.
Built visualizations to convey technical research findings, and to ensure alignment of data science ideas with client use cases.
Developed recommendation engine using TensorFlow and PyTorch, integrated with LangChain for LLM-based personalized product search.
Working experience with normalization and de-normalization techniques for both OLTP and OLAP systems, with strong knowledge of relational and multidimensional modeling concepts.
Working experience in modeling and full-scale implementation of both relational and multidimensional data warehouses.
Proven ability to build and manage data lakes using technologies like AWS S3, Azure Blob Storage, and Google Cloud Storage to store vast amounts of structured and unstructured data for analysis.
Experienced in T-SQL programming (DDL, DML, and DCL), including creating stored procedures, user-defined functions, constraints, queries, joins, keys, indexes, data import/export, triggers, tables, views, and cursors.
Evaluated and reviewed test plans against functional requirements, design documents, policies, procedures, and regulatory requirements.
Demonstrated leadership abilities and teamwork skills, as well as the ability to accomplish tasks under minimal direction and supervision.
Ability to deliver highly complex technical information in terms and concepts that the end users or management can readily grasp or understand.
Good working knowledge of major Operating Systems and tested applications on Windows and Linux/UNIX environments.
Ability to handle multiple tasks and work independently as well as in a team.
Good team player with excellent written and verbal communication and interpersonal skills.
TECHNICAL SKILLS:
Languages
Python, C#, R, SQL
Machine Learning Libraries
NumPy, Pandas, scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM
Machine Learning Algorithms
Linear Regression, Logistic Regression, SVM, Random Forest, Gradient Boosting, KNN
Deep Learning
TensorFlow, Keras, PyTorch, Theano, Caffe
Natural Language Processing (NLP)
spaCy, NLTK, TextBlob, Gensim, Transformers, BERT
Data Visualization
Matplotlib, Seaborn, Plotly, Tableau, Power BI, ggplot2
Data Warehousing
Amazon Redshift, Google BigQuery, Snowflake, Teradata
OLAP Tools
OLAP Suite, Business Objects
ETL Tools
Informatica, SSIS, DataStage
Databases
Oracle, MS SQL Server, MS Access, DB2
Big Data
HBase, Hive, MapReduce, MongoDB
Operating Systems
Windows, Unix, Linux
PROFESSIONAL EXPERIENCE:
Client: PNC Bank, Pittsburgh, PA Aug 2023 – Present
Role: Sr. Data Scientist
Responsibilities:
Extracted and analyzed large volumes of financial data using SQL, Python, and R to improve customer targeting and transaction analysis.
Built predictive models (credit scoring, fraud detection) using scikit-learn, TensorFlow, and XGBoost, enhancing risk assessment and customer segmentation (see the sketch at the end of this section).
Developed dashboards and reports using Tableau, Power BI, and Looker for executive visibility into loan performance, customer churn, and revenue trends.
Utilized Hadoop, Apache Spark, and Hive for distributed processing of banking transactions, customer history, and ATM network data.
Integrated data warehousing solutions using Snowflake, BigQuery, and Amazon Redshift to support scalable and efficient financial reporting.
Applied NLP with spaCy, NLTK, and Transformers to analyze call center transcripts and customer complaints for sentiment classification and service improvements.
Processed and stored semi-structured data using MongoDB and Cassandra, and indexed datasets with Elasticsearch for high-performance retrieval.
Implemented TextBlob, Gensim, and BERT for financial news sentiment analysis and customer feedback mining to inform strategic decisions.
Used Git, GitHub, and Bitbucket for version control and implemented CI/CD with Jenkins and GitLab CI/CD for seamless deployment of analytics pipelines.
Conducted financial time series forecasting and anomaly detection using Prophet, ARIMA, and LSTM models to support investment and treasury operations.
Designed and maintained scalable data pipelines using Apache Airflow, NiFi, and AWS Glue to automate ETL workflows across multiple banking systems.
Implemented secure API integrations with internal and external systems using Flask, FastAPI, and OAuth 2.0 for seamless data exchange and service interoperability.
Deployed containerized data science models using Docker, orchestrated with Kubernetes, for real-time fraud detection and credit scoring.
Utilized Kafka and AWS Kinesis for real-time streaming of transactions and customer activity logs to enhance fraud monitoring and dynamic risk scoring.
Conducted graph-based analysis using Neo4j and GraphQL to detect fraudulent account networks and complex customer relationships.
Developed secure data access layers with JWT, LDAP, and Keycloak to ensure regulatory compliance and role-based access control in sensitive banking systems.
Automated data processing tasks using Domo’s Magic ETL and Excel macros, streamlining reporting workflows.
Designed ETL processes in Hadoop to manage and analyze large datasets, ensuring scalability and reliability.
Documented workflows and processes in Confluence to maintain data transparency and accessibility.
Created weekly executive summaries and PowerPoint reports for senior leadership to support strategic business reviews.
Environment: SQL, Python, R, TensorFlow, PyTorch, scikit-learn, Hadoop, Spark, Hive, Tableau, Power BI, Matplotlib, Snowflake, Redshift, BigQuery, spaCy, NLTK, ArcGIS, MongoDB, Cassandra, Elasticsearch, NoSQL databases, Jenkins, GitLab CI/CD, LSTM, Apache Airflow, NiFi, AWS Glue, FastAPI, OAuth 2.0, Docker, Kubernetes, AWS Kinesis, Neo4j, GraphQL, JWT, LDAP, Keycloak
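Below is a minimal, hypothetical sketch of the kind of credit-scoring / fraud-detection classifier described in this section; the synthetic data, feature counts, and parameters are illustrative assumptions, not details of the actual implementation.

# Minimal sketch of a fraud-detection / credit-scoring classifier (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic, heavily imbalanced stand-in for engineered transaction / credit features.
X, y = make_classification(n_samples=20000, n_features=30, weights=[0.97, 0.03], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Gradient-boosted trees suit tabular risk features; scale_pos_weight offsets the
# class imbalance that is typical of fraud labels.
model = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    eval_metric="auc",
)
model.fit(X_train, y_train)

# ROC AUC measures how well the model ranks risky cases above normal ones.
scores = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, scores))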
Client: Optum – Eden Prairie, MN Aug 2021 – Dec 2022
Role: Data Scientist
Responsibilities:
Built models using statistical techniques like Bayesian HMMs and machine learning classification models like XGBoost, SVM, and Random Forest.
Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
Installed and used the Caffe deep learning framework.
Extensively used open-source tools such as Spyder (Python) for statistical analysis and building machine learning algorithms.
Applied various machine learning algorithms and statistical models, such as decision trees, logistic regression, and gradient boosting machines, to build predictive models using the scikit-learn package in Python.
Generated data analysis reports using Matplotlib and presented the results to C-level decision makers.
Used Tableau and Python programming to improve model performance.
Translated business needs into mathematical models and algorithms, and built machine learning and natural language processing (NLP) solutions using Python, Java, and NLP modules.
Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
Implemented Agile Methodology for building an internal application.
As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN, and naive Bayes (see the sketch at the end of this section).
Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
Validated the machine learning classifiers using ROC Curves and Lift Charts.
Extracted data from HDFS and prepared data for exploratory analysis using data munging.
Environment: ER Studio, AWS, MDM, GIT, Unix, Python, MLLib, SAS, Regression, Logistic Regression, Hadoop, OLTP, OLAP, HDFS, NLTK, SVM, JSON, XML.
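Below is a minimal sketch of the kind of supervised classification and ROC-based validation described in this section; the synthetic dataset is an illustrative stand-in for the actual labeled documents, which are not shown here.

# Minimal sketch comparing simple supervised classifiers with ROC AUC validation (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a labeled response dataset.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=15),
    "naive_bayes": GaussianNB(),
}

# Fit each classifier and compare ranking quality on the held-out split.
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    print(f"{name}: ROC AUC = {roc_auc_score(y_test, scores):.3f}")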
Client: Prologging-in It Private Limited, India May 2019 – Jul 2021
Role: Data Scientist
Responsibilities:
Involved in gathering, analyzing and translating business requirements into analytic approaches.
Worked with machine learning algorithms such as neural network models, regression models (linear, logistic, etc.), SVMs, and decision trees for classifying groups and analyzing the most significant variables.
Utilized SAS to develop Pareto charts for identifying high-impact categories in modules, analyze workforce distribution, and create various data visualization charts.
Converted raw data to processed data by merging datasets and identifying outliers, errors, trends, missing values, and distributions in the data.
Performed data analysis, visualization, feature extraction, feature selection, feature engineering using Python.
Generated detailed reports after validating the graphs using Python and adjusting the variables to fit the model.
Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
Used Power Map and Power View to present data effectively to both technical and non-technical users.
Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
Used TensorFlow, Keras, Theano, Pandas, NumPy, SciPy, scikit-learn, and NLTK in Python to develop machine learning algorithms such as neural network models, linear regression, multivariate regression, naive Bayes, random forests, decision trees, SVMs, K-means, and KNN for data analysis.
Responsible for developing a data pipeline with AWS S3 to extract data, store it in HDFS, and deploy the implemented machine learning models.
Worked on business forecasting, segmentation analysis, and data mining; prepared management reports defining the problem, documenting the analysis, and recommending courses of action to determine the best outcomes.
Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
Worked with risk analysis, root-cause analysis, cluster analysis, correlation, and optimization, and applied the K-means algorithm for clustering data into groups (see the sketch at the end of this section).
Coordinated with data scientists and senior technical staff to identify client's needs.
Environment: Python, Jupyter, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, SAS, Power Query, Power Pivot, Power Map, Power View, SQL Server, MS Access.
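Below is a minimal sketch of the kind of K-means segmentation described in this section; the blob data and cluster count are illustrative assumptions rather than details of the actual project.

# Minimal sketch of K-means clustering for segmenting records into groups (illustrative only).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for engineered numeric features.
X, _ = make_blobs(n_samples=1000, centers=4, n_features=5, random_state=0)

# Standardize features so no single scale dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Cluster into k groups; in practice k would be chosen via the elbow method
# or silhouette analysis rather than fixed in advance.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print("Cluster sizes:", np.bincount(labels))
print("Silhouette score:", silhouette_score(X_scaled, labels))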
Company: Ebix, Hyderabad, India Aug 2017 – Apr 2019
Role: Data Analyst
Responsibilities:
Responsible for gathering data migration requirements.
Identified problematic areas and conducted research to determine the best course of action to correct the data.
Analyzed reports of data duplicates or other errors to provide ongoing appropriate inter-departmental communication and monthly or daily data reports.
Monitored timely and accurate completion of select data elements.
Identified, analyzed, and interpreted trends or patterns in complex data sets.
Monitored data dictionary statistics.
Developed logical and physical data models that capture current state/future state data elements and data flows using Erwin.
Extensively used the Erwin tool for forward and reverse engineering, following corporate standards for naming conventions and using conformed dimensions whenever possible.
Involved in data mapping and data cleanup.
Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
Enabled a smooth transition from the legacy system to the newer system through a change management process.
Archived the old data by converting it into SAS data sets and flat files.
Analyzed and maintained SAS programs and macros to generate SAS datasets.
Created technical design documents and unit test cases.
Involved in test case/data preparation, execution, and verification of the test results.
Reviewed PL/SQL migration scripts.
Coded PL/SQL packages to perform Application Security and batch job scheduling.
Created user guidance documentation.
Created reconciliation report for validating migrated data.
Analyzed problem and solved issues with current and planned systems as they relate to the integration and management of order data.
Planned project activities for the team based on project timelines using Work Breakdown Structure.
Environment: UNIX, Shell Scripting, XML Files, XSD, XML Spy 2010, SAS PROC SQL, Cognos, Oracle 10g, Teradata, Sybase, Mercury Quality Center, Toad, Autosys.
References will be provided upon request.