Senior Data Scientist - ML - NLP - Data Architect

Location:
Morrow, GA, 30260
Posted:
April 23, 2026

Resume:

Chanikya Marichetty

Data Scientist / Data Analyst

E-Mail: ******************@*****.***

Phone: +1-502-***-****

LinkedIn: https://www.linkedin.com/in/chanikya-m-95a5083b8/

Professional Summary:

Data Scientist with 4+ years of experience in machine learning, NLP, and enterprise data solutions, with growing expertise in AI-driven systems and emerging technologies. Experienced in designing scalable data pipelines, developing predictive models, and collaborating with cross-functional teams to deliver business-driven insights. Strong interest in enterprise AI architecture, AI agents, and next-generation platforms such as Microsoft Copilot, with a focus on aligning AI capabilities to business strategy and operational efficiency.

Experienced in analytics models such as Decision Trees and Linear & Logistic Regression, and in tools including Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.

Strong knowledge in all phases of the SDLC (Software Development Life Cycle) – Agile/ Scrum from analysis, design, development, testing, implementation and maintenance.

Strong leadership in the fields of Data Cleansing, Web Scraping, Data Manipulation, Predictive Modeling with R and Python, and Power BI & Tableau for data visualization.

Experienced in Data Modeling techniques employing Data Warehousing concepts like Star/ Snowflake schema and Extended Star.

Hands-on experience in the entire Data Science project life cycle including data extraction, data cleaning, statistical modeling and data visualization with large data sets of structured and unstructured data, and created ER diagrams and schema.

Excellent knowledge of Artificial Intelligence/ Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.

Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import/ Export through the use of ETL tools such as Informatica Power Center.

Experience working in AWS environments using S3, Athena, Lambda, SageMaker, Lex, Aurora, QuickSight, CloudFormation, CloudWatch, IAM, Glacier, EC2, EMR, Rekognition, and API Gateway.

Proficient in Artificial Intelligence/ Machine Learning, Data/ Text Mining, Statistical Analysis & Predictive Modeling.

Good knowledge of and experience with deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTM, including RNN-based speech recognition using TensorFlow.

Expertise in using AWS S3 to stage data and to support data transfer and data archival. Experience in using

Excellent knowledge and experience in OLTP/ OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data Modeling using Erwin tool.

Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Python to perform event-driven processing.

Proficient in Python scripting, using NumPy for statistical functions, Matplotlib for visualization, and Pandas for organizing data.

Strong understanding of Deep learning using CNN, RNN, ANN, Reinforcement learning, Transfer Learning and performing data augmentation using Generative Adversarial Networks (GANs).

High level analytical thinking by extensively leveraging statistical techniques, such as T-test, P-value analysis, z-score analysis, ANOVA, Confidence Interval, Confusion Matrix, Precision, Recall, ROC / AUC curve analysis, etc.

Good knowledge of Natural Language Processing (NLP) and of Time Series Analysis and Forecasting using ARIMA models in Python and R.

Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.

Strong programming experience in MATLAB and Python visioning libraries

Experienced in the use of active dashboards and reports that are both visually appealing and functional, such as Python Matplotlib, R Shiny, Power BI, and Tableau.

Proficiency with creating, publishing, and customizing Tableau dashboards and dynamic reports with user filters.

Experience in Dimensional Modeling, ER Modeling, Star Schema/ Snowflake Schema, FACT and Dimensional tables and Operational Data Store (ODS).

Extensive knowledge in working with Amazon EC2 to provide a solution for computing, query processing, and storage across a wide range of applications.

Proficient in data visualization tools such as Tableau, Power BI, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.

Experience in building models with deep learning frameworks like TensorFlow, PyTorch, and Keras

Expertise in building, publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau.

Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a Movie Recommender System (PyTorch), and image captioning (CNN-RNN autoencoder architecture).

Designed and implemented scalable NLP pipelines and deep learning models (LSTM, RNN) to support enterprise-level text analytics use cases.

Architected cloud-based ML solutions using AWS (S3, EC2, SageMaker) for scalable data processing and model deployment.

Proficient in Python, with experience building and productionizing end-to-end systems.

Strong programming expertise in Python and strong SQL database skills.

Solid coding and engineering skills preferably in Artificial Intelligence/ Machine Learning.

Exposure to python and python packages.

Experience in developing various solution driven views and dashboards by developing different chart types including Pie Charts, Bar Charts, Tree Maps, Circle Views, Line Charts, Area Charts, and Scatter Plots in Power BI.

Be a valued contributor in shaping the future of our products and services.

Works successfully in fast-paced, multitasking environments, both independently and as part of a collaborative team.

Excellent communication skills that support swift implementation of data science and data analytics projects.

Collaborated with cross-functional teams to identify AI/ML use cases aligned with business objectives

Contributed to solution design discussions for scalable AI systems

Participated in evaluating emerging AI technologies for enterprise adoption

Built data-driven analytics pipelines to monitor system performance and optimize processing workflows, similar to high-throughput engineering environments

Designed scalable cloud-based data pipelines using AWS (S3, Lambda, SageMaker) to automate data processing and KPI tracking

Electronic Design Automation (EDA)

Workflow automation

Cloud-based processing pipelines

KPI monitoring systems

Process optimization frameworks

Interactive dashboards for engineering metrics

data systems

cloud pipelines

analytics tools

Built collaborative filtering recommendation system using implicit feedback

Developed ranking model for personalized content delivery using embeddings

Built scalable pipelines using PySpark on Databricks handling X TB data

Implemented MLflow for experiment tracking and model versioning

Deployed models with monitoring for drift detection

Built LSTM model improving classification accuracy by 22%, reducing manual ticket triage effort by 35%

Implemented semantic search using sentence transformers

Used vector similarity for document retrieval

Marketing Analytics & Causal Modeling

Marketing Mix Modeling (MMM)

Media Spend Optimization

ROI / CAC / LTV Analysis

Attribution Modeling (First-touch, Last-touch, Multi-touch)

Incrementality Testing

Causal Inference

Time Series Forecasting for Marketing Spend

AI & Enterprise Technologies

Microsoft 365 Copilot (conceptual understanding / exploration)

AI Agents & Agent-based architectures

Enterprise AI solution design

AI adoption strategy & rollout planning

Data integration & knowledge systems

AI governance, security, and compliance

Healthcare & Domain Expertise

Healthcare analytics and predictive modeling

Claims and patient data analysis

EMR/EHR data processing

Payer and health plan analytics

Clinical outcome modeling

Tooling / Platform Development

Developed internal tools

Built interactive analytics platforms

Automated workflows

Generative AI + RAG

Built RAG-based semantic search using FAISS + transformer embeddings

Fine-tuned BERT using Hugging Face for domain-specific NLP tasks

Technical Skills:

Languages

SQL, Python, Java, JavaScript, jQuery, ReactJS, Next.js, HTML, CSS, C, C++, Angular, R, Impala, Hive

Statistical Methods

Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation

Artificial Intelligence/ Machine Learning

Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Method

R Package

dplyr, sqldf, data.table, randomForest, gbm, caret

Big Data

Hadoop, Spark

Python Packages

NumPy, SciPy, Pandas, Matplotlib, Seaborn, scikit-learn, Requests, urllib3, NLTK, Pillow, Pytest

Deep Learning

CNN, RNN, ANN, Reinforcement learning, Transfer Learning, TensorFlow, PyTorch, Keras

Python Framework

Django, Flask

Methodologies

SDLC – Agile/ Scrum, TDD, BDD

Databases

SQL, MySQL, MongoDB, Oracle

Cloud

AWS - EMR, EC2, ENS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB

BI/ Analysis Tools

SAS, Stata, Tableau, Power BI, Docker, Git, SAP, MS Office Suite, Anaconda, SSIS

Data Modelling

Snowflake, Star Schema

Reporting Tools

Tableau, Power BI

Operating Systems

Windows, Linux

Statistical Methods:

Multivariate Regression for MMM, Elasticity Modeling, Uplift Modeling, Media Response Curves

Professional Experience:

AT&T, Middletown, NJ. Oct 2025 to Present

Data Scientist

Responsibilities:

Filtered unwanted characters out of text ticketing data.

The data came mostly from the log entries recorded for each ticket about a problem.

Used NLTK, regular expressions, Keras and TensorFlow for preprocessing of the text.

Built and applied LSTM after cleaning the data.

Analyzed performance using fine-tuning techniques. Compared the LSTM model with RNN and GRU models and with word embeddings.

Plotted the performance metrics from each model.

Performed Data Cleaning, feature scaling, feature engineering using pandas and NumPy packages in python.

Used hierarchy and association to break the issue at hand down into main components and sub-components, grouped the components to show the presumed associations between them, and displayed the separated inputs, outputs, and outcomes.

Used various packages in R like ggplot2, RStudio and caret in order to represent the cheat sheet on given data.

Coordinate/assist developers with establishing and applying appropriate branching, labeling/naming conventions using Subversion source control.

Coordinate with Release Management regarding appropriate system releases among other development platforms.

Provided guidance to development regarding effective microservice architectures.

Environment: Python, R, Machine Learning, Deep Learning, NLP, Tableau, MapReduce, TensorFlow, Oracle.

Cigna, Dublin, OH. Jul 2024 to Oct 2025

Data Scientist

Responsibilities:

Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch. Performed data imputation using Scikit-learn package in Python.

Built several predictive models using machine learning algorithms such as Logistic Regression, Linear Regression, Lasso Regression, K-Means, Decision Tree, Random Forest, Naïve Bayes, Social Network Analysis, Cluster Analysis, Neural Networks, XGBoost, KNN, and SVM.

Built detection and classification models using Python, TensorFlow, Keras, and scikit-learn.

Used Amazon Web Services, AWS provisioning and good knowledge of AWS services like EC2, S3, Redshift, Glacier, Bamboo, API Gateway, ELB (Load Balancers), RDS, SNS, SWF and EBS.

Designed and developed machine learning techniques (Classification, Regression, Clustering, Ensemble Learning, Neural Networks, and Prediction).

Generated ad-hoc reports in Excel Power Pivot and shared them using Power BI to the decision makers for strategic planning.

Developed regression models in R to predict the recovery time of a patient diagnosed with a disease, based on previous disease report data.

Implemented and tested the model on AWS EC2 and collaborated with development team to get the best algorithms and parameters.

Built multi-layer neural networks to implement deep learning using TensorFlow and Keras.

Developed the required data warehouse model using Snowflake schema for the generalized model

Worked on processing the collected data using Python Pandas and Numpy packages for statistical analysis.

Used Cognitive Science in Artificial Intelligence/ Machine Learning for Neuro feedback training which is essential for intentional control of brain rhythms.

Worked on data cleaning and ensured Data Quality, consistency, integrity using Pandas, and Numpy.

Developed Star and Snowflake schemas based dimensional model to develop the data warehouse.

Used Numpy, Scipy, Pandas, NLTK (Natural Language Processing Toolkit), and Matplotlib to build models.

Involved in text analytics, generating data visualizations using Python, and creating dashboards using tools like Power BI.

Performed Naïve Bayes, KNN, Logistic Regression, RandomForest, SVM and XGboost to identify whether a design will default or not.

Managed database design and implemented a comprehensive Snowflake Schema with shared dimensions.

Application of various Artificial Intelligence (AI)/ machine learning algorithms and statistical modeling like decision trees, text analytics, natural language processing (NLP), supervised and unsupervised, regression models.

Implemented Ensemble of Ridge, Lasso Regression and XGboost to predict the potential loan default loss.

Performed data cleaning and feature selection using MLlib package in PySpark and working with deep learning frameworks.

Involved in scheduling refresh of Power BI reports, hourly and on-demand.

Performed data mapping between source and target systems to support data migration initiatives.

Supported User Acceptance Testing (UAT) by validating data accuracy and tracking defects.

Documented business requirements, data mappings, and testing scenarios for data transformation projects.

Developed predictive models on healthcare datasets including patient records and claims-related data to improve clinical and operational decision-making

Analyzed longitudinal patient data to identify trends in disease progression and treatment outcomes

Collaborated with healthcare stakeholders to support payer-based analytics and health plan insights

Environment: SDLC, Python, scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, NLTK, Lex, EMR, Redshift, Machine Learning, Deep Learning, Snowflake, OLAP, OLTP, Naive Bayes, Random Forest, K-means clustering, KNN, PCA, PySpark, XGBoost, TensorFlow, Keras, Power BI.

Tesco-Retail, India. May 2021 to Nov 2023

Data Analyst

Responsibilities:

Performed Data Analysis, Data Migration, and Data Preparation useful for Customer Segmentation and Profiling.

Implemented analytical calculations in Python using Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK.

Implemented Data Warehousing and Data Modelling procedures to build ETL pipelines that extract and transform data across multiple sources.

Architected scalable algorithms using Python programming and capable of performing Data Mining, Predictive Modelling using all kinds of statistical algorithms as required.

Utilize ETL tooling to build, template, and rapidly deploy new pipelines for gathering and cleaning data

Developed Multivariate data validation scripts in Python for equity, derivative, currency and commodity related data, thereby improving efficiency of pipeline by 17%.

Used predictive analysis to design and develop sampling methodologies, and analyzed data for pricing of clients' products.

Involved in optimizing the ETL process from Alteryx to Snowflake.

Used Data visualization tools Such as Tableau, Advanced MS Excel (macros, index, conditional list, arrays, pivots, and lookups), Alteryx Designer, and Modeler.

Used data analytics and data automation, and integrated with custom visualization tools, using Python, Mahout, Hadoop, and MongoDB.

Performed all necessary day-to-day Git support for different projects; responsible for the design and maintenance of the Git repositories and the access control strategies.

Fostered teamwork, communication, and collaboration while managing competing weekly, bi-weekly, monthly, and quarterly priorities.

Worked extensively on ER/ Studio in several projects in both OLAP and OLTP applications.

Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.

Analyzed business requirements and upgraded function specification while conducting testing on multiple versions and resolving critical bugs to improve the functionality of the Learning Management System.

Built and Deployed a UI/ UX e-learning web application using jQuery, JavaScript, HTML, and NodeJS for various courses.

Cleaned and transformed the data using Python, developed dashboards and visual KPI reports using Tableau.

Involved in publishing of various kinds of live, interactive data visualizations, dashboards, reports and workbooks from Tableau Desktop to Tableau servers.

Developed Marketing Mix Models (MMM) using multivariate regression to quantify the impact of media spend across digital, TV, and in-store campaigns on sales performance.

Measured Return on Investment (ROI) and Customer Acquisition Cost (CAC) across paid and organic channels to optimize budget allocation.

Performed multi-touch attribution analysis to evaluate channel contribution and marketing effectiveness.

Built time-series regression models to estimate incremental sales lift from promotional campaigns.

Conducted A/B testing and hypothesis testing to measure marketing campaign performance.

Partnered with marketing stakeholders to provide insights on LTV (Customer Lifetime Value) and customer segmentation strategies.

Environment: Python, Pandas, Numpy, Seaborn, Scipy, Matplotlib, Scikit Learn, NLTK, ETL, Alteryx, Snowflake, Tableau, jQuery, JavaScript, HTML, NodeJS, Hadoop, MongoDB, OLTP, OLAP, ER Studio, Oracle, SQL Server, SQL, Tableau Server.


