Machine Learning Data Science

Location:
Charlotte, NC
Posted:
June 03, 2025

Chandra Sekhar Ravuri

*******************@*****.***

980-***-****

SUMMARY

Results-driven data professional with over 5 years of experience across data science, engineering, analytics, and machine learning. Proven ability to turn messy, high-volume data into clean, actionable insights and deployable models in cloud-native environments. Adept at solving complex problems across industries like banking, healthcare, and retail using end-to-end data solutions.

Spent the past year working on Large Language Models (LLMs) and migrating applications onto LLM-based architectures.

Partnered with stakeholders to define KPIs, success metrics, and model goals aligned with business impact in fraud detection, marketing, and health operations.

Led gap analyses in pharma and banking domains to identify missing links between raw data sources and business insights, informing solution roadmaps.

Advised data retirement and governance teams on the LEI framework, helping them reuse legacy data effectively with audit-ready clarity.

Worked with ChatGPT and the ChatGPT API to integrate its capabilities into several applications.

Experience in Machine Learning, Natural Language Processing, and all stages of the software engineering lifecycle, including planning, analysis, design, implementation, and maintenance of scalable software architecture.

Built secure, reusable ingestion pipelines using Azure Data Factory and Airflow to automate CSV/JSON loading from SharePoint and SFTP to BigQuery and Azure Data Lake.

Constructed batch and real-time data flows using PySpark, Hive, and AWS Lambda, ensuring consistent schema validation and error logging.

Designed and maintained Snowflake SQL scripts for enterprise-scale reporting layers, enabling downstream users to access clean, unified views.

Automated SharePoint-to-BigQuery ingestion with Jupyter notebooks, saving the team 10+ hours per week and reducing dependency on manual uploads.
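
A minimal sketch of that load step, assuming the google-cloud-bigquery client and illustrative file and table names (the SharePoint/SFTP download itself is omitted):

    import pandas as pd
    from google.cloud import bigquery

    def ingest_csv(path: str, table_id: str) -> None:
        df = pd.read_csv(path)
        # normalize headers so downstream views see consistent column names
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
        client = bigquery.Client()
        job = client.load_table_from_dataframe(df, table_id)  # appends by default
        job.result()  # block until the load job completes

    ingest_csv("exports/daily_extract.csv", "my-project.staging.daily_extract")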

Developed and maintained master datasets in SAP for LEI operations, integrating business logic with system-level automation.

Used SQL, SAS, and Python (Pandas, NumPy) to clean and transform transactional, demographic, and clinical data with consistent quality checks.

Created auditing pipelines using GCP, Data Studio, and Excel to validate staging data before modeling or reporting.

Designed data models (reverse & forward engineered) with 3NF structures to support accurate joins, reduce redundancy, and improve performance.

Deep understanding of Big Data, Natural Language Processing (NLP), and Machine Learning algorithms, with hands-on analytics using Hadoop, MapReduce, NoSQL, and distributed computing tools.

Leveraged dbt and Alteryx to build modular transformation layers, simplifying ETL processes and enhancing reusability.

Performed EDA using Seaborn, Matplotlib, and Power BI to surface trends in patient metrics, churn behavior, and loan applications.

Built weekly operational dashboards using Tableau and Power BI to track business KPIs for compliance, marketing, and support teams.

Engineered features from LEI, transaction, chat, and patient records to support fraud detection, compliance scoring, and chatbot training.

Reduced transformation runtime by 30% by re-indexing large dataframes and applying optimized joins and filters in SQL and PySpark.
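
The kind of join and filter restructuring involved, sketched in PySpark with hypothetical table and column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn_transform").getOrCreate()

    txns = spark.read.parquet("s3://bucket/txns/")          # large fact table
    accounts = spark.read.parquet("s3://bucket/accounts/")  # small dimension table

    clean = (
        txns.filter(F.col("status") == "POSTED")        # filter before the join
            .repartition("account_id")                  # co-locate rows on the join key
            .join(F.broadcast(accounts), "account_id")  # skip the shuffle on the small side
    )
    clean.write.mode("overwrite").parquet("s3://bucket/clean_txns/")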

Built classification models (Random Forest, Logistic Regression, XGBoost) for insurance, fraud, and loan approval use cases.

Developed regression models in R to estimate clinical metrics like body fat using low-cost, minimal-feature datasets.

Applied NLP techniques (topic modeling, entity extraction, keyword tagging) using spaCy, NLTK, and Transformers for chatbot and metadata tasks.

Built sequence models (LSTM, Bi-LSTM) and transformer-based text generators for biomedical applications with BLEU evaluation.

Created risk scores and customer segmentation logic using clustering algorithms (K-Means, hierarchical) for early warning systems.
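
A short illustration of the segmentation step, using synthetic data in place of the real customer features:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))  # stand-ins for balance, transaction count, tenure
    X_scaled = StandardScaler().fit_transform(X)
    # cluster IDs become customer segments for the early-warning logic
    segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)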

Validated models using k-fold CV, precision-recall curves, and AUC metrics to ensure fairness and generalization.
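
A compact sketch of this validation setup in scikit-learn, with a placeholder model and generated data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_validate(
        RandomForestClassifier(n_estimators=200, random_state=0),
        X, y, cv=cv,
        scoring=["roc_auc", "average_precision"],  # AUC plus a PR-curve summary
    )
    print(scores["test_roc_auc"].mean(), scores["test_average_precision"].mean())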

Designed A/B testing frameworks for marketing and chatbot optimization, helping measure improvements with statistical rigor.

Built REST APIs using FastAPI and Flask to expose ML predictions, supporting real-time inference and integration with business apps.
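
A minimal FastAPI endpoint of this shape, assuming a pre-trained model serialized with joblib and illustrative feature names:

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.pkl")  # hypothetical pre-trained classifier

    class LoanFeatures(BaseModel):
        income: float
        credit_score: int
        loan_amount: float

    @app.post("/predict")
    def predict(features: LoanFeatures) -> dict:
        row = [[features.income, features.credit_score, features.loan_amount]]
        return {"approval_probability": float(model.predict_proba(row)[0][1])}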

Deployed models in AWS and Azure environments using CI/CD pipelines (Docker, Jenkins, GitHub Actions), ensuring stability and scalability.

Built logging and exception handling in ETL and API scripts to provide fail-safe monitoring in production environments.

Automated repetitive reports and dashboards using SQL and Tableau macros, freeing up analyst time and improving consistency.

Designed reusable SQL templates and Power BI components, improving onboarding for new analysts and reducing project ramp-up time.

Led internal training on dashboards, LEI usage, and SQL workflows, promoting knowledge sharing across ops, compliance, and support teams.

Served as an SME for a new ERP system, preparing step-by-step documentation and guiding adoption across departments.

Presented findings to senior executives with emphasis on ROI, impact, and business alignment—enabling faster decision-making and higher trust in data.

TECHNICAL SKILLS

Languages: Python, SQL, R, C, SAS, Bash, Verilog

ML & Deep Learning: Scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, Transformers, CNN, RNN, Seq2Seq, spaCy, NLTK

NLP & GenAI: Transformers (Hugging Face), spaCy, NLTK, LSTM, Topic Modeling, BLEU, FastText, Seq2Seq

Data Engineering & ETL: PySpark, Apache Spark, Hadoop, Hive, Apache Hudi, dbt, Airflow, Luigi, Prefect

Databases & Warehousing: MySQL, PostgreSQL, Snowflake, MongoDB, DynamoDB, Azure Blob Storage, Data Warehousing

Cloud Platforms: AWS (S3, Redshift, Lambda, EMR, SageMaker, EC2, Kinesis, Step Functions), Azure (ADF, Data Lake, ML Studio), Google Cloud Platform (BigQuery, Cloud Functions)

Visualization & BI Tools: Tableau, Power BI, Looker, Excel (PivotTables, INDEX-MATCH, Macros), Google Data Studio

APIs & App Development: FastAPI, Flask, REST APIs, Jupyter Notebooks, Visual Studio

Version Control & DevOps: Git, GitHub, GitLab, Docker, Jenkins, GitHub Actions, Shell Scripting

Operating Systems: Linux, Unix, Windows, Command Line Tools

EXPERIENCE

Uline Atlanta, GA

AI/ML Engineer June 2024 – Present

Technologies Used: Python (pandas, NumPy, scikit-learn, SciPy, matplotlib, seaborn, NLTK), SQL, MongoDB, Hadoop, Excel, SPSS, SAS, Power BI, Tableau, AWS, Azure, GCP, JIRA, CVS, Git.

Developed an advanced NLP Engine using Azure LUIS, boosting the predictive accuracy and performance of language understanding models.

Designed custom deep learning models using PyTorch and Transformer architectures (e.g., BERT) for intent classification and entity recognition.

Managed data ingestion and storage across relational (SQL) and non-relational (MongoDB) databases, as well as distributed systems using Hadoop.

Participated in data mining phases including data collection, validation, and pipeline structuring to ensure quality and accessibility.

Conducted exploratory data analysis (EDA) and created data visualizations using matplotlib and seaborn to identify patterns and outliers.

Integrated ChatGPT API into client-facing platforms, automating 86% of manual tasks and reducing support load through contextual AI interactions.

Built production-grade, dynamic-response chatbots using the LangChain Framework, leveraging LLMs for real-time, context-aware answers across high-volume environments.
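
A simplified stand-in for that pattern using the OpenAI Python client directly (the production bots used LangChain); the model name and system prompt are assumptions:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system", "content": "You are a concise support assistant."}]

    def answer(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=history)
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})  # keep context for follow-ups
        return reply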

Developed and validated supervised and unsupervised models including Bayesian Hidden Markov Models, XGBoost, SVMs, and Random Forests using scikit-learn and SciPy.

Leveraged NLP techniques with NLTK for unstructured text analysis where applicable.

Performed rigorous model evaluations using cross-validation, A/B testing, and hyperparameter tuning, ensuring optimal generalization performance.

Automated repetitive data workflows, reducing processing time by 65% and boosting operational efficiency.

Designed and maintained Clickstream Data Ingestion Pipelines for NLP training, enabling real-time intent and entity refinement using Azure Data Lake and Cognitive Services.

Deployed scalable ML solutions on AWS, Azure, and GCP, integrating with CI/CD pipelines for continuous delivery and updates.

Monitored post-deployment model performance via real-time dashboards (Power BI, Tableau), enabling transparent, data-driven decision-making.

Used JIRA for tracking tasks and bugs, and CVS and Git for version control and collaborative development.

Humana Louisville, Kentucky

Data Engineer Nov 2022 - May 2024

Technologies Used: Python, SQL, Scikit-learn, Transformers, XGBoost, spaCy, Azure Data Factory, Azure Data Lake, AWS Lambda, Step Functions, Apache Airflow, PySpark, Hive, Tableau, Power BI, Fraud Detection, Risk Modeling, Regulatory Compliance, NLP, Financial Forecasting, A/B Testing, LLM Automation

Collaborated with business analysts and compliance teams to define and implement data solutions for customer segmentation, fraud detection, and risk alerting across financial product lines.

Conducted stakeholder interviews to shape requirements for anomaly detection in financial reporting, reducing operational blind spots.

Developed secure ingestion pipelines using Azure Data Factory (ADF) to load sensitive CSV and JSON files from SFTP into Azure Data Lake, ensuring encryption at rest and in transit.

Built automated PySpark-based pipelines to structure and clean transaction logs, preparing compliant and audit-ready datasets for reporting and modeling.

Tuned Spark job configurations and SQL query logic to reduce transformation time across financial datasets (e.g., credit history, transaction activity), enhancing real-time analytics readiness.

Tuned Spark job performance using AQE and shuffle optimization, reducing job runtime by 30% across key pipelines.
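
Representative AQE settings of the kind tuned here; the specific values are examples, not the production configuration:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("pipeline_tuning")
        .config("spark.sql.adaptive.enabled", "true")                     # turn on AQE
        .config("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
        .config("spark.sql.shuffle.partitions", "400")                    # baseline before AQE coalesces
        .getOrCreate()
    )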

Orchestrated pipelines with Apache Airflow, integrating with Azure Data Factory to automate data workflows and enable secure, traceable operations.

Built real-time alerting systems using AWS Lambda and SNS to monitor transaction anomalies and alert stakeholders to potential fraud signals.
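
A sketch of such an alerting Lambda with boto3; the threshold rule, topic ARN, and event shape are hypothetical:

    import json
    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"  # example ARN

    def handler(event, context):
        txn = json.loads(event["Records"][0]["body"])  # assumes an SQS-style payload
        if txn["amount"] > 10_000:                     # placeholder anomaly rule
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Possible fraud signal",
                Message=json.dumps(txn),
            )
        return {"statusCode": 200}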

Developed predictive models for fraud detection and customer behavior analysis, using scikit-learn, XGBoost, and Transformer models.

Integrated cloud-native tools (ADF, S3, Step Functions) to deploy scalable automation for data validation and dashboard refresh.

Built and deployed NLP-driven classification models to monitor and categorize customer service interactions, supporting chatbot systems and internal helpdesk automation.

Created topic models to classify and route queries to the right internal teams, reducing average resolution time by 20%.

Created interactive Power BI dashboards to report risk flags, alert patterns, and operational KPIs to executives and audit teams.

Used SQL to build secure, traceable reporting layers over Hive, Azure Data Lake, and PostgreSQL for internal and regulatory reporting.

Deployed ML models into production on AWS (Lambda, Step Functions, S3) using MLOps best practices with versioned pickle files and CI/CD integration.

Reduced time-to-detection of suspicious transactions by automating fraud alerting workflows, improving compliance responsiveness.

Streamlined metadata management and eliminated manual risk report generation using LLM-based automation solutions.

Millenium Intech Private Limited Chennai, India

Data Analyst Nov 2020 - Jul 2022

Technologies Used: Python (Pandas, NumPy, SQLAlchemy, Scikit-learn, XGBoost, Seaborn, Matplotlib), PySpark, SQL, Excel Macros, Tableau, AWS, Docker, Jenkins, GitHub Actions.

Collaborated with stakeholders to identify analytics opportunities in loan processing and supply chain operations, translating business needs into data solutions.

Conducted statistical analysis and A/B testing to validate hypotheses and optimize strategic decisions, improving overall operational efficiency.

Built scalable ETL pipelines using Python and Excel macros to integrate data from multiple third-party sources, streamlining workflows and improving reporting accuracy.

Managed and optimized SQL databases to support 100,000+ monthly financial transactions, ensuring high data integrity and performance.

Performed advanced data preprocessing and feature engineering on 50,000+ loan applications, boosting model accuracy and enabling nuanced credit risk assessments.

Automated repetitive data cleaning processes using Python (NumPy, Pandas, SQLAlchemy), cutting manual processing time by 50%.
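
A minimal example of this cleaning pattern with Pandas and SQLAlchemy; the connection string and column names are placeholders:

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:pass@host/db")  # placeholder DSN

    df = pd.read_sql("SELECT * FROM raw_loans", engine)
    df = df.drop_duplicates(subset="application_id")
    df["applied_on"] = pd.to_datetime(df["applied_on"], errors="coerce")  # coerce bad dates to NaT
    df["income"] = df["income"].fillna(df["income"].median())             # impute missing income
    df.to_sql("clean_loans", engine, if_exists="replace", index=False)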

Conducted in-depth exploratory analysis and developed visual insights using Tableau, Seaborn, and Matplotlib to identify trends and guide business strategies.

Delivered compelling visual narratives to non-technical stakeholders, improving cross-functional alignment and buy-in.

Built AI-driven loan approval models using Random Forest, Gradient Boosting, and KNN, achieving 82% accuracy and 70% recall.

Developed fraud detection systems using anomaly detection algorithms, reducing financial risk and preventing losses.

Utilized PySpark for large-scale data processing, improving runtime efficiency by 40%.

Applied hyperparameter tuning and cross-validation to improve model generalizability and reduce overfitting.

Enhanced strategic planning via predictive models deployed on AWS infrastructure, which led to a 15% increase in loan approvals through improved risk profiling.

Containerized models using Docker and deployed them via robust CI/CD pipelines (Jenkins, GitHub Actions) to ensure reliable production updates.

Monitored model performance and business KPIs post-deployment using real-time dashboards, enabling continuous improvement.

Led knowledge-sharing sessions on data analytics tools and best practices, fostering a data-first culture within the organization.

Worked closely with product and operations teams to convert analytical findings into impactful business outcomes.

GROWZ SOFTWARE SOLUTIONS Bangalore, India

Jr. Data Analyst Jan 2020 – Oct 2020

Technologies Used: R, SQL, Power BI, Excel, Tableau, Jupyter, Google BigQuery, SharePoint, SAS, SAP

Analyzed 2,000+ patient records to derive actionable health insights, aiding clinical decision-making and improving patient care workflows.

Built a regression model in R to estimate body fat percentage with less than 2% error against DXA benchmarks, using minimal and interpretable features.

Conducted statistical analysis using R to identify patterns and correlations in patient attributes, informing targeted interventions.

Created interactive Power BI dashboards to visualize patient health KPIs, improving visibility and responsiveness for executive reporting.

Used SQL and SAS to extract and transform datasets from multiple SQL servers, ensuring data readiness for cross-departmental analysis.

Automated data ingestion from SharePoint into Google BigQuery using Jupyter notebooks, saving 10+ hours/week and increasing data freshness.

Developed reusable SQL code for Listings Exclusions/Inclusions (LEI) reporting, enabling standardized access to pharma compliance data across teams.

Built real-time auditing dashboards using Google Data Studio to track data accuracy and reduce compliance errors in LEI master data.

Maintained LEI datasets in SAP, collaborating with domain experts to ensure high data integrity across product listings.

Conducted root cause analysis and streamlined LEI reporting pipelines, reducing manual reporting efforts through automation in Excel and Tableau.

Led internal knowledge transfer sessions on LEI dashboards and SQL workflows, empowering junior analysts and support teams.

Performed reverse and forward engineering to optimize health data models, improving schema maintainability and query performance.

Built weekly performance metric dashboards in Power BI and Excel, integrating multi-source data to assist with operational reviews.

Partnered with cross-functional teams, including clinical ops, finance, and IT, to consolidate data requirements and scale reporting capabilities.

Acted as an early SME for a new ERP module, creating training materials and onboarding users to ensure smooth data adoption and usability.

ACADEMIC PROJECTS

HLA Translation Using NLP

Technologies Used: Python, TensorFlow, Keras, Transformer Models, LSTM, Seq2Seq, Jupyter Notebook, BLEU Score.

Developed and optimized sequence-to-sequence models including Transformer, LSTM, and Bi-LSTM for biomedical text generation.

Benchmarked four model architectures for BLEU score and memory efficiency, reflecting an understanding of model performance tradeoffs.

Fine-tuned architecture and hyperparameters to achieve BLEU score of 0.60, balancing speed and accuracy in sequence translation.
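
For reference, a BLEU computation of the kind used for scoring, via NLTK with toy sentences (the smoothing choice is an assumption):

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "antigen", "binds", "the", "receptor"]]
    candidate = ["the", "antigen", "binds", "a", "receptor"]
    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(round(score, 2))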

Alzheimer's Disease Prediction Using ML & FastAPI

Technologies Used: Python, XGBoost, Scikit-learn, Logistic Regression, FastAPI, AWS, SHAP, SQL, PostgreSQL.

Trained Logistic Regression and XGBoost models for health prediction, achieving 95% ROC-AUC.

Developed a REST API using FastAPI and deployed model to AWS, integrating prediction pipeline with real-time input validation.

Performed precision-recall trade-off analysis and cross-validation, with focus on latency and throughput across demographic segments.

EDUCATION

Master of Science, Data Science April 2024

Bowling Green State University, Bowling Green, Ohio

● Relevant Coursework: Data Science Programming, DBMS, Regression Analysis, Data Mining, Time Series Analysis, Artificial Intelligence Methods, Probability Theory I, Mathematical Statistics II, Linear and Integer Programming, Data Science Project


