
Data Engineer Machine Learning

Location:
Bhubaneswar, Odisha, India
Posted:
September 10, 2025


Resume:

Sai Kavyusha Ponnaganti Data Engineer LLM AWS NLP

Chicago, IL ************@*****.*** +1-773-***-**** LinkedIn

PROFILE SUMMARY

Data Engineer with ML expertise and 3+ years of experience building scalable data pipelines and deploying machine learning models in production. Proficient in Spark, Airflow, and dbt for ETL/ELT workflows across AWS, GCP, and Azure. Skilled in transforming raw data into model-ready features and deploying ML models (XGBoost, LSTM, BERT) using FastAPI, Docker, and CI/CD. Experienced in handling imbalanced datasets, real-time monitoring (MLflow), and ensuring model explainability (SHAP). Strong team player with a focus on data quality, performance, and business impact.

TECHNICAL SKILLS

• Languages: Python, R, SQL, Java (basic)

• Libraries/Frameworks: scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, Keras, HuggingFace

• NLP & LLM: BERT, GPT, Transformers, spaCy, NLTK, TextBlob, TF-IDF

• Deployment & MLOps: MLflow, Airflow, Docker, FastAPI, Flask, Jenkins, Git, CI/CD

• Cloud Platforms: AWS (SageMaker, S3, Lambda), GCP (Vertex AI, BigQuery), Azure ML

• Visualization: Power BI, Tableau, Seaborn, Matplotlib

• Databases: MySQL, PostgreSQL, MongoDB, Snowflake

• Tools: Jupyter, VS Code, Bitbucket, Confluence

ORGANISATIONAL EXPERIENCE

Jan 2025 – Present Charles Schwab, USA Data Engineer

• Designed and deployed a real-time fraud detection system using XGBoost and SMOTE on AWS SageMaker to reduce financial loss.

• Built a FastAPI-based microservice to serve ML predictions in production, containerized it with Docker, and automated deployment through a Jenkins CI/CD pipeline, enabling seamless model delivery with 60% lower latency.

• Developed an end-to-end NLP pipeline using HuggingFace Transformers (BERT) for financial sentiment analysis.

• Led the migration of a model retraining workflow to Airflow + MLflow, integrating experiment tracking, version control, and automated validation, which cut deployment time by 70% and ensured continuous model reliability.

• Used Apache Spark (PySpark) to engineer features from over 100 million transaction records, improving training data pipeline efficiency by 45% for credit scoring models.

• Architected ETL pipelines using Apache Spark and Airflow to process and transform multi-source financial data, improving model readiness and reducing data latency by 50%.

• Implemented a feature store with Redis and Delta Lake for credit risk models, enabling consistent feature reuse and reducing engineering time by 40%.

• Productionized LSTM-based anomaly detection models for transaction monitoring, integrating them with Kafka and Spark Structured Streaming for real-time scoring.

Aug 2020 – Jan 2023 Magna Infotech, India Data Engineer

• Visualized key business metrics through dynamic dashboards in Power BI and Tableau, increasing stakeholder transparency and accelerating decision-making cycles by 30% across the product team.

• Built an AI-powered food classification model using CNNs with MobileNet and ResNet, achieving over 80% accuracy and integrating the solution with a calorie-tracking recommendation system for health-conscious users.

• Collaborated with data engineers to design a robust ETL pipeline using SQL, Airflow, and Snowflake, enabling the real-time flow of cleaned data to ML pipelines and reducing manual data handling by 40%.

• Created a chatbot for resume screening using TF-IDF + Logistic Regression, deployed via Streamlit and Flask, which increased hiring team efficiency by 30% and reduced bias in early-stage screening.

• Implemented an anomaly detection system using Autoencoders and Isolation Forest, paired with Prometheus + Grafana dashboards to monitor model drift and proactively flag data quality issues in production.

EDUCATION

Master of Science in Data Science Jun 2025

DePaul University, Loop Campus, Chicago, Illinois

Bachelor of Science in Mathematics, Statistics and Computer Science Sep 2021

Aditya Degree College, Visakhapatnam, Andhra Pradesh


