Saketh Varma Kalidindi
***********.*********@**********.*** • +1-934-***-**** • linkedin.com/in/saketh-varma-kalidindi-413b69237/
Education
Stony Brook University, Stony Brook, NY
Master of Science in Data Science (Computer Science), Expected May 2026, GPA: 3.9/4.0
VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
Bachelor of Technology in Electrical and Electronics Engineering (Major), Data Science (Minor), 2019–2023, Major GPA: 9.10/10, Minor GPA: 9.68/10
Technical Skills
• Programming Languages: Python, Java, C++, SQL, Go, R
• Analytics & ML Frameworks: Scikit-learn, PyTorch, TensorFlow, XGBoost, HuggingFace Transformers, OpenAI GPT, LangChain
• Statistical Modeling: Collaborative Filtering, Matrix Factorization, Word2Vec, Logistic Regression, GBDT, Deep Neural Networks, Wide & Deep Models
• Business Intelligence & Visualization: Power BI, Tableau, Microsoft Excel (Advanced), Looker, Plotly, Matplotlib, Seaborn
• Data Engineering & MLOps: MLflow, Docker, Kubernetes, Jenkins, CI/CD, Prometheus, Grafana
• Analytics Tools: A/B Testing Frameworks, Statistical Hypothesis Testing, Experimental Design, Causal Inference, Time Series Analysis
• Database Systems: MySQL, PostgreSQL, MongoDB, Redis, Cassandra, Snowflake
• Specialized Skills: REST APIs, NLP, Large Language Models, Feature Engineering, Data Mining, ETL/ELT Pipelines
Professional Experience
Data Analyst, Brane Enterprises, Hyderabad, India, March 2023 – June 2024
• Architected scalable analytics pipelines processing 1.5M+ daily records utilizing PySpark, Databricks, and Azure Synapse, maintaining 99.9% uptime for production inference systems
• Developed reusable statistical modeling frameworks leveraged by 10+ cross-functional teams, enabling click-through rate optimization, vector embedding generation, and customer segmentation while accelerating development velocity by 40%
• Modernized a 500K+ line legacy codebase using automated refactoring tools, recovering 200+ engineering hours per quarter across predictive modeling workflows
• Optimized end-to-end experimentation orchestration via Apache Airflow DAGs, MLflow tracking, and Gradle build automation, reducing model deployment cycles by 35%
• Implemented comprehensive monitoring infrastructure using Prometheus metrics and Grafana visualizations, expediting anomaly detection and decreasing mean time to resolution by 60%
Data Science Intern, Verzeo, Hyderabad, India, May 2021 – June 2021
• Designed and fine-tuned 5+ deep learning architectures for text classification in PyTorch and TensorFlow, improving predictive performance from 76% to 88% with recurrent neural networks
• Built robust data preprocessing pipelines incorporating advanced NLP techniques (tokenization, sequence normalization), improving model generalization by 21%
• Orchestrated hyperparameter optimization experiments with TensorBoard monitoring and version-controlled model artifacts, distilling insights from 50+ experimental iterations
Projects
Human Activity Recognition Analytics Platform (PyTorch, Residual CNN, BiLSTM, Attention Mechanisms)
• Engineered a sophisticated deep learning ensemble combining residual convolutional layers and bidirectional LSTM with attention mechanisms to classify 45+ movement patterns from IoT sensor data streams
• Eliminated manual feature engineering requirements by implementing automated representation learning, reducing computational overhead by 67% and boosting F1-score from 47% to 56.9%
• Deployed production-grade inference system supporting 1M+ concurrent users, incorporating automated model refresh pipelines for continuous learning lifecycle management
Cross-Lingual Information Retrieval System (Q&A) (TensorFlow, HuggingFace, mBERT, XLM-R)
• Fine-tuned multilingual BERT and XLM-RoBERTa models on 150K+ question-answer pairs for cross-lingual comprehension, achieving an 82.4 F1 score on the TyDi QA and MLQA evaluation benchmarks
• Constructed comprehensive text preprocessing pipelines spanning 3+ linguistic domains, delivering a 38% improvement in zero-shot transfer learning performance
• Enhanced distributed training infrastructure with 4x gradient accumulation and mixed-precision computing across 8 GPU clusters, accelerating training throughput by 3.2x while reducing memory footprint by 45%
Predictive Sports Analytics Platform (GCP, PySpark, BigQuery, XGBoost)
• Created real-time ETL orchestration and automated model training workflows using Apache Airflow and PySpark to ingest 500K+ match records with sub-second latency
• Calibrated ensemble ranking algorithms (Logistic Regression, Gradient Boosting) for outcome prediction and player performance scoring, attaining 85.3% classification accuracy
• Streamlined automated retraining schedules and executive reporting dashboards, minimizing manual intervention by 70% and enabling continuous model enhancement