Saketh Varma Kalidindi
***********.*********@**********.*** • +1-934-***-**** • linkedin.com/in/saketh-varma-kalidindi-413b69237/
Education
Stony Brook University, Stony Brook, NY
Master of Science in Data Science (Computer Science), Expected May 2026, GPA: 3.9/4.0
VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
Bachelor of Technology in Electrical and Electronics Engineering (Major), Data Science (Minor), 2019–2023, Major GPA: 9.10/10, Minor GPA: 9.68/10
Technical Skills
• Programming Languages: Python, Java, C++, SQL, Go, R
• Analytics & ML Frameworks: Scikit-learn, PyTorch, TensorFlow, XGBoost, HuggingFace Transformers, OpenAI GPT, LangChain
• Statistical Modeling: Collaborative Filtering, Matrix Factorization, Word2Vec, Logistic Regression, GBDT, Deep Neural Networks, Wide & Deep Models
• Business Intelligence & Visualization: Power BI, Tableau, Microsoft Excel (Advanced), Looker, Plotly, Matplotlib, Seaborn
• Data Engineering & MLOps: MLflow, Docker, Kubernetes, Jenkins, CI/CD, Prometheus, Grafana
• Analytics Tools: A/B Testing Frameworks, Statistical Hypothesis Testing, Experimental Design, Causal Inference, Time Series Analysis
• Database Systems: MySQL, PostgreSQL, MongoDB, Redis, Cassandra, Snowflake
• Specialized Skills: REST APIs, NLP, Large Language Models, Feature Engineering, Data Mining, ETL/ELT Pipelines
Professional Experience
Data Analyst, Brane Enterprises, Hyderabad, India, March 2023 – June 2024
• Architected scalable analytics pipelines processing 1.5M+ daily records utilizing PySpark, Databricks, and Azure Synapse, maintaining 99.9% uptime for production inference systems
• Developed reusable statistical modeling frameworks leveraged by 10+ cross-functional teams, enabling click-through rate optimization, vector embedding generation, and customer segmentation while accelerating development velocity by 40%
• Modernized a 500K+ line legacy codebase using automated refactoring tools, recovering 200+ engineering hours per quarter across predictive modeling workflows
• Optimized end-to-end experimentation orchestration via Apache Airflow DAGs, MLflow tracking, and Gradle build automation, reducing model deployment cycles by 35%
• Implemented comprehensive monitoring infrastructure using Prometheus metrics and Grafana visualizations, expediting anomaly detection and decreasing mean time to resolution by 60%
Data Science Intern, Verzeo, Hyderabad, India, May 2021 – June 2021
• Designed and fine-tuned 5+ deep learning architectures for text classification in PyTorch and TensorFlow, improving predictive performance from 76% to 88% with recurrent neural networks
• Built robust data preprocessing pipelines incorporating advanced NLP techniques (tokenization, sequence normalization), improving model generalization by 21%
• Orchestrated hyperparameter optimization experiments with TensorBoard monitoring and version-controlled model artifacts, distilling insights from 50+ experimental iterations
Projects
Human Activity Recognition Analytics Platform (PyTorch, Residual CNN, BiLSTM, Attention Mechanisms)
• Engineered a sophisticated deep learning ensemble combining residual convolutional layers and bidirectional LSTM with attention mechanisms to classify 45+ movement patterns from IoT sensor data streams
• Eliminated manual feature engineering requirements by implementing automated representation learning, reducing computational overhead by 67% and boosting F1-score from 47% to 56.9%
• Deployed production-grade inference system supporting 1M+ concurrent users, incorporating automated model refresh pipelines for continuous learning lifecycle management
Cross-Lingual Information Retrieval System (Q&A) (TensorFlow, HuggingFace, mBERT, XLM-R)
• Fine-tuned multilingual BERT and XLM-RoBERTa models on 150K+ question-answer pairs for cross-lingual comprehension, achieving an 82.4 F1 score on the TyDi QA and MLQA evaluation benchmarks
• Constructed comprehensive text preprocessing pipelines spanning 3+ linguistic domains, delivering a 38% improvement in zero-shot transfer learning performance
• Enhanced distributed training infrastructure with 4x gradient accumulation and mixed-precision computing across 8 GPU clusters, accelerating training throughput by 3.2x while reducing memory footprint by 45%
Predictive Sports Analytics Platform (GCP, PySpark, BigQuery, XGBoost)
• Created real-time ETL orchestration and automated model training workflows using Apache Airflow and PySpark to ingest 500K+ match records with sub-second latency
• Calibrated ensemble ranking algorithms (Logistic Regression, Gradient Boosting) for outcome prediction and player performance scoring, attaining 85.3% classification accuracy
• Streamlined automated retraining schedules and executive reporting dashboards, minimizing manual intervention by 70% and enabling continuous model enhancement