PARTHIKA BATTALA
USA (open to relocate) +1-630-***-**** ************@*****.*** LinkedIn GitHub
Summary
Results-driven Data Scientist with 3+ years of experience delivering machine learning and analytics solutions across financial services and consulting domains. Skilled in Python, SQL, Spark, TensorFlow, AWS, GCP, Power BI, Docker, Kubernetes, and CI/CD pipelines. Expertise in predictive modeling, NLP, LLM/GenAI, fraud detection, time series forecasting, A/B testing, and MLOps. Proven ability to build scalable ETL pipelines, deploy ML models, optimize business processes, and deliver data-driven insights through cross-functional collaboration and stakeholder engagement.
Experience
Data Scientist Goldman Sachs, USA Aug 2024 – Present
• Developed and deployed ML models for risk assessment, fraud detection, and quantitative analysis; managed end-to-end model lifecycle using MLflow and Git-based version control following Agile/Scrum methodologies.
• Engineered scalable data pipelines using Python and SQL via Docker-containerized services and CI/CD workflows, improving reporting efficiency by 35%; built Power BI and Tableau dashboards for real-time KPI monitoring and executive stakeholder communication.
• Applied NLP and LLM/GenAI techniques (RAG, prompt engineering) to extract sentiment from financial reports and news feeds; led A/B testing and statistical modeling frameworks to validate model performance across multiple asset classes.
• Collaborated cross-functionally with quantitative research and trading teams to develop alpha-generating signals using alternative data sources, contributing to a 12% improvement in risk-adjusted returns.
• Automated regulatory reporting workflows using Python and SQL, reducing manual effort by 50%; mentored junior data analysts on MLOps best practices, reproducible research, and data governance standards.
• Designed and implemented real-time model monitoring dashboards using MLflow and Grafana, enabling early detection of data drift and reducing model degradation incidents by 40%.
Data Scientist Tata Consultancy Services, India Jan 2022 – Jul 2023
• Designed end-to-end data analytics and machine learning solutions for Fortune 500 clients across retail, healthcare, and finance sectors; built supervised and unsupervised ML models achieving up to 20% improvement in prediction accuracy.
• Developed automated ETL workflows using Python (Pandas, NumPy, Scikit-learn) and Apache Spark; containerized data pipelines with Docker, reducing manual reporting time by 40% and improving pipeline reliability.
• Deployed optimized predictive models to cloud environments (AWS SageMaker, GCP Vertex AI) with advanced feature engineering and hyperparameter tuning, reducing model inference latency by 22%.
• Partnered with data engineering and cross-functional business teams to define data quality standards and governance policies, reducing pipeline error rates by 30%; created interactive Tableau dashboards adopted by 3 client teams for ongoing performance tracking.
• Built and fine-tuned NLP pipelines for text classification and entity extraction on unstructured client data, improving downstream reporting accuracy by 18% across 2 healthcare client engagements.
• Led knowledge transfer sessions and authored technical documentation in knowledge wikis, enabling seamless onboarding of 5+ new team members and reducing ramp-up time by 35%.
Skills
• Programming Languages & Tools: Python (Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch), SQL, MySQL, R, Scala, Spark, Git, Jupyter
• AI/ML Frameworks: Supervised/Unsupervised Learning, NLP, LLMs/GenAI (RAG, Prompt Engineering, Fine-tuning), Deep Learning, CNNs, Transformers, XGBoost, Random Forest, Time Series Forecasting, A/B Testing, Feature Engineering, Statistical Modeling, MLOps (MLflow, Kubeflow), Model Interpretability (SHAP, LIME)
• Cloud & DevOps: AWS (SageMaker, S3, EC2, Lambda), GCP (Vertex AI, BigQuery), Azure ML, Alteryx, Docker, Kubernetes, CI/CD, ETL, Apache Airflow, Hadoop
• Visualization: Power BI, Tableau, Matplotlib, Seaborn, Plotly, Streamlit, Excel
Education
Master of Science in Data Science Aug 2023 – May 2025 Lewis University, USA
Projects
Financial Fraud Detection System
Tech Stack: Python, XGBoost, Random Forest, Flask, Docker, MLflow, CI/CD
• Built a real-time fraud detection model on 500K+ transactions using XGBoost and Random Forest, achieving 97.3% accuracy and reducing false positives by 28%.
• Implemented SMOTE, feature engineering, Docker deployment, and CI/CD pipelines to ensure scalable, reliable, continuous model delivery.
• Improved real-time monitoring and alerting for suspicious transactions, speeding up fraud investigation response.
Customer Churn Prediction & Retention Analytics
Tech Stack: Python, Power BI, SQL, Scikit-learn, Neural Networks
• Enhanced churn prediction models on 100K+ CRM records using logistic regression and neural networks, achieving an AUC-ROC of 0.91.
• Built interactive Power BI dashboards visualizing churn trends, customer lifetime value, and campaign performance metrics.
• Performed customer segmentation analysis to identify high-risk customers and inform retention strategy.