
Senior Data Scientist

Location:
Dallas, TX
Salary:
$120,000
Posted:
October 09, 2025


Resume:

Daniel Lopez ******.*******@*****.***

Senior Data Scientist 339-***-**** Dallas, TX

linkedin.com/in/daniel-lopez-452966363

SUMMARY

Senior Data Scientist with 8 years of experience architecting, deploying and scaling end-to-end AI and machine learning solutions across finance, legal, e-commerce, security analytics and IoT. Specializes in LLM-powered NLP, recommendation systems, time-series forecasting and real-time anomaly detection. Converts ambiguous business questions into measurable ML objectives, owns delivery from discovery through MLOps to production, and quantifies impact on retention, revenue, MTTR and forecast accuracy. Expert in Python, SQL, TensorFlow, PyTorch, Hugging Face and cloud ML platforms. Collaborative communicator and mentor who raises team standards and stakeholder confidence.

TECHNICAL SKILLS

● Languages & Libraries: Python, SQL, R, Bash, scikit-learn, TensorFlow, Keras, PyTorch, XGBoost, LightGBM, CatBoost, Pandas, NumPy, Matplotlib, Seaborn

● NLP & LLM: Hugging Face Transformers, LangChain, OpenAI API, embeddings, RAG, semantic search, NER, text classification, summarization

● Recommenders & Search: matrix factorization, two-tower and deep embeddings, ANN retrieval, FAISS, Elasticsearch

● Time Series & Forecasting: ARIMA, Prophet, LSTM, feature engineering for seasonality and trends, backtesting

● Anomaly & Fraud: Isolation Forest, autoencoders, clustering, risk scoring, threshold calibration

● Data Engineering & Pipelines: Apache Spark, Kafka, Airflow, AWS Step Functions, SQL modeling, data modeling for analytics

● MLOps & Production: MLflow, Docker, Kubernetes, GitHub Actions, AWS SageMaker, model registry, experiment tracking, model and data drift detection, monitoring and alerting

● Data Quality & Reliability: Great Expectations, data validation, feature contracts, SLA and SLO design

● Observability & Dashboards: Grafana, Prometheus, DataDog, Splunk

● Datastores & Warehouses: PostgreSQL, Redshift, S3, Elasticsearch

● Analytics & Experimentation: A/B testing, experiment design, SHAP and model explainability, KPI definition

● Version Control & Collaboration: Git, GitHub, branching strategies, code review and pull requests, Git LFS for large artifacts

WORK EXPERIENCE

Senior Data Scientist - AstroSirens Oct 2021 - Present

● Lead data scientist across multi-client engagements in finance, legal, retail, e-commerce, media and IoT. Deliver production ML systems processing 50 million to 2 billion events per month. Own the full lifecycle from scoping and data strategy through modeling, deployment and monitoring.

● Architected a GPT-4 and LangChain RAG pipeline with Elasticsearch and FAISS for contract review at a global law firm. Fine-tuned transformer encoders on a PyTorch backend to boost retrieval quality. Automated clause extraction, risk classification and semantic retrieval to cut manual review time by 70% and more than triple throughput. Implemented prompt versioning, PII redaction and response guardrails. Deployed reproducibly with MLflow and containerized inference on AWS SageMaker.

● Built customer churn and survival models for a fintech and SaaS portfolio using XGBoost and time-to-event features derived from behavioral, transactional and engagement signals. Drove a 15% year-over-year retention lift with targeted offers guided by uplift modeling. Instrumented A/B tests and delivered SHAP-based explainability dashboards to Marketing and Customer Success for actionability and trust.

● Designed multi-tenant recommendation systems for e-commerce and media clients using matrix factorization and deep embeddings on multi-modal data. Increased upsell conversions by 18% and click-through rate by 12%. Implemented semantic and item search with vector similarity and content-based cold-start fallback to improve first-session relevance.

● Engineered real-time anomaly detection for operations and fraud using Kafka streams with PyTorch autoencoders and Isolation Forest. Served low-latency predictions via SageMaker endpoints with sliding windows and online features. Reduced incident response time by 40% and cut alert noise by 28% with calibrated thresholds and risk scoring. Added drift monitoring and automated retraining triggers.

● Productionized demand, revenue and capacity forecasting using Prophet and TensorFlow LSTM with backtesting and bias and variance diagnostics. Improved MAPE by 20%, enabling proactive inventory and staffing decisions. Orchestrated pipelines with AWS Step Functions and enforced data quality gates using Great Expectations.

● Integrated LLM-based assistants into customer support workflows with secure retrieval and escalation logic for a consumer app. Reduced average handle time by 35% and increased customer satisfaction by 10 points. Monitored response quality, deflection rate and hallucination frequency to sustain performance.

● Established standardized MLOps practices using the MLflow model registry, Docker and Kubernetes deployments, and CI/CD through GitHub Actions. Implemented model and data drift alerts with DataDog, feature contracts, service level objectives and rollbacks to reduce time to production by 30% and improve reliability.

● Mentored six data scientists on LLMOps, NLP and deep learning training best practices in PyTorch and TensorFlow. Facilitated discovery workshops to translate business targets into measurable ML KPIs and aligned deliverables with product and engineering leaders.

Data Scientist - Splunk Jun 2018 - Sep 2021

● Developed and scaled security analytics and observability features for enterprise SIEM and log data. Partnered with product, engineering and customer success to ship reliable ML in production.

● Built user and machine anomaly detection with clustering and Isolation Forest over SIEM telemetry to reduce false positives by 30% and accelerate threat response. Integrated detections into Splunk dashboards, alerting rules and risk scoring to prioritize triage.

● Delivered predictive maintenance for connected devices using TensorFlow RNNs and gradient boosting, reducing unplanned downtime by 20%. Orchestrated rolling-window retraining with Spark and exposed scores and uncertainty in near real-time visualizations.

● Implemented log parsing, classification and semantic similarity search with Hugging Face Transformers on a PyTorch backend to improve incident routing and shorten mean time to resolution. Surfaced similar incidents to speed root-cause investigations and playbook selection.

● Optimized high-volume feature pipelines using Splunk HEC and Spark to improve training throughput and inference latency service levels. Built visualizations for anomaly trajectories, model confidence and system reliability to increase adoption.

Junior Data Scientist - IBM May 2017 - Apr 2018

● Supported analytics engagements in legal and financial services and operations with a focus on NLP and forecasting.

● Developed text classification and entity extraction pipelines for document processing in legal and financial datasets using TensorFlow Keras. Improved processing speed and accuracy while adhering to privacy and access controls.

● Created Prophet and ARIMA models for demand and operational KPIs to guide inventory and staffing plans. Delivered reproducible, parameterized pipelines with robust validation across time.

● Prepared large-scale datasets across SQL, Redshift and S3. Implemented feature engineering and quality checks and contributed to team knowledge sharing on Spark and ML workflows.

EDUCATION

Master of Science in Engineering Data Science and AI, University of Houston, Sep 2015 - Mar 2017, Houston, TX

Bachelor of Science in Computer Science, University of Houston, Apr 2011 - Sep 2015, Houston, TX
