Location: Exton, PA area, Flexible Hybrid
Must be a U.S. citizen or green card holder
About Company
Our client is a pioneer in phenotypic screening and in vivo pharmacology services, offering transformative insights to clients in the biotech and pharmaceutical sectors. Their mission is to identify novel therapeutic opportunities through rigorous preclinical science. They are seeking a Lead Data Scientist to accelerate their platform by building intelligent, scalable data systems that enhance their experimental rigor and translational insights.
Role Summary
As Lead Data Scientist, you will drive the development and application of machine learning and data infrastructure to support in vivo R&D efforts. This includes behavior analysis from video data, biosignal processing, and development of automation tools that improve speed, accuracy, and reproducibility in preclinical research. You will collaborate closely with biology, operations, and client-facing teams to turn complex data into actionable insights.
Key Responsibilities
Data Infrastructure & Strategy
Lead modernization of data infrastructure across preclinical and business functions, improving data access, standardization, and workflow efficiency.
Implement robust documentation and data versioning practices to support reproducibility and regulatory compliance.
Behavioral Analytics & Deep Learning
Build and deploy deep learning pipelines for behavior classification and motion tracking in rodent models using tools like TensorFlow, Keras, and Ultralytics.
Oversee all phases of model development: manual labeling, preprocessing, training, validation, and iteration to reach high performance (e.g., F1 score above 95%).
Electrophysiology & Signal Analysis
Develop and maintain processing pipelines for extracting key features from EEG and other biosignals.
Create dynamic visualizations and analysis tools to support interpretation of complex in vivo datasets.
Automation & Document Intelligence
Design and implement NLP and document-scraping pipelines to transform unstructured client reports into structured databases.
Automate generation of client-facing reports and internal summaries, significantly reducing turnaround time.
Machine Learning Initiatives
Apply supervised and unsupervised learning methods to phenotypic screening datasets for pattern discovery, treatment differentiation, and predictive modeling.
Collaborate with R&D teams to develop interpretable models that inform dosing, mechanism-of-action studies, and novel compound discovery.
Integrate ML outputs with in-lab digital tools to improve experiment monitoring and annotation efficiency.
Qualifications
Ph.D. or M.S. in Data Science, Computer Science, Neuroscience, Biomedical Engineering, or a related field.
5+ years of experience applying ML to real-world data, preferably in a preclinical, CRO, or in vivo environment.
Deep expertise in Python (pandas, NumPy, scikit-learn, PyTorch/TensorFlow).
Hands-on experience with behavioral video analysis, signal processing, and ML model deployment.
Familiarity with in vivo pharmacology, rodent behavioral models, or electrophysiology is a strong plus.
Preferred Skills
Experience in phenotypic screening or drug discovery workflows.
Use of annotation tools like CVAT or Labelbox for supervised learning datasets.
Comfort with MLOps and cloud-based platforms (e.g., AWS, MLflow, Docker).
If you are interested in this fantastic opportunity, please email