Machine Learning Data Science

Location:

Lubbock, TX, 79416

Posted:

December 12, 2023


Resume:

PETER MUHLBERGER

Lubbock, TX

703-***-****

SUMMARY

My education includes a BA from the University of Chicago, with coursework in honors calculus and statistics, and an MPP and Ph.D. from the University of Michigan in quantitative political science (2nd best such program in the U.S.), with extensive quantitative preparation. My career has included teaching database management and software analysis and design (requirements gathering, etc.) at Carnegie Mellon Univ. (CMU) to programming teams working with companies, as well as leading a software development team and programming my own Java-based survey software (years before SurveyMonkey) for a large CMU grant project. My career has also included designing, carrying out, and analyzing data on multiple large grant projects as a quantitative and increasingly computational social scientist. I have served as a senior analyst and program director at the U.S. National Science Foundation, where I conducted Bayesian and maximum likelihood statistical analyses and applied machine learning methods. I also worked extensively with diverse teams to develop and meet program objectives and communicated with multiple diverse audiences, including the public, journalists, researchers, and Congress. As a researcher at the University of Nebraska, I developed grant proposals and conducted research aimed at applying deep learning to U.S. national security issues. Currently, as a freelance data scientist, I have been involved in projects using Bayesian analysis and deep learning. I transitioned into data science because of intense interest and my personal situation.

EDUCATION

•University of Michigan, Political Science; Ph.D. 1995. Renowned quantitative program. Coursework in game theory, computational modeling, econometrics, statistics, causal modeling

•University of Michigan, Institute of Public Policy Studies; Master of Public Policy, 1986. Coursework in macroeconomic modeling, econometrics, linear algebra, statistics

•University of Chicago, Bachelor of Arts in Political Science, 1983. Coursework in honors calculus and statistics

WORK EXPERIENCE

Freelance Data Scientist (1/2020-Present; 40+ hours / week)

•Client: The Lubbock Freedom to Read Coalition

- Combined multiple book banning lists, using my own NLP-inspired Python algorithm for approximate textual matching with cosine similarity and a probability-based surprisingness measure. Created a predictive subset list. Extensive application of Pandas and NumPy for data wrangling and data engineering.

- Analyzed data on the predictive subset from a local school district to discover hidden book censorship using a hierarchical Bayesian model. Given the data distribution, a zero-inflated negative binomial mixed effects model was appropriate. The model was estimated in both Stan and R (lme4). Model selection and out-of-sample predictive accuracy were assessed through Bayesian approximate leave-one-out cross-validation (using the posterior pdf) and Bayesian and ML k-fold cross-validation. Model fit was assessed with fit statistics and posterior predictive distribution checks (histogram, quantile-quantile graph, residual checks, Kolmogorov-Smirnov statistic).

- Currently writing an accessible report on insights from the analyses

•Client: The Lubbock County Democratic Party

- Developed a predictive model of turnout and partisanship using a novel Poisson binomial model that utilizes Census and precinct voting data and surmounts the ecological inference problem

- Conceived a hierarchical Bayesian model for estimation, but Stan and NumPyro did not offer the Poisson binomial distribution (which would have taken too long to program), so defaulted to maximum likelihood in R

- With the new availability of the needed distribution in Stan, plan to run a hierarchical Bayesian predictive model for the upcoming elections, using the Bayesian posterior distribution from the 2020 election, plus current population demographics to predict turnout and partisanship and identify targets for a campaign

•Client: The Canadian-Muslim Vote (organization)

- Designed a large language model project to identify anti-Muslim bias in communications from Parliament members (waiting for data)

•Client: Texas Democratic Women, South Plains

- Developed a Lubbock City Council redistricting plan to set an equitability standard

- Used Census and city demographic, voting, and geometry data and a cutting-edge sequential Monte Carlo algorithm for plan generation (R)

- Presented the redistricting plan to City Council

•Participated in the 2023 Kaggle AI natural language processing competition

(Python, PyTorch, PyTorch Lightning, data engineering with Pandas and NumPy, TensorBoard, Optuna, FAISS: AI Similarity Search, PyTorch Profiler, Jupyter Notebooks, GitHub, remote Kaggle kernel, GPU / TPU training, VS Code)

•Experimented with Pyro variational Bayes deep learning

Courtesy Research Associate Professor / Research Manager, Public Policy Center, University of Nebraska at Lincoln (9/2018 to 12/2020; 40+ hours / week)

•Focused on applying machine learning / data mining methods (topic modeling, cluster analysis; using R, Python, and HLTA in Java), knowledge graphs, and deep learning language models (PyTorch) to issues of U.S. national security

•Primary author of multiple grant proposals to such funders as NSF and DARPA

•Funded projects such as: Understanding and Limiting the Influence of Extremist Social Media Propaganda: A Multidisciplinary Approach and the NCSC (National Center for State Courts) Public Engagement Pilot Projects

Senior Science Resources Analyst, National Center for Science and Engineering Statistics (NCSES), National Science Foundation (9/2013 to 8/2018; 40+ hours / week)

•For research projects, applied JAGS (similar to Stan) to hidden Markov modeling and Bayesian modeling with posterior predictive checks and model selection (see above regarding methods)

•Utilized natural language processing and machine learning techniques (latent Dirichlet allocation) to help construct an index and topical links for the 1000+ page Science and Engineering Indicators (SEI) volume (Python, NLTK, R)

•Conducted original analyses of the variance of complex survey data (R, own code)

•Technical expert for bibliometrics contracting (BaseX XML database)

•U.S. federal agency representative to the OECD Global Science Forum expert group on social research ethics and 'Big Data'

•Worked smoothly with teams to write the reports Science and Engineering Indicators and Women, Minorities, and Persons with Disabilities in Science

•Answered questions and addressed concerns from the National Science Board, academics, journalists, Congressional staff, and the general public

•Advised on survey experiments regarding public knowledge of science

Program Director, National Science Foundation (NSF), Division of Social and Economic Sciences (SES), Directorate for Social, Behavioral, and Economic Sciences (SBE) (9/2011 to 9/2013; 40+ hours / week); including Director, Secure and Trustworthy Cyberspace Program; Director, Building Community and Capacity for Data Intensive Research Program; and Director, Political Science Program

•General Responsibilities: Worked with program director teams to develop and direct multiple programs, authored solicitations and management plans, developed two interdisciplinary research communities, communicated with research communities, coordinated across agencies, served as research community point of contact, gave New Awards Presentations to NSF officials, supervised and led peer review processes, made funding decisions, and supervised multiple NSF staff members

Director, Center for Communication Research, Texas Tech University (9/2008 to 9/2011)

•Developed original methods to analyze textual data (NLTK, Python, Perl)

•Worked extensively with SQL and PostgreSQL for an online survey tool

•Deployed a range of social research tools: online surveys, online text scraping, focus groups, computerized media experiments, eye tracking, continuous response

•Grant and commercial survey and focus group research

Research Assistant Professor, Texas Tech University (9/2007 to 9/2011; 40+ hrs/wk)

•Advised faculty and collaborators on and helped conduct hierarchical mixed effects Bayesian / ML models and analyzed complex survey data

•Primary Principal Investigator, "Collaborative Research: Deliberative E-Rulemaking Decision Facilitation Project (DeER)," a $450,000 National Science Foundation (NSF) funded project. The project developed and tested the benefits of an artificial deliberation facilitation agent, using natural language processing.

•Co-Principal Investigator on additional research projects amounting to $1.7 million

•Taught Mass Communications Research Methods, a course covering statistics and social research methods, particularly surveys and experiments

Visiting Assistant Professor of Political Science and Senior Policy Fellow, Texas Tech University (9/2006 to 9/2007; 40+ hours / week)

•Taught research methods and statistics to undergraduate and graduate students

Visiting Scholar, University of Pittsburgh (9/2005 to 9/2006; 40+ hours / week)

Research Director, E-Governance and Engagement, Institute for the Study of Information Technology and Society (InSITeS), Carnegie Mellon University (9/2000 to 9/2005; 40+ hours / week)

•Head of software analysis and design on the $2.1 million VAProject

•Developed my own online survey system in JSP years before SurveyMonkey

•Sole social scientist and chief research administrator on the VAProject

•Developed new statistical methods for analyzing response latency

•Designed and tested new survey instruments for assessing causes of political apathy

•Taught courses in information systems management and e-democracy

Assistant Professor of Political Science, Social and Decision Sciences Department, Carnegie Mellon University (9/1995 to 9/2000; 40+ hours / week)

•Taught core and advanced courses in software application design and development, systems analysis, and database management

•Taught Visual Basic, HTML, JavaScript, Active Server Pages, Microsoft Access, Structured Query Language (SQL), and Unified Modeling Language

Research Consultant, Olszak Management Consulting Inc. (4-6/1999; 20 hours / week)

•Designed questionnaire and complex, multistage sampling plan with bootstrapping

•Planned statistical analyses and final reports

Computer Staff, Inter-university Consortium for Political and Social Research, ICPSR (5-8/1994; 40+ hours / week)

•Assisted ICPSR participants with statistical software, statistics consulting

Market Research Consultant, Demand Research Incorporated (3-5/1994; 20+ hrs/wk)

•Factor and path analysis of market research data

•Developed data manipulation software

Statistical Consultant, University of Michigan Medical School Summer Program (2/1993 to 5/1994; 20+ hrs/wk)

Journalist, The Recorder, San Francisco (10/1983 to 9/1984; 40+ hrs/wk)

Author of 30 peer-reviewed academic papers and 90+ newspaper articles

METHODOLOGICAL SKILLS

MACHINE LEARNING METHODS AND STATISTICS

•Data mining

•Cluster and factor analysis

•Decision trees, random forests, XGBoost

•Nonparametric processes

•Deep learning / machine learning

•Reinforcement learning

•Graph neural networks

•Network analysis

•Text analytics

•Language models

•Topic models

•Data visualization

•Ordinary and generalized linear models

•Bayesian methods

•Econometrics

•Mixed effects models

•Classifiers

•Model selection

•Bootstrapping

•Dynamic Structural Equation Modeling

DEEP LEARNING FRAMEWORKS

Pyro, PyTorch, PyTorch Lightning, TensorFlow

NATURAL LANGUAGE PROCESSING / UNDERSTANDING

NLP FRAMEWORKS: NLTK, Stanza, spaCy

SEMANTIC PARSING AND REPRESENTATION: Minimal Recursion Semantics, AMR, Knowledge Graphs, Hypergraphs, DeepWalk

TOPIC MODELING: BERTopic, LDA, HLTA, LSA, HDP-LDA

DEEP LEARNING NLU ARCHITECTURES: RNN, LSTM, GRU, CNN, Transformers

NLP / NLU TASKS: sentiment analysis, question-answer systems, summarization, translation, semantic role labeling, relation extraction, semantic similarity, semantic and syntactic parsing, topic modeling

DATA SCIENCE TOOLS

IBM Cloud, SPSS Modeler, Watson Studio, Data Distillery, Jupyter Notebook, Jupyter Lab, Git, GitHub, Kaggle, Google Colab

STATISTICAL SOFTWARE

R, Stata, Stan, JAGS, SPSS, SAS, NumPyro, Pyro, NumPy, Pandas

COMPUTER LANGUAGES AND SOFTWARE

Python, Java, C++, Pascal, Perl, Pandas, NumPy, TensorBoard, Optuna, FAISS, VS Code, R data visualization, Matplotlib, Seaborn, Folium, Plotly, Dash, Structured Query Language (SQL), PostgreSQL and Microsoft Access databases, BaseX, R optimization

SOFTWARE ANALYSIS AND DESIGN

Familiar with full lifecycle software analysis and design methods: business requirements gathering, data exploration and visualization, data preparation, modeling, evaluation (validation and testing), deployment, and model maintenance

Agile development: iterative development with repeated outputs and customer feedback

Unified Modeling Language (UML)

Led two software development teams at Carnegie Mellon Univ. to develop a web-based survey system and a real-time and asynchronous online audio media system, years before such software became commercially available

OPERATING SYSTEMS

Linux / Unix (multiple flavors), Mac OS, Windows

CERTIFICATIONS AND RELATED

IBM Data Science Professional Certificate

Machine Learning Specialization Certificate, Stanford University / DeepLearning.AI

fast.ai Practical Deep Learning course (self-taught)

University of Amsterdam Deep Learning course (self-taught)

Kaggle 2023 deep learning NLP competition (self-taught)
