Irina Max, Data Scientist/Economist
Mountain View, CA 650-***-**** E-mail: i*********@*****.***
Strong focus on statistical analysis in R and Python with ML techniques in challenging environments.
Summary:
● Experienced in data analysis, designing experiments, interpreting data behavior, and business decision support.
● Experienced data scientist with 5+ years of industry experience, including hands-on experience developing machine learning models.
● Master’s degree in Economics with a solid statistical background and recently updated skills in Data Science, Statistical Learning and ML (Stanford University). Technology-savvy and mathematically equipped professional with experience in a diverse set of methods including, but not limited to, machine learning, deep learning, regressions, GBM, cluster analysis, PCA, decision trees, time series, resampling and regularization, LASSO, NLP, CNN and XGBoost.
● Experience with the latest big data technologies, such as Spark, Hadoop, Cloudera, Impala, HBase, Hive and Cassandra, along with ETL, NoSQL and N1QL.
● Ability to provide wing-to-wing analytic support including pulling data, preparing analysis, interpreting data, making strategic recommendations and presenting to client/product teams.
● Excellent at visualizing data and communicating analysis with R, Tableau, WEKA and MOA to business users at all levels of the organization.
● Strong background in statistics, economic analysis and customer intelligence.
Languages and platforms:
● R, Python, Numpy, Java, C#, SQL, JavaScript, Tableau, JSON, Spark, XML/HTML, AWS, ES2, SparkR SQLContext, NLP
● SPSS, SAS, Matlab, Kernel, Unix, Linux, OSX, MS Windows, Excel, CSS, Docker, Hadoop, VM, .NET, H2O, MOA, WEKA
● MySQL, PHP, Oracle, PostgreSQL, SQLite, NoSQL, JDBC, OLAP, PCA, CNN, ETL, Zeppelin, SAP HANA Vora, Couchbase
Experience:
2017 - Present: SanDisk, a Western Digital brand, Data Scientist, Big Data team.
● Machine learning models for Apple memory, BICS3 reliability and die sort parameter tuning (LASSO, XGBoost, Ranger).
● ML performance: deep analysis of the HTPD/RTPD/LTPD test data to define a model of FBC growth rate across temperature.
● Memory technology: experimental statistical analysis for selecting and optimising memory elements.
● ML models for projecting ICC memory characteristics of pre-production SLC, MLC and TLC single- and multi-die packages, with and without PPM.
https://github.com/IrinaMax/ICC-characterization-/blob/master/BU_B2_fileGenerator.R
2012-2016: MIM consulting, Data Scientist, ML experimenting Consultant
● Time series outlier detection with the Arima package, SQLContext on SparkR, and MPP with MOA interacting with R and ML techniques, plus forecasting and visualization. Automatic anomaly detection in time series using the Seasonal Hybrid ESD test to model CPU utilisation.
https://github.com/IrinaMax/CPU-usage-and-anomaly-detection
● Natural Language Processing of the article “How Presidential Elections Affect the Markets” in R with openNLP.
https://github.com/IrinaMax/NLP-of-article-How-Presidential-Elections-Affect-the-Markets-
● Web scraping in R with the ‘rvest’ library, SelectorGadget and elements of CSS.
https://github.com/IrinaMax/Web_scraping_with_R
● Exploratory analysis with ML techniques: Domestic Violence in California partnership, 2005-2014, for the WEAVE human trafficking survivors organization.
https://github.com/IrinaMax/Domestic-Violence-in-CA
● Optimisation of statistical analysis and forecasting of diabetes incidence in patient cohorts using LASSO, cross-validation, random forest, log regression and tree-based prediction models in R.
https://github.com/IrinaMax/Statistica_of_diabetes
● Logarithmic means with the RMSLE approach and a product clustering algorithm for Grupo Bimbo bakery products, Kaggle competition data set, top 26%.
https://github.com/IrinaMax/Grupo-Binbo_LogMeans/blob/master/console
● Hadoop academic project with WWC: big data processing modules (MapReduce, YARN, Pig, Spark and Impala) and data analysis using ETL and Spark Streaming/Hadoop Streaming.
● Clustering, classification, PCA and CHAID models for Parkinson’s Telemonitoring Data Set, private project for US Davis Medical Group.
● SVD, PCA and clustering analysis as an ML implementation, with visualization using the ggplot2 library and a recommender system for the unsupervised Expedia destination data set, “Expedia hotel research” on Kaggle. Kernels contributor.
https://www.kaggle.com/irinamaxds/expedia-hotel-recommendations/irina
● Analyzed and managed marketing data to maximize pricing margins for the “Beauty supply” women's center using SQL and R.
● Implemented statistical ML techniques (exponential moving averages and a logistic regression algorithm) with the TTR package to forecast auto prices for the “Sundays motors” car dealership.
● Research and SWOT analysis (a structured planning method covering strengths, weaknesses, opportunities, and threats) for “E&M enterprises” to support strategy, development and business decisions.
2007-2010: Analyst, Winrock International (USAID) Water Users Association Support Program, Uzbekistan-US
● Modified a > 400MB financial database for a 25 million dollar American aid program.
● Implemented and analyzed financial models to distribute funds using Excel.
● Analyzed time-series data from over 50,000 water users, examining water use, crop yields, equity of water distribution, and farmer satisfaction with the program.
● Tracked and processed data for the distribution of funds to cooperating stakeholders.
2004-2007: Database Administrator, Uzbekistan Govt. State Pharmaceutical Center.
● Led ETL and security for 50+ disparate databases.
● Designed, implemented and administered a custom SQL database for the national pharmaceutical center to track the receipt and distribution of medicines.
● Managed inputs and updates to medical databases; debugged designed transactions; planned database backups and restores; maintained data storage; managed and trained employees.
2000-2004: Data Security Analyst, Central Bank Uzbekistan
● 100+ programming, data security, and financial projects
● Led projects, verified account balances, generated electronic signatures, performed encryption and data security tasks. Increased transaction speed by implementing a new security approach. Trained staff.
Education:
● MS and BS in Economics, Tashkent State University of Economics, GPA 3.9
● Machine Learning, Stanford University
● Statistical Learning, Stanford University Lagunita
● KIx: KIexploRx, Explore Statistics with R, edX
● Computer Science and Programming Using Python, MITx: 6.00.1x
● Databases. OLAP. Stanford University
● SQL. Views and Authorization. Stanford University
● Constraints and Triggers. Stanford University
● XPath and XQuery. Stanford University
● Indexes and transactions. Stanford University
● CS and Analytics, Stanford University
● Techniques and Concepts of Big Data, Lynda.com
● Data Science and Engineering with Spark XSeries, Berkeley
● Hadoop administration course, WWC and Coursera.
● Spark SQL training workshop, Women in Big Data workshop, Databricks
● Intermediate Data Science class, Women in Big Data workshop, Hortonworks
● Data Science and Machine Learning Bootcamp with R, Udemy
● SAP HANA Vora training, Women in Big Data workshop, SAP
● Data Mining With WEKA, The University of Waikato
● https://github.com/IrinaMax?tab=repositories