South Bend, IN *****, USA
Phone: 574-***-****
(Please leave a voicemail)
Email: *********@*******.***
Shane Diana May
ABOUT ME
I have a Master of Science degree in Applied Mathematics & Computer Science from Indiana University, awarded in December 2023. My expertise includes AI research, particularly transfer learning and image processing, using frameworks such as PySpark, PyTorch, TensorFlow, Keras, and Scikit-Learn. I have experience with natural language processing (NLP) and with processing extensive datasets using batching and pooling techniques in Python, R, C, and SQL. I’m also proficient with development environments such as Microsoft Visual Studio/Visual Studio Code, NetBeans, Google Colab, Jupyter Notebooks, Docker, and Kaggle. I’ve worked with cloud-based and distributed computing frameworks such as Apache Spark, including SparkML and SparkSQL, and I have hands-on experience in parallel computing with C and Python, using MPI for high-performance, cluster-based communication and OpenMP for efficient shared-memory multithreading across processor cores.
In addition to my AI and data science experience, I have a strong foundation in object-oriented programming languages such as C++, Java, and Python. My full-stack development experience includes front-end technologies such as JavaScript (with frameworks such as React and Vue.js) and HTML/CSS; back-end tools such as SQL and Python (with Flask and Django); and Dart/Flutter for cross-platform application development. My statistical analysis expertise includes R in RStudio, alongside Python’s data science stack (NumPy, pandas, Matplotlib, and Seaborn). I’m equally skilled as a full-stack engineer and as a data engineer, capable of building, deploying, and optimizing applications and pipelines across the stack.
EDUCATION
Indiana University — Master of Science in Applied Mathematics & Computer Science
August 2020 - December 2023
AI Research
● Transfer Learning Research: Published research on transfer learning with ResNet50 in Python using PyTorch, NumPy, Scikit-Image, SciPy, TensorFlow, and Scikit-Learn.
● Machine Learning & Data Mining: Extensive experience in NLP and data mining with modern techniques in Python and R.
Programming Languages
● OOP: Proficient in object-oriented languages (C++, Python, Java) and related platforms and tools (.NET, Node.js, Visual Studio Code).
● Functional Programming: Familiar with functional languages: Lisp, Racket.
● Video Game Development: Developed 2D and 3D games in Unity using C#, including a 2D side-scrolling spaceship game and a 3D maze game with path-finding algorithms.
● Front End Development: Web development skills in HTML, CSS, JavaScript, jQuery, Flutter & Dart.
● UI Design: Utilized Flutter with Dart for modern, widget-based UI development.
Mock Job Listing App
● Flutter: Developed using Flutter (Dart) with a modular, widget-based approach.
● REST: Made asynchronous API requests to RESTful endpoints from Indeed and LinkedIn.
● APIs: Integrated multiple APIs, including Google Maps, LinkedIn, Indeed, and a weather service.
Data Mining & Machine Learning
● Large Scale Datasets: Managed large datasets and dataframes using Python, R, SQL.
● Team Collaboration: Utilized Google Colab and Kaggle for GPU-accelerated research.
● Statistical Analytical Skills: Programmed in the R statistical language in RStudio, using Weka, a free, open-source machine learning and data mining toolkit.
Parallel Computing
● MPI: Applied parallel computing techniques with MPI for cluster-based, message-passing communication across a network of connected computers, in both C and Python.
● OpenMP: Applied parallel computing techniques using OpenMP (shared-memory, multi-core parallelism) in both C and Python.
● Parallel Programming Research: Conducted research using MPI and OpenMP in C and Python to optimize computations and measure their parallel efficiency.
Distributed Data Processing
● PySpark: Proficient in distributed data processing with PySpark, including SparkML and SparkSQL, in cloud environments.
● Data Analysis: Applied machine learning algorithms with the Spark framework to generate business insights and support data-driven decision making.
Networking and Cybersecurity
● Docker: Used Linux (Ubuntu) and Docker containers to simulate cyber attacks.
● Protocols: Experienced in TCP and UDP protocols, socket communication, and port management.
● Firewalls: Constructed test firewalls and explored networking protocols.
● Research: Conducted research combining machine learning and cybersecurity methods.
Future Goals
● Visual Analytics: Mastering PowerBI and Tableau for visual data mining.
● Technical Skills: AWS, AWS Glue, customer and stakeholder meetings, data engineering, data modeling, EMR, ETL pipelines, graph databases, natural language processing, object storage, Redshift, SQL, NoSQL and non-relational databases, AWS PostgreSQL RDS, AWS Lambda, AWS API Gateway, AWS CloudFormation, Elasticsearch, Atlassian Suite (Jira, Bitbucket, Confluence), Angular, and geospatial technologies.
● Organizational Business Skills: Coaching & Mentoring, Soft Skills, Data Driven Problem Solving.
● Big Data Tools: Experience with SQL and AWS technologies such as Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions, as well as Kubernetes, Argo, Docker, Ansible, Linux (CentOS and Red Hat), and Bash.
● API Requests: ReadyAPI, SoapUI, LoadUI
● DBMS: Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases).
● ETL Pipelines: Design, build, and maintain scalable data pipelines and infrastructure for large-scale data processing and analytics using technologies such as Hadoop, Spark, distributed event stores and stream-processing platforms, in-memory databases, and other big data tools.
● Event-Streaming: Build real-time and near-real-time (NRT) streaming data pipelines on large-scale distributed event streaming platforms such as Apache Kafka and Google Cloud Pub/Sub.
● Data Lakes: Design and implement data storage solutions using Data Lakes and NoSQL databases to support high-volume and high-velocity data processing.
● Stakeholders: Work with data scientists, analysts, and other stakeholders to understand data requirements and develop solutions that meet their needs.
● Cloud Platform Optimization: Optimize data processing workflows using managed services provided by cloud platforms.
● Development Tools: Eclipse IDE, Spring, Maven/Gradle, Jenkins
● Documentation: Create and maintain documentation and best practices for data engineering processes and systems.
● ETL Pipelines: Develop and maintain data pipelines to extract, transform, and load data from various sources into data visualization tools.
● Dashboards: Build and maintain dashboards and reports to provide insights into business performance and trends.
● Data Engineering Coding: Programming experience with Spark (Scala), Python, BigQuery and Google Cloud Platform (GCP), Hive, Hudi, Unix shell scripting, SAP HANA, the NoSQL database Cassandra, data lakes, and the event streaming platform Kafka.
● Data Engineering System Design: Experience with system design, ETL pipeline design, data modeling, and schedulers including Automic and Airflow.
● Azure: Experience with data engineering and cloud data processing services on the Azure cloud platform.
● ETL Tools: Hands-on experience with data warehousing and ETL tools such as Hadoop, Spark, Databricks, Snowflake, SQL, Python/PySpark, Oracle PL/SQL, and Java.
● Data Modeling: Performance- and security-oriented data modeling and data design, including partitions, different types of indexes, views, and effective data management practices, using tools such as Azure Synapse Analytics, Azure DevOps, Purview, Snowflake, Qlik, and Tableau.
EXPERIENCE
Healthcare Helper, Remote — Data Analyst (Unpaid Internship)
August 2024 - Present
● Docker: Ran a local backend runtime in Docker for Jupyter Notebooks and Google Colab.
● Large Datasets: Analyzed large Medicare datasets using dataframes.
● Spark: Used SparkSQL dataframes and RDDs to process large healthcare datasets in parallel.
● Web-Scraping: Built a web scraper to retrieve live data and statistics, including population demographics.
● Excel Spreadsheets: Automated the creation of Excel workbooks from the web scraper’s output to generate graphs and tables.
● Machine Learning Models: Applied mathematics, statistics, and machine learning classification algorithms to identify high-quality Medicare hospitals.
Indiana Toll Road, Elkhart, IN — IT Help Desk Intern
May 2021 - June 2022
● Administrative Organizational Skills: Served as Microsoft Office 365 administrator (Outlook email group policies and general tech support) and handled Azure Active Directory maintenance and account management.
● Technical Support: Provided database management, VPN support with Okta, and remote desktop and remote server maintenance and troubleshooting.
● Customer Service Excellence: Delivered personalized customer service, addressing inquiries and resolving issues to ensure a positive shopping experience.
● Problem Solving & Adaptability: Applied critical thinking and data analysis skills to troubleshoot inventory discrepancies and customer complaints, leading to timely resolutions.
Disc Replay, Mishawaka, IN — Team Member
June 2020 - September 2020
● Data Management: Leveraged a relational database to track inventory, sales, and customer preferences, ensuring accurate and up-to-date records.
● Point-of-Sale (POS) Expertise: Processed transactions efficiently using cashier systems, maintaining a high level of accuracy and speed in handling cash, credit, and digital payments.
● Customer Relationship Management: Utilized data insights to recommend products, enhancing customer satisfaction and driving repeat business.
● Interpersonal Communication: Collaborated with team members to create a positive store environment, facilitating smooth operations and effective problem-solving.
● Team Collaboration: Worked closely with colleagues to manage inventory, organize promotional events, and resolve customer issues, contributing to overall team success.
● Data-Driven Decision Making: Analyzed sales data to identify trends and optimize product placement, improving sales performance.
Chick-fil-A, Mishawaka, IN — Front of House Team Member
November 2018 - March 2019
● Point-of-Sale (POS) Expertise
● Customer Relationship Management
Target, Granger, IN — Seasonal Cashier
November 2017 - March 2018
● Point-of-Sale (POS) Expertise
● Customer Relationship Management
CVS, Granger, IN — Front End Cashier
June 2014 - January 2015
● Point-of-Sale (POS) Expertise
● Customer Relationship Management
Walmart, Mishawaka, IN — Lawn & Garden Seasonal Associate, Front End Cashier
April 2013 - March 2014
● Point-of-Sale (POS) Expertise
● Customer Relationship Management