Sree Vyshnavi V ********************@*****.***
EDUCATION
University of Missouri, Kansas City Kansas City, MO
Master of Science in Computer Science with Data Science Option, GPA: 3.98/4.0 Dec 2022
Relevant Coursework: Design and Analysis of Algorithms, Machine Learning, AI, Natural Language Processing, Principles of Big Data Management, Python Programming, Data Visualization, Software Engineering, Java
PROFESSIONAL SUMMARY
3+ years of experience in designing, developing, and optimizing ETL pipelines, big data solutions, and cloud-based data architectures (AWS, GCP, Azure).
Proficient in applying machine learning algorithms and techniques, including Convolutional Neural Networks (CNNs), NLP, and predictive analytics.
Developed and implemented machine learning models using Python libraries (Scikit-learn, TensorFlow, PyTorch) for classification, regression, and text analysis tasks.
Strong analytical capabilities using Python, SQL, R, and Excel to collect, preprocess, and analyze large datasets.
Experience in creating interactive dashboards and data visualizations using Power BI and Tableau.
Hands-on experience with AWS (EC2, S3, Redshift, Glue, CloudFormation), GCP (BigQuery, Cloud Storage), and Hadoop-based ecosystems (EMR, HDFS, Hive, Sqoop).
Automated ETL workflows and implemented data warehousing solutions for high-performance analytics.
WORK EXPERIENCE
ETL Software Engineer/Analyst at Neni Techsystems Feb 2024 – Present
Designed and implemented an ETL data pipeline using AWS S3, EMR, Spark, Hive, and Hadoop to process large-scale datasets efficiently.
Developed PySpark-based transformation scripts to clean, aggregate, and analyze raw data stored in S3, optimizing processing time by 30%.
Automated data ingestion workflows using Apache Spark on EMR, reducing manual effort and enhancing data processing speed.
Automated data extraction from Hive, transforming Parquet/CSV data and loading it into S3-backed Snowflake stages for efficient ingestion.
Utilized SnowSQL COPY INTO commands to ingest large datasets into Snowflake, ensuring optimized batch loading and schema consistency.
Ensured data integrity and schema compatibility between Hive and Snowflake, resolving type mismatches and optimizing performance.
Utilized SQL scripting to query, clean, and manipulate data for reporting and dashboard creation.
Assisted in data validation and quality checks using RDBMS tools such as MySQL, PostgreSQL, and SQL Server (see the sketch below).
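As an illustration of the Hive-to-Snowflake staging pattern described in the bullets above, here is a minimal PySpark and Snowflake-connector sketch; all bucket, stage, credential, and table names are hypothetical placeholders, not production values.

```python
# Illustrative sketch only: bucket, stage, credential, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F
import snowflake.connector

spark = SparkSession.builder.appName("hive-to-snowflake-stage").getOrCreate()

# Extract: read raw Parquet data landed in S3 (hypothetical path).
raw = spark.read.parquet("s3://raw-zone/events/")

# Transform: drop malformed rows and aggregate before staging.
clean = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Load, step 1: write to the S3 location backing a Snowflake external stage.
clean.write.mode("overwrite").parquet("s3://curated-zone/events_daily/")

# Load, step 2: batch-ingest the staged files with COPY INTO.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)
conn.cursor().execute("""
    COPY INTO events_daily
    FROM @events_stage/events_daily/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")
conn.close()
```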
Data Engineer/Analyst at CVS Health Nov 2023 – Jan 2024
Built and maintained scalable, distributed data pipelines using PySpark and Apache Airflow on Google Cloud Platform (GCP).
Designed and managed data warehousing solutions on BigQuery, enabling real-time analytics and BI reporting.
Migrated Hadoop-based workloads to GCP, optimizing query performance by 40% and reducing costs by 25%.
Automated ETL workflows to process terabytes of structured and unstructured data for business insights.
Implemented and managed data warehouse solutions on GCP, including BigQuery, to support business intelligence (BI) and analytics applications, leveraging Python for integration and automation.
Performed data migration from on-premises Hadoop clusters to GCP BigQuery using automated tools, PySpark scripts, and Python, ensuring seamless and efficient data transfer (see the sketch below).
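A minimal sketch of the kind of GCS-to-BigQuery batch load used in a migration like this, via the google-cloud-bigquery client; the project, bucket, and table names are hypothetical placeholders.

```python
# Illustrative sketch only: project, dataset, and bucket names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

# Load Parquet files exported from the on-prem Hadoop cluster into BigQuery.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = client.load_table_from_uri(
    "gs://migration-staging/warehouse/orders/*.parquet",
    "my-gcp-project.analytics.orders",
    job_config=job_config,
)
load_job.result()  # Block until the load job completes.
print(f"Loaded {client.get_table('my-gcp-project.analytics.orders').num_rows} rows")
```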
Python Developer at Cloud Revolute May 2023 – Sept 2023
Developed Convolutional Neural Networks (CNNs) for image classification, leveraging TensorFlow and PyTorch.
Designed NLP pipelines for text classification, sentiment analysis, and chatbot development using OpenAI's GPT model.
Engineered Python-based data processing scripts, improving data ingestion efficiency by 35%.
Conducted feature engineering, text preprocessing (tokenization, lemmatization), and vectorization to enhance ML models.
Implemented Python scripts to process datasets and classify images using CNNs, leveraging frameworks such as TensorFlow or PyTorch for deep learning model development (see the first sketch below).
Conducted text preprocessing (tokenization, stop word removal, lemmatization) and utilized the text-davinci-003 model via OpenAI's API to generate text completions based on custom prompts for natural language processing tasks (see the second sketch below).
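A minimal Keras sketch of the CNN image-classification setup referenced above; the input shape and class count are hypothetical placeholders.

```python
# Illustrative sketch only: input shape and class count are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

# Small CNN for multi-class image classification.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 hypothetical classes
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```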
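And a sketch of the text-davinci-003 completion call, using the legacy Completions endpoint from the pre-1.0 openai Python SDK that was current at the time; the prompt is a hypothetical example.

```python
# Illustrative sketch only: uses the legacy openai<1.0 Completions API.
import openai

openai.api_key = "sk-..."  # placeholder key

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Summarize the customer review below in one sentence:\n\n...",
    max_tokens=64,
    temperature=0.2,
)
print(response["choices"][0]["text"].strip())
```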
University of Missouri, Kansas City Aug 2021 – Dec 2022
Designed and implemented a data pipeline integrating data from Snowflake, Web API, and AWS S3 into a centralized AWS S3 storage.
Developed three Apache Spark jobs in Python to extract, transform, and load (ETL) data from each source.
Built a main Spark job to orchestrate data ingestion, ensuring smooth data transfer to AWS S3.
Automated the data pipeline using Apache Airflow to trigger ETL processes upon new data availability.
Ensured fault tolerance & scalability by deploying Spark jobs on AWS EMR clusters.
Integrated AWS Lambda and EventBridge to trigger data ingestion when new data arrives in Snowflake or S3 (see the Airflow sketch after this list).
Plotted temperature trends over time using ggplot2 and created dashboards with Shiny.
Built an end-to-end ML pipeline using R libraries such as caret, mlr3, and tidymodels.
Used Latent Dirichlet Allocation (LDA) to discover topics in large text corpora (see the topic-modeling sketch after this list).
Worked on customer segmentation using clustering in R.
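A minimal Airflow sketch of the availability-triggered orchestration described in the pipeline bullets above, pairing an S3 key sensor with a Spark submit step; the DAG id, bucket, connection ids, and script path are hypothetical placeholders.

```python
# Illustrative sketch only: DAG id, bucket, and paths are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="s3_to_central_store",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # Wait until a new data file lands in the source bucket.
    wait_for_data = S3KeySensor(
        task_id="wait_for_new_data",
        bucket_name="source-landing-zone",
        bucket_key="incoming/*.parquet",
        wildcard_match=True,
    )

    # Run the main Spark job that orchestrates the source-specific ETL jobs.
    run_etl = SparkSubmitOperator(
        task_id="run_main_etl",
        application="s3://jobs/main_etl.py",
        conn_id="spark_default",
    )

    wait_for_data >> run_etl
```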
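The LDA topic-modeling work above was done in R; as a language-consistent illustration, here is the equivalent flow in Python with scikit-learn. The corpus is a hypothetical placeholder.

```python
# Illustrative sketch only: a Python/scikit-learn equivalent of the R LDA work.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "shipping was slow but support resolved my issue quickly",
    "great battery life and a bright display",
    "the checkout page kept crashing on mobile",
]  # hypothetical corpus

# Build a document-term matrix, then fit LDA to uncover latent topics.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(dtm)

# Print the top words per topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```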
SKILLS
Programming Languages: Python, Java, C
Other skills: Pandas, NumPy, Spark, PySpark, Hive, HDFS, Cloudera, AWS, GCP, Azure; Excel (Pivot tables, Macros, Toggles, Multiple response); Power BI (DAX, Charts, Reports, Dashboards)
Certificates: Python Developer, Getting Started with AWS Machine Learning, Using Databases with Python, Certified Ethical Hacker, Data Structures and Algorithms, Python for Data Science