
Applied Machine Learning, Computer Science

Location:
Plainsboro, NJ
Posted:
June 19, 2025


Resume:

Srijinesh Alanka

*********.******@*****.*** | +1-732-***-**** | linkedin.com/in/srijinesh-alanka-186511240

SUMMARY

A Computer Science graduate and AWS-certified professional with a strong foundation in programming and problem solving. With two years of experience in data engineering, pipeline construction, and analysis using Python and cloud frameworks such as AWS, I am seeking a challenging position in these domains to deepen my expertise in handling data efficiently and in building scalable solutions to real-world business and IT problems.

LICENSES & CERTIFICATIONS

AWS Certified Developer – Associate (DVA-C02) – Credential ID: 4e45cc9a6c5c4cef84c9eb1340cc12cb

EDUCATION

Master of Science, Applied Machine Learning, Computer Science

University of Maryland, College Park

Bachelor of Engineering, Computer Science

Chaitanya Bharati Institute of Technology

SKILLS

Programming Languages: C, Python, R, HTML, CSS, JavaScript

Database Management: Oracle SQL, MySQL, DynamoDB, RDS, MongoDB

Frameworks: Django, Flask, Apache Spark, LangChain

Data Visualization and Analytics: Matplotlib, PyTorch, NumPy, Tableau, Power BI

Data Engineering: Databricks, Apache Airflow, AWS Glue, AWS Step Functions, Jenkins, GitHub Actions

Cloud Computing: Amazon Web Services (API Gateway, CloudFormation, Kinesis, CloudWatch, CloudTrail, SNS, SQS, Athena, Lambda, Boto3, Elastic Load Balancing, ECS, Auto Scaling Groups)

Microsoft Azure (API Management, Azure Resource Manager, Event Hubs, Monitor, Activity Logs, Service Bus, Storage Queues, Azure Data Explorer, Azure Functions, Azure SDK for Python, Azure Load Balancer, Azure Container Instances, Virtual Machine Scale Sets)

EXPERIENCE

Lab Assistant Developer, University of Maryland, College Park Nov 2023 - Apr 2024

o Ingested healthcare data from multiple sources, including APIs and datasets of scientific medical articles, using the Python requests library.

o Finalized the AllenAI CORD-19 dataset for COVID-19 articles, their summaries, and full text, along with the PubMed dataset for healthcare research articles.

o Studied the Hugging Face hub of LLMs to determine the best model for fine-grained control over responses to healthcare data.

o Performed tokenization, text cleaning, and embedding of dataset summaries using spaCy, NLTK, and scikit-learn.

o Tested n-gram language modeling over sentence embeddings of the training data, analyzed model learning errors, and extended the token vocabulary with scientific and medical terminology.

o Committed to DistilBART, a distilled BART model fine-tuned on medical text for summarization, and achieved a ROUGE score of 0.35, a 25% improvement over the initial embedding models.
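The n-gram modeling step above can be sketched as a simple bigram counter; the toy corpus and function names here are illustrative, not the lab's actual code:

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count bigram frequencies over tokenized sentences."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts

def next_token_prob(counts, prev, cur):
    """Maximum-likelihood estimate P(cur | prev) from the counts."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

corpus = [
    "the virus spreads rapidly".split(),
    "the virus mutates rapidly".split(),
]
model = train_bigram_model(corpus)
print(next_token_prob(model, "the", "virus"))  # → 1.0
```

Extending the token vocabulary with domain terminology, as the bullet describes, amounts to adding medical terms to the tokenizer so they are counted as single units rather than split.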

Cloud Data Engineer Assistant, University of Maryland, College Park May 2024 - May 2025

Cloud-Based Conversational News

o Spearheaded end-to-end development of scalable, serverless services defined in CloudFormation YAML templates.

o Harnessed AWS SageMaker and Bedrock to apply foundation language models for question answering.

o Constructed an ETL pipeline to ingest batch data from a public News API via Amazon API Gateway and Lambda functions.

o Utilized AWS Glue jobs to process 5 GB of pipeline data while cutting ingestion latency by 55%.
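The ingestion step of the pipeline above might look like the following Lambda-style handler; the article schema and handler name are assumptions for illustration, and the real pipeline would write the records to S3 with a boto3 client rather than only returning a count:

```python
import json

def normalize_article(raw):
    """Flatten one News-API-style article into a record for downstream loading."""
    return {
        "title": raw.get("title", "").strip(),
        "source": (raw.get("source") or {}).get("name", "unknown"),
        "published_at": raw.get("publishedAt"),
    }

def lambda_handler(event, context=None):
    # API Gateway delivers the batch as a JSON string in event["body"].
    body = json.loads(event["body"])
    records = [normalize_article(a) for a in body.get("articles", [])]
    # In the real pipeline, a boto3 S3 put_object call would persist `records` here.
    return {"statusCode": 200, "body": json.dumps({"count": len(records)})}

event = {"body": json.dumps({"articles": [
    {"title": " Market rallies ", "source": {"name": "Reuters"},
     "publishedAt": "2024-05-01"},
]})}
print(lambda_handler(event))  # 200 response with the batch count
```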

Time Series Analysis

o Extracted batch data from a public API and defined data stores (Oracle SQL), unifying them on the Databricks platform as Spark RDDs.

o Performed time series forecasting, coded LSTM networks, and applied random forest regression to predict earthquake trends with 94% accuracy.
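Time series forecasting with an LSTM starts by windowing the series into (input, next-value) pairs; a minimal sketch of that preprocessing, with a hypothetical magnitude series standing in for the earthquake data:

```python
def make_windows(series, lookback):
    """Turn a 1-D series into (input window, next value) pairs for forecasting."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])   # `lookback` past values as input
        y.append(series[i + lookback])     # the value to predict
    return X, y

# Illustrative magnitudes only, not real data.
magnitudes = [4.1, 4.3, 5.0, 4.8, 5.2, 4.9]
X, y = make_windows(magnitudes, lookback=3)
print(X[0], y[0])  # → [4.1, 4.3, 5.0] 4.8
```

Each pair then becomes one training sample for the LSTM (or one row for the random forest regressor).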

Image Data Crawling

o Oversaw the cleaning of 120 GB of image-based data in S3 flat storage.

o Reduced operational overhead and cost by crawling S3 storage for metadata.

o Initialized AWS Glue jobs to automate filtering of job-specific image data by matching metadata against spliced image paths.

o Securely copied valid image data out of the redundant data into a clean S3 bucket to ensure deduplication and data security.

o Reduced data latency and compute usage by 40% with Glue jobs, monitoring health and metrics via CloudWatch Logs and CloudWatch Metrics.
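The path-splicing and deduplication logic described above can be sketched over plain key strings; the `raw/<job_id>/<file>` layout and function name are assumptions for illustration, and a real job would run this inside Glue against keys listed from S3:

```python
def select_job_images(keys, job_id):
    """Filter S3-style keys for one job and deduplicate by file name.

    Assumes a hypothetical key layout of raw/<job_id>/<file>.jpg.
    """
    seen, selected = set(), []
    for key in keys:
        parts = key.split("/")        # path splicing on the object key
        if len(parts) == 3 and parts[1] == job_id:
            fname = parts[2]
            if fname not in seen:     # drop redundant copies
                seen.add(fname)
                selected.append(key)
    return selected

keys = ["raw/jobA/img1.jpg", "raw/jobA/img1.jpg", "raw/jobB/img2.jpg"]
print(select_job_images(keys, "jobA"))  # → ['raw/jobA/img1.jpg']
```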

PROJECTS

Textbook Based Retrieval Augmented Generation

o Used pdfplumber to read and extract the content of a 753-page psychology textbook.

o Utilized Python's re module to clean the data, parse each page into dictionary-style key-value pairs, and load the result into a JSON file.

o Conducted data wrangling by removing blank key-value pairs and eliminating the table of contents, preface, and final references, which would otherwise pollute the data.

o Compared the average text length per page and set chunking hyperparameters of 800 characters with 200 overlap before embedding with MiniLM sentence transformers.

o Constructed a FAISS vector store for the embeddings and built a RAG pipeline for retrieval and context generation using LangChain.

o Chose the Hugging Face-gated Mistral 7B LLM with a 512-token generation length for smoother context-grounded responses, improving the BLEU score from 0.17 to 0.31.
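The 800/200 chunking step in the project above can be sketched as fixed-size character windows with overlap; the function name is illustrative, and a LangChain text splitter would normally do this:

```python
def chunk_text(text, size=800, overlap=200):
    """Fixed-size character chunking with overlap, done before embedding."""
    step = size - overlap              # 600-character stride between chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window reaches the end
            break
    return chunks

page = "x" * 2000                      # stand-in for a page of cleaned text
chunks = chunk_text(page)
print(len(chunks), len(chunks[0]))     # → 3 800
```

The 200-character overlap keeps sentences that straddle a chunk boundary retrievable from either side.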

Flood Prediction

o Performed data scrubbing and deduplication, then used Python Seaborn to analyze correlations in a dataset of recurring floods and related natural disasters.

o Integrated a random forest regression model to provide stronger predictions and observe metrics such as the out-of-bag score.

o Achieved an accuracy of 94.6% with the random forest regressor by tuning hyperparameters such as n_estimators.
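The out-of-bag score mentioned above comes from the samples each tree never sees in its bootstrap draw; a minimal sketch of that mechanic (the real project would simply pass oob_score=True to scikit-learn's RandomForestRegressor, and the function here is illustrative):

```python
import random

def bootstrap_oob(n_samples, n_trees, seed=0):
    """For each tree, draw a bootstrap sample and record out-of-bag indices."""
    rng = random.Random(seed)
    oob = []
    for _ in range(n_trees):
        # Sample n_samples indices with replacement, as each tree does.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        # Anything never drawn is out-of-bag and usable for free validation.
        oob.append(sorted(set(range(n_samples)) - in_bag))
    return oob

oob = bootstrap_oob(n_samples=10, n_trees=5)
print(oob)  # each row: indices unseen by that tree (~37% on average)
```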

Bird Image Classification

o Performed various transformation techniques and exploratory methods to view the Caltech-UCSD Birds dataset, using Python Pillow.

o Normalized and scaled images with PyTorch to maintain consistent metadata and image size for smooth model training.

o Compiled torch DataLoaders to streamline the training pipeline feeding images to a simple CNN built from scratch.

o Applied transfer learning to compensate for limited training data, switching to an ImageNet-pretrained model further trained on CIFAR-100 and improving accuracy to 81% between epochs 3 and 7.
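The normalization step above, in PyTorch, is a per-channel shift and scale; the same arithmetic can be shown dependency-free, with the mean/std of 127.5 chosen here as an illustrative mapping of 0-255 pixels to roughly [-1, 1] (the project's actual statistics may differ):

```python
def normalize_image(pixels, mean=127.5, std=127.5):
    """Shift and scale 0-255 pixel values to roughly [-1, 1] before training."""
    return [[(p - mean) / std for p in row] for row in pixels]

img = [[0, 255], [127.5, 191.25]]      # tiny 2x2 stand-in image
print(normalize_image(img))            # → [[-1.0, 1.0], [0.0, 0.5]]
```

In the real pipeline this is a torchvision `transforms.Normalize` applied inside the DataLoader.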


