Machine Learning Data Engineer

Location: Manhattan, NY, 10007

Posted: July 08, 2025


Jaswanth Kolisetty

551-***-**** | ******************@*****.*** | LinkedIn | GitHub

Professional Summary

Software & ML Engineer with 2+ years of experience delivering scalable, cloud-native platforms for machine learning, time series analytics, and LLM-powered applications. Proven track record of building ML pipelines, microservices, and backend APIs with Python, FastAPI, Spark, and Docker. Adept at deploying production-grade systems on AWS and GCP. Collaborative and product-minded, ready to own the full stack from data ingestion to model delivery and analytics UI.

Education

Stevens Institute of Technology, Hoboken, USA | August 2023 – December 2024
Master of Science in Computer Science (GPA: 3.9 / 4.00)

• Relevant Coursework: Cloud Computing, Database Management Systems, Agile Methods, Knowledge Discovery and Data Mining

SRM University, Guntur, India | July 2019 – June 2023
Bachelor of Technology in Computer Science (GPA: 3.8 / 4.00)

• Relevant Coursework: Data Science, Web Mining, Database Management Systems, Information Retrieval, Java Programming

Technical Skills

Programming & Databases: Python, Java, R, SQL, Spark SQL, PostgreSQL, NoSQL, MongoDB
Data Engineering: PySpark, Kafka, Spark, Hadoop, Airflow, ETL Pipelines, Data Modeling, Snowflake, Parquet, Avro
Cloud & DevOps: AWS (EC2, MSK, Lambda, ECS, S3, RDS, SNS, SQS), GCP, Docker, Kubernetes, Terraform, CI/CD (GitHub Actions), Git
Machine Learning: Scikit-learn, TensorFlow, PyTorch, BiomedCLIP, MLflow, SageMaker, Vertex AI, Model Monitoring
NLP & GenAI: LLM Fine-tuning, LangChain, Hugging Face, RAG, spaCy, OCR-to-NLP, Prompt Engineering, Sentence Transformers
Backend/API: FastAPI, Flask, RESTful APIs, GraphQL, WebSockets, Microservices
Analytics & Insights: Statistical Modeling, A/B Testing, Power BI, Data Visualization, Business Intelligence Tools

Experience

Trigyan, New Jersey, USA | February 2024 – Present
AI/ML & Data Engineer

• Designed 5+ production-grade LLM + RAG systems using LangChain, FAISS, and GraphDB; deployed using FastAPI and GraphQL.

• Engineered MLOps pipelines with MLflow to automate deployment on AWS ECS, reducing model downtime by 40%.

• Built a lung cancer segmentation and classification model in PyTorch, achieving 90% accuracy on CT/MRI data.

• Integrated PostgreSQL and S3 through Airflow-orchestrated ETL, enabling near real-time reporting pipelines.

• Architected data models ensuring schema consistency and resilience for analytics across multiple departments.

SRM University, Guntur, India | January 2023 – June 2023
Research Analyst

• Designed a security model for Hadoop using Hyperledger Fabric, enabling immutable audit trails and enhancing HDFS integrity.

• Engineered a distributed Spark-based implementation of the Apriori algorithm on Hadoop to analyze seasonal retail trends.

• Processed 20M+ transaction logs with PySpark and created interactive Power BI dashboards to present the resulting insights.

• Authored two IEEE papers on data mining and Hadoop architectures, showcasing expertise in large-scale data analytics.

360 Research Foundation, Guntur, India | January 2022 – December 2022
Data Engineer Intern

• Designed AWS serverless pipelines using Lambda and Step Functions to process large-scale clinical data sets.

• Developed FastAPI-based ETL services with RBAC, retry logic, and S3-based file management for healthcare analytics.

• Achieved a 60% efficiency gain in data ingestion through automated cleansing and transformation pipelines.

Projects

RAG Analytics Assistant | FastAPI, LangChain, Hugging Face, Transformers, Google Compute Engine

• Created a GenAI-powered document assistant that integrates LLaMA3-8B via RAG for context-aware Q&A, deployed with Ollama on GCP Compute Engine for hardware-accelerated inference.

• Implemented fallback logic and semantic search with sentence-transformers, improving response accuracy by 30%.

Big Data & ETL Pipeline | Spark, Hadoop, PySpark, PostgreSQL, S3

• Built scalable PySpark data pipelines for 100M+ records and deployed them on AWS.

• Automated data ingestion and transformation, cutting latency for BI teams by 50%.

Document OCR & Key-Value Extraction | doctr, PyTorch, FastAPI, Tesseract, spaCy

• Built an OCR pipeline with doctr (PyTorch backend) to extract structured fields from scanned clinical forms and IDs.

• Integrated BiomedParse LLM as a fallback for low-confidence OCR output, enabling accurate recovery of biomedical entities.

Certifications

• AWS Certified Solutions Architect – Associate | valid Mar 2025 to Mar 2028

• LangChain: Chat with Your Data | DeepLearning.AI | Feb 2025

• Multimodal LLaMA 3.2 | DeepLearning.AI | Jan 2025
