Priyank Negi
857-***-**** | *********@*****.*** | priyank7n.com | github.com/PriyanK7n | linkedin.com/in/priyank7n/
SUMMARY
Skilled in data collection via web scraping, data cleaning, modeling, and transformation, exploratory analysis, predictive modeling, and open-source AI/ML integration, with experience developing DevOps, MLOps, and ETL data workflows. Proficient in relational and non-relational databases, NLP, LLMs, RAG, prompt engineering, data governance, and reporting. Experienced in building CI/CD pipelines and Terraform-based cloud deployments focused on performance, privacy, and stakeholder requirements. Demonstrated experience with cross-functional collaboration, visualization, observability, test-driven development, rapid experimentation, and documentation.
EDUCATION
Boston University, Boston, MA 2022-2024
Master of Science in Electrical and Computer Engineering, Data Analytics Subconcentration | GPA: 3.8/4.0
• Relevant Coursework: Machine Learning, Deep Learning, Natural Language Processing, Big Data Analytics for Business, Software Design, Image and Video Computing, Artificial General Intelligence, ECE Product Design and Cyber Security.
Guru Gobind Singh Indraprastha University, New Delhi, India 2017-2021
Bachelor of Technology in Electronics and Communication Engineering
SKILLS
Programming Languages & Environments: Python, SQL, HTML/CSS, Bash, Linux, Unix, pytest, VS Code, Terraform, TypeScript
Data Engineering & MLOps Tools: PySpark, dbt, Apache Airflow, Kafka, Hive, MLflow, Git, Weights & Biases, DVC, RAGAS
DevOps Tools: Docker, Kubernetes, GitHub Actions, Jenkins, Argo CD, Prometheus, Grafana, FastAPI, Flask, Selenium
Databases, Vector Search & Embeddings: MySQL, MongoDB, Postgres, BigQuery, Snowflake, Chroma, FAISS Vector Store
Cloud Tools: GCP (VM, Vertex AI, Dataproc), AWS (EC2, ECR, S3, Bedrock, EMR, App Runner, EKS, CloudWatch), Azure (ACR, VMs)
Libraries: PyTorch, Scikit-learn, NumPy, Pandas, PySpark MLlib, Hugging Face, CrewAI, OpenAI, LangChain, Ollama
Skills: Statistics, Machine Learning, Deep Learning, Natural Language Processing, Big Data, REST API, LLM, Agile, Communication
PROFESSIONAL EXPERIENCE
KGS Technology Group, Inc
Data Engineer Mar 2025 - Aug 2025
• Maintained PySpark-based ETL pipelines on a multi-node AWS EMR (EC2) cluster and orchestrated workflows with Airflow. Built modular SQL transformations in dbt to enable batch, incremental, and near-real-time loading into Snowflake. Managed dbt models, optimized Spark jobs, applied version control with Git, and supported delivery of daily Tableau reports for stakeholders.
• Automated CI/CD pipelines with Jenkins to build, test, and deploy dbt models, Spark ETL jobs, and Airflow DAGs. Managed observability with Prometheus & Grafana to monitor Spark metrics, dbt tests, and DAG health. Set up alerts for data quality issues, schema drift, and latency spikes. Collaborated with architects and analysts to resolve root causes rapidly.
Optimum AI (Microsoft Funded Startup)
AI/ML Developer (Highlighted in a LinkedIn Post) Apr 2024 - Jun 2024
• Built and integrated an AI agent for debt-negotiation advice, a function-calling tool, and a user financial-data profiling pipeline into an AI Financial Planner app. Set up a multi-LLM architecture with a steerer LLM for strategy selection and a dialog LLM for personalized advice generation using few-shot and dynamic prompting. Built synthetic JSON data with experts and deployed the app on AWS PaaS.
• Led teacher-student fine-tuning, distilling curated knowledge from larger LLMs into Mistral to improve efficiency and reduce hallucinations. Operated the dialogue and strategy LLMs with in-context state (user profiles, sessions) persisted in MongoDB, and migrated inference from a public API to a self-hosted Ollama server on Kubernetes on AWS EC2, improving scalability and privacy while reducing cost.
Boston University (BU)
Ingalls Monitor – Ingalls Engineering Resource Center (ERC), Boston University May 2024 - Nov 2024, Jan 2023 - Jan 2024
• Oversaw the ERC's daily operations serving 40+ students per hour, improved operational efficiency through automation, managed the ERC website, wrote SOPs and reports, and collaborated with the IT department to troubleshoot ERC systems and printers.
• Delivered technical training and support to students and monitors; assisted the ENG Dean's Office with research and supported BU’s room-access platform (10k+ users); troubleshot students’ HPC-based deep learning projects and maintained documentation.
Grader – Electrical and Computer Engineering Department Aug 2023 - Dec 2023
• Designed, evaluated, and graded assignments and projects for 60 students in the graduate-level EC 523 Deep Learning course, providing feedback on their use of Python and PyTorch under the guidance of Professor Kayhan Batmanghelich.
Technical Engineer, Data Science – BU Spark! (Tech Incubation & Experiential Lab) Jan 2023 - May 2023
• Provided technical support to student teams & product managers in higher-education research & data science projects.
• Managed documentation for multiple projects, engaged stakeholders, handled version control through Git pull requests, ensured technical solutions met client requirements, and facilitated knowledge sharing in the graduate-level CS 506 data science tools course.
Omdena
Machine Learning Engineer Apr 2022 - Sep 2022
• Developed MediBot, a conversational AI chatbot built with Rasa using rule-based NLU; constructed a medical disease database by scraping medical websites; and labeled 100k unlabeled tweets using active learning (Hugging Face Transformers).
• Presented tweet sentiment analysis in a Tableau dashboard and delivered PowerPoint presentations of the project to stakeholders and 400+ collaborators, including technical and non-technical audiences and experts, during a company day.
• Led the development of an intelligent waste segregation application using computer vision (ResNet, VGG16 + SVM), deployed with Gradio on Hugging Face Spaces, aimed at potential cost reduction and aligned with the UN Sustainable Development Goals.
PROJECT EXPERIENCE
MLOps with GitOps-Driven CI/CD for Machine Efficiency Prediction GitHub Jan - Feb 2025
• Gauged machine maintenance needs by predicting machine efficiency as a multi-label classification task (ML, Docker, Flask, Kubernetes, GCP).
• Developed an MLOps pipeline with CI/CD (Jenkins, Argo CD), automating data processing, model training, and Kubernetes-based deployment following GitOps principles, with GitHub-triggered webhooks for seamless automation and rollouts.
MLOps-Powered CI/CD Phishing Detection Pipeline for Network Traffic Data GitHub Sept - Oct 2024
• Developed an ETL pipeline, loaded data into MongoDB, and integrated an AWS feature store for data ingestion. Performed data transformation (missing values, class imbalance), validation (schema, drift checks), and unit, integration, and functional tests. Trained and tuned ML classification models (Logistic Regression, Decision Tree, GBT, Random Forest, XGBoost, AdaBoost).
• Implemented logging, custom exception handling, and experiment tracking of model artifacts in DagsHub via MLflow. Built a RESTful API with FastAPI, providing endpoints for model training and real-time and batch phishing prediction.
• Built a CI/CD pipeline with GitHub Actions for automated deployments on AWS (EC2, ECR) and Azure (ACR, Web App), using a self-hosted runner to build, deploy, and monitor tasks, and integrated AWS S3 for artifact storage synchronization.
ETL & Survival Prediction Project with Apache Airflow, Docker, Feature Store, ML Monitoring GitHub July - Aug 2024
• Built Airflow DAGs with PythonOperator and the TaskFlow API for simple math operations to demonstrate modular orchestration; deployed an ETL pipeline using Apache Airflow, Docker, and PostgreSQL to ingest NASA APOD API data.
• Developed a scalable ETL pipeline for Titanic survival prediction using Apache Airflow on GCP, loading data from GCS into a PostgreSQL database, processing it with a Redis-based feature store, and integrating model training and data versioning with DVC. Implemented monitoring with Prometheus & Grafana to detect data drift and trigger model retraining via real-time alerts.
Extractive Summarization Task (EST) Improvement and MLOps Pipeline Construction GitHub Nov 2023 - Dec 2023
• Improved the BertSum paper’s EST performance on the CNN/DailyMail dataset by 5% through architectural changes (adding CNN, LSTM-RNN, and Transformer layers on top of pre-trained BERT using Hugging Face), fine-tuning, and hyperparameter tuning.
• Deployed the EST application as microservices in a scalable MLOps pipeline using open-source tools such as Streamlit, Docker, Minikube K8s (scaling), MLflow (experiment tracking), Prometheus & Grafana (monitoring), and GCP PostgreSQL (logging).
NeurIPS Synthetic Paper Acceptance Dataset Construction & Recommendation Systems Exploration Blog July 2023 - Aug 2023
• Constructed a dataset contrasting 1k GPT-4-generated and human-written NeurIPS reviews, in collaboration with 30 students.
• Analyzed the effects of increasing and decreasing sparsity, and of imputation, on a matrix factorization (collaborative filtering) algorithm.
Devised Hybrid Approach for Cost-Effective & High-Quality 3D Reconstructions with Point Cloud GitHub May 2023 - Jun 2023
• Transformed images into 3D point clouds by fine-tuning deep learning models (MVSNet, Monodepth, ZoeDepth) on outdoor and self-captured drone data, and improved point-cloud density and quality using diffusion and a NeRF-based 3D reconstruction system.
Analyze NYC Taxi Trips to Understand Demand & Optimize Revenue GitHub, Blog Feb 2023 - May 2023
• Conducted large-scale exploratory data analysis on 75GB+ of NYC Taxi & Limousine Commission trip data (2018-2021) with Spark SQL and BigQuery, identifying key demand and revenue patterns for taxi companies by integrating trip and location data.
• Performed data cleaning, feature engineering, and predictive modeling using Spark MLlib models (Linear Regression, GBT Regressor, Random Forest) to validate EDA findings and extract additional insights through feature importance analysis.
ADDITIONAL EXPERIENCES
Volunteer Data Scientist – Bright Mind Enrichment & Schooling (501(c)(3) Nonprofit) Dec 2024 - Mar 2025
• Co-led and coordinated a team of 100+ volunteers. Oversaw weekly data collection pipelines analyzing the philanthropic activities of 50+ organizations. Supported grant outreach with rule-based and GenAI-generated emails; ran A/B tests to evaluate and improve engagement.
Project Lead – MassMutual Data Days for Good 2023 (Competition) GitHub, Blog Jun 2023
• Led a team of six to analyze STEM and CS education disparities across MA school districts using socio-economic data.
• Conducted EDA to identify correlations and assess NGO impact on STEM education outcomes across districts for the client, NCF.
MS Student Ambassador – Electrical & Computer Engineering, Boston University College of Engineering July 2023 - Jan 2024
• Drove departmental initiatives and mentored 30+ peers on academics and research; received the 2023 MS Community Award.
Vice President – Student Association of Graduate Engineers (SAGE), Boston University May 2023 - Jan 2024
• Led cross-disciplinary events with representatives and board members to enrich the MS/PhD student experience and foster inclusivity.
Research Contributor – ML Reproducibility Challenge 2020 (Papers with Code) Code, Report, Experiments Jan 2021 - May 2021
• Represented the fast.ai open-source community in reproducing the memory and compute claims of the ICLR 2020 Reformer paper.