Data Engineer

Location:

Princeton, NJ

Posted:

February 21, 2025

Resume:

Sona Krishnan

Plainsboro, New Jersey **********@*****.*** +1-609-***-**** linkedin github

Education

University of Illinois at Urbana-Champaign

BS in Statistics and Computer Science GPA: 3.53

Aug 2021 – May 2025

Cornell University

Master of Engineering in Computer Science (Part-Time)

Aug 2025 – May 2027

New York City, New York

• Relevant Coursework: Data Structures, Numerical Methods, Intro to Algs & Models of Comp, Intro to Computer Systems, Database Systems, Compilers, Data Mining, Artificial Intelligence, Statistical Modeling I & II.

Work Experience

InvisiblCloud Remote

Data Engineer – GenAI & Platform

Jan 2024 – Present

• Built and managed data pipelines using Apache Airflow, orchestrating event-driven and scheduled workflows for multi-omics data processing.

• Designed Airflow DAGs for ETL pipelines, integrating PythonOperator, KubernetesPodOperator, and S3KeySensor to automate ML model training.

• Orchestrated AI/ML workloads on multi-cloud Kubernetes clusters, using Terraform for infrastructure provisioning.

• Implemented Kubernetes Horizontal and Vertical Pod Autoscalers (HPA and VPA) to dynamically adjust AI model execution resources based on workload demand.

• Configured Ingress controllers for load balancing AI inference requests and ensured secure communication between microservices.

• Developed observability pipelines using Prometheus and Grafana to monitor DAG execution times and pod metrics.

Northrop Grumman Palmdale, California

Data Science Intern

Jun 2023 – Aug 2023

• Designed a material selection recommendation system, increasing engineers’ decision-making speed by 15%.

• Built machine learning models with Keras and Scikit-Learn, achieving 90% accuracy through cross-validation techniques.

• Analyzed key performance indicators through EDA and implemented targeted optimizations reducing error rates by 25%.

Verde Finance Remote

Software Developer Intern

May 2022 – May 2023

• Built ETL pipelines with AWS Glue, reducing data processing time and ensuring seamless data integration.

• Developed financial scoring algorithms, achieving 85% accuracy compared to competitor benchmarks.

• Created serverless back-end systems using AWS Lambda and DynamoDB, improving real-time data processing efficiency.

Projects

Personalized Workout Scheduler SQL, Node.js, React.js GitHub

• Built a full-stack fitness application hosted on Google Cloud Platform with RESTful APIs to enable real-time CRUD operations for personalized workout plans, reducing user task completion time.

• Implemented efficient query optimization and state management for integration between front-end and back-end.

Custom Dynamic Memory Allocator C GitHub

• Developed a custom memory management system in C, focusing on performance and efficient memory usage.

Cardiovascular Risk Prediction Model Python, Scikit-Learn, Pandas GitHub

• Designed and implemented supervised learning models, including SVM, Logistic Regression, and Random Forest, achieving 85.8% accuracy in predicting cardiovascular disease risk.

• Addressed class imbalance using SMOTE and optimized model performance with GridSearchCV.

• Evaluated models with metrics like ROC-AUC, F1-score, and confusion matrices.

Skills

Languages: Python, C, C++, Java, R, SQL, NoSQL, HTML, CSS, JavaScript, C#

Tools & Libraries: AWS (Bedrock, Lambda, Glue, DynamoDB), RAG, Azure, React.js, GraphQL, Flask, Apache Airflow, Spark, Angular.js, Scikit-Learn, PostgreSQL, MongoDB, Node.js, Neo4j, Elasticsearch, LLM Guardrails, Terraform

Certifications: AWS Certified Cloud Practitioner, IBM Professional Machine Learning Certificate
