
Data Engineering Intern

Location:
Syracuse, NY
Posted:
July 15, 2025


Gnyani Enugandula

New York | *******@*****.*** | +1-315-***-**** | LinkedIn | Github | Portfolio

EDUCATION

Master’s in Information Systems, Syracuse University, Syracuse, NY (Aug 2023 – May 2025)
Bachelor’s in Electronics Engineering, Mumbai University, Navi Mumbai, Maharashtra (Aug 2019 – May 2023)

TECHNICAL SKILLS

Languages & Scripting: Python, SQL, R, PySpark, KornShell (KSH)
Data Testing & Validation: Pytest, Schema Drift Detection, Data Profiling, QA Automation
ETL & Data Engineering: Databricks, Delta Lake, Apache Airflow, SSIS, REST APIs
CI/CD & DevOps: GitHub Actions, Jenkins, Docker, Power Automate
Cloud Platforms & Databases: AWS (S3, EC2, Lambda), Azure, Redshift, PostgreSQL, MySQL, SQL Server, Snowflake
BI & Visualization: Power BI, Tableau, Looker, Excel, Streamlit

PROFESSIONAL EXPERIENCE

Data Engineering Intern, RELI Group Inc., Syracuse, New York | Feb 2025 – Present

• Engineered a scheduled pipeline that extracts RFPs via Govwin API, applies LLM-based classification, and generates daily reports, reducing manual review effort by 70% and improving opportunity discovery.

• Built validation scripts in Python and SQL to detect schema drift, missing values, and anomalies across multi-source ingestion pipelines, improving data reliability and increasing early defect detection by 40%.
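
The kind of schema-drift check described above can be sketched as a comparison between a stored baseline schema and an incoming batch. This is a minimal illustrative version; the column names and dtype strings are made up, not taken from the actual pipelines.

```python
# Minimal schema-drift check: compare an incoming batch's columns and types
# against a stored baseline. Schemas here are illustrative stand-ins.

def detect_schema_drift(baseline: dict, incoming: dict) -> dict:
    """Return added, missing, and type-changed columns between two schemas,
    each given as {column_name: dtype_string}."""
    added = sorted(set(incoming) - set(baseline))
    missing = sorted(set(baseline) - set(incoming))
    changed = sorted(
        col for col in set(baseline) & set(incoming)
        if baseline[col] != incoming[col]
    )
    return {"added": added, "missing": missing, "type_changed": changed}

baseline = {"id": "int", "name": "str", "amount": "float"}
incoming = {"id": "int", "name": "str", "amount": "str", "region": "str"}
drift = detect_schema_drift(baseline, incoming)
# flags "region" as a new column and "amount" as a type change
```

A real multi-source pipeline would persist the baseline per source and alert when any of the three lists is non-empty.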

• Embedded RAG-based retrieval logic into the CapMatrix tool to auto-score SOW alignment with 200+ contracts, achieving 85% confidence in capability matching for strategic bids.

• Automated SharePoint data pipelines with Power Automate, reducing weekly reporting turnaround from 2 hours to 10 minutes, and standardizing output formats across stakeholder teams.

• Built internal QA tools, including a contract formatter and metadata validation suite, to ensure schema consistency, traceability, and audit-readiness across the proposal development lifecycle.

AI Researcher, NEXIS Lab, Syracuse University, Syracuse, New York | Feb 2025 – May 2025

• Designed QA workflows to validate BERT-based model outputs across 10,000+ social media posts, achieving 92% classification accuracy and reducing manual tagging by 60%.

• Built end-to-end testing pipelines for sentiment labels and keyword tagging, improving traceability and reproducibility across multiple annotation cycles and data refreshes.

• Developed a real-time QA dashboard in Streamlit to monitor prediction shifts, sentiment drift, and annotation mismatches, reducing error review time by 40%.
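
One way to monitor the prediction shifts and sentiment drift mentioned above is to compare predicted-label distributions between batches. The sketch below uses total variation distance; the labels and the 0.1 alert threshold are assumptions for illustration, not the lab's actual configuration.

```python
from collections import Counter

# Illustrative drift check: compare predicted-label distributions between
# two batches and flag when they diverge beyond a threshold.

def label_distribution(labels):
    total = len(labels)
    counts = Counter(labels)
    return {label: counts[label] / total for label in counts}

def sentiment_drift(old_labels, new_labels):
    """Total variation distance between two label distributions (0 = identical)."""
    p, q = label_distribution(old_labels), label_distribution(new_labels)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

old = ["pos"] * 60 + ["neg"] * 40
new = ["pos"] * 30 + ["neg"] * 70
drift = sentiment_drift(old, new)   # 0.3
flagged = drift > 0.1               # distribution has shifted
```

A dashboard would compute this per refresh and surface flagged batches for annotation review.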

Research Fellow, Ballotpedia, Syracuse, New York | Jan 2025 – Mar 2025

• Automated quality assurance on 1,000+ electoral records using Python, SQL, and R, increasing data reliability from 75% to 95% while aligning output to public data standards.

• Built scalable ingestion pipelines for cleansing raw electoral data and added exception tagging logic to prioritize anomalies for manual review by election data researchers.

• Applied XGBoost-based outlier detection models to flag inconsistent records and reduce overall manual QA effort by 30%, improving early anomaly detection across datasets.

• Maintained audit logs, validation checklists, and test documentation to ensure traceability, reproducibility, and regulatory compliance in public-facing electoral datasets.
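
The record validation and exception tagging described above can be sketched as rule-based checks over each record, with failures tagged for manual review. The field names and rules below are hypothetical, not Ballotpedia's actual schema.

```python
# Rule-based record validation with exception tagging. A record passing
# all rules is "clean"; otherwise it is tagged with the rules it failed.

RULES = {
    "candidate": lambda v: bool(v and str(v).strip()),
    "votes": lambda v: isinstance(v, int) and v >= 0,
    "year": lambda v: isinstance(v, int) and 1900 <= v <= 2100,
}

def validate_records(records):
    """Split records into (clean, exceptions); exceptions carry a _failed tag."""
    clean, exceptions = [], []
    for rec in records:
        failures = [field for field, ok in RULES.items() if not ok(rec.get(field))]
        if failures:
            exceptions.append({**rec, "_failed": failures})
        else:
            clean.append(rec)
    return clean, exceptions

records = [
    {"candidate": "A. Smith", "votes": 1200, "year": 2024},
    {"candidate": "", "votes": -5, "year": 2024},
]
clean, exceptions = validate_records(records)
reliability = len(clean) / len(records)   # 0.5 for this toy batch
```

The reliability figure in the bullet above would be this ratio computed over the full dataset before and after cleansing.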

PROJECTS

Past Performance Matcher | Python, RAG, Sentence Transformers, Vector DB, Cosine Similarity | Mar 2025 – Present

• Built a RAG-based engine to match RFPs with prior contracts using vector embeddings and cosine similarity, achieving 89% accuracy in top-3 capability matches.

• Implemented a scoring pipeline with context-aware keyword filters, exception tagging, and QA thresholds to align with proposal vetting and reviewer workflows.

• Logged match results, false positives, and feedback for tuning thresholds and generating audit-ready logs to support decision traceability and iterative performance improvements.
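
The core matching step can be sketched as cosine-similarity ranking over embeddings. In the real project the vectors would come from a sentence-transformer model and live in a vector DB; the hand-made three-dimensional vectors and contract IDs below are stand-ins.

```python
from math import sqrt

# Toy embedding-based matcher: rank contracts by cosine similarity to an
# RFP vector and keep the top k. Vectors here are illustrative stand-ins.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(rfp_vec, contracts, k=3):
    """contracts: {contract_id: embedding}. Returns top-k (id, score) pairs."""
    scored = [(cid, cosine(rfp_vec, vec)) for cid, vec in contracts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

contracts = {
    "C-101": [0.9, 0.1, 0.0],
    "C-102": [0.1, 0.9, 0.0],
    "C-103": [0.7, 0.3, 0.1],
}
matches = top_matches([1.0, 0.0, 0.0], contracts)
# C-101 ranks first: its embedding points closest to the query vector
```

Keyword filters and QA thresholds, as described above, would then prune low-confidence matches before reviewers see them.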

Databricks ETL Validation System | PySpark, SQL, Delta Lake, Git, Databricks | Feb 2025 – May 2025

• Developed a scalable test automation system using PySpark and SQL to validate data transformations and schema consistency across Delta Lake tables in Databricks.

• Designed reusable test scripts integrated into Databricks Workflows and GitHub Actions, enabling CI/CD pipeline support for distributed data quality checks.

• Improved defect identification across ingestion and transformation layers by 40%, increasing confidence in production data pipelines.
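
A framework-agnostic sketch of the kinds of checks such a validation system runs is below; the actual implementation would operate on PySpark DataFrames over Delta tables, while this version uses plain lists of dicts so the logic is visible.

```python
# Two representative ETL validation checks: row-count reconciliation
# between source and transformed tables, and required-column null checks.

def check_row_count(source_rows, target_rows, tolerance=0.0):
    """Pass if the transformed table lost no more rows than tolerance allows."""
    lost = len(source_rows) - len(target_rows)
    return lost <= tolerance * len(source_rows)

def check_no_nulls(rows, columns):
    """Pass if every required column is present and non-null in every row."""
    return all(row.get(col) is not None for row in rows for col in columns)

source = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}, {"id": 3, "amt": None}]
target = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 5.5}]  # null row dropped

assert check_row_count(source, target, tolerance=0.5)  # loss within tolerance
assert check_no_nulls(target, ["id", "amt"])           # transformed table is clean
assert not check_no_nulls(source, ["amt"])             # source has a null to catch
```

Wired into Databricks Workflows or GitHub Actions, failing assertions like these would fail the pipeline run before bad data reaches production tables.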


