Gnyani Enugandula
New York | *******@*****.*** | +1-315-***-**** | LinkedIn | GitHub | Portfolio
EDUCATION
Master’s in Information Systems, Syracuse University, Syracuse, NY | Aug 2023 – May 2025
Bachelor’s in Electronics Engineering, Mumbai University, Navi Mumbai, Maharashtra | Aug 2019 – May 2023
TECHNICAL SKILLS
Languages & Scripting: Python, SQL, R, PySpark, KornShell (KSH)
Data Testing & Validation: Pytest, Schema Drift Detection, Data Profiling, QA Automation
ETL & Data Engineering: Databricks, Delta Lake, Apache Airflow, SSIS, REST APIs
CI/CD & DevOps: GitHub Actions, Jenkins, Docker, Power Automate
Cloud Platforms & Databases: AWS (S3, EC2, Lambda), Azure, Redshift, PostgreSQL, MySQL, SQL Server, Snowflake
BI & Visualization: Power BI, Tableau, Looker, Excel, Streamlit
PROFESSIONAL EXPERIENCE
Data Engineering Intern, RELI Group Inc., Syracuse, New York | Feb 2025 – Present
• Engineered a scheduled pipeline that extracts RFPs via the GovWin API, applies LLM-based classification, and generates daily reports, reducing manual review effort by 70% and improving opportunity discovery.
• Built validation scripts in Python and SQL to detect schema drift, missing values, and anomalies across multi-source ingestion pipelines, improving data reliability and increasing early defect detection by 40%.
• Embedded RAG-based retrieval logic into the CapMatrix tool to auto-score statement-of-work (SOW) alignment against 200+ contracts, achieving 85% confidence in capability matching for strategic bids.
• Automated SharePoint data pipelines with Power Automate, reducing weekly reporting turnaround from 2 hours to 10 minutes, and standardizing output formats across stakeholder teams.
• Built internal QA tools, including a contract formatter and metadata validation suite, to ensure schema consistency, traceability, and audit-readiness across the proposal development lifecycle.
AI Researcher, NEXIS Lab, Syracuse University, Syracuse, New York | Feb 2025 – May 2025
• Designed QA workflows to validate BERT-based model outputs across 10,000+ social media posts, achieving 92% classification accuracy and reducing manual tagging by 60%.
• Built end-to-end testing pipelines for sentiment labels and keyword tagging, improving traceability and reproducibility across multiple annotation cycles and data refreshes.
• Developed a real-time QA dashboard in Streamlit to monitor prediction shifts, sentiment drift, and annotation mismatches, reducing error review time by 40%.
Research Fellow, Ballotpedia, Syracuse, New York | Jan 2025 – Mar 2025
• Automated quality assurance on 1,000+ electoral records using Python, SQL, and R, increasing data reliability from 75% to 95% while aligning output to public data standards.
• Built scalable ingestion pipelines for cleansing raw electoral data and added exception tagging logic to prioritize anomalies for manual review by election data researchers.
• Applied XGBoost-based outlier detection models to flag inconsistent records and reduce overall manual QA effort by 30%, improving early anomaly detection across datasets.
• Maintained audit logs, validation checklists, and test documentation to ensure traceability, reproducibility, and regulatory compliance in public-facing electoral datasets.
PROJECTS
Past Performance Matcher | Python, RAG, Sentence Transformers, Vector DB, Cosine Similarity | Mar 2025 – Present
• Built a RAG-based engine to match RFPs with prior contracts using vector embeddings and cosine similarity, achieving 89% accuracy in top-3 capability matches.
• Implemented a scoring pipeline with context-aware keyword filters, exception tagging, and QA thresholds to align with proposal vetting and reviewer workflows.
• Logged match results, false positives, and feedback to tune thresholds and generate audit-ready logs, supporting decision traceability and iterative performance improvements.
Databricks ETL Validation System | PySpark, SQL, Delta Lake, Git, Databricks | Feb 2025 – May 2025
• Developed a scalable test automation system using PySpark and SQL to validate data transformations and schema consistency across Delta Lake tables in Databricks.
• Designed reusable test scripts integrated into Databricks Workflows and GitHub Actions, enabling CI/CD pipeline support for distributed data quality checks.
• Improved defect identification across ingestion and transformation layers by 40%, increasing confidence in production data pipelines.