Post Job Free

Junior Data Engineer Data Quality & Analytics

Location:
San Jose, CA
Salary:
100k
Posted:
February 23, 2026

Contact this candidate

Resume:

Vinaykumar Goud, Data Engineer

Spark | SQL | Python | Snowflake | Airflow | Kafka | +1-913-***-****

***********************@*****.*** | LinkedIn | GitHub | Portfolio

Professional Summary:

Data Engineer with 2+ years of experience building scalable ETL pipelines using SQL, Spark/PySpark, Kafka, and cloud platforms. Experienced in processing high-volume datasets, improving data reliability, and delivering scalable analytics-ready data products. Currently developing an intelligent agent layer on top of my TradeLoop ETL pipeline to enable automated monitoring and smarter workflow orchestration.

Technical Skills:

Programming/Scripting Languages: Python, PySpark, SQL

Databases & Data Warehouses: Amazon DocumentDB, MongoDB, Snowflake, Redshift, BigQuery, PostgreSQL, MySQL

Big Data Frameworks & Tools: Hadoop, Spark, AWS EMR, AWS Glue, Airflow, Databricks, dbt, Kafka

Cloud Platforms: AWS (Lambda, VPC, Athena, Redshift, S3, EC2, EMR, Glue, Kinesis, DMS, RDS, SageMaker, QuickSight); Azure (Data Factory, Synapse Analytics, Databricks, ADLS, Power BI)

Visualization & BI: Power BI, Tableau, Qlik Sense

Data Modeling & Architecture: Dimensional Modeling, Lakehouse Architecture

Generative AI & LLMs: LangChain, Vector Databases (FAISS), LLM Embeddings, Prompt Engineering

Others: Azure DevOps, GitHub Actions, Airflow, Fivetran, Terraform, CI/CD

Certifications:

AWS Certified Solutions Architect – Associate (SAA-C03), 2024

Oracle Certification – 1Z0-922: MySQL Implementation Specialist (2025)

Education:

Master's in Computer Science

University of Central Missouri, USA August 2023 – May 2025

Bachelor's in Electronics & Communication Engineering

MLR Institute of Technology and Management, India | August 2017 – May 2021

Experience:

Quant Dynamics Solution Inc, Santa Clara | Jan 2025 – Present

Role: Data Engineer

Led the end-to-end architecture and delivery of a proof of concept for a cloud-native analytics platform, translating complex client requirements into scalable AWS-based solutions.

Reduced internal ETL job runtimes from 90 minutes to 5 minutes by replacing a single-threaded JDBC extraction with an Athena UNLOAD approach, significantly improving efficiency.

Designed and documented high- and low-level architectures incorporating KMS and VPC private endpoints, ensuring 100% compliance with data security governance standards while promoting asset reuse through a centralized data catalog.

Designed and managed an S3-based data lakehouse using Databricks and Delta Lake, processing 1TB+ of daily event data and applying Delta-native features such as Z-Order indexing and file coalescing to solve the small-file problem.

Implemented a write-audit-publish data quality framework to ensure data correctness, completeness, and reliability, preventing >95% of bad data from reaching downstream consumers and establishing trusted analytics datasets for the business.
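The write-audit-publish pattern mentioned above can be sketched in plain Python. This is an illustrative stand-in, not the production framework: the record fields (`order_id`, `amount`) and the staging/quarantine structures are assumptions chosen to show the stage → audit → publish flow.

```python
# Minimal write-audit-publish sketch: records land in a staging area,
# pass an audit, and only audited rows are published to consumers.

def audit_record(rec):
    """Correctness/completeness checks; returns a list of failures."""
    errors = []
    if rec.get("order_id") is None:
        errors.append("missing order_id")
    amount = rec.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("invalid amount")
    return errors

def write_audit_publish(batch, published, quarantine):
    staged = list(batch)                                  # 1. write to staging
    good = [r for r in staged if not audit_record(r)]     # 2. audit
    bad = [r for r in staged if audit_record(r)]
    quarantine.extend(bad)                                # bad rows never reach consumers
    published.extend(good)                                # 3. publish only audited rows
    return len(good), len(bad)

published, quarantine = [], []
ok, rejected = write_audit_publish(
    [{"order_id": 1, "amount": 9.5}, {"order_id": None, "amount": -2}],
    published, quarantine,
)
# ok == 1, rejected == 1: the malformed record is quarantined, not published
```

The key property is that consumers only ever see the `published` collection, so a bad batch fails loudly in the audit step instead of silently corrupting downstream tables.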

Implemented CI/CD pipelines using GitHub Actions to automate data ingestion and Snowflake table creation via Fivetran, reducing manual effort by 50%.

Engineered robust Airflow DAGs with custom Python validation checks to monitor cloud billing and usage patterns.
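Validation logic of this kind is usually kept Airflow-agnostic so it can be unit-tested outside the scheduler. A hedged sketch of a billing-spike check an Airflow task might call (the threshold, field shape, and function name are made up for illustration; in a real DAG this would run inside a PythonOperator):

```python
# Billing-anomaly check of the kind an Airflow task might invoke.
# Kept as a plain function so the logic stands alone and is testable.

def check_billing(daily_costs, spike_factor=2.0):
    """Flag days whose spend exceeds spike_factor x the running average."""
    alerts = []
    for i, cost in enumerate(daily_costs):
        prior = daily_costs[:i]
        if prior:
            baseline = sum(prior) / len(prior)
            if cost > spike_factor * baseline:
                alerts.append((i, cost, baseline))
    return alerts

# Day 3 spends ~4x the running average, so it is flagged.
alerts = check_billing([100.0, 110.0, 105.0, 400.0])
```

Separating the check from the operator means the DAG stays a thin orchestration layer and the validation rules can evolve independently.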

Delivered high-impact data visualizations and dashboards for executives, presenting highly aggregated trends, simple charts, and intuitive KPIs, enabling >30% faster data-driven decision-making and maximizing business impact.

Cognizant Technology Solutions Corporation

Clorox, Hyderabad, India | July 2022 – Aug 2023

Role: ETL Developer

Optimized SAP ERP operations with big data integrations; streamlined PO deletions, workflow corrections, and IDoc reprocessing by integrating ETL pipelines, improving business continuity by 98%.

Integrated document-based data from MongoDB into centralized ETL pipelines, transforming complex JSON structures for downstream analytics in Snowflake.

Enhanced data pipeline reliability, achieving 99.5% data transmission reliability with proactive monitoring, automated failure recovery, and orchestration using Apache Airflow, Kafka, and AWS Step Functions.

Cognizant Technology Solutions Corporation

Walgreens, Hyderabad, India | July 2021 – June 2022

Role: EDI Analyst

Actively monitored and resolved EDI transactions (850, 855, 856, 810), identifying errors and ensuring smooth data flow.

Project:

Trade-Loop: Real-Time Marketplace ETL Pipeline

I built the Trade-Loop marketplace app from scratch specifically to generate real-world data flows—giving firsthand visibility into how upstream data behaves. This hands-on approach is the most practical way to master data engineering, and serves as a valuable learning resource for freshers entering the field.

Built an end-to-end data engineering solution for a marketplace application, enabling real-time and batch data ingestion, transformation, and analytics. The pipeline processes user activity, product listings, and transactions through Kafka streams, transforms data using Spark, and loads it into Snowflake for BI consumption, all orchestrated via Airflow in Docker.
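The transform stage of a Kafka-to-warehouse pipeline like this can be sketched without a Spark cluster. The pure-Python stand-in below shows the shape of that step: parse JSON payloads off the stream, drop malformed or incomplete events, and emit flat rows ready to load into a warehouse table. The event fields and defaults are assumptions for illustration, not the actual TradeLoop schema.

```python
import json

# Stand-in for the Spark transform step: Kafka delivers JSON-encoded
# event payloads; the transform parses them, filters bad records, and
# emits flat rows ready for a Snowflake-style table load.

def transform_events(raw_messages):
    rows = []
    for msg in raw_messages:
        try:
            event = json.loads(msg)
        except json.JSONDecodeError:
            continue  # skip malformed payloads
        if "user_id" not in event or "event_type" not in event:
            continue  # skip incomplete events
        rows.append({
            "user_id": event["user_id"],
            "event_type": event["event_type"],
            "amount": float(event.get("amount", 0.0)),
        })
    return rows

rows = transform_events([
    '{"user_id": 7, "event_type": "purchase", "amount": "19.99"}',
    'not json',
    '{"user_id": 8}',
])
# Only the first message survives: one flat, typed row
```

In the real pipeline the same parse/filter/flatten logic would live in a Spark job reading from Kafka, with the schema enforced up front rather than per-record; the sketch only illustrates the data flow.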


