Data Engineer / Data Scientist

Location:
Austin, TX
Posted:
October 15, 2025

Resume:

Aadarsh Gaikwad

****************@*****.*** | 978-***-**** | GitHub | LinkedIn | Portfolio

Education

Northeastern University, Boston, MA

Master's in Data Science, Jan 2024 – May 2025

Experience

J.P. Morgan Chase & Co

Data Scientist Apr 2023 – Dec 2023

• Migrated credit card campaign analysis from legacy Oracle systems to Databricks, processing 500K+ customer transaction records using PySpark to support promotional targeting decisions, enabling 3x faster campaign scoring runs (see the first sketch below).

• Built customer propensity models using PySpark MLlib and XGBoost, managed features in Databricks Feature Store, and tracked model versions with MLflow, improving prediction accuracy and driving campaign ROI for marketing teams (see the second sketch below).

• Developed campaign performance pipeline with Unity Catalog and Delta Tables, centralizing metrics for 3 marketing teams and eliminating manual quarterly reports.

• Shipped lightweight APIs with Node.js and DynamoDB to trigger Databricks scoring jobs, reducing campaign deployment cycles from days to hours (see the third sketch below).
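
The bullets above reference three sketches. First, a minimal PySpark illustration of the Oracle-to-Databricks migration pattern; the JDBC host, credentials, and table/column names are placeholder assumptions, not actual J.P. Morgan systems:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("campaign-scoring").getOrCreate()

    # Pull legacy transactions over JDBC (host, schema, credentials are placeholders).
    txns = (spark.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
            .option("dbtable", "CAMPAIGN.CARD_TRANSACTIONS")
            .option("user", "etl_user").option("password", "...")
            .load())

    # Aggregate to the customer grain used for promotional targeting.
    features = (txns
        .filter(F.col("txn_date") >= F.add_months(F.current_date(), -6))
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("spend_6m"), F.count("*").alias("txn_cnt_6m")))

    # Land as a Delta table so scoring jobs read from Databricks, not Oracle.
    features.write.format("delta").mode("overwrite").saveAsTable("marketing.campaign_features")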
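Second, the MLflow-tracked training loop, with synthetic data standing in for the real propensity features; MLflow's sklearn flavor logs each run as a model version:

    import mlflow
    import mlflow.sklearn
    import numpy as np
    import xgboost as xgb
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the real propensity features and labels.
    X = np.random.rand(5000, 10)
    y = (X[:, 0] + 0.2 * np.random.rand(5000) > 0.6).astype(int)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    with mlflow.start_run(run_name="propensity-xgb"):
        model = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05)
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        mlflow.log_params({"n_estimators": 300, "max_depth": 6, "learning_rate": 0.05})
        mlflow.log_metric("val_auc", auc)
        mlflow.sklearn.log_model(model, "model")  # tracked, versioned artifact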
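Third, triggering a scoring job. The bullet describes a Node.js service; for consistency with the other sketches this uses Python against the Databricks Jobs 2.1 REST API, with host, token, and job id as placeholders:

    import requests

    DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
    TOKEN = "..."    # personal access token, from a secrets store in practice
    JOB_ID = 123     # hypothetical scoring job

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": JOB_ID, "notebook_params": {"campaign_id": "Q3-cashback"}},
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])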

Data Engineer Mar 2022 – Apr 2023

• Built attribution data pipelines processing 3+ TB of cross-channel customer data using AWS Step Functions to orchestrate EMR, Lambda, and Glue ETL jobs, enabling marketing teams to track campaign performance across display ads and web (see the first sketch below).

• Optimized PySpark jobs and tuned EMR clusters with Spot Instances, reducing compute costs by $50K annually while maintaining sub-2-hour processing windows for executive dashboards (see the second sketch below).

• Designed Snowflake data warehouse using Snowpipe, clustering keys, and materialized views for real-time attribution reporting, replacing legacy MySQL database (see the third sketch below).

• Automated CI/CD using Docker, Terraform, and GitHub Actions with blue-green deployment strategy, reducing production release times and minimizing downtime risk.
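
The bullets above reference three sketches. First, a minimal boto3 sketch of starting the Step Functions orchestration; the state machine ARN and input payload are hypothetical:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # The state machine (ARN is hypothetical) chains the EMR, Lambda, and Glue
    # states; this call just starts one dated run.
    resp = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:attribution-etl",
        input=json.dumps({"run_date": "2023-01-15", "channels": ["display", "web"]}),
    )
    print(resp["executionArn"])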
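Second, illustrative Spark-side tuning for the nightly job; exact values were workload-specific, and Spot capacity itself is configured on the EMR instance fleets rather than in Spark:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("attribution-nightly")
        # Right-size shuffles for the multi-TB input instead of the 200 default.
        .config("spark.sql.shuffle.partitions", "400")
        # Let executor count scale with stage demand across the Spot fleet.
        .config("spark.dynamicAllocation.enabled", "true")
        # Adaptive execution re-plans joins and partitions at runtime.
        .config("spark.sql.adaptive.enabled", "true")
        .getOrCreate())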
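Third, the Snowflake objects named above, sketched through the Python connector; connection parameters and object names are placeholders:

    import snowflake.connector

    conn = snowflake.connector.connect(account="...", user="...", password="...",
                                       warehouse="ETL_WH", database="MARKETING")
    cur = conn.cursor()

    # Continuous micro-batch loads from the S3 stage via Snowpipe.
    cur.execute("""
        CREATE PIPE IF NOT EXISTS attribution_pipe AUTO_INGEST = TRUE AS
        COPY INTO attribution_events FROM @attribution_stage
        FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)

    # Clustering key for partition pruning, plus a pre-aggregated reporting view.
    cur.execute("ALTER TABLE attribution_events CLUSTER BY (event_date, channel)")
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_attribution AS
        SELECT event_date, channel, SUM(conversions) AS conversions
        FROM attribution_events GROUP BY event_date, channel
    """)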

Wells Fargo

Data Engineer Apr 2021 – Mar 2022

• Supported fraud detection initiatives by building SageMaker data pipelines that delivered curated feature datasets (transaction velocity, merchant patterns, location anomalies) from PostgreSQL OLTP databases to S3 data lake for ML teams.

• Implemented feature engineering using PySpark within SageMaker Processing, including one-hot encoding, scaling, and time-based aggregations for fraud models (see the first sketch below).

• Created data quality validation framework using Great Expectations, checking for nulls, outliers, and schema consistency in MySQL transaction tables, with CloudWatch alerting (see the second sketch below).

• Configured S3 Intelligent-Tiering to move infrequently accessed data between storage tiers, reducing storage costs by 20%.
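
The bullets above reference two sketches. First, a condensed PySpark sketch of the three feature steps (one-hot encoding, scaling, a rolling time-window aggregation); table and column names are assumed:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, StandardScaler

    spark = SparkSession.builder.appName("fraud-features").getOrCreate()
    txns = spark.table("curated.transactions")  # assumed curated input

    # Time-based aggregation: 1-hour rolling transaction velocity per card.
    w = (Window.partitionBy("card_id")
         .orderBy(F.col("txn_ts").cast("long"))
         .rangeBetween(-3600, 0))
    txns = txns.withColumn("txn_velocity_1h", F.count(F.lit(1)).over(w))

    # One-hot encode the merchant category, then scale the numeric features.
    stages = [
        StringIndexer(inputCol="merchant_category", outputCol="mcc_idx", handleInvalid="keep"),
        OneHotEncoder(inputCols=["mcc_idx"], outputCols=["mcc_vec"]),
        VectorAssembler(inputCols=["amount", "txn_velocity_1h"], outputCol="num_vec"),
        StandardScaler(inputCol="num_vec", outputCol="num_scaled"),
    ]
    features = Pipeline(stages=stages).fit(txns).transform(txns)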
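Second, a sketch of the validation checks using Great Expectations' classic pandas-style API (newer GE releases expose a different interface); the DSN and column names are placeholders:

    import boto3
    import pandas as pd
    import great_expectations as ge
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://etl:***@mysql-host/txndb")  # placeholder DSN
    df = pd.read_sql("SELECT * FROM transactions LIMIT 100000", engine)

    gdf = ge.from_pandas(df)
    gdf.expect_column_values_to_not_be_null("txn_id")
    gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)
    gdf.expect_table_columns_to_match_ordered_list(["txn_id", "card_id", "amount", "txn_ts"])

    result = gdf.validate()
    if not result["success"]:
        # Emit a metric a CloudWatch alarm can page on.
        boto3.client("cloudwatch").put_metric_data(
            Namespace="DataQuality",
            MetricData=[{"MetricName": "ValidationFailures", "Value": 1.0}],
        )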

Cognizant Technology Solutions

Data Engineer Jul 2018 – Apr 2021

• Engineered regulatory reporting pipelines processing 1 TB+/month of financial and compliance data from Oracle and SQL Server databases with PySpark and Hive ETL workflows, generating CECL/CCAR reports.

• Designed REST API microservices with FastAPI and SQLAlchemy to migrate data from MySQL OLTP systems to Hadoop, processing 5M+ records daily with audit trails (see the first sketch below).

• Enforced GDPR compliance using tokenization and column-level encryption, aligning with enterprise-wide data governance policies (see the second sketch below).

• Secured internal APIs using OAuth 2.0 and JWT, added rate limiting, and conducted testing using Postman and Pytest (see the third sketch below).

• Improved PySpark SQL query performance using broadcast joins, column pruning, and predicate pushdown techniques for nightly ETL processing (see the fourth sketch below).
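
The bullets above reference four sketches. First, a stripped-down sketch of the FastAPI + SQLAlchemy pattern; the DSN, table, and endpoint are hypothetical:

    from fastapi import Depends, FastAPI
    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import Session, sessionmaker

    engine = create_engine("mysql+pymysql://etl:***@mysql-host/oltp")  # placeholder DSN
    SessionLocal = sessionmaker(bind=engine)
    app = FastAPI()

    def get_db():
        db = SessionLocal()
        try:
            yield db
        finally:
            db.close()

    @app.get("/batches/{batch_id}")
    def read_batch(batch_id: int, db: Session = Depends(get_db)):
        rows = db.execute(text("SELECT * FROM txn_batches WHERE batch_id = :b"),
                          {"b": batch_id}).mappings().all()
        # In the real service, each served batch also wrote an audit-trail row.
        return {"batch_id": batch_id, "rows": len(rows)}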
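Second, one common way to implement deterministic tokenization in PySpark, assuming a salted-hash scheme (the bullet does not specify the exact method):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    customers = spark.table("raw.customers")  # assumed source table

    SALT = "..."  # in practice a key from a secrets manager, never a literal

    # Deterministic token: equal inputs map to equal tokens, so joins still
    # work downstream, but the raw identifier never leaves this job.
    tokenized = (customers
        .withColumn("email_token", F.sha2(F.concat(F.col("email"), F.lit(SALT)), 256))
        .drop("email"))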
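Third, JWT validation as a FastAPI dependency using PyJWT; the secret and route are placeholders, and rate limiting is noted but omitted:

    import jwt  # PyJWT
    from fastapi import Depends, FastAPI, Header, HTTPException

    app = FastAPI()
    SECRET = "..."  # placeholder; the real signing key lived in a vault

    def require_token(authorization: str = Header(...)):
        try:
            return jwt.decode(authorization.removeprefix("Bearer "),
                              SECRET, algorithms=["HS256"])
        except jwt.PyJWTError:
            raise HTTPException(status_code=401, detail="invalid token")

    @app.get("/internal/report")
    def report(claims: dict = Depends(require_token)):
        # Rate limiting sat in front of routes like this; omitted here.
        return {"user": claims.get("sub")}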
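Fourth, a compact example of broadcast joins with column pruning and predicate pushdown; paths and column names are illustrative:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    # Column pruning + predicate pushdown: select and filter before the join
    # so Spark reads only the needed Parquet columns and row groups.
    txns = (spark.read.parquet("s3://bucket/txns/")  # path illustrative
            .select("txn_id", "merchant_id", "amount")
            .filter(F.col("amount") > 0))

    merchants = (spark.read.parquet("s3://bucket/merchants/")
                 .select("merchant_id", "mcc"))

    # Broadcast the small dimension table to avoid shuffling the large side.
    joined = txns.join(broadcast(merchants), "merchant_id")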

Certifications

Amazon Web Services: AWS Certified Data Engineer – Associate [Link]

Projects

Research Assistant: Designed a retrieval-augmented generation (RAG) pipeline using AWS Bedrock (Llama-3-70B), LangChain, and ChromaDB to generate context-aware responses from historical support data and a knowledge base (see the sketch below).
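
A minimal sketch of that retrieval-then-generate flow; it calls Bedrock directly via boto3 rather than through LangChain, and the model id, documents, and prompt are assumptions:

    import json
    import boto3
    import chromadb

    # Index a couple of illustrative support snippets.
    col = chromadb.Client().create_collection("support_kb")
    col.add(ids=["t1", "t2"],
            documents=["Password reset requires ...", "Billing disputes are routed ..."])

    def answer(question: str) -> str:
        # Retrieve the closest snippets, then ground the generation on them.
        ctx = "\n".join(col.query(query_texts=[question], n_results=2)["documents"][0])
        prompt = f"Answer using only this context:\n{ctx}\n\nQuestion: {question}"
        resp = boto3.client("bedrock-runtime").invoke_model(
            modelId="meta.llama3-70b-instruct-v1:0",  # assumed Bedrock model id
            body=json.dumps({"prompt": prompt, "max_gen_len": 512}),
        )
        return json.loads(resp["body"].read())["generation"]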

Skills

ML & AI: LLM Fine-tuning (PEFT, QLoRA), Prompt Engineering, RAG, LangChain, AWS Bedrock, DSPy
Languages: Python, SQL, JavaScript/Node.js, Shell Scripting
Data & Cloud: AWS (SageMaker, EMR, Glue, Step Functions, Lambda, S3, Athena, CloudWatch), Databricks, Snowflake, BigQuery

Big Data: Spark/PySpark, Hadoop, Hive

DevOps: Docker, Terraform, GitHub Actions, Jenkins
Databases: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, DynamoDB, ChromaDB


