Data & GenAI Engineer with 7+ Years of Experience

Location: Chicago, IL
Posted: February 12, 2026

ROHAN CHICKALKAR

Chicago, IL 972-***-**** ***************@*****.*** linkedin.com/in/rohan-chickalkar

Professional Summary:

• Data & GenAI Engineer with 7+ years of experience designing large-scale ETL, data lake, and RAG systems across AWS, Azure, GCP, and Databricks, delivering pipelines that process 50M+ records across retail, finance, and healthcare.

• Built 20+ enterprise ETL workflows using Spark, Glue, Dataflow, Data Factory, and Delta Lake, enabling reliable ingestion of 10M–40M monthly operational and transactional events.

• Engineered 150+ ML features on Databricks using Spark, Delta Lake, and MLflow, supporting 5 enterprise credit-risk and fraud-detection models with full lineage and experiment tracking.

• Delivered 3 production RAG platforms using Vertex AI, Embeddings, Vector Search, and BigQuery, indexing 30K documents and supporting 1M+ annual mechanic and operations queries.

• Designed 12+ domain data models on BigQuery, Redshift, and Snowflake, optimizing analytics for more than 2TB monthly data growth and improving query speeds by 30%.

• Developed 6 high-throughput microservices using Cloud Run, Lambda, and Azure Functions, routing 500K+ daily events with sub-900 ms latency to support diagnostic and retail analytics workloads.

• Implemented 25+ data-quality rule sets detecting 30K+ anomalies monthly across regulated domains, improving downstream reliability for pipelines spanning 4 global regions.

• Automated 500K daily incremental updates using S3, Cloud Storage, and Delta Lake, ensuring SLA-bound delivery with 99.9%+ success across 8 business ingestion domains.

• Built 10 ML monitoring dashboards using BigQuery, MLflow, and Looker, tracking drift and quality metrics across 3K daily inference events for strict governance and audit needs.

• Collaborated with 25+ stakeholders across engineering, analytics, and ML teams to align RAG, ETL, and feature store logic with 15 compliance-bound business requirements.

Technical Skills:

• Programming Languages: Python, Java, Scala, SQL, R, Shell Scripting

• Big Data Technologies: Apache Spark, Hadoop, Hive, Sqoop, Pig, HBase, Kafka

• GenAI & LLM Technologies: LLMs, GPT, LLaMA, Prompt Engineering, RAG, Embedding Models, Vector Search, Vector DBs, LangChain, LangGraph, OpenAI API, HuggingFace, Transformers, Vertex AI GenAI Studio

• Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, EMR); Azure (Data Factory, Functions, Monitor); GCP (BigQuery, Dataflow, Composer, Vertex AI, Cloud Run)

• Data Modeling & Databases: OLTP/OLAP Modeling, Star/Snowflake Schema, Normalization, BigQuery, Snowflake, Redshift, Azure SQL, MySQL, PostgreSQL

• ETL Tools: Glue, Dataflow, Data Factory, Databricks Workflows/Jobs, Cloud Composer, Airflow

• Database Architectures: Data Lake, Lakehouse, Delta Lake, Data Warehouse, MPP Systems, Columnar Storage, Partitioning & Clustering

• Data Processing Libraries: PySpark, Pandas, NumPy, Delta Lake APIs, Beam (Dataflow), Spark SQL

• Machine Learning & Graph Processing: Scikit-learn, XGBoost, MLflow, Feature Stores, Vertex AI Training/Endpoints, GraphFrames, NetworkX

• BI & Reporting Tools: Power BI, Tableau, AWS QuickSight, Looker, BigQuery BI Engine

• APIs & Authentication: REST APIs, OAuth2, JWT, API Gateway, Cloud Endpoints, Secrets Manager, Key Management (KMS)

• Development Methodologies: Agile, Scrum, Kanban, Test-Driven Development (TDD), MLOps Best Practices

• CI/CD & Version Control: Git, GitHub, GitLab, Jenkins, Azure DevOps, Cloud Build, Docker

Professional Experience:

GenAI Engineer Toyota Motor Corp, USA Aug 2024 – Present

Project: GenAI Chat Assistant for Vehicle Diagnostics & Troubleshooting (Vertex AI)

• Designed a Vertex AI RAG pipeline processing 5M repair and diagnostic logs, enabling mechanics to query faults, symptoms, and resolutions using natural language.

• Generated 30K document embeddings using Vertex AI Embeddings and stored them in Vector Search, improving retrieval accuracy for 12K service terms (a minimal retrieval sketch follows this section).

• Ingested 2TB historical sensor and telematics data into BigQuery, powering LLM-backed troubleshooting recommendations.

• Built 6 Cloud Run microservices for query routing, context assembly, and response ranking, maintaining sub-900 ms latency.

• Created 18 prompt templates optimizing LLM responses for diagnostics, repair steps, error codes, and part replacements across 4 vehicle platforms.

• Constructed 10 data-quality rules detecting 20K malformed logs monthly before embedding ingestion.

• Implemented 3 multi-turn dialogue flows with Vertex AI Endpoints, improving continuity across 8 mechanic interaction scenarios.

• Automated daily refresh of 500K new service records via Cloud Composer, ensuring latest troubleshooting content remains indexed.

• Tuned vector similarity search for 1M queries/month, improving retrieval precision by 25% and reducing hallucination risk.

• Built 4 guardrail layers enforcing PHI-safe and safety-critical response boundaries with <1% rejection error rate.

• Logged 3K model inference events daily into BigQuery for drift monitoring and regulatory auditability.

• Collaborated with 9 service engineers to align LLM outputs with OEM diagnostic standards and 15 compliance constraints.
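For illustration, here is a minimal sketch of the embed-and-retrieve step behind the RAG work above, using the Vertex AI Python SDK. The project, region, embedding model version, index endpoint resource name, and deployed index ID are hypothetical placeholders, not details from this resume.

```python
from google.cloud import aiplatform
from vertexai.language_models import TextEmbeddingModel

# Hypothetical project and region, for illustration only.
aiplatform.init(project="my-project", location="us-central1")

def retrieve_context(question: str, k: int = 5) -> list[str]:
    # Embed the mechanic's natural-language question.
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    vector = model.get_embeddings([question])[0].values

    # Look up nearest neighbors in a deployed Vector Search index.
    endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name="projects/my-project/locations/us-central1/indexEndpoints/123"
    )
    matches = endpoint.find_neighbors(
        deployed_index_id="diagnostics_index",  # hypothetical deployed index ID
        queries=[vector],
        num_neighbors=k,
    )
    # Return matched document IDs for downstream prompt assembly.
    return [neighbor.id for neighbor in matches[0]]
```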

Data/ML Engineer American Express, USA May 2020 – July 2022

Project: Credit Risk Feature Engineering & Scoring Pipeline on Databricks

• Engineered 150 credit-risk features using Databricks, Spark, and Delta Lake, covering utilization, delinquency, and payment trends for 5 enterprise scoring models.

• Processed 40M daily account and transaction rows through Spark ETL pipelines with consistent sub-25 minute runtimes across 3 feature layers.

• Built 18 reusable feature notebooks orchestrated through Databricks Jobs, supporting daily and intraday model refresh cycles.

• Implemented 25 data-quality checks catching 30K monthly anomalies in balances, credit limits, and payment histories before landing in Delta Lake.

• Created 6 incremental merge workflows using Delta Lake MERGE INTO logic, updating 12M account records with <5-minute latency (a minimal merge sketch follows this section).

• Logged 3K model-training artifacts using MLflow, improving reproducibility and traceability for risk model experiments.

• Designed 10 time-window aggregations capturing 90-, 180-, and 360-day risk patterns for enhanced feature stability.

• Integrated 4 upstream financial systems delivering 8M daily events into curated Delta Lake gold tables.

• Tuned Spark partitions to handle 2TB monthly growth, producing 20% faster feature-generation cycles.

• Automated 5 validation reports summarizing feature drift, missingness, and distribution shifts across 3 environments.

• Optimized Spark joins for 50M row datasets, reducing compute cost by 30% using broadcast strategies.

• Collaborated with 7 data scientists to align feature logic with regulatory and model-governance criteria across 15 attributes.
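As a rough illustration of the MERGE-based incremental workflows referenced above, here is a minimal Delta Lake upsert sketch for Databricks. The table and column names are hypothetical examples, not values from this resume.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided as `spark` on Databricks

# Hypothetical tables: a curated gold table and a batch of account updates.
target = DeltaTable.forName(spark, "gold.account_features")
updates = spark.read.table("silver.account_feature_updates")

# Upsert in a single atomic MERGE: update existing accounts, insert new ones.
(
    target.alias("t")
    .merge(updates.alias("u"), "t.account_id = u.account_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```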

Data Engineer Amazon, India May 2020 – July 2022

Project: Retail Order Lifecycle Lakehouse on AWS

• Designed an AWS ETL pipeline ingesting 30M monthly order, shipment, and return events into S3, Glue, and Redshift, enabling unified visibility for 8 retail domains.

• Engineered 15 Glue jobs to standardize order status, cancellation codes, and delivery promises, delivering consistent sub-20 minute batch loads.

• Processed 500K daily incremental order updates using Lambda and S3 event triggers, ensuring accurate near-real-time lifecycle tracking (a minimal trigger sketch follows this section).

• Modeled 12 fact and 20 dimension tables in Redshift to support analytics teams querying 60M rows/day across fulfillment and CX workflows.

• Implemented reconciliation logic detecting 25K discrepancies monthly across OMS, transportation, and carrier systems.

• Built 6 SLA-bound pipelines in Glue maintaining 99.9% on-time delivery of curated datasets to downstream users.

• Automated 3 retry and fallback mechanisms handling 10K transient ingestion failures weekly without manual intervention.

• Tuned Redshift sort and dist keys for 2TB monthly data growth, improving query performance by 35%.

• Consolidated 40K daily shipments from 5 carrier feeds using standardized schemas and automated code mappings.

• Integrated 7 retail systems, including OMS, WMS, and CX tools, enabling consistent order lifecycle alignment across 4 regions.

• Added 25 data-quality rules catching 18K anomalies per cycle across address, item, and promise-level attributes.

• Partnered with 9 cross-functional teams to refine order-lifecycle logic and streamline delivery-exception reporting across 3 fulfillment networks.
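For illustration, a minimal sketch of the S3-event-triggered Lambda pattern mentioned above. The bucket layout, key format, and downstream handling are assumptions for the sketch, not details from this resume.

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Process order-status files as soon as they land in S3."""
    # One invocation can carry several ObjectCreated records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        update = json.loads(body)
        # ... validate the update and write it to the curated layer ...
```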

Data Engineer Cognizant, India Dec 2018 – Oct 2020

Project: Healthcare Claims Ingestion & Standardization on GCP

• Built a GCP ETL pipeline ingesting 20M monthly claims through Cloud Storage, Dataflow, and BigQuery, enabling curated outputs for 6 healthcare domains.

• Engineered 12 ingestion templates in Dataflow to standardize institutional, professional, and pharmacy claims with consistent sub-15 minute loads (a minimal pipeline sketch follows this section).

• Processed 500K daily JSON, CSV, and X12 files via Cloud Storage triggers, ensuring validated, PHI-safe downstream delivery.

• Designed 25 business-rule transformations in Dataflow to normalize diagnosis, procedure, and billing codes across 3 payer systems.

• Built 10 dimensional tables and 8 fact layers in BigQuery, supporting analytics teams querying 50M rows/day.

• Implemented automated anomaly checks detecting 30K data-quality exceptions monthly across eligibility, provider, and claims datasets.

• Integrated 4 external payer sources using standardized schemas, reducing onboarding effort by 40%.

• Automated 5 domain-specific reconciliation checks, resolving 8K mismatches per cycle.

• Tuned BigQuery partitions to handle 2TB monthly growth while achieving consistent 25% query performance gains.

• Established lineage for 600 assets using Data Catalog, improving audit visibility for HIPAA-governed workflows.

• Built 3 SLA-bound pipelines in Composer ensuring 99.9% on-time daily delivery.

• Partnered with 7 cross-functional teams to align claim-standardization logic with 15 regulatory attributes.
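A minimal Apache Beam (Dataflow) sketch of the kind of ingestion template described above. The bucket path, BigQuery table spec, and normalization rule are hypothetical examples, not values from this resume.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def normalize_codes(claim: dict) -> dict:
    # Hypothetical business rule: trim and upper-case diagnosis codes.
    claim["diagnosis_code"] = claim["diagnosis_code"].strip().upper()
    return claim

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://claims-landing/daily/*.json")
        | "Parse" >> beam.Map(json.loads)
        | "Normalize" >> beam.Map(normalize_codes)
        | "Load" >> beam.io.WriteToBigQuery(
            "my-project:claims.standardized",  # hypothetical existing table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```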

Python Developer Medline, India Mar 2017 – Nov 2019

Project: Claims Data Processing Pipeline on Azure

• Built an Azure ETL pipeline ingesting 10M monthly claims rows using Data Factory and Python, improving data readiness for analytic teams across 3 operational units.

• Developed 5 reusable Python modules for schema checks, null scans, and type enforcement, reducing manual QA cycles by 30% (a minimal validation sketch follows this section).

• Orchestrated 12 ingestion workflows in Azure Data Factory, enabling hourly refresh windows with consistent sub-20 minute runtimes.

• Implemented Azure Functions to auto-cleanse 50K daily malformed records, eliminating 2 recurring production defects.

• Optimized claim parsing logic to handle 100K complex nested JSON files, reducing pipeline latency by 40%.

• Created validation rules for 25 business attributes, catching 15K data-quality exceptions per month before landing.

• Automated audit logging across 4 stages using Azure Monitor, enabling traceability for 100% of pipeline executions.

• Developed a durable retry mechanism processing 5K transient failures weekly with zero manual intervention.

• Containerized Python jobs into 3 deployable units to streamline promotions across 2 environments.

• Tuned Azure SQL staging for 1M upserts/day via partitioning and index optimization, improving write throughput by 35%.

• Integrated 3 downstream systems through curated datasets refreshed every 4 hours.

• Coordinated with 6 cross-functional stakeholders to align validation logic with 5 compliance requirements.
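A minimal sketch of the kind of reusable validation module described above (schema check, type enforcement, null scan). The required columns and dtypes are hypothetical, not taken from this resume.

```python
import pandas as pd

# Hypothetical required schema for one claims batch.
REQUIRED = {"claim_id": "string", "member_id": "string", "billed_amount": "float64"}

def validate_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Schema check, type enforcement, and null scan for a claims batch."""
    missing = set(REQUIRED) - set(df.columns)
    if missing:  # schema check
        raise ValueError(f"missing columns: {sorted(missing)}")
    df = df.astype(REQUIRED)  # type enforcement
    null_counts = df[list(REQUIRED)].isna().sum()
    bad = null_counts[null_counts > 0]
    if not bad.empty:  # null scan
        raise ValueError(f"null values found: {bad.to_dict()}")
    return df
```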

Certifications:

• Google Cloud Certified Professional Data Engineer

• Databricks Certified Data Engineer Associate

• Microsoft Certified: Fabric Data Engineer Associate

• AWS Generative AI Applications Professional Certificate

• IBM Data Science Professional Certificate

• Microsoft Certified: Azure AI Fundamentals

Education:

Texas A&M University-Commerce, Commerce, TX

Master’s in Business Analytics & Artificial Intelligence, Major in Data Science; GPA 3.70; Dean's Excellence Scholarship

Publications:

Employee Attrition Using Machine Learning Algorithms. In: Proceedings of the International Conference on Data Science and Applications, Lecture Notes in Networks and Systems, vol. 288. Springer, Singapore, 2022.


