Indumathy Sandrasekaran
Data Engineer - AI expertise Phone: 804-***-****
Atlanta, GA Email: ****************@*****.***
SUMMARY
●Data Engineer with AI expertise and 11+ years of IT experience, including 5+ years designing, developing, and supporting Spark-based batch data pipelines on Azure Databricks.
●Proven ability to build regulatory-grade data transformations, automation utilities, and operational controls using Python.
●Hands-on experience integrating data engineering with machine learning workflows, including feature store development, model-ready datasets, vector embeddings, and orchestration of ML pipelines.
●Extensive experience with Snowflake, SQL performance tuning, batch scheduling, testing, and production support in Agile environments.
●Skilled in integrating Azure (ADF, ADLS Gen2, Synapse, Event Hubs) services into unified data platforms supporting analytics, forecasting, and ML model development.
●Proven ability to migrate on-premises and cloud data into Azure platforms, ensuring data quality, integrity, and performance.
●Proven ability to transform raw data into actionable intelligence, enabling data-driven decision-making and operational efficiency.
●Hands-on experience building customer segmentation and regression models to increase retail market share.
●Expertise in data visualization and dashboards using Tableau, Power BI, R Shiny, ggplot2, matplotlib, and seaborn.
●Proficient in NLP and text analytics with Python (NLTK, gensim) and R.
●Experienced in security and compliance: IAM, Key Vault/KMS, secrets management, VPC networking, RBAC, auditability.
●Excellent communication, stakeholder collaboration, and problem-solving skills bridging technical and business teams.
TECHNICAL SKILLS
Programming Languages
Python, PySpark, SQL, Unix/Linux basics, R
Cloud Platforms
Databricks, Snowflake, Azure (ADF, ADLS Gen2, Synapse), AWS (S3, Lambda, Textract), Apache Spark (Databricks)
Big Data & Processing
Apache Spark (Databricks), Spark SQL, PySpark, Scala/Spark understanding (DAGs, stages, jobs, failure analysis, optimization)
AI Data Engineering
Batch ETL/ELT pipelines, large-scale data processing, data validation, controls, reconciliation, Azure Data Factory, batch scheduling concepts, feature store pipelines, embedding generation & vector storage, data lineage, governance, ML modeling and algorithms
Visualization
Advanced Excel, Tableau, RStudio, RapidMiner, Power BI
Databases
Snowflake, Azure SQL, Azure Synapse (Dedicated & Serverless), SQL Server, MySQL, Oracle
Workflow & Scheduling
Autosys scheduler (batch job setup & maintenance – exposure/support level), Dependency management, retries, alerts, monitoring
Security & Governance
Azure Key Vault, RBAC, IAM, Auditability, access control, compliance-oriented data handling.
PROFESSIONAL EXPERIENCE
Informarik Data Solutions Inc Data Engineer Atlanta, GA May 2022 - Mar 2026
RACETRAC:
●Engineered and optimized Spark-based batch and AI-ready data pipelines on Azure Databricks, improving large-scale data processing throughput by 40–50% and accelerating predictive analytics, forecasting, and ML model development.
●Architected scalable ELT pipelines in Azure Data Factory to ingest data from SQL Server, REST APIs, S3 and Blob Storage into ADLS Gen2, improving ingestion efficiency by 50–60% and enabling automated, event-driven workflows for analytics and ML use cases.
●Designed and managed Azure Elastic Jobs to automate and schedule index maintenance, backups, and data consistency checks.
●Built Snowflake UDFs and stored procedures (JavaScript/Python) to support complex business logic and reusable transformation frameworks.
●Developed multi-table merge logic using Delta Lake MERGE, window functions, and schema-enforced pipelines to support SCD Type 1/2 patterns, improving data accuracy by 40%.
●Created external tables over ADLS Gen2 and Delta Lake to enable cross-cloud analytics and unified reporting across Azure and AWS ecosystems.
●Tuned Databricks SQL and Spark workloads by optimizing Delta file sizes, Z-Ordering, partition pruning, and caching strategies to significantly improve query performance.
●Developed advanced PySpark transformations for feature engineering, cleansing, deduplication, and schema enforcement, enhancing data quality and producing model-ready datasets with 35–45% higher reliability.
●Improved operational stability by tuning Spark jobs (partitioning, optimized joins), analyzing DAGs, and integrating Databricks with ADF, reducing pipeline failures and strengthening end-to-end observability.
●Analyzed Spark DAGs, jobs, stages, and failures, applying partitioning and optimization techniques to improve performance and stability.
●Partnered with architecture, infrastructure, and business teams to understand data requirements and deliver data centric solutions.
●Participated in Agile ceremonies including sprint planning, stand-ups, and backlog grooming.
Cloudingest Inc Data Engineer Alpharetta, GA Dec 2023 – Jan 2025
RACETRAC:
●Implemented Snowflake external tables and secure data sharing to integrate cross-cloud datasets from ADLS, S3, and partner systems.
●Designed Snowflake-compatible data models, Azure Synapse external tables, and large-scale PySpark transformations in Databricks, increasing processing throughput by 40% and enhancing reporting accuracy.
●Implemented Elastic Jobs to perform database monitoring and diagnostics.
●Implemented near-real-time streaming ingestion using Event Hubs and Stream Analytics, reducing data latency and enabling timely ML feature delivery.
●Collaborated with stakeholders to define reporting metrics and deliver insights via Power BI.
Idexcel Inc Data Engineer Herndon, VA May 2022 – Dec 2023
RACETRAC:
●Built scalable ELT workflows in Snowflake using Tasks, Streams, and stored procedures, supporting RaceTrac’s enterprise-wide reporting and ML pipelines.
●Automated Snowflake schema deployments and data quality checks using Azure DevOps, improving release consistency and reducing deployment errors.
●Designed and maintained end-to-end monthly batch pipelines powering RaceTrac’s Staffing Optimization Solution (SoS), integrating data from Workday, EDW, Radiant, and Comm-Data.
●Built scalable data extraction and transformation workflows to generate store, date, transaction, and employee datasets used across demand, PTO, and attrition models.
Idexcel Inc Data Scientist Herndon, VA Oct 2021 - Apr 2022
CYNC:
●Cleansed noisy text-based images using computer vision techniques, improving OCR accuracy and downstream text extraction quality by 35–45%.
●Built automation scripts to parse JSON and load structured data into relational databases, reducing manual ingestion time by 60% and improving data consistency.
●Extracted text and structured fields from large document sets using AWS Textract, accelerating document processing workflows.
●Managed AWS CloudWatch, EC2, security groups, and Elastic Load Balancing to enhance system reliability and reduce operational issues.
Eitaceis Software Analyst Santa Clara, CA Oct 2021 - Dec 2021
●Performed data modeling, statistical analysis, forecasting, data cleaning, processing, and characterization.
●Built dashboards and data stores supporting predictive analytics for risk data.
Alpha Recon Data Scientist Colorado Springs, CO Jul 2021 - Oct 2021
●Performed time series forecasting to predict Alpha Recon stocks.
●Performed data normalization, outlier detection, missing value imputation, and anomaly correction for noisy financial time series.
Vtech Solution Business Development Intern Washington DC, VA Jul 2021 - Oct 2021
●Performed gap analysis, impact analysis, and process modeling using data engineering processes.
Elder Research Capstone Project Richmond, VA Jan 2021 - May 2021
●Built a machine learning pipeline using ADNI clinical and biomarker data to predict Alzheimer’s progression up to 5 years with a Random Forest model achieving 90% accuracy.
●Engineered an all-pairs temporal feature technique and deployed predictions through an interactive Dash dashboard for real-time patient risk assessment.
Mphasis (HP Subsidiary) Lead Application Engineer Chennai, India Feb 2013 - Aug 2016
●Applied linear regression models to monitor sales channels, improving forecasting accuracy and enabling faster identification of performance deviations.
●Developed a customer segmentation algorithm in R that drove a 22% increase in market share by targeting high-value customer groups more effectively.
●Created monthly and quarterly business monitoring reports using advanced SQL joins and system calendars, enhancing leadership’s visibility into trends and operational performance.
HCL Technologies Limited Specialist Chennai, India Dec 2011 - Feb 2013
●Delivered 20% higher prediction accuracy through regression analysis for stock performance compared to previous models, enabling more reliable financial forecasting and strategic planning.
●Enabled rapid integration of new analytical modules by building a new REST API into the prediction engine, reducing development time for new features.
●Cleaned and organized stock price and industry trend data using Excel functions (VLOOKUP, PivotTables, charts) to support basic performance analysis. Assisted in preparing simple sales forecasts using Excel trendlines and basic statistical formulas.
CSS Corp Pvt. Ltd. Database Support Engineer Chennai, India Nov 2008 - Dec 2011
●Optimized complex SQL queries involving multiple joins, reducing query execution time, accelerating analytics workflows, and improving dashboard responsiveness.
●Strengthened data governance and audit readiness through access controls and validation processes.
Infosys Technologies Software Engineer Mysore, India Nov 2008 - Feb 2009
●Developed C++ applications and supported backend data workflows.
EDUCATION
Virginia Commonwealth University
●Master of Science: Decision Analytics, Richmond, VA
●Post Baccalaureate of Data Science, Richmond, VA
Anna University
●Bachelor of Information Technology, Chennai, India
CERTIFICATIONS
●Databricks Certified Data Engineer Associate
●Google Data Analytics Professional
●IBM Data Science
●Oracle SQL Certified Professional