Data Engineer Machine Learning

Location: Manor, TX
Salary: $80,000
Posted: October 15, 2025

Resume:

PRAVASHISH MANDIRAM

Austin, TX | 512-***-**** | ***********@*****.***

Summary

Data Engineer with 5+ years of experience specializing in building scalable, cloud-native data pipelines and analytics platforms for the e-commerce and fintech sectors. Expert in designing and implementing end-to-end ETL processes using PySpark, SQL, and AWS/GCP/Azure services to drive data-driven decision-making. Proven ability to ensure data integrity, optimize data warehousing solutions, and collaborate with cross-functional teams to deliver actionable insights that reduce costs and improve operational efficiency.

Skills

• Programming Languages: Python, R, Scala, SQL, PySpark, Java

• Web Development: HTML5, CSS, JavaScript

• Data Warehousing: AWS Redshift, Azure SQL Data Warehouse

• ETL & Data Modeling: ETL Processes, Data Warehousing, Data Modeling, Apache NiFi, Informatica PowerCenter, Apache Flink, Apache Druid, Apache Beam, Medallion Architecture

• Project Methodologies: Agile, Waterfall

• Visualization & Reporting: Tableau, Power BI, Excel, SAS, SQL Playground

• Machine Learning: Logistic Regression, Decision Trees, Random Forests, PyTorch, AWS SageMaker

• Statistical Analysis: Linear Regression, ANOVA, Chi-Square

• Python Libraries: NumPy, Pandas, Matplotlib, SciPy, scikit-learn, Seaborn, TensorFlow

• Databases: MySQL, PostgreSQL, SQL Server, Azure SQL Database, MongoDB, Cassandra

• Big Data Technologies: Apache Spark, Apache Hadoop, Apache Kafka

• Cloud Platforms & Services: AWS (S3, Redshift, Glue, EMR, EC2, Lambda), Azure (Data Factory, Databricks, ADLS, Blob Storage, Cosmos DB, Synapse Studio), GCP (BigQuery, Dataproc, Cloud Functions)

• Orchestration & Workflow: Apache Airflow, Azure Logic Apps

• Containerization & IaC: Docker, Kubernetes, Terraform

• CI/CD & Version Control: Jenkins, Git

Experience

Data Analytics Engineer | Tessolve Semiconductor Inc. | Austin, TX | 05/2025 – Current

• Architected and deployed a scalable cloud data warehouse on AWS Redshift, consolidating data from 15+ test equipment sources to create a single source of truth for engineering analytics.

• Developed automated ETL pipelines using PySpark and AWS Glue to process and validate over 2TB of daily semiconductor test data, improving data availability for analysis by 95%.

• Engineered a suite of Tableau dashboards for yield analysis and failure mode detection, enabling engineers to identify root causes 40% faster and reducing test cycle time.

• Implemented data quality frameworks and automated anomaly detection alerts, decreasing data integrity issues by 30% and increasing trust in analytical reporting.

• Collaborated with validation engineers to define key performance indicators (KPIs) and translate business requirements into technical specifications for data models.

• Orchestrated complex data workflows using Apache Airflow, ensuring reliable and timely daily batch processing for downstream reporting and machine learning applications.

Data Analytics Engineer | Virtue Serve | Texas, USA (Remote) | 12/2023 – 05/2025

• Delivered data engineering solutions for clients in the e-commerce and retail sectors, focusing on marketing and customer analytics.

• Migrated an on-premises client database to Google BigQuery, optimizing query performance and reducing monthly infrastructure costs by 22%.

• Built real-time data pipelines using SQL and Python to integrate Google Analytics 4 data with CRM platforms, enabling a unified view of the customer journey and attribution.

• Automated the generation and distribution of weekly performance marketing reports to stakeholders, saving 15+ person-hours per week and accelerating insight delivery.

• Designed and implemented dimensional data models in BigQuery to support complex analytical queries for customer segmentation and lifetime value (LTV) analysis.

• Partnered with data scientists to productionize a recommendation engine model by building a feature store and serving layer using Dataproc and Cloud Functions.

Data Engineer | Mindtree Limited | Bangalore, India | 10/2021 – 07/2022

• Developed and optimized PySpark scripts for processing large-scale financial transaction data, improving the efficiency of a critical daily ETL job by 35%.

• Contributed to the design of a star-schema data warehouse on Azure Synapse Analytics to support business intelligence and regulatory reporting needs.

• Wrote complex SQL queries and stored procedures to transform raw banking data into actionable insights for fraud detection and risk management teams.

• Implemented data validation checks within Azure Data Factory pipelines, ensuring 99.8% accuracy in daily financial reconciliations.

Data Engineer | OLX | Remote, India | 09/2020 – 09/2021

• Pioneered the migration of core batch processing jobs from legacy systems to a distributed Spark framework on AWS EMR, reducing data processing latency for ad listing data by 40%.

• Designed and implemented a real-time event tracking pipeline using Kafka and AWS Kinesis to capture 5M+ daily user interactions, enabling the product team to analyze user behavior and personalize the homepage.

• Developed automated data quality frameworks using Great Expectations that identified and resolved data discrepancies at ingestion, cutting discrepancy rates by 15% and significantly improving the reliability of business-critical metrics.

• Optimized performance and cost of Hive and Presto queries by refining table partitioning and bucketing strategies, resulting in a 25% reduction in cloud compute spending for the analytics team.

• Collaborated with data scientists to productionize a machine learning model for ad fraud detection by building a feature engineering pipeline that processed terabytes of historical transaction data.

• Authored technical documentation and runbooks for key data pipelines, standardizing best practices and reducing the onboarding time for new team members by 50%.

Database Administrator | One Card | Remote, India | 06/2019 – 09/2020

• Spearheaded the database design and implementation for a new customer loyalty program, creating schemas and writing optimized stored procedures that handled a 50% increase in transaction volume without performance degradation.

• Achieved 99.99% database availability for core PostgreSQL clusters through proactive monitoring, performance tuning, and implementing a robust disaster recovery strategy using WAL archiving and point-in-time recovery.

• Enhanced database security and compliance with PCI-DSS standards by automating vulnerability scans, encrypting sensitive customer PII at rest, and rigorously auditing user access privileges.

• Slashed report generation times for the finance team by 60% by optimizing complex SQL queries and creating materialized views for recurring analytical requests on transaction data.

• Automated routine maintenance tasks such as vacuuming, indexing, and backups using Python scripts, reclaiming 10 hours of manual work per week for the DevOps team.

Education

Master of Engineering: Computer Science
University of Cincinnati, OH, USA | 2024


