Post Job Free

Data Engineer Machine Learning

Location:
Denton, TX, 76201
Salary:
70000
Posted:
September 10, 2025


Resume:

Mounika Veeramachaneni Data Engineer

**********************@*****.*** (940) 354-4123 USA LinkedIn

Summary

Skilled Data Engineer with 3+ years of experience building scalable data pipelines, optimizing data architectures, and ensuring data quality across cloud platforms. Proficient in ETL development, data modeling, and performance tuning using modern tools and frameworks. Adept at collaborating with cross-functional teams to deliver reliable, analytics-ready data that supports business intelligence, reporting, and machine learning initiatives.

Technical Skills

• Cloud Platforms & Services: AWS (S3, Glue, Lambda, CloudWatch, Redshift), Microsoft Azure (Data Lake Storage Gen2, Purview, Monitor, Synapse Analytics), GCP (BigQuery, Dataflow)

• Data Engineering & ETL: Apache Spark, PySpark, Apache Airflow, SQL, Scala, Hadoop, AWS Glue, ADF, DBT, Informatica

• Data Warehousing & Lakehouse: Delta Lake, Snowflake, Amazon Redshift, Azure Synapse, Databricks, Google BigQuery

• Data Modeling & Architecture: Dimensional Modeling, Star & Snowflake Schema, Slowly Changing Dimensions (SCD Type 2), Partitioning, Schema Evolution, Data Lakehouse Architecture

• Programming Languages: Python, SQL, Scala, Shell Scripting

• Data Quality & Testing: Great Expectations, dbt Tests, Custom Validation Scripts, Unit Testing, Data Profiling
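
As a concrete illustration of the custom validation scripts listed above, here is a minimal rule-based check in plain Python. The rules and column names are hypothetical, and this is not the Great Expectations API — tools like Great Expectations formalize exactly this kind of per-column expectation:

```python
# Minimal custom data-validation sketch (hypothetical rules and columns).
# Each rule maps a column name to a predicate; rows that fail any predicate
# produce a human-readable failure message for triage.

def validate_rows(rows, rules):
    """Apply {column: predicate} rules; return a list of failure messages."""
    failures = []
    for i, row in enumerate(rows):
        for column, predicate in rules.items():
            value = row.get(column)
            if not predicate(value):
                failures.append(f"row {i}: {column}={value!r} failed check")
    return failures

# Hypothetical expectations for an insurance-style dataset.
rules = {
    "policy_id": lambda v: isinstance(v, str) and v.startswith("POL"),
    "premium": lambda v: isinstance(v, (int, float)) and v > 0,
}

rows = [
    {"policy_id": "POL123", "premium": 150.0},
    {"policy_id": "X999", "premium": -5},
]
failures = validate_rows(rows, rules)
print(failures)  # two failures, both on the second row
```

In a pipeline, a non-empty failure list would typically fail the task or quarantine the offending records before they reach curated tables.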

• Monitoring, Logging & Orchestration: Apache Airflow, AWS CloudWatch, Databricks Ganglia, Spark UI, Azure Monitor

• Data Governance & Lineage: Azure Purview, Collibra, Metadata Management, Data Cataloging

• Machine Learning Integration: ML Feature Engineering, Model Inference Pipelines, Integration with XGBoost, Scikit-learn

• Business Intelligence & Visualization: Power BI, Tableau, Looker, Seaborn, Matplotlib, Streamlit

• Soft Skills & Collaboration: Agile Methodologies, Cross-functional Team Collaboration, Documentation (Confluence), Stakeholder Communication, Problem Solving

Professional Experience

Data Engineer, MetLife 10/2024 – Present Remote, USA

• Designed a customer data platform in collaboration with underwriters and analysts during requirements sessions, integrating policy, claims, and CRM data to enable 360-degree customer views; improved retention and cross-sell conversion by 40%.

• Developed ETL pipelines using Apache Spark, AWS Glue, and SQL to ingest and transform structured and semi-structured data from legacy insurance systems, which reduced data latency and improved processing speed by 30%.

• Modeled curated datasets on Delta Lake using partitioning and Slowly Changing Dimensions Type 2 to support actuarial analysis, underwriting reports, and executive dashboards while ensuring data integrity and fast query performance.
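
The Slowly Changing Dimensions Type 2 pattern mentioned above can be sketched in plain Python. Keys and column names here are hypothetical; in practice this logic would run as a Delta Lake MERGE in Spark SQL or PySpark:

```python
from datetime import date

# SCD Type 2 sketch (pure Python, hypothetical columns).
# When a tracked attribute changes, the current row is closed out
# (end_date set, is_current=False) and a new current row is appended,
# preserving full history.

def scd2_upsert(dim_rows, incoming, key="customer_id", tracked=("address",),
                today=None):
    today = today or date.today()
    for new in incoming:
        current = next((r for r in dim_rows
                        if r[key] == new[key] and r["is_current"]), None)
        if current and all(current[c] == new[c] for c in tracked):
            continue  # no change; keep the existing current row
        if current:  # close out the old version
            current["end_date"] = today
            current["is_current"] = False
        dim_rows.append({**new, "start_date": today,
                         "end_date": None, "is_current": True})
    return dim_rows

dim = [{"customer_id": 1, "address": "Old St",
        "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
scd2_upsert(dim, [{"customer_id": 1, "address": "New Ave"}],
            today=date(2024, 6, 1))
# dim now holds two versions: the closed-out "Old St" row
# and a current "New Ave" row
```

Point-in-time queries then filter on start_date/end_date rather than overwriting history, which is what makes SCD2 suitable for actuarial and audit workloads.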

• Designed and deployed ETL pipelines in PySpark to feed production-ready datasets into an XGBoost-based lapse prediction model, focusing on feature engineering, schema consistency, and data validation with full documentation in Confluence.
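
The schema-consistency check described in the bullet above can be illustrated with a small pure-Python sketch. The feature schema below is hypothetical; the actual pipeline would enforce an equivalent contract on a Spark DataFrame before scoring:

```python
# Feature-schema validation sketch (hypothetical schema; a production
# version would enforce this on a Spark DataFrame before model inference).

EXPECTED_SCHEMA = {            # feature name -> required type
    "policy_age_years": float,
    "premium_amount": float,
    "num_claims": int,
}

def check_feature_schema(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems for one feature record."""
    problems = []
    for name, expected_type in schema.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}")
    return problems

good = {"policy_age_years": 3.5, "premium_amount": 120.0, "num_claims": 1}
bad = {"policy_age_years": "3.5", "premium_amount": 120.0}
print(check_feature_schema(good))  # []
print(check_feature_schema(bad))   # one type error, one missing feature
```

Catching these drift and type issues upstream keeps the model's training and serving features consistent, which is the point of validating before inference.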

• Automated regulatory reporting for NAIC and DOL using Python, Pandas, SQL, and AWS Lambda which reduced manual reporting effort by 65% and improved the accuracy and timeliness of monthly submissions.

• Implemented data pipeline monitoring with AWS CloudWatch and Power BI, and documented DAGs and job logic in Git and Confluence, which reduced incident response time by 50% and increased operational transparency.

Data Engineer, eBay 01/2022 – 07/2023 Remote, India

• Architected and optimized a scalable Data Lakehouse solution leveraging Azure Data Lake Storage Gen2 and Databricks (Apache Spark), handling ingestion and transformation of 2TB+ e-commerce data weekly to support analytics and ML workloads.
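
Lakehouse ingestion of this kind typically lays files out under Hive-style date partitions so engines like Spark can prune partitions at query time. A minimal sketch of that layout logic in plain Python (the base path and convention are hypothetical):

```python
from datetime import date

# Hive-style date-partition path sketch (hypothetical base path/convention).
# Writing under year=/month=/day= folders lets query engines skip
# irrelevant partitions entirely.

def partition_path(base, event_date):
    return (f"{base}/year={event_date.year}"
            f"/month={event_date.month:02d}"
            f"/day={event_date.day:02d}")

print(partition_path("abfss://sales@lake.dfs.core.windows.net/orders",
                     date(2023, 3, 7)))
# .../orders/year=2023/month=03/day=07
```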

• Developed and maintained complex ETL/ELT pipelines using PySpark, Scala, and Delta Lake, enabling efficient batch and incremental data processing with ACID compliance, schema enforcement, and time travel capabilities.

• Implemented automated data governance and metadata management using Azure Purview and custom metadata-driven workflows, ensuring end-to-end data lineage, cataloging, and compliance with enterprise data policies.

• Monitored pipeline health and optimized query performance via Azure Monitor, Databricks Ganglia, and Spark UI, reducing job runtimes by 30%, achieving 99.9% system uptime, and improving cost-efficiency through cluster auto-scaling and job parallelism.

• Collaborated with data scientists, analysts, and product teams to design efficient data partitioning, z-order indexing, and schema evolution strategies, enhancing query speed by 40% and enabling seamless adaptation to evolving data sources.

Jr Data Engineer, eBay 01/2021 – 12/2021 Remote, India

• Assisted in building and maintaining batch ETL pipelines using Python, Apache Spark, and Hadoop, processing large-scale transactional data to prepare clean, reliable datasets for reporting and analytics teams.

• Supported data validation efforts by implementing automated tests with Great Expectations and collaborating with senior engineers to troubleshoot data pipeline issues, improving data accuracy and pipeline stability.

Education

University of North Texas — Denton, Texas, USA

Master of Science in Computer and Information Science 08/2023 – 05/2025

Vel Tech University — Chennai, India

Bachelor of Engineering in Information Technology 07/2019 – 05/2023

Projects

Customer Churn Prediction Web Application

• Engineered and deployed a real-time churn prediction model using Scikit-learn, Streamlit, and Pickle, reducing manual analysis by 70% and tripling high-risk user detection to support personalized retention strategies.

Flight Delay Analytics and Prediction for Enhanced Operational Efficiency

• Queried and processed 30M+ flight records using Apache Hive on Hadoop and AWS S3, then designed a predictive pipeline with RapidMiner and Altair AI Studio (98% accuracy) and visualized insights using Seaborn and Matplotlib.


