
Data Engineer Quality

Location:
Denton, TX
Salary:
90,000
Posted:
October 15, 2025

Resume:

PROFESSIONAL SUMMARY

Data Engineer with *+ years of proven experience designing and delivering scalable, secure data pipelines on AWS, Azure, and GCP. I specialize in transforming raw data into accurate, trusted insights that drive critical business decisions. My focus is on automating complex workflows, ensuring stringent data quality and governance, and partnering closely with data scientists and business teams to deliver practical, timely solutions. I take full ownership of projects from architecture through deployment, delivering reliable, high-impact results that accelerate reporting, enable compliance, and improve operational efficiency.

TECHNICAL SKILLS

Programming & Scripting: Python, SQL, T-SQL, R, Java, JavaScript, HTML, Bash, Unix Shell

Big Data & Cloud Platforms: Apache Spark, Hadoop, Kafka, AWS (S3, EMR, Redshift, Lambda, EC2, Glue, IAM), GCP (BigQuery, Dataflow, Pub/Sub), Azure (Databricks, Data Factory, Synapse, Event Hubs)

ETL & Data Integration: Talend, Informatica, AWS Glue, dbt

Databases & Warehousing: Snowflake, BigQuery, Redshift, SQL Server, PostgreSQL, MySQL, Databricks

Data Quality & Governance: Great Expectations, Monte Carlo, Informatica Data Quality, Apache Atlas, Collibra, GDPR, HIPAA

Orchestration & Containers: Apache Airflow, Docker, Kubernetes

DevOps & CI/CD: Terraform, Jenkins, GitLab CI, GitHub Actions

Visualization & Modeling: Power BI, Tableau, Looker, Excel (DAX), ER Studio

Version Control & Monitoring: Git, Splunk, Grafana

Project Management: Agile, Scrum, JIRA

Soft Skills: Problem-Solving, Collaboration, Critical Thinking, Code Review, Time Management

WORK EXPERIENCE

American Airlines Data Engineer Feb 2024 - Current

Partner with flight operations, dispatch, and data science teams in an Agile Scrum environment to design and deliver real-time data solutions for operational decision-making.

Build a high-volume streaming pipeline using Kafka, Azure Event Hubs, and Snowpipe Streaming, processing 10+ TB/day to feed flight delay prediction models and live operational dashboards.

Develop Spark Structured Streaming jobs in Databricks, applying SQL-based aggregations and business rules to reduce delay detection time by 40%.
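
Illustrative sketch (assumed, not production code) of the kind of Spark Structured Streaming aggregation described above; the Kafka broker, topic, event schema, and window sizes are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, TimestampType, IntegerType

spark = SparkSession.builder.appName("delay-detection").getOrCreate()

# Assumed event schema for the example
schema = (StructType()
          .add("flight_id", StringType())
          .add("event_time", TimestampType())
          .add("delay_minutes", IntegerType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "flight-events")               # placeholder topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Example business rule: average delay per flight over a 5-minute window
delays = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "flight_id")
          .agg(F.avg("delay_minutes").alias("avg_delay")))

query = (delays.writeStream
         .outputMode("update")
         .format("console")  # a real job would write to a Delta table or dashboard sink
         .start())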

Create automated data validation scripts in SQL and Python using Great Expectations to detect nulls, duplicates, and schema mismatches, reducing downstream data issues by 90%.
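
Illustrative sketch of a Great Expectations check of the kind described above, using the older pandas-backed API; the file and column names are assumed.

import great_expectations as ge
import pandas as pd

raw = pd.read_parquet("flight_events.parquet")  # placeholder input
batch = ge.from_pandas(raw)

# Null, duplicate, and schema-mismatch checks (column names are illustrative)
batch.expect_column_values_to_not_be_null("flight_id")
batch.expect_column_values_to_be_unique("event_id")
batch.expect_table_columns_to_match_ordered_list(
    ["event_id", "flight_id", "event_time", "delay_minutes"])

results = batch.validate()
if not results["success"]:
    raise ValueError("Data validation failed; blocking downstream load")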

Implement Monte Carlo anomaly monitoring with Python-based alerts, improving issue identification speed by 30% and cutting resolution time.

Orchestrate ETL workflows in Apache Airflow with Python DAGs, adding validation gates, retries, and Slack-based alerts to improve reliability and operational visibility.
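
Illustrative sketch of an Airflow DAG with retries, a validation gate, and a failure callback in the spirit of the bullet above; the task bodies and the Slack notification are stubbed placeholders (Airflow 2.4+ style).

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_failure(context):
    # Placeholder: a real callback would post to a Slack webhook or use the Slack provider
    print(f"Task {context['task_instance'].task_id} failed")

def extract():
    print("extract step")

def validate():
    print("validation gate")

def load():
    print("load step")

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    default_args=default_args,
    catchup=False,
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    validate_t = PythonOperator(task_id="validate", python_callable=validate)
    load_t = PythonOperator(task_id="load", python_callable=load)
    extract_t >> validate_t >> load_t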

Automate multi-region infrastructure provisioning with Terraform and CI/CD deployments with GitHub Actions, reducing setup time by 65% across dev and prod environments.

Manage and optimize Snowflake and dbt transformation layers, modeling curated datasets with CTEs, window functions, and star schema design; troubleshoot slow-running queries, optimize materializations, and lower query costs by 25%.

Enhance data accessibility and analytics by integrating Microsoft Fabric with Power BI to deliver trusted, business-ready datasets for self-service and operational reporting.

Enforce data governance with Unity Catalog, applying row-level security and audit logging to ensure compliance and secure data access.

Optum Data Engineer Jan 2021 - Dec 2022

Worked with compliance officers, BI teams, and engineers to design HIPAA-compliant data solutions, ensuring secure handling of PHI while supporting patient care optimization and audit readiness.

Built a unified patient data platform on AWS and GCP, consolidating 50M+ records from Redshift, BigQuery, and PostgreSQL, improving reporting speed by 60%.

Developed ETL pipelines using AWS Glue, Google Dataflow, PySpark, Python, and SQL, applying transformations, deduplication, and PHI masking to ensure security and analytics readiness.
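
Illustrative PySpark sketch of the deduplication and PHI-masking transformations described above; the paths, column names, and hashing approach are assumed for the example.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("phi-etl").getOrCreate()

patients = spark.read.parquet("s3://example-bucket/raw/patients/")  # placeholder path

cleaned = (patients
           .dropDuplicates(["patient_id", "encounter_id"])      # remove duplicate records
           .withColumn("ssn", F.sha2(F.col("ssn"), 256))        # mask PHI with a one-way hash
           .withColumn("patient_name", F.lit("REDACTED"))       # drop direct identifiers
           .filter(F.col("encounter_date").isNotNull()))

cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/patients/")  # placeholder path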

Integrated Informatica Data Quality with AI-based anomaly detection, reducing manual review by 80% and achieving 99.9% PHI detection accuracy.

Built multi-cloud streaming pipelines with Kafka, Kinesis, and Pub/Sub, processing 500K+ medical device events per minute for clinical dashboards and ML-driven risk scoring.

Automated metadata and lineage tracking with Apache Atlas, AWS Glue Data Catalog, and IAM, reducing audit preparation time from weeks to days.

Developed Python Flask APIs to provide secure, governed data access for internal applications.
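
Illustrative sketch of a minimal Flask endpoint that gates data access behind a token check, in the spirit of the governed APIs described above; the route, token scheme, and payload are placeholders.

from flask import Flask, jsonify, request, abort

app = Flask(__name__)
VALID_TOKENS = {"example-service-token"}  # placeholder; a real system would use IAM/OAuth

@app.route("/patients/<patient_id>/summary")
def patient_summary(patient_id):
    token = request.headers.get("Authorization", "").replace("Bearer ", "")
    if token not in VALID_TOKENS:
        abort(401)
    # Placeholder payload; a real endpoint would query a governed, de-identified view
    return jsonify({"patient_id": patient_id, "risk_score": 0.42})

if __name__ == "__main__":
    app.run(port=5000)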

Automated infrastructure provisioning with Terraform and deployments with GitLab CI, reducing setup time from days to hours.

Delivered Tableau dashboards built on curated, trusted datasets, enabling self-service reporting across multiple departments and reducing ad-hoc data requests by 30%.

Capital One Data Analyst May 2019 - Jan 2021

Analyzed large-scale financial datasets with SQL, Python, and T-SQL to deliver actionable insights supporting marketing and finance decision-making.

Built and automated ETL pipelines to clean, unify, and prepare data from multiple sources, significantly improving reporting reliability and enabling near real-time visibility across teams.

Created interactive dashboards and self-service reports in Power BI, Tableau, and Excel (DAX), driving a 15% boost in customer engagement through A/B testing and data-driven marketing strategies.

Established data governance standards, including data dictionaries, source-to-target documentation, and validation processes, cutting data inconsistencies by 25%.

Collaborated with data engineers to refine data models and optimize complex queries, improving analytics performance and scalability.

Supported Agile sprint planning and backlog grooming using JIRA, ensuring delivery aligned with business priorities and timelines.

EDUCATION

Master's in Advanced Data Analytics (GPA 3.9)

The University of North Texas, Denton, TX

PROJECTS

Data Migration Java, Spring Boot, Apache Kafka, CockroachDB, Microservices

Implemented a data migrator service to extract records from a legacy mainframe database, upload them to S3 storage, and dispatch each object's metadata to configured Kafka topics, transferring 350 million records within 2 hours.

Architected a data pipeline to transition live data events from legacy databases to the target DB: transactions are pushed to RabbitMQ, published to Kafka, and then stored in the target DB, ensuring real-time accuracy.

Azure Movie Recommender Azure Databricks (Spark ML), Azure Data Factory, Azure Logic Apps

Built a personalized recommendation system using collaborative filtering in Spark ML, processing 500K+ movie ratings to achieve 85% match accuracy in testing.

Automated rating data ingestion through Azure Data Factory and orchestrated recommendation delivery with Azure Logic Apps for seamless end-to-end deployment.
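
Illustrative Spark ML sketch of the collaborative-filtering approach used in this project, assuming an ALS model and a userId/movieId/rating schema; the path and hyperparameters are placeholders.

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("movie-recommender").getOrCreate()

# Assumed columns: userId, movieId, rating
ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)
train, test = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=10, maxIter=10, regParam=0.1, coldStartStrategy="drop")
model = als.fit(train)

rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(model.transform(test))
print(f"Test RMSE: {rmse:.3f}")

model.recommendForAllUsers(10).show(truncate=False)  # top-10 recommendations per user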

Collision Severity Analysis Python, Scikit-learn, Pandas, Matplotlib

Applied Logistic Regression and Decision Trees on roadway incident datasets to predict collision severity, improving classification accuracy to 78% after feature engineering and hyperparameter tuning.

Identified top 5 contributing factors through statistical analysis and visualizations, providing actionable insights for traffic safety improvements.
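
Illustrative scikit-learn sketch of the logistic regression baseline described in this project; the dataset path and feature columns are assumed.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("collisions.csv")  # placeholder path
features = ["speed_limit", "weather_code", "light_condition", "vehicle_count"]  # assumed columns
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["severity"], test_size=0.2, random_state=42, stratify=df["severity"])

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))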

CERTIFICATES

Google Data Analytics Certification (link)

AWS Certified Data Engineer (link)

Microsoft Certified Power BI Data Analyst Associate (link)

Databricks Certified Data Engineer Associate (In Progress)

940-***-**** ****************@*****.*** LinkedIn Portfolio

AMITHA PEDDIREDDY


