Data Engineer, Data Quality & Verification Specialist

Location:

Fort Myers, FL

Posted:

June 12, 2026

Resume:

Ananth Ananthula

New York City, NY +1-667-***-**** **************@*****.*** LinkedIn

SUMMARY

Data Engineer with 5+ years of experience in data verification, data quality assurance, database management, and record accuracy across healthcare and finance domains. Experienced in reviewing and validating high-volume records for accuracy, consistency, completeness, duplicates, missing values, and source-to-target mismatches before they affect reporting or operations. Strong background in maintaining organized digital records, documenting quality checks, protecting confidential PHI/PII and financial information, and using spreadsheets, SQL, Python, and database systems to resolve discrepancies. Comfortable working independently in remote production-support settings, communicating data issues clearly, and giving business teams practical updates on data quality metrics, audit results, and records-management status.

PROFESSIONAL EXPERIENCE

Humana, Data Engineer, Data Quality & Verification Jan 2024 – Present

• Reviewed and verified claims, eligibility, pharmacy, payer operations, and member-support datasets across AWS, Snowflake, and Databricks, processing ~4.6M records/day and improving priority dataset availability from ~3 hours to ~40 minutes during peak refresh windows.

• Led data verification and X12 / EDI ingestion standardization for 837, 835, 834, and 270/271 healthcare feeds by implementing schema contracts, normalization rules, reject routing, and source-to-target reconciliation controls across payer records.

• Engineered idempotent Python, PySpark, and SQL checks using natural keys, surrogate keys, checksum validation, and incremental MERGE/upsert logic to identify duplicate records and late-arriving corrections while keeping production loads rerun-safe.

• Designed curated Snowflake dimensional models for member, provider, plan, claim, eligibility, and pharmacy subject areas using SCD Type 2, audit columns, and payer business-rule mappings for reporting and operational records management.

• Set up data quality assurance routines in Airflow, dbt, Great Expectations, and SQL validation suites to check accuracy, consistency, completeness, and control totals, with dependency handling, retries, SLA tracking, and recovery across Dev, QA, and Production.

• Validated priority near-real-time healthcare events through Snowpipe Streaming and scheduled COPY loads, checking record counts, required fields, duplicate keys, and reconciliation totals while controlling warehouse and ingestion costs.

• Maintained organized digital records and documentation for pipeline runs, quality exceptions, rejected files, rerun decisions, and audit evidence so analysts and operations teams could trace how discrepancies were resolved.

• Generated recurring data quality metrics and status updates for stakeholders, covering missing records, rejected records, duplicate keys, load freshness, reconciliation breaks, and open follow-ups with source-system teams.

• Protected confidential PHI/PII datasets by applying Snowflake RBAC, Unity Catalog, masking policies, IAM, KMS encryption, Lake Formation controls, and access review support for HIPAA-aligned handling of sensitive records.

• Supported internal teams in resolving data-related issues by tracing source files, comparing database records, reviewing audit columns, and explaining mismatch patterns in clear written updates for remote production-support workflows.

Environment: AWS S3, Glue, Lambda, MWAA, Snowflake, Snowpipe, Snowpipe Streaming, Databricks, Delta Lake, Python, PySpark, SQL, dbt, Apache Airflow, Great Expectations, REST APIs, Parquet, JSON, Excel, spreadsheets, database systems, record validation, data verification, data entry validation, reconciliation, control totals, duplicate checks, missing-data checks, CloudWatch, SNS, PagerDuty, Lake Formation, Snowflake RBAC, Unity Catalog, IAM, KMS, PHI/PII, X12/EDI, 837, 835, 834, 270/271

Aster, Associate Data Engineer Feb 2021 – Jun 2023

• Owned the design of an Azure Databricks and Delta Lake finance lakehouse, organizing Confirm, Refine, and Publish layers to support revenue, billing, collections, and settlement data products with controlled schema evolution, incremental upserts, and audit-ready reporting.

• Built ingestion pipelines using Azure Data Factory, ADLS Gen2, and Auto Loader to onboard transactional finance data from ERP, billing, and Teradata/SQL source systems into curated layers with reliable freshness and rerun-safe recovery.

• Validated finance records with SQL and PySpark checks for schema consistency, required fields, duplicate transactions, control totals, and source-to-target reconciliation before month-end reporting outputs were published.

• Drove CDC replication using HVR to keep priority finance dashboards and downstream reporting tables within sub-25-minute freshness SLAs for business, operations, and leadership reporting.

• Developed high-volume transformations in Azure Databricks using Spark, PySpark, and SQL, implementing deterministic backfills, deduplication, and incremental load patterns to process ~3.2M records/day without rerun drift.

• Standardized curated outputs in Parquet with compaction, partition optimization, and workload-aware table design, reducing small-file overhead and improving performance for scan-heavy finance analytics and month-end close reporting.

• Optimized serving layers in Azure Synapse Analytics and downstream finance marts for Power BI by applying dimensional modeling, partition pruning, and query-tuning patterns to improve reliability on large revenue and reconciliation datasets.

• Documented exception handling, rejected loads, data corrections, and refresh notes in reusable runbooks, helping remote and offshore teams resolve recurring finance data issues without reopening the same investigation.

Environment: Azure Data Factory, ADLS Gen2, Azure Databricks, Delta Lake, Auto Loader, dbt, Databricks Workflows, Azure Synapse Analytics, Spark, PySpark, SQL, HVR, Unity Catalog, Power BI, Excel, spreadsheets, ERP records, billing records, reconciliation, duplicate checks, control totals, data validation, records documentation, Azure DevOps, Git

Aster, Data Engineer Intern Aug 2020 – Feb 2021

• Supported ingestion of finance, billing, and reconciliation source data from ERP extracts, SQL systems, spreadsheets, and flat files into ADLS Gen2, helping establish the base layer for revenue and settlement reporting.

• Built and maintained parameterized Azure Data Factory pipelines to move raw transactional data into curated storage zones with dependency handling, scheduling, and rerun-safe execution.

• Assisted in developing transformation workflows in Azure Databricks using Spark, PySpark, and SQL to clean, deduplicate, and standardize ~0.9M rows/day for downstream finance analytics and reporting.

• Implemented validation checks for schema consistency, file integrity, missing values, and control totals, catching mismatches before downstream reporting and reducing manual rework during recurring cycles.

• Maintained organized load notes, issue logs, and reconciliation documentation for recurring refreshes, giving senior engineers a clear trail when files failed or source data arrived incomplete.

• Assisted with baseline data quality monitoring using Python and SQL, improving early detection of incomplete loads, mismatches, and reporting gaps across finance source feeds.

Environment: Azure Data Factory, ADLS Gen2, Azure Databricks, Azure Synapse Analytics, Spark, PySpark, SQL, Python, Excel, spreadsheets, flat files, Parquet, control totals, data entry validation, missing-data checks, duplicate checks, issue logs, Azure DevOps, Git

SKILLS

Data Verification & Quality Assurance: Data verification, data quality assurance, data entry validation, accuracy checks, consistency checks, completeness checks, duplicate detection, missing-data review, control totals, reconciliation, exception handling

Database & Records Management: Database systems, records management, organized digital documentation, audit evidence, source-to-target mapping, SQL queries, dimensional records, record corrections, refresh logs

Spreadsheets & Reporting: Microsoft Excel, spreadsheets, Power BI, Tableau, data quality metrics, status updates, audit summaries, discrepancy reports, operational dashboards

Confidential Data Handling: PHI/PII handling, HIPAA-aligned controls, RBAC, masking policies, IAM, KMS encryption, access reviews, responsible handling of sensitive business information

Programming & Automation: Python, PySpark, SQL, Shell Scripting, REST APIs, dbt, Great Expectations, validation suites, reusable scripts

Cloud & Data Platforms: AWS, Azure, Snowflake, Databricks, Delta Lake, Azure Synapse Analytics, ADLS Gen2, S3, PostgreSQL, Teradata, DynamoDB

ETL & Workflow Support: Azure Data Factory, Apache Airflow, MWAA, Auto Loader, Snowpipe, Databricks Workflows, HVR, batch pipelines, near-real-time ingestion, rerun-safe loads

Work Style: Remote collaboration, independent task ownership, written communication, time management, problem solving, production support, Agile/Scrum

PROJECTS

Real-Time Cloud Monitoring Anomaly Detection and Risk Dashboard

• Built a streaming pipeline for infrastructure performance metrics using AWS Kinesis and AWS Lambda, persisting enriched and labeled outputs in DynamoDB for low-latency retrieval.

• Engineered features in Python and applied unsupervised learning (K-Means, DBSCAN) to cluster normal resource behavior and flag anomalies without static threshold rules.

• Designed a risk-scoring model that maps cluster patterns to operational states (idle, normal, high-load, critical) and powers a dashboard that prioritizes high-impact alerts for faster triage.

AI Voice Ordering Assistant for Restaurants

• Built an end-to-end voice ordering assistant using Silero VAD, speech-to-text, an LLM (LLaMA 3.1), and text-to-speech (Kokoro), supporting multi-turn conversations with confirmation and correction flows.

• Developed a low-latency FastAPI service that orchestrates transcription, LLM inference, and speech synthesis with concurrent sessions, timeouts, retries, and fallback handling for interruptions and partial utterances.

• Integrated with a POS-style workflow backed by SQLite to retrieve menus dynamically, validate items and modifiers, and generate structured JSON orders; added tracing and structured logging for production-grade observability.

EDUCATION

University of Maryland, MD Master of Science, Data Science GPA: 3.9/4.0 May 2025

CERTIFICATIONS

AWS Solutions Architect – Associate

Databricks Certified Data Engineer – Associate

Microsoft Certified Fabric Data Engineer – Associate
