Tarun Reddy Marri
Data Engineer
+1-940-***-**** ************@*****.*** TX, USA
SUMMARY
Data Engineer with 5+ years of experience designing, building, and optimizing cloud-native data pipelines and enterprise-scale data platforms across healthcare and SaaS domains. Skilled in ETL/ELT, data modeling, streaming, and big data frameworks, with a proven record of delivering scalable, compliant, and cost-efficient solutions. Adept at collaborating with cross-functional teams to enable data-driven decision-making and support regulatory requirements (HIPAA, HITRUST, GDPR, SOC 2).
SKILLS
Programming & Automation: Python, SQL, Bash, PowerShell
Databases: Oracle, PostgreSQL, SQL Server, MySQL, MongoDB, DynamoDB, Cassandra, Couchbase
Data Warehousing: Snowflake, Google BigQuery, Oracle Exadata, AWS Redshift, Azure Synapse
Frameworks: Apache Spark, Databricks, Hadoop, Hive, Kafka, Flink, Kinesis, Spark Streaming
ETL / Orchestration: Apache Airflow, dbt, Informatica, Talend, ODI
Cloud Platforms: AWS (S3, Glue, EMR, Lambda, Redshift, Kinesis), Azure (Data Factory, Databricks, Synapse), GCP (BigQuery, Dataflow, Dataproc, Pub/Sub)
DevOps & Infra: Docker, Kubernetes, Terraform, Ansible, Jenkins, GitHub Actions, GitLab CI
Monitoring & Quality: Prometheus, Grafana, ELK, Great Expectations
Data Governance & Compliance: Apache Atlas, Collibra, Alation, HIPAA, HITRUST, GDPR, SOC 2, ISO 27001, CCPA
BI & Analytics: Tableau, Power BI, Looker, Superset
EXPERIENCE
Data Engineer
UnitedHealth Group Dec 2023 – Present
• Built ETL pipelines in Databricks (PySpark, SQL) and Apache Airflow, processing 10TB+ of claims/EHR data daily and reducing batch runtime by 40%.
• Developed a lakehouse architecture with Azure Data Lake, Snowflake, and Delta Lake, applying partitioning and indexing to cut query latency by 35%.
• Designed real-time ingestion pipelines using Apache Kafka, Azure Event Hubs, Apache Flink, and GCP Pub/Sub, streaming 200K+ events/sec for patient monitoring dashboards.
• Standardized healthcare datasets using HL7, FHIR APIs, and coding systems (ICD, CPT, SNOMED) to improve interoperability across providers.
• Automated data validation and transformation with dbt and Great Expectations, increasing compliance reporting accuracy from 96% to 99.8%.
• Leveraged AWS S3, Glue, and EMR for batch processing and archival, while using AWS Lambda for serverless transformations, reducing processing costs by 20%.
• Enforced HIPAA, HITRUST, and GDPR compliance with data anonymization, de-identification, RBAC, and encryption at rest and in transit for PHI/PII datasets.
• Partnered with ML teams to deploy fraud detection models via MLflow, Kubernetes, and APIs, saving $2.3M annually in fraudulent claim payouts.
• Streamlined deployments with Terraform, Docker, Kubernetes, and GitHub Actions, cutting pipeline release cycles from 2 weeks to 3 days.
• Established observability with Prometheus, Grafana, and ELK stack, ensuring 99.9% SLA compliance across distributed workflows.
• Collaborated with data scientists, analysts, and business stakeholders, clearly translating technical roadmaps into actionable data products.
• Documented system design and maintained onboarding guides, mentoring junior engineers and enabling faster team ramp-up.
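The data-validation work above (dbt + Great Expectations on claims data) can be sketched schematically. This is a stdlib-only illustration of the kind of record-level checks such a suite encodes; the field names (claim_id, icd_code, amount) are hypothetical, not taken from the actual pipelines.

```python
# Stdlib-only sketch of record-level validation checks of the kind a
# dbt / Great Expectations suite would encode for claims data.
# Field names (claim_id, icd_code, amount) are illustrative assumptions.

def validate_claim(record: dict) -> list:
    """Return a list of validation failures for one claims record."""
    failures = []
    # Required-field / not-null checks, like expect_column_values_to_not_be_null.
    for field in ("claim_id", "icd_code", "amount"):
        if field not in record or record[field] in (None, ""):
            failures.append(f"missing field: {field}")
    # Range check, like expect_column_values_to_be_between.
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        failures.append("amount must be non-negative")
    return failures


def validation_pass_rate(records: list) -> float:
    """Fraction of records passing all checks (the metric behind an
    accuracy figure such as 96% -> 99.8%)."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if not validate_claim(r))
    return ok / len(records)
```

In a real pipeline these checks would run as dbt tests or a Great Expectations checkpoint inside the Airflow DAG, failing the run (or quarantining rows) on breach.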
Data Engineer
Oracle Jan 2019 – Jul 2022
• Built ETL workflows with Informatica, Oracle Data Integrator (ODI), and PL/SQL, reducing manual reconciliation effort by 40%.
• Migrated warehouses to OCI, AWS Redshift/S3, and GCP BigQuery/Dataflow/Dataproc, using Terraform and Ansible, cutting infra costs by 30% and doubling query throughput.
• Delivered streaming analytics pipelines with Kafka, Spark Streaming, Kinesis, and Flume, enabling sub-second churn predictions that lowered churn by 7% YOY.
• Improved performance of Oracle, SQL Server, and PostgreSQL with partitioning, sharding, and indexing, reducing reporting times by 45%.
• Built Hadoop/Hive batch pipelines processing 3B+ daily records, supporting retail demand forecasting accuracy improvements of 15%.
• Engineered customer analytics for churn, upsell, and cross-sell opportunities, increasing retention and expansion revenue across enterprise accounts.
• Created dimensional models in Exadata and SQL Server, powering Power BI, Tableau, and Superset dashboards and boosting C-suite adoption by 20%.
• Implemented data security frameworks to meet SOC 2, ISO 27001, GDPR, and CCPA compliance, including fine-grained access control and tenant-level data segregation for multi-tenant SaaS customers.
• Automated monitoring with Python, Bash, and PowerShell scripts, saving 30+ engineer hours/month.
• Deployed CI/CD pipelines using Jenkins, GitLab CI, and GitHub, reducing release failures by 25%.
• Coordinated delivery of 20+ enterprise data transformation projects with cross-functional teams using Jira, Confluence, and Slack.
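The Python monitoring automation above can be illustrated with a minimal freshness/SLA check; the 60-minute SLA and the function name are illustrative assumptions, not values from the role itself.

```python
# Stdlib-only sketch of a pipeline-freshness check of the kind the
# monitoring scripts above might run on a schedule. The 60-minute SLA
# is an illustrative assumption.
from datetime import datetime, timedelta


def check_freshness(last_success: datetime,
                    now: datetime,
                    sla: timedelta = timedelta(minutes=60)) -> str:
    """Return 'ok' if the pipeline's last successful run is within the
    SLA window, otherwise 'breach' (which a real script would alert on)."""
    return "ok" if now - last_success <= sla else "breach"
```

A production version would pull `last_success` from the orchestrator's metadata (e.g. Airflow's DagRun table) and push the result to Prometheus or a pager rather than returning a string.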
EDUCATION
Master's in Information Systems and Technology,
University of North Texas, Denton, TX, USA
Bachelor's in Computer Science and Engineering,
GITAM University, Hyderabad, India