Tarun Reddy Marri
Data Engineer
+1-940-***-**** ************@*****.*** TX, USA
SUMMARY
Data Engineer with 5+ years of experience designing, building, and optimizing cloud-native data pipelines and enterprise-scale data platforms across healthcare and SaaS domains. Skilled in ETL/ELT, data modeling, streaming, and big data frameworks, with a proven record of delivering scalable, compliant, and cost-efficient solutions. Adept at collaborating with cross-functional teams to enable data-driven decision-making and support regulatory requirements (HIPAA, HITRUST, GDPR, SOC 2).
SKILLS
Programming & Automation: Python, SQL, Bash, PowerShell
Databases: Oracle, PostgreSQL, SQL Server, MySQL, MongoDB, DynamoDB, Cassandra, Couchbase
Data Warehousing: Snowflake, Google BigQuery, Oracle Exadata, AWS Redshift, Azure Synapse
Frameworks: Apache Spark, Databricks, Hadoop, Hive, Kafka, Flink, Kinesis, Spark Streaming
ETL / Orchestration: Apache Airflow, dbt, Informatica, Talend, ODI
Cloud Platforms: AWS (S3, Glue, EMR, Lambda, Redshift, Kinesis), Azure (Data Factory, Databricks, Synapse), GCP (BigQuery, Dataflow, Dataproc, Pub/Sub)
DevOps & Infra: Docker, Kubernetes, Terraform, Ansible, Jenkins, GitHub Actions, GitLab CI
Monitoring & Quality: Prometheus, Grafana, ELK, Great Expectations
Data Governance & Compliance: Apache Atlas, Collibra, Alation, HIPAA, HITRUST, GDPR, SOC 2, ISO 27001, CCPA
BI & Analytics: Tableau, Power BI, Looker, Superset
EXPERIENCE
Data Engineer
UnitedHealth Group Dec 2023 – Present
• Built ETL pipelines in Databricks (PySpark, SQL) and Apache Airflow, processing 10TB+ of claims/EHR data daily and reducing batch runtime by 40%.
• Developed a lakehouse architecture with Azure Data Lake, Snowflake, and Delta Lake, applying partitioning and indexing to cut query latency by 35%.
• Designed real-time ingestion pipelines using Apache Kafka, Azure Event Hubs, Apache Flink, and GCP Pub/Sub, streaming 200K+ events/sec for patient monitoring dashboards.
• Standardized healthcare datasets using HL7, FHIR APIs, and coding systems (ICD, CPT, SNOMED) to improve interoperability across providers.
• Automated data validation and transformation with dbt and Great Expectations, increasing compliance reporting accuracy from 96% to 99.8%.
• Leveraged AWS S3, Glue, and EMR for batch processing and archival, while using AWS Lambda for serverless transformations, reducing processing costs by 20%.
• Enforced HIPAA, HITRUST, and GDPR compliance with data anonymization, de-identification, RBAC, and encryption at rest and in transit for PHI/PII datasets.
• Partnered with ML teams to deploy fraud detection models via MLflow, Kubernetes, and APIs, saving $2.3M annually in fraudulent claim payouts.
• Streamlined deployments with Terraform, Docker, Kubernetes, and GitHub Actions, cutting pipeline release cycles from 2 weeks to 3 days.
• Established observability with Prometheus, Grafana, and ELK stack, ensuring 99.9% SLA compliance across distributed workflows.
• Collaborated with data scientists, analysts, and business stakeholders, clearly translating technical roadmaps into actionable data products.
• Documented system design and maintained onboarding guides, mentoring junior engineers and enabling faster team ramp-up.
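The data-validation work above (dbt + Great Expectations on claims data) can be sketched schematically. This is a stdlib-only illustration of the kind of record-level checks such a suite encodes; the field names (claim_id, icd_code, amount) are hypothetical, not taken from the actual pipelines.

```python
# Stdlib-only sketch of record-level validation checks of the kind a
# dbt / Great Expectations suite would encode for claims data.
# Field names (claim_id, icd_code, amount) are illustrative assumptions.

def validate_claim(record: dict) -> list:
    """Return a list of validation failures for one claims record."""
    failures = []
    # Required-field / not-null checks, like expect_column_values_to_not_be_null.
    for field in ("claim_id", "icd_code", "amount"):
        if field not in record or record[field] in (None, ""):
            failures.append(f"missing field: {field}")
    # Range check, like expect_column_values_to_be_between.
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        failures.append("amount must be non-negative")
    return failures


def validation_pass_rate(records: list) -> float:
    """Fraction of records passing all checks (the metric behind an
    accuracy figure such as 96% -> 99.8%)."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if not validate_claim(r))
    return ok / len(records)
```

In a real pipeline these checks would run as dbt tests or a Great Expectations checkpoint inside the Airflow DAG, failing the run (or quarantining rows) on breach.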
Data Engineer
Oracle Jan 2019 – Jul 2022
• Built ETL workflows with Informatica, Oracle Data Integrator (ODI), and PL/SQL, reducing manual reconciliation effort by 40%.
• Migrated warehouses to OCI, AWS Redshift/S3, and GCP BigQuery/Dataflow/Dataproc, using Terraform and Ansible, cutting infra costs by 30% and doubling query throughput.
• Delivered streaming analytics pipelines with Kafka, Spark Streaming, Kinesis, and Flume, enabling sub-second churn predictions that lowered churn by 7% YOY.
• Improved performance of Oracle, SQL Server, and PostgreSQL with partitioning, sharding, and indexing, reducing reporting times by 45%.
• Built Hadoop/Hive batch pipelines processing 3B+ daily records, supporting retail demand forecasting accuracy improvements of 15%.
• Engineered customer analytics for churn, upsell, and cross-sell opportunities, increasing retention and expansion revenue across enterprise accounts.
• Created dimensional models in Exadata and SQL Server, powering Power BI, Tableau, and Superset dashboards and boosting C-suite adoption by 20%.
• Implemented data security frameworks to meet SOC 2, ISO 27001, GDPR, and CCPA compliance, including fine-grained access control and tenant-level data segregation for multi-tenant SaaS customers.
• Automated monitoring with Python, Bash, and PowerShell scripts, saving 30+ engineer hours/month.
• Deployed CI/CD pipelines using Jenkins, GitLab CI, and GitHub, reducing release failures by 25%.
• Coordinated delivery of 20+ enterprise data transformation projects with cross-functional teams using Jira, Confluence, and Slack.
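The Python monitoring automation above can be illustrated with a minimal freshness/SLA check; the 60-minute SLA and the function name are illustrative assumptions, not values from the role itself.

```python
# Stdlib-only sketch of a pipeline-freshness check of the kind the
# monitoring scripts above might run on a schedule. The 60-minute SLA
# is an illustrative assumption.
from datetime import datetime, timedelta


def check_freshness(last_success: datetime,
                    now: datetime,
                    sla: timedelta = timedelta(minutes=60)) -> str:
    """Return 'ok' if the pipeline's last successful run is within the
    SLA window, otherwise 'breach' (which a real script would alert on)."""
    return "ok" if now - last_success <= sla else "breach"
```

A production version would pull `last_success` from the orchestrator's metadata (e.g. Airflow's DagRun table) and push the result to Prometheus or a pager rather than returning a string.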
EDUCATION
Master's in Information Systems and Technology,
University of North Texas, Denton, TX, USA
Bachelor's in Computer Science and Engineering,
GITAM University, Hyderabad, India