SIVATEJA D
Data & Analytics Engineer
+1-940-***-**** | *********@*****.*** | LinkedIn
PROFESSIONAL SUMMARY
Analytics-focused Data Engineer with 5+ years of experience designing data warehouses, dbt models, and ELT pipelines across AWS, Azure, and GCP. Skilled in SQL, Python, Snowflake, BigQuery, and Databricks, with a proven track record of building scalable datasets, dashboards, and reporting products that support cross-functional stakeholders. Strong in dimensional modeling, documentation, data governance, and stakeholder training, bridging the gap between engineering and business needs.
TECHNICAL SKILLS
•Data Modeling & Warehousing: dbt, Dimensional Modeling (Star/Snowflake Schema), Data Marts, Data Dictionaries, Snowflake, BigQuery, Redshift, Databricks, Synapse
•Programming Languages: SQL (advanced queries, window functions, CTEs, UDFs), Python (Pandas, NumPy, PySpark), Scala, Java, R
•Pipelines & Integration: Airflow, AWS Glue, Azure Data Factory, Fivetran, Informatica, Talend, Apache NiFi, ETL/ELT Development, Reverse ETL
•Big Data & Streaming: Apache Spark, Hadoop, Flink, Hive, Kafka, Kinesis, Pub/Sub, Event Hubs, Beam, Storm
•Visualization & Analytics: Tableau, Power BI, Looker, Mode, Quicksight, Dashboard Design, KPI Development, Self-Service Enablement
•Cloud Platforms: AWS (S3, Lambda, Glue, Redshift, EMR, Kinesis, ECS, DynamoDB, SQS, SNS, IAM, CloudWatch, KMS), Azure (Databricks, Data Lake, Synapse, ADF, Functions, Cosmos DB), GCP (BigQuery, Dataflow, Pub/Sub, Composer, Cloud Storage)
•Databases: SQL Server, Oracle, MySQL, PostgreSQL, Teradata, MongoDB, Cassandra, HBase, IBM DB2, NoSQL
•DevOps & Orchestration: GitHub/GitLab/SVN, Jenkins, Terraform, AWS CloudFormation, Docker, Kubernetes, CI/CD Pipelines
•Data Governance & Documentation: Data Lineage, Data Dictionary, Testing Frameworks, HIPAA/GDPR Compliance, RBAC, Security Best Practices
•Soft Skills: Stakeholder Collaboration, Documentation & Knowledge Sharing, Problem-Solving, Critical Thinking, Communication, Time Management
•Search & AI/ML Infra: Modern Search Systems (Elasticsearch/Vespa), Graph/Vector DB concepts, Ontologies, RAG-based Systems, MLOps Standards
WORK EXPERIENCE
Progressive Insurance | Data Engineer | Mayfield, OH | May 2024 – Present
•Built scalable ingestion pipelines for heterogeneous claims and policy data by optimizing existing AWS pipelines (Kafka, Spark), ensuring freshness, reliability, and versioning for downstream consumption and analytics systems.
•Developed metadata enrichment pipelines in Python and AWS Glue to parse, extract, and tag unstructured claims text (adjuster notes), improving feature richness for semantic search and RAG-based conversational systems (a minimal sketch appears after this list).
•Integrated and leveraged systems and models from data science partner teams, designing production ML data pipelines for claims data that adhered to MLOps standards for continuous feature monitoring and model improvement.
•Ensured data quality and observability by building monitoring systems (CloudWatch, Snowflake Streams) to track ingestion of claims data and retrieval health metrics (accuracy, coverage, timeliness).
•Led the conceptual design of a hybrid retrieval system, planning the indexing of structured policy data in Elasticsearch to improve reporting precision and latency for operational users.
•Managed data schemas and metadata for policy documents and assets, enforcing strict versioning and indexing protocols crucial for accurate historical information retrieval.
•Authored and maintained core ETL job codebases in Java and Python handling claims data, demonstrating strong CS fundamentals and code hygiene across the distributed data platform.
•Assisted in defining architectural requirements for future vectorization strategies on unstructured claims data, providing the initial groundwork for a vector database implementation to enhance retrieval relevance.
•Collaborated with product teams to align data infrastructure development with business requirements, ensuring new data assets were immediately accessible and usable for analytics workflows.
•Optimized distributed and streaming pipelines for scale and speed, reducing data latency for analytical consumption of claims data by over 20%.
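The metadata-enrichment bullet above refers to tagging unstructured adjuster notes for downstream search and RAG indexing. The following is a minimal sketch of that idea, not the production Glue job: the S3 paths, the note_text column name, and the keyword tag vocabulary are assumptions for illustration.

```python
# Minimal sketch: tag unstructured adjuster notes with keyword-based metadata
# so downstream semantic-search/RAG indexing can filter on them.
# Paths, column names, and the tag vocabulary below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-note-enrichment").getOrCreate()

notes = spark.read.parquet("s3://example-bucket/claims/adjuster_notes/")

# Hypothetical tag vocabulary; a real job would load this from a config table.
TAGS = {
    "water_damage": ["flood", "water", "leak"],
    "injury": ["injury", "hospital", "medical"],
    "total_loss": ["totaled", "total loss"],
}

def any_keyword(col, keywords):
    """Case-insensitive check that any keyword appears in the text column."""
    cond = F.lit(False)
    for kw in keywords:
        cond = cond | F.lower(col).contains(kw)
    return cond

enriched = notes
for tag, keywords in TAGS.items():
    enriched = enriched.withColumn(f"tag_{tag}", any_keyword(F.col("note_text"), keywords))

enriched.write.mode("overwrite").parquet("s3://example-bucket/claims/adjuster_notes_enriched/")
```

In practice the keyword tags would be only one of several enrichment signals; richer parsing of the notes is outside the scope of this sketch.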
Optum | Data Engineer | Bangalore, India | Sep 2020 – Dec 2022
•Collaborated with ML and information retrieval teams to structure and standardize complex patient records, aligning data infrastructure with the needs for clinical models and rich data retrieval.
•Built end-to-end ETL pipelines in Azure Data Factory and SSIS to process 50M+ patient records daily across claims, pharmacy, and eligibility, ensuring accurate inputs for regulatory and executive reporting.
•Designed and implemented a Delta Lakehouse architecture in Databricks with Bronze, Silver, and Gold layers, adding schema evolution, lineage tracking, and audit-ready metadata tagging.
•Partnered with clinicians, actuaries, and business analysts to translate healthcare requirements into data models and transformations, producing datasets that supported outcome tracking and fraud detection.
•Developed semantic layers in Synapse and Databricks that standardized key business definitions like patient outcomes, claim approval rates, and fraud scores across reporting teams.
•Delivered Power BI dashboards for executives that visualized claim rejection trends, pharmacy utilization, and patient health outcomes, cutting reporting delays by 30%.
•Built real-time streaming pipelines using Event Hubs and Structured Streaming, giving operational teams live monitoring of claims rejections and fraudulent activity.
•Implemented Slowly Changing Dimensions (SCD Type 1 and 2) in Synapse and Databricks to maintain accurate historical data for compliance and regulatory reporting (see the SCD Type 2 sketch after this list).
•Automated promotion of ADF pipelines and Databricks notebooks using Azure DevOps pipelines and Terraform, ensuring consistent deployments across environments.
•Authored detailed documentation for ETL jobs, data models, and business metrics, reducing onboarding time and improving knowledge sharing among engineering and analytics teams.
•Partnered with data governance teams to apply HIPAA and GDPR standards, implementing encryption, masking, and secure access controls for sensitive datasets.
•Created pipeline monitoring dashboards with Azure Monitor and Grafana, adding alerts for latency and failure rates, which improved SLA compliance and reduced downtime.
•Supported data science teams by curating datasets for predictive modeling, enabling churn, fraud detection, and risk scoring models to move into production faster.
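As a concrete illustration of the SCD Type 2 bullet above, here is a minimal sketch using the Delta Lake MERGE API in PySpark. The table and column names (staging.patient_updates, gold.dim_patient, plan_code, is_current, start_date, end_date) are hypothetical, only one tracked attribute is compared, and the staging feed is assumed to share the dimension's business columns.

```python
# Minimal SCD Type 2 sketch with Delta Lake: expire the current row when a
# tracked attribute changes, then append the new version as the current row.
# All table and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.patient_updates")                     # hypothetical daily feed
current = spark.table("gold.dim_patient").where("is_current = true")

# New or changed patients only; cache before mutating the dimension so the
# comparison is not re-evaluated against the updated table.
changed = (updates.alias("s")
           .join(current.select("patient_id", "plan_code").alias("t"),
                 "patient_id", "left")
           .where("t.plan_code IS NULL OR t.plan_code <> s.plan_code")
           .select("s.*")
           .cache())
changed.count()

# Step 1: close out the current version of changed patients.
(DeltaTable.forName(spark, "gold.dim_patient").alias("t")
    .merge(changed.alias("s"),
           "t.patient_id = s.patient_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "current_date()"})
    .execute())

# Step 2: append the new versions as the current rows.
(changed
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True))
    .write.format("delta").mode("append").saveAsTable("gold.dim_patient"))
```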
ADP | Data Analyst & Engineer | Hyderabad, India | Sep 2019 – Sep 2020
•Partnered with HR and finance to gather and understand data requirements, creating centralized SQL datasets from payroll and HR records that supported audits and workforce analytics.
•Designed and maintained automated ETL processes using Informatica/Talend, implementing robust error recovery and exception handling for complex payroll data flows.
•Applied strong CS fundamentals and Python to profile, cleanse, and standardize employee payroll data at scale (see the sketch after this list).
•Created and maintained metadata enrichment pipelines to integrate and standardize HR and payroll data fields, improving the semantic consistency required for future information retrieval systems.
•Designed audit-ready schemas and governance structures for payroll records, enforcing strict relational integrity that laid the groundwork for later ontology-style modeling.
•Optimized legacy ETL workflows by identifying inefficiencies, redesigning data flows, and improving the overall reliability of monthly and quarterly payroll processing.
•Delivered interactive Business Intelligence dashboards that visualized payroll accuracy KPIs, supporting executive decision-making.
•Worked effectively both independently and with cross-functional stakeholders, delivering enhanced functionality across reporting and financial data quality initiatives.
•Wrote advanced SQL queries to support ad-hoc audit requests and regulatory compliance checks for payroll data.
•Assisted in configuring job dependencies within the scheduling system (analogous to Airflow) to manage payroll processing deadlines and ensure data freshness.
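A minimal sketch of the profiling and standardization step referenced above, using pandas; the file name and column names (employee_id, pay_date, gross_pay, dept) are placeholders rather than the actual payroll schema.

```python
# Minimal sketch: profile, cleanse, and standardize a payroll extract with pandas.
# The file path and column names are hypothetical placeholders.
import pandas as pd

payroll = pd.read_csv("payroll_extract.csv", parse_dates=["pay_date"])

# Profiling: per-column null rates and cardinality, used to flag problem fields.
profile = pd.DataFrame({
    "null_pct": payroll.isna().mean().round(3),
    "n_unique": payroll.nunique(),
})
print(profile)

# Standardization: normalize department codes, coerce pay amounts to numeric,
# and drop duplicate employee/pay-period rows before loading downstream.
payroll["dept"] = payroll["dept"].str.strip().str.upper()
payroll["gross_pay"] = pd.to_numeric(payroll["gross_pay"], errors="coerce")
clean = payroll.drop_duplicates(subset=["employee_id", "pay_date"])
```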
PROJECTS
•Weather Impact on Accident Severity: Applied Logistic Regression and Random Forest models on Hadoop to analyze weather-related accident data, improving severity prediction accuracy by 15%.
•Zero to Snowflake AI Data Cloud Quickstart: Built an end-to-end platform on Snowflake using the Tasty Bytes dataset, including warehouses, dynamic tables, and automated ELT pipelines for JSON/CSV/Parquet data. Leveraged Cortex AI for LLM-based summarization, AISQL sentiment analysis, and conversational BI, while applying Horizon governance and integrating third-party datasets for enriched customer analytics.
•Netflix Content Strategy Optimization: Analyzed Netflix viewership data using BigQuery to identify trends in genre popularity, binge patterns, and regional content performance. Developed SQL queries to cluster user watch behaviors and visualized KPIs in Tableau dashboards (a query sketch follows below).
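A minimal sketch of the kind of BigQuery aggregation behind the watch-behavior analysis in the Netflix project; the project, dataset, table, and column names are hypothetical, and the output would feed the Tableau dashboards and any downstream clustering.

```python
# Minimal sketch: aggregate per-user, per-genre viewing behavior in BigQuery.
# The table `example_project.viewing.watch_events` and its columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
WITH daily AS (
  SELECT
    user_id,
    genre,
    DATE(watch_ts) AS watch_date,
    COUNT(*) AS titles_watched,
    SUM(minutes_watched) AS total_minutes
  FROM `example_project.viewing.watch_events`
  GROUP BY user_id, genre, watch_date
)
SELECT
  user_id,
  genre,
  AVG(titles_watched) AS avg_titles_per_day,
  AVG(total_minutes) AS avg_minutes_per_day,
  COUNTIF(titles_watched >= 3) AS binge_days
FROM daily
GROUP BY user_id, genre
"""

behavior = client.query(sql).to_dataframe()
print(behavior.head())
```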
EDUCATION
University of North Texas | M.S. in Advanced Data Analytics (GPA 3.88/4.0) | Dec 2024
CERTIFICATES & ACHIEVEMENTS
•AWS Certified Data Engineer – Associate (Link)
•Google Data Analytics & Professional Certifications (Link, Link)
•HackerRank Python Certificate (Link)
•Databricks Academy Accreditation – Generative AI Fundamentals (Link)