Data Engineer Big

Location:
Newark, DE
Salary:
70000
Posted:
October 15, 2025

Resume:

SAI THARUN BODLA

Delaware, USA +1-302-***-**** ***********@*****.***

SUMMARY

Data Engineer with 3 years of experience designing and optimizing data pipelines, building scalable ETL workflows, and developing robust data models across cloud platforms. Skilled in Python, SQL, and big data technologies with expertise in data integration, governance, and visualization to support analytics, compliance, and enterprise-wide decision-making.

SKILLS

Programming Languages: Python (pandas, NumPy, PySpark), SQL, R, CSS, HTML, JavaScript
Big Data & Distributed Systems: Apache Spark, Hadoop (HDFS, MapReduce), Hive, Pig, Kafka, Flink, AWS EMR, Databricks
Data Warehousing & Databases: Redshift, Snowflake, Google BigQuery, Teradata, PostgreSQL, MySQL, MongoDB, Cassandra
ETL & Data Integration: Apache NiFi, Talend, Informatica, SSIS, Airflow, Luigi, AWS Glue
Cloud Platforms & Services: Azure (Data Factory, Synapse Analytics), Microsoft Fabric, AWS (S3, Lambda, EC2, RDS, Kinesis)
Data Modeling & Architecture: Star Schema, Snowflake Schema, Dimensional Modeling, Data Lake & Lakehouse Design, OLTP & OLAP
DevOps & CI/CD: Git, GitHub, Jenkins, Docker, Kubernetes, Terraform
Business Intelligence & Visualization: Tableau, Power BI
Workflow Orchestration & Automation: Apache Airflow, Prefect, AWS Step Functions, Cron Jobs
Data Quality & Governance: Data Validation, Data Profiling, Data Lineage, GDPR & HIPAA Compliance, Metadata Management
Other Tools & Frameworks: Jupyter Notebook, VS Code, PyCharm, Excel (Advanced), REST APIs, JSON, XML

WORK EXPERIENCE

CVS Health Aug 2024 – Present

Data Engineer Delaware, USA

• Developed ETL pipelines in Azure Databricks using SQL and Python to ingest and transform over 5TB of daily pharmacy claims and eligibility data, ensuring timely availability for analytics and regulatory reporting.

• Established a Delta Lake architecture to consolidate data from pharmacy, provider, and member systems, improving interoperability across 15 enterprise applications and reducing manual reconciliation efforts.

• Built Informatica-based validation workflows with more than 50 automated business rules, which decreased claims processing errors by 30% and strengthened data compliance.

• Designed Snowflake data models with optimized partitioning and clustering strategies, enabling actuarial and finance teams to run complex queries 50% faster.

• Streamlined releases by implementing CI/CD pipelines with Azure DevOps, cutting deployment cycles from two weeks to four days and improving delivery consistency.

• Delivered curated datasets into Power BI dashboards, giving executives visibility into PBM performance metrics and driving $12M in annual operational savings through data-driven decisions.

SoftAge Group Jul 2021 – Jul 2023

Data Engineer India

• Developed automated ingestion pipelines using Python, SQL, and Apache Airflow to capture and schedule the flow of customer KYC documents from Paytm’s onboarding systems into centralized storage, ensuring reliability and reducing manual handling.

• Applied OCR and image-processing techniques with Tesseract and OpenCV to convert scanned ID proofs into structured text fields (name, date of birth, address, ID number), enabling seamless metadata integration into compliance workflows.

• Designed data warehouse schemas in AWS Redshift and PostgreSQL, organizing millions of KYC records into optimized tables that improved query performance and reduced compliance reporting time from several hours to under 20 minutes.

• Built Kafka-based streaming pipelines to process real-time KYC events such as submission, validation, and approval, providing operations teams with live monitoring of document flow and improving turnaround time for verification.

• Implemented validation frameworks with Great Expectations and custom Python scripts to enforce schema integrity and completeness checks, lowering document rejection rates by more than 10% across the pipeline.

• Strengthened security and compliance controls by integrating AWS KMS encryption, IAM role-based access, and automated audit logging, ensuring secure handling of PII and meeting all RBI KYC regulatory requirements.

PROJECTS

Symptom-Based Disease Prediction and Ranking System Jan 2024 – Apr 2024

• Designed and implemented a machine learning solution that predicts and ranks potential diseases based on user symptoms. Applied Naive Bayes, Decision Tree, and Random Forest on real healthcare datasets, achieving 80% accuracy with an optimized Random Forest model. Built a user-friendly interface to display ranked predictions with confidence scores.

Tech Stack: Python, Scikit-learn, Pandas, NumPy, Matplotlib, Random Forest, Decision Tree, Naive Bayes, Tkinter/Streamlit.

EDUCATION

Master of Science in Data Science - University of Delaware, Newark, Delaware, USA GPA: 3.7/4

Bachelor of Technology in Computer Science and Engineering - TKR College of Engineering and Technology, Hyderabad, India GPA: 3.5/4


