Kamireddy Rajesh Kumar
Data Engineer
Frisco, TX | ******.**************@*******.*** | 913-***-**** | LinkedIn

Summary
Data Engineer with 4+ years of experience building scalable ETL/ELT pipelines, cloud data platforms, and data warehouse solutions. Skilled in Python, SQL, PySpark, and Apache Spark, with hands-on experience across AWS, Azure, BigQuery, and Snowflake ecosystems. Proven expertise in data modeling, pipeline optimization, and automated data quality frameworks that deliver reliable, high-performance analytics systems. Strong collaborator with cross-functional teams, enabling data-driven decision-making and enterprise analytics.

Skills
Programming & Scripting: Python, SQL, PySpark, Spark SQL, VBA
Data Engineering & Processing: ETL/ELT Pipeline Development, Batch & Streaming Data Processing, Data Transformation, Data Validation & Cleansing, Feature Engineering, Data Modeling (Star/Snowflake Schema), Data Quality Frameworks, Data Orchestration
Cloud & Data Platforms: AWS (S3, Redshift, RDS, AWS Glue), Microsoft Azure, Google BigQuery, Snowflake
Big Data Technologies: Apache Spark, PySpark, Distributed Data Processing
Data Warehousing & Modeling Tools: dbt, BigQuery Modeling, Warehouse Optimization, Query Performance Tuning, Clustering & Indexing Strategies
Databases: PostgreSQL, MySQL, SQL Server, NoSQL Databases
BI & Data Consumption: Power BI, Tableau, Curated Data Layers, Data APIs, Reporting Data Pipelines
Workflow & Automation: Data Pipeline Scheduling, Automated Monitoring & Alerting, Workflow Automation
Methodologies: Agile/Scrum, Cross-Functional Data Platform Collaboration, Requirements Translation for Analytics Platforms
Professional Experience
Data Engineer, CVS Health Feb 2025 – Present
• Built scalable data quality validation frameworks using Python and SQL, automating ingestion checks and improving pipeline data reliability while reducing manual review efforts by 40%.
• Designed and deployed ETL/ELT data pipelines integrating pharmacy, claims, and member interaction datasets using Azure Data Factory, AWS Glue, and Amazon Redshift, reducing data refresh latency by 50%.
• Engineered and optimized large-scale batch and streaming data processing workflows using PySpark and Spark SQL, enabling high-volume behavioral and transaction data processing for downstream analytics systems.
• Developed data transformation and feature engineering pipelines to prepare structured datasets for predictive modeling, segmentation, and operational reporting across enterprise healthcare platforms.
• Implemented data warehouse schema design, indexing strategies, and query performance tuning, improving reporting efficiency and supporting high-concurrency analytical workloads.
• Built automated data monitoring and orchestration workflows using SQL-based validation rules and pipeline scheduling tools to ensure consistent data availability for analytics and business applications.
• Created reusable data APIs and curated datasets for BI platforms such as Power BI and Tableau, enabling faster dashboard development and standardized enterprise reporting layers.
• Collaborated with analytics, clinical, and marketing teams to design scalable data infrastructure solutions, reducing manual data preparation workflows by 60% and improving cross-team data accessibility.

Data Engineer, CueTech Systems Jan 2020 – Jul 2023
• Developed automated data validation, cleansing, and transformation pipelines using Python, improving enterprise data quality and reducing reporting discrepancies by 35%.
• Designed and implemented scalable data warehouse models using BigQuery and dbt, enabling efficient KPI tracking for claims processing, authorization workflows, and patient activity analytics.
• Optimized Snowflake warehouse performance through schema redesign, clustering strategies, and query tuning, reducing average query runtime by 45% and improving compute efficiency.
• Built distributed data processing pipelines using Apache Spark and PySpark, transforming large-scale healthcare datasets into production-ready feature sets for downstream predictive systems.
• Implemented end-to-end ETL pipelines integrating SQL and NoSQL data sources to ingest clinical, operational, and patient experience datasets into centralized analytics platforms.
• Engineered automated data ingestion and orchestration workflows across cloud storage systems (Amazon S3, Redshift, RDS), enabling consistent and scalable data availability for reporting environments.
• Developed reusable semantic and curated data layers to support BI tools such as Power BI and Tableau, enabling standardized enterprise reporting and reducing redundant data preparation.
• Automated recurring operational reporting workflows using Power Query, VBA, and pipeline-based scheduling, significantly reducing manual reporting effort and improving reporting timeliness.

Education
Master's in Computer Science, University of Central Missouri