Chandra Reddy
Senior Data Engineer
***********@*****.***
https://www.linkedin.com/in/chandra-reddy-244645268/
PROFESSIONAL SUMMARY
• Experienced Data Engineer with 8+ years of expertise in designing, optimizing, and automating scalable data pipelines.
• Proficient in cloud-based data platforms, ETL development, real-time analytics, and machine learning-driven solutions that enhance business operations and decision-making.
• Strong background in data governance, security compliance, and infrastructure automation.
• Expertise in data warehousing, ETL optimization, and real-time data processing using Apache Spark, Kafka, and cloud-native solutions.
• Proven ability to migrate and modernize on-premises data architectures to cloud platforms, reducing operational costs and improving performance.
• Experience in implementing CI/CD automation for data pipelines using Terraform, Azure DevOps, and GitLab CI/CD.
• Skilled in developing data visualization solutions using Power BI, Looker, and Tableau to drive business insights.
• Expertise in ETL/ELT architecture, data integration, and SQL Server Integration Services (SSIS).
• Adept at building and optimizing scalable data pipelines, developing complex SQL queries and stored procedures, and managing enterprise-wide data workflows.
• Adept at working in cross-functional teams and collaborating with business stakeholders.
• Experience with large-scale data platforms such as Databricks, Snowflake, and Google BigQuery.
• Strong proficiency in Python, SQL, and Scala for data engineering and analytics solutions.
• Expertise in implementing data governance, lineage tracking, and security frameworks for enterprise-wide data compliance.
• Proven track record in developing data pipelines for real-time streaming applications using Kafka and Spark Streaming.
• Experience in building predictive analytics models using ML frameworks and integrating them into data pipelines.
• Optimized big data architectures to improve performance, reduce costs, and support large-scale business analytics.
• Strong background in automating monitoring and alerting mechanisms for data workflows to ensure operational excellence.
• Architected and implemented data-driven solutions that improved operational efficiency, reducing costs and time-to-insight.
• Skilled in building and optimizing BigQuery-based data pipelines, designing robust data models, and integrating ERP data to support analytics and business intelligence.
• Developed cost-effective cloud architectures that enhanced resource utilization and minimized redundant infrastructure.
• Enhanced data quality through automated data profiling, anomaly detection, and data cleansing processes.
• Applied advanced data security practices, ensuring regulatory compliance across multiple industries (HIPAA, GDPR, etc.).
• Mentored junior engineers on best practices for data engineering, cloud solutions, and ML/AI integration.
AREAS OF EXPERTISE
Big Data Technologies: Apache Spark (PySpark), Apache Kafka, Databricks, AWS Kinesis, Google Cloud Dataflow.
Data Governance & Compliance: HIPAA, GDPR, CCPA, FCRA, AWS Lake Formation.
Machine Learning: BigQuery ML, Databricks ML, Azure Machine Learning.
Data Visualization: Power BI, Looker, Tableau, Adobe Analytics.
Cloud Platforms: AWS (Glue, Lambda, S3, RDS, Kinesis, IAM), Snowflake, GCP (BigQuery, Dataflow), Azure.
Languages: Python, SQL, Scala, Shell.
ERP Systems: SAP HANA, Oracle ERP integration.
ETL Tools: Apache Spark (PySpark), Apache Kafka, Azure Data Factory, Google Cloud Dataflow (Apache Beam), dbt, Informatica.
CI/CD Tools: Azure DevOps (YAML Pipelines), GitLab CI/CD, Terraform, CloudFormation.
Data Warehousing: Snowflake, AWS Redshift, Google BigQuery, Azure Synapse Analytics.
Monitoring & Logging: CloudWatch, Stackdriver, Datadog, Splunk, Azure Monitor.
EDUCATION
• Master’s in Computer Science, Western Illinois University, Macomb, IL (Dec 2016)
• Bachelor of Technology in Computer Science, Jawaharlal Nehru Technological University, India (May 2015)
WORK EXPERIENCE
Client: CVS Health, Illinois | May 2023 to Present | Role: Senior Data Engineer
• Developed and maintained scalable ETL pipelines and data models using Python, PySpark, Google Cloud Dataflow (Apache Beam), and BigQuery to process prescription transactions, patient claims, and insurance data.
• Engineered real-time fraud detection and prescription validation using Google Pub/Sub and Apache Kafka, improving patient safety.
• Led migration of on-prem healthcare data warehouses to Google BigQuery, reducing latency by 50% and cutting operational costs.
• Built HIPAA-compliant workflows with Google Cloud Healthcare API and FHIR, ensuring regulatory compliance.
• Designed Looker dashboards with BigQuery ML to track patient adherence, improving pharmacy outreach and engagement.
• Automated claims processing ETL pipelines using Cloud Composer (Airflow) and dbt, enhancing reconciliation accuracy.
• Managed CI/CD workflows with Terraform and Google Cloud Build, streamlining deployments and infrastructure consistency.
• Monitored and optimized data workflows using Stackdriver and Datadog, ensuring high pipeline reliability.
• Integrated Google Cloud Storage with external data sources to enable seamless data exchange between providers and insurers.
• Implemented automated anomaly detection and alerting using BigQuery ML to flag potential prescription fraud and data quality issues.
• Designed HIPAA-compliant batch and real-time ingestion frameworks for high-volume healthcare transactions using Kafka and Google Pub/Sub, ensuring data accuracy and timeliness.
• Conducted performance tuning of BigQuery queries and partitioning strategies, reducing query costs by 30%.
• Developed patient segmentation models to enhance targeted healthcare interventions and improve patient outcomes.
• Collaborated with security teams to implement advanced encryption techniques for healthcare data at rest and in transit.
• Led the integration of predictive models for patient care using BigQuery ML, improving medication adherence predictions by 20%.
• Automated cloud-based reporting solutions, resulting in a 40% reduction in manual intervention and accelerating business insights delivery.
• Developed healthcare-specific data models to optimize drug pricing and benefit management, driving a 15% reduction in insurance claim processing errors.
• Collaborated with cross-functional teams to streamline data pipelines for multi-source ingestion, improving data availability for real-time decision-making.
• Conducted a comprehensive cost audit of the existing data infrastructure and optimized cloud resource usage, reducing data processing costs by 25%.
• Designed and implemented scalable data workflows for processing healthcare claims, reducing processing time by 30% while ensuring HIPAA compliance.
• Led root cause analysis for data anomalies in pipelines and developed automated break/fix scripts and monitoring alerts via Stackdriver and Datadog.
• Architected secure, scalable Snowflake and BigQuery data warehouses, implementing partitioning strategies to optimize performance.
• Partnered with stakeholders to ensure data integrity across internal dashboards (Looker), improving patient engagement metrics by 20%.
• Integrated SAP HANA and other ERP data sources with the GCP-based analytics platform.
Environment: Google Cloud (BigQuery, GCS, Cloud SQL), Google Cloud Dataflow (Apache Beam), Cloud Composer (Airflow), dbt, Apache Spark (PySpark), Dataproc, SQL, Google Pub/Sub, Apache Kafka, Debezium, SAP HANA, Stackdriver (Google Cloud Operations Suite), Datadog, Splunk, Terraform, Google Cloud Build, Snowflake, Looker, BigQuery ML.
Client: Southwest, TX | Oct 2020 to Apr 2023 | Role: Data Engineer
• Designed and implemented Snowflake-based data warehouses, optimizing storage and query performance for large-scale airline datasets.
• Developed real-time and batch ETL pipelines using Snowflake Streams, Tasks, and Snowpipe for flight and operations data, improving data timeliness and reporting accuracy.
• Optimized Snowflake configurations, partitioning strategies, and materialized views to improve query performance and reduce costs.
• Integrated Snowflake with AWS S3, Azure Blob Storage, and Kafka for seamless cloud data ingestion.
• Automated schema evolution and metadata management using dbt and Python, reducing manual interventions and improving data discovery.
• Designed ELT workflows using dbt and Airflow, enhancing data transformation efficiency.
• Built analytics dashboards in Power BI and Tableau, enabling real-time insights for flight operations and customer analytics.
• Spearheaded real-time data pipelines for tracking flight status, delays, cancellations, and passenger trends, reducing flight delay reporting time by 50%.
• Implemented automated anomaly detection models using Databricks ML and SQL, flagging outlier patterns in passenger and flight performance data to prevent operational disruptions.
• Architected a multi-cloud solution integrating AWS and Azure data services for enhanced scalability and reliability of critical airline systems.
• Developed data-driven customer segmentation models that personalized marketing efforts, improving customer engagement and loyalty.
• Led the migration of legacy on-prem data systems to Snowflake, improving scalability, reducing costs, and achieving a 60% improvement in query response times.
• Enhanced ETL workflows using Apache Airflow, enabling more frequent and efficient data transformations with fewer system failures.
• Collaborated with business intelligence teams to build dynamic dashboards in Tableau, providing real-time insights on flight performance, customer behavior, and financial metrics.
• Integrated AWS S3, Kinesis, and Lambda for automated data movement and transformation.
• Implemented alerting tools and automated quality checks to detect and resolve pipeline failures pre-emptively.
• Automated ingestion from ERP systems into Snowflake for centralized analytics.
Environment: Snowflake, Redshift, SQL Server, PostgreSQL, Snowflake Streams & Tasks, Snowpipe, Apache Airflow, dbt, Spark, Databricks, Python, SQL, PowerShell, Kafka, REST APIs, AWS Data Migration Services, Tableau, Power BI, Git, GitHub Actions, Jenkins, Snowflake RBAC, Data Masking, Encryption.
Client: Macy’s, GA | Jan 2017 to Sep 2020 | Role: Data Engineer
• Developed ETL pipelines integrating Point-of-Sale (POS), online orders, inventory, and ERP systems using Azure Data Factory and Databricks (PySpark).
• Designed customer behavior tracking solutions with Azure Event Hubs and Apache Flink, improving personalized marketing.
• Migrated the legacy on-prem SQL Server data warehouse to Azure Synapse Analytics and Snowflake, improving report refresh times by 45%.
• Built demand forecasting models using Python, Databricks ML, and Azure Machine Learning, optimizing inventory management.
• Implemented data quality frameworks with Great Expectations and Delta Lake, enabling early anomaly detection and automated cleansing workflows for retail data.
• Automated CI/CD pipelines with Azure DevOps and Terraform, enhancing deployment efficiency and infrastructure consistency.
• Developed Power BI dashboards for sales, marketing, and inventory analytics, improving business decision-making.
• Integrated data from various sources (SAP, Salesforce, Oracle) into a unified Azure Data Lake, enabling advanced analytics.
• Designed automated reporting solutions for seasonal sales trends, enabling data-driven inventory management.
• Engineered real-time pricing optimization models based on competitor analysis and sales trends.
• Implemented AI-driven churn prediction models to improve customer retention strategies.
• Developed real-time inventory tracking solutions to optimize warehouse logistics and reduce stockouts.
• Automated customer sentiment analysis using NLP models on social media and customer feedback data.
• Designed A/B testing frameworks for digital marketing campaigns, improving conversion rates.
• Led the implementation of machine learning-based demand forecasting models, improving product availability and reducing stockouts by 30%.
• Architected and automated the migration of legacy on-prem data systems to the cloud, improving system uptime and reducing maintenance costs.
• Introduced predictive analytics for customer behavior analysis, resulting in a 20% improvement in marketing targeting accuracy.
• Implemented data lineage and governance strategies using Azure Data Factory and Databricks to ensure transparency and compliance.
• Developed event-driven data pipelines using Azure Data Factory, Event Hubs, and Databricks to capture POS and inventory data in real time.
• Built and managed real-time monitoring dashboards with Power BI and Azure Monitor for operational insights.
Environment: Azure (Data Lake Storage, Synapse Analytics, Blob Storage), Azure Data Factory (ADF), Databricks, dbt, Apache Spark (PySpark), Delta Lake, Apache Flink, Azure Event Hubs, Snowflake, SQL Server, SAP, Azure Monitor, Splunk, Terraform, Azure DevOps (YAML Pipelines), Power BI.