RAJESHWAR BADDAM
Data Engineer
Email: *************@*****.*** | Phone: +1-334-***-**** | LinkedIn: http://www.linkedin.com/in/rajeshwar-baddam
PROFESSIONAL SUMMARY
Data Engineer with 5+ years of progressive experience designing, developing, and optimizing enterprise-grade data engineering and migration solutions across the healthcare, finance, and retail industries. Demonstrated expertise in end-to-end data pipeline development, real-time streaming, data migration, and data lakehouse architectures on Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS). Proficient in Azure Databricks, Delta Live Tables (DLT), Azure Data Factory (ADF), Snowflake, GCP BigQuery, Dataproc, Composer, AWS EMR, Lambda, and Redshift for processing high-volume structured, semi-structured, and unstructured datasets. Skilled in Apache Spark, PySpark, Spark SQL, Spark Streaming, Kafka, Confluent Schema Registry, and Scala for both batch and real-time data processing, with a strong track record in Change Data Capture (CDC), schema evolution, Z-Ordering, and performance tuning for optimized query execution. Extensive experience integrating diverse data sources, including Oracle, Teradata, DB2, PostgreSQL, HL7, REST APIs, MFT, Secure FTP, clickstream, and vendor feeds, into modern cloud-native data ecosystems. Adept at implementing data quality frameworks with Talend, DBT, and PySpark, ensuring compliance with HIPAA, HITECH, AML, and other regulatory frameworks. Strong analytical and visualization skills with Power BI, Looker, and Tableau, delivering interactive dashboards and business intelligence solutions that drive decisions in areas such as fraud detection, customer lifetime value (CLV), provider performance, financial risk analysis, and inventory optimization. Experienced in AI/ML data preparation, creating curated datasets for predictive modeling, demand forecasting, and recommendation engines. Expert in CI/CD automation using Jenkins, GitHub Actions, Terraform, and AWS CloudFormation, with a proven ability to deploy production-ready pipelines and infrastructure-as-code solutions. Collaborative leader in Agile/Scrum environments, partnering with product managers, business analysts, DevOps, QA, and compliance teams to ensure alignment with strategic objectives and timely delivery of high-quality data products. Recognized for driving cross-functional data initiatives, enhancing data governance, improving data lineage and cataloging (Unity Catalog), and introducing best practices for scalable, secure, and efficient data engineering across multi-cloud environments.
TECHNICAL SKILLS
Programming Languages: Python, SQL, Scala, Java, R, Shell Scripting
Big Data & Processing: Apache Spark (Scala, PySpark), Spark SQL, Spark Streaming, Apache Kafka, Confluent Schema Registry, Hive, Sqoop, MapReduce, Hive UDFs
Cloud Platforms: Azure (Azure Data Factory, Databricks, Synapse, Delta Live Tables, Azure Monitor, Log Analytics), AWS (EMR, S3, Lambda, Redshift, Kinesis, CloudFormation), GCP (BigQuery, Cloud Storage, Dataproc, Composer, Cloud Functions, GCP Logging & Monitoring)
ETL/ELT & Data Integration: Talend, Informatica, SSIS, Azure Data Factory, Apache NiFi, DBT, REST APIs, JDBC, Kafka Connect, MFT, Secure FTP, Secure Gateway (SGW)
Data Warehousing: Snowflake, BigQuery, Amazon Redshift, Azure SQL Data Warehouse
Databases: Oracle, MySQL, PostgreSQL, DB2, Cassandra, HDFS
Reporting & BI Tools: Power BI, Tableau, Looker, R, Excel, PowerPivot
Data Analysis Libraries: Pandas, NumPy, Matplotlib
Version Control & CI/CD: Git, GitHub, Jenkins, Azure DevOps, Terraform
Scheduling / Orchestration: Apache Airflow, Oozie, Control-M, AutoSys
Documentation Tools: Confluence, UML Diagrams, Source-to-Target Mapping (STTM), Functional Specifications
Agile Tools & Methods: Jira, Agile, Scrum, Sprint Planning, Retrospectives
CERTIFICATIONS:
AWS Certified Data Engineer – Associate
Google Cloud Certified – Professional Data Engineer
PROFESSIONAL EXPERIENCE
Client: UnitedHealth Group, Minnesota, United States | February 2025 – Present
Role: Data Migration Engineer
Responsibilities:
Lead cross-functional teams including Business Analysts and UX specialists to deliver mobile-first data solutions for healthcare analytics and member engagement.
Design and maintain scalable data pipelines using Azure Databricks and Delta Live Tables (DLT) to process claims, eligibility, and provider data through Bronze, Silver, and Gold layers.
Develop real-time data ingestion pipelines using Apache Kafka and Confluent Schema Registry to capture events from patient portals, EHR systems, and pharmacy logs.
Orchestrate batch workflows in Azure Data Factory (ADF) to load data from Oracle, Teradata, and Epic systems into Azure and Snowflake environments.
Migrate historical data assets from on-premise systems into Snowflake, optimizing data models for reporting, compliance, and predictive analytics.
Ingest third-party healthcare files (CSV, HL7) using MFT and Secure Gateway (SGW) into Databricks landing zones.
Apply robust data validation rules in PySpark and DLT to enforce schema consistency, null checks, and business logic across curated layers (an illustrative sketch follows this role's stack).
Configure and optimize Kafka streaming jobs to ensure real-time data processing from thousands of endpoints across hospital and care networks.
Manage schema evolution and topic configurations across Kafka for domains such as lab results, appointments, and care alerts.
Automate CI/CD pipelines using Jenkins, GitHub, and PowerShell for deploying ADF workflows and Databricks jobs across environments.
Maintain end-to-end data lineage and cataloging using Unity Catalog, supporting traceability and HIPAA/HITECH compliance.
Design and enhance Power BI dashboards to visualize claims trends, provider performance, and clinical outcomes with UX and product teams.
Improve performance of DLT pipelines through Z-Ordering, partitioning, and job cluster tuning based on workload types.
Implement Change Data Capture (CDC) and schema evolution strategies in Delta and Snowflake for audit and regulatory reporting.
Introduce best practices for data lakehouse management, including table optimization, schema validation, and job monitoring using Azure Monitor and Log Analytics.
Participate in Agile ceremonies, collaborating with product owners, data stewards, and compliance teams to prioritize engineering tasks.
Stack: Azure Databricks, Delta Live Tables (DLT), Azure Data Factory (ADF), Snowflake, Apache Kafka, Confluent Schema Registry, Jenkins, GitHub, PowerShell, Unity Catalog, PySpark, Oracle, Teradata, MFT, Secure Gateway (SGW), REST APIs, SQL, Python, Power BI, Z-Ordering, Change Data Capture (CDC), Azure Monitor, Log Analytics, Agile, HL7, Epic Systems.
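As an illustration of the PySpark/DLT validation approach described in this role, the sketch below declares schema and null-check expectations on a hypothetical Silver-layer claims table. Table, column, and path names are assumptions, not the production design, and the code is only meaningful inside a Databricks Delta Live Tables pipeline, where the spark session is provided by the runtime.

```python
# Minimal Delta Live Tables sketch (Python): Bronze -> Silver claims table with
# declarative data-quality expectations. All names below are hypothetical.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw claims landed from the ingestion zone (hypothetical path).")
def claims_bronze():
    return (
        spark.readStream.format("cloudFiles")          # Auto Loader; `spark` is supplied by DLT
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/mnt/landing/claims/")                  # hypothetical landing zone
    )


@dlt.table(comment="Validated claims promoted to the Silver layer.")
@dlt.expect_or_drop("claim_id_not_null", "claim_id IS NOT NULL")   # null check
@dlt.expect_or_drop("valid_amount", "claim_amount >= 0")           # business rule
@dlt.expect("service_date_present", "service_date IS NOT NULL")    # warn only
def claims_silver():
    return (
        dlt.read_stream("claims_bronze")
        .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
        .withColumn("service_date", F.to_date("service_date"))
    )
```

Expectation pass/fail counts land in the DLT event log, which is one way the schema-consistency and null-check rules above can be monitored.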
Client: Hero Housing Finance Limited, Hyderabad, India | February 2021 – May 2023
Role: Data Engineer
Responsibilities:
Actively participated in Agile ceremonies including backlog refinement, sprint planning, and retrospectives, ensuring alignment with business goals and delivery timelines.
Designed and developed end-to-end ETL/ELT pipelines using Python, PySpark, and Spark SQL to process large-scale structured and semi-structured financial datasets.
Leveraged Google Cloud Platform (GCP) services including BigQuery, Cloud Storage, Cloud Functions, Dataproc, and GCP Composer (Apache Airflow) to store, transform, analyze, and orchestrate transaction-level and customer profile data.
Built real-time ingestion pipelines using Kafka to capture financial events and replicate critical tables into BigQuery using Change Data Capture (CDC) patterns for near real-time analytics and fraud monitoring (an illustrative sketch follows this role's stack).
Executed data migration strategies from Oracle and DB2 into BigQuery, ensuring data consistency, compliance, and high availability for financial reporting.
Designed and developed Power BI and Looker dashboards connected to BigQuery datasets to visualize Customer Lifetime Value (CLV), delinquency, portfolio exposure, and risk metrics.
Ingested data from REST APIs, JDBC sources, MFT, and secure FTP using Talend and UNIX shell scripts, applying AML compliance and regulatory transformation logic.
Optimized SSIS and Informatica workflows for legacy batch feeds integrated into GCP pipelines.
Automated infrastructure provisioning and security configurations in GCP using Terraform (IAM roles, BigQuery datasets, bucket policies).
Built robust REST/JSON API integrations with external financial data providers for centralized ingestion into BigQuery.
Tuned BigQuery SQL queries and Spark jobs to improve performance of regulatory and operational reporting.
Processed external bank feeds using Azure Synapse Studio before syncing with GCP data warehouses.
Maintained data lineage, schema changes, and transformation logic in DBT for governance and auditing, and implemented rigorous data quality rules in Python and Talend.
Monitored and troubleshot pipelines using GCP Logging and Monitoring, ensuring SLA adherence.
Collaborated effectively across cross-functional teams, applying communication and collaboration skills to align business objectives with technical execution.
Participated in Agile ceremonies, collaborating with stakeholders to align data engineering deliverables with financial business goals.
Stack: Google Cloud Platform (GCP), BigQuery, Cloud Storage, Cloud Functions, Dataproc, GCP Composer, Python, PySpark, Spark SQL, Kafka, Change Data Capture (CDC), Oracle, DB2, Power BI, Looker, REST APIs, JDBC, Talend, UNIX Shell Scripting, MFT, Secure FTP, SSIS, Informatica, Terraform, JSON, Azure Synapse Studio, DBT, GCP Logging and Monitoring, Agile Methodologies.
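As an illustration of the Kafka-to-BigQuery CDC replication described in this role, the sketch below consumes change events from a Kafka topic and applies them to a BigQuery table with a MERGE statement. Topic, table, and column names are hypothetical, and it assumes the confluent-kafka and google-cloud-bigquery client libraries plus configured GCP credentials.

```python
# Minimal CDC replication sketch: Kafka change events -> BigQuery MERGE.
# Topic, dataset, table, and field names are hypothetical.
import json

from confluent_kafka import Consumer
from google.cloud import bigquery

consumer = Consumer({
    "bootstrap.servers": "broker:9092",    # hypothetical broker
    "group.id": "cdc-replicator",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["loans.cdc"])          # hypothetical CDC topic

bq = bigquery.Client()

MERGE_SQL = """
MERGE `analytics.loans` AS target          -- hypothetical dataset.table
USING (SELECT @loan_id AS loan_id, @status AS status, @balance AS balance) AS source
ON target.loan_id = source.loan_id
WHEN MATCHED THEN
  UPDATE SET status = source.status, balance = source.balance
WHEN NOT MATCHED THEN
  INSERT (loan_id, status, balance) VALUES (source.loan_id, source.status, source.balance)
"""

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())    # one CDC record per message
        job = bq.query(
            MERGE_SQL,
            job_config=bigquery.QueryJobConfig(
                query_parameters=[
                    bigquery.ScalarQueryParameter("loan_id", "STRING", event["loan_id"]),
                    bigquery.ScalarQueryParameter("status", "STRING", event["status"]),
                    bigquery.ScalarQueryParameter("balance", "FLOAT64", event["balance"]),
                ]
            ),
        )
        job.result()                        # wait for the MERGE to complete
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
```

A production pipeline would batch and de-duplicate events by key before merging; the per-event MERGE is used here only to keep the sketch small.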
Client: DXC Technology, Hyderabad, India | June 2019 – February 2021
Role: Data Engineer
Responsibilities:
Built real-time pipelines using Apache Kafka and Spark Streaming (Scala) to process retail POS transactions, triggering fraud or promotion alerts based on dynamic thresholds.
Developed batch ETL pipelines on AWS EMR to load inventory, sales, and product data from Amazon S3 into partitioned Hive tables using Spark SQL, supporting nightly sales and stock reporting.
Ingested customer demographics, purchase history, and loyalty program data from on-prem Oracle systems into HDFS using Sqoop, maintaining accuracy with incremental updates.
Created REST API integrations to pull product catalog updates, pricing feeds, and currency conversion rates from external vendors into AWS S3 raw zones for downstream analytics.
Designed and deployed AWS Lambda functions (Python) to process event-driven retail logs such as cart abandonment and order status changes, loading processed data into Amazon Redshift for business reporting.
Automated ingestion from supplier FTP servers into Kafka using Apache NiFi, standardizing inconsistent product and order formats into structured schemas.
Built a compliance and audit data mart using Azure Synapse Pipelines, joining internal sales, returns, and supplier data to support vendor audits and retail compliance requirements.
Migrated legacy MapReduce jobs to optimized Spark transformations to aggregate historical sales, seasonal trends, and pricing history for merchandising teams.
Developed Hive UDFs to support custom retail KPIs such as basket size, category performance, and sell-through rates.
Implemented a data quality framework in Talend to validate stock movement, order fulfillment accuracy, and promotional discount rules.
Managed Amazon S3 buckets for staging, curated, and archival retail datasets, applying fine-grained IAM access controls.
Created Tableau dashboards using SQL-based semantic models to visualize KPIs such as daily sales, inventory turnover, and store-level performance.
Tuned performance of Spark EMR jobs by optimizing memory allocation, shuffle partitions, and serialization for high-volume transaction processing.
Prepared clean and enriched feature datasets for downstream AI/ML models used in demand forecasting, recommendation systems, and customer segmentation.
Applied strong problem-solving and decision-making skills to design, develop, and deploy advanced analytics and machine learning solutions for large-scale information systems and CRM platforms.
Designed PostgreSQL-based reporting tables with indexing to support merchandising and operations dashboards.
Developed AWS CloudFormation templates to provision EMR clusters and manage Redshift infrastructure for sales reporting pipelines.
Built custom Kafka producers/consumers in Java to process eCommerce clickstream data for customer behavior analysis.
Wrote Python scripts to transform JSON order and shipment logs into flattened CSV structures for ingestion into PostgreSQL and BI tools (an illustrative sketch follows this role's stack).
Documented data flows, lineage diagrams, and technical specifications in Confluence and UML diagrams to ensure transparency.
Participated in Agile ceremonies and cross-functional meetings with merchandising, supply chain, and marketing teams to deliver analytics solutions aligned with business goals.
Stack: Apache Kafka, Spark Streaming (Scala), AWS EMR, Amazon S3, Apache Hive, Spark SQL, Oracle, HDFS, Sqoop, REST APIs, AWS Lambda (Python), Amazon Redshift, Apache NiFi, Azure Synapse Pipelines, MapReduce, Hive UDFs, Talend, IAM, Tableau, AI/ML, PostgreSQL, AWS CloudFormation, Java, JSON, CSV, Confluence, UML, Agile methodologies.
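As an illustration of the JSON-to-CSV flattening described in this role, the sketch below flattens nested order/shipment records into rows suitable for loading into PostgreSQL or a BI tool. Field names and file paths are hypothetical, and it uses only the Python standard library.

```python
# Minimal JSON -> flattened CSV sketch using only the standard library.
# Field names and file paths are hypothetical examples.
import csv
import json


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts, e.g. {"shipment": {"carrier": "X"}} -> {"shipment_carrier": "X"}."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        elif isinstance(value, list):
            flat[new_key] = json.dumps(value)   # keep arrays as JSON strings
        else:
            flat[new_key] = value
    return flat


def json_lines_to_csv(in_path, out_path):
    """Read newline-delimited JSON order/shipment logs and write one flat CSV."""
    with open(in_path, encoding="utf-8") as fin:
        rows = [flatten(json.loads(line)) for line in fin if line.strip()]
    if not rows:
        return
    # Union of keys across rows so late-arriving fields still get a column.
    fieldnames = sorted({key for row in rows for key in row})
    with open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    json_lines_to_csv("orders.jsonl", "orders_flat.csv")   # hypothetical files
```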
EDUCATION DETAILS
Auburn University at Montgomery, Alabama, United States | May 2023 – December 2024
Master's, Computer Science (CS)