
Data Engineer Senior

Location:
Seattle, WA
Posted:
May 28, 2025

Resume:

VENKAT RAO BANDARU

Senior Data Engineer

Phone: 980-***-****

Email: **************@*****.***

LinkedIn: Venkatrao-Bandaru

PROFESSIONAL SUMMARY

Experienced Data Engineer with more than 10 years in IT, specializing in Azure cloud services and Big Data technologies; skilled in designing, developing, and optimizing ETL data pipelines, including batch and real-time streaming solutions built with Azure services, Python, and PySpark.

Expertise in Kafka technologies, including Kafka Connect, Kafka Streams (KStreams), KSQL, and Kafka database connectors for Oracle and MySQL. Proficient in Kafka administration, configuration, and troubleshooting, with hands-on experience in clustering, fault tolerance, high availability (HA), and disaster recovery (DR) models.

Strong experience in Kafka Streaming and KStreams pipelines, with a proven ability to debug complex issues and deploy KStreams clusters for scalable data solutions.

Familiar with Confluent Control Center and Kafka monitoring UIs; proficient in Kafka best practices and querying with KSQL.

Experienced in Azure Services, Cloudera, Hadoop Ecosystem, Spark/Pyspark/Scala, Databricks, MapReduce, Tez, Python, Scala, Hive, Snowflake, relational databases, and visualization tools such as Tableau and Power BI.

Excellent understanding and knowledge of Azure services like Azure Databricks, Azure Stream Analytics, Azure Synapse Analytics, Azure Log Analytics, Azure Security Center, Azure Event Hubs, Azure HDInsight, Azure Logic Apps, Triggers, Azure Cosmos DB.

Collaborated with data analysts and business stakeholders to gather requirements and define data models and reporting structures for optimized analytics.

Designed and implemented Data Warehouse solutions using Snowflake and Star Schema modeling, ensuring optimized data storage and retrieval.

Developed Logical and Physical Data Models for Snowflake, ensuring seamless data integration and scalability.

Established ETL pipelines utilizing Azure Data Factory (ADF), Informatica, PL/SQL, and Snowflake Integrated Services, ensuring seamless data integration across enterprise systems.

Designed and developed scalable data ingestion pipelines using Azure and Snowflake, integrating structured and unstructured data from various sources such as SQL Server, Oracle, Teradata and ADLS Gen2.

Built batch and real-time streaming data pipelines using Azure Event Hub, Apache Kafka, and Databricks Auto Loader, ensuring low-latency data ingestion and transformation.

Implemented Delta Lake architecture with Delta Tables, Delta Live Tables, and Data Catalogs, ensuring data consistency and optimized processing.

Developed large-scale data pipelines using Apache Spark and Hive, optimizing data transformations, aggregations, and analytics processing.

Used Apache Sqoop for importing and exporting data between HDFS and relational databases, ensuring seamless data movement.

Scheduled and managed Hadoop workflows using Apache Oozie, automating data ingestion and transformation tasks.

Optimized Spark performance by implementing SparkSQL, Spark Streaming, and performance tuning techniques in Azure Databricks.

Designed and optimized schemas, tables, and views in Snowflake, utilizing features like Clone and Time Travel for data recovery and historical analysis.

Enhanced query performance by implementing bucketing, partitioning, indexing, and materialized views in Hive, Spark, and Snowflake.

Implemented secure and compliant data pipelines using Voltage SecureData, Azure Key Vault, and Managed Identity.

Developed and automated serverless solutions using Azure Functions and Azure Logic Apps, integrating event-driven workflows.

Established workflow automation using Apache Airflow DAGs, YAML configurations, and Terraform scripts.

Developed CI/CD pipelines for automated data deployment, collaborating with DevOps teams using Azure DevOps, GitHub, Bitbucket, and Terraform.

Implemented Machine Learning data pipelines using Azure ML Flow, integrating AI/ML models with big data solutions.

Implemented interactive data visualization dashboards using Power BI and Power BI DAX, enabling business users to derive actionable insights.

Monitored and managed data pipeline security and performance using Azure Security Center, Azure Log Analytics, and Splunk.

Proficient in Agile methodologies, actively participating in daily stand-ups, sprint planning, backlog grooming, and tracking tasks using JIRA and Azure DevOps (ADO).

Maintained version control and managed data engineering codebase using GitHub, Azure DevOps, Bitbucket, and GitLab.

TECHNICAL SKILLS

Big Data Tools: Apache Spark, PySpark, Spark SQL, Spark Streaming, Hadoop, HDFS, MapReduce, Hive, Sqoop, Kafka, Flink, HBase, NiFi, Airflow, Delta Lake, Star/Snowflake Schema, SCDs, Cloudera, Hortonworks

Databases: Oracle, MySQL, SQL Server, MongoDB, Cosmos DB, Cassandra, Snowflake

Programming Languages: Java, Python, PySpark, Shell scripting, SQL, Scala

Cloud Services: Azure Synapse Analytics, Azure Data Lake Storage (Gen2), Azure SQL Database, Azure Data Warehouse, Azure Databricks, Azure Data Factory, Azure Event Hubs, Azure Logic Apps, Azure Blob Storage, Azure Functions, Azure HDInsight, Azure Key Vault, Azure Application Insights, Azure Monitor

Version Control and CI/CD Tools: SVN, Git, GitHub, Azure DevOps, Jenkins, Maven, Bitbucket

Streaming Tools and Cloud Security: Apache Kafka, Azure Event Hubs, Spark Streaming, Azure Logic Apps, APIGEE Edge, Azure API Management, Azure Key Vault, Azure Security Center, Terraform, Azure Monitor

Development and Design Tools: Eclipse, Visual Studio, IntelliJ IDEA, Spark MLlib, PySpark

Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux

Visualization Tools: Power BI, Tableau, Excel (Advanced), SSRS

EDUCATION

Bachelor's: Electronics & Communication Engineering, Jawaharlal Nehru Technological University

Master's: Computer Science, Northern Arizona University

WORK EXPERIENCE

Client: Centene Corporation, St.Louis, MO

Role: Senior Data Engineer    Nov 2022 – Present

Responsibilities:

Designed and built cloud-native data platforms leveraging Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, and Databricks, ensuring end-to-end data ingestion, processing, and analytics.

Designed and implemented Kafka-based real-time streaming data pipelines using Apache Kafka, Azure Event Hubs, and Spark Streaming, optimizing data flow for low-latency ingestion and processing.
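
Illustrative sketch only (not taken from the project): a minimal PySpark Structured Streaming job of the kind described above, reading a hypothetical Kafka topic and landing parsed records as Parquet. Broker address, topic name, schema, and paths are all placeholders.

```python
# Requires the spark-sql-kafka-0-10 package on the Spark classpath (assumption).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("claims-stream-ingest").getOrCreate()

# Hypothetical event schema; the real pipeline's schema is not shown in the resume.
schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
       .option("subscribe", "claims-events")                  # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

parsed = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

query = (parsed.writeStream
         .format("parquet")
         .option("checkpointLocation", "/tmp/checkpoints/claims")  # placeholder path
         .option("path", "/tmp/output/claims")
         .outputMode("append")
         .start())

query.awaitTermination()
```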

Administered Kafka clusters, resolving performance bottlenecks and ensuring fault tolerance and high availability across Dev, QA, UAT, and PROD environments.

Utilized Kafka Connect to facilitate seamless data integration between Oracle and MySQL databases, enabling real-time data pipelines.

Built and optimized KStreams pipelines for processing high-throughput, real-time data feeds, ensuring minimal latency and robust fault-tolerant architecture.

Developed and deployed KSQL queries to transform data in real-time and optimize streaming analytics workflows.
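
As a rough illustration of deploying a KSQL transformation, the sketch below submits a hypothetical persistent query to a ksqlDB server over its REST endpoint; the server URL, stream, and columns are assumptions, not details from the project.

```python
# Illustrative only: submit a hypothetical KSQL statement to a ksqlDB server.
import json
import requests

KSQL_URL = "http://localhost:8088/ksql"  # placeholder ksqlDB endpoint

statement = """
CREATE STREAM enriched_claims AS
  SELECT claim_id, member_id, UCASE(status) AS status
  FROM claims_stream
  EMIT CHANGES;
"""

resp = requests.post(
    KSQL_URL,
    headers={"Accept": "application/vnd.ksql.v1+json"},
    data=json.dumps({"ksql": statement, "streamsProperties": {}}),
)
resp.raise_for_status()
print(resp.json())
```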

Migrated on-prem databases (SQL Server, Oracle, PostgreSQL) to Azure SQL, Snowflake, and Synapse SQL Pool, optimizing cost, performance, and scalability.

Implemented Delta Lake on Azure Databricks, improving data lake management with ACID transactions, schema evolution, and versioning.

Optimized data storage and retrieval using Snowflake Time Travel, Cloning, and Multi-Cluster Warehouses, reducing compute costs and enhancing performance.

Configured RBAC (Role-Based Access Control) and data encryption using Azure Key Vault and Snowflake Security Policies, ensuring compliance with GDPR, HIPAA, and SOC standards.

Built high-performance ETL data pipelines using Apache Spark, PySpark, and SQL, automating data extraction, transformation, and loading into data lakes and warehouses.

Developed batch data processing workflows using Azure Data Factory (ADF), Snowflake Streams, and Databricks Jobs, ensuring scalability and high throughput.

Tuned Spark job execution using optimized partitioning, caching, shuffle operations, and broadcast joins, reducing processing times by 40-50%.
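
A small PySpark sketch of the tuning techniques named above (broadcast join, caching, and explicit repartitioning); table names, columns, and the partition count are hypothetical.

```python
# Illustrative tuning sketch: broadcast the small dimension, cache the reused
# fact DataFrame, and repartition before a wide aggregation.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, sum as _sum

spark = SparkSession.builder.appName("spark-tuning-sketch").getOrCreate()

facts = spark.read.parquet("/data/claims_fact")   # hypothetical large table
dims = spark.read.parquet("/data/provider_dim")   # hypothetical small table

# Broadcast join avoids shuffling the large side.
joined = facts.join(broadcast(dims), "provider_id")

# Cache because the joined set feeds multiple downstream aggregations.
joined.cache()

# Repartition on the grouping key to balance the shuffle.
daily = (joined.repartition(200, "claim_date")
         .groupBy("claim_date", "provider_id")
         .agg(_sum("paid_amount").alias("total_paid")))

daily.write.mode("overwrite").parquet("/data/agg/daily_paid")
```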

Converted legacy ETL workflows (Informatica, SSIS) to Spark-based pipelines, improving performance and reducing processing time from hours to minutes.

Architected real-time streaming pipelines using Apache Kafka, Azure Event Hubs, and Spark Streaming, ensuring low-latency, fault-tolerant event processing.

Configured Kafka producers & consumers with optimized partitioning strategies, replication, and failover mechanisms, ensuring seamless message distribution.

Integrated Kafka with Snowflake and Azure Synapse, enabling real-time analytics and operational dashboards.

Built CDC (Change Data Capture) pipelines using Kafka Connect, Debezium, and Snowflake Streams, synchronizing data changes across multiple storage systems.
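
As an illustration of the kind of CDC setup mentioned above, the snippet registers a hypothetical Debezium MySQL source connector through the Kafka Connect REST API; hostnames, credentials, and table names are placeholders, and the actual project configuration is not shown in this resume.

```python
# Illustrative only: register a Debezium MySQL connector via Kafka Connect's REST API.
import json
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # placeholder Kafka Connect endpoint

connector = {
    "name": "orders-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.internal",   # placeholder
        "database.port": "3306",
        "database.user": "cdc_user",             # placeholder
        "database.password": "********",
        "database.server.id": "5400",
        "topic.prefix": "orders_db",
        "table.include.list": "sales.orders",
        "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
        "schema.history.internal.kafka.topic": "schema-changes.orders",
    },
}

resp = requests.post(CONNECT_URL, headers={"Content-Type": "application/json"},
                     data=json.dumps(connector))
resp.raise_for_status()
```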

Developed real-time analytics solutions using Flink, Apache Storm, and Spark Streaming, processing millions of events per second.

Designed and implemented Fact-Dimension models (Star & Snowflake Schema) in Snowflake and Azure Synapse, optimizing for analytical workloads.

Used PolyBase and Snowflake External Tables for seamless integration with external data sources, improving query performance and scalability.

Optimized SQL queries and analytical workloads using partitioning, clustering, materialized views, and indexing, reducing execution times by 30%.

Developed UDFs (User-Defined Functions) in Snowflake, Hive, and SQL for complex data transformation logic.

Implemented bucketing, partitioning, and columnar storage formats (Parquet, ORC, Avro) to enhance query performance in Hive, Snowflake, and Synapse Analytics.
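
A brief PySpark sketch of partitioned, bucketed, columnar storage along the lines described above; the source path, columns, and bucket count are illustrative.

```python
# Illustrative sketch: write a partitioned Parquet dataset and a bucketed table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("storage-layout-sketch")
         .enableHiveSupport()
         .getOrCreate())

df = spark.read.parquet("/data/claims_fact")  # hypothetical source

# Partition by a low-cardinality column so queries can prune whole directories.
df.write.mode("overwrite").partitionBy("claim_year").parquet("/data/claims_by_year")

# Bucket by the join key so joins on member_id can avoid a full shuffle.
(df.write.mode("overwrite")
   .bucketBy(32, "member_id")
   .sortBy("member_id")
   .format("parquet")
   .saveAsTable("claims_bucketed"))
```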

Automated data pipeline orchestration using Apache Airflow, Azure Data Factory, and Oozie, ensuring job scheduling, dependency management, and alerting.

Developed Airflow DAGs for ETL workflows, integrating Kafka, Snowflake, and Databricks for automated data movement.
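
A minimal Airflow DAG sketch reflecting the orchestration pattern described above, assuming Airflow 2.x; the DAG id, schedule, and task callables are placeholder stubs rather than the project's actual jobs.

```python
# Illustrative DAG: extract Kafka-landed files, transform in Spark/Databricks,
# load into Snowflake. The Python callables are stubs standing in for real tasks.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull new files landed from Kafka")  # placeholder


def transform():
    print("run Spark/Databricks transformation job")  # placeholder


def load():
    print("COPY INTO Snowflake target tables")  # placeholder


with DAG(
    dag_id="claims_etl_sketch",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3
```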

Designed CI/CD pipelines in Azure DevOps and GitHub Actions, reducing deployment time by 50% and ensuring seamless data pipeline deployments.

Automated infrastructure provisioning using Terraform & ARM Templates, eliminating manual setup errors and accelerating cloud deployments.

Enforced data governance policies using Azure Purview, Snowflake Data Governance, and Data Catalogs, ensuring data lineage and compliance.

Implemented data quality frameworks using Great Expectations and dbt, automating validation, anomaly detection, and data reconciliation.

Designed data encryption strategies using Azure Key Vault, Snowflake Encryption at Rest & Transit, and Voltage SecureData, protecting sensitive data from unauthorized access.

Set up audit logging & monitoring using Azure Monitor, Splunk, and ELK Stack (Elasticsearch, Logstash, Kibana), ensuring real-time issue detection and troubleshooting.

Integrated Power BI & Snowflake/Azure Synapse for real-time data visualization and KPI reporting.

Built interactive dashboards using Power BI DAX, SQL, and Tableau, enabling self-service BI for business users.

Worked closely with data analysts & stakeholders to define business metrics, optimize SQL queries, and generate actionable insights.

Used Azure Boards for Agile project management, tracking and managing tasks with Kanban boards and sprint planning to keep the team working together effectively.

Environment: Azure Data Factory, Azure Databricks, Apache Spark, PySpark, Scala, Snowflake, SnowSQL, SnowPipe, Azure Synapse Analytics, Azure Blob Storage, Azure Event Hubs, Apache Kafka, Spark Streaming, Hive, Hive on Spark, SparkSQL, Jenkins, CI/CD, Python, SQL, dbt (Data Build Tool), Terraform, Apache Airflow, GitHub, ETL, Azure DevOps, JIRA, Agile, Power BI.

Client: USAA Financial Services, Phoenix, AZ

Role: Azure Snowflake Data Engineer    Feb 2021 – Oct 2022

Responsibilities:

Developed Kafka-based real-time data processing pipelines integrated with Azure Synapse Analytics and Snowflake, enabling low-latency, fault-tolerant analytics solutions.

Worked extensively with Kafka Streams (KStreams) to build scalable and reliable streaming applications.

Integrated Kafka Connect with relational databases (including MySQL and Oracle) to ensure smooth, real-time data ingestion for data lakes and data warehouses.

Participated in troubleshooting complex Kafka performance issues and administered Kafka clusters across multiple environments.

Engineered end-to-end data ingestion workflows using Azure Data Factory, seamlessly integrating data from SQL databases, CSV files, and REST APIs for real-time and batch processing.

Created data processing workflows in Azure Databricks, leveraging Spark for distributed data processing and transformation tasks.

Assured data quality and integrity by performing data validation, cleansing, and transformation through Azure Data Factory and Databricks.

Designed and deployed a cloud-based data warehouse solution using Snowflake on Azure, ensuring scalability and high-performance analytics.

Developed and optimized Snowflake data pipelines using SnowSQL, Snowflake Integrated Services, and SnowPipe, enabling automated data ingestion and transformation.

Utilized Snowflake Clone and Time Travel features to ensure data recovery, replication, and historical analysis, improving business continuity and auditability.
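
A short sketch of the Clone and Time Travel features referenced above, issued through the Snowflake Python connector; the account, credentials, warehouse, and table names are placeholders.

```python
# Illustrative only: zero-copy clone plus a Time Travel query via the Snowflake connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",       # placeholder account identifier
    user="etl_user",         # placeholder
    password="********",
    warehouse="ANALYTICS_WH",
    database="EDW",
    schema="CLAIMS",
)

cur = conn.cursor()

# Zero-copy clone for a point-in-time backup before a risky load.
cur.execute("CREATE OR REPLACE TABLE CLAIMS_BACKUP CLONE CLAIMS_FACT")

# Time Travel: read the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM CLAIMS_FACT AT (OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()
```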

Established and optimized Snowflake schemas, tables, and views, supporting efficient data storage and retrieval for analytics and reporting.

Developed and optimized Spark jobs for data transformations, aggregations, and machine learning tasks on large datasets.

Utilized Azure Synapse Analytics to enhance big data processing and analytics capabilities, enabling seamless data exploration.

Configured event-based triggers and scheduling mechanisms in Azure Data Factory to automate data pipelines and workflows.

Implemented data lineage and metadata management solutions to monitor data flow and transformations, ensuring data governance and transparency.

Identified and addressed performance bottlenecks in data processing and storage layers, optimizing query execution and reducing latency.

Employed partitioning, indexing, and caching strategies in Snowflake and Azure services to improve query performance and reduce processing time.

Performed performance tuning and capacity planning to ensure scalability and efficiency of the data infrastructure.

Developed a CI/CD framework for data pipelines using Jenkins, ensuring automated deployment and monitoring.

Collaborated with DevOps engineers to build automated CI/CD and test-driven development pipelines using Azure as per client requirements.

Applied programming expertise in Python and Scala, developing custom scripts for automation and data processing.

Participated in executing Hive scripts using Hive on Spark and SparkSQL, optimizing data processing and querying efficiency.

Worked on ETL tasks, ensuring data integrity and stability in data transformation workflows.

Gained hands-on experience with Kafka and Spark Streaming, processing real-time streaming data for business-critical applications.

Developed a data pipeline utilizing Kafka, Spark, and Hive, facilitating data ingestion, transformation, and analysis.

Designed and implemented real-time data processing solutions using Kafka and Spark Streaming, enabling low-latency analytics on high-volume data streams.

Developed Spark Core and Spark SQL scripts using Scala, enhancing data processing efficiency.

Used JIRA for project tracking, creating subtasks for Development, QA, and Partner validation.

Experienced in Agile methodologies, participating in daily stand-ups, sprint planning, and PI Planning on an international scale.

Environment: Azure Data Factory, Azure Databricks, Apache Spark, PySpark, Scala, Snowflake, SnowSQL, SnowPipe, Azure Synapse Analytics, Azure Blob Storage, Azure Event Hubs, Apache Kafka, Spark Streaming, Hive, Hive on Spark, SparkSQL, Jenkins, CI/CD, Python, SQL, dbt (Data Build Tool), Terraform, Apache Airflow, GitHub, ETL, Azure DevOps, JIRA, Agile, Power BI.

Client: Goldman Sachs, Dallas, TX

Role: Big Data Developer    Aug 2019 – Jan 2021

Responsibilities:

Integrated Azure Data Factory (ADF) for orchestrating data pipelines, automating data ingestion from on-premise MySQL to Azure Data Lake Storage (ADLS Gen2) for cloud-based analytics.

Migrated Hadoop-based ETL workflows to Azure Databricks, leveraging Spark on Azure to enhance real-time processing and improve query performance.

Implemented Azure HDInsight to run Hive, Pig, and Spark jobs, ensuring scalable big data processing within the Azure ecosystem.

Used Azure Event Hubs for real-time streaming data ingestion, integrating with Spark Streaming and Kafka for real-time analytics pipelines.

Optimized data warehousing solutions using Azure Synapse Analytics, reducing query execution time and enabling interactive analytics on large datasets.

Engaged in data acquisition, pre-processing, and exploration for a telecommunication project using Scala, ensuring efficient data handling and transformation.

Installed and configured Hive, Pig, Sqoop, and Oozie on a Hadoop cluster, setting up and benchmarking Hadoop clusters for internal use.

Developed and implemented data acquisition jobs using Scala, Sqoop, Hive, and Pig, optimizing MapReduce jobs to efficiently use HDFS with various compression mechanisms via Oozie workflows.

Leveraged Spark for data pre-processing, ensuring missing data removal and feature transformation to improve data quality and enable advanced analytics.
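
A small PySpark pre-processing sketch in the spirit of the step above (dropping incomplete records and deriving a feature); the input path and columns are assumed for illustration.

```python
# Illustrative pre-processing: drop incomplete rows and derive a simple feature.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.appName("preprocess-sketch").getOrCreate()

raw = spark.read.option("header", True).csv("/data/telecom_usage.csv")  # hypothetical input

clean = (raw
         .dropna(subset=["customer_id", "minutes_used"])                 # remove missing keys/metrics
         .withColumn("minutes_used", col("minutes_used").cast("double"))
         .withColumn("heavy_user",
                     when(col("minutes_used") > 1000, 1).otherwise(0)))  # derived feature

clean.write.mode("overwrite").parquet("/data/telecom_usage_clean")
```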

Performed data exploration using Hive, extracting valuable insights from processed data stored in HDFS, enhancing data-driven decision-making.

Imported data from various sources, transforming it with Hive and MapReduce, and loaded data into HDFS; also extracted MySQL data into HDFS using Sqoop for further processing.

Implemented business logic in Hadoop using Hive UDFs, enabling read, write, and query operations on Hadoop data in HBase for real-time access and analytics.

Used Cloudera Manager for continuous monitoring and administration of Hadoop clusters, ensuring timely OS updates, Hadoop patches, and version upgrades.

Developed data pipelines using Sqoop, Pig, and Hive, ingesting customer member data, clinical, biometrics, lab, and claims data into HDFS for further data analytics.

Designed and developed POCs in Spark using Scala, comparing performance between Spark, Hive, and SQL-based querying in Oracle.

Utilized Oozie workflow engine to orchestrate multiple Hive and Pig scripts, integrating Kafka for real-time data processing and loading log file data directly into HDFS.

Worked with different Oozie actions, including Sqoop, Pig, Hive, Shell, and Java actions, designing optimized workflows for automated data ingestion and processing.

Analyzed large-scale datasets, determining optimal aggregation techniques for efficient reporting and analytical processing.

Environment: Scala, Spark, PySpark, Apache Kafka, HDFS, MapReduce, Hive, HiveQL, Pig, Pig UDFs, Sqoop, Oozie, HBase, Cloudera Manager, Spark SQL, Spark Streaming, JSON, Avro, Parquet, CSV, MySQL, Oracle, ETL, Data Pipelines, Data Transformation, Data Aggregation, Azure Data Factory (ADF), Azure Data Lake Storage (ADLS Gen2), Azure Databricks, Azure HDInsight, Azure Event Hubs, Azure Synapse Analytics.

Client: Humana, Austin, TX

Role: Databricks Developer May 2018 – July 2019

Responsibilities:

Collected and aggregated large volumes of weblog data from web servers using PySpark Streaming with Apache Kafka, then stored the data in HDFS for further analysis.

Migrated MapReduce programs to Spark transformations, optimizing data processing with PySpark and Scala for better performance and scalability.

Developed PySpark and Spark-SQL queries, implementing RDD transformations, actions, and DataFrame operations for statistical analysis and data filtering.
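
A compact example of the kind of RDD and DataFrame work described above; the weblog path and field layout are assumed for illustration.

```python
# Illustrative: basic RDD transformations plus a DataFrame aggregation over weblogs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblog-analysis-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///logs/access.log")  # hypothetical weblog path

# RDD transformation + action: count lines that record server errors.
error_count = lines.filter(lambda line: '" 500 ' in line).count()
print("HTTP 500 responses:", error_count)

# DataFrame path: hits per client IP (first whitespace-delimited field, assuming common log format).
ips = lines.map(lambda line: (line.split(" ")[0],))
df = spark.createDataFrame(ips, ["client_ip"])
df.groupBy("client_ip").count().orderBy("count", ascending=False).show(10)
```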

Utilized PySpark and Spark-SQL to read Parquet data and created Hive tables, then leveraged Schema RDDs to process JSON and Avro data formats.

Performed HiveQL queries to create and manage Hive tables, implementing partitioning, bucketing, and indexing to enhance query performance.

Developed and optimized Pig scripts, while creating custom UDFs in Java and Python to handle complex data transformations and filtering operations.

Implemented Impala for high-performance SQL queries on large datasets stored in Hive tables, which improved query execution speed.

Worked with Avro, Parquet, JSON, and CSV data serialization formats, ensuring efficient data storage, compression, and retrieval.

Developed and executed Spark Core and SparkSQL queries for ETL processing, aggregations, and analytical transformations.

Implemented Oozie workflows to schedule batch processing jobs, managing dependencies and dynamic execution flows for automated processing.

Utilized Sqoop to import and export structured data between HDFS and relational databases, enabling efficient data exchange and reporting.

Loaded and processed data in HBase, optimizing storage and retrieval operations with bulk and non-bulk load techniques for high-speed querying.

Analyzed large-scale datasets using PySpark and Hive queries, optimizing performance through indexing, partitioning, and caching techniques.

Developed MapReduce and Spark-based ETL pipelines to automate data ingestion, transformation, and structured output generation for analytical purposes.

Followed Agile methodologies, participating in daily Scrum meetings, sprint planning, and continuous integration processes using Jenkins and Maven.

Environment: PySpark, Apache Kafka, HDFS, Spark Streaming, Scala, Spark Core, SparkSQL, RDD, DataFrames, Hive, Parquet, JSON, Avro, HiveQL, Partitioning, Bucketing, Indexing, Pig, Pig UDFs, Java, Python, Impala, SQL, Oozie, Sqoop, HBase, MapReduce, ETL, Jenkins, Maven, Agile, Scrum.

Client: Target, San Marcos, CA

Role: SQL Developer Oct 2016 – Apr 2018

Responsibilities:

Analyzed business requirements and designed scalable database structures to support data engineering and analytics workflows, including Entity Relationship Diagrams (ERD) and logical/physical database designs.

Created and maintained Stored Procedures, Functions, Triggers, Cursors, and SQL scripts for data processing, ETL workflows, and transformation logic.

Developed and optimized T-SQL queries, implementing indexing strategies, execution plan analysis, and partitioning techniques to enhance query performance.

Designed and developed SSIS packages for extracting, transforming, and loading (ETL) data from various sources into SQL Server and Data Warehouses.

Implemented automated ETL workflows using SSIS, leveraging Conditional Split, Multicast, and Fuzzy Lookup for data validation, cleansing, and enrichment.

Performed data migration and database upgrades from SQL Server 2008 to SQL Server 2014, ensuring zero downtime and data consistency.

Managed SQL Server backups, restores, and disaster recovery strategies, ensuring high availability and data integrity.

Developed Change Data Capture (CDC) solutions to enable incremental data loading and real-time data updates.
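
As a hedged illustration of enabling CDC on SQL Server, the snippet below calls the built-in CDC procedures through pyodbc; the connection string, schema, and table are placeholders, and the original project may have configured this through SSMS or deployment scripts instead.

```python
# Illustrative only: enable SQL Server Change Data Capture on a hypothetical table.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=Sales;"
    "UID=etl_user;PWD=********"  # placeholder connection details
)
conn.autocommit = True
cur = conn.cursor()

# Enable CDC at the database level, then on the source table.
cur.execute("EXEC sys.sp_cdc_enable_db")
cur.execute("""
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Orders',
         @role_name     = NULL
""")

cur.close()
conn.close()
```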

Implemented data lineage tracking and metadata management to ensure data integrity, governance, and compliance.

Developed SQL scripts to automate ETL workflows, facilitating seamless data extraction, transformation, and batch/real-time data loading.

Created reports using SQL Server Reporting Services (SSRS) to provide actionable insights for business users.

Integrated SQL Server with SSIS and other ETL tools, enabling efficient data ingestion, transformation, and analytics workflows.

Environment: SQL Server, T-SQL, SSIS, SSRS, Stored Procedures, Functions, Triggers, Cursors, SQL Scripts, Entity Relationship Diagrams (ERD), Indexing, Partitioning, Change Data Capture (CDC), Data Governance, Data Migration, SQL Server Backups, and ETL.

Client: Dish, Dallas, TX

Role: Data Warehouse Developer    May 2014 – Sep 2016

Responsibilities:

Developed complex stored procedures, efficient triggers, user-defined functions, and indexed views to optimize query performance and data processing in SQL Server.

Monitored and optimized SQL Server performance tuning, including execution plan analysis, indexing strategies, and query optimization to enhance system efficiency.

Created ETL solutions using SSIS, developing mappings and automated workflows for seamless data extraction, transformation, and loading from SQL Server, Access, and Excel.

Performed data migration and transformation using SQL Server SSIS, ensuring seamless integration of structured and semi-structured data.

Developed Dimensional Data Models for Data Marts and Data Warehouses, identifying Facts, Dimensions, Fact Tables, and Dimension Tables, and implemented Slowly Changing Dimensions (SCDs).

Designed and built SSAS Cubes and Dimensions, developing Aggregations, KPIs, Measures, Partitioning Cubes, and Data Mining Models to enhance data exploration and reporting.

Created and optimized Data Marts using Star and Snowflake Schema models, ensuring efficient query execution and structured data retrieval.

Developed reports using SSRS, including Ad-hoc Reports, Parameterized Reports, Dashboards, and Scorecards, supporting business intelligence needs.

Implemented drill-down, drill-through, and cascading reports on SSAS Cubes, leveraging MDX scripting and SQL-based querying for advanced reporting and analytics.

Optimized SSIS ETL workflows, implementing parallel processing, data partitioning, and incremental loads to enhance pipeline efficiency.

Managed database administration tasks, including backup, restore, replication, and disaster recovery planning, ensuring data availability and security.

Assisted in transitioning from Data Warehouse development towards Data Engineering, gaining exposure to real-time data processing, ETL automation, and performance engineering.

Environment: MS SQL Server, Visual Studio, SSIS, SharePoint, MS Access, Team Foundation Server, Git.


