
Data Warehouse Engineer with 8+ years of experience

Location:
McKinney, TX
Posted:
March 19, 2026


Lakshmi Chandrashekar Madiraju

Sr. Data Engineer

469-***-****

************@*****.***

PROFESSIONAL SUMMARY:

Data Engineer with 8 years of experience building scalable data pipelines and big data solutions across cloud (AWS, Azure) and on-premises environments, handling large-scale structured and unstructured data.

Strong expertise across the full SDLC, including requirements gathering, system design, development, testing, deployment, and production support in Agile environments.

Extensive experience in data cleansing, profiling, transformation, and performance tuning, ensuring high-quality, reliable, and optimized data pipelines.

Hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Spark, Hive, HBase, Cassandra, Kafka, Sqoop, Flume, Impala, Oozie) and strong understanding of distributed architecture including YARN and fault tolerance.

Designed and developed ETL/ELT pipelines using Azure Data Factory, Databricks, and AWS Glue, enabling efficient data ingestion, transformation, and integration across multiple data sources.

Proficient in AWS and Azure cloud platforms, working with services such as S3, EMR, EC2, RDS, Redshift, CloudWatch, Data Lake, Synapse, and Azure SQL Database for scalable data solutions.

Developed and optimized Spark applications using PySpark and Scala, leveraging Spark SQL, DataFrames, and Spark Streaming for both batch and real-time data processing.
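
A minimal sketch of the kind of batch PySpark job described above; the paths, table, and column names are illustrative, not taken from any specific project:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage_aggregation").getOrCreate()

    # Placeholder inputs
    events = spark.read.parquet("s3://example-bucket/raw/usage_events/")
    customers = spark.read.parquet("s3://example-bucket/dim/customers/")

    # Filter, join, and aggregate with the DataFrame API
    daily_usage = (
        events.filter(F.col("event_type") == "data_session")
              .join(customers, on="customer_id", how="inner")
              .groupBy("customer_id", "event_date")
              .agg(F.sum("bytes_used").alias("total_bytes"))
    )

    daily_usage.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/daily_usage/")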

Built real-time streaming pipelines using Kafka and Spark Streaming, enabling low-latency processing of high-volume event-driven data.
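
A small Structured Streaming sketch of the Kafka-to-Spark pattern referenced above; the broker address, topic, and event schema are placeholders:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

    schema = StructType([
        StructField("device_id", StringType()),
        StructField("bytes_used", LongType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "network-events")             # placeholder topic
           .load())

    # Kafka values arrive as bytes; parse the JSON payload into columns
    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/tmp/events_out")
             .option("checkpointLocation", "/tmp/events_chk")
             .start())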

Strong programming skills in Python, SQL, and Scala, utilizing libraries such as Pandas, NumPy, and SciPy for data analysis, transformation, and numerical computations.

Implemented data orchestration and workflow automation using Apache Airflow, including scheduling and dependency management for production pipelines.
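
A minimal Airflow DAG illustrating the scheduling and dependency management mentioned above; the task logic is a placeholder:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from source")   # placeholder task body

    def load():
        print("load data to warehouse")  # placeholder task body

    with DAG(
        dag_id="daily_ingest",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_load   # load runs only after extract succeeds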

Designed and deployed end-to-end MLOps pipelines, including data ingestion, feature engineering, model training, evaluation, deployment, and monitoring using tools like Kubeflow and MLflow.
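
A sketch of the training-and-tracking step in such a pipeline using MLflow; the model, dataset, and run name are stand-ins:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=42)   # stand-in data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    with mlflow.start_run(run_name="baseline_model"):            # hypothetical run name
        model = LogisticRegression(max_iter=200).fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("accuracy", acc)
        mlflow.sklearn.log_model(model, "model")                 # log artifact for deployment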

Worked with relational and NoSQL databases including PostgreSQL, MySQL, Snowflake, Teradata, MongoDB, HBase, and Cassandra for high-performance data storage and retrieval.

Ensured data quality and pipeline reliability using Great Expectations and monitoring tools like Datadog, along with implementing validation and anomaly detection frameworks.

Automated CI/CD and infrastructure deployment using Jenkins, Docker, and Terraform, ensuring consistent and scalable data platform deployments.

Developed UNIX shell scripts and SQL/Hive-based validation frameworks for automating batch processing, data validation, and pipeline execution.

Created interactive dashboards and data visualizations using Tableau and Python to support business insights and decision-making.

Utilized Git and SVN for version control, managing code changes and enabling collaborative development across teams.

TECHNICAL SKILLS:

Programming Languages

Python, SQL, C, Scala, Unix Shell scripting, Spark SQL

Cloud Services

AWS EC2, S3, Redshift Spectrum, EMR, DynamoDB, Data Lake, AWS Data Pipeline, AWS Lambda, Athena, AWS Glue, CloudWatch, RDS, Azure Data Factory, Azure Data Lake Storage, MLOps, Azure Synapse, Azure SQL, and HDInsight

Hadoop/Spark Ecosystem

Hadoop, MapReduce, Pig, Hive, Impala, YARN, Kafka, Flume, Sqoop, Zookeeper, Spark, Storm, Airflow, MongoDB, Cassandra, HBase, Team Foundation Server (TFS)

ETL and Version Control Tools

Informatica, Talend, Delta Tables, SSIS, DataStage, Git, and Datadog

Databases

MySQL, MS SQL Server, HBase, Snowflake, DB2, Teradata, Oracle, MongoDB, Cosmos DB, and PostgreSQL

Data Visualizations Tools

Power BI, Tableau, MicroStrategy, OTBI, BIP, UltiPro (HR system), Excel

Operating Systems

Linux, Windows, Mac OS, Unix

PROFESSIONAL EXPERIENCE:

Client: Verizon, Irving, Texas Feb’23-present

Role: Sr. Data Engineer

Verizon is a leading telecommunications provider in the United States, offering wireless, broadband, and digital services to millions of customers. The company is known for its strong network infrastructure and leadership in 5G technology, focusing on innovation and connectivity. This project involves building scalable data pipelines using cloud platforms like Microsoft Azure or AWS to deliver critical datasets to business and technical stakeholders. It includes processing large-scale data with tools such as Apache Spark, performing transformations like joins, filters, and aggregations to generate meaningful insights. The role also requires ensuring data quality and documenting workflows, data mappings, and system processes while collaborating with cross-functional teams.

Roles & Responsibilities:

Designed and implemented scalable ETL pipelines using AWS Glue, Apache Airflow, and Python to process high-volume telecom and network event data, enabling efficient real-time and batch ingestion into Amazon S3.
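
An illustrative AWS Glue job skeleton for this kind of S3 ingestion; it runs only inside a Glue job environment, and the bucket names and filter are hypothetical:

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Placeholder source and target paths
    events = spark.read.json("s3://example-raw-bucket/network_events/")
    (events.filter("event_type = 'data_session'")
           .write.mode("append")
           .parquet("s3://example-curated-bucket/network_events/"))

    job.commit()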

Built and integrated hybrid data architectures, migrating on-premises systems to AWS using secure API frameworks to support large-scale subscriber and operational data processing.

Developed RESTful APIs to ingest real-time data from internal and external systems, improving availability of customer usage and network performance data.

Automated data workflows using AWS Lambda, reducing processing time by 40% and enabling efficient loading of telecom analytics data into Amazon RDS.
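
A minimal Lambda handler of the sort implied above, triggered by an S3 event and staging each new object for a downstream load; the bucket names and prefixes are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # Copy each newly arrived object to a staging prefix; a separate
        # (hypothetical) job would then bulk-load the staged files into Amazon RDS.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            s3.copy_object(
                Bucket="example-staging-bucket",              # placeholder bucket
                Key=f"staged/{key}",
                CopySource={"Bucket": bucket, "Key": key},
            )
        return {"status": "ok"}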

Engineered big data solutions using Azure Databricks and Apache Spark to process large-scale 5G and network telemetry datasets, improving data transformation performance.

Implemented real-time streaming pipelines using Azure Event Hub and Databricks, supporting low-latency processing of streaming network and device data.

Designed data integration and data modeling solutions supporting customer billing, revenue analytics, and service usage reporting, ensuring accurate business insights.

Established monitoring and alerting systems using Datadog, achieving 99.9% uptime for mission-critical telecom data pipelines.

Enhanced data reliability by implementing S3 Cross-Region Replication and automated pipelines to support high availability of network and customer data systems.

Applied data quality frameworks using Great Expectations to validate large-scale telecom datasets, ensuring data integrity and accuracy.

Managed PostgreSQL databases and schema migrations using Alembic, optimizing performance for high-throughput workloads.
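
A sketch of an Alembic migration of the kind referenced above; the table, columns, and revision identifiers are placeholders:

    """add usage_summary table"""
    from alembic import op
    import sqlalchemy as sa

    revision = "a1b2c3d4e5f6"      # placeholder revision id
    down_revision = None

    def upgrade():
        op.create_table(
            "usage_summary",
            sa.Column("customer_id", sa.String(36), primary_key=True),
            sa.Column("usage_date", sa.Date, primary_key=True),
            sa.Column("total_bytes", sa.BigInteger, nullable=False),
        )
        op.create_index("ix_usage_summary_date", "usage_summary", ["usage_date"])

    def downgrade():
        op.drop_index("ix_usage_summary_date", table_name="usage_summary")
        op.drop_table("usage_summary")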

Orchestrated end-to-end data workflows using Azure Data Factory, improving efficiency of enterprise-scale data pipelines.

Automated CI/CD pipelines using Jenkins, Docker, and Terraform, ensuring scalable and reliable deployments.

Collaborated with cross-functional teams to define KPIs and deliver insights for customer experience, network optimization, and business performance.

Documented data architecture, data pipelines, and workflows to support data governance and enterprise standards.

Environment: Python, SQL, Scala, Spark, PySpark, HiveQL, Azure Data Lake, DB2, HDFS, Sqoop, Azure Data Factory, Blob Storage, Databricks, Kafka, JSON, Parquet, MLOps, ETL, Azure Databricks, Datadog, pgAdmin, S3, Lambda, PostgreSQL, OTBI, BIP, UltiPro (HR system), Azure SQL DB, Azure Event Hubs, Power BI.

Client: Intent Design Ltd, Farmington Hills, Michigan Sep’22-Feb’23

Role: Sr. Data Engineer

Intent Design Ltd is a technology consulting firm specializing in data, cloud solutions, and scalable application development. The project focuses on cleaning, transforming, and analyzing large datasets using Apache Spark to deliver high-quality data for developers and business analysts. It involves building production-ready data pipelines and handling ad-hoc data requests within enterprise environments. The role includes developing Python/Spark jobs for data transformation and aggregation, ensuring data reliability. Additionally, thorough documentation is key to supporting data-driven solutions.

Roles & Responsibilities:

Collaborated with enterprise clients and business stakeholders to gather requirements and translate them into technical specifications, data models, and process workflows.

Designed and implemented scalable ETL pipelines for client-specific use cases using AWS DMS, S3, Glue, Lambda, and Redshift, improving data accuracy by 30%.

Built custom data ingestion frameworks using Apache Spark and Python, enabling efficient processing of multi-source client data across different business domains.

Identified System of Record (SOR) systems and implemented end-to-end data lineage and transformation logic to support client KPI reporting and analytics.

Developed and deployed end-to-end MLOps pipelines tailored to client requirements, including model training, deployment, and monitoring using Amazon SageMaker and Kubernetes.

Led cloud migration initiatives for clients, transitioning on-premises applications to AWS, improving scalability and operational efficiency.

Engineered data transformation workflows using AWS Glue and Apache Spark, delivering high-performance, production-ready data pipelines for client environments.

Performed exploratory data analysis (EDA) to generate business insights and trends, supporting client decision-making processes.

Built and maintained big data ecosystems using HDFS, Hive, and Spark, enabling scalable data processing solutions for diverse client datasets.

Developed real-time data pipelines using Kafka and Spark Streaming to process customer event and activity data for client-facing applications.

Automated workflow orchestration using Apache Airflow, ensuring reliable and scheduled execution of client data pipelines in production environments.

Implemented monitoring and alerting solutions using Datadog, Grafana, and AWS CloudWatch, ensuring high availability and performance of client systems.

Optimized and managed data warehousing solutions in Amazon Redshift, supporting client analytics and reporting needs.

Built serverless data processing solutions using AWS Lambda and Athena, enabling cost-effective and scalable querying for client data.

Tuned Spark applications (memory, parallelism, batch intervals) to improve performance of large-scale client data workloads.

Developed distributed data processing applications using PySpark and Scala, running on YARN clusters for enterprise clients.

Created interactive dashboards and monitoring solutions using Grafana and OpenTSDB, providing real-time visibility into client data systems.

Performed data cleansing, transformation, and validation using Hive, MapReduce, and Spark, ensuring high-quality datasets for downstream client analytics.

Environment: AWS EC2, S3, Lambda functions, RDS, MLOps, Redshift, CloudWatch, Glue, Athena, Python, HDFS, SQL, Hive, PySpark, Spark, Snowflake, ETL, Scala, Spark-SQL, HBase, Apache Airflow, Shell, NoSQL, Cassandra, Kafka, YARN, and MapReduce.

Client: Bayer, St. Louis, Missouri Aug’21-Sep’22

Role: Azure Data Engineer

Bayer is a global enterprise with core competencies in the Life Science fields of health care and agriculture. The goal of this project is to extract, clean, analyze, and model the aggregated data, and to design and develop systems that collect, manage, and convert raw data into readable, usable information that data scientists and business analysts can interpret, making the data accessible for evaluating and optimizing business performance.

Roles & Responsibilities:

Worked on gathering requirements, business Analysis, Design and Development, testing and implementation of business rules.

Migrated applications from internal data storage to Azure, migrated data from on-premises Hive to Azure, and created Hive tables, loading and analyzing data using Hive scripts.

Performed exploratory data analysis on large volumes of data using PySpark in Databricks to view distributions, correlations, statistical values, trends, and patterns in the data.

Worked with data modelers to understand the data for ingestion into the common data model, extracting data from RDBMS sources such as DB2 and Teradata into HDFS using Sqoop.

Mounted Azure Data Lake and Blob Storage to Databricks and worked on the extracted data for analysis.
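
A sketch of mounting Blob Storage to Databricks as described above; dbutils and spark are provided by the Databricks runtime, and the account, container, and secret names are placeholders:

    storage_account = "examplestorageacct"   # placeholder storage account
    container = "raw-data"                   # placeholder container

    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point="/mnt/raw-data",
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="example-scope", key="storage-key")
        },
    )

    df = spark.read.parquet("/mnt/raw-data/events/")   # read from the mount point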

Used Databricks, Scala, and Spark to create data workflows and capture data from Delta tables in Delta Lake.

Created Azure Data Factory pipelines, managed policies for Data Factory, and utilized Blob Storage for storage and backup on Azure.

Contributed to the development of MLOps infrastructure and tooling, including custom scripts, libraries, and automation frameworks, to streamline and scale machine learning operations across the organization.

Built streaming applications in Azure notebooks using Kafka and Spark.

Produced JSON scripts for deploying data-processing pipelines in Azure Data Factory (ADF).

Performed ETL operations in Azure Databricks by connecting to different RDBMS using Kafka.

Automated a process in the Azure cloud to ingest data daily from a web service and load it into Azure SQL DB.

Built Python scripts for validating files in Databricks and automated the process using ADF.

Built streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data.

Developed a framework for creating new snapshots and deleting old snapshots in Azure Blob Storage, and set up lifecycle policies to back up data from Delta Lake.

Developed PySpark applications in Python on a distributed environment to load high-volume files with different schemas into DataFrames, process them, and reload the results into Azure SQL DB tables.
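
A sketch of that load-and-reload pattern: reading files whose schemas differ, letting Spark union the columns, and writing the result to Azure SQL DB over JDBC; the server, table, and credentials are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load_mixed_schema").getOrCreate()

    # mergeSchema unions columns across Parquet files with differing schemas
    df = spark.read.option("mergeSchema", "true").parquet("/mnt/raw-data/daily/*/")

    jdbc_url = ("jdbc:sqlserver://example-server.database.windows.net:1433;"
                "database=exampledb")                      # placeholder connection

    (df.write.format("jdbc")
       .option("url", jdbc_url)
       .option("dbtable", "dbo.daily_events")              # placeholder target table
       .option("user", "etl_user")                         # placeholder credentials
       .option("password", "***")
       .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
       .mode("append")
       .save())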

Created Git repositories and defined branching strategies in GitHub that best fit the needs of the client.

Wrote fully parameterized Databricks code and ADF pipelines for efficient code management.

Wrote Unix shell scripts to automate the execution of HQL files and the transfer of files to the client server.

Implemented a low-latency distributed stream-processing platform that integrates seamlessly with data and analytics services inside and outside Azure to build a complete big data pipeline.

Developed complex SQL queries using stored procedures, common table expressions, and temporary tables to support Power BI reports.

Documented all MicroStrategy objects such as reports, prompts, filters, and metrics for user reference, and developed reports for MicroStrategy Web users.

Created Tableau dashboard reports and heat map charts, and supported numerous dashboards, pie charts, and heat maps built on the Teradata database.

Environment: Python, SQL, Scala, Spark, PySpark, HiveQL, Azure Data Lake, DB2, Teradata, HDFS, Sqoop, Azure Data Factory, Blob Storage, Databricks, Kafka, JSON, MLOps, Parquet, ETL, Azure SQL DB, Azure Event Hubs, Git, Unix, MicroStrategy, Power BI, and Tableau.

Client: Ford, India Jan’20-Dec’20

Role: Data Engineer

Ford Motor Company manufactures, sells, leases, and repairs passenger cars, trucks, buses, and their related parts worldwide. The main aim of this project is to identify and understand business requirements, produce statistical analyses to support those business initiatives, create data products that increase productivity, deliver data pipelines for analytically driven use cases, review existing analytic processes, and build dashboards and reports.

Roles & Responsibilities:

Worked with business analysts and users to collect data in various file formats.

Migrated data gathered from internal sources to AWS, using EC2 instances for processing and S3 buckets for storage.

Used AWS Glue to transform data from S3 buckets to the target database in Spark, and configured the data pipeline to load data from S3 into Redshift.

Used AWS CloudWatch to collect data and monitor instances.

Developed real-time data pipeline using Spark to ingest customer events/activity data into Hive and Cassandra from Kafka.

Implemented log aggregation, analysis, and correlation using Datadog logging, enabling centralized log management, search, and visualization for troubleshooting and root-cause analysis.

Created Hive tables and used Pig scripts to further transform the data and store it in HDFS.

Conducted exploratory data analysis using Python to understand the distribution of cases.

Built Spark applications in Scala for extracting and transforming data from the workflows.

Created ETL pipelines and developed the Spark applications and used AWS Glue for data ingestion.

Explored large amounts of data using PySpark and Plotly to view distributions, correlations, statistical values, trends, and patterns in the data.
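
A short EDA sketch of the PySpark-plus-Plotly exploration described above; the path and column names are illustrative:

    from pyspark.sql import SparkSession, functions as F
    import plotly.express as px

    spark = SparkSession.builder.appName("eda").getOrCreate()
    events = spark.read.parquet("s3://example-bucket/curated/vehicle_events/")  # placeholder path

    # Aggregate in Spark, then bring the small summary to pandas for plotting
    by_day = (events.groupBy("event_date")
                    .agg(F.count(F.lit(1)).alias("event_count"))
                    .orderBy("event_date")
                    .toPandas())

    fig = px.line(by_day, x="event_date", y="event_count", title="Daily event volume")
    fig.show()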

Monitored SQL scripts and improved performance by modifying them using PySpark SQL.

Used Apache Airflow to automate and validate script execution for data-driven workflows and ensure productivity.

Planned, tracked, and executed work streams and held sprint/Scrum meetings following Agile methodology.

Environment: Python, HBase, Spark, PySpark, PySpark SQL, Scala, Amazon Redshift, AWS RDS, EC2 instances, S3, AWS DataSync, AWS CloudWatch, Cassandra, Hive, Kafka, HDFS, Pig, ETL, AWS Glue, SQL scripts, Apache Airflow, and Agile.

Client: Wissen Infotech Ltd, India Jan’19-Dec’19

Role: Azure Data Engineer

Wissen Infotech is a global IT services and consulting firm specializing in delivering data engineering, cloud transformation, and digital solutions for clients across financial services, healthcare, and technology domains. The primary goal of this project is to design and implement scalable data pipelines and data processing solutions to handle complex, high-volume datasets across enterprise systems. This includes applying data transformations, ensuring data quality through cleansing and validation, and optimizing data workflows for performance and reliability. The role also involves identifying and implementing process improvements, monitoring pipeline performance, and developing data visualizations to support business insights and decision-making.

Roles & Responsibilities:

Migrated legacy data systems to Azure SQL Database, improving scalability and performance of enterprise data platforms.

Designed and implemented end-to-end data solutions on Azure, including data storage, ingestion, processing, and visualization.

Developed and maintained ETL/ELT pipelines using Azure Data Factory and SSIS, enabling efficient data integration across multiple sources.

Built big data processing solutions using Azure Data Lake, HDInsight, and Spark, supporting large-scale data analytics.

Developed Spark applications using Python and Scala to extract, transform, and process data from diverse data sources.

Implemented real-time data streaming pipelines using Apache Kafka and Azure Stream Analytics for low-latency data processing.

Managed and optimized data in Azure Data Lake Storage, integrating with other Azure services for scalable data workflows.

Migrated legacy Pig and MapReduce jobs to Spark SQL, significantly improving data processing performance and efficiency.

Engineered data transformation workflows using Apache Spark and Python, enabling efficient batch and streaming data processing.

Configured and maintained Zookeeper for managing distributed systems and tracking high-volume data nodes.

Worked with SQL Server, SSIS, and SSRS for data integration, reporting, and enterprise data management.

Built and orchestrated data pipelines using Apache Airflow, ensuring reliable workflow scheduling and execution.

Re-architected existing applications into Azure Data Lake, Data Factory, and SQL Data Warehouse, modernizing legacy systems.

Designed logical and physical data models for staging, data warehouse, and data mart layers.

Developed interactive dashboards and reports using Power BI and SSRS, enabling data-driven decision-making.

Participated in the full SDLC using Agile (Scrum) methodology, delivering high-quality, scalable data solutions.

Environment: Azure SQL databases, Azure Data Lake Analytics/Store, HDInsight, Spark, Scala, Kafka, Azure Stream Analytics, Azure Data Lake Storage, Pig, Spark SQL, MapReduce, HDFS, Zookeeper, SSIS, SSRS, SQL Database, SQL Data Warehouse, Power BI, Agile.

Client: Evolet Technologies, India Jun’18-Dec’18

Role: SQL Developer

Roles & Responsibilities:

Participated in full Software Development Life Cycle (SDLC) including requirements gathering, design, development, testing, deployment, and maintenance.

Gathered business requirements and developed data integration solutions, transforming data from multiple file formats into DB2 and Teradata systems.

Performed data analysis, mining, and validation to support business decision-making and ensure data accuracy.

Designed and deployed SSIS packages using control flow tasks (Data Flow, For Each Loop, Execute SQL Task) for ETL processing.

Developed and optimized SQL and T-SQL queries for data transformation and performance tuning using SQL Profiler and Database Tuning Advisor.

Implemented ETL logging frameworks in SSIS to track data loads, monitor performance, and ensure data reliability.

Built UNIX shell scripts to automate ETL workflows in data warehouse environments.

Ensured data quality and integrity by applying constraints such as Primary Key, Unique, and Check constraints.

Developed database objects including stored procedures, functions, triggers, views, and complex joins using SQL and PL/SQL.

Managed SQL Server administration tasks, including performance tuning, indexing, security, backup strategies, and job automation.

Implemented data extraction and loading processes from OLTP systems to staging and enterprise data warehouse using SSIS.

Designed and optimized SSIS data flow transformations (Aggregate, Conditional Split, Derived Column) for efficient data processing.

Performed data validation and testing by comparing source and target systems to ensure accuracy in the data warehouse.

Created reports and dashboards using SSRS and Tableau, including drill-down, parameterized, and analytical reports.

Supported ad-hoc reporting and business analysis, delivering actionable insights through data visualization tools.

Environment: DB2, Teradata, SQL, T-SQL, SQL Server Integration Services, PL/SQL, MS Visio, MS Access, MS Office, Excel, Unix scripting, OLTP, SQL Server Reporting Services and Tableau.

EDUCATION DETAILS:

Bachelor’s in Information Science, Visvesvaraya Technological University – 2020

Master’s in Computer Science, Cleveland State University – 2022


