
Senior Data Engineer - Cloud & Lakehouse Expert

Location: Richardson, TX
Posted: March 19, 2026


Kushal Kumar S (Data Engineer)

Phone: 945-***-**** | Location: Dallas, Texas, USA | Email: ***********@*****.***

SUMMARY

• Data Engineer with 5+ years of experience designing, building, and optimizing scalable data pipelines, data platforms, and Lakehouse architectures across cloud and on-premises environments.

• Strong expertise in SQL, Python, and PySpark for large-scale data processing, transformation, validation, and performance optimization of enterprise datasets.

• Extensive experience with cloud platforms (AWS, Azure, GCP) and modern data warehouses including Snowflake, BigQuery, Redshift, Synapse, and Azure Data Lake.

• Proven experience building ETL/ELT pipelines using Apache Spark, Databricks, Airflow, AWS Glue, Azure Data Factory, and Informatica for batch and real-time ingestion.

• Hands-on experience with big data and streaming technologies such as Apache Kafka, Spark Streaming, and AWS Kinesis for real-time and event-driven data architectures.

• Strong background in the Hadoop ecosystem including HDFS, Hive, HBase, Pig, Sqoop, Impala, Flume, Zookeeper, and MapReduce for distributed storage and parallel data processing.

• Experience supporting AI and ML pipelines including feature engineering, model training data preparation, and scalable inference pipelines using TensorFlow, PyTorch, and Scikit-learn.

• Practical experience with NLP, LLMs, and Generative AI pipelines, enabling text processing, embeddings, vector databases, prompt engineering, and AI-driven analytics solutions.

• Proficient in DevOps and DataOps practices using Git, CI/CD, Docker, Kubernetes, and GitHub Copilot to accelerate development, automate deployments, and improve engineering productivity.

• Strong collaboration with data scientists, analysts, and business stakeholders to deliver high-quality datasets, semantic models, and BI dashboards using Power BI, Tableau, and Looker.

EDUCATION

University of North Texas

Master’s in Advanced Data Analytics (Aug 2023 - May 2025)

TECHNICAL SKILLS

• Programming & Query Languages: Python, SQL, PySpark, Java, Scala

• Data Engineering & Big Data: Apache Spark, Hadoop, Hive, HDFS, MapReduce, Databricks, Delta Lake

• Databases & Data Stores: MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, DynamoDB

• Data Warehousing & Lakehouse: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, Azure Data Lake, Amazon S3

• ETL / ELT & Data Integration: Apache Airflow, AWS Glue, Azure Data Factory, Informatica, Talend, Apache NiFi, dbt

• Streaming & Real-Time Processing: Apache Kafka, Spark Streaming, AWS Kinesis, Google Pub/Sub

• Cloud Platforms: AWS, Microsoft Azure, Google Cloud Platform (GCP)

• DevOps, Containers & CI/CD: Docker, Kubernetes, Jenkins, Terraform, Git, GitHub, GitLab

• AI & Machine Learning Pipelines: ML data pipelines, feature engineering, model training data preparation, ML workflow support

• NLP, LLM & Generative AI: Text processing, embeddings, vector databases, LLM data pipelines, prompt engineering, GenAI data preparation

• AI Development Tools: GitHub Copilot, AI-assisted development, code optimization, productivity automation

• BI & Data Visualization: Power BI, Tableau, Looker

• Other Data Platforms & Tools: Databricks, Delta Lake, AWS Lambda, Apache Beam, Luigi

CERTIFICATION

AWS Certified Solutions Architect - Associate (SAA-C03)

Google Cloud Certified Professional Data Engineer

Microsoft Certified: Azure Data Fundamentals

Microsoft Certified: Azure Data Engineer Associate (DP-203)

EMPLOYMENT HISTORY

Azure Data Engineer
Upbound Group, Plano, Texas, USA | Apr 2025 - Present

Upbound Group is a technology- and data-driven company that provides financial products for consumers. I develop and optimize data models and implement storage solutions using Azure Data Lake, Azure SQL Database, and Azure Cosmos DB.

Responsibilities and Achievements:

Designed and built scalable data pipelines using Azure Data Factory and Azure Synapse Analytics to integrate enterprise data from operational, supply chain, and financial systems.

Developed high-performance data processing solutions using Azure Databricks and PySpark for large-scale batch and near real-time analytics.

Implemented Lakehouse architecture using Azure Data Lake Storage (ADLS Gen2) and Delta Lake, enabling reliable, scalable, and cost-efficient data storage (a short sketch follows).
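A minimal sketch of that Lakehouse write path, assuming a hypothetical ADLS Gen2 storage account (examplelake), hypothetical raw/curated containers, and an illustrative orders dataset; it shows the general PySpark + Delta Lake pattern rather than the actual production job:

    # Minimal sketch: land raw JSON from ADLS Gen2 into a date-partitioned
    # Delta table. All storage paths and column names here are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-lakehouse").getOrCreate()

    # read raw JSON dropped into the lake by upstream ingestion
    raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/")

    # light curation: derive a partition column and deduplicate on the key
    curated = (
        raw.withColumn("order_date", F.to_date("order_ts"))
           .dropDuplicates(["order_id"])
    )

    # append to a Delta table partitioned by date so readers can prune scans
    (
        curated.write.format("delta")
               .mode("append")
               .partitionBy("order_date")
               .save("abfss://curated@examplelake.dfs.core.windows.net/orders_delta/")
    )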

Built robust ETL/ELT pipelines to ingest and transform structured and semi-structured data from multiple enterprise sources into centralized data platforms.

Optimized Synapse SQL, Spark jobs, and data pipelines through partitioning, indexing, and performance tuning to improve query performance and reduce processing time.

Integrated enterprise datasets with Power BI dashboards and reporting solutions, enabling business insights for operations, finance, and leadership teams.

Automated data platform deployments using Azure DevOps CI/CD pipelines, improving code quality, version control, and release management.

Developed and supported AI and Machine Learning data pipelines using Azure Machine Learning, enabling feature engineering, model training datasets, and scalable inference workflows.

Built data pipelines for NLP, LLM, and Generative AI use cases, supporting text analytics, embeddings generation, vector-based search, and AI-driven insights.
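To illustrate the embeddings-and-vector-search step in such pipelines, here is a minimal, self-contained sketch; the embed() function is a toy placeholder standing in for whichever embedding model or API the real pipeline would call:

    # Minimal sketch of embedding-based similarity search. embed() is a toy
    # placeholder (consistent within a run) for a real embedding model/API.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        vec = rng.normal(size=384)
        return vec / np.linalg.norm(vec)  # unit-normalize for cosine scoring

    corpus = ["refund policy", "shipping delay", "account login help"]
    index = np.stack([embed(doc) for doc in corpus])  # rows = document vectors

    query = embed("late delivery")
    scores = index @ query                 # dot product = cosine on unit vectors
    print(corpus[int(np.argmax(scores))])  # nearest document to the query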

Leveraged GitHub Copilot to accelerate Python, SQL, and PySpark development, improving engineering productivity and maintaining high-quality data engineering standards.

AWS Data Engineer
eBay, Austin, Texas, USA | Nov 2023 - Mar 2025

eBay Inc. is an American multinational e-commerce company. Implemented and managed data storage solutions using Amazon S3 (for unstructured data), Amazon Redshift (for data warehousing), and Amazon RDS or Aurora (for relational data).

Responsibilities and Achievements:

Developed ETL/ELT pipelines using AWS Glue, Amazon S3, and Python, enabling reliable ingestion and transformation of large-scale transactional, marketplace, and customer datasets.
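As a hedged illustration of that Glue pattern, a minimal job script might look like the following; the catalog database, table, and S3 path are hypothetical stand-ins:

    # Minimal sketch of an AWS Glue PySpark job: catalog read, light cleanup,
    # Parquet write back to S3. Database/table/path names are hypothetical.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # read raw transactions registered in the Glue Data Catalog
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="marketplace_raw", table_name="transactions"
    )

    # drop a junk column and write curated Parquet for downstream analytics
    curated = raw.drop_fields(["_corrupt_record"])
    glue_context.write_dynamic_frame.from_options(
        frame=curated,
        connection_type="s3",
        connection_options={"path": "s3://example-curated/transactions/"},
        format="parquet",
    )
    job.commit()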

Implemented data lake and warehouse solutions using Amazon S3 and Amazon Redshift, supporting high-volume analytics, reporting, and business intelligence across e-commerce platforms.

Processed large datasets using Apache Spark on AWS EMR, improving performance and scalability for batch data processing and analytical workloads.

Built near real-time data pipelines using Amazon Kinesis and AWS Lambda to capture and process marketplace events, order transactions, and user activity streams.
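A minimal sketch of that Kinesis-to-Lambda hop, assuming the standard Kinesis event shape and a hypothetical landing bucket:

    # Minimal sketch of a Lambda handler on a Kinesis trigger. Records arrive
    # base64-encoded; the landing bucket and key layout are hypothetical.
    import base64
    import json

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            marketplace_event = json.loads(payload)
            # land each event as a small JSON object for later batch compaction
            key = f"events/{marketplace_event['event_id']}.json"
            s3.put_object(Bucket="example-marketplace-raw", Key=key, Body=payload)
        return {"processed": len(event["Records"])}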

Performed complex data transformation, validation, and optimization using Python and SQL to improve data quality and query performance (see the validation sketch below).

Utilized big data technologies including Hadoop, Kafka, Hive, and Impala for distributed processing, streaming data ingestion, and large-scale analytics.
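The validation sketch referenced above, in pandas; the input path, column names, and rules are hypothetical examples of the kind of checks involved:

    # Minimal sketch of a batch validation pass with pandas. The input path,
    # columns, and rules are hypothetical. (Reading from s3:// needs s3fs.)
    import pandas as pd

    orders = pd.read_parquet("s3://example-curated/orders/")

    # quality rules: required keys present, positive amounts, one row per order
    valid = (
        orders.dropna(subset=["order_id", "buyer_id"])
              .query("amount > 0")
              .drop_duplicates(subset=["order_id"])
    )

    print(f"kept {len(valid)} of {len(orders)} rows")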

Managed enterprise data ingestion and integration using Informatica, Talend, Apache NiFi, Sqoop, Flume, and Zookeeper, supporting efficient data movement across multiple platforms and systems.

GCP Data Engineer
FIS, Bangalore, India | Jan 2022 - Jul 2023

Fidelity National Information Services, Inc. (FIS) is an American multinational corporation that offers a wide range of financial products and services. Built and maintained integrations with internal and external data sources and APIs, ensuring compatibility and interoperability between different systems and platforms.

Responsibilities and Achievements:

Migrated legacy enterprise data platforms to Google Cloud Platform (GCP) using Cloud Storage and BigQuery for scalable and secure storage of large financial and transactional datasets.

Implemented data ingestion and transformation pipelines using Cloud Dataflow, Cloud Dataprep, and BigQuery SQL, ensuring high data quality and consistency across enterprise systems.
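A minimal Apache Beam sketch of such an ingestion pipeline, runnable on Dataflow; the bucket, project, dataset, and schema are hypothetical:

    # Minimal sketch of a Beam pipeline: read JSON lines from GCS, validate,
    # load into BigQuery. All resource names and fields are hypothetical.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_and_validate(line):
        # keep only rows that parse and pass basic quality rules
        row = json.loads(line)
        if row.get("txn_id") and row.get("amount") is not None:
            yield {"txn_id": row["txn_id"], "amount": float(row["amount"])}

    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("gs://example-raw/transactions/*.json")
            | "ParseValidate" >> beam.FlatMap(parse_and_validate)
            | "Load" >> beam.io.WriteToBigQuery(
                "example-project:finance.transactions",
                schema="txn_id:STRING,amount:FLOAT64",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )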

Optimized BigQuery performance using partitioning, clustering, and query tuning, improving analytics performance for large-scale financial datasets.
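The partition-plus-cluster idea can be shown with a short, hedged example, here a DDL statement issued through the BigQuery Python client; the dataset, table, and columns are hypothetical:

    # Minimal sketch: create a partitioned, clustered BigQuery table so queries
    # scan only the needed days. Dataset/table/column names are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client()

    ddl = """
    CREATE TABLE IF NOT EXISTS finance.transactions
    PARTITION BY DATE(txn_ts)          -- prune scans to the requested dates
    CLUSTER BY account_id, txn_type    -- co-locate rows for common filters
    AS SELECT * FROM finance.transactions_staging
    """

    client.query(ddl).result()  # blocks until the DDL job completes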

Automated event-driven workflows using Cloud Functions and Pub/Sub, enabling efficient processing of streaming data and operational events.

Monitored and maintained data pipelines using Cloud Logging and Cloud Monitoring, ensuring reliability, performance, and SLA compliance of data platforms.
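A minimal sketch of the Pub/Sub-triggered piece, using the first-generation Cloud Functions background signature; the payload fields are hypothetical:

    # Minimal sketch of a Pub/Sub-triggered Cloud Function (1st-gen background
    # signature). Pub/Sub delivers the message body base64-encoded in "data".
    import base64
    import json

    def handle_event(event, context):
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        # route the operational event to downstream processing (hypothetical field)
        print(f"event {context.event_id}: {payload.get('event_type')}")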

Managed structured and NoSQL databases including MySQL, PostgreSQL, MongoDB, Cassandra, and HBase, supporting high-performance data storage, querying, and enterprise analytics.

Data Analyst
Mankind Pharma, Bangalore, India | Jun 2020 - Dec 2021

Mankind Pharma is an Indian multinational pharmaceutical and healthcare product company. Performed data transformation and processing using BigQuery SQL, Dataproc (for Apache Spark and Hadoop), or Dataflow (for Apache Beam), and optimized data processing jobs for performance and cost-efficiency.

Responsibilities and Achievements:

• Implemented real-time data streaming pipelines using Apache Kafka and AWS Kinesis, enabling continuous processing of operational and transactional datasets (a consumer sketch follows this list).

• Utilized Apache Spark and Hadoop HDFS for large-scale batch data processing, improving performance of enterprise analytics workloads.

• Automated data pipeline scheduling, version control, and CI/CD processes using Jenkins and Git, ensuring reliable and consistent data operations.

• Maintained enterprise data warehouses using Snowflake, optimizing SQL queries and supporting reporting and analytics across business teams.

• Delivered business intelligence dashboards using Power BI and Tableau, while enforcing data governance, role-based access control (RBAC), and data security standards for sensitive enterprise data.
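The Kafka consumer sketch referenced in the first bullet above, using the kafka-python client; the broker address, topic, consumer group, and payload fields are hypothetical:

    # Minimal sketch of a Kafka consumer loop with kafka-python. Broker, topic,
    # consumer group, and message fields are hypothetical.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "plant-telemetry",
        bootstrap_servers="broker.example.internal:9092",
        group_id="ops-analytics",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    for message in consumer:
        event = message.value
        # downstream: validate, enrich, and hand off to the warehouse loader
        print(message.topic, message.partition, message.offset, event.get("device_id"))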


