Post Job Free

Senior Data Engineer - Cloud Data Platform Expert

Location:
Denton, TX, 76201
Salary:
$95,000
Posted:
March 19, 2026

Contact this candidate

Resume:

Diwakar Gupta

Data Engineer

Denton, TX | ************.***@*****.*** | 940-***-**** | LinkedIn

PROFESSIONAL SUMMARY

• Data Engineer with 5+ years of experience designing, developing, and optimizing large-scale data pipelines, distributed processing systems, and cloud-based data platforms across global enterprises including Meta, BlackRock, and Honeywell. Skilled in building real-time and batch ETL workflows leveraging Python, SQL, Apache Spark, PySpark, Kafka, Airflow, and dbt to deliver reliable, high-performance data solutions.

• Proven expertise across AWS (Redshift, Kinesis, Glue), Azure (Data Lake, Delta Lake, Synapse), and Snowflake, with a strong focus on data modeling, performance optimization, and automation through CI/CD, Docker, Kubernetes, and Terraform. Experienced in feature engineering and ML data pipelines supporting production-grade machine learning systems using FBLearner Flow and PySpark MLlib. Recognized for improving data throughput, reducing latency, and enhancing analytical reliability across petabyte-scale environments.

TECHNICAL SKILLS

Programming & Scripting: Python, Java, Scala, R, Shell Scripting, SQL (T-SQL, PL/SQL), NoSQL (MongoDB, DynamoDB, Cassandra)

Big Data & Streaming Technologies: Spark, PySpark, Hadoop (Hive, HDFS, MapReduce), Kafka, Kinesis, Data Lake Architecture

ETL & Workflow Orchestration: Apache Airflow, AWS Glue, dbt Core, SSIS, Batch Processing, Pipeline Automation, Data Integration

Databases & Data Warehousing: Snowflake, Redshift, PostgreSQL, MySQL, Oracle, Data Modeling, Data Governance, Query Optimization

Cloud Platforms: AWS (Redshift, S3, Glue, Lambda, EMR, Athena, EC2), Azure (Databricks, Synapse, Data Lake Storage, Azure ML)

DevOps & Infrastructure as Code: Docker, Kubernetes (EKS, AKS), Jenkins, Git, GitHub Actions, Terraform, CI/CD Pipelines

Machine Learning & MLOps: Scikit-learn, PyTorch, TensorFlow, Pandas, NumPy, MLflow, Model Deployment, Model Monitoring, Feature Store, Natural Language Processing (NLP), LLM Pipelines

Business Intelligence & Visualization: Power BI, Tableau, SSRS, Advanced Excel (Formulas, Pivot Tables), KPI Dashboard Development

Other Core Competencies: Agile (Scrum) Methodologies, SDLC, Data Quality (Great Expectations), Performance Tuning, UDFs, Recursive CTEs

EXPERIENCE

Meta – Data Engineer

Location: California, USA | Duration: January 2025 – Present

• Designed, developed, and optimized petabyte-scale data pipelines using Python and PySpark on Meta’s XStream and Hive frameworks, improving data throughput by 20% for real-time machine learning signals.

• Built scalable feature engineering pipelines using PySpark and Meta’s Tectonic layer to process billions of records, increasing model training efficiency by 25%.

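The feature-engineering step described above can be sketched in miniature. This is plain Python for illustration only; the user and event names are hypothetical, and the real pipelines would express the same aggregations in PySpark over billions of records:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw click events; in the pipeline described, these would be
# billions of rows processed through PySpark, not an in-memory list.
events = [
    {"user": "u1", "item": "a", "ts": "2025-01-01T10:00:00"},
    {"user": "u1", "item": "b", "ts": "2025-01-02T09:30:00"},
    {"user": "u2", "item": "a", "ts": "2025-01-03T12:15:00"},
]

def build_features(events, as_of):
    """Aggregate raw events into per-user model features."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user"]].append(e)
    features = {}
    for user, evs in by_user.items():
        last = max(datetime.fromisoformat(e["ts"]) for e in evs)
        features[user] = {
            "event_count": len(evs),                          # activity volume
            "distinct_items": len({e["item"] for e in evs}),  # breadth of interest
            "days_since_last": (as_of - last).days,           # recency signal
        }
    return features

feats = build_features(events, as_of=datetime(2025, 1, 10))
print(feats["u1"]["event_count"])  # 2
```

The same shape (group by entity, aggregate, emit a feature row) is what a feature store ultimately serves to training and inference.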
• Partnered with machine learning engineers to deploy real-time recommendation models using FBLearner Flow, building low-latency feature stores that improved personalization and boosted user engagement by 12%.

• Automated 15+ end-to-end ETL workflows by creating dynamic Airflow DAGs and optimizing Presto queries for distributed execution, reducing cross-regional data latency by 30%.

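The dynamic-DAG idea, stamping one workflow out per dataset from a config list rather than hand-writing each, can be sketched without Airflow itself. Dataset names here are hypothetical; in Airflow the factory function would instantiate `DAG` and operator objects instead of plain callables:

```python
# Config-driven pipeline generation: one pipeline per dataset.
DATASETS = ["orders", "users", "events"]  # hypothetical dataset names

def make_pipeline(name):
    """Return the ordered (extract, transform, load) steps for one dataset."""
    def extract():
        return f"raw_{name}"          # stand-in for reading source data
    def transform(raw):
        return raw.upper()            # stand-in for business logic
    def load(rows):
        return f"loaded:{rows}"       # stand-in for writing to the warehouse
    return [extract, transform, load]

def run(pipeline):
    """Execute steps in order, passing each result to the next step."""
    result = None
    for step in pipeline:
        result = step(result) if result is not None else step()
    return result

results = {name: run(make_pipeline(name)) for name in DATASETS}
print(results["orders"])  # loaded:RAW_ORDERS
```

Adding a new dataset then means adding one config entry, not one more hand-maintained workflow file.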
• Developed a modular data transformation layer using dbt and advanced SQL (CTEs, window functions) to standardize business logic and ensure data consistency across 10+ enterprise datasets.

BlackRock – Data Engineer

Location: New York, USA | Duration: September 2023 – December 2024

• Optimized complex SQL workloads in Snowflake, utilizing query parallelization, data partitioning, and warehouse scaling to reduce Value-at-Risk (VaR) computation time from 3 hours to 30 minutes for high-volume portfolio analytics.

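One common rewrite behind this kind of SQL tuning is replacing per-row correlated subqueries with a single windowed scan. A toy version, run on SQLite here purely for illustration (Snowflake's window-function syntax is analogous; the table and column names are hypothetical):

```python
import sqlite3

# Compute each position's share of its portfolio in one pass using a
# window function, rather than a correlated subquery per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE positions (portfolio TEXT, asset TEXT, value REAL)")
con.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [("p1", "AAPL", 60.0), ("p1", "MSFT", 40.0), ("p2", "BND", 100.0)],
)
rows = con.execute(
    """
    SELECT portfolio, asset,
           value / SUM(value) OVER (PARTITION BY portfolio) AS weight
    FROM positions
    ORDER BY portfolio, asset
    """
).fetchall()
print(rows[0])  # ('p1', 'AAPL', 0.6)
```

Combined with partitioning and warehouse scaling, single-scan rewrites like this are what turn hours-long analytical queries into minutes.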
• Engineered and maintained event-driven data pipelines using Apache Kafka, Apache Spark, and Apache Airflow, supporting real-time ingestion of 8M+ financial transactions daily with low-latency SLAs.

• Designed and integrated Azure Data Lake Storage Gen2 with Delta Lake and hierarchical namespace support, enabling petabyte-scale analytics while cutting storage costs by 30% through automated lifecycle policies.

• Built containerized CI/CD pipelines using Docker, Kubernetes, and Jenkins; automated cloud infrastructure provisioning via Terraform, achieving a 99.9% deployment success rate and reducing environment setup time by 65%.

• Developed robust Python-based data validation frameworks with Pandas, PySpark, and Great Expectations, automating data quality assurance and improving dataset reliability by 25% for investment analytics pipelines.

Honeywell – Data Engineer

Location: India | Duration: May 2019 – July 2022

• Tuned large-scale ETL pipelines within the Hadoop ecosystem (HDFS, Hive, MapReduce, Sqoop), enhancing processing speed by 20% through advanced query optimization techniques (joins, partitioning, bucketing, ORC formats) and orchestration via Apache Oozie.

• Designed and optimized SQL-based ETL workflows on AWS Redshift and Snowflake to process IoT and building management system data, improving data transformation efficiency by 45% and reducing query costs by 25%.

• Built real-time data streaming pipelines using Python, Apache Kafka, and AWS Kinesis to ingest and process telemetry from industrial equipment, enabling live performance monitoring and reducing incident response time by 20% through predictive alerting.

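The predictive-alerting idea, flagging a device when a rolling average of its telemetry crosses a threshold, can be sketched in plain Python. Device names and the threshold are hypothetical; the real pipeline would consume these readings from Kafka/Kinesis consumers rather than a list:

```python
from collections import deque

WINDOW, THRESHOLD = 3, 90.0  # hypothetical window size and alert threshold

def detect_alerts(readings):
    """readings: iterable of (device_id, temperature). Returns alerted device ids."""
    windows, alerts = {}, set()
    for device, temp in readings:
        # Keep a bounded rolling window of the latest readings per device.
        w = windows.setdefault(device, deque(maxlen=WINDOW))
        w.append(temp)
        # Alert once a full window's moving average exceeds the threshold.
        if len(w) == WINDOW and sum(w) / WINDOW > THRESHOLD:
            alerts.add(device)
    return alerts

stream = [("pump-1", 85), ("pump-1", 92), ("pump-2", 70),
          ("pump-1", 95), ("pump-2", 71), ("pump-1", 97)]
print(detect_alerts(stream))  # {'pump-1'}
```

Smoothing over a window rather than alerting on single readings is what keeps transient spikes from paging operators while still catching sustained drift.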
• Managed multiple concurrent data engineering projects supporting industrial automation and smart building solutions, collaborating with cross-functional teams under an Agile framework to improve project delivery timelines by 15%.

EDUCATION

Master of Computer Engineering, University of North Texas, Denton, TX, USA.


