Data Engineer Senior

Location:

Dallas, TX

Salary:

140000

Posted:

October 09, 2025


Resume:

Daniel Lopez ******.*****.**********@*****.***

Senior Data Engineer 339-***-**** Dallas, TX

linkedin.com/in/daniel-lopez-65159b27a

SUMMARY

Senior Data Engineer with 8 years of experience designing and scaling data platforms, real-time pipelines, and cloud-native infrastructure for finance, e-commerce, legal, and IoT. Skilled in building streaming and batch systems that process billions of events, designing warehouse and lakehouse solutions, and automating data workflows for analytics and ML. Expert in Spark, Kafka, Airflow, Redshift, Snowflake, and AWS. Known for improving pipeline reliability, reducing costs, and delivering data products that directly impact revenue, retention, and operational efficiency.

TECHNICAL SKILLS

● Languages: Python, Scala, SQL, R, Java, C/C++, Bash

● Frameworks & APIs: FastAPI, Flask, Django, REST, GraphQL, gRPC, LangChain

● Data Storage & Databases: PostgreSQL, MySQL, MongoDB, Oracle, Redshift, Snowflake, DynamoDB, S3, Cassandra, Elasticsearch, Redis, HDFS

● Big Data & Streaming: Apache Spark, Kafka, Flink, Beam, AWS Kinesis, RabbitMQ, NiFi

● Data Engineering Tools: Airflow, dbt, MLflow, Great Expectations, Terraform, Docker, Kubernetes, Helm, Superset, Metabase

● Visualization & BI: Tableau, Power BI, Grafana, Kibana, Matplotlib, Seaborn

● Machine Learning & AI Support: TensorFlow, PyTorch, Scikit-learn, Hugging Face, vector search, RAG, LLMOps integration

● Cloud Platforms: AWS (Glue, Redshift, EMR, Step Functions, Lambda, SageMaker, EKS), GCP (BigQuery, Dataflow, AI Platform), Azure (Data Factory, Synapse, AKS), Databricks, Snowpark

● DevOps & CI/CD: GitHub Actions, Jenkins, Azure DevOps, Terraform Cloud, infrastructure as code, monitoring with Datadog and Prometheus

WORK EXPERIENCE

Senior Data Engineer - AstroSirens Oct 2021 - Present

● Led the design of multi-tenant data platforms for clients across finance, retail, legal, and IoT. Architected hybrid batch and streaming systems on AWS using Spark, Kafka, Airflow, and Delta Lake that process 2B+ events each month.

● Consolidated data from PostgreSQL, MongoDB, and flat files into Snowflake and Redshift. Created star schemas and optimized clustering, partitioning, and indexing. Reduced query times by 50% and enabled self-service analytics for 100+ business users.

● Built a contract intelligence pipeline with GPT-4, LangChain, and Elasticsearch. Automated ingestion, clause extraction, and embedding storage to enable semantic search and retrieval. Cut legal contract review effort by 70% and tripled throughput.

● Designed real-time fraud and anomaly detection pipelines with Kafka, Spark Structured Streaming, and Kinesis. Applied sliding windows, schema validation, and online feature generation. Reduced false alerts by 28% and improved detection speed by 40%.

● Developed customer behavior and churn analytics pipelines using dbt, Airflow, and Snowflake. Supported uplift modeling and A/B testing with marketing teams, contributing to a 15% improvement in customer retention.

● Created recommendation and personalization systems powered by FAISS and Elasticsearch vector search. Combined behavioral and content features to drive an 18% increase in upsell conversions and 12% higher click-through rates.

● Automated ETL workflows with dbt and Great Expectations. Enforced feature contracts and implemented SLA/SLO monitoring. Cut data quality incidents by 35% and reduced pipeline failures in production.

● Integrated MLflow and SageMaker into pipeline workflows for model versioning, retraining, and deployment. Improved time-to-production for ML models by 30%.

● Built operational dashboards in Power BI, Tableau, and Grafana to provide leadership with near real-time views of product adoption, NPS, and support performance.

● Mentored six engineers and data scientists in Spark optimization, Docker/Kubernetes, and data pipeline architecture. Established best practices in CI/CD, version control, and testing that raised overall code quality.

Data Engineer - Splunk Jun 2018 - Sep 2021

● Built ingestion and enrichment pipelines for SIEM telemetry using Spark and Kafka. Supported 100TB+ of daily security logs while maintaining sub-second ingestion latency.

● Delivered predictive maintenance workflows for IoT devices using Spark Streaming and rolling retraining. Cut downtime by 20% and gave operations teams better visibility through real-time dashboards.

● Developed NLP-based log parsing and classification services using Hugging Face and PyTorch. Integrated into Spark ETL workflows, reducing MTTR for security incidents by 25%.

● Optimized data pipelines in Redshift and PostgreSQL with clustering, materialized views, and workload management. Cut report generation times from 5 minutes to under 15 seconds.

● Built observability dashboards in Splunk and Grafana to track pipeline health, anomaly scores, and drift metrics. Boosted confidence in automated detections among engineering teams.

● Created validation suites using Great Expectations and Pytest to catch schema drift and sudden spikes in null values early. Lowered downstream data issues by 40%.

● Partnered with ML teams to automate feature pipelines and training dataset generation. Introduced dataset versioning and lineage tracking with MLflow.

● Collaborated with product teams to roll out new customer-facing anomaly detection features that scaled to thousands of enterprise clients.

Junior Data Engineer - IBM May 2017 - Apr 2018

● Designed ETL workflows using SQL, Redshift, and S3 to prepare financial and legal datasets for downstream analytics and ML.

● Built time-series forecasting pipelines with Prophet and ARIMA, including parameterized ETL jobs and automated validation. Improved forecast accuracy and planning reliability.

● Developed document ingestion and entity extraction pipelines using TensorFlow models integrated with Spark. Improved processing throughput and reduced error rates in document classification.

● Contributed to data quality checks, feature engineering steps, and pipeline testing for enterprise clients.

● Assisted in scaling Spark jobs and improving pipeline reproducibility. Documented best practices and trained team members on SQL optimization and partitioning strategies.

EDUCATION

Master of Science in Engineering Data Science and AI - University of Houston Sep 2015 - Mar 2017 Houston, TX

Bachelor of Science in Computer Science - University of Houston Apr 2011 - Sep 2015 Houston, TX
