
Senior Data Engineer

Location:
Dallas, TX
Salary:
12000
Posted:
October 09, 2025


Resume:

Angel Contreras *****************@*****.***

Senior Data Engineer 945-***-**** Dallas, TX

linkedin.com/in/angel-contreras-154281379

SUMMARY

Senior Data Engineer with 8 years of experience designing and optimizing large-scale distributed data platforms and ETL pipelines. Expert in building real-time data streaming and data warehousing systems and deploying AI and ML solutions in the cloud. Proficient in Python, SQL, Scala and Apache Spark, with hands-on experience in AWS, GCP and Azure. Skilled in modern data engineering tools such as Databricks, Apache Airflow, MLflow, dbt, Kafka, Snowflake and Terraform for building scalable, maintainable, cost-effective data solutions. Proven leader of data engineering initiatives supporting analytics, machine learning workflows and LLM integration using OpenAI and RAG architectures in production.

TECHNICAL SKILLS

● Languages: Python, Scala, SQL, R, Java, C/C++, Bash

● Frameworks & APIs: Django, FastAPI, Flask, Spring Boot, REST, GraphQL, gRPC, LangChain

● Data Storage & Databases: MySQL, PostgreSQL, MongoDB, Oracle, BigQuery, Elasticsearch, AWS Redshift, Snowflake, AWS S3, DynamoDB, Redis, Hadoop HDFS, Cassandra

● Big Data & Streaming: Apache Spark, Kafka, Apache Flink, AWS Kinesis, RabbitMQ, Apache Beam, NiFi

● Data Engineering Tools: Apache Airflow, MLflow, dbt, Pandas, NumPy, pytest, Terraform, Docker, Kubernetes, Helm, Apache Superset, Metabase

● Visualization & BI: Power BI, Tableau, Grafana, Matplotlib, Seaborn

● Machine Learning & AI: TensorFlow, PyTorch, Scikit-learn, OpenCV, Hugging Face Transformers, NLP, Vector Search, RAG, LangChain, LLMOps

● Cloud Platforms: AWS (Glue, EC2, RDS, Lambda, SageMaker, EKS, Redshift), GCP (Dataflow, BigQuery, AI Platform), Azure (Data Factory, Synapse, AKS), Databricks, Snowpark, Terraform Cloud

● DevOps & CI/CD: GitHub Actions, Azure DevOps, Jenkins, Linux Administration, Infrastructure as Code (IaC), ELK Stack (Elasticsearch, Logstash, Kibana)

WORK EXPERIENCE

Senior Data Engineer – AstroSirens Mar 2022 - Present

● Led the design of a customer behavior analytics platform by building real-time and batch pipelines using Apache Spark on Databricks, Airflow and Delta Lake. Processed over 2 billion monthly events from web and mobile to enable personalized marketing and product insights.

● Migrated fragmented datasets from MongoDB, PostgreSQL and Google Sheets to Redshift and Snowflake using AWS Glue and custom transformation logic. Improved query times by 50% and reduced the reporting backlog by 80%.

● Built a generative AI pipeline for document processing using OpenAI GPT-4, LangChain and FastAPI. Applied RAG architecture and stored embeddings in Elasticsearch to help legal operations cut contract review time by 70%.

● Developed a real-time user engagement tracking system using Kafka, AWS Kinesis and Lambda. Stored data in Parquet format on S3 and built monitoring dashboards in Grafana to reduce incident response time by 40%.

● Integrated MLflow and SageMaker into existing workflows to support versioning, model training and automated deployment of churn prediction models used by customer success teams.

● Automated ETL workflows across development, staging and production using dbt, Great Expectations and GitHub Actions. Used Terraform and AWS CDK to manage infrastructure changes and ensure deployment consistency.

● Designed Power BI and Tableau dashboards for operations and leadership teams. Delivered near real-time views of product usage, NPS trends and support KPIs that directly influenced team priorities.

● Mentored junior engineers in Docker, Kubernetes, Terraform and CI/CD practices using Azure DevOps. Helped improve team velocity and code quality through hands-on guidance and code reviews.

● Worked closely with product and ML teams to build lead scoring models and recommendation systems that increased upsell conversion by 18% year over year.

Data Engineer – Databricks Aug 2018 - Feb 2022

● Developed data ingestion and transformation pipelines using Azure Data Factory, Airflow and Python. Handled structured and unstructured data from OCR tools, IoT devices and external APIs for use in client-facing dashboards.

● Built a modular ELT framework using Azure Synapse, Snowflake and dbt to reduce onboarding time for new data sources by 60%. Enabled analysts to explore data without engineering help.

● Built a hybrid batch and streaming data platform with Kafka, Python and Node.js. Transformed clickstream data into user journey insights used by product and leadership teams.

● Set up data validation using Great Expectations and Pytest to catch issues early in the pipeline. Reduced downstream errors and improved confidence in reporting.

● Optimized Redshift and PostgreSQL performance with materialized views and query tuning. Reduced dashboard load times from minutes to seconds across internal reporting tools.

● Developed microservices with FastAPI and Docker and deployed them to Azure Kubernetes Service. Exposed real-time data endpoints to internal teams building personalization features.

● Created an internal tool to visualize data lineage using Flask and React. Helped compliance and governance teams audit data use and trace downstream dependencies.

● Migrated legacy logs to AWS S3 using Glue crawlers and Athena. Indexed metadata in DynamoDB to support historical lookups and audits.

● Built real-time anomaly detection alerts using Grafana and Lambda. Helped reduce false positives and improved the reliability of system monitoring.

● Worked with ML engineers to prepare training datasets and automate feature pipelines using TensorFlow and Pandas. Accelerated fraud detection model development.

● Redesigned messaging infrastructure with RabbitMQ and Kafka and introduced Terraform to manage service configuration. Improved scalability and simplified team ownership.

Data Engineering Intern – Netflix Nov 2017 - May 2018

● Built and tested ETL pipelines using PySpark and Airflow to process millions of playback and search events for the content discovery team.

● Wrote Python scripts to combine user interaction logs, device data and catalog metadata. Helped analysts evaluate the performance of recommendation experiments.

● Created Tableau dashboards on Redshift data to show daily active users and session completion rates. Gave product teams better visibility into engagement metrics.

● Worked with senior engineers to improve S3 partitioning and Redshift table structures. Reduced query times for several reporting use cases.

● Containerized testing jobs with Docker and participated in team sessions on data modeling, batch and streaming systems and data governance best practices.

EDUCATION

Master of Science in Data Engineering – University of Houston, Houston, TX Jun 2015 - Oct 2017

Bachelor of Science in Computer Science – University of Houston, Houston, TX Apr 2011 - May 2015
