NAME: Evangel D
Sr. Data Engineer
Email: ************@*****.*** Phone: 504-***-****
LinkedIn: www.linkedin.com/in/evangel-d-a027792b6
PROFESSIONAL SUMMARY
Sr. Data Engineer with 10+ years of experience designing, developing, and optimizing data pipelines, ETL workflows, and big data solutions using Python, Java, Scala, and SQL.
Cloud Data Engineering expertise in Azure, AWS, and GCP, implementing data solutions using AWS EMR, Azure HDInsight, and GCP BigQuery.
Proficient in data management and retrieval with hands-on experience in Snowflake, Redshift, SQL Server, PostgreSQL, Teradata, and MySQL.
Skilled in ETL and data pipeline development using tools like Apache Airflow, AWS Glue, Azure Data Factory (ADF), and Informatica for seamless data integration across platforms.
Expertise in streaming data processing with Apache Kafka, Spark Streaming, AWS Kinesis, and GCP Pub/Sub for real-time analytics and event-driven architectures.
Data governance and security knowledge, implementing encryption protocols, IAM, Key Vault, and regulatory compliance standards to ensure data privacy and integrity.
In-depth big data expertise with technologies like Apache Spark, Hadoop, HDFS, and MapReduce for high-performance data processing and analytics at scale.
Knowledgeable in data analytics and machine learning, utilizing tools like Pandas, NumPy, Scikit-learn, and forecasting models to drive business insights and predictive analytics.
Skilled in metadata and data quality management, using Erwin Data Modeler, Data Catalog, and automated data quality checks to ensure data integrity and governance.
BI & reporting expertise with Power BI, Tableau, and AWS QuickSight, creating interactive dashboards and visualizations for decision-making.
Experience working in Agile environments, utilizing Scrum and Kanban frameworks and tools like JIRA and ServiceNow for efficient project management and execution.
Expert in data problem-solving and performance optimization, applying indexing, partitioning, and query-tuning strategies to large-scale workloads.
Ensure data-driven solutions meet business goals by facilitating clear communication and strong collaboration between technical and business teams.
TECHNICAL SKILLS
Programming Languages & Scripting: Python, Java, Scala, SQL, Shell Scripting, JavaScript
Big Data & Distributed Computing: Apache Spark, Spark SQL, Hadoop, HDFS, MapReduce, Apache Hive, Presto, Apache Airflow, Apache Kafka, AWS EMR, Azure HDInsight, Spark Streaming
Cloud Platforms & Services: Azure (Data Factory, Data Lake Storage, Synapse Analytics, Blob Storage, Functions, Logic Apps, Kubernetes Service (AKS), Backup, Site Recovery, Key Vault, Security Center, DevOps, Monitoring, Active Directory, Data Catalog, HDInsight); AWS (Lambda, Glue, Data Pipeline, Athena, Redshift, S3, DynamoDB, SQS, SNS, CloudFormation, CloudWatch, CloudTrail, KMS, Secrets Manager, IAM, QuickSight, ECS, EKS, CloudFront); GCP (BigQuery, Dataflow, Pub/Sub)
DevOps & CI/CD: Git, GitHub, Jenkins, ServiceNow, Azure DevOps, AWS CloudFormation
Machine Learning & Data Science: NumPy, Pandas, Scikit-learn, SciPy, Forecasting Models, Data Analytics
Databases & Data Warehousing: SQL Server, PostgreSQL, CosmosDB, Teradata, MySQL, Snowflake
ETL & Data Integration: ADF, AWS Glue, Informatica, Apache Sqoop
Data Visualization & Reporting: Power BI, Tableau, Matplotlib, Azure Data Studio
Data Modeling & Governance: Erwin Data Modeler, Metadata Management, Data Quality Metrics, Data Security Best Practices, Data Compliance Standards, Encryption Protocols
Workflow & Agile Methodologies: JIRA, Agile, Scrum, Kanban
WORK EXPERIENCE
Molina Healthcare, Bothell, WA
Sr. Data Engineer November 2023 - Present
Responsibilities:
Designed and implemented Azure Data Lake Storage and Azure Blob Storage solutions to efficiently manage structured and unstructured data at scale.
Developed Azure Stream Analytics pipelines to process real-time streaming data, enabling low-latency insights for business-critical applications.
Utilized Erwin Data Modeler to create conceptual, logical, and physical data models, ensuring optimized data structures for enterprise-wide applications.
Built Azure Data Factory (ADF) pipelines to orchestrate ETL workflows and integrated diverse data sources for seamless transformation.
Engineered Matillion ETL jobs to extract, load, and transform data across cloud-based data warehouses, enhancing data processing efficiency.
Developed data processing frameworks in Azure Synapse Analytics, leveraging Spark SQL and PySpark to perform distributed computations.
Designed and optimized Power BI dashboards, transforming raw datasets into actionable insights for business intelligence and decision-making.
Created complex data visualizations in MicroStrategy, integrating multiple data sources to enable data-driven business strategies.
Strengthened cloud security by implementing Azure Security Center policies, ensuring compliance with industry security standards.
Ensured business continuity and disaster recovery with Azure Backup and Azure Site Recovery strategies.
Built Azure Functions to automate serverless data processing tasks, improving efficiency and reducing operational overhead.
Established Azure Data Catalog for metadata management, improving data discovery and governance across the organization.
Designed and managed large-scale Hadoop clusters and Azure HDInsight environments to support big data workloads.
Developed scalable data engineering pipelines on Azure Databricks, leveraging PySpark for data transformation and analytics (see the illustrative PySpark sketch below).
Administered SQL Server databases, optimizing indexing, partitioning, and query tuning to improve performance and scalability.
Implemented CosmosDB solutions to handle globally distributed, multi-model data storage with minimal latency.
Built and integrated RESTful APIs for seamless data exchange between enterprise applications and cloud-based storage systems.
Managed end-to-end Azure DevOps pipelines for CI/CD automation, ensuring rapid deployment and integration of data engineering solutions.
Led JIRA-based Agile sprint planning and backlog grooming for cross-functional teams, enhancing project collaboration.
Developed workflow automation using Azure Logic Apps, streamlining data processing and reducing manual intervention.
Designed data lineage tracking frameworks to ensure transparency and traceability of data movement across systems.
Applied advanced indexing and partitioning strategies to optimize large-scale database queries and minimize execution time.
Performed query tuning to enhance the performance of complex SQL queries, improving data retrieval speeds.
Implemented RBAC (Role-Based Access Control) and data encryption strategies to secure sensitive enterprise data.
Configured Azure Monitor and Log Analytics to track system performance, detect anomalies, and enhance real-time monitoring.
Deployed containerized data pipelines using Docker and Kubernetes, ensuring efficient scalability and resource utilization.
Automated infrastructure provisioning with Terraform, improving deployment consistency and reducing manual configuration errors.
Executed cloud migration projects, seamlessly transferring on-premises data and applications to Azure while ensuring minimal downtime.
Applied machine learning techniques for feature engineering, data augmentation, and data preprocessing, enhancing predictive analytics capabilities.
Environment: ADLS, Azure Blob Storage, Azure Stream Analytics, ADF, Azure Synapse Analytics, Azure Security Center, Azure Backup, Azure Site Recovery, Azure Functions, Azure Data Catalog, Azure HDInsight, CosmosDB, Azure DevOps, Azure Monitor, Azure Logic Apps, Matillion ETL, Spark SQL, PySpark, Hadoop, Azure Databricks, Erwin Data Modeler, Power BI, MicroStrategy, SQL Server, Terraform, Docker, Kubernetes, RESTful APIs, JIRA.
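Illustrative sketch (referenced above): a minimal PySpark job of the kind used on Azure Databricks in this role, reading raw JSON from ADLS, deduplicating and typing the records, and writing partitioned Parquet back to the lake. The storage account (examplelake), container names, and claim fields are hypothetical placeholders, not the actual production pipeline.

# Minimal PySpark sketch: ADLS raw zone -> curated zone.
# All paths and column names below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-transform").getOrCreate()

# Read raw, semi-structured claim records from the lake's raw container.
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/claims/")

# Deduplicate on the business key, cast dates, and drop invalid amounts.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
       .filter(F.col("claim_amount") > 0)
)

# Write curated output partitioned by date for efficient downstream reads.
(cleaned.write
        .mode("overwrite")
        .partitionBy("service_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/claims/"))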
First Republic Bank, New York, NY
Data Engineer May 2021 - October 2023
Responsibilities:
Designed and implemented scalable Amazon S3 data lakes to efficiently store and manage structured and unstructured data.
Developed high-performance Amazon Redshift data warehouses to support analytical workloads and optimize query performance.
Utilized Amazon Athena for serverless interactive querying of large datasets stored in Amazon S3, reducing processing costs.
Automated ETL workflows using AWS Glue, enhancing data transformation and integration across multiple sources.
Designed and orchestrated data pipelines using AWS Data Pipeline for efficient data movement and processing.
Implemented dbt (Data Build Tool) for modular data transformation, ensuring data reliability and consistency.
Built data engineering solutions on Databricks, leveraging Apache Spark for large-scale data processing and analytics.
Wrote optimized Python scripts utilizing Pandas, NumPy, and SciPy to clean, transform, and analyze large datasets.
Configured AWS EMR clusters for big data processing using Hadoop, Hive, and Apache Spark, improving data ingestion performance.
Developed and deployed RESTful APIs to enable seamless data integration between various applications and services.
Implemented secure access controls using AWS IAM, defining IAM roles for granular permissions and data security.
Secured sensitive data through AWS KMS and AWS Secrets Manager, ensuring encryption and access control best practices.
Automated infrastructure provisioning with Terraform, enabling scalable and repeatable deployments.
Integrated CI/CD pipelines using Jenkins, ensuring automated testing and deployment of data workflows.
Managed version control with Git, facilitating collaborative development and tracking changes in data pipelines.
Containerized and deployed data processing workloads using Docker and Amazon ECS for efficient scalability.
Designed serverless data processing solutions with AWS Lambda, reducing operational overhead and costs.
Implemented event-driven architecture using AWS SQS and AWS SNS for real-time data streaming and notifications (see the illustrative Lambda sketch below).
Created interactive dashboards in Tableau, enabling stakeholders to visualize key data insights effectively.
Designed and optimized MySQL databases and ensured efficient query execution and data storage management.
Monitored data security and compliance using AWS CloudTrail, tracking changes and access logs.
Enhanced system reliability with AWS CloudWatch, setting up alerts and monitoring application performance.
Improved content delivery and reduced latency with AWS CloudFront, caching data closer to end users.
Deployed and managed containerized workloads using Kubernetes, ensuring high availability and scalability.
Followed Agile and Scrum methodologies and collaborated with cross-functional teams to accelerate project delivery.
Established data lineage tracking, metadata documentation, data validation, anomaly detection, and governance strategies, ensuring data quality and compliance.
Environment: AWS S3, Redshift, Athena, Glue, Data Pipeline, EMR, IAM, KMS, Secrets Manager, Lambda, SQS, SNS, CloudTrail, CloudWatch, CloudFront, ECS, Databricks, Apache Spark, Hadoop, Hive, MySQL, DBT, Tableau, Python, Pandas, NumPy, SciPy, SQL, RESTful APIs, Terraform, Kubernetes, Docker, Jenkins, Git, Agile, Scrum.
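Illustrative sketch (referenced above): a minimal AWS Lambda handler for the event-driven pattern, triggered by S3 object-created events, validating the new object, and publishing a notification to SNS. The topic ARN, bucket layout, and message fields are assumptions for illustration only.

# Minimal Lambda sketch for an S3 -> Lambda -> SNS notification flow.
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-ingest-events"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Basic sanity check before notifying downstream consumers.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["ContentLength"] == 0:
            continue  # skip empty objects

        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"status": "ok"}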
CBRE, Dallas, TX
Data Engineer December 2018 - April 2021
Responsibilities:
Developed and deployed Azure Functions to automate event-driven processes, reducing manual workloads and improving system efficiency.
Designed and implemented big data architectures using Azure Databricks, ensuring scalable and high-performance data processing.
Optimized Hive-based data processing workflows, enabling faster query execution and improved analytics capabilities.
Built ETL pipelines using Azure Data Factory, streamlining data extraction, transformation, and loading across multiple sources.
Orchestrated workflow automation with Apache Airflow, ensuring seamless task execution and dependency management (see the illustrative DAG sketch below).
Developed real-time data streaming solutions with Azure Stream Analytics, improving operational visibility and decision-making.
Engineered high-performance Apache Spark jobs for distributed data processing, optimizing computational efficiency.
Designed and trained machine learning models using Python, enhancing predictive analytics and business intelligence.
Leveraged NumPy and Pandas to preprocess and analyze structured and unstructured datasets for data-driven insights.
Applied Scikit-learn algorithms to develop AI-powered analytics solutions for advanced data classification and regression tasks.
Architected cost-efficient storage solutions using Azure Blob Storage, ensuring secure and scalable data retention.
Developed and managed Azure Data Lake Storage (ADLS) environments to handle large-scale enterprise data assets.
Wrote optimized SQL queries for data retrieval and transformation in Teradata, improving query efficiency and database performance.
Designed scalable data solutions using Snowflake, ensuring fast and flexible cloud-based data warehousing.
Enforced data security best practices through Azure Key Vault, safeguarding sensitive information and access credentials.
Built large-scale Scala applications for big data processing, enhancing computational efficiency in data pipelines.
Developed cloud-based solutions in IntelliJ IDEA, improving development workflows and version control.
Monitored data pipelines with Azure Monitor, enabling proactive performance tuning and issue resolution.
Designed and developed interactive Power BI dashboards, providing real-time business intelligence insights.
Implemented Log Analytics to track system behavior and ensure optimal data processing performance.
Followed Agile development principles and contributed to sprint planning and iterative software delivery.
Managed Git-based version control workflows within a DevOps environment and integrated RESTful APIs with OAuth 2.0 authentication for secure data exchange.
Environment: Azure Functions, Azure Databricks, ADF, Azure Stream Analytics, Azure Blob Storage, ADLS, Snowflake, Teradata, Apache Spark, Apache Hive, Apache Airflow, SQL, Scala, Python, NumPy, Pandas, Scikit-learn, Azure Key Vault, Azure Monitor, Log Analytics, Git, Power BI, OAuth 2.0, IntelliJ IDEA, RESTful APIs.
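Illustrative sketch (referenced above): a minimal Apache Airflow DAG showing the extract -> transform -> load dependency chain used for orchestration. The DAG id, schedule, and task bodies are placeholders, not the actual production workflows.

# Minimal Airflow sketch: a daily three-stage ETL DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    print("pull data from source systems")    # placeholder task body

def transform(**_):
    print("clean, join, and enrich records")  # placeholder task body

def load(**_):
    print("write results to the warehouse")   # placeholder task body

with DAG(
    dag_id="daily_property_etl",  # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Explicit dependency chain: extract runs before transform, then load.
    t_extract >> t_transform >> t_load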
Amway Corp, Ada, MI
Data Engineer July 2016 - November 2018
Responsibilities:
Developed scalable big data applications using Java and Apache Spark, optimizing data transformation pipelines for high-throughput processing.
Engineered data-intensive applications in Scala, ensuring efficient and fault-tolerant distributed processing.
Designed and implemented ETL workflows using Informatica, streamlining data integration across multiple enterprise systems.
Built real-time data streaming solutions using Apache Kafka, enabling low-latency event processing for mission-critical applications (see the illustrative producer sketch below).
Managed distributed computing environments with Hadoop, optimizing data storage and processing for large-scale analytics.
Developed and deployed batch and stream processing applications on Google Cloud Dataflow, ensuring real-time and historical data consistency.
Integrated event-driven messaging architectures using Google Pub/Sub, enhancing inter-service communication across cloud platforms.
Monitored infrastructure health and performance metrics with Prometheus, implementing proactive alerting for system reliability.
Designed and maintained cloud-based data warehouses in BigQuery, optimizing storage and query execution for cost efficiency.
Automated system administration tasks through Linux shell scripting, improving operational workflows and reducing manual intervention.
Optimized complex PostgreSQL queries and indexing strategies, improving performance and reducing query latency for high-volume datasets.
Implemented robust data governance frameworks and ensured compliance with industry best practices and regulatory requirements.
Developed workflow automation solutions in ServiceNow and streamlined incident response and data pipeline management.
Maintained code repositories and version control systems using Git.
Created insightful data visualizations with Matplotlib and transformed raw data into actionable business intelligence.
Managed cloud-based data solutions on Google Cloud Platform (GCP) and optimized resources for large-scale data operations.
Enforced advanced data security protocols and implemented encryption, access controls, and regulatory compliance measures.
Designed and deployed machine learning pipelines and leveraged big data frameworks for predictive analytics and AI-driven insights.
Environment: Java, Scala, Apache Spark, Informatica, Apache Kafka, Hadoop, Google Cloud Dataflow, Google Pub/Sub, Prometheus, BigQuery, Linux, Shell Scripting, PostgreSQL, ServiceNow, Git, Matplotlib, GCP.
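Illustrative sketch (referenced above): a minimal Kafka producer using the kafka-python client, publishing JSON events with full-replication acknowledgements for reliable low-latency delivery. The broker address, topic name, and payload schema are assumptions, not the production setup.

# Minimal Kafka producer sketch.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],  # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for full replication before acknowledging
)

event = {"order_id": 123, "status": "shipped"}  # illustrative payload
producer.send("order-events", value=event)      # hypothetical topic
producer.flush()  # block until buffered records are delivered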
Netenrich Technologies Pvt. Ltd., Hyderabad, India
Jr. Data Engineer March 2012 - October 2014
Responsibilities:
Wrote and optimized SQL and PL/SQL queries for data transformation, validation, and performance tuning, ensuring high data processing efficiency across multiple systems.
Designed and implemented ETL pipelines in Azure Data Factory (ADF) to automate data ingestion and processing, ensuring seamless data movement within Azure Data Lake.
Developed and maintained scalable data workflow automation using Python, improving system efficiency and reducing manual intervention.
Leveraged Apache Spark to optimize distributed data processing and enabled faster computations for large-scale datasets.
Enforced data governance policies to maintain data integrity, ensuring compliance with regulatory requirements and enhancing data security protocols.
Managed source code and collaborated with teams using Git.
Built end-to-end data ingestion and transformation solutions using Informatica.
Integrated ServiceNow with enterprise data platforms, automating ITSM workflows and improving data accessibility for cross-functional teams.
Used Apache Sqoop to enable fast and secure data migration between Hadoop and relational databases, facilitating big data processing.
Designed and optimized relational database schemas and ensured structured data storage and retrieval for efficient analytical queries.
Executed data quality assessments and accuracy checks, reducing errors and improving trust in business intelligence reports (see the illustrative validation sketch below).
Followed Agile and Scrum methodologies, participating in sprint planning, backlog refinement, and iterative development for efficient project execution.
Contributed to cross-functional teams by providing performance-tuning expertise, reducing latency in ETL pipelines and database queries.
Ensured regulatory compliance through proactive data security measures, preventing unauthorized access and maintaining enterprise-level governance standards.
Environment: SQL, PL/SQL, ADF, Azure Data Lake, Python, Apache Spark, Git, Informatica, ServiceNow, Apache Sqoop, Hadoop, Relational Databases, Agile, Scrum.
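Illustrative sketch (referenced above): a minimal pandas-based data quality check of the kind run before publishing a staged extract, asserting uniqueness of the primary key and reporting per-column null rates. The file path and key column are hypothetical.

# Minimal data quality sketch: duplicate-key and null-rate checks.
import pandas as pd

df = pd.read_csv("staging/customers.csv")  # hypothetical staged extract

checks = {
    "row_count": len(df),
    "duplicate_keys": int(df["customer_id"].duplicated().sum()),
    "null_rate_per_column": df.isna().mean().round(3).to_dict(),
}

# Fail fast if the primary key is not unique.
assert checks["duplicate_keys"] == 0, "duplicate primary keys found"
print(checks)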