
Data Engineer Lead


Gabriel N. Dima

Upper Marlboro, Maryland

202-***-****

************@*****.***

Citizenship: United States

Profile Summary

Lead Data Engineer | AWS Cloud & Big Data Ecosystems

● Accomplished Lead Data Engineer with 8+ years of experience architecting, developing, and maintaining robust, scalable data solutions on the AWS cloud. Adept at leading the design and delivery of complex data pipelines using AWS native services to support enterprise-scale analytics and decision-making. Proven ability to create reusable data frameworks, implement automated data quality checks, and manage the complete data lifecycle across ingestion, transformation, and orchestration layers.

● Extensive hands-on expertise in AWS Glue (PySpark), AWS Glue Crawler, AWS Athena, AWS Redshift, AWS S3, IAM, and AWS CloudFormation. Skilled in implementing end-to-end ETL/ELT workflows, designing serverless data lakes, and building infrastructure-as-code (IaC) templates for consistent, scalable deployments.

● Led cross-functional initiatives to modernize legacy data platforms by migrating on-prem Hadoop/Hive/Sqoop-based systems to cost-effective, cloud-native architectures on AWS. Experienced in query performance optimization using MPP engines like StarRocks and developing self-service analytics using DBeaver and Tableau.

● Champion of data quality, utilizing Great Expectations to enforce schema validation and data profiling rules across production pipelines. Adept at creating CI/CD pipelines for data engineering use cases and managing production releases using ServiceNow, ensuring high system availability and operational efficiency.

● Strong understanding of system administration, particularly in multi-platform environments (Unix/Linux/Windows). Experienced in VM provisioning, server builds, network configurations, and user management—skills leveraged for hybrid cloud deployments and ensuring seamless integration across cloud and on-prem systems.

● Passionate about mentoring teams, driving data governance best practices, and delivering high-impact business insights through scalable cloud solutions. Committed to innovation, operational excellence, and aligning technical solutions with evolving business priorities.

● Successfully led data platform modernization projects, enabling real-time analytics and reducing data processing time by over 40% through optimized AWS-based architecture.

● Collaborated with product, analytics, and DevOps teams to ensure secure, compliant, and highly available data pipelines, driving faster business insights and better decision-making.

PROFESSIONAL EXPERIENCE:

InfoGravity - Virginia | March 2022 – Present
Lead Data Engineer, AWS Cloud | Client: Merck Group

● Successfully led the architecture, design, and development of complex, scalable data pipelines using AWS Glue (PySpark) and AWS Step Functions, aligning with stringent regulatory and business requirements in the pharmaceutical domain.

● Utilized advanced PySpark and SparkSQL to optimize ETL workflows handling terabytes of structured and semi-structured clinical and supply chain data.
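For illustration, a minimal sketch of the kind of Glue PySpark job described above; the job parameters, paths, and transformation are placeholders rather than client specifics:

```python
# Minimal AWS Glue (PySpark) job skeleton. "source_path" and "target_path"
# are assumed job parameters passed as --source_path / --target_path.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read semi-structured input, stamp a load date, write partitioned Parquet.
df = spark.read.json(args["source_path"])
cleaned = df.withColumn("load_date", F.current_date())
cleaned.write.mode("append").partitionBy("load_date").parquet(args["target_path"])

job.commit()
```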

● Implemented automated data validation and quality control frameworks using AWS Glue and Great Expectations (sketched below), significantly improving trust in data assets across the organization.

● Developed unit and integration test scripts and orchestrated QA-to-prod deployments using ServiceNow, resulting in a 30% reduction in production incidents.
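A minimal example of the Great Expectations pattern referenced above, using the pandas-backed API; the dataset path, column names, and bounds are hypothetical:

```python
# Hypothetical validation step; fails the pipeline instead of promoting bad data.
import great_expectations as ge
import pandas as pd

df = pd.read_parquet("clinical_batch.parquet")  # assumed local extract
gdf = ge.from_pandas(df)

# Schema and range checks of the kind enforced in the production pipelines.
gdf.expect_column_values_to_not_be_null("subject_id")
gdf.expect_column_values_to_be_between("age", min_value=0, max_value=120)

result = gdf.validate()
if not result.success:
    raise ValueError("Great Expectations validation failed")
```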

● Built reusable ingestion and transformation frameworks using AWS Glue and AWS Lambda, improving development speed and reducing repetitive coding across multiple projects.

● Designed and implemented efficient, reusable data lakes on Amazon S3, using best practices in partitioning, lifecycle policies (sketched below), and versioning to optimize performance and cost.

● Leveraged AWS CloudFormation to automate infrastructure provisioning, ensuring consistent and auditable deployments across multiple pharma projects.
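A sketch of the S3 lifecycle tiering mentioned in the data-lake bullet, applied with boto3; the bucket, prefix, and transition windows are assumptions:

```python
# Tier aging raw-zone objects to cheaper storage classes, then expire them.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-pharma-datalake",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-zone",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 1825},
            }
        ]
    },
)
```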

● Integrated AWS Glue Crawlers to catalog incoming data feeds and maintain metadata accuracy across changing schema definitions from clinical trial sources.
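An illustrative boto3 setup for a Glue Crawler of the kind described; the crawler name, role ARN, database, and path are placeholders:

```python
# Register a crawler that keeps the Data Catalog in sync as partner schemas drift.
import boto3

glue = boto3.client("glue")
glue.create_crawler(
    Name="clinical-feeds-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed role
    DatabaseName="clinical_raw",
    Targets={"S3Targets": [{"Path": "s3://example-pharma-datalake/raw/clinical/"}]},
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
)
glue.start_crawler(Name="clinical-feeds-crawler")
```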

● Implemented secure and role-based access to data assets via AWS IAM and S3 bucket policies to maintain compliance with HIPAA and GxP standards.
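One possible shape of such a bucket policy, applied via boto3; the ARNs are placeholders, and the single statement (denying unencrypted transport) is just one example of the pattern:

```python
# Apply a bucket policy that rejects any request not made over TLS.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-pharma-datalake",
                "arn:aws:s3:::example-pharma-datalake/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}
boto3.client("s3").put_bucket_policy(
    Bucket="example-pharma-datalake", Policy=json.dumps(policy)
)
```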

● Troubleshot complex ETL failures and bottlenecks using CloudWatch Logs, Glue job metrics, and the Spark UI, reducing mean time to resolution (MTTR) by 40% (see the log-pull sketch below).

● Collaborated with data governance teams to define catalog taxonomy, naming conventions, and data lineage using AWS Glue Data Catalog.
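A sketch of pulling recent Glue error logs with boto3 during an incident, per the troubleshooting bullet; the one-hour window is an assumption:

```python
# Scan Glue's standard error log group for recent ERROR events.
import time
import boto3

logs = boto3.client("logs")
resp = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/error",
    filterPattern="ERROR",
    startTime=int((time.time() - 3600) * 1000),  # last hour, epoch millis
)
for event in resp["events"]:
    print(event["timestamp"], event["message"][:200])
```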

● Authored technical runbooks and architecture diagrams for all ETL workflows, contributing to faster onboarding and improved system maintainability.

Project: Global Clinical Trial Data Ingestion & Processing

• Objective: To automate the ingestion of raw clinical trial data from external partners into the enterprise data lake for analysis by medical researchers.

• Solution: Built a secure, scalable ingestion pipeline using AWS Glue, S3, and Glue Crawlers to process and catalog clinical datasets received in XML, JSON, and CSV formats.

• Outcome: Reduced manual data handling by 70%, improved data availability, and ensured full traceability for regulatory audits.

Project: Pharmaceutical Supply Chain Data Lake Modernization

• Objective: To unify data from SAP, vendor APIs, and Excel reports into a central AWS data lake for improved inventory visibility and demand forecasting.

• Solution: Designed modular ETL pipelines using AWS Glue PySpark, AWS Lambda, and S3 partitioning (see the Lambda sketch below), enabling near real-time visibility across supply chain nodes.

• Outcome: Enabled a 25% improvement in forecast accuracy and reduced stockout incidents by automating data refresh cycles and pipeline monitoring.
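A minimal sketch of the Lambda-triggered ingestion pattern used in this project; the Glue job name and argument key are hypothetical:

```python
# AWS Lambda handler: start a Glue job whenever a new object lands in S3.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # S3 put notifications carry bucket/key under Records[].s3
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="supply-chain-ingest",  # assumed job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
```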

National Oceanic & Atmospheric Administration, Maryland (on site)
Systems Engineer (NWS) | Aug 2019 – Jan 2022

● Hardening, patching (Solaris 10 servers using 10_Recommended patch clusters), and release upgrades on standalone servers (single-user mode) and on production servers (Live Upgrade).

● Write basic bash shell scripts to automate processes via crontab; install packages and applications.

● Configure Apache and MySQL on Solaris 10 for virtual and web hosting, building and hosting websites; configure web hosting company DNS settings; install and configure Samba for quick publishing using a third-party web page maker.

● User and security management.

● Set up, configure, and troubleshoot TCP/IP, DHCP, DNS, NFS, CIFS, and Samba servers in a multiplatform LAN.

● Shell scripting (bash) to schedule and automate processes, including full and incremental backups using tar, cpio, and ufsdump (see the sketch after this list); migrate and enlarge file systems on Solaris 10.

● Manage swap configurations.

● Perform multiplatform volume management using SVM, LVM, ZFS, and VERITAS Volume Manager; installed and configured a LAN-wide NAS (FreeNAS) used for creating LUNs and attaching them to Windows 2008 and Solaris 10 servers over iSCSI.

● Build, configure, secure, and deploy Solaris 10 systems (global zone).

● Build, install, configure, secure, deploy, and migrate Solaris 10 local zones (sparse-root or whole-root).

● Troubleshoot Solaris 10 issues stemming from hardware, software, and configuration problems.

● Virtualization (VMware ESXi 5/5.5 and Oracle VirtualBox).

● Set up a domain and Active Directory on a Windows 2008 server; install and configure a Samba server on Solaris 10 and map it to the Windows 2008 server.

● Responsible for data management using native Solaris utilities for archiving, compression, backup, and restore.

● Support system users and troubleshoot system issues; document solutions for future reference.

● Install and maintain the operating system and related software products.

● Participate in root-cause analysis of recurring issues and in system backup and security setup; provide 24x7 support in production, testing, and development environments.

● Installed and configured databases on Unix.
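The backup bullet above describes cron-driven bash scripts; as a stand-in, here is a minimal Python sketch of the same full/incremental logic (paths and naming are assumptions; the original systems used tar, cpio, and ufsdump directly):

```python
# Full or incremental tar archive based on a timestamp marker file,
# intended to be scheduled from crontab. Paths are placeholders.
import os
import tarfile
import time

SOURCE = "/export/home"  # assumed data directory
DEST = "/backup"         # assumed backup target
MARKER = os.path.join(DEST, ".last_backup")

def backup(full=False):
    # Incremental runs archive only files modified since the last run.
    since = 0.0
    if not full and os.path.exists(MARKER):
        since = os.path.getmtime(MARKER)
    name = time.strftime("backup-%Y%m%d-%H%M%S.tar.gz")
    with tarfile.open(os.path.join(DEST, name), "w:gz") as tar:
        for root, _dirs, files in os.walk(SOURCE):
            for f in files:
                path = os.path.join(root, f)
                if os.path.getmtime(path) >= since:
                    tar.add(path)
    with open(MARKER, "w"):  # touch the marker for the next incremental run
        pass

if __name__ == "__main__":
    backup(full=False)
```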

InfoGravity - Virginia | Apr 2016 – Dec 2019
Senior Data Engineer, AWS Cloud | Client: ESPN (Sporting Domain)

● Designed and developed scalable ETL pipelines using AWS Glue (PySpark) to process and transform high-volume sports data from diverse sources such as JSON, CSV, and complex text files.

● Built advanced pre-processing scripts in Glue to handle nested and semi-structured sports data formats, improving ingestion efficiency across live and historical datasets.

● Managed Redshift data models and created optimized database objects using DBeaver, supporting high-performance querying for sports analytics workloads.

● Leveraged StarRocks for high-speed MPP querying to support ESPN’s real-time analytics needs during live matches, reducing data access latency significantly.

● Migrated complex legacy SQL transformations from Redshift to StarRocks for enhanced scalability and parallel query performance.
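StarRocks speaks the MySQL wire protocol (frontend query port 9030), so queries like the migrated ones above can be issued from Python; a minimal sketch with pymysql, where host, credentials, and schema are placeholders:

```python
# Query a StarRocks cluster over its MySQL-compatible interface.
import pymysql

conn = pymysql.connect(
    host="starrocks-fe.example.internal",  # assumed FE host
    port=9030,
    user="analytics_ro",
    password="...",
    database="espn_live",  # hypothetical schema
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT player_id, SUM(points) AS pts "
        "FROM game_events WHERE game_date = CURDATE() "
        "GROUP BY player_id ORDER BY pts DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```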

● Developed real-time StarRocks data pipelines to process and store live game events and customer interaction data, resulting in a 30% reduction in query response time.

● Utilized AWS DataBrew for data profiling, cleansing, and standardization to improve the quality of incoming data feeds from sports data providers.

● Architected and optimized Redshift schemas for subject areas like live match telemetry, player stats, and fan engagement to support analytical use cases.

● Managed data ingestion workflows and lifecycle using AWS S3, implementing partitioning and tiered storage strategies for efficiency and cost control.

● Implemented AWS IAM policies and roles to control access and ensure secure interaction across Glue, Redshift, S3, and other services.

● Automated infrastructure deployment using AWS CloudFormation, enabling repeatable and consistent provisioning of data pipeline components.
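An illustrative boto3 deployment of a CloudFormation template as described; the stack name, template file, and parameters are assumptions:

```python
# Create a CloudFormation stack and block until provisioning finishes,
# so CI can fail fast on rollback.
import boto3

cf = boto3.client("cloudformation")
with open("data_pipeline.yaml") as f:  # assumed local template file
    template = f.read()

cf.create_stack(
    StackName="espn-data-pipeline",
    TemplateBody=template,
    Capabilities=["CAPABILITY_NAMED_IAM"],  # template creates IAM roles
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "prod"}],
)
cf.get_waiter("stack_create_complete").wait(StackName="espn-data-pipeline")
```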

● Configured and maintained AWS Glue Crawlers to discover and catalog metadata, supporting ESPN’s data governance and schema evolution strategy.

● Coordinated production-grade data deployments and change requests through ServiceNow, aligning with release schedules and ITIL processes.

● Set up and maintained a robust QA environment for data quality assurance, validating sports data across environments.

● Implemented automated data quality checks using the Great Expectations framework to ensure trust and consistency in all pipeline outputs.

● Developed efficient PySpark-based ingestion pipelines for REST APIs, integrating real-time data from internal ESPN systems and third-party sources.
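A sketch of the REST-to-Spark ingestion step; the endpoint, fields, and output path are hypothetical, and production code would add paging, retries, and auth:

```python
# Pull a JSON array from a REST endpoint and land it as Parquet via Spark.
import json
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rest-ingest").getOrCreate()

resp = requests.get("https://api.example.com/v1/live-events", timeout=30)
resp.raise_for_status()
records = resp.json()  # assume a JSON array of event objects

# Let Spark infer the schema from the raw JSON strings.
rdd = spark.sparkContext.parallelize([json.dumps(r) for r in records])
df = spark.read.json(rdd)
df.write.mode("append").parquet("s3://example-espn-lake/events/")
```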

● Applied PySpark transformations to cleanse and standardize raw sports data from APIs for downstream analytics and storage in Redshift and StarRocks.

● Conducted performance tuning of Glue and PySpark jobs to reduce latency in live data processing pipelines.

● Documented end-to-end technical workflows, ETL logic, and architectural diagrams to support ESPN’s engineering standards and onboarding processes.

Trini tech Consulting, Hyattsville, MD (on site) | May 2013 – Dec 2016
Linux & Unix Systems Administration (Department of Commerce)

• Build, install, configure, test, and deploy brand-new virtual and physical Solaris 10, RHEL 6, and CentOS 6.4 servers to the network; OS installation and configuration, standard and advanced (network installation, JumpStart, Kickstart).

• Configure Apache and MySQL on Solaris 10 for virtual and web hosting; building and hosting websites; configure web hosting company DNS settings; install and configure Samba for quick publishing using a third-party web page maker.

• User and security management

• Set up, configure, and troubleshoot TCP/IP, DHCP, DNS, NFS, CIFS, and Samba servers in a multiplatform LAN

• Shell scripting (bash) to schedule and automate processes, including full and incremental backups using tar, cpio, and ufsdump; migrate and enlarge file systems on Solaris 10.

• Manage swap configurations

• Perform multiplatform volume management using SVM, LVM, ZFS, and VERITAS Volume Manager; installed and configured a LAN-wide NAS (FreeNAS) used for creating LUNs and attaching them to Windows 2008 and Solaris 10 servers over iSCSI

• Build, configure, secure, and deploy Solaris 10 systems (global zone)

• Build, install, configure, secure, deploy, and migrate Solaris 10 local zones (sparse-root or whole-root)

• Troubleshoot Solaris 10 issues stemming from hardware, software, and configuration problems

• Virtualization (VMware ESXi 5/5.5 and Oracle VirtualBox)

• Set up a domain and Active Directory on a Windows 2008 server; install and configure a Samba server on Solaris 10 and map it to the Windows 2008 server.

• Responsible for data management using native Solaris utilities for archiving, compression, backup, and restore.

• Support system users and troubleshoot system issues; document solutions for future reference.

• Install and maintain the operating system and related software products.

• Participate in root-cause analysis of recurring issues and in system backup and security setup; provide 24x7 support in production, testing, and development environments.

• Installed and configured databases on Unix/Linux platforms

Certifications:

● Project Management Certificate April 2013

Education:

● Bachelor’s in Computer Science and Technology, July 1999 (Democratic Republic of the Congo)


