
Senior Big Data Engineer

Location:
Atlanta, GA, 30339
Posted:
October 18, 2023


Resume:

DEMETRIC THOMPSON

Mobile: (***) *** - ****

Email: ad0fev@r.postjobfree.com

PROFILE SUMMARY

•Results-oriented Data Engineer with 10 years of experience in Big Data, cloud, and related technologies.

•Extensive experience in leveraging GCP services for data engineering, including BigQuery, Dataflow, Dataprep, and Pub/Sub.

•Proficient in setting up and managing GCP data pipelines using tools like Cloud Composer and Cloud Dataflow.

•Demonstrated expertise in building and deploying applications on Google Kubernetes Engine (GKE).

•Skilled in implementing security and compliance measures on GCP, ensuring data privacy and protection.

•Successful track record of optimizing cost and resource utilization on GCP cloud infrastructure.

•Proficient in AWS services for data processing, including Amazon EMR, Redshift, and Glue.

•Experienced in architecting scalable and cost-effective solutions on the AWS cloud platform.

•Skilled in configuring and managing AWS Lambda functions for serverless computing.

•Skilled in setting up AWS Kinesis streams to collect and analyze real-time data, enabling data-driven decision-making and improving system responsiveness.

•Experienced in leveraging AWS DynamoDB for building highly scalable and low-latency NoSQL databases, ensuring efficient data storage and retrieval in dynamic applications.

•Profound expertise in managing and optimizing Amazon Redshift data warehouses, empowering organizations to make data-driven decisions by providing high-performance analytics and data warehousing capabilities.

•Adept at integrating AWS services into CI/CD pipelines for seamless automation, ensuring continuous integration, continuous delivery, and continuous deployment of applications with minimal manual intervention.

•Competent in setting up and securing virtual private cloud (VPC) environments on AWS.

•Knowledgeable about Azure services for data storage and analytics, such as Azure Data Lake Storage and Azure Databricks.

•Proficient in managing virtual machines (VMs) on the Azure cloud.

•Expertise in managing on-premises data infrastructure, including data warehouses and databases.

•Familiar with AWS DevOps practices for continuous integration and deployment.

•Leveraged version control systems like Git to manage DBT projects, ensuring that changes to data models and transformations were tracked and documented.

•Skilled in performance tuning and optimization of on-premises data systems.

•Proficient in data migration strategies between on-premises and cloud environments.

•Strong troubleshooting skills in on-premises data environments.

•Proven experience in maintaining high availability and disaster recovery solutions on-premises.

•Proficient in implementing Continuous Integration and Continuous Deployment (CI/CD) pipelines using tools like Jenkins and GitLab CI/CD.

•Experienced in automated testing processes, including unit testing, integration testing, and regression testing.

•Effective in gathering and analyzing project requirements, ensuring alignment with business objectives.

•Skilled in project management methodologies, contributing to the success of Agile projects through data-driven analytics and collaboration.

CERTIFICATIONS

•AWS Certified Cloud Practitioner Certification

•Google Data Analytics Professional Certificate

•Google Advanced Data Analytics Professional Certificate

•CompTIA A+ Certification (In progress)

•Google Cloud Professional

CORE COMPETENCIES

•Big Data

•Data ETL

•Hadoop Development

•ML techniques

•Predictive Modeling

•Power BI

•R/Python/Tableau

•Data Analytics/Visualization

•Java, Python and SQL Queries

•Leadership Skills

TECHNICAL SKILLS

Languages & Scripting: Spark, Python, Java, Scala, Hive, Kafka, SQL

Cloud Platforms: AWS, GCP, Azure

Python Packages: NumPy, TensorFlow, Pandas, Scikit-Learn, Matplotlib, Seaborn

Databases: Cassandra, HBase, Redshift, DynamoDB, MongoDB, SQL, MySQL, Oracle, RDBMS, Amazon RDS

Data Ingestion & Streaming: AWS Kinesis, AWS IoT Core, Azure Data Lake, Google Pub/Sub

Data Analysis: Data Modeling, Statistical Analysis, Sentiment Analysis, Forecasting, Predictive Modeling

Project Management: Jira, Kanban

Data Warehousing & Analytics: Redshift, Snowflake, BigQuery, Teradata, Oracle, Azure Synapse Analytics

Data Integration & ETL: AWS Glue, AWS EMR, AWS DMS, Apache NiFi, Spark, Azure Data Factory, Databricks, Azure HDInsight, Google Cloud Dataflow, Google Cloud Dataproc

Serverless Computing: AWS Lambda, Fargate, Batch, GCP Data Fusion, Cloud Functions, GKE, Step Functions

Data Governance & Security: AWS IAM, KMS, Secrets Manager, GCP IAM, Security Command Center

CI/CD: AWS CodePipeline, CodeBuild, CodeDeploy, Azure DevOps, Jenkins, Terraform, CloudFormation, Docker, Kubernetes

Cluster Management: Cloudera Manager, Apache Ambari

Data Visualization: Power BI, Tableau, Amazon QuickSight

Search & Indexing: Elasticsearch, Kibana

WORK EXPERIENCE

Sr. Big Data Engineer

Search Discovery, Atlanta, Georgia, Jul’21-Present

•Designed and implemented data architecture solutions closely aligned with business requirements, leveraging a wide array of Google Cloud Platform (GCP) services, including Google Cloud Dataflow, Pub/Sub, BigQuery, Bigtable, DataProc, IAM, and more, to optimize data handling.

•Utilized GCP services such as Google Cloud Dataflow, Pub/Sub, BigQuery, Bigtable, and DataProc, along with expertise in Hadoop, Spark, Hive, Sqoop, and related components, to develop and fine-tune data processing workflows tailored for GCP's cloud environment.

•Developed and maintained scalable data pipelines on GCP, ensuring efficient data processing, elasticity, and cost-effectiveness through services like Google Cloud Dataflow and DataProc.

•Orchestrated data storage strategies using Google Cloud Storage (GCS) and Bigtable, enhancing data accessibility, reliability, and cost-effectiveness while adhering to GCP's best practices.

•Constructed new API integrations within GCP, leveraging Pub/Sub for data ingestion and other GCP services, to accommodate the growing volume and complexity of data, ensuring seamless data flow.

•Designed and implemented data systems and pipelines within GCP, harnessing the scalability of GCP resources and optimizing performance through services like Dataflow, BigQuery, and Bigtable.

•Conducted intricate data analysis using GCP's data analytics services, particularly BigQuery, providing valuable insights for informed decision-making within the GCP ecosystem.

•Integrated data from diverse sources into a unified data platform hosted on GCP, designed for streamlined analysis and reporting using BigQuery and Bigtable.

•Developed ETL processes within GCP using Dataflow and DataProc to efficiently process and transform large volumes of data, ensuring data quality, accuracy, and compliance with GCP best practices.

•Continuously monitored and optimized data processing jobs and workflows hosted on GCP for improved performance, scalability, and cost-efficiency using services like Dataflow and DataProc.

•Implemented robust data security measures within GCP, including IAM, to protect sensitive information, ensuring compliance with data privacy regulations and GCP security standards.

•Established and enforced data governance practices within GCP, maintaining data quality, consistency, and integrity throughout the platform while utilizing BigQuery for data analysis and reporting.

•Collaborated closely with data scientists and analysts, providing clean, well-structured data hosted on GCP to support advanced analytics and machine learning initiatives powered by BigQuery and Dataflow.

•Leveraged GCP for data storage, processing, and analytics, optimizing cost and performance while fully harnessing the capabilities of GCP services like BigQuery, Dataflow, and Bigtable.

•Implemented real-time data processing and streaming technologies within GCP, enabled by Pub/Sub and Dataflow, to facilitate near real-time data-driven decision-making (a representative pipeline sketch follows at the end of this section).

•Acted as a mentor to junior team members, offering technical guidance on GCP services and ensuring they stayed updated on emerging trends and best practices within the GCP ecosystem.

•Maintained comprehensive documentation of data pipelines, processes, and best practices specifically tailored for GCP, facilitating knowledge sharing and support within the GCP environment.

•Addressed complex data-related challenges and effectively troubleshot issues during data processing and analysis, leveraging GCP's extensive set of tools and methodologies.

•Collaborated closely with stakeholders within the GCP environment to thoroughly understand data requirements and delivered solutions that precisely met their needs using GCP services.

•Stayed updated with advancements in big data technologies, tools, and methodologies, with a particular focus on GCP's evolving ecosystem.

•Managed and prioritized multiple data engineering projects within the GCP environment, ensuring their timely completion and cost-effectiveness.

•Demonstrated the ability to effectively communicate technical concepts and findings to both technical and non-technical stakeholders within the context of GCP, facilitating data-driven decision-making processes.
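
Illustrative sketch (hypothetical, not code from this role): a minimal Apache Beam pipeline in Python of the kind described above, reading messages from Pub/Sub and streaming them into BigQuery when run on Dataflow. The project, region, topic, table, and schema names are placeholders, and the production pipelines involved additional transforms and configuration.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Placeholder project/region; streaming mode is required for Pub/Sub reads.
    options = PipelineOptions(streaming=True, project="example-project", region="us-east1")

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/example-project/topics/events")
            | "ParseJson" >> beam.Map(lambda message: json.loads(message.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()

When submitted to Dataflow, a pipeline like this would additionally be launched with a DataflowRunner option and a temp location for staging.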

Sr. AWS Data Engineer

eBryIt Inc., Kennesaw, Georgia, Jan’20-Jul’21

•Designed and managed data storage solutions utilizing Amazon S3 for cost-effective and scalable storage of structured and unstructured data.

•Leveraged AWS Glue Data Catalog for metadata management and data discovery, ensuring efficient data governance.

•Implemented Amazon EBS for block-level storage, supporting reliable data storage for EC2 instances.

•Utilized Amazon EFS for scalable and shared file storage, facilitating collaboration and data access.

•Engineered data ingestion and real-time streaming solutions using Amazon Kinesis services for rapid data processing.

•Designed and implemented data ingestion pipelines from IoT devices using AWS IoT Core, ensuring secure and reliable data capture.

•Orchestrated data workflows with AWS Data Pipeline to automate data movement and transformations, optimizing data processing efficiency.

•Leveraged Amazon Redshift for high-performance data warehousing, enabling efficient data storage, querying, and analytics.

•Authored and executed queries in Amazon Athena to query data stored in Amazon S3, facilitating data profiling and ad-hoc analysis (a representative query sketch follows at the end of this section).

•Designed DBT models to support incremental data loads, reducing processing time and optimizing data pipelines.

•Created interactive and insightful dashboards with Amazon QuickSight for visualizing data analytics results.

•Utilized Amazon EMR for processing large-scale data, implementing distributed data processing with Hadoop and Spark.

•Managed and optimized relational databases using Amazon RDS, ensuring data availability and reliability.

•Implemented NoSQL solutions with Amazon DynamoDB for flexible and high-performance database services.

•Leveraged Amazon Aurora for high-availability, scalable relational database deployments, ensuring low-latency data access.

•Utilized Amazon DocumentDB for NoSQL document database solutions, supporting complex data models.

•Managed and stored time-series data efficiently with Amazon Timestream, enabling real-time data analytics.

•Designed and executed ETL processes using AWS Glue, automating data transformations and enhancing data quality.

•Utilized AWS DataSync for data migration and synchronization, ensuring seamless data transfers.

•Conducted database migrations efficiently with AWS DMS, minimizing downtime and ensuring data consistency.

•Orchestrated workflows in AWS Step Functions to automate and streamline ETL pipelines, integrating various AWS services seamlessly.

•Developed and deployed serverless functions with AWS Lambda for event-driven data processing and automation.

•Orchestrated container deployments using AWS Fargate, ensuring scalability and resource optimization.

•Managed batch processing workloads with AWS Batch, optimizing resource allocation and job scheduling.

•Implemented IAM policies for granular access control, ensuring data security and compliance.

•Utilized AWS KMS for data encryption, safeguarding sensitive information.

•Managed and rotated secrets and credentials securely with AWS Secrets Manager.

•Enhanced data security and compliance with AWS Macie for data discovery and classification.

•Implemented automated CI/CD pipelines with AWS CodePipeline, ensuring efficient code deployments and continuous integration.

•Utilized AWS CodeBuild for automated source code compilation and building.

•Automated code deployments to various compute services with AWS CodeDeploy.

•Managed source code repositories and version control using AWS CodeCommit.

•Proven track record of delivering scalable data solutions using Snowflake, ensuring data-driven insights.

•Extensive experience in cloud-based data warehousing, with Snowflake expertise driving operational efficiency.

•Adept at maximizing Snowflake's performance through efficient data modeling and query optimization.

•Strong grasp of Snowflake's role in modern data ecosystems, enabling streamlined data analytics workflows.

•Effective data management and analysis within Snowflake, contributing to data-driven decision-making.
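
Illustrative sketch (hypothetical, not code from this role): running an ad-hoc Athena query over data in S3 with boto3, in the spirit of the analysis work described above. The region, database, query, and results bucket are placeholders.

import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = "SELECT status, COUNT(*) AS orders FROM sales_db.orders GROUP BY status"


def run_query():
    # Start the query and note its execution id.
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


if __name__ == "__main__":
    for row in run_query():
        print([col.get("VarCharValue") for col in row["Data"]])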

Azure Data Engineer

Ameriprise Financial, Minneapolis, Minnesota, Oct’18-Jan’20

•Engineered a resilient data warehouse architecture on Azure Synapse Analytics and executed complex data analysis queries within the Azure environment.

•Collaborated in the development and integration of a data lake on Azure Data Lake Storage, enhancing its compatibility with various applications and development projects.

•Utilized Python to create Azure Functions, automating tasks and improving operational efficiency within the Azure cloud environment (a representative function sketch follows at the end of this section).

•Implemented Azure HDInsight for processing Big Data across Hadoop Clusters, leveraging virtual servers on Azure Virtual Machines (VMs) and Azure Blob Storage.

•Created Spark jobs that seamlessly executed in HDInsight clusters using Azure Notebooks, streamlining data processing.

•Developed Spark programs in Python to run efficiently within the HDInsight clusters, optimizing data processing capabilities.

•Successfully deployed the ELK (Elasticsearch, Logstash, Kibana) stack in Azure, facilitating the collection and analysis of website logs.

•Implemented comprehensive unit tests for all code using various frameworks like PyTest, ensuring code quality and reliability.

•Architected serverless designs using Azure API Management, Azure Functions, Azure Storage (Blob), and Azure Cosmos DB, achieving optimized performance with auto-scaling capabilities.

•Designed schemas, cleaned input data, processed records, formulated queries, and generated output data using Azure Synapse Analytics, facilitating data management and analysis.

•Efficiently populated database tables through Azure Stream Analytics and Azure Synapse Analytics, enhancing data warehousing capabilities.

•Developed User Defined Functions (UDF) using Scala to automate specific business logic within the applications, improving efficiency.

•Designed Azure Data Factory pipelines to ingest, process, and store data while interacting seamlessly with different Azure services.

•Executed Hadoop/Spark jobs on Azure HDInsight using programs with data stored in Azure Blob Storage, enabling scalable data processing.

•Developed Azure Resource Manager (ARM) templates to create a custom infrastructure for pipelines, optimizing resource management.

•Implemented Azure Active Directory (Azure AD) user roles, instance profiles, and policies to authenticate users and control access, ensuring data security and compliance.

•Leveraged Azure Data Lake Storage for data lake architecture, ensuring data availability and scalability.

•Developed Azure Functions for serverless automation and task execution.

•Utilized Azure HDInsight for big data processing and analytics.

•Designed data integration workflows with Azure Data Factory, enabling data movement and transformations.

•Implemented Azure AD for identity and access management, similar to AWS IAM, to ensure secure authentication and authorization.

•Proficient in DBT (Data Build Tool) for transforming and modeling data, optimizing analytics pipelines.

•Extensive experience using DBT to streamline data transformation workflows and enhance data quality.

•Skilled in leveraging DBT to create structured data models, enabling more effective data analysis.

•Expertise in implementing DBT for version-controlled, modular data transformations, ensuring scalability.

•Strong understanding of DBT's role in modern data stack architecture, facilitating efficient data processing.

•Effective use of DBT for data transformation, contributing to data-driven decision-making and insights generation.
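
Illustrative sketch (hypothetical, not code from this role): a timer-triggered Azure Function in Python of the kind referenced above. The schedule itself lives in the accompanying function.json binding (omitted here), and the task body is a placeholder for the automation actually performed.

# __init__.py of a timer-triggered Azure Function (Python v1 programming model).
import datetime
import logging

import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    utc_now = datetime.datetime.utcnow().isoformat()

    if mytimer.past_due:
        logging.warning("Timer is running later than scheduled.")

    # Placeholder for the automated task, e.g. refreshing a staging table
    # or moving files between storage accounts.
    logging.info("Scheduled maintenance task executed at %s", utc_now)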

Sr Data Engineer

Eli Lilly, Indianapolis, Indiana, Jun’15-Oct’18

•Designed HiveQL and SQL queries to retrieve data from the data warehouse and create views for end users to consume.

•Provided connections from Business Intelligence tools such as Tableau and Power BI to the tables in the data warehouse.

•Utilized DBT as a powerful data transformation framework to transform and model data in a structured and repeatable manner.

•Cleaned up the input data, specified the schema, processed the records, wrote UDFs, and generated the output data using Hive.

•Designed and maintained Snowflake schemas, including star and snowflake schemas, to optimize query performance and facilitate easy data retrieval.

•Leveraged Snowflake's elastic scalability to handle growing data workloads, allowing for seamless expansion as data volumes increased.

•Developed a Spark SQL script to manage various data sets and verified its performance over the jobs using the Spark UI.

•Used Spark to filter and format the data before designing the sink to store the data in the Hive data warehouse (a representative job sketch follows at the end of this section).

•Created Hive tables to store data in different formats coming from different data sources.

•Maintained the Hive metastore tables to store the metadata of the tables in the data warehouse.

•Developed data pipelines using Kafka Connect to seamlessly integrate data from different sources into Kafka topics, enabling real-time data flow.

•Automated the ingestion pipeline using bash scripts and Cron jobs to perform the ETL pipeline daily.

•Utilized Kafka Streams for real-time data transformation and enrichment, allowing for data processing directly within the Kafka ecosystem.

•Imported data from the local file system and RDBMS sources into HDFS using Sqoop, and developed a workflow using shell scripts to automate the tasks of loading the data into HDFS.

•Evaluated various data processing techniques available in Hadoop from various perspectives to detect aberrations in data.

•Implemented a streaming job with Apache Kafka to ingest data from a REST API.

•Utilized gradient-boosted trees and random forests to create a benchmark for potential accuracy.
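
Illustrative sketch (hypothetical, not code from this role): a PySpark job in the shape described above, filtering and formatting incoming records before appending them to a Hive table. The paths, column names, and table name are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets saveAsTable write into the Hive data warehouse.
spark = (
    SparkSession.builder
    .appName("orders-clean-load")
    .enableHiveSupport()
    .getOrCreate()
)

raw = spark.read.option("header", "true").csv("/data/incoming/orders/*.csv")

cleaned = (
    raw
    .filter(F.col("order_id").isNotNull())                  # drop malformed rows
    .withColumn("order_ts", F.to_timestamp("order_ts"))     # normalize timestamps
    .withColumn("amount", F.col("amount").cast("double"))   # enforce numeric type
)

# Append the formatted records to the warehouse table.
cleaned.write.mode("append").saveAsTable("warehouse.orders_clean")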

Big Data Engineer

AutoNation, Fort Lauderdale, Florida, Oct’13-Jun’15

•Played a key role in extracting data from diverse on-premises databases, orchestrating Oozie workflows for daily task execution. Ensured timely and automated data operations.

•Architected a streamlined Kafka broker solution within the on-premises environment, enhancing data stream handling capabilities.

•Collected data from various sources using REST APIs, establishing secure HTTPS connections with client servers. Efficiently retrieved data through GET requests and fed it into a Kafka producer, optimizing data retrieval processes (a representative producer sketch follows at the end of this section).

•Facilitated data import from web services into on-premises HDFS and performed data transformations using Spark, optimizing data handling and analysis.

•Executed Hadoop/Spark jobs within the on-premises environment, utilizing custom programs and data stored locally, ensuring scalable data processing.

•Developed a PySpark application as an ETL job to extract data from various on-premises file system sources, apply necessary transformations, and efficiently write it to a NoSQL database (Cassandra), enhancing data processing capabilities.

•Leveraged Spark SQL to create and populate HBase data warehouses within the on-premises environment, enhancing data storage and retrieval capabilities.

•Employed Snowflake's query optimization features to fine-tune SQL queries, improving overall data retrieval and analytics performance.

•Installed and configured a Kafka cluster on-premises while actively monitoring its performance and ensuring seamless operations.

•Implemented Apache Kafka as a real-time data streaming platform to ingest, process, and distribute data across various applications and systems.

•Utilized the Pandas library and Spark in Python for comprehensive data cleansing, validation, processing, and in-depth analysis within the on-premises environment, ensuring data quality and accuracy.

•Developed Hive external tables and devised data models within Apache Hive, streamlining data organization and retrieval processes.

•Actively participated in ETL, Data Integration, and Migration efforts from AWS to GCP, ensuring seamless data operations within the GCP ecosystem.

•Created multiple Spark Streaming and batch Spark jobs using Python on GCP, contributing to efficient data processing and analytics.

•Implemented monitoring and alerting systems to track Kafka cluster health, performance, and data ingestion rates, enabling proactive issue resolution.

•Implemented advanced feature engineering procedures for the data science team, harnessing the in-memory computing capabilities of Apache Spark, all programmed in Python, to enhance data analysis and insights within the GCP environment.

•Implemented Rack Awareness within the GCP production environment, optimizing resource allocation and data redundancy.

•Collaborated closely with Spark Context, Spark-SQL, Data Frames, and Pair RDDs within the GCP environment, ensuring efficient data handling.

•Documented project requirements, including the implementation of code using Spark, Amazon DynamoDB, Redshift, and Elasticsearch within GCP. Ensured clear project understanding and communication.

•Managed data imports and exports into HDFS and Hive using Sqoop within the GCP environment, optimizing data transfer processes.
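
Illustrative sketch (hypothetical, not code from this role): fetching records from a REST endpoint over HTTPS and publishing them to a Kafka topic with kafka-python, as in the ingestion bullets above. The endpoint, brokers, and topic are placeholders, and the production pipeline also handled authentication, paging, and retries.

import json

import requests
from kafka import KafkaProducer

API_URL = "https://api.example.com/v1/vehicles"   # placeholder endpoint
TOPIC = "vehicle-events"                          # placeholder topic

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda value: json.dumps(value).encode("utf-8"),
)

response = requests.get(API_URL, timeout=30)
response.raise_for_status()

for record in response.json():
    producer.send(TOPIC, value=record)   # publish each record to the topic

producer.flush()   # block until all buffered records are delivered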

PROJECTS

•Automate Crypto API Requests (Python)

•Nashville Housing Data Cleaning in SQL Server

•Pull Data from the Lever API and Upload It into BigQuery (Python); a minimal sketch follows this list

•Created Staging Models in BigQuery and Documentation Models Using DBT

•Data Exploration of COVID Data
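
Illustrative sketch of the Lever-to-BigQuery project above (hypothetical details): pull records from the API with requests and load them into a table with the google-cloud-bigquery client. The endpoint path, credentials, and table ID shown here are placeholders.

import requests
from google.cloud import bigquery

LEVER_URL = "https://api.lever.co/v1/opportunities"        # assumed endpoint
API_KEY = "YOUR_LEVER_API_KEY"                             # placeholder credential
TABLE_ID = "example-project.recruiting.opportunities_raw"  # placeholder table


def pull_and_load():
    # Basic auth with the API key as the username (assumed auth scheme).
    response = requests.get(LEVER_URL, auth=(API_KEY, ""), timeout=60)
    response.raise_for_status()
    rows = response.json().get("data", [])

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,   # let BigQuery infer the schema from the JSON payload
    )
    load_job = client.load_table_from_json(rows, TABLE_ID, job_config=job_config)
    load_job.result()   # wait for the load job to finish
    print(f"Loaded {load_job.output_rows} rows into {TABLE_ID}")


if __name__ == "__main__":
    pull_and_load()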

ACADEMIC DETAILS

•Bachelor of Business Administration, Computer Information Systems with a concentration in Data Analytics from J. Mack Robinson College of Business, Georgia State University, Atlanta, Georgia

•Bachelor of Business Administration, Finance from J. Mack Robinson College of Business, Georgia State University, Atlanta, Georgia


