Data Processing Real Time

Location:
Austin, TX
Posted:
July 19, 2023

HAMZA BIN MOHAMMED AL JABRI

Data Engineer

Phone: +1-469-***-****

Email: adyebl@r.postjobfree.com

PROFESSIONAL SUMMARY

9+ years of professional IT experience, including over 5 years in AWS and 4 years in Data Warehousing.

Extensive experience deploying cloud-based applications using Amazon Web Services such as Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Route53, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB.

Worked on ETL migration services by developing and deploying AWS Lambda functions that build a serverless data pipeline whose output is registered in the Glue Catalog and queried from Athena.
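
A minimal sketch of this pattern, assuming an S3-event-triggered Lambda; the database, table, and key layout below are hypothetical placeholders, not the actual project names:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical catalog names used for illustration only.
DATABASE = "analytics_db"
TABLE = "clickstream_events"

def lambda_handler(event, context):
    """Register the S3 prefix from an incoming event as a new partition
    in the Glue Data Catalog so it becomes queryable from Athena."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]        # e.g. data/dt=2023-07-19/part-000.parquet
        dt = key.split("dt=")[1].split("/")[0]     # partition value taken from the key

        # Reuse the table's storage descriptor as a template for the new partition.
        table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]
        sd = table["StorageDescriptor"].copy()
        sd["Location"] = f"s3://{bucket}/data/dt={dt}/"

        glue.create_partition(
            DatabaseName=DATABASE,
            TableName=TABLE,
            PartitionInput={"Values": [dt], "StorageDescriptor": sd},
        )
    return {"status": "partition registered"}
```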

Proven expertise in delivering major software solutions for high-end clients, meeting business requirements such as big data processing, ingestion, analytics, and cloud migration from on-premises to AWS using EMR, S3, and DynamoDB.

Results-oriented and highly skilled professional with expertise in Snowflake, AWS, and Big Data technologies.

Experienced in utilizing AWS Glue for ETL workflows, enabling efficient data extraction, transformation, and loading.

Expertise in AWS S3 for scalable and cost-effective data storage and retrieval.

Skilled in utilizing AWS EMR for big data processing, including technologies like Hadoop, Spark, Hive, MapReduce, and PySpark.

Proficient in integrating AWS SNS and SQS for real-time event processing and messaging.

Utilized AWS CloudWatch to monitor and manage AWS resources, set alarms, and collect metrics.

Proficient in managing user access and permissions to AWS resources using IAM.

Designed and developed logical and physical data models that utilize concepts such as Star Schema, Snowflake Schema and Slowly Changing Dimensions.

Hands-on experience across the Hadoop ecosystem, including Big Data technologies such as HDFS, MapReduce, YARN, Apache Cassandra, HBase, Hive, Oozie, Impala, Pig, Zookeeper, Flume, Kafka, Sqoop, and Spark.

Built real-time data pipelines by developing Kafka producers and Spark Streaming consumer applications.
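
A minimal sketch of this producer/consumer pattern, using kafka-python and PySpark Structured Streaming; the broker address, topic, and schema are assumptions, and in practice the producer and streaming job would run as separate processes (the Kafka source also requires the spark-sql-kafka package):

```python
import json
from kafka import KafkaProducer
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

BROKER, TOPIC = "localhost:9092", "orders"   # assumed broker and topic

# Producer side: publish JSON events to Kafka.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"order_id": "1001", "amount": 42.5})
producer.flush()

# Consumer side: Structured Streaming job that parses and aggregates the events.
spark = SparkSession.builder.appName("kafka-stream").getOrCreate()
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", BROKER)
    .option("subscribe", TOPIC)
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.groupBy("order_id").sum("amount")
    .writeStream.outputMode("complete").format("console").start()
)
query.awaitTermination()
```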

Experienced in partitioning strategies and multi-cluster warehouses in Snowflake to ensure optimal query performance and scalability.

Skilled in designing roles and views and implementing performance tuning techniques to enhance Snowflake system performance.

Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
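
A minimal sketch of such a pipeline using the Snowflake Python connector; the account, stage, and table names are placeholders for illustration:

```python
import snowflake.connector

# Connection parameters are placeholders, not real credentials.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Load staged files into a raw table, then transform into a reporting table.
    cur.execute(
        "COPY INTO RAW_ORDERS FROM @ORDERS_STAGE "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    cur.execute("""
        INSERT INTO DAILY_ORDER_TOTALS (ORDER_DATE, TOTAL_AMOUNT)
        SELECT ORDER_DATE, SUM(AMOUNT)
        FROM RAW_ORDERS
        GROUP BY ORDER_DATE
    """)
    # Pull aggregated results back out of the warehouse for downstream use.
    for order_date, total in cur.execute(
        "SELECT * FROM DAILY_ORDER_TOTALS ORDER BY ORDER_DATE"
    ):
        print(order_date, total)
finally:
    conn.close()
```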

Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, the DataFrame API, Spark Streaming, and Pair RDDs; worked extensively with PySpark.

Proficient in utilizing virtual warehouses, caching, and Snowpipe for real-time data ingestion and processing in Snowflake.

Strong knowledge of Snowflake's time-travel feature for auditing and analyzing historical data.

Extensive experience in leveraging window functions, Snowflake arrays, regular expressions, and JSON parsing for advanced data analysis and manipulation.

Highly proficient in Snowflake scripting to automate ETL processes, data transformations, and data pipelines.

Worked with HDFS, Sqoop, PySpark, Hive, MapReduce, and HBase for big data processing and analytics.

Proficient in developing and optimizing Spark and Spark Streaming applications for real-time data processing and analytics.

Experienced in scheduling and workflow management using IBM Tivoli, Control-M, Oozie, and Airflow.

Hands-on experience handling semi-structured data in JSON format and unstructured (audio/video) data.

Good understanding of networking concepts, including VPCs, subnets, DNS, and gateways.

Strong database development skills in Teradata, Oracle, and SQL Server, including the development of stored procedures, triggers, and cursors.

Proficient in using SnowSQL for complex data manipulation tasks and developing efficient data pipelines.

Proficient in implementing CI/CD frameworks for data pipelines using tools like Jenkins, ensuring efficient automation and deployment.

Proficient in version control systems like Git, GitLab, and VSS for code repository management and collaboration.

TECHNICAL SKILLS

AWS Services

AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM.

Big Data Technologies

HDFS, Sqoop, PySpark, Hive, MapReduce, Spark, Spark Streaming, HBase.

Hadoop Distribution

Cloudera, Hortonworks.

Scheduling

IBM Tivoli, Control-M, Oozie, Airflow.

Languages

Java, SQL, PL/SQL, Python, HiveQL, Scala.

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Database

Teradata, Oracle, SQL Server.

Version Control

Git, GitHub, VSS, Jenkins.

Methodology

Agile, Scrum.

IDE & Build Tools, Design

Eclipse, Visual Studio.

EDUCATION

Master's in Computer and Information Science.

Bachelor's in Computer Science Engineering.

WORK EXPERIENCE

AWS Snowflake Data Engineer Sep 2022 – Present

County of Santa Clara, San Jose, CA.

Responsibilities:

Implemented and designed data ingestion and storage solutions using AWS S3, Redshift, and Glue.

Developed ETL workflows utilizing AWS Glue to extract, transform, and load data into Redshift from diverse sources.
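
A minimal sketch of such a Glue ETL job (PySpark script run by Glue); the catalog database, table, Redshift connection, and S3 temp path are placeholders, not the actual project resources:

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the source table from the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Transform: rename/cast columns to match the Redshift target schema.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "varchar"),
              ("amount", "double", "amount", "decimal(18,2)")],
)

# Load: write into Redshift through a catalog connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "public.orders", "database": "analytics"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",
)
job.commit()
```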

Integrated AWS SNS and SQS to enable real-time event processing and messaging.

Optimized application and service deployment through efficient DNS configurations and routing using AWS Route53.

Utilized AWS Athena for ad-hoc data analysis and querying on S3 data.

Employed AWS CloudWatch for resource monitoring, management, and metric collection.

Designed and implemented real-time data streaming solutions using AWS Kinesis.

Implemented Snowflake stages for efficient loading of data from various sources into Snowflake tables.

Created different types of tables in Snowflake, including transient, temporary, and permanent tables.

Tuned Snowflake warehouses for performance and cost efficiency by selecting appropriate sizes and configurations.

Developed complex SnowSQL queries for data extraction, transformation, and loading into Snowflake.

Implemented partitioning techniques in Snowflake to enhance query performance and data retrieval.

Configured multi-cluster warehouses in Snowflake to effectively handle high-concurrency workloads.

Defined roles and access privileges in Snowflake for data security and governance.

Implemented Snowflake caching mechanisms to improve query performance and reduce data transfer costs.

Utilized Snowpipe for real-time data ingestion into Snowflake, ensuring continuous data availability and automated loading processes.
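
A minimal sketch of setting up such auto-ingest loading, issuing the Snowpipe DDL through the Python connector; the stage, pipe, table, and storage integration names are placeholders:

```python
import snowflake.connector

# Connection parameters and object names are placeholders for illustration.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    database="ANALYTICS", schema="PUBLIC", warehouse="ETL_WH",
)
cur = conn.cursor()

# External stage over the S3 landing prefix (credentials via a storage integration).
cur.execute("""
    CREATE STAGE IF NOT EXISTS EVENTS_STAGE
    URL = 's3://my-landing-bucket/events/'
    STORAGE_INTEGRATION = S3_INT
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Snowpipe with AUTO_INGEST: S3 event notifications trigger continuous loading.
cur.execute("""
    CREATE PIPE IF NOT EXISTS EVENTS_PIPE AUTO_INGEST = TRUE AS
    COPY INTO RAW_EVENTS (PAYLOAD)
    FROM (SELECT $1 FROM @EVENTS_STAGE)
""")
conn.close()
```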

Designed and configured data processing and ETL pipeline workflows.

Leveraged Snowflake's time travel features for historical data tracking and restoration.

Utilized regular expressions in Snowflake for pattern matching and data extraction.

Developed Snowflake scripting solutions for automating data pipelines, ETL processes, and data transformations.

Implemented Workload Management (WLM) in Redshift to prioritize fast dashboard queries over longer-running ad hoc queries.
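
A rough sketch of applying a manual WLM configuration through boto3, assuming a separate queue routed by the "dashboard" query group; the parameter group name, queue sizing, and apply type are assumptions, not the actual cluster settings:

```python
import json
import boto3

redshift = boto3.client("redshift")

# Two manual WLM queues: a higher-concurrency queue for dashboard queries
# (routed via the "dashboard" query group) and a default queue for ad hoc work.
wlm_config = [
    {"query_group": ["dashboard"], "query_concurrency": 10, "memory_percent_to_use": 40},
    {"query_concurrency": 5, "memory_percent_to_use": 60},   # default queue
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-wlm",          # placeholder parameter group
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(wlm_config),
        "ApplyType": "static",                   # may require a cluster reboot to take effect
    }],
)
```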

Developed data processing pipelines using Hadoop technologies such as HDFS, Sqoop, Hive, MapReduce, and Spark.

Implemented Spark Streaming for real-time data processing and analytics.

Configured scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow.
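
A minimal illustrative Airflow DAG for this kind of job automation; the task names, schedule, and shell commands are placeholders rather than the actual pipeline:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",   # run nightly at 02:00
    default_args=default_args,
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract_to_s3.py")
    transform = BashOperator(task_id="transform",
                             bash_command="spark-submit transform_job.py")
    load = BashOperator(task_id="load", bash_command="python load_to_redshift.py")

    extract >> transform >> load   # linear dependency chain
```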

Designed and developed database solutions using Teradata, Oracle, and SQL Server.

Contributed to setting up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS.

Managed code repository and collaboration using Git, GitLab, and VSS.

Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL

Snowflake Engineer Sep 2020 – Sep 2022

Coca-Cola, Atlanta, GA.

Responsibilities:

Developed and optimized ETL workflows using AWS Glue to efficiently extract, transform, and load data from various sources into Redshift for streamlined data processing.

Implemented Lambda functions and AWS Glue to create on-demand tables on S3 files, improving flexibility and data processing capabilities.

Configured and fine-tuned Redshift clusters to ensure high-performance data processing and optimized querying.

Integrated AWS SNS and SQS to enable real-time event processing and efficient messaging within the system.

Implemented event-based pipelines using S3 Lambda trigger events for seamless data processing.
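
A minimal sketch of an S3-triggered Lambda in this style, here starting a Glue job for each newly created object; the Glue job name and argument keys are hypothetical:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events; kicks off a Glue job per new file."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="s3-to-redshift-etl",                  # placeholder job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
    return {"files_processed": len(records)}
```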

Designed and implemented data streaming solutions using AWS Kinesis for real-time data processing and analysis.

Utilized AWS Route53 to configure DNS and routing, facilitating efficient deployment of applications and services.

Implemented robust IAM policies and roles to ensure secure user access and proper permissions for AWS resources.

Developed and optimized data processing pipelines using various Hadoop ecosystem technologies, including HDFS, Sqoop, Hive, MapReduce, and Spark.

Implemented Spark Streaming for real-time data processing and advanced analytics capabilities.

Demonstrated expertise in scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow for executing data processing and ETL pipelines.

Created different types of Snowflake tables, such as transient, temporary, and permanent, to accommodate specific data storage and processing requirements.

Implemented advanced partitioning techniques in Snowflake to significantly enhance query performance and accelerate data retrieval.

Defined robust roles and access privileges within Snowflake to enforce strict data security and governance standards.

Implemented regular expressions in Snowflake for seamless pattern matching and data extraction tasks.

Developed and implemented Snowflake scripting solutions to automate critical data pipelines, ETL processes, and data transformations.

Designed and developed database solutions using Teradata, Oracle, and SQL Server, encompassing schema design, optimization, and various database components.

Proficient in utilizing Git, GitLab, and VSS for efficient code repository management and collaborative development processes.

Implemented a real-time streaming data pipeline using Kinesis and Spark Streaming, enabling continuous data processing and analysis.
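
A minimal sketch of the producer side of such a pipeline, publishing events to Kinesis with boto3 (the downstream Spark Streaming consumer would read from the same stream via the Kinesis connector); the stream name and event fields are assumptions:

```python
import json
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "orders-stream"   # assumed stream name

def publish(event: dict) -> None:
    """Write one event to the Kinesis stream; the partition key controls shard routing."""
    kinesis.put_record(
        StreamName=STREAM,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["order_id"],
    )

for i in range(10):
    publish({"order_id": str(1000 + i), "amount": 25.0, "ts": time.time()})
```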

Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL

Big Data Developer May 2018 – Aug 2020

Quotient, Mountain View, CA.

Responsibilities:

Implemented Lambda functions with Boto3 to reduce EC2 costs by deregistering unused AMIs across all application regions.
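
A minimal sketch of this cost-cleanup Lambda with Boto3; the "in use" check and lack of pagination are simplifications for illustration:

```python
import boto3

def lambda_handler(event, context):
    """Scan every region for self-owned AMIs not referenced by any instance
    and deregister them."""
    regions = [r["RegionName"] for r in boto3.client("ec2").describe_regions()["Regions"]]
    deregistered = []

    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)

        # AMIs currently referenced by instances in this region (no pagination handling here).
        in_use = {
            i["ImageId"]
            for res in ec2.describe_instances()["Reservations"]
            for i in res["Instances"]
        }

        for image in ec2.describe_images(Owners=["self"])["Images"]:
            if image["ImageId"] not in in_use:
                ec2.deregister_image(ImageId=image["ImageId"])
                deregistered.append((region, image["ImageId"]))

    return {"deregistered": deregistered}
```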

Regularly imported data from MySQL to HDFS using Sqoop for efficient data loading.

Utilized Apache Spark and Scala to perform aggregations on large data volumes and stored results in the Hive data warehouse for further analysis.
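
A minimal sketch of this aggregation-into-Hive pattern, shown here in PySpark for consistency with the other examples (the original work used Scala); table and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("sales-aggregation")
    .enableHiveSupport()            # allows writing managed tables into the Hive warehouse
    .getOrCreate()
)

sales = spark.table("raw.sales_transactions")

daily_totals = (
    sales.groupBy("store_id", F.to_date("txn_ts").alias("txn_date"))
    .agg(F.sum("amount").alias("total_sales"),
         F.countDistinct("customer_id").alias("unique_customers"))
)

# Persist the aggregated result as a Hive table for further analysis.
daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_store_sales")
```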

Proficient in JIRA for effective issue and project workflow management.

Extensive experience working with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera.

Efficiently loaded and transformed structured, semi-structured, and unstructured data sets.

Developed Hive queries to analyze data and meet specific business requirements.

Leveraged HBASE integration with Hive to build HBASE tables in the Analytics Zone.

Utilized Kafka and Spark Streaming for processing streaming data in specific use cases.

Developed data pipelines using Flume and Sqoop to ingest customer behavioral data into HDFS for analysis.

Implemented automated builds using Python and BOTO3 for quality control purposes.

Proficient in using big data analytic tools like Hive and MapReduce for Hadoop cluster analysis.

Designed and implemented data pipelines using Kafka, Spark, and Hive for efficient data ingestion, transformation, and analysis.

Utilized Oozie workflow engine for job scheduling within Hadoop.

Implemented CI/CD pipelines for seamless building and deployment of projects in the Hadoop environment.

Utilized PySpark and Spark SQL for fast and efficient data testing and processing in Spark.

Utilized Spark Streaming for batch processing of streaming data.

Wrote Hive queries and utilized Hive QL to replicate MapReduce functionalities for data analysis and processing.

Migrated data from Oracle RDBMS to Hadoop using Sqoop for enhanced data processing capabilities.

Utilized PySpark in Spark SQL for data analysis and processing tasks.

Developed custom scripts and tools using Oracle's PL/SQL language for automated data validation, cleansing, and transformation.

Leveraged Zookeeper for coordination, synchronization, and serialization within clusters.

Implemented data visualizations using Tableau.

Proficient in continuous integration using Jenkins for application development.

Utilized Git as a version control tool for efficient code repository management.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, Python, PySpark, Ambari, JIRA, Jenkins.

Hadoop Developer May 2016 – Apr 2018

UPS, Louisville, KY.

Responsibilities:

Utilized Spark-Scala (RDDs, DataFrames, Spark SQL) and Spark-Cassandra-Connector APIs extensively for various tasks, including data migration and generating business reports.
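
A minimal sketch of reading and writing Cassandra tables through the Spark-Cassandra connector's DataFrame API, shown in PySpark for consistency; the keyspace, tables, and host are placeholders, and the connector package must be on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("cassandra-migration")
    .config("spark.cassandra.connection.host", "cassandra-host")   # placeholder host
    .getOrCreate()
)

# Read a Cassandra table into a DataFrame through the connector.
shipments = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="logistics", table="shipments")
    .load()
)

# Example business report: delivered package counts per hub.
report = shipments.filter("status = 'DELIVERED'").groupBy("hub_id").count()

# Write the report back to another Cassandra table.
(report.write.format("org.apache.spark.sql.cassandra")
 .options(keyspace="logistics", table="hub_delivery_counts")
 .mode("append")
 .save())
```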

Developed real-time sales analytics using Spark Streaming.

Created an ETL framework using Sqoop, Pig, and Hive to facilitate frequent data ingestion from the source and enable easy data consumption.

Processed HDFS data and designed external tables using Hive, along with developing scripts for data ingestion and table maintenance.

Analyzed source data, performed efficient data type modifications, and generated ad-hoc reports using Excel sheets, flat files, and CSV files with PowerBI.

Implemented performance tuning techniques in Redshift, such as Distribution Keys, Sort Keys, and Partitioning.

Extracted data from various sources into HDFS using Sqoop.

Managed data importing, transformation, and loading into HDFS using Hive and MapReduce.

Extracted data from MySQL into HDFS using Sqoop.

Implemented automation for deployments using YAML scripts for streamlined builds and releases.

Proficient in utilizing Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop.

Implemented data classification algorithms using MapReduce design patterns.

Implemented optimization techniques like combiners, partitioning, and distributed cache to enhance MapReduce job performance.
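
A minimal Hadoop Streaming sketch of the combiner optimization: the same Python reduce script is reused as the combiner so partial sums are merged on the map side before the shuffle. The job name, paths, and word-count logic are illustrative stand-ins, not the actual classification job:

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper/reducer pair. Invoked roughly as (paths are placeholders):

  hadoop jar hadoop-streaming.jar \
      -mapper "wordcount.py map" -combiner "wordcount.py reduce" \
      -reducer "wordcount.py reduce" -input /data/in -output /data/out
"""
import sys

def map_phase():
    # Emit (word, 1) for every token on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reduce_phase():
    # Sum counts per key; valid as both combiner and reducer because
    # summation is associative and commutative.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if current is not None and word != current:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    map_phase() if sys.argv[1] == "map" else reduce_phase()
```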

Analyzed SQL scripts and designed solutions using PySpark.

Developed ETL jobs using Spark-Scala to migrate data from Oracle to new MySQL tables.

Maintained source code using Git and GitHub repositories.

Utilized Jenkins for continuous integration purposes.

Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL, Jenkins.

Data Warehouse Developer Feb 2014 – Apr 2016

Aetna Inc., Hartford, CT.

Responsibilities:

Implemented AWS Athena for efficient ad-hoc data analysis and querying on data stored in AWS S3.
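
A minimal sketch of running such an ad-hoc Athena query from Python with boto3; the database, table, query, and results bucket are placeholders, and the polling loop is simplified:

```python
import time
import boto3

athena = boto3.client("athena")

def run_query(sql: str) -> list:
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "claims_db"},            # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query finishes (no backoff or detailed error handling).
    while True:
        state = athena.get_query_execution(
            QueryExecutionId=qid
        )["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

rows = run_query(
    "SELECT claim_type, COUNT(*) FROM claims WHERE year = 2015 GROUP BY claim_type"
)
```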

Developed automated processes by creating jobs, SQL Mail Agent, alerts, and scheduling DTS/SSIS packages.

Managed and updated Erwin models for logical/physical data modelling of Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB to meet user requirements.

Utilized TFS for source control and tracking environment-specific script deployments.

Published current data models from Erwin to PDF format on SharePoint for easy user access.

Administered and managed databases including Consolidated Data Store, Reference Database, and Actuarial Data Mart.

Developed and maintained triggers, stored procedures, and functions using Transact-SQL (T-SQL) and ensured the integrity of physical database structures.

Deployed scripts in different environments based on Configuration Management and Playbook requirements.

Implemented effective table/index associations, optimized query performance, and conducted performance-tuning activities.

Managed defect tracking and closure using Quality Centre for streamlined issue management.

Maintained users, roles, and permissions within the SQL Server environment.

Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, PerformancePoint Server 2007, Oracle 10g, Visual Studio 2010.


