Osei Banahene
****.**********@*****.*** 571-***-**** Triangle, VA
https://www.linkedin.com/in/osei-banahene-64b25a21
SUMMARY
Senior Data Engineer and Machine Learning Engineer with over 8 years of experience designing, implementing, and deploying scalable ML models and managing data solutions across AWS, GCP, and Azure. Extensive expertise in data warehouse engineering, cloud migration projects, and ETL/ELT pipelines. Skilled in leveraging tools such as GitLab, Terraform, Spark, Python, Spark ML, and TensorFlow to drive ML model development, big data transformation, and integration across multi-cloud environments.
TECHNICAL SKILLS
Cloud Platforms: Google Cloud Platform (GCP), AWS, Azure
GCP Tools: BigQuery, Cloud Composer, Cloud Storage, Dataflow, Pub/Sub, Data Catalog
ML Techniques: Supervised & Unsupervised Learning, CNNs, RNNs, Transformers
Databases/Warehousing: Teradata, BigQuery, PostgreSQL, SQL Server
Languages & Frameworks: Python, SQL, Spark, Spark ML, TensorFlow, scikit-learn, Bash
ETL/Orchestration: Apache Airflow, Cloud Composer, Dataflow, Informatica (basic)
Tools & Platforms: Git, Terraform, Jupyter, Docker
Data Domains: Finance, Healthcare, Vendor Data, Structured & Semi-Structured Data
WORK EXPERIENCE
Federal Reserve Bank Richmond, VA
Data Engineer Jan 2023 - Present
•Design and develop security frameworks for fine-grained access control on AWS services, leveraging AWS Lambda, S3, and DynamoDB.
•Deploy and configure infrastructure as code using Terraform and Ansible; manage CI/CD pipelines through GitLab, Jenkins, and AWS tools for data pipeline automation.
•Execute data migrations and transformations, including SQL Server to AWS PostgreSQL with SCT and CDC, and optimize big data processing with Databricks, achieving a 40% improvement in processing times.
•Develop and test ETL processes utilizing Informatica, Control-M, and SQL Server stored procedures, ensuring data integrity and efficiency in data movement and reporting tasks.
•Lead the migration of on-premises SQL Server databases to AWS cloud, ensuring seamless transition and optimal performance on AWS infrastructure.
•Architect and deploy data pipelines using AWS tools such as Glue, Lambda, and DMS to automate ETL processes and data integration.
•Manage data storage and retrieval solutions using S3, DynamoDB, and Redshift, enhancing data accessibility and processing efficiency.
•Implement infrastructure as code with Terraform and manage CI/CD pipelines in GitLab to streamline deployments and improve DevOps workflows.
•Migrated 500+ GB of critical data from on-premises systems to AWS, ensuring high data availability.
•Designed dashboards and reports using SAP BusinessObjects and Power BI to provide real-time insights to stakeholders.
•Reduced data processing times by 30% through optimization of ETL processes in AWS Glue.
•Utilized the Boto3 API to load historical data and used CloudWatch Logs to monitor and capture tag changes.
•Implemented real-time fraud detection ML models, reducing false positives by 40%.
Brightspeed Charlotte, NC
Senior Data Engineer/Spark Developer Jan 2022 - Jan 2023
•Implemented a comprehensive ETL process, utilizing Python and Apache Airflow to automate data flows and optimize the integration of diverse data sources into a centralized BigQuery data warehouse.
•Enhanced data quality and integrity by designing robust data schemas, achieving a 10% improvement in data validation and accuracy metrics.
•Implemented optimized Spark jobs that improved data processing speed by 40%.
•Processed extensive data sets in Hadoop and Spark, enabling high-performance data transformation and analytics.
•Designed efficient data pipelines that reduced operational costs by leveraging GCP storage solutions.
•Processed large-scale datasets using PySpark, facilitating both real-time streaming and efficient batch data handling to provide actionable insights to stakeholders.
•Managed ETL processes on GCP, utilizing BigQuery, Dataflow, and Dataproc to support large-scale data analysis and reporting.
•Implemented GCP-native tools for monitoring, logging, and alerting to support data pipeline stability.
CME Group Chicago, IL
Data Engineer Jan 2020 - Jan 2022
•Facilitated data migration from AWS to GCP, focusing on maintaining data integrity and consistency across platforms.
•Engineered and automated deployment processes via Terraform and Ansible, and designed big data solutions utilizing Hadoop, MapReduce, Hive, Spark, and Databricks to meet specific business requirements.
•Integrated Power BI for data visualization, creating dynamic dashboards to support business insights.
•Developed real-time data streaming applications with Flink and Spark, ensuring minimal downtime during migration.
•Implemented CI/CD pipelines using Jenkins for streamlined code integration and deployment, and leveraged Docker for containerization and Kubernetes for orchestration.
•Executed a successful cross-cloud migration, transferring over 1TB of data from AWS to GCP.
•Processed and stored large data sets in S3 and BigQuery, enabling fast and reliable data retrieval.
•Optimized existing ML pipelines, reducing training time by 50%.
•Developed a predictive maintenance model using Featuretools, resulting in a 22% decrease in unplanned downtime for the company’s cloud infrastructure.
Collaborative Imaging Plano, TX
Data Engineer Jan 2017 - Jan 2020
•Designed and executed ELT pipelines on Azure, leveraging tools like Snowflake, dbt, and Blob Storage for data transformation and storage.
•Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
•Created and managed YAML files to configure CI/CD pipelines, automating the deployment of data workflows.
•Developed data marts and configured Tableau for data visualization to deliver actionable insights.
•Enhanced Snowflake data warehouse performance by redesigning tables and views, and integrated Tableau for advanced reporting capabilities.
•Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
•Developed streamlined ELT pipelines that reduced data processing time by 25%.
•Orchestrated automated workflows using Azure Logic Apps and developed ETL processes in Azure Data Factory to handle large data sets with thorough debugging and data validation.
•Leveraged Spark technologies, including Scala and PySpark, for efficient data extraction, transformation, and aggregation, coupled with performance tuning for optimized processing.
•Troubleshot and resolved pipeline errors, improving pipeline stability and reducing downtime.
EDUCATION
Kwame Nkrumah University of Science and Technology Ghana
MBA, Logistics and Supply Chain Management Apr 2012
CERTIFICATIONS
GCP Professional Data Engineer
AWS Certified Solutions Architect – Associate
CompTIA Network+
CompTIA Security+
SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Sqoop
Spark Ecosystem: SparkSQL, Spark Streaming, Spark Machine Learning
Operating Systems: Windows XP, Windows Vista, Windows 7, Windows 10, Windows Server 2003, Windows Server 2008, Linux
Databases: MS SQL Server, Teradata, MySQL, Oracle, Db2, PostgreSQL, Snowflake
ETL/Other Tools: Informatica PowerCenter, Cloudera Base, SAS Enterprise Guide, SSIS, Erwin Data Modeler, ER Assistant
Querying Tools: SQL Management Studio, SQL Developer
Querying Languages: SQL, PL/SQL, T-SQL
Business Intelligence Tools: Power BI, Tableau, SSRS, Cognos, Excel
Programming Languages: SQL, Python, Shell Scripting
Versioning Tools: Git, GitHub, Jenkins