Hugh McBride, B.Sc, M.S.
email:*******.****@*****.***
mobile: 718-***-****
Primary Skill Set
Big Data: Hadoop, HBase, Spark/PySpark, Spark Streaming, Spark SQL, Kafka, Sqoop, Hive, Zookeeper, Oozie, Flume, Hue
JEE/JSE: Java, JMS, JMX, JAXB
Design Patterns: GoF, EIP
Open Source: Scala, SBT, Python, PETL, PyTest, JUnit, TestNG, Log4j
AWS: EMR, IAM, S3, EC2, VPC, Lambda, SageMaker, Glue, Athena
Third Party: Matlab, Maple, Cloudera (CDH 5.x-7.x), StreamSets
Database: SQL, PostgreSQL, MySQL
Certification: AWS Certified Solutions Architect - Associate (SAA-C01)
AWS Certified Developer - Associate (DVA-C01)
AWS Certified Machine Learning - Specialty (MLS-C01)
Databricks Certified Associate Developer for Apache Spark 3.0
Cloudera Certified Developer for Apache Hadoop (CCDH)
Cloudera Certified Administrator for Apache Hadoop (CCAH)
Cloudera Certified Specialist in Apache HBase
Cloudera Certified Instructor for CCAH & CCDH
Sun Certified Java Programmer 1.4
Working Knowledge
Linux, NumPy, SciPy, Matplotlib, Airflow, Git, ElasticSearch, Docker, Kubernetes
Work History
Cloud Consultant, New Math Data Jun 2024 - Present
• Delivered IoT data feeds of home electricity meter readings using AWS Lambda, Athena, Glue and Apache Kafka for a major electricity supplier. Involved data cleansing and wrangling to provide derived views for end users.
• Acted as a Spark SME for the junior development team, helping to resolve coding bottlenecks.
Big Data Consultant, Nordea Feb 2023 - Jul 2023
• Assisted teams with troubleshooting and performance tuning of their Spark, PySpark, YARN, HBase and Hive applications as part of a large on-prem set of cluster migrations. Advised on streamlining, simplifying and consolidating current cluster implementations to reduce maintenance workload and increase cluster availability.
Career Break / Bereavement - Renovations of Family Home Oct 2022 - Jan 2023
Python Data Engineer, QStream Mar 2022 - Sep 2022
• Requested to come onboard and assist with a high-profile and difficult data migration in preparing the company for sale. This involved migrating three separate products on two different clouds back into the main application. Constructed a Python ETL pipeline using PETL to migrate high-value customer data to the latest version of the product.
Big Data Consultant, LetsGetChecked Nov 2021 - Feb 2022
• Design and implementation of Spark Streaming/Kafka ETL pipelines using AWS MSK and EMR
Spark/Big Data Consultant, National Renewable Energy Lab Jun 2021 – Nov 2021
• Provide technical expertise in the implementation and tuning of a Petabyte scale COVID research project on AWS EMR with PySpark and Jupyter Notebooks
• Prototype Kafka clusters using Docker, Kubernetes, Helm and Rancher
Spark/Big Data Consultant, Pacific Gas & Electric Feb 2021 – Jun 2021
• Large-scale data analysis of a 30-year, multi-terabyte historical weather data set using Spark, calculating historical metrics to supply the Data Science team's fire risk calculations in Palantir
• Created geospatial data formats from the above data set using Spark to produce heat map images for the Data Science team in Palantir
Big Data Consultant, CleverBits/StreamSets Dec 2019 – Jan 2021
• Worked as a StreamSets ETL consultant, tasked with the more advanced transformations of the data pipelines to accelerate and ensure delivery of a project that StreamSets and their partner CleverBits were delivering for British Telecom.
• Developed StreamSets fragments for AWS Redshift, S3 and Google Cloud Storage
Azure Databricks Architect Reviewer, Wintellect Oct 2020 – Nov 2020
• Short-term consulting appointment for Wintellect's client Hexagon to review their Databricks Delta Lake architecture, notebooks and development environment. The aim was to isolate performance bottlenecks and provide guidance on organizing/simplifying work pipelines and other industry best practices to reduce processing run times.
Onsite Big Data Solutions Architect Consultant, Cloudera/Lloyds Bank Feb 2020 – Mar 2020
• Consulted as a Big Data Solutions Architect at Lloyds; tasks included, but were not limited to: assisting and mentoring junior developers to accelerate project delivery, setting up Flume pipelines, and troubleshooting HBase setup and performance issues. Contract cut short by COVID-19 quarantine restrictions.
Big Data/Spark Performance Contract, Quantexa/HSBC Aug 2019 – Jan 2020
• Analyzed and refactored Big Data performance bottlenecks in Quantexa's Hadoop/Spark/ElasticSearch application being implemented with HSBC, a leading investment bank. Provided guidance on best practices in writing and testing Spark code to achieve high performance.
Freelance Big Data Consultant, Ammeon Feb 2019 – May 2019
• Tasked with providing expertise on the implementation of Spark processing applications on Google Cloud using Docker and Kubernetes. This required researching all the relevant technologies to deliver a robust proof of concept: a containerized Spark application running on Google Cloud's Kubernetes service.
Big Data Consultant, BGC Partners July 2018 – Dec 2018
• Brought in for a special high-priority, high-profile assignment: creating an ETL framework to make almost two years' worth of trading data, stored as complex JSON objects in Cassandra, accessible for analysis, as the requisite skills were not available in-house. This was done using a combination of Spark JSON parsing libraries and JSON binding libraries to create a set of basic and derived data tables accessible via Hive.
• Tasked with reviewing and providing recommendations on improving existing in-house Hadoop clusters installed by a third-party consulting company.
Big Data Consultant, Awin Global March 2018 – July 2018
• Tasked with training a team that had lost its Big Data developers to turnover by running workshops, assisting with development and advising on the transition to AWS
• Kafka Workshop covering: Producers, Consumers, Brokers, Configuration, Failure and Delivery Guarantees, Consumer Groups, Kafka Connect and Kafka Streams
• HBase Workshop covering: HBase Tables, HBase Shell, Access with API, Key and Schema design, HBase Read/Write path, Filters, HBase Utilities
• Designed and deployed standalone Kafka and Spark EMR prototype clusters on AWS
Big Data Developer, Deutsche Bank September 2017 - March 2018
• Designed and developed a standalone, resilient ElasticSearch cluster using the ELK (ElasticSearch/Logstash/Kibana) stack.
• Converted an existing Spark Streaming/Kafka application from Java to Scala
Big Data Architect Reviewer, Logicalis August 2017 – August 2017
• Short-term consulting appointment for Logicalis at a leading UK challenger bank to vet a Big Data architecture design proposed by a global consultancy.
Lead Spark Developer / Big Data Consultant, Annalect November 2016 – May 2017
• Technical Lead for a team of 3 that developed Annalect's first Spark ETL process and associated framework for record processing on AWS, including Python Boto launch scripts and Airflow operators. This became the template for all future ETL workflows.
• Designed an AWS cluster for the Marketing Science team. Scripted a solution to launch ad-hoc auto-scaling clusters using spot instances, bootstrapping RStudio. This allows members of the Marketing Science team to launch clusters using SparkR for investigation and analysis through a familiar RStudio environment. Since clusters are launched on an as-needed basis using spot instances, this provides very significant cost savings over the traditional "always on" cluster solution.
• Provided training on Spark using Python and Scala to internal teams across Annalect EMEA
Big Data Architect / Lead Data Engineer (Consultant), Cox Automotive Apr 2016 – Sept 2016
• As a hands-on consultant, acted as the team's first Lead Data Engineer for a team of 4. Responsible for the design, development and implementation of the newly formed Data Analytics Group's Cloudera Hadoop cluster (CDH 5.7) on Microsoft Azure, including sizing, tuning and access control.
• Responsible for the design of the ingestion strategy using StreamSets and the analysis framework using Spark, Tableau, and Jupyter with PySpark and Python
• Solved a long-standing in-house pricing problem commonly known as "Simon's Bane": finding the optimal price for repairs to maximize return, which had heretofore been intractable using existing tools. This optimization was expected to net an additional £15-20M in revenue.
• Mentoring and training in-house developers on Big Data Tools and components of the Hadoop Ecosystem.
Big Data Consultant / HBase Specialist (Consultant), Government Nov 2015 – Mar 2016
• Redesign of the HBase database access layer for the Immigration Application Framework
• Investigated/prototyped table schema and row key design, use of native HBase testing tools, access API design, coding and testing, and performance strategies (e.g. salting); investigated third-party tools such as Apache Phoenix and DataNucleus, and monitoring strategies using JMX and DropWizard
Spark Developer / Big Data Architect / Consultant (Contract), Credit Suisse Apr 2015 – Oct 2015
• Technical Advisor and SME to the Director of the Semantic Technology (Big Data) Division, advising on bank-wide Big Data architecture, adoption strategy and implementation; this included server hardware selection, cluster sizing, capacity planning, data ingestion techniques and backup strategies using Cloudera.
• Designed and developed the initial proof-of-concept application for Market Risk using Spark on YARN, Scala and Hadoop to overcome severe bottlenecks. Sped up the ETL process by a factor of 120 over the existing SQL/C# application. This included the ability to aggregate and coalesce sets of files into a Spark dataframe and apply data enrichment rules via UDFs and joins on imported database tables. The resultant dataframe was saved to an Apache Phoenix datastore on the processing cluster for end-user analysis using standard JDBC tools. Built out core functionality for handover to the offshore team for production development and deployment.
Spark Developer / Big Data Architect (Contract), Intel Labs Mar 2014 – Apr 2015
• Designed, planned, set up and tuned Intel's Smart Cities Hadoop cluster using Cloudera (CDH 5) on Amazon AWS, including best practices for cluster setup on AWS and common maintenance procedures: high availability, node addition and removal, application deployment, capacity planning, and server and master node hardware specification
• Served as the team's HBase Database Administrator and HBase specialist, responsible for tuning, configuration, availability and data ingestion integrity
• Introduced Apache Phoenix as Hive replacement and Flume for data and log aggregation
• Prototyped a Play-based web application for Spark SQL queries
• Investigated use of Spark Cluster as a parallel computation engine for real-time Singular Value Decomposition of streaming sets of matrices using Kafka and Spark Streaming
• Led the design and development of the Data Analytics Lambda Architecture using Spark, Kafka, Spark Streaming, Spark SQL, Spark MLlib, Scala and OpenTSDB
Big Data/Hadoop Developer (Part Time / Remote), Trustev Oct 2013 – Mar 2014
• At night, set up the company's first Hadoop cluster on CentOS Linux using Hortonworks HDP 1.3 on the Windows Azure cloud.
• Set up an ETL pipeline for over 100 tables from SQL Server to HDFS and Hive using Sqoop
Lead Java Developer (Contract), Citi Aug 2013 – Feb 2014
• Payment Services performance analysis, design and development
Java Consultant (Contract), Morgan Stanley Apr 2013 – July 2013
• Investigation of Trade Capture Performance using Python to simulate High Frequency Trading
Java Consultant (Contract), Allied Irish Bank Sept 2012 – Feb 2013
• Java, Spring MVC, JSP, XML: design and development of SME Internet Banking Portal
Java Consultant (Contract) / Enterprise Integration Tools Developer, Bord Gais Energy Nov 2011 – Aug 2012
• Developed and deployed a Grails web app to allow CRUD operations for verification of a catalog of 170 Business Process XML messages, with an accompanying Selenium test suite
Senior Java Developer (Contract), BNP Paribas Sept 2010 – Sept 2011
● Designed, developed and deployed the Bank's new ETL application, which parses Ion trade records to generate a database record for auditing and an XML payload for transmission via MQ to a locale-specific Risk Management application. Extensive use of annotations for database persistence via Spring DAO, XML mapping for payload generation and JMX monitoring.
● Employed Aggregator, Message Router and Enricher Enterprise Integration Patterns
● Parametric XML unit test framework used extensively in testing to replay trades
Java Consultant (Contract), SITA Apr 2010 - July 2010
● JMS client to C# TCP bridge with an accompanying policy server as part of SITA's new Airport Flight Display. Prototyped a JMX flight display heartbeat and log monitor.
Senior Java Developer (Contract), Citi Jun 2009 - Mar 2010
● Responsible for developing STP rules, trade flows and Swing-based trader screens for Bloomberg Curve Spread and Outright Swaps, with an associated JUnit test harness
J2EE Consultant (Contract), Perot Systems Feb 2009 - Apr 2009
● Performance tuning of a J2EE-based e-learning application for one of the world's largest academic publishers.
Education
PhD Studies (No Degree Awarded), Applied Mathematics, Rensselaer, Troy, NY
Concentration: Numerical Partial Differential Equations (Finite Element Methods)
NSF-Funded Research: Application of the Finite Element Method to Micro-Mechanical Devices
Notable Class Work: Investigation of the effect of preconditioners in accelerating conjugate gradient solution techniques (all coding done in Matlab); Finite Element Analysis programming project using the C++ Standard Template Library (STL), which allowed for an efficient implementation of the Reverse Cuthill-McKee algorithm
Master of Science, Applied Mathematics 1992
US Naval Postgraduate School, Monterey, CA
Concentration: Numerical Partial Differential Equations (Finite Difference Methods)
Thesis: Wave Propagation in Elastic Solids: modeled wave propagation through a 2-D coupled fluid-solid domain using the Finite Difference Method and Matlab
Bachelor of Science, Applied Mathematical Science 1984
University College Galway
Concentration: Numerical Analysis & Statistics
Military Experience
Commissioned Officer, Lieutenant (O-3), United States Navy
Naval Aviator. Aircraft flown: T-34C, TH-57, SH-2F (900 hrs). Standard Instrument Rating.