
Big Data Engineer

Location: Houston, TX

Posted: January 02, 2024



Raul Salas

Email – ad2dw7@r.postjobfree.com

Phone – 832-***-****

NoSQL Big Data Engineer – Databricks/MongoDB/Neo4j Admin

Snowflake/NoSQL Data Architect

Professional Organizations

Founder of Houston AWS cloud user group

Member of SQL Pass Houston Chapter

Leader of Houston MongoDB User Group

Certifications

Databricks certifications – Generative AI Fundamentals, Lakehouse Fundamentals

Experience

Senior Database Architect (Total Experience – 11 years)

Develop database solutions by designing the proposed system and defining the database's physical structure, functional capabilities, security, backup, and recovery specifications. Experience includes both relational and open-source database platforms such as RDS, DocumentDB, Snowflake, Databricks, Kafka, MongoDB, Hadoop, Spark, and Redis. Maintain database performance by identifying and resolving production and application development problems, calculating optimum parameter values, evaluating, integrating, and installing new releases, completing maintenance, and answering user questions. Implemented GDPR solutions in a global cloud environment.

Senior Performance Tuning Consultant (Total Experience – 10 years)

Responsible for performance tuning and optimization of SQL Server, MySQL, and open-source databases such as Cassandra, Hadoop, and MongoDB running on Windows and Linux 6.6 platforms. Provide consulting on performance tuning and troubleshooting, application workload testing, integration, data migration, and general problem solving. Architect and implement backup and disaster recovery, including both node-to-node and master-slave replication.

BIG DATA & Business Intelligence Architect & Administrator (Total Experience - 5 years)

Business Intelligence (BI)/Online Analytical Processing (OLAP): interactive analysis of multidimensional data (roll-up, drill-down, data slicing); cluster analysis; data mining; predictive modeling; NoSQL databases. Big data strategies: performance management (non-transactional, social data), data exploration, social analytics (non-transactional, social data), decision science. MongoDB DBA: MongoDB clusters, oplog, sharding, partitioning, indexes, and performance tuning across MongoDB 2.6 and 3.0–3.4.

Implemented SQL Server SSRS, SSAS, and SSIS as a Business Intelligence/Data Warehouse (BIDW) suite for a data warehouse using MS Visual Studio. Implemented SSIS for ETL data extraction and conversion for a model conversion. Created SSRS reports and migrated the SSRS farm/reports to new data center servers.

Implemented Cassandra 2.1.4 and managed it with OpsCenter. Drained nodes to free disk space and segmented node-to-node traffic from external traffic to optimize connectivity. Configured Cassandra across multiple data centers. Spark 2.1.0 stream data ingestion.

Overall Technical Expertise (Total Experience - 24 years)

Industry Experience Includes - Healthcare, Oil & Gas, Technology, Banking & Financial Services, Pharmaceutical, Utility, Technology Startup, Software.

TECHNICAL SKILLS & QUALIFICATION SUMMARY:

MongoDB, MySQL, Cassandra, Neo4j, Kafka, and Cloudera Hadoop

SQL Server SSRS, SSAS & SSIS, Database Clustering (Log Shipping, Data Mirroring, Replication), SAN Storage Technologies.

Database Design and Administration - SQL Server v6, v7, 2000, 2005, 2008, 2008 R2, 2012; MySQL, MongoDB 3.0–3.4, Access, DataStax Cassandra and Apache open-source Cassandra

Database Design, Data Maintenance, Database Security, Database Management, Requirements Analysis, Teamwork, Technical Zeal, Project Management, Presenting Technical Information, Training, Operating Systems

Big Data Technology – AWS/Azure Databricks, Cosmos DB, MongoDB; MongoDB sharding, clusters, and replication; relational, hierarchical, and graph databases; LDAP and Kerberos; distributed data file systems; notable scan option; Cassandra, Hadoop, and Impala in a Red Hat Linux host-based environment. Cassandra data modeling and Cassandra CQLSH. Extensive experience with Neo4j graph database cluster operations and development. Extensive experience with Docker and Kubernetes in a DevOps and agile development environment. Experience with automated code deployment using Atlassian tools such as Jira and Bamboo.

PROFESSIONAL EXPERIENCE

Independent Big Data Consultant / Database Cloud Architect

February 2015 to Present

Side project (personal): created a crime-prediction application using the Facebook Prophet algorithm to predict crime at the location level, built in Jupyter notebooks with Databricks as the data platform and Databricks dashboards as the presentation layer.
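
For illustration, a minimal sketch of that Prophet forecasting flow; the input file and column names are hypothetical placeholders, not the actual project data:

```python
# Minimal sketch of the Prophet crime-forecasting approach described above.
# The input file and column names are hypothetical placeholders.
import pandas as pd
from prophet import Prophet

# Daily incident counts; on Databricks this could come from spark.table(...).toPandas().
df = pd.read_csv("crime_counts_by_day.csv")  # columns: date, incidents

# Prophet expects columns named ds (timestamp) and y (value).
history = df.rename(columns={"date": "ds", "incidents": "y"})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(history)

# Forecast 30 days ahead and keep the prediction interval.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```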

Healthcare – November 2015 to Present

Clients: Centene and Cigna

Solution Architect

Led the migration of MongoDB Atlas databases to Google Cloud/AWS, ensuring a secure, efficient, and error-free transition.

Collaborated with cross-functional teams to design and implement cloud migration strategies that aligned with business objectives.

Managed data integrity and consistency throughout the migration process, utilizing best practices in data backup and recovery.

Optimized cloud resource utilization post-migration, leading to a 30% reduction in operational costs.

Ensured compliance with data security and privacy regulations during the migration process. Implemented PrivateLink/peering for healthcare data. Implemented Ping Federate for control-plane access.

Provided training and support to team members on Google Cloud Platform /AWS services and MongoDB management.

Architected various data platforms on Databricks, Azure Cosmos DB/Cognitive Search, Elasticsearch, MongoDB, Neo4j, and SingleStore, along with various Python pipelines using GitHub CI/CD.

These were very high-scale data platforms that integrated multiple database technologies, incorporating NLP into Databricks MLflow and Python/PySpark pipelines that ingested 70 TB of clinical records for cancer research. Implemented search capability using Azure Cosmos DB/ADF/Cognitive Search. Converted Jupyter notebooks to interactive Databricks notebooks.

Public cloud environments – AWS, Microsoft Azure, and Google Cloud Platform; cloud security architecture; streaming technologies (Kafka); artificial intelligence; S3; Azure Data Factory; Terraform; Azure Pipelines. Database architect for MongoDB, Databricks, and Neo4j; mentorship of junior staff; business requirements; data architecture; test plans; metadata; Jira; Scrum. Azure Data Warehouse, ETL, Scala, Power BI, Tableau, Azure SQL, MS SQL Server, data warehousing, GitHub, DevOps, data quality, product management with a Scrum team, technical architecture, providing guidance. MLflow machine learning/AI model creation and production endpoint workflow; data engineering for data lakes; Unity Catalog.

Data management, data modeling, data analysis, data profiling, Splunk, ETL, dbt, data analytics, MLOps with MLflow, REST APIs, GitHub repo integration with Databricks notebooks, scalability issue resolution, enterprise data management, data warehousing, distributed computing, integration testing, data solutions, Databricks debugging, project design from initial implementation, optimizing Databricks streaming jobs, cost management in large Databricks clusters, T-SQL development and troubleshooting. Provided technical guidance to the team on Databricks. Attention to detail on technical and business requirements for data warehouse and database implementations; team coaching and mentoring; system stability from an SRE perspective.

Enterprise system implementations at Fortune 100 companies, streaming pipelines, CDC, gathering customer requirements, and problem solving.

Databricks and Azure Databricks

Implemented CI/CD pipelines with GitHub Actions for a PySpark/Python data pipeline from blob storage to Databricks. Integrated the Python codebase into Databricks DAG workflows.

Created and maintained Databricks workflows. Converted Python classes to Databricks notebook infrastructure.

Security, Architecture, Governance - implemented Unity Catalog as the base for a data mesh.

Data as a product - transformations, continuous ingestion, and Unity Catalog.

Self-service - used Python, notebooks, dashboards, and workflows.

Federated computational governance - plugged the Hive data store into Unity Catalog so that users can query Databricks assets.

Snowflake to Databricks migration – Created the migration methodology and tangible deliverables (mappings, architecture diagrams, and assessments); reused working code, refactored, and leveraged best practices. Implemented COPY INTO to unload Snowflake data to Parquet format for import with Auto Loader from cloud storage (S3). Implemented data governance with Unity Catalog. Analyzed and inventoried all Snowflake ETL, batch jobs, and Snowflake objects. Analyzed existing ETL to see whether streaming with Kafka could be implemented. Analyzed existing Snowflake workloads and matched them to Databricks cluster infrastructure. Analyzed data pipelines for implementation with Databricks workflows. Analyzed existing code for migration and validation. Ensured connectivity for all BI and analytics users, and ensured onboarding of data science and ML teams. Worked with Scrum teams, creating and updating Jira stories for epics and features.
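
As an illustration of that COPY INTO plus Auto Loader pattern, a hedged sketch; the stage, bucket, and table names are hypothetical, and `spark` is the ambient Databricks notebook session:

```python
# Sketch of the Snowflake -> Parquet -> Auto Loader migration path described
# above. Stage, bucket, and table names are hypothetical placeholders.

# 1) On the Snowflake side, unload a table to Parquet in an S3-backed stage:
#      COPY INTO @unload_stage/orders/
#      FROM analytics.public.orders
#      FILE_FORMAT = (TYPE = PARQUET) HEADER = TRUE;

# 2) On Databricks, ingest the unloaded files incrementally with Auto Loader:
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/orders")
      .load("s3://bucket/unload/orders/"))

(df.writeStream
   .option("checkpointLocation", "s3://bucket/_checkpoints/orders")
   .trigger(availableNow=True)          # process all available files, then stop
   .toTable("main.analytics.orders"))   # Unity Catalog three-level name
```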

Implemented an automation bundle of Terraform and the Databricks SDK (Python/CLI) across Databricks environments.

Implemented the Python SDK as a Databricks notebook library, used to find long-running processes, identify high-cost Databricks assets, and trace lineage. Implemented Git repos with Databricks notebooks.
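
A minimal sketch of how the Databricks Python SDK can flag long-running job runs, as described above; the two-hour threshold is an arbitrary example value:

```python
# Sketch: use the Databricks SDK to flag long-running job runs.
# The 2-hour threshold is an arbitrary example value.
import time
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up host/token from the environment or notebook

threshold_ms = 2 * 60 * 60 * 1000  # 2 hours, in milliseconds
now_ms = time.time() * 1000

for run in w.jobs.list_runs(active_only=True):
    if run.start_time and (now_ms - run.start_time) > threshold_ms:
        print(f"long-running: run_id={run.run_id} name={run.run_name}")
```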

Generative AI – implemented ChatGPT calls within a Databricks SQL function to integrate external data with associated Databricks SQL queries.
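
One plausible shape for that pattern, sketched as a Unity Catalog Python UDF callable from Databricks SQL; the function name, model, and key handling are hypothetical, and it assumes the openai client is importable from the UDF environment:

```python
# Hypothetical sketch: a SQL-callable Python UDF that calls the OpenAI API.
# Assumes the openai package is available to the UDF environment; in practice
# the API key would come from a Databricks secret, not a literal.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.ask_gpt(prompt STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  import openai
  openai.api_key = "..."  # placeholder; fetch from a secret scope in practice
  resp = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[{"role": "user", "content": prompt}])
  return resp.choices[0].message.content
$$
""")

# Usage from Databricks SQL:
#   SELECT main.default.ask_gpt(note_text) FROM clinical_notes LIMIT 5;
```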

Acquired experience with Databricks ingestion, the Unity Catalog data security model, the workspace user interface, managing/creating notebooks, Delta Lake for Spark SQL, managing Databricks clusters, and dashboard visualizations in the Databricks environment, in addition to managing the Databricks machine learning lifecycle. Implemented Databricks in an AWS S3 environment and managed a Delta Lake implementation that replaced Snowflake. In-depth knowledge of Unity Catalog administration. Collected user requirements and set up data storage, notebooks, and compute to fulfill business use cases. Served as a point of contact for technical issues in the environment. Orchestrated the installation and configuration of new workspaces on Databricks. Provisioned new Databricks environments for the business and rolled out new features from Databricks, such as Unity Catalog and SQL endpoints for business users. Worked with teams to address queries. User management and administration on Databricks.

Prepped and executed experiments with AutoML, analyzed the results of the forecasting model, and promoted it to production.
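
A hedged sketch of that AutoML forecasting flow; the table, columns, and registered model name are placeholders:

```python
# Sketch of an AutoML forecasting experiment and promotion to production.
# Table, column, and model names are hypothetical placeholders.
from databricks import automl
import mlflow

summary = automl.forecast(
    dataset=spark.table("main.analytics.daily_incidents"),
    target_col="incidents",
    time_col="event_date",
    horizon=30,
    frequency="d",
    primary_metric="smape")

# Register the best trial's model so it can be promoted to production.
run_id = summary.best_trial.mlflow_run_id
mlflow.register_model(f"runs:/{run_id}/model", "incident_forecast")
```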

Expertise in streaming data into the Lakehouse and in optimizing batch and streaming aggregations.

Proven proficiency in implementing AWS DMS and Delta Live Tables for efficient change data capture. Skilled in ingesting and analyzing diverse datasets, including Houston police crime data, for Databricks dashboard analytics.
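
A sketch of that DMS-to-Delta Live Tables CDC pattern; paths and key columns are hypothetical, and it assumes DMS change files carrying the usual "Op" (I/U/D) column:

```python
# Sketch: Delta Live Tables applying CDC records landed by AWS DMS.
# Paths and key/sequence columns are hypothetical placeholders.
import dlt
from pyspark.sql.functions import col, expr

@dlt.view
def dms_changes():
    # DMS drops change files (with an "Op" column) into cloud storage.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load("s3://bucket/dms/orders/"))

dlt.create_streaming_table("orders")

dlt.apply_changes(
    target="orders",
    source="dms_changes",
    keys=["order_id"],
    sequence_by=col("commit_ts"),
    apply_as_deletes=expr("Op = 'D'"))
```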

Adept at orchestrating ETL workloads through Databricks workflows and enhancing security with SSO for the Databricks control plane.

Leveraging Databricks SQL, I have built queries to power dynamic dashboards. Additionally, I possess a strong understanding of Databricks cost management best practices, utilizing the Databricks metastore to optimize DBUs, cloud compute, storage, and custom solutions such as private endpoints and firewalls.
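
For illustration, one way such a dashboard-backing cost query can look, assuming the workspace's system billing tables are enabled (the 30-day window is an arbitrary example):

```python
# Sketch: aggregate DBU consumption by SKU from the system billing table
# to feed a cost dashboard. Assumes system tables are enabled.
dbus = spark.sql("""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""")
display(dbus)  # Databricks notebook rendering
```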

Driven by a passion for data-driven solutions, I am committed to driving efficiency and innovation in data processing and analytics workflows. I problem-solve Databricks issues down to a very deep technical level.

SingleStore 8.x

SingleStore/MemSQL: created monitoring for SingleStore with Grafana and the SingleStore Studio management console. Responsible for performance tuning and backups. Created pipelines to ingest data. T-SQL, build automation, configuration management, first-call resolution, Jenkins, query tuning, scripting, and handling multiple concurrent tasks.
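
A hedged sketch of such an ingestion pipeline issued through the SingleStore Python client; the connection string, bucket, and table are placeholders, and the S3 credentials clause is omitted:

```python
# Sketch: create and start a SingleStore pipeline that loads CSV data
# from S3. Connection details, bucket, and table are placeholders; a
# CREDENTIALS clause would normally accompany the S3 source.
import singlestoredb as s2

conn = s2.connect("user:password@singlestore-host:3306/analytics")
with conn.cursor() as cur:
    cur.execute("""
        CREATE PIPELINE events_pipeline AS
        LOAD DATA S3 'my-bucket/events/'
        CONFIG '{"region": "us-east-1"}'
        INTO TABLE events
        FIELDS TERMINATED BY ','
    """)
    cur.execute("START PIPELINE events_pipeline")
```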

Snowflake

Performed data profiling of an existing warehouse star schema with a slowly changing dimension model and analyzed the Snowflake warehouse for a possible new data model to accommodate new data pipelines. Created source-to-target mappings. Analyzed stored procedures and Azure data lake pipelines. Implemented dbtvault: dbt for data modeling automation, CI/CD pipelines, and data lake model validation. Migrated the existing Snowflake data warehouse to Databricks in parallel for eventual migration.

Discovered incorrect data in the data warehouse and made stored-procedure changes to remediate it. Made recommendations on the future direction of the data warehouse to follow Snowflake best practices. Reviewed all stored procedures to ensure all data elements were being captured. Performed a gap analysis of the legacy data warehouse against Snowflake to ensure all data was accurate. Re-created legacy reports in Snowflake. Worked on cloning production data to ensure the SDLC was followed (DEV/TEST/PROD) and data was up to date and in sync.

Neo4j

Administration of a Neo4j 4.4 cluster for a data science project: upgrading the cluster, maintaining cluster availability, and ensuring Python Parquet file pipelines using Arrow stayed operational. Implemented LDAP in a six-node causal cluster configuration with three core nodes and three read replicas. Managed cluster configuration via the configuration file. Reviewed logs for security and performance issues. Created databases and assigned AD groups to various roles within the Neo4j security model. Set up monitoring in Grafana and Dynatrace to improve uptime and availability. Ensured a backup and restore strategy (full and differential). Troubleshot memory issues in the Neo4j memory configuration as well as the JVM.
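
A small availability-check sketch in the spirit of the cluster work above, using the Bolt driver against the system database; URI and credentials are placeholders:

```python
# Sketch: report each cluster member's role and status via SHOW DATABASES.
# URI and credentials are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://cluster-host:7687",
                              auth=("neo4j", "password"))
with driver.session(database="system") as session:
    for rec in session.run("SHOW DATABASES"):
        print(rec["name"], rec["address"], rec["role"], rec["currentStatus"])
driver.close()
```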

Semiconductor Devices Company – April 2017 to October 2017

Architected and implemented a MongoDB 3.4 replica-set HA solution for a public-facing website hosting Sitecore website personalization profiles.

Set up Chef and Ansible playbooks for server setup, updates, and backups, for background jobs that needed to run on each server, and for code deployments to production servers on projects requiring scheduled rather than automated deployments. Implemented scalable microservices solutions using Node.js, Travis CI, Docker Cloud, Kubernetes/Helm, Mongo, Redis, and SQS to implement tasking between services.

Health Care Company - March 2015 to March 2017

Responsible for administration and architecture of corporate-level MongoDB and Neo4j environments.

Worked on a Neo4j pilot project. 30 MongoDB replica sets were centrally managed and monitored via MongoDB MMS. Experience in MongoDB versions 2.4–4.x, including upgrades. Responsible for administration, maintenance, performance analysis, and capacity planning for the MongoDB/Cassandra/Hadoop corporate data lake environment. The underlying operating system was a Red Hat Linux 6.x/7.x based environment. A full lifecycle methodology was also implemented: Dev, Test/Release, Stage, and Production.

Architected Mongo environments for use cases ranging from cache to full-blown sharded data repositories in both operational and data warehouse capacities.

Implemented MongoDB security that included LUKS (Linux Unified Key Setup) disk-based encryption, SSL, Kerberos, and LDAP to secure MongoDB environments.

Implemented Cassandra 2.1.4 and DataStax Enterprise 4.8 for a proof of concept and technology acquisition for Cigna. The project supported a managed global cloud environment and included creating runbooks, architecture, and implementation of the new technology in a corporate environment. This Cassandra environment was a six-node multi-data-center ring. Created Ansible scripts that auto-provisioned Cassandra nodes. Administered the Cassandra database and performed data modeling with CQLSH. Implemented LDAP and Kerberos security solutions; bidirectional replication was important in addressing the multi-data-center environment.

Independent Consultant – DW/Big Data Consultant @ Harris County Public Health, Houston, Texas – May 2014 to January 2015

Architected and administered a Cassandra Enterprise 4.7 cluster to store health data and compare performance against MongoDB.

Implemented a use case for bringing data together from various systems into a consolidated view of health data. Created a prototype and presented the justification for MongoDB to senior management. Created a document-based schema for patient data. Created aggregation queries for reporting and analysis. Created replication and MongoDB sharded clusters; analyzed the data to determine an appropriate shard key and monitored activity to determine whether data required balancing. Backed up databases from a replication secondary server. Implemented indexing and MMS for MongoDB monitoring. Maintained and expanded the existing SQL Server 2008/2012 production and data warehouse environments and Analysis Services. Performed data model and data workflow analysis of the existing database environment and created a unified data model from disparate database applications. Developed a Cost/Value/Priority matrix for the Inspections, Infectious Disease, and Mosquito Control business areas, and a one-year road map for delivering high-value reports and operations dashboards. Designed and developed Master Data Services for referential data across all business units. Managed the project plan for business intelligence delivery. Developed process flows and functional documentation to capture all systems and users involved in integrating disparate custom applications.
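
A hedged sketch of the sharding and aggregation work described above; the database, collection, and field names are hypothetical:

```python
# Sketch: shard a patient collection and run a reporting aggregation.
# Database, collection, and field names are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")

# Shard on a hashed key for even chunk distribution across shards.
client.admin.command("enableSharding", "health")
client.admin.command("shardCollection", "health.patients",
                     key={"patient_id": "hashed"})

# Example reporting aggregation: case counts by disease and month.
pipeline = [
    {"$match": {"category": "infectious_disease"}},
    {"$group": {"_id": {"disease": "$disease",
                        "month": {"$month": "$reported_at"}},
                "cases": {"$sum": 1}}},
    {"$sort": {"cases": -1}},
]
for doc in client.health.patients.aggregate(pipeline):
    print(doc)
```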

Calpine Corp – Remote and Onsite, Houston, Texas – June 2007 to April 2014

Managed a 50+ SQL Server database server migration to a new data center vendor across both physical and virtual environments. Performed detailed database-level planning and project creation for migrating 200+ databases and 15 TB of data storage, including HA clustering, SAN, and solid-state storage. Obtained client buy-in for a cloud-based application prototype; priced, planned, and implemented the prototype.

SDLM – coordinated and interacted with application teams and the Change Control team on production changes in support of corporate and ETRM trading systems.

Production support of 60+ SQL Server databases, including high-availability clustering and disaster recovery.

Manage workflow for offshore Physical DBA team.

Support trade-floor application architecture that includes SSRS, SSIS, and Analysis Services. Coordinate and implement change control tasks in support of a Sarbanes-Oxley environment.

Implemented SQL Server 2012 AlwaysOn technology.

Implemented Quest Performance Analysis for in-depth SQL Server analysis and performance tuning. Implemented a MongoDB data warehouse for energy trading data; the use case brought data from different applications together into a single view for quick access. Created the data model and imported data using mongoimport. Granted users authorization and authentication. Backed up databases using the MongoDB backup facility.

Coordinate with application teams and infrastructure groups to resolve performance bottlenecks. Determine architecture for new applications as well as upgrades.

Education

BBA, Business - Accounting

University of Texas – San Antonio


