Damaris Ndungu
Email: ********@*****.*** Mobile: 469-***-****
Experience Summary
6+ years of experience in the IT industry, including SQL database development, query performance tuning, ETL, and data analysis in Python and SAS.
2 years of experience with Big Data ecosystems including Hadoop, Spark, Scala, MapReduce, Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, and NoSQL databases.
Strong experience in importing and exporting data between databases such as Oracle and MySQL and HDFS using Sqoop.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
Good knowledge of relational databases such as Microsoft SQL Server and MySQL.
Strong experience in application development using Scala with Spark, RDBMS, and Linux shell scripting.
Good working knowledge of Apache Spark for fast, large-scale, in-memory data processing.
Expertise in developing queries using Hive Query Language (HiveQL).
Capable of processing large sets of structured, semi-structured, and unstructured data.
Strong analytical skills with problem solving and root cause analysis experience.
Good working knowledge of Tableau visualization tool.
Expertise in creating Packages using SQL Server Integration Services (SSIS).
Strong exposure to the software development life cycle (SDLC) and Agile development methodologies.
Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership abilities.
Ability to work independently or as part of a team to accomplish critical business objectives and to make decisions under pressure.
Technical Skillset:
Big Data Ecosystems: Hadoop, HDFS, MapReduce, HBase, Hive, Scala, Sqoop, Flume, Oozie, Spark, ZooKeeper
Databases: MySQL, HBase, MS SQL Server
Operating Systems: Linux, Windows
Tools: Eclipse IDE, PuTTY
Programming Languages: Scala, Pig Latin, SQL, NoSQL, PL/SQL, T-SQL, HiveQL
Packages: VMware, Oracle VM VirtualBox, SSIS (ETL), MS Office
Professional Experience:
Hopkins Logistics, Dallas, TX January 2016 – December 2017
Big Data Developer
The project involved migrating a client’s data warehouse from MySQL to the Hadoop ecosystem due to rapid data growth, in support of a Global Integration Deals and Integrated Managed Services strategy in a competitive marketplace. Work covered data acquisition, ingestion, cleansing, and enrichment/transformation. Actively participated in managing and resolving defects by creating defect reports in Excel spreadsheets with pivot tables, and conducted sync-up calls with the team on defect progress, root-cause analysis, and fixes.
Responsibilities:
Extracted data from MySQL into HDFS using Sqoop (version 1.4.6).
Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
Developed workflows for the complete end-to-end ETL process: ingesting data into HDFS, validating and applying business logic, storing clean data in Hive external tables, exporting data from Hive to RDBMS sources for reporting, and escalating data quality issues.
Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
Assisted in writing Scala Spark scripts for data cleansing; data cleansing and enrichment to remove duplicates and null values was also done using Pig Latin and HiveQL.
Developed Hive (version 1.1) scripts for end user/analyst requirements to perform ad hoc analysis.
Applied a strong understanding of partitioning and bucketing concepts in Hive, designing both managed and external tables to optimize performance.
Optimized Hive queries using partitioning and bucketing techniques to control data distribution (see the HiveQL sketch at the end of this section).
Monitored and managed the Hadoop cluster using Cloudera Manager.
Wrote and applied UDFs in Hive queries as needed.
Managed and reviewed Hadoop log files. Tested raw data and executed performance scripts.
Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL-like access to Hadoop data.
Used the Cloudera distribution of the Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions.
Environment: Hadoop 2, Cloudera CDH 5.1, Sqoop, MapReduce, Hive, Spark 1.6, Spark SQL, PuTTY, Oracle, MySQL.
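For illustration, a minimal HiveQL sketch of the external-table, partitioned/bucketed-table, and cleansing pattern described above; the table and column names (staging_shipments, clean_shipments, shipment_id, and so on) are hypothetical stand-ins for the project-specific schemas:

    -- External table over the HDFS directory populated by the incremental Sqoop job
    -- (names and layout are illustrative only)
    CREATE EXTERNAL TABLE IF NOT EXISTS staging_shipments (
      shipment_id  BIGINT,
      customer_id  BIGINT,
      status       STRING,
      ship_date    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/shipments';

    -- Managed table partitioned by ship date and bucketed by customer to control data distribution
    CREATE TABLE IF NOT EXISTS clean_shipments (
      shipment_id  BIGINT,
      customer_id  BIGINT,
      status       STRING
    )
    PARTITIONED BY (ship_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- Cleansing step: drop rows with null keys, collapse duplicates, and load with dynamic partitioning
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE clean_shipments PARTITION (ship_date)
    SELECT DISTINCT shipment_id, customer_id, status, ship_date
    FROM staging_shipments
    WHERE shipment_id IS NOT NULL;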
Hopkins Logistics, Dallas, TX June 2014 – December 2015
SSIS/ETL Developer
Responsibilities:
Extracted, transformed, and loaded source data into the respective target tables to build the required data marts.
Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server and Flat files.
Involved in daily batch loads (full and incremental) into staging and ODS areas, troubleshooting processes, issues, and errors using SQL Server Integration Services (SSIS).
Developed and tested extraction, transformation, and load (ETL) processes.
Used various transformations in SSIS Data Flow and Control Flow, including For Loop containers.
Extracted data from databases and spreadsheets, staged it in a single location, and applied business logic to load it into the database.
Implemented event handlers and error handling in SSIS packages.
Developed, monitored and deployed SSIS packages.
Created complex ETL packages using SSIS to extract data from staging tables into partitioned tables with incremental loads (see the T-SQL sketch at the end of this section).
Created package-level and task-level logging for ETL loads to record the number of records processed by each package and each task using SSIS.
Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
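As a rough illustration of the incremental-load and logging pattern described above, a T-SQL sketch of the kind of statement an SSIS Execute SQL task might run; the table and column names (StagingOrders, FactOrders, EtlLoadLog, and so on) are hypothetical:

    -- Incremental (upsert) load from a staging table into a target table
    DECLARE @RowsProcessed INT;

    MERGE dbo.FactOrders AS target
    USING dbo.StagingOrders AS source
        ON target.OrderID = source.OrderID
    WHEN MATCHED THEN
        UPDATE SET target.OrderAmount = source.OrderAmount,
                   target.OrderStatus = source.OrderStatus
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderID, OrderAmount, OrderStatus)
        VALUES (source.OrderID, source.OrderAmount, source.OrderStatus);

    SET @RowsProcessed = @@ROWCOUNT;

    -- Task-level logging: record how many rows this step processed
    INSERT INTO dbo.EtlLoadLog (PackageName, TaskName, RowsProcessed, LoadDate)
    VALUES ('LoadFactOrders', 'MergeStagingToFact', @RowsProcessed, GETDATE());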
Emeritus at Collins, Plano, TX January 2011 – December 2013
SQL Developer
Responsibilities:
Created and implemented triggers in T-SQL to facilitate consistent data entry into the database (see the T-SQL sketch at the end of this section).
Created SSIS packages, triggers, cursors, tables, views, and SQL joins for building various applications.
Created stored procedures and triggers for processing large volumes of data.
Used various transformations in SSIS to load data from flat files and FTP to the SQL databases.
Designed and developed data load jobs using SSIS package and scheduled in SQL Agent.
Created SSIS packages to validate, extract, transform, and load data into data warehouse and data mart databases.
Configured SSIS data flows and individual data flow elements, and monitored package performance.
Wrote stored procedures and user-defined scalar functions (UDFs) to be used in SQL scripts.
Extensively used Joins and Sub-Queries to simplify complex queries involving multiple tables.
Created and modified databases and tables, and performed data manipulation and report generation.
Used T-SQL and system functions to query the database.
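For illustration, a minimal T-SQL sketch of a data-entry trigger and a scalar UDF of the kind described above; the object names (dbo.Orders, trg_Orders_NormalizeStatus, fn_NetAmount, and so on) are hypothetical:

    -- Trigger enforcing consistent data entry: normalize a status value on insert
    CREATE TRIGGER trg_Orders_NormalizeStatus
    ON dbo.Orders
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE o
        SET o.OrderStatus = UPPER(LTRIM(RTRIM(o.OrderStatus)))
        FROM dbo.Orders AS o
        INNER JOIN inserted AS i ON o.OrderID = i.OrderID;
    END;
    GO

    -- Scalar UDF reused across SQL scripts: compute net amount after a discount
    CREATE FUNCTION dbo.fn_NetAmount (@Gross DECIMAL(18,2), @DiscountPct DECIMAL(5,2))
    RETURNS DECIMAL(18,2)
    AS
    BEGIN
        RETURN @Gross * (1 - @DiscountPct / 100.0);
    END;
    GO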
Education:
Brookhaven College – In progress
Computer Science
Colabery School of Data Analytics
Certified SQL Server I/EDW Developer
University of Nairobi
Bachelor of Commerce – Finance