Data Engineer

Location: Coppell, TX

Salary: 120k

Posted: February 08, 2021

Resume:

Sonia Masand

Phone: +1-347-***-**** Email: ***********@*****.***

Professional Summary: 2+ years of professional experience building data pipelines with AWS, EMR, and Snowflake, and 1+ year of professional experience in Big Data and Hadoop training, covering enterprise databases, data warehouse management, and query languages.

Technical Skills:

Big Data: HDFS, Apache Spark, Spark SQL, Spark Streaming, Hive, MongoDB

Languages: Python, R, Java, SQL, Shell Scripting

Database: MySQL, MongoDB, Cassandra, Oracle 10g/11g, Microsoft SQL Server

IDE / Testing Tools: Eclipse, PyCharm

Operating System: Windows, UNIX, Linux, macOS

Tools: SQL Developer, Snowflake, MS-Excel (VLOOKUP, Pivot Tables, VBA, Macros), Tableau, AWS

Professional Experience: Data Engineer

Project 1

Client: Capital One (Jan 2020 – May 2020)

Project: Cyber Remediation [Dealer M&A]

Responsibilities:

● Developed the Doogle Tool for detecting and masking the sensitive data in OneLake S3 buckets using Python and Spark.

● Accessed the data in S3 buckets and used regular expression patterns to detect various types of sensitive data, including SSNs and credit card numbers

● Configured the code to fetch only the instances with violations and write those instances to a staging bucket after remediation

● Added multiple functionalities to the code, including accessing the Talos tokens from LockBox, detecting and remediating multiple elements and violations, and creating log files for audit purposes

● Fetched and encrypted the primary key values for the sensitive data, which are written to the log files

● Created Python scripts to run post-remediation validations on the remediated data as a sanity check

● Replaced the remediated instances in the original OneLake bucket after the post-remediation validations

● Productionized the code so that it can run on Spark clusters through AWS EMR

● Uploaded and maintained the code in GitHub using merge and pull requests

● Modularized the code based on Analysis and Remediation

Environment: Python, AWS S3, Apache Spark, Spark-Core, Spark-SQL, Snowflake, GitHub, Tableau.

Project 2

Client: Capital One (May 2020 – Present)

Project: DTD Data Pipelines [Dealer M&A]

Position: Data Engineer

Responsibilities:

● Designed and implemented an end-to-end real-time data pipeline to process semi-structured data by integrating customer credit data from an SDP stream

● Processed incoming JSON data from the stream and converted it into a compressed .dat format

● Split the data on the delimiter and loaded it into the S3 bucket in .parquet format

● Loaded the data into a Snowflake table in a readable format

● Created the tables in the QA and Production environments

● Created and tested automated scheduled jobs using AROW to load the data into the final table

Environment: Python, AWS EMR, Apache Spark, Spark-SQL, GitHub, ServiceNow, Jenkins Pipeline

Project 3

Client: Capital One (Oct 2019 – Present)

Project: M&A Monitoring

Position: Production Support L3

Responsibilities:

● Maintained data pipeline uptime of 99.9% by monitoring the streaming and transactional data across different data sources using Spark, S3 and Python

● Created monthly reports summarizing overall operations and statistics on the incidents and major issues faced during the previous month

● Assist with troubleshooting and issue resolution relating to current applications, providing assistance to the development team

● Provide ongoing internal reporting of performance measures and service levels

● Champion and promote service improvements on an ongoing basis to continually improve the quality of services delivered and customer satisfaction

● Manage and coordinate hot fix and maintenance releases

● Coordinate work activities involving TI, Data Centre, DBAs, and local and global technology teams

● Develop, implement and/or improve the application production support knowledge management repositories to ensure everything is documented, processes and procedures are clear, and periodic reviews are conducted

● Provide support to the business during day-to-day activities and ad-hoc requests

Environment: Python, AWS S3, Apache Spark, Spark-Core, Spark-SQL, Snowflake, GitHub, Tableau.

Team Lead in Data Engineering (Apr 2018 - Oct 2019)

Per Scholas (Project with Cognizant)

● Led and managed 5 Data Engineering classes as the Lead Instructor, with enrollment of around 130 students and 100+ individuals hired by employers as Data Engineers

● Continually developed the course curriculum, adding topics such as NiFi, Ranger, and MongoDB

● Developed classroom management approaches that grouped students for collaboration to encourage a teamwork environment

● Delivered high-quality instruction and lectures in the following areas: Core Java, MySQL, Linux commands, Hadoop components, HDFS, YARN, Hive, Pig, Sqoop, Oozie, Apache NiFi, Apache Ranger, Apache Spark, AWS Redshift, MongoDB/PyMongo, Pandas, Matplotlib

● Trained and instructed students on different ETL projects such as the Credit Card System

● Learned new technologies such as Kafka and Spark in a short time span to keep the training current with the latest information

● Regularly met with the client to gather requirements for the curriculum and projects, and delivered instruction before the deadlines

● Created multiple reports from the student data to analyze and visualize student performance using Tableau

Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.8, Hive, Sqoop, Eclipse, MySQL, HBase, CentOS Linux and ZooKeeper, Jupyter Notebook

Education

Northwest Missouri State University (NWMSU) Aug 2016- Dec 2017

Master of Science in Information Systems: GPA- 3.83/4

Jhulelal Institute of Technology, Nagpur University, India Jul 2012- May 2016

Bachelor’s in Computer Science Engineering: GPA- 3.5/4

Relevant Coursework

Credit Card System Apr 2018- May 2018

This was an ETL project on a customer transaction database. The client proposed several functional requirements that involved using core Java and Hadoop components. The project required transforming the data and used components such as Apache Hive, Sqoop, Oozie, Spark, and Pig, along with data visualization. Extraction and transformation were automated through Oozie. The purpose of the project was to automate and schedule the whole extraction and transformation process so that data is readily available for analyzing customer behavior.

Mozingo Supply Management System Sep 2017- Dec 2017

This was a real-world project that involved gathering the functional requirements from the client (Mozingo Lake Recreations, Maryville). The project was completed in 5 sprints following the SDLC. It involved creating Use Case diagrams according to the client's requirements, then converting the Use Case diagrams into DFDs and ERDs using Microsoft Visio. The final step was designing the prototype for Mozingo with the JustInMind prototyping tool. The project also had project management aspects, which involved creating Daily Scrum and Sprint templates, as well as Sprint Backlogs, Product Backlogs, a Work Breakdown Structure, and a Gantt chart in Microsoft Project. The project combined technical and management aspects.

Data Visualization on Tableau Jan 2017- Apr 2017

This project involved creating appropriate bar graphs, histograms, and other types of visualizations based on the student dataset provided. It also involved drawing conclusions from the graphs created from that dataset.
