Data Engineer

Location: Clearwater, FL

Salary: 120,000

Posted: January 21, 2021


Resume:

Spandana M

+1-352-***-****

************@*****.***

Professional Summary:

7+ years of IT experience with multinational clients across the software development lifecycle, specializing in the Big Data/Hadoop ecosystem.

Hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Pig, Hive, HBase, Spark, Kafka, Oozie, and ZooKeeper.

Excellent knowledge of Hadoop components such as HDFS, NameNode, DataNode, YARN, ResourceManager, and NodeManager, and of the MapReduce programming paradigm.

Experience analyzing data using HiveQL and Pig Latin, and extending Hive and Pig core functionality with custom UDFs.

Proficient in Relational Database Management Systems (RDBMS).

Extensive working knowledge of partitioned tables, UDFs, performance tuning, and compression-related properties in Hive.

Hands-on experience using Apache Kafka to track data ingestion into the Hadoop cluster and implementing custom Kafka encoders for custom input formats to load data into Kafka partitions.

Experience using Spark Streaming to ingest data from multiple data sources into HDFS.

Knowledge of job workflow scheduling and monitoring tools such as Oozie and Control-M.

Proficient in importing and exporting data between HDFS and relational database systems using Sqoop.

Excellent knowledge of data transformations using MapReduce, Hive, and Pig scripts for different file formats.

Experience with scripting languages such as Linux/Unix shell and Python.

Experience in all phases of the Software Development Life Cycle (Analysis, Design, Development, Testing, and Maintenance) using Waterfall and Agile methodologies.

Experience using SequenceFile, Avro, and Parquet file formats, and managing and reviewing Hadoop log files.

Worked in domains such as banking and insurance with Hadoop as the core technology, exploring different parts of its ecosystem.

Prepared various use cases for big data applications for a workshop conducted at Tata Consultancy Services.

Presented a big data model titled AGILE DATA LAKE, illustrating how data is processed in Hadoop when the work is organized in sprints.

Worked on data sanitization to protect NPI columns from misuse in lower environments.

Used the Pentaho ETL tool to analyze data in HDFS and build the required reports from it.

Achievements:

Earned a badge from IBM for completing the Hadoop Fundamentals learning path.

Acquired a certification from Big Data University for Spark Fundamentals with a grade of 92%.

Acquired a certification from the University of California for a Big Data course with a grade of 93.10%.

Certified as an Oracle PL/SQL Developer Associate.

Certified as a Microsoft Technology Associate for successful completion of Database Administration Fundamentals with a score of 94.

Received a Statement of Accomplishment for Introduction to Databases, an online course offered by Stanford University.

Technical Skills:

Big Data Technologies: Spark, Hive, MapReduce, Pig, Sqoop, Flume, HBase, Kafka-Storm, Oozie, ZooKeeper

Hadoop Distributions: Cloudera, Hortonworks

Operating Systems: Windows, Linux, Ubuntu, Unix

Programming Languages: Python, Scala, Unix Shell scripting, Spark SQL, HiveQL, C, C++

Databases: MySQL, SQL, Oracle, NoSQL, Netezza

Reporting Tools/ETL Tools: Tableau, QlikView, Informatica, DataStage, Pentaho

Methodologies: Agile/Scrum, Waterfall, DevOps

Development Tools: Eclipse, NetBeans, Hue, IntelliJ IDEA, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

Work Experience:

Raymond James, Florida Jun 2019 - Dec 2020

Sr. Hadoop Developer / Big Data Engineer

Responsibilities:

Worked with a team of technical staff, business managers, and practitioners in the business unit to determine the requirements and functionality needed for each project.

Responsible for the design and development of scalable data solutions within big data and BI technology platforms to meet business needs.

Designed a Kafka streaming pipeline to ingest data from the Cloudera cluster edge node into Hive for further processing.

Supervised the preparation of designs and technical and/or functional specifications.

Performed data extraction, including analysis, review, and modeling based on requirements, using higher-level tools such as Hive and Spark.

Developed and implemented API services using Scala in Spark to create complex JSON files for sending healthcare data to downstream teams.

Implemented Spark applications in Scala for advanced procedures such as text analytics and processing, using DataFrames and the Spark SQL API together with Spark's in-memory computing for faster data processing.

Wrote complex Hive queries to load and process data in the Hadoop file system, and carried out performance tuning.

Worked on interactive shell and Python scripts for scheduling various data cleansing and data loading processes from HDFS to Apache Hive.

Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL-like access to Hadoop data.

Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.

Developed scripts to use Secure File Transfer Protocol (SFTP) to send files securely from one server to another.

Worked with Sqoop jobs to move data from HDFS to SQL Server.

Environment: Hadoop, Spark, HDFS, Hive, Kafka, Sqoop, Cloudera, Python, Scala, UNIX, Shell, MySQL, SQL, PuTTY, Control-M, IntelliJ, SVN, Jira.

Deloitte Mar 2019 - Apr 2019

Hadoop Developer

Responsibilities:

Developed expertise working in Hortonworks Data Platform (HDP).

Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.

Designed a workflow that schedules Hive processes for log file data streamed into HDFS.

Involved in building the runnable jars for the module framework.

Developed SQL scripts to compare all the records for every field and table at each phase of the data movement process from the original source system to the final target.

Took part in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained experience with the Spark shell and Spark Streaming.

Loaded and transformed large sets of structured and unstructured data.

Created multiple access control lists to secure the data in the Hadoop cluster.

Participated in regular stand-up meetings, status calls, and business owner meetings with stakeholders and risk management teams in an Agile environment.

Supported code/design analysis, strategy development, and project planning.

Followed a Scrum implementation of the scaled agile methodology for the entire project.

Environment: Hadoop, Spark, Spark SQL, Map Reduce, HDFS, Hive, Cloudera, IntelliJ, Scala, UNIX Shell Scripts, Java, Jira

Infosys Ltd Sep 2018 - Feb 2019

Hadoop Developer

Responsibilities:

Wrote Unix scripts to automate the sanitization process.

Analyzed the AIT data for key capture and validated the results with the clients.

Proposed improvements for the way the data was stored in Hadoop.

Worked on a POC for validating the different file formats stored on the Hadoop cluster.

Captured the different Avro formats available and pulled the corresponding storage and efficiency metrics.

Developed an automated QlikView model integrating financial data from Excel and SQL to meet user reporting requirements.

Effectively used the data blending, filters, actions, and hierarchies features in Tableau.

Prepared use cases for big data work in technical workshops.

Validated against the target tables loaded into the cluster and reported any discrepancies.

Fine-tuned the source-to-target validation process.

Raised defects for any issues found in the process.

Environment: Hortonworks, Unix Servers, Shell Scripting, Java Map Reduce, Pig, Spark, Hive, Sqoop, Flume, Oozie, Tableau, SQL Server

Tata Consultancy Services Sep 2014 - Sep 2018

Hadoop Developer/Big Data Engineer

Responsibilities:

Queried the MySQL database and synchronized the entries to match the framework.

Wrote Python scripts to automate the associated synchronization.

Manipulated the data stored in HDFS.

Onboarded data provided by the source, applying the required transformations within the specified timelines.

Gathered customer requirements and documented them in their respective Jira stories to support a smooth development process.

Worked on many of the triage tickets that serve as the initial documentation for the subsequent steps needed to establish connectivity with the source.

Excellent hands-on experience working with Hadoop file formats such as SequenceFile, RCFile, and ORC.

Extended Hive and Pig core functionality by implementing custom UDFs.

Worked on data onboarding: pulled files from the mailbox, ingested them into Hadoop, transformed them, and finally published them using ETL tools.

Environment: MySQL, Python, Hadoop, Datameer, Hive, Sqoop, Flume, Oozie, Kafka, Pentaho, SQL Server

Seneca Software Solutions Jul 2013 - Aug 2014

Junior Software Engineer

Responsibilities:

Experience importing and exporting Teradata data between HDFS and RDBMS using Sqoop.

Extensive practical experience with incremental imports by creating Sqoop metastore jobs.

Experience using Apache Flume to collect, aggregate, and move large amounts of data from application servers, handling streaming data of varying types and velocity.

Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources such as flat files, XML files, and databases.

Solid experience developing workflows using Oozie for running MapReduce jobs and Hive queries.

Environment: MySQL, Hadoop, Datameer, Hive, Sqoop, SQL Server, Unix

Education:

Bachelor’s in Computer Science from Kakatiya Institute of Technology and Science, Warangal, India
