
Data Developer

Location: Houston, TX
Posted: September 16, 2019


Resume:

Big Data Developer – Hashmap, Inc Feb *st, **** – Current

Hashmap, Inc. is a Big Data consulting firm with clients across the US. Its architecture, engineering, consulting, and application development work spans a wide range of capabilities in Big Data, IIoT/IoT, and AI/ML, including real-time streaming, cloud and DevOps, industrial control systems, platform services, container orchestration, data and source system integration, data discovery and exploration, end-to-end analytics, self-service ML, and BI.

Current Project: Murphy Oil Corp, Houston, TX

Role: Big Data Developer

Ingesting data into HDFS from different data sources and formats

Building Spark-Scala JARs to create data pipelines that ingest different file formats, such as Parquet, JSON, CSV, and text files, into HDFS

Integrating Oozie with the rest of the Hadoop stack, which supports several types of Hadoop jobs out of the box (such as MapReduce, Hive, and Sqoop), and automating system-specific jobs using shell scripts.

Performing sanity checks to verify that the data ingested into HDFS is correct, using advanced HiveQL (SerDes).

Creating partitions and implementing bucketing in Hive

Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB

Solving performance issues in Hive through an understanding of joins, grouping, and aggregation

Providing production support and fixing breaks, including installation, configuration, and successful deployment of changes across all environments.

Using DIA (Data Ingestion Accelerator) for data ingestion and transformation on different clusters

Creating reports using Power BI/Spotfire and integrating their transformation and visualization components

Creating report-ready data sources based on user requirements.

Building complex visualizations using cross tables, bar charts, line charts, and treemaps

Customizing data by adding Property Controls, Filters, Calculations, Summaries, and Functions

Migrating reports from Development to Acceptance/Production servers.

Interacting with business analysts and data architects to understand business needs and act accordingly.

Building a POC for ingesting data into Snowflake and creating automated ingestion processes using Snowpipe.

Big Data Expertise

Apache Projects

Hive

YARN programming

HDP 1.3 - 2.x

HBase

HDFS

MapReduce

Kafka

NiFi

Spark

Flume

Sqoop

Oozie

Other Technical Expertise

Variety of Database Platforms, Application Environments, and Tools

Functional Skills

Java EE 5, 6, 7

Oracle – Hyperion, MySQL, Snowflake

Linux, Ubuntu 9/10

Power BI, Spotfire 7.6 +, MS Excel

C/C++, Scala, Shell Scripting

SQL/HQL, Java, JavaEE, C, C++, HTML5, XML, CSS3, JavaScript, JSON, Python, R

Data Mining & Warehousing, Business Intelligence & Analytics, Data Ingestion and Transformation/ETL

Spring, Hibernate

Web Services, SOAP, REST

Development: Java, Scala, C/C++, RPG, Unix Shell scripting, PL/SQL

Databases: Oracle, MySQL, SQL Server, DB2/400

APIs: REST

File Types: JSON, XML

OS: RedHat Linux 6/7, OS/400

Other Tools: Spring, Hibernate, RPG

Education

Texas A&M University-Kingsville (Master’s, CS) GPA: 3.7, December 2017

University of Pune (Bachelor’s, Computer Engineering) GPA: 3.5, June 2016

Coursework and Certifications

CSEN 5313 Compiler Design

CSEN 5314 Database Systems

CSEN 5323 Computer Communication Networks

CSEN 5322 Operating System

CSEN 5325 Software Engineering

CSEN 5333 Data Mining

Apache Hadoop, YARN, Hive, Spark, HBase, Sqoop, Oozie, Kafka

HDPC: Spark using Scala

Cloudera 175 Apache Spark and Hadoop

Snowflake Associate - Level 1

Other projects

Case study to analyze search terms Feb 2017 (15 days)

Presented data-driven insights on which search term, ‘Analytics’ or ‘Data Science’, is preferred, using Google Trends data and R programming

Detection of Fraud Reviews (Using BI Technology) Aug 2015 - Feb 2016 (7 months)

Analysed different web portals and performed data analysis on reviews extracted from a demo e-commerce website, using Natural Language Processing techniques to detect fraudulent reviews of products sold online.

Data transformation and ingestion using Hadoop from various e-commerce websites

Seminars and paper/journal presentations

Comprehensive study on current research – Texas A&M University

Presented a report on:

- Hadoop ecosystem and its components

- How Hadoop is used in current/real-time projects to solve Big Data issues

- Security concerns in the Hadoop computing environment

Case-based recommendation approach for market basket data

Presented a comprehensive seminar on the recommender systems used by e-commerce websites and the various data analysis techniques they use to improve sales and marketing.

Published a paper under the guidance of Dr. S. K. Pathan. Paper Title: Detection of Fraud Reviews for Products

Conference/journal where the paper was submitted: International Journal for Scientific Research Development

Submitted a paper to the International Journal for Scientific Research Thesis.


