Big Data Developer – Hashmap, Inc Feb *st, **** – Current
Hashmap, Inc. is a Big Data consulting firm with clients across the US. Its architecture, engineering, consulting, and application development work spans Big Data, IIoT/IoT, and AI/ML, including real-time streaming, cloud and DevOps, industrial control systems, platform services, container orchestration, data and source system integration, data discovery and exploration, end-to-end analytics, self-service ML, and BI.
Current Project: Murphy Oil Corp, Houston, TX
Role: Big Data Developer
Ingesting data into HDFS from different data sources and formats
Building Spark/Scala JARs to create data pipelines that ingest file formats such as Parquet, JSON, CSV, and text files into HDFS
Integrating Oozie with the rest of the Hadoop stack to run the job types it supports out of the box (such as MapReduce, Hive, and Sqoop), and automating system-specific jobs using shell scripts.
Performing sanity checks to verify that the data ingested into HDFS is correct, using advanced HiveQL (SerDes).
Creating partitions and implementing bucketing in Hive
Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB
Solving performance issues in Hive through an understanding of joins, grouping, and aggregation
Providing production support and fixing breaks, including installation, configuration, and successful deployment of changes across all environments.
Using DIA (Data Ingestion Accelerator) for data ingestion and transformation on different clusters
Creating reports using Power BI/Spotfire and integrating their transformation and visualization components
Creating report-ready data sources for report creation based on user requirements.
Building complex visualizations using Cross Tables, Bar Charts, Line Charts, and Treemaps
Customizing data by adding Property Controls, Filters, Calculations, Summaries, and Functions
Migrating reports from the Development server to the Acceptance/Production servers.
Interacting with business analysts and data architects to understand business needs and act on them.
Building a POC for ingesting data into Snowflake and creating automated processes that use Snowpipe to load the data.
Big Data Expertise
Apache Projects
Hive
YARN programming
HDP 1.3 - 2.x
HBase
HDFS
MapReduce
Kafka
NiFi
Spark
Flume
Sqoop
Oozie
Other Technical Expertise
Variety of Database Platforms, Application Environments, and Tools
Functional Skills
Java EE 5, 6, 7
Oracle – Hyperion, MySQL, Snowflake
Linux, Ubuntu 9/10
Power BI, Spotfire 7.6+, MS Excel
C/C++, Scala, Shell Scripting
SQL/HQL, Java, JavaEE, C, C++, HTML5, XML, CSS3, JavaScript, JSON, Python, R
Data Mining & Warehousing, Business Intelligence & Analytics, Data Ingestion and Transformation/ETL
Spring, Hibernate
Web Services, SOAP, REST
Development: Java, Scala, C/C++, RPG, Unix Shell scripting, PL/SQL
Databases: Oracle, MySQL, SQL Server, DB2/400
APIs: REST
File Types: JSON, XML
OS: RedHat Linux 6/7, OS/400
Other Tools: Spring, Hibernate, RPG
Education
Texas A&M University-Kingsville (Master’s, CS) GPA: 3.7 December 2017
University of Pune (Bachelor’s, Computer Engineering) GPA: 3.5 June 2016
Coursework and Certifications
CSEN 5313 Compiler Design
CSEN 5314 Database Systems
CSEN 5323 Computer Communication Networks
CSEN 5322 Operating System
CSEN 5325 Software Engineering
CSEN 5333 Data Mining
Apache Hadoop, YARN, Hive, Spark, HBase, Sqoop, Oozie, Kafka
HDPC: Spark using Scala
Cloudera 175 Apache Spark and Hadoop
Snowflake Associate - Level 1
Other projects
Case Study to analyze search terms Feb 2017 (15 days)
Presented data-driven insights on which search term is preferred, ‘Analytics’ or ‘Data Science’, using Google Trends data and R programming
Detection of Fraud Reviews (Using BI Technology) Aug 2015 - Feb 2016 (7 months)
Analyzed different web portals and performed data analysis on reviews extracted from a demo e-commerce website to detect fraudulent reviews of products sold online, using Natural Language Processing techniques.
Data transformation and ingestion using Hadoop from various e-commerce websites
Seminars and paper/journal presentations
Comprehensive study on current research – Texas A&M University
Presented a report on:
-The Hadoop ecosystem and its components
-How Hadoop is used in current/real-time projects to solve Big Data problems
-Security concerns in the Hadoop computing environment
Case-based recommendation approach for market basket data
Presented a comprehensive seminar on the recommender systems used by e-commerce websites and the data analysis techniques they use to improve sales and marketing.
Published a paper under the guidance of Dr. S. K. Pathan. Paper Title: Detection of Fraud Reviews for Products
Conference/Journal where the paper was submitted: International Journal for Scientific Research Development
Submitted a paper to the International Journal for Scientific Research Thesis.