RESUME
Yogesh Kumar Jha
Phone : +91-953*******
Address : Bangalore, India
Email : y *****************@*****.***
Date of Birth : 29 September 1989
GitHub : https://github.com/yogeshgithub
LinkedIn : http://in.linkedin.com/in/yogeshjha
Educational Qualifications
Birla Institute of Technology & Science, Pilani, India [June 2008 – May 2012]
M.Sc.(Tech.) Information Systems
CGPA : 7.44 / 10
Professional Summary & Achievements
● 3.9 years of experience working with two highly successful early-stage startups in retail CRM & e-commerce.
● Built core products for these startups with strong business impact, driving revenue and customer acquisition.
● Built 3 products from scratch in the field of Big Data Engineering / Business Intelligence / Analytics.
● Handled complete ownership of products (conceptualization, business requirements, prototyping, design, development, deployment, operations, support, client adoption, customer success).
● Founding member of the Data Platform team at my current company [Boomerang Commerce].
● Handled cross-team responsibilities and worked with the Implementation Services team to gather product feedback.
● Served as a single point of contact for product-related queries / issues / enhancements / client delivery.
● Solved complex problems, resulting in a patent [Report Generation System & Method].
● Consistently recognized as an extraordinary performer with a high sense of ownership and responsibility.
Technical Skills
● Programming Languages : Java, Python
● Scripting Languages : Shell scripting, SQL, HiveQL, PSQL
● Databases : MySQL, Redshift, HBase
● Operating System : Linux / Unix
● Tools : Eclipse, Maven, Subversion, Git, AWS, EMR
● Technologies : Hadoop, HBase, Hive, Zookeeper, Spark, Phoenix, Sqoop, Azkaban, Redshift
● Other skills : Distributed Systems, Data Engineering, Data Warehousing, ETL, Reporting, Analytics
Patents
● Report Generation System & Method (201********) [http://patents.justia.com/patent/201********]
Work Experience
Senior Software Development Engineer [Boomerang Commerce] [Nov 2014 – present]
Data Platform : Core platform in the Boomerang tech stack that handles all data processing across clients.
● Founding member of the data platform team; built the initial POC / prototype.
● Built the product from scratch, took it to production, and drove adoption across all clients.
● Horizontally scalable platform built on Hadoop, Hive, Sqoop, Redshift, EMR, and Azkaban.
● Platform handles 10+ TB of data processing daily.
● Completely config-driven ETL creation and data validation (a minimal sketch follows this list).
● Brought an 8-hour MySQL-based ETL down to under 1 hour.
● Supports out-of-the-box data validations and outlier detection over historical data sets.
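For illustration, a minimal sketch of what one config-driven ETL step can look like, assuming a simple properties-based format (the keys and table names below are hypothetical, not the platform's actual config schema):

    import java.io.StringReader;
    import java.util.Properties;

    // Sketch of one config-driven ETL step: a properties file describes the
    // source, target, and transformation, and the generator emits the Hive
    // statement to run. All keys and table names here are hypothetical.
    public class EtlStepGenerator {

        static String buildHiveInsert(Properties cfg) {
            String sql = "INSERT OVERWRITE TABLE " + cfg.getProperty("etl.target")
                       + " SELECT " + cfg.getProperty("etl.columns")
                       + " FROM " + cfg.getProperty("etl.source")
                       + " WHERE " + cfg.getProperty("etl.filter", "1=1");
            String groupBy = cfg.getProperty("etl.groupBy", "");
            return groupBy.isEmpty() ? sql : sql + " GROUP BY " + groupBy;
        }

        public static void main(String[] args) throws Exception {
            Properties cfg = new Properties();
            cfg.load(new StringReader(
                    "etl.source=raw_transactions\n"
                  + "etl.target=daily_sales\n"
                  + "etl.columns=store_id, SUM(amount) AS revenue\n"
                  + "etl.groupBy=store_id\n"));
            System.out.println(buildHiveInsert(cfg));
        }
    }

Under this scheme each ETL step is just data: adding a new pipeline means adding a config file rather than writing code.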
Software Development Engineer [Capillary Technologies] [June 2012 – Oct 2014]
Big Data Warehouse : Central OLAP data warehouse built on top of HBase to power retail analytics and reporting.
● Initial member of the team; built the prototype and then the product from scratch.
● Built on Hadoop, HBase, Phoenix, Sqoop, and Hive.
● Rewrote the hive-hbase-handler project to fit our use case: introduced Apache Phoenix serialization in HBaseSerDe for better range-scan performance, customized predicate pushdowns, enabled composite row key support, and wrote a custom input-splitting strategy to enable parallel scans within an HBase table region.
● Designed the schema for OLAP cubes in HBase to store rolled-up customer and product information.
● Owned core components of the warehouse: the processing framework, MR / HBase performance optimizations, the Hive query generator, and Hive plugins.
● Developed multiple Hive plugins (UDF, UDAF, SerDe) for the platform [Blog post]; a minimal UDF sketch follows this list.
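For illustration, a minimal sketch of a classic Hive UDF of the kind listed above (the function name normalize_sku and its behaviour are hypothetical; the actual plugins are described in the linked blog post):

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example UDF: classic Hive UDFs extend UDF and expose one
    // or more evaluate() methods that Hive resolves by argument type.
    @Description(name = "normalize_sku",
                 value = "_FUNC_(str) - trims and upper-cases a SKU code")
    public class NormalizeSkuUDF extends UDF {

        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null;               // UDFs must tolerate NULL input rows
            }
            result.set(input.toString().trim().toUpperCase());
            return result;
        }
    }

Such a function is registered with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_sku AS 'NormalizeSkuUDF'; after that it is usable in HiveQL like any built-in.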
Capillary Customer Intelligence Manager : Capillary's flagship product for retail customer analytics & reporting. [link]
● First developer on the team; built the product from scratch, writing its core components.
● End-to-end ownership of the product, including design, development, deployment, adoption & customer success.
● Developed patented technology for the core reporting framework [Report Generation System & Method].
● Developed an in-browser OLAP engine for highly efficient slicing / dicing of data cubes in memory.
● Built a custom query processing engine that automatically injects SQL predicates for filters and segments (see the sketch after this list).
● Built dashboards for customer / product / sales trend analysis.
● Highly successful product with strong business impact; a major source of revenue and client acquisition.
● Currently used to generate 30,000+ reports & dashboards for 300+ retail clients.
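For illustration, a minimal sketch of one way such predicate injection can work, assuming report queries carry a placeholder that the engine replaces with the user's filter and segment conditions (all class, column, and table names below are hypothetical):

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch: report SQL is written without predicates, and the
    // engine splices the user's filter / segment conditions into a
    // placeholder before execution.
    public class FilterInjector {

        // One filter condition, e.g. column "city", operator "=", value "Bangalore".
        static class Filter {
            final String column, op, value;
            Filter(String column, String op, String value) {
                this.column = column; this.op = op; this.value = value;
            }
        }

        // Replaces the /*FILTERS*/ placeholder in a template query with real predicates.
        static String inject(String template, List<Filter> filters) {
            if (filters.isEmpty()) {
                return template.replace("/*FILTERS*/", "1=1");
            }
            StringBuilder where = new StringBuilder();
            for (int i = 0; i < filters.size(); i++) {
                Filter f = filters.get(i);
                if (i > 0) where.append(" AND ");
                // Production code would bind values as parameters; quoting keeps the sketch short.
                where.append(f.column).append(' ').append(f.op)
                     .append(" '").append(f.value.replace("'", "''")).append('\'');
            }
            return template.replace("/*FILTERS*/", where.toString());
        }

        public static void main(String[] args) {
            String template = "SELECT store, SUM(sales) FROM txns WHERE /*FILTERS*/ GROUP BY store";
            System.out.println(inject(template, Arrays.asList(new Filter("city", "=", "Bangalore"))));
        }
    }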
Software Development Engineer Intern [Apigee Technologies] [July 2011 – Dec 2011]
Configuration Data Store : Highly available data store built for cluster configuration management.
● Built on top of Apache Zookeeper for high availability and fault tolerance (a minimal client sketch follows this list).
● Built an additional monitoring system for cluster usage metrics.
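For illustration, a minimal sketch of the core pattern behind a ZooKeeper-backed config store, where each entry lives in a znode and a watch notifies clients of updates (the paths and connect string below are hypothetical):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: each config entry is stored in a znode, and a
    // watch notifies the client whenever the value changes.
    public class ConfigStoreClient implements Watcher {

        private final ZooKeeper zk;

        public ConfigStoreClient(String connectString) throws Exception {
            // 30s session timeout; this object receives connection/watch events
            this.zk = new ZooKeeper(connectString, 30000, this);
        }

        // Reads a config value and re-arms a watch for future updates.
        public String get(String path) throws Exception {
            byte[] data = zk.getData(path, this, null);
            return new String(data, "UTF-8");
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                System.out.println("Config changed: " + event.getPath());
            }
        }

        public static void main(String[] args) throws Exception {
            ConfigStoreClient client = new ConfigStoreClient("zk1:2181,zk2:2181");
            System.out.println(client.get("/config/cluster/replication-factor"));
        }
    }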
Grad Projects
Similarity Detection in Wikipedia Articles : Finding the top-N most similar documents / articles in a large corpus (Wikipedia articles) for a given article of user preference.
● Used topic modeling (tf-idf) and cosine similarity (see the sketch below).
● The algorithm was implemented as a MapReduce program and run on a Hadoop cluster for experimentation.
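For illustration, a minimal sketch of the similarity measure, treating each document as a sparse map of tf-idf term weights; the MapReduce driver that distributed this over the Wikipedia corpus is omitted:

    import java.util.HashMap;
    import java.util.Map;

    // Documents are represented as sparse tf-idf weight vectors
    // (word -> weight maps) and compared with cosine similarity.
    public class CosineSimilarity {

        // cos(a, b) = dot(a, b) / (|a| * |b|) over sparse term-weight vectors.
        static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                normA += e.getValue() * e.getValue();
                Double w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;
            }
            for (double w : b.values()) normB += w * w;
            if (normA == 0 || normB == 0) return 0.0;
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }

        public static void main(String[] args) {
            Map<String, Double> doc1 = new HashMap<>();
            doc1.put("hadoop", 0.8); doc1.put("cluster", 0.3);
            Map<String, Double> doc2 = new HashMap<>();
            doc2.put("hadoop", 0.5); doc2.put("hive", 0.6);
            System.out.println(cosine(doc1, doc2));   // similarity in [0, 1]
        }
    }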
Fault Tolerant Beowulf Cluster : Implemented a fault-tolerant Beowulf cluster to perform parallel computation using MPI, with a heartbeat mechanism to detect node failures.
Areas of Interest
● Data Engineering / Sciences
● Big Data Analytics
● Machine learning / Neural Networks