RESUME
Yogesh Kumar Jha
Phone : +91-953*******
Address : Bangalore, India
Email : y *****************@*****.***
Date of Birth : 29 September 1989
GitHub : https://github.com/yogeshgithub
LinkedIn : http://in.linkedin.com/in/yogeshjha
Educational Qualifications
Birla Institute of Technology & Science, Pilani, India [June 2008 – May 2012]
M.Sc.(Tech.) Information Systems
CGPA : 7.44 / 10
Professional Summary & Achievements
● 3.9 years of experience working with two highly successful early-stage startups in retail CRM & e-commerce.
● Built core products for these startups with strong business impact, driving revenue and customer acquisition.
● Built 3 products from scratch in the field of Big Data Engineering / Business Intelligence / Analytics.
● Handled complete ownership of products (conceptualization, business requirements, prototyping, design, development, deployment, operations, support, client adoption, customer success).
● Founding member of the Data Platform team at my current company [Boomerang Commerce].
● Handled cross-team responsibilities and worked with the Implementation Services team to gather product feedback.
● Served as a single point of contact for product-related queries / issues / enhancements / client delivery.
● Solved complex problems, resulting in a patent [Report Generation System & Method].
● Consistently recognized as an extraordinary performer with a high sense of ownership and responsibility.
Technical Skills
● Programming Languages : Java, Python
● Scripting Languages : Shell scripting, SQL, HiveQL, PSQL
● Databases : MySQL, Redshift, HBase
● Operating System : Linux / Unix
● Tools : Eclipse, Maven, Subversion, Git, AWS, EMR
● Technologies : Hadoop, HBase, Hive, Zookeeper, Spark, Phoenix, Sqoop, Azkaban, Redshift
● Other skills : Distributed Systems, Data Engineering, Data Warehousing, ETL, Reporting, Analytics
Patents
● Report Generation System & Method (201********) [http://patents.justia.com/patent/201********]
Work Experience
Senior Software Development Engineer [Boomerang Commerce] [Nov 2014 – present]
Data Platform : Core platform in the Boomerang tech stack that handles all data processing across clients.
● Founding member of the data platform team; built the initial POC / prototype.
● Built the product from scratch, took it to production, and drove adoption across all clients.
● Horizontally scalable platform built on Hadoop, Hive, Sqoop, Redshift, EMR, and Azkaban.
● Platform handles 10+ TB of data processing daily.
● Completely config-driven ETL creation and data validation (a minimal sketch follows this list).
● Brought an 8-hour MySQL-based ETL down to under 1 hour.
● Supports out-of-the-box data validations and outlier detection over historical data sets.
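For illustration, a minimal sketch of what one config-driven ETL step can look like, assuming a simple properties-based format (the keys and table names below are hypothetical, not the platform's actual config schema):

    import java.io.StringReader;
    import java.util.Properties;

    // Sketch of one config-driven ETL step: a properties file describes the
    // source, target, and transformation, and the generator emits the Hive
    // statement to run. All keys and table names here are hypothetical.
    public class EtlStepGenerator {

        static String buildHiveInsert(Properties cfg) {
            String sql = "INSERT OVERWRITE TABLE " + cfg.getProperty("etl.target")
                       + " SELECT " + cfg.getProperty("etl.columns")
                       + " FROM " + cfg.getProperty("etl.source")
                       + " WHERE " + cfg.getProperty("etl.filter", "1=1");
            String groupBy = cfg.getProperty("etl.groupBy", "");
            return groupBy.isEmpty() ? sql : sql + " GROUP BY " + groupBy;
        }

        public static void main(String[] args) throws Exception {
            Properties cfg = new Properties();
            cfg.load(new StringReader(
                    "etl.source=raw_transactions\n"
                  + "etl.target=daily_sales\n"
                  + "etl.columns=store_id, SUM(amount) AS revenue\n"
                  + "etl.groupBy=store_id\n"));
            System.out.println(buildHiveInsert(cfg));
        }
    }

Under this scheme each ETL step is just data: adding a new pipeline means adding a config file rather than writing code.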
Software Development Engineer [Capillary Technologies] [June 2012 – Oct 2014]
Big Data Warehouse : Central OLAP data warehouse built on top of HBase to power retail analytics and reporting.
● Initial member of the team; built the prototype and then the product from scratch.
● Built on Hadoop, HBase, Phoenix, Sqoop, and Hive.
● Rewrote the hive-hbase-handler project to fit our use case: introduced Apache Phoenix serialization in HBaseSerDe for better range-scan performance, customized predicate pushdowns, enabled composite row key support, and wrote a custom input-splitting strategy to enable parallel scans within an HBase table region.
● Designed the schema for OLAP cubes in HBase to store rolled-up customer and product information.
● Owned core components of the warehouse: the processing framework, MR / HBase performance optimizations, the Hive query generator, and Hive plugins.
● Developed multiple Hive plugins (UDF, UDAF, SerDe) for the platform [Blog post]; a minimal UDF sketch follows this list.
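For illustration, a minimal sketch of a classic Hive UDF of the kind listed above (the function name normalize_sku and its behaviour are hypothetical; the actual plugins are described in the linked blog post):

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example UDF: classic Hive UDFs extend UDF and expose one
    // or more evaluate() methods that Hive resolves by argument type.
    @Description(name = "normalize_sku",
                 value = "_FUNC_(str) - trims and upper-cases a SKU code")
    public class NormalizeSkuUDF extends UDF {

        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null;               // UDFs must tolerate NULL input rows
            }
            result.set(input.toString().trim().toUpperCase());
            return result;
        }
    }

Such a function is registered with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_sku AS 'NormalizeSkuUDF'; after that it is usable in HiveQL like any built-in.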
Capillary Customer Intelligence Manager : Capillary's flagship product for retail customer analytics & reporting. [link]
● First developer on the team; built the product from scratch, writing its core components.
● End-to-end ownership of the product, including design, development, deployment, adoption & customer success.
● Developed patented technology for the core reporting framework [Report Generation System & Method].
● Developed an in-browser OLAP engine for highly efficient slicing / dicing of data cubes in memory.
● Built a custom query processing engine that automatically injects SQL predicates for filters and segments (see the sketch after this list).
● Built dashboards for customer / product / sales trend analysis.
● Highly successful product with strong business impact; a major source of revenue and client acquisition.
● Currently used to generate 30,000+ reports & dashboards for 300+ retail clients.
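For illustration, a minimal sketch of one way such predicate injection can work, assuming report queries carry a placeholder that the engine replaces with the user's filter and segment conditions (all class, column, and table names below are hypothetical):

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch: report SQL is written without predicates, and the
    // engine splices the user's filter / segment conditions into a
    // placeholder before execution.
    public class FilterInjector {

        // One filter condition, e.g. column "city", operator "=", value "Bangalore".
        static class Filter {
            final String column, op, value;
            Filter(String column, String op, String value) {
                this.column = column; this.op = op; this.value = value;
            }
        }

        // Replaces the /*FILTERS*/ placeholder in a template query with real predicates.
        static String inject(String template, List<Filter> filters) {
            if (filters.isEmpty()) {
                return template.replace("/*FILTERS*/", "1=1");
            }
            StringBuilder where = new StringBuilder();
            for (int i = 0; i < filters.size(); i++) {
                Filter f = filters.get(i);
                if (i > 0) where.append(" AND ");
                // Production code would bind values as parameters; quoting keeps the sketch short.
                where.append(f.column).append(' ').append(f.op)
                     .append(" '").append(f.value.replace("'", "''")).append('\'');
            }
            return template.replace("/*FILTERS*/", where.toString());
        }

        public static void main(String[] args) {
            String template = "SELECT store, SUM(sales) FROM txns WHERE /*FILTERS*/ GROUP BY store";
            System.out.println(inject(template, Arrays.asList(new Filter("city", "=", "Bangalore"))));
        }
    }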
Software Development Engineer Intern [Apigee Technologies] [July 2011 – Dec 2011]
Configuration Data Store : Highly available data store built for cluster configuration management.
● Built on top of Apache Zookeeper for high availability and fault tolerance (a minimal client sketch follows this list).
● Built an additional monitoring system for cluster usage metrics.
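For illustration, a minimal sketch of the core pattern behind a ZooKeeper-backed config store, where each entry lives in a znode and a watch notifies clients of updates (the paths and connect string below are hypothetical):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: each config entry is stored in a znode, and a
    // watch notifies the client whenever the value changes.
    public class ConfigStoreClient implements Watcher {

        private final ZooKeeper zk;

        public ConfigStoreClient(String connectString) throws Exception {
            // 30s session timeout; this object receives connection/watch events
            this.zk = new ZooKeeper(connectString, 30000, this);
        }

        // Reads a config value and re-arms a watch for future updates.
        public String get(String path) throws Exception {
            byte[] data = zk.getData(path, this, null);
            return new String(data, "UTF-8");
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                System.out.println("Config changed: " + event.getPath());
            }
        }

        public static void main(String[] args) throws Exception {
            ConfigStoreClient client = new ConfigStoreClient("zk1:2181,zk2:2181");
            System.out.println(client.get("/config/cluster/replication-factor"));
        }
    }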
Grad Projects
Similarity Detection in Wikipedia Articles : Finding the top-N most similar documents / articles in a large corpus (Wikipedia articles) for a given article of user preference.
● Used topic modeling (tf-idf) and cosine similarity (see the sketch below).
● The algorithm was implemented as a MapReduce program and run on a Hadoop cluster for experimentation.
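For illustration, a minimal sketch of the similarity measure, treating each document as a sparse map of tf-idf term weights; the MapReduce driver that distributed this over the Wikipedia corpus is omitted:

    import java.util.HashMap;
    import java.util.Map;

    // Documents are represented as sparse tf-idf weight vectors
    // (word -> weight maps) and compared with cosine similarity.
    public class CosineSimilarity {

        // cos(a, b) = dot(a, b) / (|a| * |b|) over sparse term-weight vectors.
        static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                normA += e.getValue() * e.getValue();
                Double w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;
            }
            for (double w : b.values()) normB += w * w;
            if (normA == 0 || normB == 0) return 0.0;
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }

        public static void main(String[] args) {
            Map<String, Double> doc1 = new HashMap<>();
            doc1.put("hadoop", 0.8); doc1.put("cluster", 0.3);
            Map<String, Double> doc2 = new HashMap<>();
            doc2.put("hadoop", 0.5); doc2.put("hive", 0.6);
            System.out.println(cosine(doc1, doc2));   // similarity in [0, 1]
        }
    }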
Fault Tolerant Beowulf Cluster : Implemented a fault-tolerant Beowulf cluster to perform parallel computation using MPI, with a heartbeat mechanism to detect node failures.
Areas of Interest
● Data Engineering / Sciences
● Big Data Analytics
● Machine learning / Neural Networks