Krupali Raninga
+1-401-***-**** ********@*****.***
LinkedIn: https://www.linkedin.com/in/kraninga/
EXPERIENCE SUMMARY
More than 19 years of experience building distributed, scalable, and complex applications using Java, J2EE, and Big Data technologies.
In-depth understanding of Spark architecture and fine-tuning of Spark jobs.
Experience with Azure data services such as Azure Data Factory, Azure Data Lake Storage Gen2 (ADLS), Azure Blob Storage, Key Vault, Synapse, and Azure SQL DB.
Expertise with Oracle 10 and later in writing complex SQL queries and stored procedures using PL/SQL.
Experience in optimizing database performance, including indexing, query optimization, and troubleshooting performance issues.
In-depth understanding of Hadoop architecture and its components.
More than 10 years of experience as a senior Big Data developer and technology lead with a good understanding of the Hadoop ecosystem: MapReduce, Hive, Sqoop, Oozie, Storm, Flume, HBase, Pig, Cassandra, MongoDB, DynamoDB, Phoenix, Impala, and Microsoft Azure services.
Experience in setting up Hadoop clusters using open-source and vendor-specific distributions such as Hortonworks.
Strong experience working on Apache, Hortonworks, and Cloudera Hadoop distributions.
Strong experience with Microsoft Azure services and tools such as Infoworks, Azure Data Factory, Databricks, Azure Blob Storage, Data Lake (Gen1, Gen2), Logic Apps, Azure Functions, Cosmos DB, and Microsoft Fabric.
Strong experience working on Presto, a distributed SQL query engine for running interactive analytic queries.
Strong experience working on Tellius, a natural-language query analytics tool.
Strong understanding of MapReduce programming and experience in analyzing data using MapReduce and HiveQL.
Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
Good experience in monitoring and managing the Hadoop cluster.
Very good knowledge of writing Hive UDFs and queries.
Expertise in importing/exporting data between HDFS and relational databases using Sqoop and Spark SQL (see the sketch at the end of this summary).
Good working knowledge of Amazon Web Services components such as EC2, SQS, SNS, DynamoDB, and S3.
Managed data extraction for the ETL data warehouse and applied transformation rules as necessary for data consistency.
Experienced with Struts, Spring, EJB, Hibernate, JMS, SOA, SOAP, RESTful web services, design patterns, JBoss, Apache Tomcat, Jetty, JSP, JSF, Tiles, and Servlets.
Experienced with Jenkins, Hudson, SVN, Git, CVS, JIRA, Bugzilla, Python, and the ANT build tool.
Experienced with XML-related technologies such as XML, XSLT, XSD, DOM, and SAX.
Hands on experience in using IDEs like Eclipse, NetBeans and JBuilder.
Excellent written and verbal communication skills, interpersonal skills, and a self-learning attitude.
Worked extensively in Linux environments, including writing shell scripts.
Excellent debugging and troubleshooting skills.
Extensive experience in all phases of Software Development Life Cycle (SDLC) including identification of business needs and constraints, collection of requirements, detailed design, implementation, testing, deployment and maintenance.
Led teams of 7+ and was heavily involved in recruiting team members for more than 6 years.
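A minimal PySpark sketch of the Spark SQL import route mentioned above, assuming a hypothetical Oracle endpoint, source table, and HDFS path; Sqoop covers the same pattern from the command line.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-to-hdfs-sketch").getOrCreate()

# Pull a relational table over JDBC (hypothetical connection details).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "****")          # supply via a secret store in practice
    .option("fetchsize", "10000")
    .load()
)

# Land the data in HDFS as Parquet for downstream Hive/Spark consumption.
orders.write.mode("overwrite").parquet("hdfs:///data/raw/sales/orders")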
SKILLS & INTERESTS
Hadoop: Microsoft Azure, Data Factory, Infoworks, Databricks, Azure Blob Storage, Azure Data Lake, Azure Logic Apps, Azure Functions, MapReduce, HBase, Impala, Pig, Sqoop, Hive, Phoenix, ZooKeeper, Storm, Cassandra, Flume, Oozie, HDInsight, DynamoDB, Spark SQL
Languages: Java, XML, PL/SQL, Python, shell scripts
Technologies: J2EE, EJB 3.0, JDBC
Web services: SOAP and RESTful
Frameworks: Spring, Struts
ORM Tools: Hibernate
Reporting Tools: Tableau, BIRT (Business Intelligence Reporting Tool)
Test Frameworks: JUnit, Selenium
Software tools & utilities: Jenkins, SVN, VSS, CVS, Git, Eclipse
Web Servers/App. Servers: WebLogic, WebSphere, Apache Tomcat, JBoss 4.x, and Jetty
Web UI: JSP, JavaScript, jQuery, HTML5, CSS3, and Python
Databases: Oracle, MySQL, Teradata
NoSQL Databases: HBase, DynamoDB, Cassandra, Cosmos DB, and MongoDB
Operating Systems: Linux and Windows
Methodologies: Waterfall and Agile
PROJECTS AND WORK EXPERIENCE
Citibank
Senior Data Engineer Jan 2025-Present
Project Description:
Some of the challenges faced in the current framework are monolithic code, constant data growth, late-arriving data, constant addition of columns, and inconsistent design. To overcome these challenges, Citi decided to create a standardized framework that requires minimal administrative action, deploys new features through the existing CI/CD pipelines, provides a mechanism to reuse query results across multiple features, and secures features based on security groups.
Roles and Responsibilities
Responsible for implementing data transformation rules on newly added columns and including them in regulatory and non-regulatory reports.
Optimize standardization-layer performance using Spark tuning techniques (see the sketch after this list).
Migrated several modules from Spark 2 to Spark 3 and integrated them with other modules.
Collaborate with data analysts and testing teams to understand requirements, develop and deliver end to end, and provide relevant solutions.
Develop on Azure services such as Infoworks, Data Factory, Azure Databricks, and Log Analytics using Python/PySpark.
Troubleshoot, analyze, and resolve data mismatches between multiple systems, collaborating with various teams to fix issues.
Ensure data quality, lineage, and consistency.
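A minimal PySpark sketch of the kind of Spark tuning referenced above, assuming hypothetical paths, column names, and settings: Spark 3 adaptive query execution, an explicit broadcast join for a small lookup table, and partition-aware writes.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("standardization-tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")                    # Spark 3 AQE
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.shuffle.partitions", "400")                   # tuned per data volume
    .getOrCreate()
)

facts = spark.read.parquet("/data/standardized/facts")    # hypothetical large input
dims = spark.read.parquet("/data/reference/dims")         # small lookup table

# Broadcast the small side to avoid a shuffle join.
joined = facts.join(broadcast(dims), on="dim_key", how="left")

# Write partitioned by business date so downstream reports can prune partitions.
(joined
 .repartition("business_date")
 .write.mode("overwrite")
 .partitionBy("business_date")
 .parquet("/data/standardized/output"))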
PepsiCo Inc
Senior Data Engineer Sep 2019-Dec 2024
Project Description:
The business team wants to understand patterns and trends in the data to gain insights that help improve sales and inform business strategy. To achieve this, the business wants to run statistical and ML algorithms on huge data sets residing in the data lake. Currently, the data is scattered across various locations and systems.
Roles and Responsibilities
Responsible for architecting, designing, and implementing data ingestion pipelines for batch, real-time, and streaming data.
Resolved a complex performance issue by optimizing SQL queries and implementing appropriate indexing strategies.
Managed an Oracle database with 1TB of data supporting over 10,000 users.
Developed and managed data pipelines using tools such as Azure Data Factory, Azure Synapse Analytics, and other ETL tools.
Developed on Azure services such as Infoworks, Data Factory, Azure Databricks, and Log Analytics using Python/PySpark.
Responsible for data wrangling, advanced analytics modeling, and AI/ML capabilities.
Collaborate with data scientists, analysts, and business stakeholders to understand their data needs and provide relevant solutions.
Built data ingestion pipelines to bring data from various sources, in the form of flat files or EDW tables, into the Azure data lake using Infoworks and Azure Data Factory.
Troubleshot and resolved data-related issues and performance bottlenecks.
Parsed JSON-formatted data into flat structures and stored it in Infoworks/the data lake for consumption (see the sketch after this list).
Created schemas and views in Presto for business access. Translated business needs into Tellius dashboards and insights and presented them to senior leaders.
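A minimal PySpark sketch of flattening nested JSON into a tabular structure before landing it in the data lake, as described above, assuming a hypothetical schema, storage account, and container names.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("json-flatten-sketch").getOrCreate()

# Read raw JSON from ADLS Gen2 (hypothetical storage account and container).
raw = spark.read.json("abfss://raw@storageaccount.dfs.core.windows.net/sales/")

# Unnest the line-item array and project a flat set of columns.
flat = (
    raw
    .withColumn("line_item", explode(col("order.line_items")))
    .select(
        col("order.order_id").alias("order_id"),
        col("order.customer.id").alias("customer_id"),
        col("line_item.sku").alias("sku"),
        col("line_item.quantity").alias("quantity"),
    )
)

flat.write.mode("append").parquet(
    "abfss://curated@storageaccount.dfs.core.windows.net/sales_flat/"
)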
JP Morgan Chase March 2019 – Aug 2019
Big Data Project Lead
Project Description
Global Identity & Access Management has an internal system to generate reports on requests coming from internal and external audit teams. Since the data is located in different sources, it takes a good amount of time to generate a single report in response to a query. Also, the manually generated reports are not accurate and provide misleading information. As a solution to this problem, all the data needs to be centrally located in the Hadoop ecosystem, with a UI portal built on top of it to generate reports for internal and external audit systems. The data ingestion process is termed the Historical Access Archive and the reporting system is termed the Self Service Audit Portal.
Roles and Responsibilities
●Responsible for project delivery.
●Understood the business requirements and was involved in evaluating the right technology stack for the project.
●Created conceptual and detailed design diagrams in Visio for the Historical Access Archive.
●Worked on implementation in the Cloudera cluster owned by JPMC's CSOrion team.
●Worked closely with Product Managers, Data Modeling teams, Business Analysts, and testing teams.
●Created the low-level and high-level design documents in Confluence.
●Wrote Sqoop scripts to fetch data from the RDBMS source and store it in the CSOrion data lake in Parquet file format.
●Loaded data from the RDBMS (Oracle) source to HDFS and then from HDFS into Hive tables (see the sketch after this list).
●Participated in daily Scrum and Sprint Planning/Retrospective meetings.
●Developed the application in the Linux development environment, tested it in the Linux test environment, and moved it to Production.
●As the technical lead, took responsibility for all releases and post-release validations.
●Documented the changes made during each release.
●Brought the entire team up to speed by ramping them up on the required skill set and ongoing tasks.
●Took the lead in completing processes end to end and integrating with multiple teams.
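A minimal PySpark sketch of the HDFS-to-Hive load step mentioned above, assuming hypothetical paths and a hypothetical Hive database/table; the upstream pull from Oracle in this project used Sqoop.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-to-hive-sketch")
    .enableHiveSupport()      # required to write managed Hive tables
    .getOrCreate()
)

# Read the Parquet files landed in the data lake (hypothetical path).
access_events = spark.read.parquet("hdfs:///data/csorion/haa/access_events")

# Load into a partitioned Hive table for the audit portal to query (hypothetical names).
(access_events
 .write.mode("overwrite")
 .partitionBy("event_date")
 .saveAsTable("haa.access_events"))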
Blue Cross Blue Shield - Infosys May 2015- March 2019
Hadoop Technical Lead
Project Description
Government programs receive huge volumes of data from different vendors and sources. With the increasing data, it is difficult to analyze whether each provider/vendor has sent the files (Medicare/Medicaid claims data) on a timely basis. Also, the incoming data from providers needs to be processed and sent to the vendors in the form of extracts with specific information.
Worked on the GPD (Government Program Data) and ICC projects. HCSC (Health Care Service Corporation) receives different types of membership and claims files (Medicare and Medicaid) from different vendors and sources. HCSC processes these files for analytics teams and stores the data in Hive and HBase tables. The processed data is stored in history, current-snapshot, and CDC tables to retain the latest information for each subscriber's claim details and is sent to the vendors in JSON format (see the current-snapshot sketch after the tools list below).
ICC receives claims data from Blue Chip and TMG vendors, processes the data, and generates professional and institutional claims. The generated files are validated against Edifecs.
Roles and Responsibilities
●Responsible for project delivery.
●Understood the business requirements and was involved in the design and development of the project.
●Worked on implementation in a Hortonworks 2.6 cluster.
●Worked closely with Product Managers, Data Modeling teams, Business Analysts, and testing teams.
●Involved in the low-level and high-level design.
●Worked on Pig scripts in Java to process event data (XML, text, and CSV formats) and store it in HBase/Hive tables in ORC, Sequence, and XML formats.
●Wrote shell scripts to process jobs and scheduled them using the ASK Zena scheduler tool. Configured and created both event-based and time-based jobs.
●Loaded data from Teradata to HDFS and then from HDFS into Hive tables.
●Participated in daily Scrum and Sprint Planning/Retrospective meetings.
●Developed the application in the Linux development environment, tested it in the Linux test environment, and moved it to Production.
●As the technical lead, took responsibility for all releases and post-release validations.
●Documented the changes made during each release.
●Brought the entire team up to speed by conducting sessions on using Git and ramping them up on the required skill set.
●Took the lead in completing processes end to end and integrating with multiple teams.
Tools/Technologies: Hortonworks 2.6, Pig, Phoenix 4.7, Hive, HBase, Zena scheduling tool, shell scripts, and Python
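A minimal PySpark sketch of the current-snapshot pattern referenced in the project description, keeping the latest record per subscriber claim with a window function; the table and column names are hypothetical, and the production pipeline itself used Pig, Hive, and HBase.

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = (
    SparkSession.builder
    .appName("current-snapshot-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

history = spark.table("claims_history")   # hypothetical history table

# Rank records per subscriber claim by load timestamp and keep the most recent one.
w = Window.partitionBy("subscriber_id", "claim_id").orderBy(col("load_ts").desc())
current = (
    history
    .withColumn("rn", row_number().over(w))
    .filter(col("rn") == 1)
    .drop("rn")
)

current.write.mode("overwrite").saveAsTable("claims_current_snapshot")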
Merck - Infosys April 2016 to November 2016
Hadoop Developer
Project Description
Merck data scientists want to analyze products, chemicals, their usage, their impact across variations in proportion, and aggregated information. This problem statement was addressed with visualizations built to work on the Hadoop environment over Hive tables.
Roles and Responsibilities
●Worked on a live 50-node Hadoop cluster running Hortonworks Hadoop.
●Developed visualizations using Tableau over Hive tables, which were used for data analysis by data scientists.
●Created custom UDAFs using Java and used them in the MRD process.
Tools/Technologies: Hive, Unix scripts, Sqoop, Tableau, Hortonworks Hadoop
Fidelity Investments - Infosys March 2015- Dec 2015
Hadoop Lead Developer
Project Description
Fidelity's business users wanted aggregated information over a period of time on millions of IPs in one go. This problem statement was addressed with the Model Ready Data (MRD) process, which was built to work on the Hadoop environment over Hive tables.
Roles and Responsibilities
●Worked on a live 87-node Hadoop cluster running Cloudera Hadoop.
●Developed the Model Ready Data process using Hive queries, Java, and Unix scripts, which is used by the business for modeling and scoring.
●Created custom UDAFs using Java and used them in the MRD process.
●Performed unit and integration testing of the MRD process using Hive queries.
●Performed unit and integration testing of the monthly MID-to-IP and ABT processes, which are monthly and daily aggregations of data run via Control-M jobs.
Tools/Technologies: Hive, Unix scripts, Control-M, Cloudera Hadoop 5.4
Infosys December 2014- March 2015
Hadoop Lead Developer
Project Description
Produce meaningful and consumable key insights into the client's actions and behaviors in order to build a more comprehensive and holistic understanding, thereby increasing the productivity of the FFAS business and ultimately driving sales.
Roles and Responsibilities
●Requirements gathering and clarification from clients.
●Coordinated with the offshore team to explain and cross-verify the requirements.
●Solved technical challenges faced by the offshore team in deploying/running the use cases.
●Developed reports using Tableau and Hive/Impala.
●Worked on establishing the environment by installing the Hadoop components needed for the use cases.
●Code review and deployment.
●Showcased the use cases to the clients in business meetings.
Tools/Technologies: MapReduce, Hive, Impala, Tableau, NLTK, Oozie, Sqoop
Infosys July 2013-October 2014
Hadoop Developer
Project Description
The client was interested in developing new features, functionality, and enhancements to their current eServices using a Hadoop architecture in a cloud environment. The objective was to enhance customer experience, performance, and scalability, and to provide improved eService platform agility.
Roles and Responsibilities
●Worked on the architecture design to replace 2-3 components built in Java with a Storm component.
●Designed and developed RESTful web services to load and fetch data to/from HBase.
●Wrote MapReduce programs to read huge log files and push the data into Hive.
●Explored components such as Apache Kafka, Apache Flume, and Nagios to check whether they satisfied the requirements.
●Administered the 10-node cluster and made changes to add/remove nodes and HDFS directories.
●Configured ZooKeeper (added an extra node) to check consistency.
Tools/Technologies: MapReduce, Hive, HBase, RESTful web services, Apache Storm, Apache Kafka, Apache Flume, Nagios, ZooKeeper
Infosys July 2012-June 2013
Big Data Developer
Project Description
The project represents the first enterprise set of foundational capabilities to enable unified access to any digital information, in any format (structured and unstructured), in any information store. The program makes the set of federated content stores look and act like a single system, so the user need not be aware of where the data is stored.
Roles and Responsibilities
●Involved in sprint planning and estimation.
●Involved in developing all modules using the Spring Framework and Amazon Web Services to create/update/delete data in DynamoDB.
●Developed unit and integration test cases using JUnit and Grizzly.
●Created sample test cases for the testing team using Selenium.
●Developed JSON and XML data adapters for transforming data sets.
●Developed data adapters for DynamoDB and the S3 file system (see the sketch after this project).
●Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
●Responsible for managing data coming from different sources and applications.
●Designed and developed UI screens for clients to interact with the system.
Language: Core/Advanced Java
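The adapters in this project were built in Java with Spring and the AWS SDK; the following is a minimal Python/boto3 sketch of the kind of DynamoDB/S3 data adapter described above, with hypothetical table, bucket, and key names.

import json
import boto3  # AWS SDK for Python; illustrative stand-in for the Java SDK used in the project

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

table = dynamodb.Table("content_index")   # hypothetical table name

def put_record(item_id, metadata):
    """Create or update an item's metadata in DynamoDB."""
    table.put_item(Item={"item_id": item_id, **metadata})

def get_record(item_id):
    """Fetch an item's metadata back from DynamoDB."""
    return table.get_item(Key={"item_id": item_id}).get("Item")

def store_payload(bucket, key, payload):
    """Store the raw document payload in S3 as JSON."""
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8"))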
Infosys April 2012-June 2012
Technology Lead
Roles and Responsibilities
●ICA and AH are among northern Europe's leading retail companies. Both use a common Java-based application deployed on a WAS 6.1 server and planned to upgrade to IBM WAS 8. We performed a POC on two modules, EMS Buying and DCR, for this upgrade.
●Worked on building a Java-based tool to find incompatibilities in the Java-based application.
●Worked on building UNIX scripts for profiling the WAS server deployed in a Unix environment, deploying the EAR application on WAS, etc.
●Mentored a team of 6 members.
●Researched incompatibilities among different versions of J2EE technologies.
Infosys July 2011-April 2012
Technology Analyst
Roles and Responsibilities
●Diageo is the world's leading premium drinks business.
●The project involved building web services to support CRUD operations on Diageo's live data.
●Prepared high-level and low-level design documents.
●Developed web services to support CRUD operations on live data.
●Unit-tested the web services using JUnit.
●Performed integration testing using JMeter.
Infosys January 2011-June 2011
Technology Analyst
Roles and Responsibilities
●AKBank was an integration project in which the client wanted to replace their legacy systems with Finacle (an Infosys product) using OSB (Oracle Service Bus) as middleware.
●Did the detailed design of the services.
●Developed services
●Resolved technical issues in integration testing
●Did unit and integration testing with multiple teams
Cognizant December 2009-December 2010
Technology Analyst
Roles and Responsibilities
●Involved in the technical analysis of all use cases for the project "CRM for RMIC (Customer Relationship Management for Relationship Managers of Institutional Clients)".
●Developed use cases using OBP and MCP frameworks defined by Credit Suisse.
●Responsible for unit and integration testing
Argusoft June 2006-November 2009
Senior Developer
Roles and Responsibilities
●Part of the analysis and design of UI component development using the JSF framework.
●Involved in tool and technology selection for the UI development.
●Developed all JSF pages.
●Developed ANT scripts for WAR generation.
●Responsible for integration with the other layers of the architecture.
●Worked on the webMethods layer in the project.
●Involved in the integration and unit testing of the modules
●Technologies used: JDK 1.5/1.6, J2EE, JSF, XML Schema/XSD/XSLT, WSDL, NetBeans as the IDE, webMethods SOA Suite (Developer, Designer, Blaze), J2EE patterns, Tiles, JSTL, Oracle as the database, Toad, SQL Developer, KonaKart, MS Visio.
●Involved in each phase of the Spring MVC web architecture, i.e.:
●Hibernate managed bean creation (domain model layer).
●Responsible for integrating the web layer with the business layers.
●Involved in report creation and design using iReport.
●Technologies involved: Java/J2EE, JSP, Spring WS, XML Schema/XSD/XSLT, XBRL, NetBeans as the IDE, Oracle as the database, GlassFish app server, Toad, MS Visio, Windows XP, Windows Mobile 6.
●Involved in report creation and design using the Business Intelligence Reporting Tool (BIRT).
●Hands-on in creating many services in the Financial Accounting and Asset Management modules.
●Developed JSPs for almost all modules
●Involved in integration and unit testing of modules
CERTIFICATIONS
Cloudera Certified Hadoop Programmer
Cloudera Certified Hadoop Administrator
Sun Certified Java Programmer (JDK 1.5)
EDUCATION
Atmiya Institute of Technology & Science - Saurashtra University
B.E. (Computer Engineering), 2006 with 68% aggregate