Nilanjan Mitra
Mobile: 972-***-****
Email: ********.*******@*****.***
PROFILE SUMMARY:
Motivated IT professional with 14+ years of experience in data warehousing technologies as a Tech Lead, Software Developer, and Designer, including 3 years of hands-on experience with Big Data Hadoop (Cloudera distributions CDH 4 and 5) technologies such as Hive, Sqoop, Impala, HDFS, Apache Spark, and Flume.
PROFESSIONAL EXPERTISE:
Experience in team leadership, software development, and design.
Diverse domain expertise including Manufacturing, Health Care, Life Science, Banking, and Retail.
Experience in data warehousing technologies such as Informatica PowerCenter.
Good experience in Oracle 9i, 10g, and 11g.
Experience in all areas of project life cycle using both proprietary methodologies and Agile Techniques.
Solid expertise in Hadoop internals, architecture, and supporting ecosystem components such as Hive, Spark, Sqoop, Pig, Impala, and Flume.
Adept at HiveQL, with good experience in time-based partitioning, dynamic partitioning, and bucketing to optimize Hive queries; also used Hive's MapJoin to speed up queries where possible (see the sketch after this list).
Used Hive to create tables in both delimited text and binary storage formats.
Excellent working experience with the two popular Hadoop binary storage formats, Avro data files and SequenceFiles.
Experience developing Hive UDAFs to apply custom aggregation logic.
Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa, including Sqoop direct mode with external tables to perform very fast data loads.
Used Airflow DAGs to build workflows that schedule and execute various Hadoop jobs such as Spark, Hive, Pig, and Sqoop operations.
Good knowledge on Spark, Python and Scala.
Good conceptual understanding and experience in cloud computing applications using Amazon EC2, S3, EMR.
Proven ability to work under pressure, prioritize, and meet deadlines. Open to dynamic work environments, with the ability to work collaboratively with business analysts, testers, developers, and other team members to enhance overall product quality.
Strong business acumen, strategic thinking, communication, interpersonal and presentation skills, adept at resolving conflicts.
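A minimal PySpark sketch of the Hive optimizations described above (time-based partitioning, bucketing, and a map-side join); the table and column names (sales_fact, dim_store, store_id, order_dt) are illustrative assumptions, not drawn from any specific engagement:

    # Sketch: partitioned Hive table, bucketed copy, and a map-side (broadcast) join.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (SparkSession.builder
             .appName("hive-optimization-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Time-based partitioning: queries that filter on order_dt prune partitions.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_fact (
            order_id BIGINT, store_id INT, amount DOUBLE)
        PARTITIONED BY (order_dt STRING)
        STORED AS ORC
    """)

    # Bucketing on the join key (Spark's native equivalent of Hive's
    # CLUSTERED BY ... INTO 32 BUCKETS) supports bucketed map-side joins.
    (spark.table("sales_fact").write
          .bucketBy(32, "store_id").sortBy("store_id")
          .saveAsTable("sales_fact_bucketed"))

    # Map-side join: broadcast the small dimension table to every executor
    # instead of shuffling the large fact table (the effect of Hive's MAPJOIN hint).
    fact = spark.table("sales_fact").where("order_dt = '2017-01-01'")
    dim = spark.table("dim_store")  # assumed small dimension: store_id, store_name
    fact.join(broadcast(dim), "store_id").groupBy("store_name").sum("amount").show()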
TECHNICAL SKILLS:
Hadoop Ecosystem Technologies
HDFS, MapReduce, Hive, Pig, Sqoop, Spark, Impala, Flume, Oozie, Airflow DAGs
Programming Languages
SQL and PL/SQL, Python, Scala
Operating Systems
Windows 98/XP/2000/NT/Vista, Unix
RDBMS Databases
Oracle 9i, Oracle 10g, Oracle 11g, DB2, Sybase, SQL Server, Netezza
Scripting Language
Shell Scripting
Tools
TOAD, SQL Developer, ANT, Maven, Visio, Informatica PowerCenter, SVN, Bitbucket, Control-M, Autosys, Workload Automation
WORK EXPERIENCE:
BANK OF AMERICA OCT’ 17 – PRESENT
Domain: BANKING
Technology: HADOOP TECHNOLOGIES (SPARK, HIVE, IMPALA, SQOOP, OOZIE), ORACLE, NETEZZA, AUTOSYS, UNIX
Senior Hadoop Developer
Bank of America Corporation is a multinational banking and financial services corporation, which provides its products and services through 4,600 retail financial centers, call centers, and online and mobile banking platforms. The bank's Consumer Real Estate Services segment offers consumer real estate products comprising both fixed and adjustable-rate first-lien mortgage loans for home purchase and refinancing needs, home equity lines of credit, and home equity loans.
The Wholesale Credit Data Store DQ framework was created to meet a demand for EDM-compliant, enterprise-level data quality measurement and to provide a standardized way to implement controls and rules. Its core capabilities consist of a common set of model components and patterns that can be extended to implement complex process controls and data quality measurements. The framework is implemented on heterogeneous platforms and provides entities to support retention of measurements, results, defect details, summaries of results at different aggregation levels, and DQ metric calculations. It includes a design-time rule designer/code generator to automate rule implementation and a runtime capability to manage activation of rules, changes of thresholds, and alert subscriptions.
RESPONSIBILITIES:
Building a Data Quality framework, which consists of a common set of model components and patterns that can be extended to implement complex process controls and data quality measurements using Hadoop.
Created and populated bucketed tables in Hive to enable faster map-side joins, more efficient jobs, and more efficient sampling; also partitioned data to optimize Hive queries.
Worked extensively with Sqoop to move data from NETEZZA and ORACLE to HDFS.
Used the Oozie workflow engine to schedule multiple Sqoop and Hive jobs that run independently based on time and data availability.
Used Spark SQL functions to move data from stage Hive tables to fact and dimension tables in HDFS, implementing CDC logic, and performed interactive querying (see the sketch after this list).
Managing and scheduling Jobs on a Hadoop cluster using Oozie.
Implemented dynamic partitioning in Hive tables and used appropriate file formats and compression techniques to improve the performance of MapReduce jobs.
Managed and reviewed Hadoop log files generated through YARN.
Work with Data Engineering Platform team to plan and deploy new Hadoop Environments and expand existing Hadoop clusters.
Monitor Autosys jobs and resolve issues in case of failure.
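A minimal PySpark sketch of the stage-to-dimension CDC load mentioned above; the table and column names (stage_customer, dim_customer, customer_id) are illustrative assumptions, with the stage table assumed to hold the raw attributes and the dimension assumed to carry a matching row_hash column:

    # Sketch: detect new/changed rows in the stage table and rebuild the dimension.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cdc-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    stage = spark.table("stage_customer")
    dim = spark.table("dim_customer")

    # Hash the tracked attributes so changed rows can be detected cheaply.
    attrs = ["name", "segment", "region"]
    stage_h = stage.withColumn("row_hash", F.sha2(F.concat_ws("|", *attrs), 256))

    # Delta = stage rows whose key is new, or whose attribute hash changed.
    cond = F.col("s.customer_id") == F.col("d.customer_id")
    delta = (stage_h.alias("s")
             .join(dim.alias("d"), cond, "left_outer")
             .where(F.col("d.customer_id").isNull() |
                    (F.col("s.row_hash") != F.col("d.row_hash")))
             .select("s.*"))

    # Keep the unchanged dimension rows and append the delta.
    unchanged = dim.join(delta.select("customer_id"), "customer_id", "left_anti")
    (unchanged.unionByName(delta)
              .write.mode("overwrite").saveAsTable("dim_customer_new"))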
NIKE INC JUN’ 16 – SEP’ 17
Domain: RETAIL
Technology: HADOOP TECHNOLOGIES (SPARK, HIVE, IMPALA, SQOOP), INFORMATICA 9.1, ORACLE, AUTOSYS, UNIX
Senior Hadoop Developer
Nike, Inc. is an American multinational corporation that is engaged in the design, development, manufacturing and worldwide marketing and sales of footwear, apparel, equipment, accessories and services.
Consumer Knowledge is part of Consumer Digital Technologies (CDT). It exists to enable Direct-to-Consumer (DTC) and Global Consumer Knowledge COE (GCK) data scientists and analysts with the platforms, tools, and data to deeply understand consumer behavior so that they can inform the strategy for Nike's consumer-facing digital products and experiences. The objective of the project is to enhance and expand Nike's ability to gather, analyze, and leverage insights on its consumers in order to deepen understanding and deliver personalized experiences.
RESPONSIBILITIES:
Loaded data from the UNIX file system into HDFS; also performed parallel transfers of data from the landing zone to HDFS using DistCp.
Loaded and transformed large sets of structured and semi-structured data via Sqoop and staged them in HDFS for further processing.
Designed appropriate partitioning/bucketing schema to allow faster data retrieval during analysis using HIVE.
Processed data in Hive tables using high-performance, low-latency HQL queries.
Transferred analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize analytics.
Developed custom aggregate functions using Spark SQL and performed interactive querying.
Managed and scheduled jobs on a Hadoop cluster using Airflow DAGs (see the sketch after this list).
Created Hive tables, loaded data, and ran Hive queries on that data.
Extensive working knowledge of partitioned tables, UDFs, performance tuning, and compression-related properties in Hive.
Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Monitored Autosys jobs and resolved issues in case of failure.
Deployed Informatica objects to the production repository.
Monitored and debugged Informatica components in case of failure or performance issues.
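A minimal Airflow DAG sketch of the kind of scheduling described above, chaining a Sqoop import into a Hive transformation; the connection string, table names, and script paths are illustrative assumptions:

    # Sketch: daily DAG that lands Oracle data with Sqoop, then transforms it in Hive.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {"owner": "data-eng", "retries": 2,
                    "retry_delay": timedelta(minutes=5)}

    dag = DAG(dag_id="daily_sales_ingest",
              default_args=default_args,
              start_date=datetime(2017, 1, 1),
              schedule_interval="@daily")

    # Pull the day's rows from Oracle into HDFS with Sqoop ({{ ds }} is the run date).
    sqoop_import = BashOperator(
        task_id="sqoop_import_sales",
        bash_command=("sqoop import --connect jdbc:oracle:thin:@dbhost:1521/ORCL "
                      "--username etl_user --password-file /user/etl/.pwd "
                      "--table SALES --target-dir /data/landing/sales/{{ ds }} -m 4"),
        dag=dag)

    # Transform the landed data with a Hive script once the import succeeds.
    hive_transform = BashOperator(
        task_id="hive_transform_sales",
        bash_command="hive -f /opt/etl/hql/transform_sales.hql --hiveconf load_dt={{ ds }}",
        dag=dag)

    sqoop_import >> hive_transform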
ALCON LABORATORIES (NOVARTIS) JUN’ 15 – MAY’16
Domain: LIFE SCIENCE
Technology: INFORMATICA 9.1, SQL SERVER, CONTROL-M, HADOOP TECHNOLOGIES (SPARK, HIVE, IMPALA, SQOOP)
Hadoop Developer
Alcon, the second largest division of Novartis, is the global leader in eye care, operating in more than 75 countries, serving 180 markets, and reaching more than 90 percent of the globe.
The EnVision Project integrates all critical commercial information to provide better insights, analysis, and fact-based decision making to increase commercial effectiveness. This is implemented through a single, global data warehouse providing one consistent version of the truth from executives to sales representatives. CDW envisaged building a next-generation DW/BI platform by integrating data from different sources, building conformed dimensions, establishing a scalable, flexible architecture, and building a Business Objects reporting platform to standardize Mobile BI and ad hoc reports on the Pharmaceutical, Surgical, and Vision Care sales and marketing business areas.
RESPONSIBILITIES:
Moved data from Oracle to HDFS and vice versa using Sqoop.
Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
Worked with different file formats and compression techniques to determine standards.
Developed Hive queries to analyze and transform data in HDFS.
Designed and implemented multi-level partitioning and bucketing in Hive (see the sketch after this list).
Analyzed and transformed data with Impala and Hive.
Gathered client requirements and wrote techno-functional requirement documents.
Reviewed design and development artifacts to ensure quality in the products being developed.
Performed analysis for various enhancements, including impact analysis to identify the systems/programs potentially affected by proposed changes.
Promoted code through the various stages (SIT, UAT, and PROD).
Estimated effort for change requests.
Provided recommendations and technical solutions for improving processes.
Interacted with the customer for regular status updates.
Coordinated effectively with the offshore team and managed project deliverables on time.
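A minimal PySpark sketch of the multi-level/dynamic partitioning, file format, and compression choices described above; the table names, columns, and settings are illustrative assumptions rather than the project's actual configuration:

    # Sketch: multi-level dynamic partitioning into a compressed ORC table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dynamic-partition-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Allow Hive-style dynamic partitioning on all partition columns.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Multi-level partitioning (country, then month) with a splittable,
    # compressed columnar format keeps scans small.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_curated (
            sku STRING, units INT, revenue DOUBLE)
        PARTITIONED BY (country STRING, sales_month STRING)
        STORED AS ORC
        TBLPROPERTIES ('orc.compress' = 'SNAPPY')
    """)

    # Dynamic partition insert: each row is routed to its partition from the
    # values of the trailing country and sales_month columns.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_curated PARTITION (country, sales_month)
        SELECT sku, units, revenue, country, sales_month
        FROM sales_staging
    """)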
Merck & Co. Inc JAN’ 12 – MAY’15
Domain: LIFE SCIENCE
Technology: INFORMATICA 9.1, ORACLE 10g
TECH LEAD
Merck is a global research-driven pharmaceutical company. Established in 1891, Merck discovers, develops, manufactures and markets vaccines and medicines in over 20 therapeutic categories.
Align is a Merck business initiative to replace all Managed Care applications with standardized solutions from a vendor-provided application suite. ALIGN integrates 20+ systems; the technologies involved are Model N, Oracle, Teradata, Informatica, TIBCO, and Cognos.
RESPONSIBILITIES:
Led a project team of 10 people.
Gathered client requirements and wrote techno-functional requirement documents.
Developed high-level designs and reviewed low-level designs for new processes.
Performed code reviews.
Reviewed design and development artifacts to ensure quality in the products being developed.
Performed performance tuning in Informatica.
Promoted code through the various stages (SIT, E2E, and UAT) and supported it until go-live.
Estimated effort for change requests.
Reviewed test case documents.
Provided recommendations and technical solutions for improving processes.
Interacted with the customer for regular status updates.
Eli Lilly and Company JUN’ 11 – DEC’11
Domain: LIFE SCIENCE
Technology: INFORMATICA 8.6, ORACLE 10g, WORKLOAD AUTOMATION
TECH LEAD
Lilly's CRM transformation vision is to “create a single view of the customer to maximize each and every interaction”. Enablers of the vision are standardization, efficiency, replication, innovation, and platform rationalization.
Lilly Business Unit (BU) leaders identified VBiopharma as the global CRM platform, and Cognizant is responsible for migrating around 4,500 of Lilly's US sales users from Siebel/Dendrite-based CRMs to VBiopharma. Cognizant is responsible for integrating multiple internal/external systems that provide master/transactional data with VBiopharma. Eli Lilly also requires Cognizant to integrate around 5 sales partner organizations with VBiopharma.
RESPONSIBILITIES:
Performed the data integration architect role.
Gathered requirements from the Lilly business.
Worked with data stewards and system owners of the master data systems to understand their data models.
Created architecture, strategy, business requirements, and functional requirement documents.
Designed interfaces for real-time integration between salesforce.com and the master data systems.
Reviewed design and development artifacts to ensure quality in the products being developed.
Performed performance tuning in Informatica.
Coordinated interface cutover and scheduling for data integration activities.
Participated in testing for application integration.
Implemented the integrated interfaces through the Workload Automation tool.
HEALTH NET DEC'09 – MAY’11
Domain: INSURANCE
Technology: INFORMATICA 8.6, ORACLE 10g
TEAM LEAD & Senior Developer
The project revolves around creating a product called ClaimSphere. One of the most serious challenges payer enterprises grapple with is making informed decisions from the enormous amounts of data they hold. Enterprise analytics solutions help track and analyze information across the organization. ClaimSphere is Cognizant's enterprise and analytical reporting framework built around the payer data model.
ClaimSphere simplifies the reporting mechanism through an array of systematic and extensive reports and easy-to-assimilate graphical representations. ClaimSphere supports user groups ranging from C-level executives to middle managers and process executives. The key aspect of the solution is that the analysis is user-driven: users can slice and dice vast amounts of data, resulting in improved decision making. Other major highlights include:
Reduced go-to-market time frame through customizable and extensible solution for healthcare payers
Incorporation of best practices aggregated over multiple engagements
ClaimSphere caters to both analytical and operational reports.
Analytical reports are in the form of Key Performance Indicators (KPIs) that enable analysis at various levels of aggregation and detail. The KPIs broadly cover the subject areas such as Aggregate Cost Analytics, Detailed Cost Analytics and Utilization Management Analytics.
Operational reports provide periodic information on various aspects of business performance. The functional areas covered include Claims, Member-Subscriber, Provider, Billing, Commissions, Utilization Management and Pharmacy.
RESPONSIBILITIES:
Performed the data integration architect role.
Worked across different areas of the project to handle the tasks of the assigned role.
Performed data analysis and data profiling.
Structured and maintained the data dictionary for the DW, and reviewed data loaded into the DW for accuracy.
Reviewed referential integrity of DW data.
Reviewed design and development artifacts to ensure quality in the products being developed.
Participated in the definition of technical standards and guidelines for the database, UNIX, and ETL technologies.
Analyzed requirements and made estimations based on the complexity of the objects being handled.
Guided the development team in building aggregate tables and the ETL design.
Performed coding, unit testing, and performance tuning in Informatica.
Involved in project planning, estimation, the requirement traceability matrix, project tracking, execution, and status reporting.
APPLE MAR ’09 – OCT ’09
Domain: MANUFACTURING
Technology: ORACLE PL/SQL, UNIX, Objective C
Module Leader and Senior Developer
This project integrates the network of services, material, and information flows that link Apple's customer relationship, order fulfillment, and supplier relationship processes to those of its suppliers and customers.
It is a support project in which three applications are supported: AFVT, Odyssey, and IBF.
The front-end coding is based on Objective-C and the back end is developed on Oracle 10g. Autosys jobs kept on a UNIX machine are also maintained under this project.
RESPONSIBILITIES:
Performing as the module leader.
Allocating jobs to other resources.
Analysis.
Coding in PL/SQL.
Code review.
NORTHWESTERN MUTUAL APR’08 – FEB’09
Domain: LIFE INSURANCE
Technology: VISUAL BASIC, SYBASE, ACTUATE, INFORMATICA
MODULE LEADER and SENIOR SOFTWARE DEVELOPER
This project covered the solution design and execution of Field Management Compensation - Current Income - Phase 1, executed on the Commission System in the Field Force business area.
The coding was based on Visual Basic, the database used was Sybase, and the toolset included the Actuate reporting tool and the Informatica ETL tool.
Enhancements were made to an application named ‘GABS’ (under the CM Semi-Monthly System) through which all the network offices of NM record and post accounting entries.
A data mart was populated through Informatica, and reporting based on it showed the Lives Bonus compensation for the Managing Partner, Managing Director, and Field Director roles. The reporting tool used was Actuate.
RESPONSIBILITIES:
Performing as the module leader.
Allocating Jobs to the other resources.
Detail level design.
Analysis.
Coding in Actuate and Informatica.
AETNA DEC’07 - APR’08
Domain: HEALTH INSURANCE
Technology: ORACLE 10g
SOFTWARE DEVELOPER
The pharmacy adjudication system is one of the mission-critical systems supporting the customer business units of Aetna. Aetna's business group identifies potential drug manufacturers who, to promote sales, want to put their products on Aetna's formulary (preferred drug list/network). Aetna sets up a contract with each of them for all the products they wish to add to Aetna's formulary. Doctors as well as pharmacies are provided with the preferred drug list (usually online). Doctors can also prescribe a drug outside this preferred list. Whenever a member buys a drug from a pharmacy, the Aetna Pharmacy Management Claim Adjudication System (APMCAS) adjudicates the claim online.
The coding was based on PL/SQL language. The database used was ORACLE 10g.
RESPONSIBILITIES:
Detail level design.
Analysis
Coding.
Unit Testing.
Test Cases Preparation.
Code Review and Unit Test Plan review.
Data set up & Verification of test results.
System Testing
Communicating with the onsite coordinator.
Implementation Related Activities
COMPUTER ASSOCIATES DEC ’06 – OCT ’07
DOMAIN: MANUFACTURING
Technology: JAVA, SQL SERVER, SYBASE, INGRES, TALEND ETL TOOL
Software Developer
This project deals with the data migration of three IT support systems to CA’s own IT support system.
The coding was based on JAVA.
A POC was done with the Talend ETL tool for data migration.
I mainly handled a request named ‘USER ACCOUNT MIGRATION’ and also worked on all implementation-related activities.
RESPONSIBILITIES:
Detail level design.
Analysis
Coding.
Unit Testing.
Test Cases Preparation.
Code Review and Unit Test Plan review.
Data set up & Verification of test results.
System Testing
Communicating with the onsite coordinator.
Implementation Related Activities
APPLE JAN ’05 – DEC ’06
DOMAIN: MANUFACTURING
Technology: UNIX, ORACLE 9i
Software Developer
The Campaign Management System (CMS) maintains and provides business support to the GCA data mart, which gathers and analyzes data from different data sources and empowers strategic planning for added profitability and marketing at Apple.
The work involved analyzing, optimizing, and gathering millions of records in the GCA data mart for campaigns for different Apple products, and streamlining data flow within and to downstream systems.
The GCA data mart is in Oracle 9i, with Affinium Unica for campaigns and Autosys for job scheduling.
The coding was based on the PL/SQL language. The database used was Oracle 9i.
RESPONSIBILITIES:
Code Analysis.
Coding.
Unit Testing.
Code Review and Unit Test Plan review.
Data set up & Verification of test results.
Communicating with the onsite coordinator.
PITNEY BOWES JUN ’04 – DEC ’04
DOMAIN: MANUFACTURING
Technology: ORACLE 9i
TESTER
This project deals with system and integration testing of two Pitney Bowes shipping products: Deliverability and Ascent.
RESPONSIBILITIES:
Test Cases Preparation
System Testing
Integration Testing
Data set up & Verification of test results.
Communicating with the onsite coordinator.
EDUCATION:
Bachelor of Technology, Information Technology, JUN ’99 – MAY ’03
Kalyani University, India