GURUDAS PRADHAN

Tampa, FL ***** 224-***-**** *******.*******@*****.***

https://www.linkedin.com/in/gurudas-pradhan-1835b2140/

Summary:

Gurudas is a Senior Big Data & Cloud Data Platform Architect who has been in the information technology industry for more than 16 years, with a specialization in Hadoop and multi-cloud technologies. Experienced in implementing cloud solutions using various AWS services and skilled in designing, building, and managing Hadoop, AWS EMR, and AWS Redshift clusters across diverse environments. Adept at configuring security for big data and other AWS services, introducing innovative tools and engines, and successfully deploying complex big-data-driven solutions. Experienced in architecting and scaling enterprise-grade data platforms, building high-performing analytics teams, and fostering a culture of data-driven decision-making. Skilled in leveraging cloud technologies and advanced analytics to optimize operations, fuel innovation, and accelerate growth.

Key Accomplishments:

• Led the migration of 500+ applications to a cloud-native architecture and built a unified AWS data platform, streamlining operations, tripling reporting performance, and enabling AI-driven insights.

• Led the successful transition from on-premises Hadoop clusters (CDP) to the cloud (AWS), optimizing infrastructure and reducing operational costs while improving system reliability.

• Designed the architecture and implementation of the NRT use case for the Machine Data (MDP) project, consuming messages originating from machine devices at 10 GB per second using MSK and processing the real-time messages with PySpark.

• Implemented the Dynamic Data Masking (DDM) feature on Redshift tables for the Payroll and TnL businesses to mask sensitive data based on user roles and specific business criteria.

• Architected Redshift producer-consumer models with data sharing to enable seamless, high-performance cross-cluster access at enterprise scale—eliminating manual data movement.

• Led the performance analysis of Redshift clusters, using query execution time, resource utilization, and concurrency metrics to surface critical patterns across peak vs. normal loads and quarter-close periods.

• Designed and configured Redshift WLM queues to optimize resource allocation, defining slot prioritization for critical workloads and ad hoc queries.

• Automated periodic VACUUM and ANALYZE operations on Redshift tables, integrating performance monitoring to proactively resolve long-running queries and maintain optimal query execution (see the sketch after this list).

• Implemented High Availability for Redshift clusters via Multi-Region Replication and multi-AZ deployment, ensuring data resilience, minimal downtime, and uninterrupted business operations.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Implemented monitoring mechanisms using custom scripts, CloudWatch rules, and New Relic alerts, enabling timely intervention and resolution of potential issues.

• Automated the storage and retrieval of logs and alerts using AWS services such as Lambda, SNS, DynamoDB, and CloudWatch.

• Led the end-to-end architecture and migration of legacy Hadoop ecosystems (HDP 2.6.3 & CDH 6.3) to the modern, enterprise-grade CDP 7.1.4 platform, driving enhanced scalability, performance, and maintainability.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Led the design and implementation of the YARN Capacity Scheduler, configuring YARN queues for various LOBs to make optimal use of the CDP cluster.

• Led the deployment of AWS components such as Lambda, SQS, SNS, EC2, API Gateway, EMR, and MSK using Terraform and Jenkins pipelines.

• Hands-on expertise across a broad suite of AWS services (EC2, S3, VPC, Redshift, EMR, EBS, Auto Scaling, IAM, CloudWatch) as well as Snowflake, driving scalable, secure, and high-performance cloud solutions.

• Led the design and implementation of GEHC's Data Marketplace, empowering stakeholders to seamlessly discover, access, and collaborate on shared data assets for operational and analytical excellence.

• Engineered IAM policies and roles, configured S3 buckets and policies, set up VPC endpoints, API Gateway, and Security Groups to ensure robust, secure cloud architecture.

• Designed Snowflake workspace schemas and external stages, and deployed Snowflake tables and views to the schemas during the data migration process.

• Architected and led the migration of on-premises solutions to the cloud and designed S3 and EFS layouts to align with enterprise workflows.

• Extensive hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, HBase, Kafka, Sqoop, Oozie, Ranger, Spark, and ZooKeeper.

• Designed and built HDF clusters (Apache NiFi 3.0); well versed in creating NiFi processors and process groups for data ingestion.

• Led the deployment process of the NiFi .nar files across different environments.

• Collaborated with cross-functional teams to ensure data quality and availability and provided solutions to Architecture Review Board.

• Developed comprehensive documentation including business requirements, technical design, and upgrade procedures to support compliance and audit readiness for Hadoop and cloud platforms.

• Consistently maintained a customer-centric approach, resolving issues promptly, and ensuring customer satisfaction.

• Proven expertise across diverse domains including Finance, Healthcare, Life Sciences, Travel, and Manufacturing.
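
The VACUUM/ANALYZE automation noted above can be illustrated with a minimal sketch. This is not the original implementation: it assumes a scheduled Lambda-style entry point, the Redshift Data API, and placeholder cluster, secret, and table names.

# Illustrative sketch only: periodic VACUUM/ANALYZE on selected Redshift tables
# via the Redshift Data API. Cluster, secret, and table names are placeholders.
import boto3

CLUSTER_ID = "odp-redshift-prod"                                   # assumed cluster identifier
DATABASE = "odp"                                                   # assumed database name
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:000000000000:secret:redshift-admin"  # placeholder
TABLES = ["payroll.timesheet_fact", "tnl.timecard_fact"]           # hypothetical tables

rsd = boto3.client("redshift-data")

def handler(event, context):
    # Submit maintenance statements asynchronously; completion can be polled
    # with describe_statement and surfaced through CloudWatch alarms.
    for table in TABLES:
        for sql in (f"VACUUM DELETE ONLY {table};", f"ANALYZE {table};"):
            stmt = rsd.execute_statement(
                ClusterIdentifier=CLUSTER_ID,
                Database=DATABASE,
                SecretArn=SECRET_ARN,
                Sql=sql,
            )
            print(f"Submitted '{sql}' as statement {stmt['Id']}")

In practice such a job would typically be triggered on a schedule (for example via EventBridge) outside peak and quarter-close windows.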

EDUCATION

Master of Computer Applications

Licenses & certifications

* AWS Certified Solutions Architect – Associate

* AWS Certified Machine Learning - Specialty

* Hortonworks Certified Hadoop Administrator (HDPCA Professional)

* A PMP Certified Professional

* Completed the Big Data Hadoop Foundations certification from IBM – Big Data University

* Completed the Introduction to R certification from IBM – DataCamp

* Snowflake Certified Associate

* IBM Certified Designer for Cognos8 BI Reports

* IBM Certified Cognos8 BI Administrator

* IBM Certified Cognos8 BI Metadata Model Developer

TECHNICAL SKILLS

Cloud Environment Distribution: AWS

Services: EMR, S3, EC2, Glue, Glacier, CloudWatch, CloudTrail, SNS, SQS, DynamoDB, Kinesis, MSK, Athena, IAM, KMS, Lambda, Step Functions, SageMaker, Redshift, RDS, EventBridge, API Gateway

Cloud Data Warehouses: Amazon Redshift, Snowflake (version 3.30.2); Clients: DBeaver SQL Client, SnowSQL CLI

Bigdata Environment Distribution: Cloudera (CDP-7.1.4, CDH – 6.3), Hortonworks (HDP-2.6)

Hadoop Ecosystem: HDFS, MapReduce, Yarn, Hive, HBase, Spark, Tez, Impala, Sqoop, Oozie, Zookeeper, Kafka, Storm, Knox, Ranger, Solr, Ambari, Cloudera Manager, NiFi, Striim

Reporting Tools: PowerBI, IBM-Cognos8.x

Databases: Oracle9.2, Oracle10g, Sybase12, DB2UDB8.2, SQL Server(2005,2008), Teradata12

Programming Languages: C++, C, Python, XML, SQL

Scripting Languages: Shell Script, Perl Script, Python

Version Control Tools: CVS, RCS, SCCS, VSS, IBM Rational ClearCase, MS SharePoint Portal Server, GitHub

Operating Systems: HP-UX (9.04, 10.20, 11.0), Digital UNIX/Tru64 (5.0D), Sun Solaris 8, IBM AIX 5.2, IBM AIX 5.3, Linux

Compilers & Utilities: GNU Compilers (gcc-3.3.2, g++), IBM-Visual age compiler (xlc_r7, xlC_r7), Forte in Solaris-8, Makefile

PROJECT PROFILE:

Project Title: ODP (One Data Platform)

Duration: Jan’2023 – Present

Location: Tampa, Florida, USA

Hardware/Operating Systems: Linux (RHEL 7)

Cloud Technologies: AWS (EMR, S3, EC2, Glue, CloudWatch, SNS, SQS, DynamoDB, MSK, Athena, IAM, Lambda, Redshift, RDS, Step Functions, EventBridge)

Reporting Tools/Databases/Other Utilities: Power BI, shell scripting, Python scripting, DBeaver, MS SQL Server 2008, MySQL, Teradata, Oracle 10g

PROJECT DESCRIPTION:

ODP aims to enable a data-driven organization and optimize the total cost of ownership by building a unified data platform for GEHC. ODP brings together strong enterprise data architecture and governance, near real-time data processing, a next-generation security architecture built on the cloud, and a contemporary big data foundation that allows for scalability and as-a-service offerings.

The following projects are undertaken on the ODP environment.

Finance Revenue Forecasting: The objective of the finance revenue forecasting initiative is to design and implement a standardized finance revenue forecasting process that is centrally owned, automated, and digitalized, providing the right level of detail across the organization for forecasting purposes through the Power BI dashboard. The predictive revenue forecasting model is implemented mainly for the Ultrasound, PCS, and PDx suite of businesses.

Data Marketplace Initiative: The Data Marketplace is a capability that aims to deliver an “Amazon-like” user experience, helping all data stakeholders within the GEHC organization quickly find, access, use, and collaborate on a common set of data assets and products to address their operational and analytical needs.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the migration of 500+ applications to a cloud-native architecture and built a unified AWS data platform, streamlining operations, tripling reporting performance, and enabling AI-driven insights.

• Led the successful transition from on-premises Hadoop clusters (CDP) to the cloud (AWS), optimizing infrastructure and reducing operational costs while improving system reliability.

• Designed the architecture and implementation of the NRT use case for the Machine Data (MDP) project, consuming messages originating from machine devices at 10 GB per second using MSK and processing the real-time messages with PySpark (see the sketch after this list).

• Implemented the Dynamic Data Masking (DDM) feature on Redshift tables for the Payroll and TnL businesses to mask sensitive data based on user roles and specific business criteria.

• Architected Redshift producer-consumer models with data sharing to enable seamless, high-performance cross-cluster access at enterprise scale—eliminating manual data movement.

• Led the performance analysis of Redshift clusters, using query execution time, resource utilization, and concurrency metrics to surface critical patterns across peak vs. normal loads and quarter-close periods.

• Designed and configured Redshift WLM queues to optimize resource allocation, defining slot prioritization for critical workloads and ad hoc queries.

• Automated periodic VACUUM and ANALYZE operations on Redshift tables, integrating performance monitoring to proactively resolve long-running queries and maintain optimal query execution.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Implemented monitoring mechanisms using custom scripts, CloudWatch rules, and New Relic alerts, enabling timely intervention and resolution of potential issues.

• Automated the storage and retrieval of logs and alerts using AWS services such as Lambda, SNS, DynamoDB, and CloudWatch.

• Led the deployment of AWS components such as Lambda, SQS, SNS, EC2, API Gateway, EMR, and MSK using Terraform and Jenkins pipelines.

• Hands-on expertise across a broad suite of AWS services (EC2, S3, VPC, Redshift, EMR, EBS, Auto Scaling, IAM, CloudWatch) as well as Snowflake, driving scalable, secure, and high-performance cloud solutions.

• Led the design and implementation of GEHC's Data Marketplace, empowering stakeholders to seamlessly discover, access, and collaborate on shared data assets for operational and analytical excellence.

• Engineered IAM policies and roles, configured S3 buckets and policies, set up VPC endpoints, API Gateway, and Security Groups to ensure robust, secure cloud architecture.
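
As a rough illustration of the NRT consumption pattern referenced in the list above (not the project's actual code), the sketch below reads device messages from an MSK topic with Spark Structured Streaming and lands the parsed records in S3. The broker endpoint, topic name, message schema, and S3 paths are assumed placeholders, and the spark-sql-kafka connector must be available on the cluster.

# Illustrative NRT consumer: MSK (Kafka) -> Spark Structured Streaming -> S3.
# All endpoints, topic names, schemas, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("mdp-nrt-consumer").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "b-1.example-msk.amazonaws.com:9092")  # assumed broker
       .option("subscribe", "mdp-device-telemetry")                              # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("msg"))
             .select("msg.*"))

(parsed.writeStream
    .format("parquet")
    .option("path", "s3://mdp-processed/telemetry/")                  # assumed output location
    .option("checkpointLocation", "s3://mdp-processed/checkpoints/")  # assumed checkpoint location
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination())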

PROJECTS WORKED ON:

Enterprise Datawarehouse - EDW Migration, Centene Corp, Tampa FL, (Jan’2020 – Dec’2022)

The Enterprise Data Warehouse is a data mart on the Greenplum platform that loads refined data (member, provider, claim, accounts payable, premium) from the Hadoop platform in order to establish a common data platform with enhanced data governance and security. The data mart builds denormalized datasets from OLTP source systems for eight subject areas of the healthcare claim processing system. It is a three-tier processing system, with data ingestion using Sqoop, Talend, and Informatica BDM, core processing on Hive, and end-user dataset creation on the Pivotal Greenplum platform. Complex HQL scripts are used to implement business rules and derive attributes for the flattened datasets.

The new environment supports data processing and extends capabilities for business analytics, reporting, and data exploration. The environment will be migrated from HDP to CDP in order to support structured, semi-structured, and unstructured data, as well as batch and streaming data flows, for advanced analytics and real-time decision making.

It captures all data from the different dispersed systems in raw form, applies one set of architecture principles to create a more scalable and flexible product, and provides a common data services layer for end-user and tool integration. The Enterprise Data Warehouse enables a data ingestion framework that supports both batch and real-time data processing patterns, providing a common interface for data ingestion and enhanced governance and security controls based on data sensitivity classification. It also integrates with a new data pipeline management tool that handles completeness and timeliness as a service and supports batch, micro-batch, and streaming ingestion patterns.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the end-to-end architecture and migration of legacy Hadoop ecosystems (HDP 2.6.3 & CDH 6.3) to the modern, enterprise-grade CDP 7.1.4 platform, driving enhanced scalability, performance, and maintainability.

• Led the migration of on-prem CDP 7.1.4 Hadoop workloads to AWS using EMR, S3, and Snowflake, boosting scalability, cutting infrastructure costs, and accelerating data-driven insights.

• Developed Lambda scripts to dynamically provision EMR clusters for on-demand data processing and loading, integrated with S3 and CloudWatch (see the sketch after this list).

• Developed and automated DistCp scripts to migrate data from HDFS to AWS S3, optimizing performance and ensuring data integrity with error handling during the cloud transition.

• Implemented IAM policies and roles for user and functional accounts, and configured S3 bucket policies and security groups to enforce enterprise-grade access control and cloud security.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Led the design and implementation of the YARN Capacity Scheduler, configuring YARN queues for various LOBs to make optimal use of the CDP cluster.

• Set up the cluster and installed new services in the cluster.

• Set up Kerberos for the CDP cluster and integrated Kerberos with Active Directory.

• Implemented Hadoop security, including SSL/TLS, LDAP/Kerberos, and role-based authorization.

• Led the data copy from HDP to CDP using DistCp between clusters in different domains.

• Led the migration of Hive schemas and Hive tables using HMS Mirror and Replication Manager.

• Led the migration of Ranger and Sentry policies and consolidated the policies in the CDP Ranger environment.

• Implemented the RTR (Real-Time Replication) tool by integrating Striim with CDP Kafka to consume the Oracle LogMiner messages coming from Striim via a Kafka topic and publish the messages.

• Created Kafka topics and configured access control through Ranger.

• Led data validation and created validation reports for the migration project.

• Led the data load process using Hive scripts (.hql), creating external ORC tables, AVRO tables, and HBase reference tables.

• Installed Python packages such as pandas, NumPy, and openpyxl to execute PySpark jobs.

• Developed a script to kill long-running YARN jobs.

• Developed automated scripts for seamless Hive table deployment across multiple environments, streamlining data pipeline operations.

• Set up new Hadoop users and controlled their access in the cluster through Ranger policies.

• Developed host-specific keytabs for specific Hadoop services and headless keytabs for generic user IDs.

• Set up configuration and security for Hadoop clusters using Ranger and Knox.

• Developed an automated script to monitor Hadoop cluster connectivity and performance.

• Developed comprehensive documentation including business requirements, technical design, and upgrade procedures to support compliance and audit readiness for Hadoop and cloud platforms.
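
The on-demand EMR provisioning described in the list above could look roughly like the Lambda handler below. The release label, instance sizing, IAM roles, buckets, and step script are hypothetical placeholders, not the actual pipeline configuration.

# Illustrative Lambda handler that launches a transient EMR cluster for a batch load.
# All names, roles, and paths are placeholders.
import boto3

emr = boto3.client("emr")

def handler(event, context):
    response = emr.run_job_flow(
        Name="edw-ondemand-load",
        ReleaseLabel="emr-6.9.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        LogUri="s3://edw-logs/emr/",
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.2xlarge", "InstanceCount": 4},
            ],
            # Terminate the cluster automatically once the steps complete.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "load-refined-datasets",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://edw-artifacts/jobs/load_refined.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}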

Enterprise DataLake, Discover Financial, Chicago, IL (Jan’2017 – Dec’2019)

Enterprise DataLake is a next-generation data technology (Hadoop) platform intended to establish a common data platform with enhanced data governance and security. The new environment supported data processing and extended capabilities for business analytics, reporting, and data exploration. The environment was modernized to support structured, semi-structured, and unstructured data, as well as batch and streaming data flows, for advanced analytics and real-time decision making.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the migration from on-premises Hadoop clusters to AWS and Snowflake, achieving faster results and cost efficiency.

• Created Snowflake workspace schemas and external stages, and set up the corresponding stage policies in AWS.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Developed a Lambda script for data loads from AWS S3 to the Snowflake data warehouse (see the sketch after this list).

• Led data validation and created validation reports for the migration project.

• Created and loaded external ORC tables using Hive scripts.

• Developed PySpark code for data migration projects and experienced in troubleshooting PySpark issues.

• Involved in performance testing on the production cluster.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Installed, configured, and administered HDP 2.6.1 clusters (HDFS, YARN, Hive, Spark, Atlas, Ranger KMS, Zeppelin, Knox, Sqoop, Kafka, HBase).

• Designed and built HDF clusters (Apache NiFi 3.0); well versed in creating NiFi processors and process groups for data ingestion.

• Led the deployment process of the NiFi .nar files across different environments.
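
A rough sketch of the S3-to-Snowflake load referenced in the list above is shown below, assuming an external stage already points at the landing bucket. The account, warehouse, stage, and table names are placeholders, and in practice credentials would be pulled from AWS Secrets Manager rather than hard-coded.

# Illustrative loader: issue COPY INTO against a Snowflake external stage backed by S3.
# Connection details, stage, and table names are placeholders.
import snowflake.connector   # requires the snowflake-connector-python package

def handler(event, context):
    conn = snowflake.connector.connect(
        account="example_account",
        user="LOAD_SVC",
        password="***",          # placeholder; fetch from Secrets Manager in practice
        warehouse="LOAD_WH",
        database="EDL",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # @S3_RAW_STAGE is assumed to be an external stage on the S3 landing bucket.
        cur.execute("""
            COPY INTO RAW.MEMBER_EVENTS
            FROM @S3_RAW_STAGE/member_events/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
        return {"statements_executed": 1}
    finally:
        conn.close()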

Run The Bank, Royal Bank of Canada (Capital Market), Toronto, ON, CANADA (July’2014 – Dec’2016)

The Run the Bank project helps set up, build, maintain, and operate the entire Hadoop environment for the enterprise and international applications across the corporate system for the Capital Markets group, covering Data Management, Market Risk, and many other portfolios. The development environment has a 6-node cluster, whereas UAT, PRE-PROD, and PRODUCTION each have a 12-node cluster.

OPERATIONAL TRADE INFORMATION SYSTEM, Royal Bank of Canada (Capital Market), Toronto, ON, CANADA (Sept’2012 – June’2014)

The client’s Capital Markets division has started a new initiative on Liquidity Risk Management (LRM) to address liquidity risk regulatory requirements while at the same time strengthening its liquidity risk assessment and monitoring capabilities. The Positions Data Service (PDS) system is intended to be the strategic platform to aggregate the transaction data and related reference data required for LRM reporting. PDS is an existing application that collects, maintains, and enriches position-level information, including all information necessary for the theoretical valuation of products down to instrument-level granularity. Its initial focus was to meet IRC reporting requirements, for which PDS sourced the majority of its positions data from OTIS. PDS also plans to source all transaction-related data (trades and positions) from OTIS to meet LRM requirements. It is expected that OTIS will serve as a single source for complete and accurate trade and position information, enriched with standard references.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Set up and configured Hadoop cluster (HDP 2.3), including installation of new services and components.

• Led migration of Hadoop environment from HDP 2.3 to HDP 2.6, ensuring seamless transition and minimal downtime.

• Designed and executed the data migration roadmap from IBM DataStage (11.5) and Oracle to Hive on HDP 2.6.

• Performed end-to-end data validation and developed validation reports to ensure data integrity post-migration.

• Implemented enterprise-grade security in Hadoop using Apache Ranger policies to enforce access control in compliance with data security standards.

• Created and maintained external ORC, AVRO, and HBase reference tables using Hive scripts (.hql), with automated data loading from external sources (see the sketch after this list).

• Produced comprehensive architecture diagrams and design documents to support the migration and ongoing data operations.
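
The external ORC reference tables mentioned in the list above can be illustrated with a simplified DDL example, issued here through a Hive-enabled SparkSession. The database name, columns, and HDFS location are placeholders rather than the project's actual schema.

# Illustrative DDL for an external ORC reference table, run through Spark with Hive support.
# Database, columns, and location are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("otis-orc-ddl")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS lrm.position_ext (
        trade_id    STRING,
        book        STRING,
        instrument  STRING,
        notional    DECIMAL(18,2),
        as_of_date  DATE
    )
    STORED AS ORC
    LOCATION '/data/lrm/positions'
""")

# Downstream Hive/Spark jobs can then query the table directly.
spark.sql("SELECT COUNT(*) AS position_count FROM lrm.position_ext").show()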

WRICEF Data Mapping Project, 3M, St. Paul, MN (Apr’2012 – Aug’2012)

The data warehouse environment uses data from different legacy source (ERP) systems and flat files. Data is standardized in the Enterprise Warehouse Management (EWM) system and must be reflected in the downstream applications.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed the source to target mapping document by understanding the business requirement document.

• Performed data validation after the source data was loaded into the EWM system.

• Led the Unit and Integration testing of the EWM system.

• Skills used: IBM Cognos Analytics, Informatica PowerCenter, Teradata

NA – Stabilization Project, CNH Industrial, Racine, WI (Aug’2011 – Apr’2012)

The scope of the project is to build a new Framework Manager model (TNBI Master Model) using the DMR approach that fulfills the NA business reporting requirements in a more effective and efficient manner, with performance as a major criterion. The RCAD package, which is built from the TNBI Master Model, is vehicle-centric. It holds information related to vehicle dimensions that can be used to get a single unit of data, as well as measures such as stock and wholesale quantity. The package (RCAD) is published for Report Studio users to generate reports for the Sales & Marketing department.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the TNBI Master Model.

• Designed and implemented the Sales dashboards for analyzing pipeline, targets, and regional performance.

• Led the Unit and Integration testing of the TNBI Master Model.

• Skills used: IBM Cognos Analytics, Oracle10g, Cognos Framework Manager, Report Studio, MS-SQLServer

Consumer Healthcare BI Application Support, GSK, Pittsburgh, PA (Nov’2010 – Aug’2011)

The primary objectives of the project were to serve as the first point of contact with the client for USA BI/DW engagements in the Consumer Health IT business, create and update the defined transition documentation for 17 consumer healthcare applications, perform the end-to-end transition of processes, and create the deliverables defined in the transition plan to take over BI development and support of the applications. The scope of the project is to provide maintenance and support for the Consumer Healthcare BI and data warehouse applications.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the FM Model.

• Designed and implemented the BI Operational dashboards for monitoring daily activities and exceptions.

• Led the Unit and Integration testing of the FM Model.

• Provided production support of the BI Applications.

• Skills used: IBM Cognos Analytics, Cognos Cube, Oracle9i, MS-SQLServer, Unix Shell Scripting.

Citi - CBG, Citigroup, Irving, TX (Sept’2006 – Oct’2010)

The primary objective of the project is the development and deployment of a central data warehouse across the entire Commercial Business Group to improve, and in some areas initiate, MIS reporting processes. The existing MIS reporting processes are fragmented, incomplete, or non-existent; they lack reliability and do not provide a common view across businesses. This is due largely to the fact that the business has undergone many transitions, including multiple acquisitions, resulting in a significant number of disparate source systems, business definitions, and MIS processes. The integration and standardization of this data is critical to building MIS processes that are comprehensive, standardized, and reliable.

• The MIS reporting processes (both automated and manual) for all businesses and functions in the Commercial Business Group are reengineered to function with the data warehouse.

• The reconciliation processes need to be enhanced or initiated to ensure reconciliation between MIS reporting and the source systems as well as the downstream systems (financial statements, risk management, etc.).

• The initiation and standardization of assigning customer identifiers is likely to involve leveraging existing processes and systems.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the FM Model.

• Designed the Cognos Power cube and published the Cube.

• Designed and implemented the Financial dashboards for real-time budget vs. actual comparisons.

• Led the Unit and Integration testing of the FM Model.

• Provided production support of the BI Applications.

• Skills used: IBM Cognos Analytics, Cognos Cube, Oracle9i, MS-SQLServer, Unix Shell Scripting.


