GURUDAS PRADHAN

Tampa, FL ***** 224-***-**** *******.*******@*****.***

https://www.linkedin.com/in/gurudas-pradhan-1835b2140/

Summary:

Gurudas is a Senior Big Data & Cloud Data Platform Architect who has been in the information technology industry for more than 16 years, with a specialization in Hadoop and multi-cloud technologies. Experienced in implementing cloud solutions using various AWS services and skilled in designing, building, and managing Hadoop, AWS EMR, and AWS Redshift clusters across diverse environments. Adept at configuring security for big data and other AWS services, introducing innovative tools and engines, and successfully deploying complex big-data-driven solutions. Experienced in architecting and scaling enterprise-grade data platforms, building high-performing analytics teams, and fostering a culture of data-driven decision-making. Skilled in leveraging cloud technologies and advanced analytics to optimize operations, fuel innovation, and accelerate growth.

Key Accomplishments:

• Led the migration of 500+ applications to a cloud-native architecture and built a unified AWS data platform, streamlining operations, tripling reporting performance, and enabling AI-driven insights.

• Led the successful transition from on-premises Hadoop clusters (CDP) to the cloud (AWS), optimizing infrastructure and reducing operational costs while improving system reliability.

• Designed the architecture and implementation of the NRT use case for the Machine Data (MDP) project, consuming messages originating from machine devices at 10 GB per second using MSK and processing the real-time messages with PySpark.

• Implemented the Dynamic Data Masking (DDM) feature on Redshift tables for the Payroll and TnL businesses to mask sensitive data based on user roles and specific business criteria.

• Architected Redshift producer-consumer models with data sharing to enable seamless, high-performance cross-cluster access at enterprise scale—eliminating manual data movement.

• Led the performance analysis of Redshift clusters, using query execution time, resource utilization, and concurrency metrics to surface critical patterns across peak vs. normal loads and quarter-close periods.

• Designed and configured Redshift WLM queues to optimize resource allocation, defining slot prioritization for critical workloads and ad hoc queries.

• Automated periodic VACUUM and ANALYZE operations on Redshift tables, integrating performance monitoring to proactively resolve long-running queries and maintain optimal query execution (see the sketch after this list).

• Implemented High Availability for Redshift clusters via Multi-Region Replication and multi-AZ deployment, ensuring data resilience, minimal downtime, and uninterrupted business operations.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Implemented monitoring mechanisms using custom scripts, CloudWatch rules, and New Relic alerts, enabling timely intervention and resolution of potential issues.

• Automated the storage and retrieval of logs and alerts using AWS services such as Lambda, SNS, DynamoDB, and CloudWatch.

• Led the end-to-end architecture and migration of legacy Hadoop ecosystems (HDP 2.6.3 & CDH 6.3) to the modern, enterprise-grade CDP 7.1.4 platform, driving enhanced scalability, performance, and maintainability.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Led the design and implementation of the YARN Capacity Scheduler, configuring YARN queues for various LOBs to make optimal use of the CDP cluster.

• Led the deployment of AWS components such as Lambda, SQS, SNS, EC2, API Gateway, EMR, and MSK using Terraform and Jenkins pipelines.

• Hands-on expertise across a broad suite of AWS services (EC2, S3, VPC, Redshift, EMR, EBS, Auto Scaling, IAM, CloudWatch) as well as Snowflake, driving scalable, secure, and high-performance cloud solutions.

• Led the design and implementation of GEHC's Data Marketplace, empowering stakeholders to seamlessly discover, access, and collaborate on shared data assets for operational and analytical excellence.

• Engineered IAM policies and roles, configured S3 buckets and policies, set up VPC endpoints, API Gateway, and Security Groups to ensure robust, secure cloud architecture.

• Designed Snowflake workspace schemas and external stages, and deployed Snowflake tables and views to the schemas during the data migration process.

• Architected and led the migration of on-premises solutions to the cloud and designed S3 and EFS layouts to align with enterprise workflows.

• Extensive hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, HBase, Kafka, Sqoop, Oozie, Ranger, Spark, and ZooKeeper.

• Designed and built HDF clusters (Apache NiFi 3.0); well versed in creating NiFi processors and process groups for data ingestion.

• Led the deployment process of the NiFi .nar files across different environments.

• Collaborated with cross-functional teams to ensure data quality and availability and provided solutions to Architecture Review Board.

• Developed comprehensive documentation including business requirements, technical design, and upgrade procedures to support compliance and audit readiness for Hadoop and cloud platforms.

• Consistently maintained a customer-centric approach, resolving issues promptly, and ensuring customer satisfaction.

• Proven expertise across diverse domains including Finance, Healthcare, Life Sciences, Travel, and Manufacturing.
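
The VACUUM/ANALYZE automation noted above can be illustrated with a minimal sketch. This is not the original implementation: it assumes a scheduled Lambda-style entry point, the Redshift Data API, and placeholder cluster, secret, and table names.

# Illustrative sketch only: periodic VACUUM/ANALYZE on selected Redshift tables
# via the Redshift Data API. Cluster, secret, and table names are placeholders.
import boto3

CLUSTER_ID = "odp-redshift-prod"                                   # assumed cluster identifier
DATABASE = "odp"                                                   # assumed database name
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:000000000000:secret:redshift-admin"  # placeholder
TABLES = ["payroll.timesheet_fact", "tnl.timecard_fact"]           # hypothetical tables

rsd = boto3.client("redshift-data")

def handler(event, context):
    # Submit maintenance statements asynchronously; completion can be polled
    # with describe_statement and surfaced through CloudWatch alarms.
    for table in TABLES:
        for sql in (f"VACUUM DELETE ONLY {table};", f"ANALYZE {table};"):
            stmt = rsd.execute_statement(
                ClusterIdentifier=CLUSTER_ID,
                Database=DATABASE,
                SecretArn=SECRET_ARN,
                Sql=sql,
            )
            print(f"Submitted '{sql}' as statement {stmt['Id']}")

In practice such a job would typically be triggered on a schedule (for example via EventBridge) outside peak and quarter-close windows.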

EDUCATION

Master of Computer Applications

Licenses & certifications

* AWS Certified Solutions Architect – Associate

* AWS Certified Machine Learning - Specialty

* Hortonworks Certified Hadoop Administrator (HDPCA Professional)

* A PMP Certified Professional

* Completed the Big Data Hadoop Foundations certification from IBM – Big Data University

* Completed the Introduction to R certification from IBM – DataCamp

* Snowflake Certified Associate

* IBM Certified Designer for Cognos8 BI Reports

* IBM Certified Cognos8 BI Administrator

* IBM Certified Cognos8 BI Metadata Model Developer

TECHNICAL SKILLS

Cloud Environment Distribution: AWS

Services: EMR, S3, EC2, Glue, Glacier, CloudWatch, CloudTrail, SNS, SQS, DynamoDB, Kinesis, MSK, Athena, IAM, KMS, Lambda, Step Functions, SageMaker, Redshift, RDS, EventBridge, API Gateway

Cloud Data Warehouses: Amazon Redshift, Snowflake (version 3.30.2); Clients: DBeaver SQL Client, SnowSQL CLI

Bigdata Environment Distribution: Cloudera (CDP-7.1.4, CDH – 6.3), Hortonworks (HDP-2.6)

Hadoop Ecosystem: HDFS, MapReduce, Yarn, Hive, HBase, Spark, Tez, Impala, Sqoop, Oozie, Zookeeper, Kafka, Storm, Knox, Ranger, Solr, Ambari, Cloudera Manager, NiFi, Striim

Reporting Tools: PowerBI, IBM-Cognos8.x

Databases: Oracle9.2, Oracle10g, Sybase12, DB2UDB8.2, SQL Server(2005,2008), Teradata12

Programming Languages: C++, C, Python, XML, SQL

Scripting Languages: Shell Script, Perl Script, Python

Version Control Tools: CVS, RCS, SCCS, VSS, IBM Rational ClearCase, MS SharePoint Portal Server, GitHub

Operating Systems: HP-UX (9.04, 10.20, 11.0), Digital UNIX/Tru64 (5.0D), Sun Solaris 8, IBM AIX 5.2, IBM AIX 5.3, Linux

Compilers & Utilities: GNU Compilers (gcc-3.3.2, g++), IBM-Visual age compiler (xlc_r7, xlC_r7), Forte in Solaris-8, Makefile

PROJECT PROFILE:

Project Title: ODP (One Data Platform)

Duration: Jan’2023 – Present

Location: Tampa, Florida, USA

Hardware/Operating Systems: Linux (RHEL 7)

Cloud Technologies: AWS (EMR, S3, EC2, Glue, CloudWatch, SNS, SQS, DynamoDB, MSK, Athena, IAM, Lambda, Redshift, RDS, Step Functions, EventBridge)

Reporting Tools/Databases/Other Utilities: Power BI, shell scripting, Python scripting, DBeaver, MS SQL Server 2008, MySQL, Teradata, Oracle 10g

PROJECT DESCRIPTION:

ODP aims to enable a data-driven organization and optimize the total cost of ownership by building a unified data platform for GEHC. ODP brings together strong enterprise data architecture and governance, near real-time data processing, a next-generation security architecture built on the cloud, and a contemporary big data foundation that allows for scalability and as-a-service offerings.

The following projects are undertaken on the ODP environment.

Finance Revenue Forecasting: The objective of the finance revenue forecasting initiative is to design and implement a standardized finance revenue forecasting process that is centrally owned, automated, and digitalized, providing the right level of detail across the organization for forecasting purposes through the Power BI dashboard. The predictive revenue forecasting model is implemented mainly for the Ultrasound, PCS, and PDx suite of businesses.

Data Marketplace Initiative: The Data Marketplace is a capability that aims to deliver an “Amazon-like” user experience, helping all data stakeholders within the GEHC organization quickly find, access, use, and collaborate on a common set of data assets and products to address their operational and analytical needs.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the migration of 500+ applications to a cloud-native architecture and built a unified AWS data platform, streamlining operations, tripling reporting performance, and enabling AI-driven insights.

• Led the successful transition from on-premises Hadoop clusters (CDP) to the cloud (AWS), optimizing infrastructure and reducing operational costs while improving system reliability.

• Designed the architecture and implementation of the NRT use case for the Machine Data (MDP) project, consuming messages originating from machine devices at 10 GB per second using MSK and processing the real-time messages with PySpark (see the sketch after this list).

• Implemented the Dynamic Data Masking (DDM) feature on Redshift tables for the Payroll and TnL businesses to mask sensitive data based on user roles and specific business criteria.

• Architected Redshift producer-consumer models with data sharing to enable seamless, high-performance cross-cluster access at enterprise scale—eliminating manual data movement.

• Led the performance analysis of Redshift clusters, using query execution time, resource utilization, and concurrency metrics to surface critical patterns across peak vs. normal loads and quarter-close periods.

• Designed and configured Redshift WLM queues to optimize resource allocation, defining slot prioritization for critical workloads and ad hoc queries.

• Automated periodic VACUUM and ANALYZE operations on Redshift tables, integrating performance monitoring to proactively resolve long-running queries and maintain optimal query execution.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Implemented monitoring mechanisms using custom scripts, CloudWatch rules, and New Relic alerts, enabling timely intervention and resolution of potential issues.

• Automated the storage and retrieval of logs and alerts using AWS services such as Lambda, SNS, DynamoDB, and CloudWatch.

• Led the deployment of AWS components such as Lambda, SQS, SNS, EC2, API Gateway, EMR, and MSK using Terraform and Jenkins pipelines.

• Hands-on expertise across a broad suite of AWS services (EC2, S3, VPC, Redshift, EMR, EBS, Auto Scaling, IAM, CloudWatch) as well as Snowflake, driving scalable, secure, and high-performance cloud solutions.

• Led the design and implementation of GEHC's Data Marketplace, empowering stakeholders to seamlessly discover, access, and collaborate on shared data assets for operational and analytical excellence.

• Engineered IAM policies and roles, configured S3 buckets and policies, set up VPC endpoints, API Gateway, and Security Groups to ensure robust, secure cloud architecture.
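
As a rough illustration of the NRT consumption pattern referenced in the list above (not the project's actual code), the sketch below reads device messages from an MSK topic with Spark Structured Streaming and lands the parsed records in S3. The broker endpoint, topic name, message schema, and S3 paths are assumed placeholders, and the spark-sql-kafka connector must be available on the cluster.

# Illustrative NRT consumer: MSK (Kafka) -> Spark Structured Streaming -> S3.
# All endpoints, topic names, schemas, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("mdp-nrt-consumer").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "b-1.example-msk.amazonaws.com:9092")  # assumed broker
       .option("subscribe", "mdp-device-telemetry")                              # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("msg"))
             .select("msg.*"))

(parsed.writeStream
    .format("parquet")
    .option("path", "s3://mdp-processed/telemetry/")                  # assumed output location
    .option("checkpointLocation", "s3://mdp-processed/checkpoints/")  # assumed checkpoint location
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination())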

PROJECTS WORKED ON:

Enterprise Datawarehouse - EDW Migration, Centene Corp, Tampa FL, (Jan’2020 – Dec’2022)

The Enterprise Data Warehouse is a data mart on the Greenplum platform that loads refined data (member, provider, claim, accounts payable, premium) from the Hadoop platform in order to establish a common data platform with enhanced data governance and security. The data mart builds denormalized datasets from OLTP source systems for eight subject areas of the healthcare claim processing system. It is a three-tier processing system, with data ingestion using Sqoop, Talend, and Informatica BDM, core processing on Hive, and end-user dataset creation on the Pivotal Greenplum platform. Complex HQL scripts are used to implement business rules and derive attributes for the flattened datasets.

The new environment supports data processing and extends capabilities for business analytics, reporting, and data exploration. The environment will be migrated from HDP to CDP in order to support structured, semi-structured, and unstructured data, as well as batch and streaming data flows, for advanced analytics and real-time decision making.

It captures all data from the different dispersed systems in raw form, applies one set of architecture principles to create a more scalable and flexible product, and provides a common data services layer for end-user and tool integration. The Enterprise Data Warehouse enables a data ingestion framework that supports both batch and real-time data processing patterns, providing a common interface for data ingestion and enhanced governance and security controls based on data sensitivity classification. It also integrates with a new data pipeline management tool that handles completeness and timeliness as a service and supports batch, micro-batch, and streaming ingestion patterns.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the end-to-end architecture and migration of legacy Hadoop ecosystems (HDP 2.6.3 & CDH 6.3) to the modern, enterprise-grade CDP 7.1.4 platform, driving enhanced scalability, performance, and maintainability.

• Led the migration of on-prem CDP 7.1.4 Hadoop workloads to AWS using EMR, S3, and Snowflake, boosting scalability, cutting infrastructure costs, and accelerating data-driven insights.

• Developed Lambda scripts to dynamically provision EMR clusters for on-demand data processing and loading, integrated with S3 and CloudWatch (see the sketch after this list).

• Developed and automated DistCp scripts to migrate data from HDFS to AWS S3, optimizing performance and ensuring data integrity with error handling during the cloud transition.

• Implemented IAM policies and roles for user and functional accounts, and configured S3 bucket policies and security groups to enforce enterprise-grade access control and cloud security.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Led the design and implementation of the YARN Capacity Scheduler, configuring YARN queues for various LOBs to make optimal use of the CDP cluster.

• Set up the cluster and installed new services in the cluster.

• Set up Kerberos for the CDP cluster and integrated Kerberos with Active Directory.

• Implemented Hadoop security, including SSL/TLS, LDAP/Kerberos, and role-based authorization.

• Led the data copy from HDP to CDP using DistCp between clusters in different domains.

• Led the migration of Hive schemas and Hive tables using HMS Mirror and Replication Manager.

• Led the migration of Ranger and Sentry policies and consolidated the policies in the CDP Ranger environment.

• Implemented the RTR (Real-Time Replication) tool by integrating Striim with CDP Kafka to consume the Oracle LogMiner messages coming from Striim via a Kafka topic and publish the messages.

• Created Kafka topics and configured access control through Ranger.

• Led data validation and created validation reports for the migration project.

• Led the data load process using Hive scripts (.hql), creating external ORC tables, AVRO tables, and HBase reference tables.

• Installed Python packages such as pandas, NumPy, and openpyxl to execute PySpark jobs.

• Developed a script to kill long-running YARN jobs.

• Developed automated scripts for seamless Hive table deployment across multiple environments, streamlining data pipeline operations.

• Set up new Hadoop users and controlled their access in the cluster through Ranger policies.

• Developed host-specific keytabs for specific Hadoop services and headless keytabs for generic user IDs.

• Set up configuration and security for Hadoop clusters using Ranger and Knox.

• Developed an automated script to monitor Hadoop cluster connectivity and performance.

• Developed comprehensive documentation including business requirements, technical design, and upgrade procedures to support compliance and audit readiness for Hadoop and cloud platforms.
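
The on-demand EMR provisioning described in the list above could look roughly like the Lambda handler below. The release label, instance sizing, IAM roles, buckets, and step script are hypothetical placeholders, not the actual pipeline configuration.

# Illustrative Lambda handler that launches a transient EMR cluster for a batch load.
# All names, roles, and paths are placeholders.
import boto3

emr = boto3.client("emr")

def handler(event, context):
    response = emr.run_job_flow(
        Name="edw-ondemand-load",
        ReleaseLabel="emr-6.9.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        LogUri="s3://edw-logs/emr/",
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.2xlarge", "InstanceCount": 4},
            ],
            # Terminate the cluster automatically once the steps complete.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "load-refined-datasets",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://edw-artifacts/jobs/load_refined.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}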

Enterprise DataLake, Discover Financial, Chicago, IL (Jan’2017 – Dec’2019)

Enterprise DataLake is a next-generation data technology (Hadoop) platform intended to establish a common data platform with enhanced data governance and security. The new environment supported data processing and extended capabilities for business analytics, reporting, and data exploration. The environment was modernized to support structured, semi-structured, and unstructured data, as well as batch and streaming data flows, for advanced analytics and real-time decision making.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Led the migration from on-premises Hadoop clusters to AWS and Snowflake, achieving faster results and cost efficiency.

• Created Snowflake workspace schemas and external stages, and set up the corresponding stage policies in AWS.

• Designed Redshift workspace schemas, tables, views, user groups, and roles for the migration.

• Developed a Lambda script for data loads from AWS S3 to the Snowflake data warehouse (see the sketch after this list).

• Led data validation and created validation reports for the migration project.

• Created and loaded external ORC tables using Hive scripts.

• Developed PySpark code for data migration projects and experienced in troubleshooting PySpark issues.

• Involved in performance testing on the production cluster.

• Implemented data encryption, cross-cluster replication, and enhanced security measures using tools such as Ranger KMS and Hadoop Data Encryption.

• Installed, configured, and administered HDP 2.6.1 clusters (HDFS, YARN, Hive, Spark, Atlas, Ranger KMS, Zeppelin, Knox, Sqoop, Kafka, HBase).

• Designed and built HDF clusters (Apache NiFi 3.0); well versed in creating NiFi processors and process groups for data ingestion.

• Led the deployment process of the NiFi .nar files across different environments.
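
A rough sketch of the S3-to-Snowflake load referenced in the list above is shown below, assuming an external stage already points at the landing bucket. The account, warehouse, stage, and table names are placeholders, and in practice credentials would be pulled from AWS Secrets Manager rather than hard-coded.

# Illustrative loader: issue COPY INTO against a Snowflake external stage backed by S3.
# Connection details, stage, and table names are placeholders.
import snowflake.connector   # requires the snowflake-connector-python package

def handler(event, context):
    conn = snowflake.connector.connect(
        account="example_account",
        user="LOAD_SVC",
        password="***",          # placeholder; fetch from Secrets Manager in practice
        warehouse="LOAD_WH",
        database="EDL",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # @S3_RAW_STAGE is assumed to be an external stage on the S3 landing bucket.
        cur.execute("""
            COPY INTO RAW.MEMBER_EVENTS
            FROM @S3_RAW_STAGE/member_events/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
        return {"statements_executed": 1}
    finally:
        conn.close()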

Run The Bank, Royal Bank of Canada (Capital Market), Toronto, ON, CANADA (July’2014 – Dec’2016)

The Run the Bank project helps set up, build, maintain, and operate the entire Hadoop environment for the enterprise and international applications across the corporate system for the Capital Markets group, covering Data Management, Market Risk, and many other portfolios. The development environment has a 6-node cluster, whereas UAT, PRE-PROD, and PRODUCTION each have a 12-node cluster.

OPERATIONAL TRADE INFORMATION SYSTEM, Royal Bank of Canada (Capital Market), Toronto, ON, CANADA (Sept’2012 – June’2014)

The client’s Capital Markets division has started a new initiative on Liquidity Risk Management (LRM) to address liquidity risk regulatory requirements while at the same time strengthening its liquidity risk assessment and monitoring capabilities. The Positions Data Service (PDS) system is intended to be the strategic platform to aggregate the transaction data and related reference data required for LRM reporting. PDS is an existing application that collects, maintains, and enriches position-level information, including all information necessary for the theoretical valuation of products down to instrument-level granularity. Its initial focus was to meet IRC reporting requirements, for which PDS sourced the majority of its positions data from OTIS. PDS also plans to source all transaction-related data (trades and positions) from OTIS to meet LRM requirements. It is expected that OTIS will serve as a single source for complete and accurate trade and position information, enriched with standard references.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Set up and configured Hadoop cluster (HDP 2.3), including installation of new services and components.

• Led migration of Hadoop environment from HDP 2.3 to HDP 2.6, ensuring seamless transition and minimal downtime.

• Designed and executed the data migration roadmap from IBM DataStage (11.5) and Oracle to Hive on HDP 2.6.

• Performed end-to-end data validation and developed validation reports to ensure data integrity post-migration.

• Implemented enterprise-grade security in Hadoop using Apache Ranger policies to enforce access control in compliance with data security standards.

• Created and maintained external ORC, AVRO, and HBase reference tables using Hive scripts (.hql), with automated data loading from external sources (see the sketch after this list).

• Produced comprehensive architecture diagrams and design documents to support the migration and ongoing data operations.
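
The external ORC reference tables mentioned in the list above can be illustrated with a simplified DDL example, issued here through a Hive-enabled SparkSession. The database name, columns, and HDFS location are placeholders rather than the project's actual schema.

# Illustrative DDL for an external ORC reference table, run through Spark with Hive support.
# Database, columns, and location are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("otis-orc-ddl")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS lrm.position_ext (
        trade_id    STRING,
        book        STRING,
        instrument  STRING,
        notional    DECIMAL(18,2),
        as_of_date  DATE
    )
    STORED AS ORC
    LOCATION '/data/lrm/positions'
""")

# Downstream Hive/Spark jobs can then query the table directly.
spark.sql("SELECT COUNT(*) AS position_count FROM lrm.position_ext").show()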

WRICEF Data Mapping Project, 3M, St. Paul, MN (Apr’2012 – Aug’2012)

The data warehouse environment uses data from different legacy source (ERP) systems and flat files. Data is standardized in the Enterprise Warehouse Management (EWM) system and must be reflected in the downstream applications.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed the source to target mapping document by understanding the business requirement document.

• Performed data validation after the source data was loaded into the EWM system.

• Led the Unit and Integration testing of the EWM system.

• Skills used: IBM Cognos Analytics, Informatica PowerCenter, Teradata

NA – Stabilization Project, CNH Industrial, Racine, WI (Aug’2011 – Apr’2012)

The scope of the project is to build a new Framework Manager model (TNBI Master Model) using the DMR approach that fulfills the NA business reporting requirements in a more effective and efficient manner, with performance as a major criterion. The RCAD package, which is built from the TNBI Master Model, is vehicle-centric. It holds information related to vehicle dimensions that can be used to get a single unit of data, as well as measures such as stock and wholesale quantity. The package (RCAD) is published for Report Studio users to generate reports for the Sales & Marketing department.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the TNBI Master Model.

• Designed and implemented the Sales dashboards for analyzing pipeline, targets, and regional performance.

• Led the Unit and Integration testing of the TNBI Master Model.

• Skills used: IBM Cognos Analytics, Oracle10g, Cognos Framework Manager, Report Studio, MS-SQLServer

Consumer Healthcare BI Application Support, GSK, Pittsburgh, PA (Nov’2010 – Aug’2011)

The primary objectives of the project were to serve as the first point of contact with the client for USA BI/DW engagements in the Consumer Health IT business, create and update the defined transition documentation for 17 consumer healthcare applications, perform the end-to-end transition of processes, and create the deliverables defined in the transition plan to take over BI development and support of the applications. The scope of the project is to provide maintenance and support for the Consumer Healthcare BI and data warehouse applications.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the FM Model.

• Designed and implemented the BI Operational dashboards for monitoring daily activities and exceptions.

• Led the Unit and Integration testing of the FM Model.

• Provided production support of the BI Applications.

• Skills used: IBM Cognos Analytics, Cognos Cube, Oracle9i, MS-SQLServer, Unix Shell Scripting.

Citi - CBG, Citigroup, Irving, TX (Sept’2006 – Oct’2010)

The primary objective of the project is the development and deployment of a central data warehouse across the entire Commercial Business Group to improve, and in some areas initiate, MIS reporting processes. The existing MIS reporting processes are fragmented, incomplete, or non-existent; they lack reliability and do not provide a common view across businesses. This is due largely to the fact that the business has undergone many transitions, including multiple acquisitions, resulting in a significant number of disparate source systems, business definitions, and MIS processes. The integration and standardization of this data is critical to building MIS processes that are comprehensive, standardized, and reliable.

• The MIS reporting processes (both automated and manual) for all businesses and functions in the Commercial Business Group are reengineered to function with the data warehouse.

• The reconciliation processes need to be enhanced or initiated to ensure reconciliation between MIS reporting and the source systems as well as the downstream systems (financial statements, risk management, etc.).

• The initiation and standardization of assigning customer identifiers is likely to involve leveraging existing processes and systems.

ROLES AND RESPONSIBILITIES IN THE PROJECT:

• Designed and implemented a scalable IBM Cognos Framework Manager model, integrating multiple global data sources to deliver consistent, high-performance BI solutions across the enterprise.

• Designed List, Crosstab and Chart reports for the business users based on the FM Model.

• Designed the Cognos Power cube and published the Cube.

• Designed and implemented the Financial dashboards for real-time budget vs. actual comparisons.

• Led the Unit and Integration testing of the FM Model.

• Provided production support of the BI Applications.

• Skills used: IBM Cognos Analytics, Cognos Cube, Oracle9i, MS-SQLServer, Unix Shell Scripting.


