
Location:
Herndon, VA, 20170

BIRESHWAR GHOSH

Ph: +1-571-***-****

LinkedIn: Bireshwar-raj-ghosh

Cloud Data Engineer

Executive Summary

Experienced Data Engineer with 14+ years of expertise in data architecture, pipeline development, and cloud data platforms across industries including Banking, Healthcare, Retail, Insurance, and Telecom. Proficient in building scalable data pipelines, optimizing ETL workflows, and implementing real-time data processing solutions.

Expertise in AWS Data Services: Proficient in cloud-based data solutions, including AWS Redshift, Athena, Glue, S3, and Snowflake, with extensive experience in database technologies such as PostgreSQL, MySQL, Oracle, Db2, MongoDB, and DynamoDB.

Demonstrated experience in data ingestion, transformation, and storage optimization utilizing Apache Spark, Airflow, Kafka, and SQL-based ETL frameworks.

Comprehensive knowledge of Data Warehousing, Data Governance, and Security Best Practices, ensuring data integrity, compliance, and performance optimization.

Strong programming skills in Python, TypeScript, Node.js, SQL, and Bash scripting, used to optimize and automate data workflows, ETL processes, and machine learning pipelines.

Acquired hands-on experience in Infrastructure-as-Code using Terraform and CloudFormation to automate cloud data environments.

Experienced in monitoring and observability utilizing CloudWatch, Prometheus, Grafana, and ELK Stack to ensure the reliability of data pipelines and facilitate real-time alerting.

Proven capability to design, deploy, and manage large-scale data architectures, implement real-time streaming solutions, including Kinesis, Kafka, and Apache Flink, and enhance data accessibility and analytics for AI/ML and business intelligence applications.

A results-driven collaborator who aims to streamline data engineering processes, enhance operational efficiency, and promote innovation in data-driven decision-making.

Proficient in security and compliance management utilizing HashiCorp Vault, AWS Security Hub, and compliance frameworks such as SOC 2, HIPAA, and ISO 27001.

Possesses comprehensive expertise in AWS data lake architecture, leveraging AWS Lake Formation, S3, and Glue Catalog for effective metadata management and governance.

Skilled in optimizing query performance and implementing cost-reduction strategies for AWS Redshift, Athena, and Snowflake by applying partitioning, indexing, and compression techniques.

Experienced in data modeling and constructing ETL pipelines, employing serverless data processing architectures that utilize AWS Lambda, Step Functions, Glue ETL, and EventBridge for scalable, event-driven data workflows.

Strong DataOps and CI/CD foundation for data pipelines, utilizing GitHub Actions, AWS CodePipeline, and Jenkins to automate deployment and testing processes.

Expertise in designing multi-region, highly available data architectures on AWS, thereby ensuring disaster recovery and fault tolerance.

Developed a real-time fraud detection pipeline with Managed Apache Flink, Apache Kafka, AWS Kinesis Data Streams, and Amazon Kinesis Data Firehose, decreasing fraudulent transactions by 30% and improving fraud response time from hours to seconds.

Implemented CDC (Change Data Capture) pipelines using AWS DMS and Debezium integrated with Kafka and Kinesis Data Streams, enabling real-time data replication and decreasing data lag from 24 hours to under 5 minutes.

Optimized a large-scale credit risk modeling pipeline using Apache Spark on Amazon EMR and Redshift Spectrum, shortening model training time from 12 hours to 2 hours and improving risk assessment accuracy.

Practical experience in integrating AI/ML workflows with AWS services such as SageMaker, Bedrock, and Feature Store to support advanced analytics capabilities.

Technical Skills:

Languages:

PL/SQL, T-SQL, YAML, JSON, Python, TypeScript, Node.js

OS:

Linux 8.x/7.x, AIX 8.1/7.1/5.3/4.3, Solaris, OS/400, and Windows Servers.

Tools:

AWS CLI & Management Console, Cloud9, Workbench, DbVisualizer, pgAdmin, SQL Developer, Data Studio, Qrep, PuTTY, Git, Gradle, Maven, Chef, Ansible, Docker, Kubernetes, Jenkins, Optim, Toad, Server Manager, Ubuntu, Quest Central, DB DIFF, Veritas Volume Manager, DB2 Workbench, Visual Explain, Health Center, Design Advisor, Stored Procedure Builder, DPROPR, Configuration Advisor, DB2 Utilities, DB2 Connect, INAV, I-Doctor, TAATOOL, MIMIX, Control Center, Query Patroller, Business Objects, Cognos, Crystal Reports, Erwin, BMC, SQL*Loader, etc.

RDBMS:

PostgreSQL, Aurora, MySQL, DB2 V11.5 and all earlier versions, DB2/400, Oracle 9i/10g.

Data Warehouse:

Redshift, Snowflake, Data Lake.

NoSQL DB:

DocumentDB, DynamoDB, ElastiCache, Neptune, QLDB.

Cloud Service Platform:

Amazon Web Services

Application Servers:

IBM WebSphere and WebLogic

Web Servers:

IIS, Apache HTTP Server, Nginx, Apache Tomcat.

AI & ML:

SageMaker, Bedrock, MLflow, TFX.

Collaboration & Productivity:

Jira, Confluence, Git, Trello.

Education:

College: SCSU, Minnesota.

Degree: Bachelor of Elective Studies.

Major: Information Technology & Electronics, 2009

CERTIFICATION:

IBM Certified Database Administrator – Db2 9.1

IBM Certified Database Administrator – Db2 10.1

AWS Fundamentals of ML & AI

AWS Redshift

AWS Certified Data Engineer (In Progress)

PROFESSIONAL WORK EXPERIENCE:

BMO, March '22 – Oct '24

Role: Cloud Data Engineer

Tasks and Responsibilities:

Designed and deployed a scalable ETL pipeline on AWS using Glue, Lambda, and Redshift, reducing data processing time by 60% and enhancing analytics performance.
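
A minimal AWS Glue (PySpark) sketch of this catalog-to-Redshift pattern; the database, table, and connection names ("raw_db", "transactions", "redshift-conn") are placeholders rather than the actual BMO resources:

```python
# Glue job sketch: read from the Glue Data Catalog, apply a light
# transformation, and load the result into Redshift via a Glue connection.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: a catalog table crawled from S3 (placeholder names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="transactions"
)

# Transformation: drop malformed rows, keep only the columns analytics needs.
cleaned = raw.drop_fields(["_corrupt_record"]).select_fields(
    ["txn_id", "account_id", "amount", "txn_ts"]
)

# Sink: Redshift through a pre-defined Glue catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=cleaned,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "analytics.transactions", "database": "dw"},
    redshift_tmp_dir=args["TempDir"],
)
job.commit()
```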

Developed a real-time fraud detection pipeline with Apache Kafka, AWS Kinesis, and Flink, decreasing fraudulent transactions by 30% and improving fraud response time from hours to seconds.
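
The ingestion side of such a streaming pipeline can be sketched as a small Kinesis producer using boto3; the stream name and event fields below are illustrative only, with scoring assumed to happen downstream in Flink:

```python
# Publish card transactions to a Kinesis Data Stream for downstream scoring.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_transaction(txn: dict) -> None:
    """Send one transaction event; partitioning by account keeps each
    account's events ordered within a shard."""
    kinesis.put_record(
        StreamName="card-transactions",          # placeholder stream name
        Data=json.dumps(txn).encode("utf-8"),
        PartitionKey=str(txn["account_id"]),
    )

publish_transaction(
    {"account_id": 1234, "amount": 250.00, "merchant": "example",
     "ts": "2024-01-01T12:00:00Z"}
)
```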

Optimized AWS Redshift workloads through columnar storage, distribution keys, and automated vacuuming, leading to a 40% cost reduction and 2x improvement in query performance.
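
Representative tuning statements of this kind, run here through the Amazon Redshift Python driver (redshift_connector); table, column, and cluster names are made up:

```python
# Columnar encodings, a distribution key on the join column, a sort key on
# the time predicate, and VACUUM/ANALYZE after large loads.
import redshift_connector  # assumes the Amazon Redshift Python driver

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dw", user="etl_user", password="***",
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cur = conn.cursor()

cur.execute("""
    CREATE TABLE analytics.transactions (
        txn_id     BIGINT   ENCODE az64,
        account_id BIGINT   ENCODE az64,
        amount     DECIMAL(12,2),
        txn_ts     TIMESTAMP
    )
    DISTSTYLE KEY
    DISTKEY (account_id)      -- co-locate rows joined on account_id
    SORTKEY (txn_ts);         -- prune blocks for time-range queries
""")

# Reclaim space and refresh planner statistics after large loads.
cur.execute("VACUUM DELETE ONLY analytics.transactions;")
cur.execute("ANALYZE analytics.transactions;")
```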

Migrated a monolithic on-premises data warehouse to an AWS data lake architecture (S3, Glue, and Athena), enhancing data accessibility and cutting infrastructure costs by 50%.

Implemented CDC (Change Data Capture) pipelines using AWS DMS and Debezium, enabling real-time data replication and decreasing data lag from 24 hours to under 5 minutes.
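
A sketch of the Debezium half of such a CDC setup, registering a DB2 source connector through Kafka Connect's REST API; the connector class is the standard Debezium one, while hosts, credentials, and table lists are placeholders:

```python
# Register a Debezium source connector with Kafka Connect so row changes
# stream into Kafka topics; AWS DMS handled the parallel replication path.
import json
import requests  # assumes the requests library is available

connector_config = {
    "name": "transactions-cdc",
    "config": {
        "connector.class": "io.debezium.connector.db2.Db2Connector",
        "database.hostname": "db2.example.internal",
        "database.port": "50000",
        "database.user": "cdc_user",
        "database.password": "***",
        "database.dbname": "BANKDB",
        "table.include.list": "FIN.TRANSACTIONS",
        "topic.prefix": "bank",   # Debezium 2.x naming; 1.x used database.server.name
    },
}

resp = requests.post(
    "http://kafka-connect.example.internal:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector_config),
    timeout=30,
)
resp.raise_for_status()
print("Connector created:", resp.json()["name"])
```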

Designed a real-time Change Data Capture (CDC) pipeline utilizing AWS DMS and Debezium for near-instant data synchronization from DB2 to AWS RDS.

Led the migration of a DB2-based financial data warehouse to AWS Redshift and RDS, boosting query performance by 40% and lowering costs.

Developed risk assessment frameworks for DB2 migrations, pinpointing potential bottlenecks and proposing strategies for workload optimization.

Automated schema and stored procedure conversions from DB2 to AWS RDS using the AWS Schema Conversion Tool (SCT).

Improved DB2 database indexing strategies and SQL performance tuning to meet AWS RDS best practices, enhancing transaction speeds by 30%.

Provided leadership in defining business use cases for DB2 migration, assisting stakeholders in understanding cloud-native capabilities and cost advantages.

Configured AWS EC2 for DB2 RDS deployments, ensuring high availability and fault tolerance across multi-region architectures.

Implemented Change Data Capture (CDC) for real-time trade reconciliation using AWS DMS, Debezium, and Kafka, minimizing data inconsistency issues by 95%.

Developed a serverless data ingestion framework utilizing AWS Step Functions, Lambda, and S3, processing over 10TB of data daily while ensuring high availability.
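
One Lambda step of such a Step Functions ingestion workflow might look like the following sketch, with illustrative bucket prefixes and field names:

```python
# Lambda handler: normalize one raw S3 object and write it to a curated
# prefix; the returned payload feeds the next Step Functions state.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # The previous state passes the object location to process.
    bucket = event["bucket"]
    key = event["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    records = [json.loads(line) for line in body.splitlines() if line.strip()]

    # Keep only well-formed records with the fields downstream needs.
    curated = [r for r in records if "txn_id" in r and "amount" in r]

    out_key = key.replace("raw/", "curated/", 1)
    s3.put_object(
        Bucket=bucket,
        Key=out_key,
        Body="\n".join(json.dumps(r) for r in curated).encode("utf-8"),
    )
    return {"bucket": bucket, "key": out_key, "record_count": len(curated)}
```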

Created an AWS-based market data ingestion system employing Athena, Redshift, and Airflow, processing terabytes of financial data daily, reducing query execution time from 15 minutes to under 30 seconds.

Enhanced monitoring and observability by integrating AWS CloudWatch, Prometheus, and Grafana, decreasing data pipeline failure resolution time from 4 hours to 30 minutes.

Implemented machine learning feature pipelines with AWS SageMaker Feature Store, increasing model training efficiency by 35% for predictive analytics.

Automated data quality checks using Great Expectations and AWS Lambda, achieving 99.9% data accuracy in financial reporting systems.
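
A sketch of a data quality gate using the classic Great Expectations pandas API (ge.from_pandas); column names and thresholds are illustrative, and newer GX releases expose a different fluent API:

```python
# Validate a batch before it is allowed into the reporting tables.
import great_expectations as ge
import pandas as pd

def validate_batch(df: pd.DataFrame) -> bool:
    gdf = ge.from_pandas(df)
    gdf.expect_column_values_to_not_be_null("txn_id")
    gdf.expect_column_values_to_be_unique("txn_id")
    gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
    result = gdf.validate()
    return result.success

df = pd.DataFrame({"txn_id": [1, 2, 3], "amount": [10.0, 99.5, 250.0]})
if not validate_batch(df):
    raise ValueError("Data quality checks failed; halting the load")
```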

Automated regulatory reporting for financial compliance (SOX, Basel II, and PCI DSS) via AWS Glue, Lambda, and Step Functions, reducing manual effort by 70% while ensuring 100% regulatory compliance.

Optimized a large-scale credit risk modeling pipeline using Spark on EMR and Redshift Spectrum, shortening model training time from 12 hours to 2 hours and improving risk assessment accuracy.

Maintained the change management dashboard, reviewing and assigning resources for Chief Information Officers (CIOs), with an emphasis on identifying root causes to implement permanent fixes and prevent recurrence.

Security and Access Management: Ensured effective platform-level logging and monitoring in compliance with IAM policies, including monitoring alerts and auditing actions and changes within the environment.

Monitored security events and addressed them by adhering to established processes while introducing enhancements and automation as necessary.

Cloud Shared Services Management and Support: Supported shared platform services, including KMS, S3, and SFTP.

Database Management: Ensured secure database provisioning and configuration, including vulnerability remediation.

Implemented anomaly detection in financial transactions with AWS SageMaker and Bedrock AI, flagging 99.5% of suspicious transactions and preventing fraudulent activities before settlement.

Cloud Deployments & Onboarding: Supported cloud deployment processes and troubleshot deployment-related issues.

CareFirst - Federal Employee Program Operations Center, April '19 – March '22

Role: Database & Infrastructure Engineer

Tasks and Responsibilities:

Managed over 800 databases, handling responsibilities for migration, deployment, upgrades, troubleshooting, and integrating daily requirements for the PEGA Application team and several other applications.

Conducted POC assessments for DB2 migrations to AWS RDS, identifying feature gaps, risks, and mitigation strategies.

Implemented online (Change Data Capture - CDC) and offline migration strategies for DB2 workloads, reducing data replication latency and improving cutover efficiency.

Optimized schema migrations, stored procedures, triggers, and index conversions to enhance database performance in AWS RDS DB2 environments.

Led the migration of DB2 databases (v11.1.4 - 11.5.1) to AWS RDS using AWS Database Migration Service (DMS) and Schema Conversion Tool (SCT), ensuring a seamless transition with minimal downtime.
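
Programmatically, a DMS full-load-plus-CDC task of this kind can be created and started with boto3 as sketched below; all ARNs and identifiers are placeholders:

```python
# Create a full-load-and-CDC replication task between pre-built DMS
# endpoints, then start it (in practice you wait for the task to be ready).
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

task = dms.create_replication_task(
    ReplicationTaskIdentifier="db2-to-rds-finance",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",
    MigrationType="full-load-and-cdc",   # initial copy, then ongoing CDC
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-finance-schema",
            "object-locator": {"schema-name": "FINANCE", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```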

Developed Terraform-based Infrastructure-as-Code (IaC) solutions for provisioning AWS RDS instances and optimizing security configurations.

Collaborated with cross-functional teams, including DBAs, cloud engineers, and application developers, to ensure a smooth integration of DB2 workloads with AWS services.

Developed an ETL pipeline for financial transaction processing using AWS Glue, Redshift, and S3, reducing data ingestion latency by 80% and enabling real-time analytics for compliance reporting.

Created a Sandbox solution to help developer and infrastructure teams securely evaluate, explore, and build proof-of-concepts (POCs) using AWS services and third-party applications that operate on AWS.

Responsible for designing and implementing AWS-based solutions that meet the organization's needs, leveraging various AWS services and technologies.

Responsible for managing and maintaining AWS infrastructure components.

Responsible for implementing and managing security measures on AWS, including identity and access management (IAM), security groups, and encryption to protect resources and data.

Responsible for monitoring AWS resources using tools like CloudWatch, identifying performance issues, and troubleshooting infrastructure or application-related problems.

Responsible for automating tasks and processes using AWS tools and services such as CloudFormation, AWS CLI, or Infrastructure as Code (IaC) frameworks like Terraform or AWS CDK.

Responsible for managing data on AWS, including data storage, retrieval, and processing using services like S3, RDS, Redshift, Athena, Glue, or EMR. This may involve data migration, ETL processes, and building data pipelines.

Responsible for implementing AWS disaster recovery and backup strategies, ensuring data redundancy, and establishing backup and restore processes.

Collaborated closely with developers, architects, and other stakeholders to understand requirements, provide technical guidance, and collaborate on solution design implementation.

Provisioned multiple clusters and databases for ongoing development projects using AWS CDK.
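
A minimal AWS CDK (Python, v2) sketch of that kind of provisioning; construct IDs, instance sizing, and the engine version are illustrative:

```python
# Define a VPC and a PostgreSQL RDS instance for a development project.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2, aws_rds as rds
from constructs import Construct

class DevDataStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "DevVpc", max_azs=2)

        rds.DatabaseInstance(
            self, "DevPostgres",
            engine=rds.DatabaseInstanceEngine.postgres(
                version=rds.PostgresEngineVersion.VER_15
            ),
            vpc=vpc,
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MEDIUM
            ),
            multi_az=False,              # dev environment; prod would enable this
            allocated_storage=100,
            removal_policy=cdk.RemovalPolicy.DESTROY,
        )

app = cdk.App()
DevDataStack(app, "dev-data-stack")
app.synth()
```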

Integrated Amazon Redshift and Amazon QuickSight into VMware Cloud on AWS, using AWS DMS to transfer data into Amazon Redshift for more straightforward analysis.

Automated the periodic data archival process for an Amazon Redshift database. The cold data is instantly available and can be joined with existing datasets in the Amazon Redshift cluster using Amazon Redshift Spectrum.
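
The archival pattern can be sketched as three SQL steps run from Python: UNLOAD cold rows to S3 as Parquet, expose them through a Spectrum external schema, and join them back to hot data; bucket, IAM role, and table names are placeholders:

```python
# Archive cold Redshift data to S3 while keeping it queryable via Spectrum.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dw", user="etl_user", password="***",
)
conn.autocommit = True
cur = conn.cursor()

# 1) Export rows older than the retention window to S3 in columnar form.
cur.execute("""
    UNLOAD ('SELECT * FROM analytics.transactions WHERE txn_ts < DATEADD(year, -2, GETDATE())')
    TO 's3://example-archive-bucket/transactions/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole'
    FORMAT AS PARQUET;
""")

# 2) One-time setup: external schema backed by the Glue Data Catalog
#    (a crawler or explicit DDL registers the Parquet files as a table).
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'archive_db'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# 3) Cold and hot data can then be joined in a single query.
cur.execute("""
    SELECT a.account_id, COUNT(*) AS lifetime_txns
    FROM spectrum.transactions a
    JOIN analytics.accounts b ON a.account_id = b.account_id
    GROUP BY a.account_id;
""")
```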

Migrated data from traditional relational database systems, file systems, and NAS shares to RDS, Aurora, and Redshift.

Transferred data from APIs to the AWS data lake (S3) and relational databases, including Amazon RDS, Aurora, and Redshift.

Reviewed and assessed current infrastructure, migrating DB2 11.1.4-11.5.1 to PostgreSQL using AWS DMS.

Utilized DMS and adhered to best practices for cloud conformity checks while migrating multiple medium and large databases.

Employed AWS Glue to build a data warehouse that organizes, cleanses, validates, and formats data.

Conducted ETL jobs across multiple VPCs using a dedicated AWS Glue VPC.

Leveraged Athena integrated with the AWS Glue Data Catalog as a central metadata store, executing interactive ad hoc SQL queries against data on Amazon S3 without managing any infrastructure or clusters.
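
A boto3 sketch of that serverless query pattern, with placeholder database, table, and output locations:

```python
# Run an ad hoc Athena query against S3 data registered in the Glue catalog.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT claim_status, COUNT(*) FROM claims GROUP BY claim_status",
    QueryExecutionContext={"Database": "fep_datalake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```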

Also utilized Athena to generate reports and explore data with Power BI or other SQL clients connected via JDBC or ODBC drivers.

Stayed up to date with the latest AWS services, features, and best practices through ongoing training, certifications, and industry resources.

Provided Level 2, 24/7 support for AWS-hosted workloads.

Delivered comprehensive support and executed cloud deployment services.

Configured and maintained AWS servers, storage, and networking systems to ensure infrastructure scalability, reliability, and security.

Managed and maintained AWS-based databases, including Amazon RDS, Aurora, DynamoDB, and others, focusing on performance optimization, backup and recovery strategies, and data security and privacy.

Developed an ETL pipeline for financial transaction processing using Apache Spark on AWS Glue, Redshift, and S3, reducing data ingestion latency by 80% and enabling real-time analytics for compliance reporting.

Architected and managed the AWS data lake (S3), utilizing AWS Glue (Spark ETL), Amazon EMR (Spark, Hive), and Athena for large-scale data processing, transformation, and ad-hoc querying.

Transferred high-volume data from APIs and traditional systems to the AWS data lake (S3) and relational databases using Amazon Kinesis Data Firehose and AWS Glue (Spark).

Migrated data from traditional relational database systems... to RDS, Aurora, and Redshift using AWS DMS and Kinesis Data Firehose for streaming workloads.

Offered technical guidance and support to development teams and technical staff, recommending best practices, troubleshooting issues, and providing training and documentation.

eSystems-inc, Arkansas, Jun’15-April’19

Role: Database & Infrastructure Engineer

Tasks and Responsibilities:

Created multiple environments (dev, test, prod) using AWS Elastic Beanstalk and performed Blue/Green deployments; for production, used the Immutable and Rolling-with-additional-batch deployment policies.

Deployed multiple applications using AWS Elastic Beanstalk, CodeCommit, CodePipeline, CodeBuild, and CodeDeploy within an SDLC methodology.

Worked on other AWS services, including CloudFront, CloudWatch, RDS, S3, Route 53, SNS, SQS, CloudTrail, Elastic Beanstalk, ElastiCache, and CodeDeploy.

Involved in reviewing and assessing current infrastructure and migrating DB2 11.1.4 to PostgreSQL.

Created new servers on AWS using EC2 instances and S3, and configured security groups and Elastic IPs as required.

Created IAM users, groups, roles, and custom policies according to the client's project requirements; enabled MFA for users and helped them set up the AWS CLI, PuTTY, and Amazon WorkSpaces.
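
A boto3 sketch of this IAM pattern, creating a group with an MFA-conditioned policy and adding a user; names and the bucket ARN are illustrative only:

```python
# Create a group, attach a scoped inline policy that only applies when the
# caller authenticated with MFA, then add a user to the group.
import json
import boto3

iam = boto3.client("iam")

iam.create_group(GroupName="data-readers")

policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-project-bucket",
            "arn:aws:s3:::example-project-bucket/*",
        ],
        # The Allow takes effect only when the caller signed in with MFA.
        "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
    }],
}
iam.put_group_policy(
    GroupName="data-readers",
    PolicyName="s3-read-with-mfa",
    PolicyDocument=json.dumps(policy_doc),
)

iam.create_user(UserName="analyst1")
iam.add_user_to_group(GroupName="data-readers", UserName="analyst1")
```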

Configured KMS, the AWS Encryption SDK, and SSM, and set up access reports using IAM Access Analyzer to identify and remove unintended external or unused permissions.

Implemented messaging and integration patterns using AWS SQS, SNS & Kinesis.

Set up monitoring, troubleshooting, auditing, and performance analysis using CloudWatch, X-Ray, CloudTrail, and AWS Config.

Documented best practices for data security, integrity, quality, governance, and consistency, as well as backup, snapshots, performance metrics, troubleshooting, and scalability.

Documented AWS security best practices, including KMS, the Encryption SDK, SSM Parameter Store, and IAM policies.

Participated in technical analysis and implementation of BI and DW solutions using tools such as Power BI, Tableau, and Amazon QuickSight.

EngagePoint, Maryland & Missouri, June '13 – June '15

Database & Infrastructure Engineer

Tasks and Responsibilities:

Gathered new system requirements and storage space availability for upcoming projects, working closely with systems and storage engineers to complete these tasks proactively.

Created documentation for infrastructure changes and operational standards.

Tested and implemented new features of DB2 Enterprise Server Edition as required.

Responsible for all new code deployments for CURAM in production.

Built a federated database reporting server.

JPMorgan Chase, Columbus, Aug '12 – June '13

Database Administrator

Tasks and Responsibilities:

Installed DB2 Enterprise Server Edition V9.7 and Data Server Client on Linux/AIX operating systems.

Built new instances and databases for new KANA project test and production environments.

Hitachi Global Storage Technologies, San Jose, Jan’12-Aug ‘12

Database Administrator

Tasks and Responsibilities:

Served as the subject matter expert (SME) for the DB2 upgrade project from V8.2 to V9.7 FP5.

Created best practices to follow daily, weekly, and monthly to keep the databases healthy.

IBM (Merck Account), PA, Aug’11- Jan ’12

Database Administrator

Tasks and Responsibilities:

Installed DB2 Enterprise Server Edition V9.7 on Linux and created new instances and databases.

Coordinated with vendor support to apply database patches.

Performed daily DB2 operations according to business requirements.

Involved in the process of implementing SQL Replication.

IBM (WDW Account), FL, Apr '09 – July '11

Tasks and Responsibilities:

Managed Disney DB2 assets including production (400 TB), DR (500 TB), HA (500 TB), and subordinate environments used for testing and staging activities.

Worked on multiple data refreshes and application rollouts for the new Property Management System.

Worked with the Vision MIMIX replication team and completed 7006 MIMIX Admin training to perform various tasks on OS V6R1 and OS V5R4.

Demonstrated leadership and technical competency during more than 24 months on the account.

Rent-A-Center, TX, Jan’08-Mar’09

Database Administrator

Tasks and Responsibilities:

Wrote design specifications, security plans, data migration plans, and architecture guidelines.

Migrated data from DB2 on i5/OS V5.3.0 to Oracle 10g R2.

Performed data formatting and transformation of DDL/SQL from DB2 on i5/OS V5.3.0 to Oracle 10g R2.


