Sai Swaroop Grandhi
Data Engineer
***************@*****.***
linkedin.com/in/sai-swaroop-grandhi-919176170
PROFESSIONAL SUMMARY:
Experienced Data Engineer with 7+ years of hands-on expertise across financial, pharmaceutical, and retail industries, delivering scalable, secure, and high-performance cloud-based data solutions.
Experienced in building and maintaining batch and real-time data pipelines using AWS Glue, EMR, Kinesis, Lambda, and Kafka (MSK), as well as Azure Data Factory and Synapse Analytics, to support analytics, compliance, and operational workloads.
Strong programming expertise in Python (NumPy, Pandas, Boto3), PySpark, Scala, SQL, Java, and Shell scripting for end-to-end data transformation, workflow orchestration, and automation.
Hands-on experience with AWS services including S3, Redshift, RDS, IAM, EC2, VPC, CloudWatch, SQS, SNS, and others for cloud-native data engineering.
Proficient in managing data warehouse and lakehouse environments using Snowflake, Redshift, BigQuery, Synapse, PostgreSQL, and Oracle, with support for analytics and BI visualization through QuickSight, Tableau, and Power BI.
Experienced in working with distributed data processing ecosystems, including Apache Spark, Flink, Hadoop, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.
Experienced in writing complex SQL queries, stored procedures, views, and triggers for data transformation, validation, and business logic implementation across multiple RDBMS platforms.
Proficient in working with PostgreSQL, MySQL, Oracle, SQL Server, Amazon Redshift, and Snowflake for OLTP and OLAP use cases.
Skilled in implementing data governance and lineage using AWS Glue Data Catalog, Lake Formation, Unity Catalog, Azure Purview, and Azure Key Vault, ensuring regulatory compliance with Basel III, CCAR, FR Y-14, AML, GxP, and 21 CFR Part 11.
Hands-on with CI/CD tools and DevOps practices, using Git, GitHub Actions, CodePipeline, CodeBuild, Terraform, CloudFormation, Jenkins, and Azure DevOps Pipelines for deployment and automation.
Experienced in developing Python modules for API integration, metadata management, cloud SDKs (Boto3, Azure SDK), and custom ETL logic to support scalable data operations.
Proficient in handling NoSQL and semi-structured data using DynamoDB, MongoDB, Cassandra, and HBase, optimizing storage and query performance for unstructured datasets.
Skilled in container-based deployments using Docker and OpenShift, and orchestration of workflows with Apache Airflow for scheduling and monitoring.
Familiar with Informatica PowerCenter, Talend, and Snowflake utilities like SnowSQL and SnowPipe for efficient data ingestion and ETL workflows.
Experienced in supporting data quality, root cause analysis, KPI reporting, and collaborating with cross-functional teams under Agile/Scrum methodologies.
Knowledgeable in applying ML/NLP tools such as Scikit-learn and TensorFlow for intelligent data processing and automation.
Collaborated with BI and reporting teams to expose clean, query-ready data models in SQL databases for downstream tools like Tableau, Power BI, and QuickSight.
CERTIFICATIONS:
Azure Data Fundamentals (DP-900)
Azure Data Engineer Associate (DP-203)
TECHNICAL SKILLS:
Programming Languages
Python, Scala, Ruby, Java, R
Big Data Technologies
HDFS, MapReduce, YARN, Sqoop, Flume, Oozie, Zookeeper, Hive, Pig, Impala, Kafka, Spark, PySpark, Apache Airflow, Apache NiFi
Cloud Platforms (AWS, Azure)
AWS (S3, Lambda, EMR, EC2, RDS, Redshift, SNS, SQS, IAM, Kinesis, Glue, API Gateway, Route53), Azure (Data Lake, Synapse, Databricks), Google Cloud Platform
Databases & Data Warehousing
Oracle DB, MySQL, PostgreSQL, SQL Server, Snowflake, Amazon Redshift, Azure Synapse, Google BigQuery, Teradata, Amazon RDS
ETL Tools
Informatica, Talend, Apache Airflow
Data Visualization
Tableau, Power BI
SQL Databases
Oracle DB, Microsoft SQL Server, PostgreSQL, Teradata, Amazon RDS
NoSQL Databases
MongoDB, Cassandra, HBase, Amazon DynamoDB
Monitoring Tools
Splunk, Nagios, ELK Stack, AWS CloudWatch
CI/CD & Version Control
Git, Bitbucket, Maven, SBT, GitHub Actions, AWS CodePipeline, AWS CodeBuild, AWS CodeDeploy, Azure DevOps Pipelines, Jenkins, Kubernetes
Analytical Skills
Data Modeling, Data Quality, Root Cause Analysis, Trend Analysis, Forecasting, Business KPI Reporting
Methodologies
Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Client: TD Bank, Cherry Hill Township, NJ May 2023 - Present
Senior Data Engineer
Roles and Responsibilities:
Developed and maintained end-to-end data pipelines supporting Basel III, CCAR, FR Y-14, and AML regulatory compliance reporting using AWS Glue, EMR, and Python/PySpark.
Built scalable ETL/ELT workflows to ingest, transform, and load financial datasets into Amazon Redshift, Snowflake, and RDS (PostgreSQL/MySQL) for regulatory reporting and advanced analytics.
Automated compliance workflows using AWS Glue Workflows, AWS Lambda, and Step Functions, enabling near real-time data processing.
Ensured data quality and schema consistency using AWS Glue Data Catalog and implemented data versioning for regulatory traceability.
Implemented sensitive data discovery and classification with Amazon Macie and encryption strategies using AWS KMS for data security and regulatory compliance.
Managed and optimized MongoDB collections to handle large volumes of unstructured data, ensuring data integrity and high availability.
Tuned Spark and PySpark jobs on EMR for performance optimization on high-volume transactional datasets exceeding hundreds of GBs daily.
Developed modular and reusable Python scripts and Spark transformations for rule-based compliance checks and anomaly detection.
Designed optimized data warehouse schemas in Redshift and Snowflake (SnowPipe, SnowSQL) to accelerate analytical queries and reporting workloads.
Integrated CloudWatch Logs and CloudTrail for auditing data access, pipeline executions, and automated alerting for operational failures.
Implemented version-controlled CI/CD pipeline deployments for Glue and EMR jobs using AWS CodePipeline and CodeBuild.
Developed and maintained ETL mappings using Informatica PowerCenter to support legacy data integration and migration workflows.
Collaborated with data stewards and compliance officers to understand reporting requirements and translated them into technical solutions.
Conducted metadata tagging and lineage tracking across datasets using AWS Glue and AWS Lake Formation.
Built dashboards and metric trackers in QuickSight and Redshift for internal stakeholders to monitor data freshness and pipeline status.
Applied encryption at rest and in transit using AWS KMS (Key Management Service) to ensure secure data handling in accordance with TD Bank’s compliance guidelines.
Participated in governance forums to enforce data access control and tagging standards across the bank’s data lake environment.
Contributed to the migration of legacy compliance workflows to AWS-native solutions, reducing processing time and operational overhead by over 30%.
Documented data flow diagrams, compliance data models, and transformation logic for audit readiness and knowledge transfer.
Built robust data access layers using MongoDB’s aggregation framework to support dynamic querying and fast data retrieval for business applications.
Used Terraform and CloudFormation for infrastructure provisioning and managed Glue/EMR configurations as code.
Engineered unified data pipelines by integrating Databricks (PySpark, Delta Lake) with Snowflake (SnowSQL, SnowPipe), enabling scalable processing, efficient data transformation, and seamless analytics across cloud platforms.
Developed big data solutions using HBase, Hive, Apache Spark, and MapReduce, integrating Sqoop and Flume for efficient data ingestion.
Crafted advanced SQL queries to retrieve, aggregate, and analyze data from relational databases such as Amazon RDS (MySQL, PostgreSQL) and Amazon Redshift.
Used Python with the AWS SDK (Boto3) to interact with AWS services programmatically, automate resource management, and implement custom workflows.
Mentored junior engineers on PySpark development best practices, AWS services, and compliance-first data engineering.
Created and maintained interactive Power BI dashboards to visualize key compliance metrics and pipeline performance, enabling stakeholders to make data-driven decisions with real-time insights.
Client: Merck & Co., Rahway, NJ May 2020 - May 2023
AWS Data Engineer
Roles and Responsibilities:
Designed and implemented a real-time manufacturing data monitoring platform for pharmaceutical production lines, ensuring compliance with GxP and 21 CFR Part 11 standards.
Developed robust data ingestion pipelines using Amazon Kinesis and Kafka on MSK to stream high-frequency sensor and machine data.
Integrated industrial IoT devices with AWS IoT Core, enabling secure, low-latency communication between shop-floor sensors and the AWS cloud.
Built scalable real-time data processing applications using AWS Lambda, Apache Flink, Apache Spark on EMR, and Amazon DynamoDB to detect production anomalies and trigger alerts.
Engineered complex event processing workflows to track throughput, deviations, and critical metrics from multiple manufacturing systems in real time.
Created interactive dashboards in Amazon QuickSight and Tableau for operations, quality, and compliance teams to visualize line performance and ensure adherence to production standards.
Managed structured and time-series data storage in Amazon S3, applying lifecycle policies for cost-effective and audit-compliant archival.
Utilized Python and PySpark to implement transformation logic, feature extraction, and business rules on streaming datasets.
Established automated data validation, cleansing, and enrichment processes to ensure reliability and accuracy of manufacturing data.
Enabled secure data access using IAM policies, AWS KMS (Key Management Service) encryption, and VPC endpoints to enforce enterprise-grade security.
Monitored streaming jobs and infrastructure using Amazon CloudWatch, configuring alarms and dashboards for proactive issue detection.
Created schema registries and managed topic configurations within Kafka to support multi-line, multi-product data ingestion.
Applied AWS Glue Data Catalog for metadata management and schema tracking across raw and processed data layers.
Automated infrastructure deployment and resource provisioning using Terraform and AWS CloudFormation, ensuring consistency across environments.
Utilized Python libraries such as Pandas and PySpark to perform data wrangling, statistical analysis, and batch processing of large datasets.
Developed and scheduled complex multi-stage data pipelines using Apache Airflow and Apache Oozie, orchestrating Hive, Pig, and MapReduce jobs to streamline large-scale batch processing.
Used Hadoop MapReduce for batch processing and integrated with Hive and Pig for querying and transforming large data sets efficiently.
Participated in periodic audits and compliance assessments, providing end-to-end data lineage, versioning, and traceability documentation.
Tuned Flink jobs and EMR clusters for high availability, low latency, and efficient resource utilization under peak manufacturing loads.
Collaborated with plant engineers, QA, and compliance teams to align data monitoring with production KPIs and regulatory requirements.
Implemented fault-tolerant and self-healing data pipelines capable of recovering from network and sensor failures without data loss.
Conducted knowledge-sharing sessions and trained junior engineers on streaming data architecture and AWS-native best practices.
Implemented CI/CD pipelines using AWS CodePipeline, CodeBuild, and Jenkins to automate testing and deployment of data workflows, ensuring rapid and reliable delivery of data engineering solutions.
Utilized Git with AWS CodeCommit for version control and collaborative development, enabling efficient branching, code reviews, and seamless integration within CI/CD workflows.
Containerized data processing applications using Docker, enabling consistent development, deployment, and scaling across cloud environments.
Client: Costco, Issaquah, WA March 2018 - April 2020
Data Engineer
Roles and Responsibilities:
Designed and implemented scalable streaming data pipelines using Amazon Kinesis and Kafka on MSK to ingest billions of real-time POS transactions across global stores.
Developed Python and Scala applications for real-time data transformation, validation, and enrichment using Apache Flink and PySpark on Amazon EMR.
Built AWS Lambda functions in Python (Boto3) and Node.js, along with Azure Functions, for low-latency event processing and real-time event triggers.
Integrated with Amazon DynamoDB for real-time product pricing and inventory lookups, achieving sub-millisecond response times.
Used Amazon Redshift and PostgreSQL materialized views for historical trend analysis, customer purchase behavior analytics, and ad-hoc reporting.
Stored and managed raw and processed data in Amazon S3 with lifecycle policies to enable cost-effective, audit-compliant data archival.
Created automated data quality checks, schema validations, and retry logic using custom Python scripts and Spark UDFs.
Implemented data serialization using Avro and Parquet, improving query performance and reducing storage costs in S3 and Redshift.
Designed and implemented Azure Data Factory pipelines to orchestrate ingestion from 10+ sources into ADLS, complementing AWS pipelines for hybrid-cloud integration.
Built data transformation jobs and batch ETL pipelines using Apache Spark on EMR, scheduled via Step Functions.
Applied AWS Glue for schema cataloging and version control of streaming and batch data.
Deployed and orchestrated containerized data processing workloads using Kubernetes (EKS), enabling scalable, fault-tolerant, and portable deployment of streaming and batch services.
Employed Redis (ElastiCache) to cache frequently accessed metadata and pricing data for faster lookups.
Enforced enterprise-grade security using IAM, KMS, VPC, Azure Active Directory (AAD), RBAC, and Key Vault to ensure encryption, access control, and regulatory compliance.
Implemented hybrid observability with AWS CloudWatch/CloudTrail and Azure Monitor/Log Analytics, enabling proactive system health monitoring and audit compliance.
Developed PySpark-based transformation jobs in Azure Databricks, leveraging Delta Lake and Unity Catalog for ACID compliance, governance, and optimized query performance.
Collaborated with cross-functional teams to define data contracts, interface schemas, and SLAs for POS, pricing, and inventory pipelines.
Automated CI/CD pipelines using AWS CodePipeline/CodeBuild, Azure DevOps, and IaC tools (Terraform, Bicep) for Lambda, EMR, Databricks, and Kinesis deployments.
Created and maintained Power BI dashboards to visualize key business metrics and pipeline performance, facilitating data-driven decision-making across merchandising and operations teams.
EDUCATION:
Master's in Information Technology, University of Cincinnati, OH, 2024