SHREYA B
SR GCP DATA ENGINEER
682-***-**** ****************@*****.***
PROFESSIONAL SUMMARY
Senior Data Engineer with 10+ years of experience specializing in enterprise-scale cloud data solutions across GCP, Azure, and AWS platforms. Proven expertise in architecting modern data lakes, designing real-time streaming pipelines, and implementing cloud-native data warehousing solutions with a focus on performance optimization and scalability. Deep expertise in Snowflake, including streams, tasks, secure views, and external table integration with cloud storage. Advanced experience implementing real-time data pipelines using Apache Beam, Cloud Pub/Sub, Azure Event Hubs, Apache Kafka, and AWS Kinesis.
Technical expertise encompasses designing and implementing end-to-end data infrastructure, including automated ETL/ELT workflows, real-time streaming architectures, and advanced analytics platforms. Experienced in implementing data lakes using cloud-native services across GCP (BigQuery, Cloud Storage, Dataflow), Azure (Synapse Analytics, Data Lake Storage, Databricks), and AWS (Redshift, S3, Glue). Extensive Snowflake implementation experience, particularly secure multi-cluster environments, dynamic data masking, and zero-copy cloning for enterprise workloads, alongside deep work with the Hadoop ecosystem, Apache Spark, Delta Lake, and distributed computing frameworks. Automates repeatable deployments using Terraform, Azure DevOps, and CloudFormation. Demonstrated proficiency in building hybrid cloud solutions that seamlessly integrate on-premises systems while maintaining robust security and governance frameworks.
TECHNICAL SKILLS
Cloud Platforms: GCP (BigQuery, Dataflow, Pub/Sub), Azure (Synapse Analytics, Data Factory, Databricks), AWS (Redshift, Glue, Lambda)
Data Warehousing: Snowflake, Azure Synapse Analytics, Amazon Redshift
Stream Processing: Apache Beam, Azure Event Hubs, Amazon Kinesis, Apache Kafka
Big Data Technologies: Hadoop, Spark, Hive, MapReduce
Orchestration Tools: Cloud Composer, Azure Logic Apps, AWS Step Functions
Languages & Tools: Python, Java, SQL
BI & Analytics: Looker, Power BI
WORK EXPERIENCE
Sr GCP Data Engineer Amazon Dallas, TX Feb 2022 – Present
Architected and implemented enterprise data lake solutions using AWS S3 and Amazon Redshift Spectrum to consolidate traffic sensor data, enabling real-time analytics for traffic pattern optimization.
Developed automated ETL pipelines using AWS Glue and Apache Spark to process road condition data, implementing streaming pipelines for near real-time incident reporting.
Developed scripts using PySpark to push data from GCP to third-party vendors via their API frameworks.
Leading the migration of a legacy system to a Microsoft Azure cloud-based solution, re-designing legacy applications with minimal changes to run on the cloud platform.
Developed automated ETL pipelines using Cloud Data Fusion and Dataproc to process grid sensor data, implementing delta loading patterns and data quality validation frameworks.
Designed and developed a batch ETL pipeline using Apache Spark Java API to process large datasets from HDFS and Cloud Storage.
Created data ingestion processes to build and maintain a global data lake on GCP Cloud Storage and BigQuery.
Implemented real-time data processing using Apache Kafka and Storm to ingest and analyze streaming data from mobile applications, enhancing insights into user activities and app performance.
Heavily involved in data architecture and application design using cloud and big data solutions on AWS and Microsoft Azure.
Built multiple programs with Python and Apache Beam and executed them on Cloud Dataflow to run data validation between raw source files and BigQuery tables.
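Illustrative of this validation pattern, a minimal Beam sketch that compares a raw-file row count against a BigQuery table count; the project, bucket, and table names are hypothetical:

```python
# Sketch: row-count validation between a raw GCS file and a BigQuery table.
# Project, bucket, and table names below are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(runner="DataflowRunner", project="my-project",
                       region="us-central1", temp_location="gs://my-bucket/tmp")

with beam.Pipeline(options=opts) as p:
    raw_count = (
        p
        | "ReadRaw" >> beam.io.ReadFromText("gs://my-bucket/raw/events.csv",
                                            skip_header_lines=1)
        | "CountRaw" >> beam.combiners.Count.Globally())
    bq_count = (
        p
        | "ReadBQ" >> beam.io.ReadFromBigQuery(
            query="SELECT COUNT(*) AS c FROM `my-project.analytics.events`",
            use_standard_sql=True)
        | "ExtractCount" >> beam.Map(lambda row: row["c"]))
    # Merge both singleton counts and flag any mismatch.
    ((raw_count, bq_count)
     | "Merge" >> beam.Flatten()
     | "ToList" >> beam.combiners.ToList()
     | "Compare" >> beam.Map(
         lambda counts: print("MATCH" if len(set(counts)) == 1
                              else f"MISMATCH: {counts}")))
```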
Utilized Python and Java to write custom data transformation scripts, ensuring high efficiency and scalability in data processing pipelines for mobile app analytics.
Leveraged Jira for project management and issue tracking, facilitating collaboration between data engineering teams and stakeholders, and ensuring timely delivery of data solutions.
Designed and implemented Snowflake external tables to query data directly from AWS S3, enabling seamless data lake integration.
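A minimal sketch of that external-table setup, assuming a hypothetical S3 bucket, storage integration, and Parquet layout:

```python
# Sketch: Snowflake external table over an S3 prefix, queried in place.
# Account, stage, bucket, and integration names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW")
cur = conn.cursor()

# External stage pointing at the S3 data lake via a storage integration.
cur.execute("""
    CREATE STAGE IF NOT EXISTS sensor_stage
    URL = 's3://my-bucket/sensors/'
    STORAGE_INTEGRATION = s3_int
""")

# External table: Snowflake reads the Parquet files directly, no copy.
cur.execute("""
    CREATE OR REPLACE EXTERNAL TABLE sensor_readings (
        sensor_id VARCHAR AS (value:sensor_id::VARCHAR),
        reading   FLOAT   AS (value:reading::FLOAT)
    )
    LOCATION = @sensor_stage
    FILE_FORMAT = (TYPE = PARQUET)
    AUTO_REFRESH = TRUE
""")
```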
Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
Configured Apache Kafka Producers and Consumers using Python and Java to handle high-throughput event streaming.
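A compact producer/consumer sketch with kafka-python; the brokers, topic, payload, and tuning values are hypothetical:

```python
# Sketch: high-throughput Kafka producer and consumer (kafka-python).
# Broker addresses, topic, and payload shape are hypothetical.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",               # wait for full in-sync-replica acknowledgment
    linger_ms=20,             # small batching window boosts throughput
    compression_type="gzip")
producer.send("traffic-events", {"sensor_id": "tx-101", "speed_mph": 54})
producer.flush()

consumer = KafkaConsumer(
    "traffic-events",
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    group_id="ingest-workers",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")))
for msg in consumer:
    print(msg.topic, msg.partition, msg.offset, msg.value)
```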
Implemented row-access security in Snowflake using secure views and role-based access control (RBAC) for sensitive transportation data.
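A sketch of that secure-view/RBAC pattern; the role-to-region mapping table and all object names are hypothetical:

```python
# Sketch: row filtering via a Snowflake secure view plus RBAC grants.
# Database, tables, and role names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_user", password="...",
    database="ANALYTICS")
cur = conn.cursor()

# A secure view hides its definition and underlying data from other roles;
# CURRENT_ROLE() limits each consumer to its mapped region's rows.
cur.execute("""
    CREATE OR REPLACE SECURE VIEW transport.filtered_trips AS
    SELECT t.trip_id, t.route, t.fare
    FROM transport.trips t
    JOIN transport.region_roles r
      ON t.region = r.region AND r.role_name = CURRENT_ROLE()
""")
cur.execute("GRANT SELECT ON VIEW transport.filtered_trips TO ROLE analyst_tx")
```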
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
Created data quality frameworks using AWS Glue DataBrew for validating and monitoring road sensor data accuracy across state highway systems.
Developed an RBAC-based API gateway for secure data access.
Implemented data security controls using AWS IAM and AWS VPC Security Groups to ensure compliance with state data protection requirements.
Sr GCP Data Engineer CPS Energy San Antonio, TX Mar 2020 – Jan 2022
Designed and optimized BigQuery data warehouse schemas for power outage analytics through efficient partitioning strategies and materialized views.
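A sketch of the partitioning and materialized-view approach, with hypothetical project, dataset, and column names:

```python
# Sketch: partitioned/clustered fact table plus a materialized view for
# outage analytics. Project, dataset, and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="utility-prod")

client.query("""
    CREATE TABLE IF NOT EXISTS grid.outage_events (
        event_ts   TIMESTAMP,
        feeder_id  STRING,
        customers  INT64
    )
    PARTITION BY DATE(event_ts)   -- prunes scans to the days queried
    CLUSTER BY feeder_id          -- co-locates rows per feeder
""").result()

client.query("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS grid.daily_outages AS
    SELECT DATE(event_ts) AS day, feeder_id,
           COUNT(*) AS outages, SUM(customers) AS customers_affected
    FROM grid.outage_events
    GROUP BY DATE(event_ts), feeder_id
""").result()
```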
Implemented stream processing solutions using Cloud Pub/Sub and Dataflow to monitor real-time power consumption patterns, enabling proactive load balancing across the grid.
Developed custom MapReduce jobs in Java to process and analyze large-scale datasets, enabling advanced analytics capabilities for marketing and customer segmentation projects.
Designed and implemented RBAC policies for a multi-user data warehouse.
Built automated data integration workflows using Cloud Functions, Cloud Composer, and Apache Kafka to synchronize customer billing data between on-premises PostgreSQL and Cloud SQL.
Leveraged cloud and GPU computing technologies on AWS and GCP for automated machine learning and analytics pipelines.
Developed a real-time data pipeline using Azure Databricks and Apache Spark to process streaming data from Azure Event Hubs.
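One way to wire this up is to read Event Hubs through its Kafka-compatible endpoint from Spark Structured Streaming; the namespace, hub, schema, and paths in this sketch are hypothetical:

```python
# Sketch: Structured Streaming from Azure Event Hubs via its
# Kafka-compatible endpoint. Namespace, hub, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("eventhub-stream").getOrCreate()

# Event Hubs Kafka auth: username is the literal "$ConnectionString",
# password is the hub's connection string (elided here).
jaas = ("org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="$ConnectionString" '
        'password="Endpoint=sb://my-ns.servicebus.windows.net/;...";')

schema = StructType([StructField("meter_id", StringType()),
                     StructField("kwh", DoubleType())])

readings = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "my-ns.servicebus.windows.net:9093")
            .option("kafka.security.protocol", "SASL_SSL")
            .option("kafka.sasl.mechanism", "PLAIN")
            .option("kafka.sasl.jaas.config", jaas)
            .option("subscribe", "meter-readings")
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

(readings.writeStream.format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/meter")
 .start("/mnt/delta/meter_readings"))
```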
Integrated Apache Kafka with Amazon S3, HDFS, and Delta Lake for long-term storage and analytics.
Integrated Jira with Confluence for documentation of ETL workflows and design decisions.
Developed ELT processes in GCP for files sourced from Ab Initio and Google Sheets, with Dataprep, Dataproc (PySpark), and BigQuery as the compute layers.
Converted SAS code to Python/Spark-based jobs on Cloud Dataproc and BigQuery in GCP.
Implemented data security best practices in Azure, including encryption, identity management, and role-based access control (RBAC) to ensure secure access to sensitive enterprise data.
Designed and implemented ETL pipelines using Python and Apache Spark in Azure Databricks, processing large-scale datasets from various sources and optimizing job performance to reduce processing time.
Created BigQuery authorized views for row-level security and for exposing data to other teams.
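A sketch of the authorized-view pattern with the BigQuery Python client; the project, datasets, and query are hypothetical:

```python
# Sketch: authorize a view into a source dataset so consumers can query
# the view without access to base tables. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="utility-prod")

# The shared view lives in a "reporting" dataset exposed to other teams.
view = bigquery.Table("utility-prod.reporting.billing_summary")
view.view_query = """
    SELECT account_id, billing_month, SUM(amount) AS total
    FROM `utility-prod.finance.billing`
    GROUP BY account_id, billing_month
"""
view = client.create_table(view, exists_ok=True)

# Grant the view itself (not its readers) access to the source dataset.
source = client.get_dataset("utility-prod.finance")
entries = list(source.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source.access_entries = entries
client.update_dataset(source, ["access_entries"])
```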
Developed custom Azure Functions to handle complex data transformation logic and business rules for utility billing calculations.
Developed automated data validation and error-checking procedures using Python scripts to ensure the accuracy and integrity of data ingested from mobile app users.
Created automated data quality monitoring systems using Cloud Monitoring and AI-based anomaly detection models to detect inconsistencies in meter readings.
Worked on AWS Glue to build ETL pipelines that automatically extract data from various sources, transform it into a usable format, and load it into data storage or a data warehouse (e.g., Amazon Redshift).
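A minimal Glue job-script sketch of that extract-transform-load shape; the catalog database, table, and Redshift connection names are hypothetical:

```python
# Sketch: Glue job reading a catalog table, remapping fields, and loading
# into Redshift. Database, table, and connection names are hypothetical.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue = GlueContext(SparkContext.getOrCreate())
job = Job(glue)
job.init(args["JOB_NAME"], args)

# Extract: read the crawled source table from the Glue Data Catalog.
src = glue.create_dynamic_frame.from_catalog(database="raw",
                                             table_name="road_sensors")

# Transform: rename and cast fields into the warehouse schema.
mapped = ApplyMapping.apply(frame=src, mappings=[
    ("sensorId", "string", "sensor_id", "string"),
    ("reading", "double", "reading", "double")])

# Load: write into Redshift through a catalog connection.
glue.write_dynamic_frame.from_jdbc_conf(
    frame=mapped, catalog_connection="redshift-conn",
    connection_options={"dbtable": "public.road_sensors",
                        "database": "analytics"},
    redshift_tmp_dir="s3://my-bucket/tmp/")
job.commit()
```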
Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines, with operators including BashOperator, Hadoop/Dataproc operators, PythonOperator, and BranchPythonOperator.
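A condensed DAG sketch showing those operator types together; the schedule, bucket, and branching condition are hypothetical:

```python
# Sketch: Composer DAG combining bash, python-callable, and branching
# operators. DAG id, bucket, and branch logic are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator

def check_source(ds, **_):
    # Hypothetical condition: only load on weekdays.
    weekday = datetime.strptime(ds, "%Y-%m-%d").weekday()
    return "load_to_bq" if weekday < 5 else "skip_load"

with DAG("billing_etl", start_date=datetime(2021, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="gsutil cp gs://my-bucket/raw/*.csv /tmp/")
    branch = BranchPythonOperator(task_id="branch",
                                  python_callable=check_source)
    load = PythonOperator(task_id="load_to_bq", python_callable=lambda: None)
    skip = PythonOperator(task_id="skip_load", python_callable=lambda: None)
    extract >> branch >> [load, skip]
```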
Implemented a data lakehouse using Azure Databricks and Delta Lake to unify structured and semi-structured retail data.
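A sketch of the lakehouse upsert step with the Delta Lake Python API; the paths and join key are hypothetical, and the target Delta table is assumed to exist:

```python
# Sketch: merge semi-structured JSON updates into a Delta table.
# Paths and the join key are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.read.json("/mnt/raw/retail/orders/")  # semi-structured input

target = DeltaTable.forPath(spark, "/mnt/delta/orders")  # existing table
(target.alias("t")
 .merge(updates.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()      # refresh changed orders
 .whenNotMatchedInsertAll()   # append new orders
 .execute())
```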
Migrated SQL Server and Oracle databases to the Microsoft Azure cloud.
Created automated documentation generation systems for data lineage and impact analysis using Cloud Data Catalog.
Sr Azure Data Engineer Futran Solutions Edison, NJ May 2018 – Feb 2020
Designed and implemented scalable data pipelines using Azure Data Factory for efficient extraction, transformation, and loading (ETL) of large datasets across different systems.
Developed and deployed real-time data streaming solutions using Azure Event Hubs and Azure Stream Analytics, enabling fast processing and analytics for high-velocity data.
Integrated Azure Logic Apps and Azure Functions to create event-driven data workflows, enabling automated responses to data events and triggering necessary actions for downstream processes.
Managed and maintained Azure Key Vault to store and securely manage sensitive data such as API keys, connection strings, and secrets used in data engineering workflows.
Developed Azure Data Catalog solutions to provide metadata management, data lineage, and discovery capabilities, ensuring that data assets are easily discoverable and well-documented.
Optimized the use of Azure HDInsight to process and analyze large-scale datasets with Hadoop and Spark, allowing for complex data analytics and batch processing in the cloud.
Implemented data lake architecture on Azure Data Lake Storage for storing structured, semi-structured, and unstructured data, enabling seamless data access for analytics and machine learning.
Integrated Azure Databricks with Spark for distributed data processing, enhancing the speed and efficiency of data transformation for large-scale analytics workloads.
Automated end-to-end CI/CD workflows using Azure DevOps, facilitating continuous integration and deployment of data pipeline updates and enhancements in an Agile environment.
Conducted data quality checks and validation processes on ingested datasets, ensuring that only accurate, clean data entered the data warehouse and analytics pipelines.
AWS Data Engineer Revature Reston, VA Jan 2017 – Apr 2018
Designed and developed scalable and efficient data pipelines using AWS Glue and AWS Lambda, enabling seamless ingestion, transformation, and loading (ETL) of data across various systems, improving data accessibility and processing speed.
Implemented real-time data streaming solutions with Amazon Kinesis, allowing for continuous ingestion of mobile application data and real-time analysis to track user activities and app performance metrics.
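A minimal boto3 sketch of that ingestion path; the stream name, shard id, and payload are hypothetical:

```python
# Sketch: put and read mobile events on a Kinesis stream with boto3.
# Stream name, shard id, and payload shape are hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Producer side: the partition key spreads events across shards.
kinesis.put_record(
    StreamName="mobile-events",
    Data=json.dumps({"user_id": "u-42", "event": "app_open"}).encode("utf-8"),
    PartitionKey="u-42")

# Consumer side: iterate one shard from its oldest record.
shard_it = kinesis.get_shard_iterator(
    StreamName="mobile-events", ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
batch = kinesis.get_records(ShardIterator=shard_it, Limit=100)
for rec in batch["Records"]:
    print(json.loads(rec["Data"]))
```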
Developed and optimized ETL/ELT workflows to migrate large-scale datasets between AWS Redshift, AWS RDS, and Hadoop ecosystems, ensuring high performance, cost efficiency, and data consistency.
Built robust and automated data transformation processes using AWS Step Functions, creating efficient workflows to handle complex data processing and enabling faster decision-making based on clean data.
Leveraged Amazon Athena to enable ad-hoc querying of data stored in AWS S3, empowering the business and product teams with immediate insights for mobile application performance and user experience improvements.
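A short boto3 sketch of such an ad-hoc Athena query; the database, table, and results bucket are hypothetical:

```python
# Sketch: run an ad-hoc Athena query over S3 data and poll for results.
# Database, table, and output bucket are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT app_version, COUNT(*) AS sessions "
                "FROM mobile.sessions GROUP BY app_version",
    QueryExecutionContext={"Database": "mobile"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch rows.
while True:
    state = athena.get_query_execution(
        QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:
        print([c.get("VarCharValue") for c in row["Data"]])
```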
Designed and implemented serverless solutions using AWS Lambda for automated data processing tasks, reducing the need for dedicated servers and minimizing operational costs.
Utilized Amazon DynamoDB to store and manage high-velocity, low-latency data from mobile applications, ensuring scalability and performance for mobile analytics in real-time.
Set up Amazon CloudWatch for real-time monitoring and logging of AWS resources, ensuring proactive identification and resolution of issues related to data processing pipelines and cloud infrastructure.
Applied DevOps best practices for CI/CD pipelines, utilizing AWS CodePipeline and CloudFormation for seamless integration and deployment of data engineering applications, improving code deployment speed and reducing errors.
Ensured best practices for data security and compliance by leveraging AWS IAM to manage user access and permissions across all AWS services, ensuring data integrity and privacy.
Worked with cross-functional teams to design and implement AWS-based data solutions that optimized workflows for mobile application development and analytics, improving product decision-making capabilities.
Data Engineer Xeliumtech Solutions India Jun 2013 – Dec 2015
Designed, developed, and maintained ETL pipelines to efficiently process mobile application data, ensuring high throughput and low latency using Hadoop, MapReduce, and Apache Hive.
Built and maintained a scalable data warehouse infrastructure with Amazon Redshift to support large-scale analytics on mobile app performance metrics and user behaviors.
Deployed and managed data infrastructure on AWS, utilizing services such as S3, EC2, and EMR for cost-effective storage and processing of mobile application data.
Integrated third-party data sources (e.g., analytics platforms, social media APIs) with internal mobile application data systems, ensuring accurate and timely synchronization for cross-platform analysis.
Developed data models and optimized SQL queries to facilitate high-speed reporting and data analysis, enhancing the speed of mobile app performance analysis.
Conducted performance tuning and query optimization on large datasets, using tools like Hadoop Pig, Hive, and SQL to ensure efficient data storage and fast retrieval.