
Data Engineer Senior

Location:
Irving, TX, 75060
Posted:
October 02, 2024


SENIOR DATA ENGINEER

Name: Pallavi Priya K | Phone: +1-469-***-**** | Email: ******************@*****.***

PROFESSIONAL SUMMARY:

●Senior Data Engineer with 10 years of experience designing, implementing, and optimizing end-to-end data engineering workflows to support data-driven decision-making and business growth using Google Cloud Platform (GCP) and BigQuery across the Banking, Retail, Healthcare, and E-commerce domains.

●Worked with the Flask framework to design REST APIs in Python (a minimal endpoint sketch appears after this summary).

●Experience across the layers of the Hadoop framework - storage (HDFS), analysis (Pig and Hive), and engineering (jobs and workflows) - extending functionality by writing custom UDFs.

●Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables (see the PySpark sketch after this summary).

●Developed a Python application for Google Analytics aggregation and reporting, and used Django configuration to manage URLs and application parameters.

●Created several types of data visualizations using Python and Tableau; built Spark Streaming modules to stream data into the data lake; and conducted statistical analysis on healthcare data using Python and various tools.

●Experienced in development and support, demonstrating in-depth knowledge of Oracle, SQL, PL/SQL, and T-SQL queries.

●Extensive experience in IT data analytics projects, with hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.

●Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

●Possesses extensive experience in seamlessly integrating Informatica Data Quality (IDQ) with Informatica PowerCenter to ensure robust data quality management throughout the integration process.

●Extensive experience in scripting and debugging on Windows environments, with a foundational understanding of container orchestration technologies like Kubernetes, Docker, and AKS.

●Extensive working experience with big data ecosystem - Hadoop (HDFS, MapReduce, Yarn), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, Zookeeper, Ambari, Flume, NiFi.

●Comfortable working with container platforms such as Docker, Kubernetes, and OpenShift.

●Experienced in working with various IDEs, including PyCharm, PyScripter, PyStudio, PyDev, IDLE, NetBeans, and Sublime Text.

●Worked with Cloudera and Hortonworks distributions; hands-on experience with Spark using Scala and PySpark.

●Experience working with SQL Server and MySQL databases; good experience working with Parquet files and parsing and validating JSON-format files; hands-on experience setting up workflows using Apache Airflow and the Oozie workflow engine for managing and scheduling Hadoop jobs.

●Sound knowledge in developing highly scalable and resilient Restful APIs, ETL solutions, and third-party platform integrations as part of Enterprise Site platform.

●Experience using IDEs such as PyCharm and IntelliJ, and version control systems such as SVN and Git.

●Leveraged GCP Kubernetes Engine to orchestrate and manage complex data pipelines and application deployments. Demonstrated ability to implement and manage CI/CD pipelines for data services, enhancing the speed and efficiency of data processing workflows.
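
A minimal sketch of the kind of Flask REST endpoint referenced above. The route, payload fields, and in-memory store are illustrative assumptions, not a production API.

```python
# Minimal Flask REST API sketch; the "jobs" resource and its fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

_jobs = {}  # in-memory store standing in for a real backend


@app.route("/jobs", methods=["POST"])
def create_job():
    # Accept a JSON payload and register it under a generated id.
    payload = request.get_json(force=True)
    job_id = str(len(_jobs) + 1)
    _jobs[job_id] = payload
    return jsonify({"id": job_id, **payload}), 201


@app.route("/jobs/<job_id>", methods=["GET"])
def get_job(job_id):
    # Return the stored payload or a 404 if the id is unknown.
    job = _jobs.get(job_id)
    if job is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(job)


if __name__ == "__main__":
    app.run(debug=True)
```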
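
A PySpark sketch of the CSV-to-Hive-ORC loading pattern referenced above, assuming hypothetical bucket paths, column names, and target table; in practice the schema-alignment step depends on the actual source feeds.

```python
# PySpark sketch: load CSV files with differing schemas into a Hive ORC table.
# Paths, columns, and the target table are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("csv_to_hive_orc")
         .enableHiveSupport()
         .getOrCreate())

# Read each source folder separately because the schemas differ,
# then align on a common column set before unioning.
df_a = spark.read.option("header", True).csv("gs://landing/source_a/*.csv")
df_b = spark.read.option("header", True).csv("gs://landing/source_b/*.csv")

common_cols = ["customer_id", "event_ts", "amount"]
aligned = df_a.select(*common_cols).unionByName(df_b.select(*common_cols))

(aligned
 .withColumn("load_dt", F.current_date())
 .write.mode("append")
 .format("orc")
 .saveAsTable("analytics.events_orc"))
```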

Technical Skills:

Programming & Scripting: Python, Scala, Java, SAS, R, SQL (including SQL tuning), MATLAB, HiveQL, PowerShell, Bash scripting

Python Libraries: Pandas, SciPy, NumPy, Scikit-Learn, statsmodels, Matplotlib, Plotly, Seaborn, TensorFlow, PyTorch, PySpark

Query Languages: SQL, T-SQL, PL/SQL, HiveQL, GraphQL

Big Data Ecosystem: HDFS, YARN, MapReduce, Sqoop, Hive, Hadoop, Oozie, Pig, Spark, Zookeeper, Cloudera Manager, Kafka, Kafka Connect, Confluent, Flume, NiFi, StreamSets, Airflow (orchestration)

GCP: Compute Engine, BigQuery, Kubernetes Engine, Dataproc, Dataflow, Pub/Sub

AWS: EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, PostgreSQL

Azure: Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Azure Active Directory, Databricks

ETL Tools: Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, Tableau, AtScale, Power BI, Arcadia

Data Warehouses: Snowflake, Amazon Redshift, Google BigQuery, Microsoft Azure Synapse Analytics, Oracle Autonomous Data Warehouse, IBM Db2 Warehouse, Teradata, SAP BW/4HANA, Informatica PowerCenter, Apache Hive

SQL Databases and ORM: Oracle, MySQL, Teradata, Postgres, Django ORM, SQLAlchemy

NoSQL Databases: HBase, Cassandra, DynamoDB, Cosmos DB, MongoDB

CI/CD/Build Tools: Jenkins, Maven, Ant

Version Control: Git, SVN

BI Reporting Tools: Tableau, Power BI, Looker

Operating Systems: Linux, Unix, Windows 10/8/7, Windows Server 2008/2003, macOS

Professional Experience:

Morgan Stanley, New York, NY May 2022 – Present

Senior Data Engineer (Big Data)

Responsibilities:

●Participated in the analysis, design, and development phases of the Software Development Lifecycle (SDLC). Worked in an Agile environment with sprint planning, scrum calls, and retrospective meetings for every sprint; used JIRA for project management and GitHub for version control.

●Developed and implemented data engineering solutions to analyze healthcare data, including electronic health records (EHR), claims data, and medical research data.

●Collaborated with cross-functional teams, including clinical researchers and data scientists, to identify data requirements and develop data models for healthcare analytics projects.

●Created and managed private catalogs in GCR, facilitating controlled access to approved container images across different projects or organizations within GCP.

●Leveraged the power of Google Data Studio, BigQuery BI Engine, and Looker to manipulate and analyze large datasets, driving the development of insightful visualization dashboards.

●Experience architecting large-scale distributed data systems with a focus on scalability, reliability, security, performance, and flexibility.

●Leveraged Apache Beam, Spark RDD, Data frame APIs, and Spark SQL on Google Cloud Dataproc to perform operations on the transformation layer, applying various aggregations provided by the Spark framework.

●Utilized Google Cloud Pub/Sub for real-time streaming of healthcare data, enabling immediate processing and analysis (a minimal subscriber sketch appears after this list).

●Migrated an entire Oracle database to BigQuery and used Power BI for reporting. Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators. Experienced with GCP Dataproc, GCS, Cloud Functions, and BigQuery.

●Implemented data lineage tracking within Google Cloud Data Catalog, providing transparency and traceability for data assets and transformations.

●Experienced with Google Cloud components, Google Container Builder (Cloud Build), GCP client libraries, and the Cloud SDK.

●Developed PySpark programs, created DataFrames, and worked on transformations.

●Experienced working with Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse, as well as GCP services such as BigQuery, Dataproc, and Pub/Sub.

●Experienced in utilizing Scala's features like Akka for building concurrent, fault-tolerant applications, and optimizing performance through parallel processing.

●Worked on analyzing data using PySpark and Hive, based on ETL mappings.

●Experienced in implementing continuous delivery pipelines with Maven, Ant, Jenkins, and GCP. Experienced with Hadoop 2.6.4 and Hadoop 3.1.5.

●Developed multi-cloud strategies that make better use of GCP for its PaaS offerings.

●Experienced in migrating legacy systems to GCP, storing data files in Google Cloud Storage buckets on a daily basis, and using Dataproc and BigQuery to develop and maintain the GCP cloud base.

●Employed techniques like image signing, content trust, or image provenance for enhanced security and integrity of container images stored in GCR.

●Worked closely with cross-functional teams to ensure seamless integration of the Google Cloud Data Catalog with other GCP services and tools.

●Integrated machine learning models with healthcare data for predictive analytics, supporting clinical decision-making and patient care.

●Developed a PySpark script to merge static and dynamic files and cleanse the data (see the sketch after this list); worked with different business units to drive the design and development strategy.

●Developed and deployed scalable big data processing pipelines using Google Cloud Dataflow, resulting in a 30% improvement in data processing efficiency.

●Compared self-hosted Hadoop against GCP's Dataproc, and explored Bigtable (managed, HBase-compatible) use cases and performance evaluation.

●Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

●Adept at using Scala's strong type system and pattern matching techniques to ensure code reliability and improve error handling.

●Ensured GCR repositories adhere to industry-specific compliance standards (HIPAA, GDPR, etc.) by implementing necessary controls and audit trails.

●Designed and implemented complex data processing pipelines using Google Dataflow, enabling efficient ETL and stream processing.

●Designed and deployed multi-tier applications on GCP, utilizing services like Compute Engine, Cloud Storage, Cloud SQL, Pub/Sub, and Cloud IAM for high availability and fault tolerance.

●Implemented machine learning models using PySpark's MLlib library, enabling predictive analytics and advanced data insights.

●Proficient in optimizing BigQuery queries and data ingestion processes for improved performance and cost efficiency.

●Extensive experience in integrating and processing healthcare-specific data formats such as HL7, FHIR, and DICOM for interoperable healthcare data solutions.

●Implemented advanced encryption and access control measures to ensure the confidentiality and integrity of patient data.

●Experience in designing and implementing data warehousing solutions using Hive, enabling efficient storage, retrieval, and analysis of structured and semi-structured data.

●Automated ETL processes for big data workloads using Google Cloud Composer and Apache Airflow, reducing data integration time by 40%.

●Built an ETL framework using Apache Airflow, Apache Sqoop, and Apache Spark (PySpark) for data migration to GCP, streamlining the data transfer process.

●Incorporated Pub/Sub's message retention policies to ensure the durability and reliability of event streams, leading to more consistent and dependable data processing.

●Utilized custom metadata and labels in GCR to annotate and categorize container images, facilitating efficient search, organization, and metadata-driven deployments.

●Leveraged Spark Scala functions on Google Cloud Dataproc to derive real-time insights and generate reports by mining large datasets. Utilized Spark Context, Spark SQL, and Spark Streaming to efficiently process and analyze extensive data sets.

●Designed and developed Airflow orchestration (DAGs) to automate end-to-end data workflows, including data ingestion, transformation, loading, and analytics processes, orchestrating tasks across various GCP services such as BigQuery, Dataflow, Dataproc, and Pub/Sub (a skeleton DAG appears after this list).

●Experience in leveraging Hadoop for processing and analyzing big data sets efficiently, employing MapReduce and other frameworks within the ecosystem.

●Expertise in using PySpark's SQL capabilities for querying and manipulating large-scale datasets, ensuring efficient data retrieval and transformation.

●Leveraged Google Cloud's Data Modeler tool to craft meticulous and efficient designs for relational databases and dimensional data warehouses.

●Collaborated on Apache Spark data processing projects, handling data from RDBMS and multiple streaming sources, and utilizing Python to develop Spark applications on GCP's distributed data processing services.

●Experienced in transferring and migrating data to GCS from on-premises systems or other cloud platforms using tools like gsutil, Storage Transfer Service, or Transfer Appliance.

●Aggregated log data from various servers using Apache Kafka, making it accessible for downstream systems for improved data analytics.

●Created Jenkins jobs and distributed load on the Jenkins server by configuring Jenkins nodes to enable parallel builds.

●Designed, scheduled, and monitored workflows with Cloud Composer, automating data processing and ETL tasks efficiently.

●Implemented logging and auditing mechanisms within GCR, generating compliance reports and audit trails for image access, modifications, and deletions.

●Utilized Google Container Registry for secure storage and management of Docker container images, facilitating seamless deployment across GCP services.

●Enabled real-time data analytics by setting up streaming data pipelines using Pub/Sub and Dataflow, providing timely business insights.

●Developed customized ETL solutions and real-time data ingestion pipelines using PySpark and Shell Scripting on Google Cloud Dataproc, facilitating efficient data movement in and out of the Hadoop cluster.

●Proficient in using functional programming libraries like Cats, ScalaZ, and Shapeless to implement advanced functional patterns and abstractions in Scala code.

●Skilled in optimizing Hadoop jobs, tuning configurations, and improving resource utilization to enhance overall performance and throughput.

●Orchestrated hybrid cloud environments by integrating GCR with on-premises container registries, enabling seamless image replication and synchronization.

●Contributed to the development of BigQuery BI Engine, creating a semantic layer for consumption tools like Google Data Studio and Looker, enabling efficient and intuitive data exploration.

●Proficient in SQL tuning techniques to enhance database performance and reduce query execution time, including index optimization, query rewriting, and query plan analysis.

●Developed real-time streaming applications using PySpark, Apache Flink, Kafka, and Hive on distributed Hadoop clusters on GCP for efficient and timely data processing.

●Ensured big data security and compliance on GCP by implementing best practices in data encryption, IAM policies, and VPC Service Controls.

●Skilled in integrating GCS with other Google Cloud Platform (GCP) services like BigQuery, Dataflow, or AI Platform, enabling seamless data ingestion, processing, and analysis.

●Implemented dependency management and task scheduling logic within Airflow orchestration (DAGs), ensuring proper execution order and handling of task dependencies, retries, and error handling scenarios to maintain data pipeline reliability and integrity.

●Experience in employing testing frameworks like ScalaTest and Specs2 to conduct comprehensive unit, integration, and functional testing for Scala applications.

●Proficient in Hive, utilizing its SQL-like query language (HiveQL) to perform data querying, transformation, and analysis on Hadoop distributed file systems.

●Demonstrated expertise in configuring Google Compute Engine, Google Cloud Storage, Google Cloud Load Balancing, Security Groups, and other GCP services for robust network security.

●Optimized big data workflows on GCP by tuning performance settings and leveraging GCP’s managed services, leading to cost savings and improved processing times.

●Proficient in setting up and configuring Jenkins pipelines for automated build, test, and deployment workflows, enabling efficient CI/CD processes.

●Designed and implemented a scalable big data lake on Google Cloud Storage, enabling efficient storage and retrieval of unstructured data.

●Proficient in Hadoop ecosystem components (HDFS, MapReduce), capable of developing and optimizing distributed computing applications for handling large-scale data.

●Developed an application on Google Cloud App Engine to compare parties between source systems and Master Data Management (MDM) system, identifying missing parties and mismatched attributes to ensure data accuracy and consistency.

●Conducted advanced big data analytics using Google BigQuery, providing actionable insights that drove key business decisions.

●Proficient in enabling versioning for GCS objects, managing object lifecycles, and implementing retention policies to control data archival and deletion.

●Developed, implemented, and meticulously maintained CI/CD pipelines for cutting-edge Big Data solutions on Google Cloud Build, collaborating closely with development teams to ensure seamless code deployment to the production environment.

●Developed custom User-Defined Functions (UDFs) in PySpark for specialized data transformations and manipulations, enhancing data processing capabilities.
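
The subscriber sketch referenced in the Pub/Sub bullet above: a minimal pull subscriber using the google-cloud-pubsub client, with a hypothetical project, subscription, and handler; a real pipeline would hand each message to downstream processing (for example Dataflow) rather than print it.

```python
# Minimal Pub/Sub pull subscriber sketch; project and subscription ids are placeholders.
from concurrent import futures

from google.cloud import pubsub_v1

project_id = "my-healthcare-project"
subscription_id = "claims-events-sub"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)


def handle_message(message):
    # Acknowledge after (hypothetical) validation/transformation of the payload.
    print(f"Received {message.message_id}: {message.data!r}")
    message.ack()


streaming_pull = subscriber.subscribe(subscription_path, callback=handle_message)

with subscriber:
    try:
        streaming_pull.result(timeout=60)  # listen briefly in this sketch
    except futures.TimeoutError:
        streaming_pull.cancel()
        streaming_pull.result()
```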
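
The merge-and-cleanse sketch referenced above: joining a static reference file with a daily (dynamic) extract and standardizing a few fields. File paths, column names, and cleansing rules are illustrative assumptions.

```python
# PySpark sketch: merge a static reference dataset with a dynamic daily feed and cleanse it.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("merge_cleanse").getOrCreate()

static_df = spark.read.parquet("gs://refdata/providers/")             # static reference data
dynamic_df = spark.read.json("gs://landing/daily/2024-01-01/*.json")  # daily extract

cleansed = (dynamic_df
            .dropDuplicates(["record_id"])
            .withColumn("provider_id", F.trim(F.col("provider_id")))
            .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
            .na.fill({"status": "UNKNOWN"}))

merged = cleansed.join(static_df, on="provider_id", how="left")
merged.write.mode("overwrite").parquet("gs://curated/claims/2024-01-01/")
```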
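
The skeleton DAG referenced in the Airflow orchestration bullet above, showing the general Dataproc-then-BigQuery shape of such a workflow; the DAG id, cluster, bucket, and SQL are placeholders rather than the production pipeline.

```python
# Skeleton Airflow DAG: run a PySpark transform on Dataproc, then load a BigQuery mart.
# All identifiers (project, cluster, tables) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

with DAG(
    dag_id="daily_ingest_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 5 * * *",
    catchup=False,
) as dag:

    transform = DataprocSubmitJobOperator(
        task_id="spark_transform",
        project_id="my-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://code/transform.py"},
        },
    )

    load_mart = BigQueryInsertJobOperator(
        task_id="load_mart",
        configuration={
            "query": {
                "query": "CREATE OR REPLACE TABLE mart.daily AS SELECT * FROM staging.daily",
                "useLegacySql": False,
            }
        },
    )

    transform >> load_mart  # enforce execution order: transform before load
```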

Discover Financial Services, Chicago, IL Oct 2020 – Apr 2022

Data Engineer (Applications)

Responsibilities:

●Participated in the analysis, design, and development phases of the Software Development Lifecycle (SDLC). Worked in an Agile environment with sprint planning, scrum calls, and retrospective meetings for every sprint; used JIRA for project management and GitHub for version control.

●Developed and implemented data engineering solutions for recommendation engines, leveraging customer data on GCP to provide personalized product recommendations and enhance customer engagement.

●Collaborated with cross-functional teams, including marketing and sales, to incorporate recommendation algorithms into e-commerce platforms hosted on GCP, increasing conversion rates and sales revenue.

●Configured custom notifications and triggers within GCR for events such as new image uploads or deletions, enabling proactive monitoring and workflow automation.

●Designed and implemented complex data architectures on Google Cloud Platform, ensuring scalability, high availability, and optimized performance.

●Developed REST APIs in Python using the Flask and Django frameworks, and integrated various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.

●Experience in monitoring GCS usage metrics, setting up logging and monitoring with Cloud Monitoring, Cloud Logging, or integrating with third-party monitoring tools for performance tracking.

●Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib (see the pipeline sketch after this list).

●Adept at schema design in Hive, optimizing table structures, partitioning, and bucketing strategies for improved query performance.

●Developed map-reduce jobs using Scala on Dataproc to compile program code into bytecode for JVM, enabling efficient data processing. Implemented Spark jobs using Python in a test environment and utilized BigQuery SQL for seamless data querying.

●Crafted and launched custom functions, stored procedures, and triggers for Firestore. Developed distinct modules in microservices to gather application statistics for comprehensive visualization.

●Proficiency in integrating GCS with on-premises systems or hybrid cloud environments, enabling seamless data synchronization and leveraging tools like Cloud Storage Gateway for efficient data transfer.

●Proficient in designing, developing, and maintaining complex data workflows using Apache Airflow on GCP.

●Designed and developed advanced ETL pipelines using Dataflow for efficient data ingestion, transformation, and streaming analytics.

●Proficient in optimizing BigQuery queries, data ingestion, and partitioning strategies for enhanced performance and cost efficiency (see the query sketch after this list).

●Implemented complex data transformations with Cloud Functions, enhancing the value and utility of raw data for downstream analytics and business intelligence (BI) tools.

●Integrated Kubernetes Engine with other GCP services like Pub/Sub and Dataflow, enabling a seamless, end-to-end data processing pipeline.

●Showcased expertise in developing scalable and reliable data processing systems utilizing Pub/Sub and crafted shell scripts for scheduled execution of BigQuery jobs via API access.

●Utilized Google's Container Analysis service to perform vulnerability scanning on container images stored in GCR, ensuring images comply with security best practices.

●Employed GCP's Data Catalog for cataloging and preparing data for ETL, simplifying the management of data across diverse sources.

●Designed a robust infrastructure with Cloud Data Fusion, adept at extracting data from various sources, setting the stage for an efficient reporting environment.

●Proficiency in developing RESTful APIs using Scala and frameworks like Akka HTTP or Play Framework, ensuring efficient communication between applications.

●Experience in leveraging GCS as a data lake for large-scale analytics, integrating with tools like BigQuery or Dataflow for real-time analytics or batch processing of structured and unstructured data.

●Skilled in optimizing costs by implementing data lifecycle management policies, leveraging storage classes based on access patterns, and estimating costs using GCP pricing tools.

●Extensive experience in integrating Airflow with various GCP services like BigQuery, Cloud Storage, Dataflow, and Pub/Sub for orchestrating data pipelines.

●Implemented Cloud Pub/Sub for real-time messaging and event-driven data processing, enabling immediate insights and actions.

●Implemented Cloud DLP for sensitive data discovery and protection, ensuring compliance with data privacy regulations.

●Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.

●Leveraged Cloud Data Fusion for streamlined data integration from diverse sources, ensuring seamless data processing and analytics.

●Profiled structured, unstructured, and semi-structured data across various sources to identify patterns in the data, and implemented data quality metrics using the necessary queries or Python scripts depending on the source.

●Adept at developing microservices architectures using Scala and frameworks like Lagom or Finch, enabling modular and scalable application designs.

●Skilled in tuning Hive queries, leveraging indexing, partitioning, and query optimization techniques to enhance query execution speed and efficiency.

●Utilized Cloud Dataprep for automated data cleaning and transformation, ensuring high data quality for analytics and reporting.

●Integrated AI Platform for deploying and managing machine learning models, enabling predictive analytics and intelligent automation.

●Leveraged Dataproc for efficient processing of large-scale data analytics tasks, ensuring timely insights and decision-making.

●Conducted performance monitoring and troubleshooting of BigQuery queries, ensuring optimal system performance for personalization and segmentation tasks.

●Designed and implemented ETL pipelines to extract, transform, and load data from diverse sources into a centralized data warehouse.

●Adept at implementing redundancy and disaster recovery strategies in GCS using regional or multi-regional storage configurations for high availability.

●Proficiency in using Scala's Futures, Promises, and Reactive Streams for asynchronous programming, facilitating non-blocking and responsive applications.

●Developed microservices using Cloud Functions to make API calls to third-party vendors. Involved in supporting cloud instances running Linux and Windows on GCP, with experience in VPC and Cloud IAM.

●Skilled in creating Directed Acyclic Graphs (DAGs) in Airflow orchestration for efficient scheduling, monitoring, and managing data workflows on GCP.

●Experience in using Scala as a primary language for Apache Spark, enabling efficient data processing, manipulation, and analysis at scale.

●Designed and executed data transformation pipelines in Dataflow, applying complex data manipulation and enrichment operations.

●Designed and developed data pipelines using Data Fusion, enabling efficient data processing, transformation, and integration.

●Utilized Google Cloud SDK to automate backups of ephemeral data-stores to Cloud Storage and create snapshots for mission-critical production servers.

●Implemented comprehensive data cataloging strategies using Cloud Data Catalog for efficient data discovery, lineage tracking, and data governance.

●Generated SQL and PL/SQL scripts to install, create, and drop database objects, including tables, views, primary keys, indexes, and sequences.

●Containerized servers using Docker for test and development environment needs, and automated configuration using Docker containers.

●Experience creating Docker containers leveraging existing Linux containers and AMIs, in addition to creating Docker containers from scratch on both Linux and Windows servers; container clustering with Docker Swarm, Mesos, and Kubernetes.

●Implemented multi-tenancy strategies in GCR by utilizing separate repositories or IAM controls, ensuring resource isolation and access segregation for different teams or projects.

●Constructed ETL Mapping with Talend Integration Suite to extract data from source, apply transformations, and load data into the target database, enhancing data flow and quality.

●Deployed Spark jobs on GCP's Dataproc for comprehensive data tasks, including cleansing, validation, standardization, and transformation in line with use case requirements.

●Orchestrated the creation of data pipeline flows, scheduled jobs programmatically (DAG) in the Cloud Composer workflow engine, ensuring impeccable support for the scheduled tasks.

●Developed custom operators and hooks in Airflow for seamless interaction with GCP services, enhancing workflow capabilities and reusability.

●Designed and deployed APIs using Cloud Endpoints, enabling secure and controlled access to data and services.

●Created customized and interactive dashboards using Data Studio, enabling stakeholders to derive actionable insights from data.

●Designed and managed complex workflows using Cloud Composer, orchestrating data processes and integrations with precision.

●Configured custom notifications and webhooks in GCR, triggering events in external systems based on image-related activities for streamlined workflows.

●Utilized Cloud Spanner for building horizontally scalable, globally distributed databases, ensuring consistent high-performance data access.

●Designed and implemented data lakes on GCP, incorporating Cloud Storage and ensuring efficient storage and retrieval of data.

●Skilled in creating documentation outlining GCS architectures, configurations, and best practices for team reference and knowledge sharing.

●Experience in implementing real-time stream processing using Hadoop ecosystem components like Apache Kafka and Spark Streaming, handling continuous data streams effectively.

●Skilled in monitoring Hive query performance, identifying and resolving performance bottlenecks for optimized data retrieval and processing.

●Implemented IAM policies and access controls to ensure secure and controlled data access within GCP environments.

●Created a continuous integration system using Ant, Jenkins, and Puppet for full automation, continuous integration, and faster, more reliable deployments.

●Installed and administered the Git source control tool, ensured application reliability, and designed branching strategies for Git.

●Worked with GCP's GKE (Google Kubernetes Engine) for Docker-based cluster management. Created Jenkins jobs and distributed load on Jenkins server by configuring Jenkins nodes which enabled parallel builds.

●Proficient in optimizing Hadoop cluster performance by fine-tuning configurations, managing resource allocation, and monitoring cluster health.

●Familiarity with Scala libraries like Breeze and MLlib, leveraging them for implementing machine learning algorithms and statistical analysis.

●Proficient in creating and customizing Dataproc clusters with specific configurations tailored to workload requirements, optimizing performance and resource utilization.

●Optimized Airflow workflows on GCP by implementing best practices, improving performance, and minimizing execution time for data processing.

●Designed and implemented ETL (Extract, Transform, Load) pipelines using Hive, integrating with other Hadoop ecosystem tools for seamless data flow.

●Automated Data Ingestion (DI) and Data Loading (DL) scripts using Python & Beam, enhancing operational efficiency and data processing speed.

●Integrated GCR into CI/CD pipelines, automating image builds and deployments with services like Cloud Build or Jenkins, streamlining the software development lifecycle.
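
The pipeline sketch referenced in the Spark ML/MLlib bullet above: a simple classification pipeline where the training table, feature columns, and label are assumptions chosen for illustration.

```python
# Spark ML pipeline sketch: index the label, assemble features, fit logistic regression.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("propensity_model").getOrCreate()

df = spark.read.parquet("gs://curated/customer_features/")  # hypothetical feature table

label_idx = StringIndexer(inputCol="churned", outputCol="label")
assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend", "support_tickets"],
    outputCol="features",
)
lr = LogisticRegression(maxIter=20)

pipeline = Pipeline(stages=[label_idx, assembler, lr])
train, test = df.randomSplit([0.8, 0.2], seed=42)

model = pipeline.fit(train)
model.transform(test).select("label", "prediction", "probability").show(5)
```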
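
The query sketch referenced in the BigQuery optimization bullet above, illustrating partition pruning with a parameterized date filter via the Python client; the project, dataset, and schema are placeholders.

```python
# BigQuery sketch: parameterized query that filters on the partition column to limit scanned data.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

query = """
    SELECT customer_id, SUM(order_total) AS total_spend
    FROM `my-project.sales.orders`                      -- assumed: partitioned on order_date
    WHERE order_date BETWEEN @start_dt AND @end_dt      -- partition filter prunes scanned data
    GROUP BY customer_id
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_dt", "DATE", "2024-01-01"),
        bigquery.ScalarQueryParameter("end_dt", "DATE", "2024-01-31"),
    ]
)

for row in client.query(query, job_config=job_config).result():
    print(row.customer_id, row.total_spend)
```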

Macy's - Jersey City, NJ Nov 2019 – Sep 2020

Python Data Engineer

Responsibilities:

●Developed and implemented data engineering solutions for customer data management, ensuring accurate and comprehensive customer profiles and preferences.

●Designed and maintained data pipelines for integrating customer data from various sources, including transaction systems and external vendors.

●Collaborated with cross-functional teams, including marketing and sales, to leverage customer data for targeted campaigns and personalized customer experiences.

●Led the design and implementation of advanced data architecture strategies on Google Cloud Platform, ensuring scalability, high availability, and cost-effectiveness.

●Adept at configuring GCS for multi-regional redundancy and geo-replication, ensuring data durability and high availability across different geographic regions.

●Configured geo-replication across regions for GCR repositories, ensuring disaster recovery capabilities and minimizing downtime in case of regional outages.

●Implemented optimization techniques for BigQuery, resulting in a 30% reduction in query execution times and a 20% reduction in data processing costs.

●Designed and implemented


