Name: Gopi Bellamkonda
Gmail: ***********@*****.***
Phone: 302-***-****
LinkedIn: https://www.linkedin.com/in/bellamkonda-gopi-342042245
Professional Summary:
• Over 10 years of experience in data engineering, leveraging diverse technologies and platforms to deliver robust data solutions that meet business requirements.
• Utilized AWS Services including EC2, S3, Autoscaling, and Glue to build and manage scalable and reliable data infrastructure for various applications.
• Implemented data pipelines using AWS Redshift, DynamoDB, and AWS Data Pipelines to ensure efficient data flow and processing across different systems, reducing processing time by 25%.
• Developed and maintained ETL processes using Talend, DataStage, and QualityStage, facilitating seamless data integration and transformation.
• Leveraged expertise in AI and ML to design and implement predictive models and advanced analytics solutions, utilizing tools such as TensorFlow, PyTorch, and AWS SageMaker. Applied machine learning algorithms to drive data-driven decision-making, enhance predictive accuracy, and optimize business processes across diverse projects.
• Performed big data processing using Hadoop (HDFS, MapReduce), Spark, Pig, and Sqoop, enabling large-scale data analysis and reporting, increasing reporting speed by 35%.
• Leveraged AWS and Terraform to architect and deploy secure, scalable cloud solutions. Proficient in network troubleshooting, global load balancing, and identity management using Microsoft AD and Azure AD. Skilled in implementing role-based access controls and managing service quotas to optimize cloud resource utilization and maintain compliance with security standards.
• Designed and optimized data warehouses with Snowflake, Redshift, and Google BigQuery, improving query performance and data storage efficiency.
• Implemented and managed Apache Airflow workflows on Google Cloud Platform (GCP), orchestrating complex ETL processes by integrating with GCP services like BigQuery, Cloud Storage, and Dataflow to ensure seamless data processing and automation.
• Proficient in leveraging Google Cloud services including DataProc, DataFlow, Pub/Sub, Cloud Storage, Cloud Composer, and Firestore to design and implement scalable data solutions. Demonstrated expertise in real-time data processing, ETL automation, and efficient data management, resulting in optimized data pipelines and enhanced data-driven decision-making capabilities.
• Managed and analyzed data using UNIX shell scripting, Python, and Scala, automating data workflows and enhancing operational efficiency.
• Utilized Azure Data Factory, Databricks, and Data Lake to develop and manage scalable data solutions on the Azure platform.
• Azure Data Engineering Expertise: Demonstrated proficiency in leveraging Azure Cloud services, DBT, and Python to automate data ingestion and transformation processes, ensuring efficient data workflows and high data quality. Skilled in writing complex SQL queries and integrating diverse data sources.
• Configured and maintained databases including PostgreSQL, MySQL, MS-SQL, Oracle, and DB2, ensuring data integrity and availability.
• Implemented data security and governance using IAM Security, VPC Configuration, and Data Catalog, maintaining compliance and protecting sensitive data.
• Designed and managed complex data integration and transformation pipelines using Azure Data Factory, optimizing data flow and quality within a healthcare environment to ensure accurate and efficient data processing.
• Created interactive dashboards and visualizations using Tableau, Power BI, and DataStudio, providing stakeholders with actionable insights.
• Automated deployment and monitoring using Jenkins, Azure DevOps, and Git, ensuring continuous integration and delivery of data applications and enhancing deployment efficiency by 40%.
Technical Skills:
Cloud Platforms: AWS (EC2, S3, Autoscaling, Glue, Trino, EMR, Redshift, DynamoDB, Elasticsearch, Data Pipelines, EKS, Lambda, SageMaker, Athena, Kinesis, Airflow), GCP (BigQuery, DataProc, DataFlow, Pub/Sub, Cloud SQL, Cloud Storage, Cloud Architecture, Composer, Firestore, Databricks, Airflow), Azure (Data Factory, Data Lake, Databricks)
Data Warehousing: Snowflake, Redshift, BigQuery, DynamoDB, MySQL, MS-SQL, Oracle, DB2, Cloud SQL
Big Data Technologies: Hadoop (HDFS, MapReduce), Spark, Databricks, Talend, Impala, Hive, Pig, Sqoop, HBase, Oozie, Cloudera, Flume, PySpark
ETL Tools: Talend, DataStage, QualityStage, Matillion, SSIS, SSAS, Informatica 6.1
Programming Languages: Python, Scala, UNIX Shell Scripting, Java (Spring Boot, Hibernate), JMS, Bash
Database Tools: SQL, PL/SQL, PostgreSQL, MongoDB, Cassandra, TOAD, HBase, Couchbase, Microsoft
DevOps Tools: Jenkins, Azure DevOps, Git, Bitbucket, Maven, Jira
BI & Analytics: Tableau, Power BI, DataStudio, Jupyter Notebook, SAS, SPSS, Excel
Data Security & Governance: IAM Security, VPC Configuration, Data Catalog, Glue Catalog
Project Management: Jira
Version Control: Git, Bitbucket
Middleware: WebLogic
Operating Systems: UNIX, Linux
Monitoring & Logging: Elasticsearch, CloudWatch
Networking & Security: SSH, VPN Google-Client
Data Orchestration: Oozie, Apache Kafka
Data Transfer Services: Service Data Transfer, Cloud Storage Transfer Service
Statistical Analysis Tools: R, SAS, SPSS
Data Integration: Data Flux, Flat Files, Federated Queries
Reporting Tools: SSRS, Tableau, Power BI
CI/CD Tools: Jenkins, Azure DevOps, Git, Bitbucket, Maven, Docker, Kubernetes
Professional Experience
AbbVie, Vernon Hills, IL
Big Data/Snowflake/AWS Data Engineer Jan 2022 - Present
Responsibilities:
• Managed AWS S3 buckets for storing massive datasets, optimizing data retrieval and cost efficiency.
• Utilized Amazon Athena for ad-hoc queries, enabling quick data analysis without complex ETL processes.
• Developed scalable data pipelines using AWS Glue and AWS Data Pipelines to process and distribute data across systems.
• Leveraged EC2 instances for deploying and scaling applications, utilizing Autoscaling to enhance performance by 35%.
• Implemented and managed Customer Data Platforms (CDP) using AWS services, such as Amazon Redshift and AWS Glue, to centralize customer data, improve segmentation, and enhance targeted marketing strategies.
• Designed and implemented FHIR and EDI data integration solutions in AWS, leveraging AWS InterOp capabilities to ensure seamless healthcare data exchange, compliance, and interoperability across multiple systems.
• Utilized AWS services to enhance supply chain operations by implementing end-to-end data pipelines and data lakes.
• Designed and implemented scalable workflows using AWS Step Functions for orchestration, integrated with AWS EventBridge for event-driven architectures, and utilized SNS and SQS for messaging and queue management to ensure reliable and efficient data processing and event handling.
• Designed and implemented event-driven architectures using AWS EventBridge to facilitate seamless communication between microservices and serverless applications, enabling real-time data processing and enhancing system scalability and responsiveness.
• Configured custom event buses in AWS EventBridge to capture and route events from various sources, including S3, EC2, and third-party SaaS applications, streamlining workflows and improving operational efficiency through automated event-driven responses and notifications.
• Deployed Trino (formerly Presto) on Amazon EMR to execute fast, distributed SQL queries across multiple data sources, including S3 and Redshift. Enabled seamless querying of large datasets in a hybrid cloud environment, integrating data lakes and data warehouses for advanced analytics.
• Implemented big data processing workflows on Amazon EMR, using Apache Spark and Hadoop to process genomic and clinical trial data, reducing processing time by 40%.
• Engineered real-time data streaming solutions with Amazon Kinesis for high-velocity data ingestion from clinical trials and IoT devices, improving analytics capabilities and reducing latency by 60%.
• Deployed and managed Apache Druid clusters on AWS, integrating with S3 for deep storage and Kinesis for real-time ingestion, enabling high-performance analytics on large-scale streaming data for business intelligence use cases.
• Optimized cloud infrastructure for cost efficiency and performance through Platform Engineering techniques, including detailed monitoring and reporting.
• Led the implementation of Palantir Foundry on AWS for large-scale data integration and analytics.
• Designed and deployed APIs using AWS AppSync with GraphQL to facilitate real-time, scalable data queries for cloud-native applications.
• Automated infrastructure management using Terraform, enabling the provisioning and scaling of Cloudera clusters on AWS, which streamlined data processing workflows and reduced setup times by 40%. Implemented IAM policies for secure access control, ensuring data security and compliance with regulatory standards.
• Developed and maintained complex ETL pipelines using Talend and Kafka, integrating data from various sources, including JSON files, into Cloudera Hadoop ecosystems. Optimized data transformations and ingestion using Sqoop, Java, UNIX shell scripting, and Bash, which improved data processing efficiency by 35% and supported real-time analytics.
• Integrated MuleSoft with AWS services for seamless data flow between on-premises systems and cloud-based platforms.
• Designed, developed, and implemented end-to-end data pipelines from scratch using SQL Server Integration Services (SSIS), automating ETL processes for data extraction, transformation, and loading across multiple sources, significantly enhancing data integration and operational efficiency.
• Designed and optimized complex data models using AWS RDS, DynamoDB, and Redshift, supporting OLAP and OLTP workloads to ensure high performance and scalability for enterprise applications.
• Created ER diagrams and logical/physical data models using Erwin, ensuring adherence to data modeling best practices.
• Developed scalable ETL pipelines for large datasets using Databricks and integrated with AWS services like S3, Redshift, and RDS.
• Developed data pipelines for structured and unstructured data integration in AWS S3, utilizing SQL queries and AWS Glue for seamless ETL processes.
• Implemented robust indexing and partitioning strategies in AWS Redshift to enhance query performance and reduce data retrieval times.
• Ensured data integrity by establishing referential constraints and normalization techniques, adhering to 3NF for transactional systems in DynamoDB.
• Collaborated with DevOps teams to integrate model versioning into AWS CodePipeline and CloudFormation templates for continuous deployment.
• Supported multi-dimensional data warehousing solutions with AWS, enabling efficient reporting and analysis for BI tools like Tableau and QuickSight.
• Led data modeling, mining, cleaning, and visualization efforts within Agile frameworks, utilizing YARN for resource management and OLAP/OLTP systems for optimized data processing. Conducted unit testing to ensure data accuracy and integrity, and leveraged JSON for flexible data storage.
• Utilized Microsoft Power BI and Tableau for data visualization, driving insights that enhanced decision-making processes and reduced data errors by 30%.
• Developed REST APIs with AWS Lambda and API Gateway for microservices integration, optimizing serverless architectures.
• Developed and deployed MLOps pipelines using AWS services such as SageMaker, Lambda, Step Functions, and CodePipeline to automate model training, deployment, monitoring, and retraining. Leveraged S3 for model artifacts storage and ECR for containerized model deployments.
• Implemented Datadog for comprehensive monitoring and alerting, setting up custom dashboards to visualize system performance metrics. Configured alerts to proactively identify and resolve issues, improving system reliability and reducing downtime by 40%.
• Designed and deployed a Data Fabric architecture on AWS, integrating data sources like AWS Glue, Redshift, and S3, enhancing data accessibility and reducing integration time by 50%.
• Developed and maintained ETL workflows using Pentaho Data Integration (PDI) on AWS, increasing processing efficiency by 40%.
• Established monitoring mechanisms using AWS CloudTrail and Amazon CloudWatch to ensure adherence to data governance policies and detect any anomalies in data usage.
• Automated ETL workflows with Apache Airflow, improving operational efficiency by 40% and ensuring timely data delivery.
• Integrated and automated data extraction from a SaaS platform into AWS Redshift using Python and SQL, ensuring seamless API data ingestion and transformation.
• Engineered and trained machine learning models using AWS SageMaker, improving model performance by 20% with automated hyperparameter tuning.
• Configured and maintained Snowflake environments to support data integration and analytics, enhancing efficiency by 45%.
• Automated data processing workflows using Bash scripts in AWS, streamlining ETL processes and optimizing data pipelines across EC2, S3, and Redshift environments.
• Implemented graph databases like Amazon Neptune, improving query performance and data connectivity by 30%.
• Implemented and managed GitLab CI/CD pipelines for automating deployment processes in AWS environments. Integrated GitLab with AWS CodeDeploy and CloudFormation to streamline infrastructure as code (IaC) deployments and automate version control for cloud-based services.
• Deployed Imperva security solutions to protect web applications and APIs, utilizing AWS CloudFront and WAF to secure traffic, enhance data protection, and prevent potential threats to sensitive business applications.
• Employed Scala and Python for robust data processing jobs and developed data transformation workflows using DBT, reducing errors by 40%.
• Automated workflows with Apache Oozie and Jenkins, reducing manual intervention and improving process efficiency.
• Configured Impala to provide high-performance, low-latency SQL queries on Hadoop, enhancing business decision-making.
• Leveraged Hadoop ecosystem technologies including HDFS, MapReduce, Hive, HBase, and Pig for comprehensive big data solutions.
• Implemented and managed MongoDB for handling unstructured and semi-structured data from clinical trials and research projects, leveraging AWS services to ensure scalability and high availability. This setup enhanced data retrieval performance by 50% and supported flexible, real-time analytics, contributing to accelerated research outcomes and improved data-driven decision-making.
• Optimized Teradata and Oracle databases for large-scale analytics, enhancing query performance by 40% and reducing execution times by 35%.
• Developed and optimized SQL and NoSQL queries on PostgreSQL and DynamoDB to fetch and analyze data efficiently.
• Integrated FSTP (Fast Storage Technology Protocol) with AWS services to optimize data storage and retrieval processes.
• Leveraged Vertica for high-performance, scalable analytics on large datasets. Implemented data pipelines to ingest and transform structured data into Vertica clusters, optimizing query performance and enabling real-time analytics for business intelligence.
• Leveraged Alteryx Core to automate data preparation and ETL processes, integrating data from Amazon S3 and Redshift and enhancing data workflows for cloud-based analytics.
• Leveraged QuickSight to build and deploy scalable data solutions, implement machine learning models, and perform advanced analytics. Developed end-to-end data pipelines and real-time analytics platforms, optimizing data processing and insights generation for business decisions.
• Employed Ab Initio for data integration and ETL processes on AWS, creating scalable and efficient data pipelines to handle large volumes of data, and utilizing AWS Glue for orchestrating and managing data workflows.
• Achieved a 30% reduction in response time through performance optimization techniques, including caching, thread pooling, and database indexing. Implemented a robust monitoring and alerting system to ensure application uptime and identify performance bottlenecks.
• Successfully implemented a responsive UI that adapted to different screen sizes and devices. Improved user experience by reducing page load times by 25% through code optimization and image compression.
• Designed and managed complex DAGs with Apache Airflow and optimized data processing with formats like Avro, Parquet, and ORC, improving processing speed by 45%.
• Utilized PySpark for large-scale data processing of clinical trial datasets, reducing processing time by 50%.
• Implemented containerization with Docker and orchestrated applications using Kubernetes, improving deployment consistency and reducing setup time by 50%.
• Managed version control and CI/CD pipelines with Git, Bitbucket, and Control-M, streamlining development workflows and improving project turnaround time.
Environment: AWS (S3, EC2, Glue, Data Pipelines, Trino, Athena, EMR, Redshift, Kinesis, Lambda, API Gateway, SageMaker, IAM, Data Catalog), Terraform, Apache Spark, Hadoop, Databricks, Hive, HBase, Pig, Impala, Pentaho Data Integration (PDI), SSIS, Apache Airflow, Jenkins, Snowflake, Teradata, Oracle, Python, Scala, SQL, NoSQL, Power BI, Tableau, DBT, Elasticsearch, Docker, Kubernetes, Bitbucket, Control-M, MongoDB
Thomson Reuters, MN
Big Data/Snowflake/GCP Data Engineer Apr 2019 - Dec 2021
Responsibilities:
• Configured and managed GCP BigQuery for scalable and cost-effective data analysis, enabling rapid insights into pet food sales trends.
• Utilized Federated Queries in BigQuery to access and query data from various external sources, enriching analytical capabilities.
• Implemented and managed Google Cloud Storage (GCS) buckets to handle vast amounts of unstructured data for a high-traffic e-commerce application, designing a lifecycle policy that reduced storage costs by 40%.
• Implemented Palantir Foundry to drive data-driven insights using BigQuery for high-performance data querying and Dataflow for real-time data processing.
• Integrated Palantir’s powerful ontology framework to provide centralized data access and governance, ensuring consistency in data quality and security.
• Leveraged MuleSoft to connect and manage data between GCP services like BigQuery, Pub/Sub, and Cloud Storage, enabling efficient data synchronization and API management for improved data accessibility and integration.
• Set up MLOps pipelines using GCP tools such as AI Platform (Vertex AI), Cloud Functions, and Cloud Composer (Apache Airflow) for automated model training, versioning, and deployment. Utilized BigQuery and Dataflow for data preprocessing and transformation in machine learning workflows.
• Proven experience as a GCP Engineer, specializing in implementing security and compliance measures across various Google Cloud Platform services, ensuring adherence to industry standards and regulatory requirements.
• Configured GCP Identity and Access Management (IAM), enabling fine-grained access control and role-based permissions to secure resources and data within the cloud environment.
• Proficient in utilizing Prisma Cloud and conducting Google Cloud Security Posture Reviews (CSPR) to assess and enhance the security posture of cloud infrastructure, identifying vulnerabilities and implementing remediation strategies.
• Strong collaboration skills demonstrated by working effectively with Cloud Foundations teams to implement secure solutions, facilitating the adoption of security best practices across the organization.
• Leveraged Terraform and Deployment Manager to automate the deployment of secure infrastructure, ensuring consistent configurations and compliance across cloud environments.
• Worked with key GCP services including Compute Engine, Cloud Storage, and Kubernetes Engine, optimizing security configurations and resource management in alignment with organizational policies.
• Applied cloud security principles, threat detection mechanisms, and incident response processes, ensuring timely identification and mitigation of security risks.
• Proficient in scripting languages such as Python and Bash for automation purposes, streamlining security tasks and improving operational efficiency in managing cloud resources.
• Utilized Erwin in GCP to create logical and physical data models, ensuring seamless integration and optimizing data architecture for analytics solutions.
• Implemented FSTP by utilizing Google Cloud Storage for high-speed data access and Google Dataflow for real-time processing, enhancing data handling efficiency and performance in data-intensive applications.
• Supported scalable model training, tuning, and real-time inference for predictive analytics.
• Deployed TigerDB on Google Cloud, utilizing Compute Engine and Cloud Storage for distributed data management, ensuring high availability and performance for large-scale data analytics applications.
• Utilized Ab Initio for designing and developing complex ETL pipelines, integrating data from multiple sources into BigQuery and Cloud Storage, and optimizing data workflows for high performance and scalability.
• Designed and implemented data processing pipelines using Google Cloud Dataflow, Data Lake, and Dataproc, resulting in a 30% improvement in data processing efficiency for large-scale analytics projects.
• Orchestrated complex ETL workflows using Google Cloud Composer (Apache Airflow), automating data transformation and loading processes and reducing manual intervention by 40%.
• Designed and implemented a data pipeline using Apache Beam and Google Cloud Functions to process and transform large datasets in real-time, improving data processing throughput by 50% and reducing latency.
• Developed interactive dashboards and reports using Power BI and Looker, integrating these BI tools with data sources in Bigtable and Cloud Spanner for real-time visualization of key metrics.
• Implemented data transformation workflows using DBT (Data Build Tool) to enhance data modeling and ensure reliable data pipelines.
• Applied data mining techniques to extract actionable insights and performed data cleaning to maintain high-quality datasets, supporting strategic decision-making through effective data visualization.
• Implemented a comprehensive big data solution using Hadoop, Hive, and HBase, configuring HDFS for distributed storage and using MapReduce for batch processing.
• Implemented GraphQL APIs using Google Cloud Functions and Firebase as part of a microservices architecture.
• Designed and deployed GitLab pipelines to support DevOps practices, automating the continuous integration and delivery (CI/CD) process.
• Integrated GitLab with GCP Kubernetes Engine (GKE) and Cloud Functions, ensuring scalable, automated deployment and management of microservices and cloud functions.
• Utilized Alteryx Core to design and implement data transformation pipelines, extracting and processing data from Google Cloud Storage and BigQuery, optimizing data efficiency.
• Developed scalable data processing pipelines using PySpark for distributed data processing and transformation and utilized Java and Scala for implementing complex data algorithms and integrating with big data frameworks.
• Developed and maintained Bash scripts to manage and orchestrate data pipelines, facilitating seamless integration for efficient data processing.
• Employed Sqoop for efficient data import/export between Hadoop and relational databases, and managed data warehousing solutions with Teradata.
• Developed and integrated Customer Data Platforms (CDP) utilizing BigQuery and Dataflow to unify and analyze customer data from multiple sources, enabling personalized customer experiences and data-driven decision-making.
• Leveraged Unix for scripting and automation tasks and utilized Git and Bitbucket for version control and collaborative development within Agile teams.
• Engineered ETL workflows using Talend and integrated them with Kafka for real-time data streaming and processing, reducing data integration time by 40%.
• Utilized Cloud Shell for interactive development and management of GCP resources, ensuring streamlined deployment and monitoring of data engineering workflows.
• Applied Agile methodologies for iterative development and continuous improvement of data solutions, incorporating feedback and adapting to changing requirements.
• Utilized GCP tools such as BigQuery, Vertex AI, Dataflow, and Looker to create data-centric solutions and deploy machine learning models. Designed and managed data pipelines, performed complex analytics, and delivered actionable insights, enhancing data-driven decision-making and operational efficiency.
• Integrated Snowflake with Google Cloud Platform services to streamline data processing, leveraging Google Cloud Storage for data storage and Google Dataflow for real-time ETL pipelines. Enhanced data accessibility and analytics capabilities across the platform.
• Configured Snowflake data sharing features with GCP BigQuery to facilitate secure and scalable data collaboration between different teams and external partners, ensuring efficient data exchange and consistency.
• Created data pipelines to extract, transform, and load data into Snowflake, ensuring timely and accurate data availability for business intelligence.
• Established a CI/CD pipeline using Jenkins and Docker to automate build, test, and deployment processes, resulting in a 30% reduction in deployment time and increased reliability.
• Implemented Firestore as a NoSQL database for managing unstructured data, optimizing data retrieval times by 25% through efficient indexing and querying strategies.
• Managed Apache Airflow workflows on GCP to orchestrate complex ETL processes, integrating with GCP services such as BigQuery, Cloud Storage, and Dataflow.
• Optimized Airflow performance and resource utilization on GCP by fine-tuning DAG execution and task scheduling for complex workflow orchestration.
• Implemented YARN for resource management in Hadoop environments and utilized OLAP for multidimensional data analysis.
• Integrated data formats such as Avro, Parquet, and ORC for efficient data storage and retrieval, and handled JSON for semi-structured data processing. This approach improved data processing speeds and reduced operational costs.
• Utilized Cloud SQL and MySQL for database management, ensuring high availability and security of transactional data.
• Implemented Datadog for monitoring and observability, configuring custom dashboards and alerts to track system performance and resource utilization.
• Developed ETL pipelines using Matillion to transform and load data into Snowflake, enhancing data warehousing capabilities.
• Managed database applications using MS-SQL, Oracle, PostgreSQL, and DB2, ensuring robust data storage solutions.
• Enhanced data security through meticulous IAM Security configurations, protecting sensitive customer and business data.
• Leveraged Databricks for complex data processing tasks, increasing efficiency and scalability of data operations.
• Configured and optimized Apache Druid, improving query performance and ensuring efficient data aggregation across distributed environments.
• Automated routine tasks and data workflows using Python and shell scripts, significantly reducing manual effort and error.
• Configured VPC settings to isolate and secure network environments, enhancing overall data security within Chewy.
• Ensured reliable interconnectivity through VPN Google-Client configurations, reducing access issues by 20%.
• Led the migration of data from on-premises MongoDB instances, ensuring high availability and disaster recovery through automated replication and failover strategies.
• Developed and maintained data integration solutions using SSIS, SSAS, and SSRS to support business intelligence initiatives.
• Orchestrated data extraction from a SaaS platform using Python and SQL with Cloud Dataflow and BigQuery. Managed API integration for seamless data ingestion and employed Kafka for real-time streaming, supporting analytics and dashboard needs.
• Executed data migration projects with DataStage and QualityStage, ensuring accuracy and quality of data during the transfer process.
Environment: GCP, BigQuery, DataFlow, DataProc, Pub/Sub, Google Cloud Composer, Cloud SQL, Firestore, Cloud Storage, Matillion, DataStudio, MySQL, MS-SQL, Oracle, DB2, Federated Queries, IAM Security, Snowflake, GCP Databricks, Service Data Transfer, Python, shell scripts, VPC Configuration, Data Catalog, MongoDB, VPN Google-Client, SSIS, SSAS, SSRS, DataStage, QualityStage
Macy’s, NY
Big Data/Snowflake/Azure Data Engineer Jan 2017 – Mar 2019
Responsibilities:
• Developed and maintained data pipelines using Azure Data Factory (ADF) to extract, transform, and load (ETL) large datasets into Azure SQL Database and Data Lake, ensuring optimized performance for downstream analytics.
• Employed Databricks to perform complex data transformations and batch processing, leveraging Spark for enhanced data processing efficiency.
• Implemented data warehousing solutions using Azure Synapse Analytics, optimizing data processing workflows and enhancing business intelligence and reporting capabilities in a real-time Azure environment.
• Developed and managed Hive and HBase databases to handle large datasets, optimizing data retrieval and storage processes.
• Managed and maintained code repositories using Azure Repos, enabling efficient version control, collaboration, and continuous integration/continuous deployment (