Location: Rogers, AR

Vamsi Krishna B

Sr. Data Engineer | Big Data Engineer | Cloud Data Engineer | Data Scientist | Data Analyst | AWS Data Engineer | Azure Data Engineer | ETL Developer

Email: ad49to@r.postjobfree.com

Mobile: +1-479-***-****

Skilled Data Engineer with 8+ years of experience designing and implementing robust data solutions on cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and Snowflake. Proficient in building end-to-end ETL pipelines, optimizing data processing workflows, and leveraging a wide array of technologies, including Apache Spark, Hadoop, and Databricks, to extract actionable insights from large and complex datasets. Seeking a challenging role where I can apply my expertise in cloud-based data engineering, machine learning, and analytics to drive impactful business decisions and contribute to the success of innovative projects.

Professional Summary

Experience in working with Amazon Web Services (AWS S3, AWS Glue, EC2 instances, Redshift, EMR, DynamoDB, Amazon RDS, AWS Lambda, Athena, IAM, Kinesis, VPC, DMS, Amazon Elastic Load Balancing, CloudWatch, SNS, SQS etc.) and Microsoft Azure (Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure Functions, Azure SQL, Azure Databricks, Cosmos DB, Azure Active Directory and Azure Key Vault) for building scalable and reliable data solutions.

Experience in building data pipelines using Azure Data Factory and Azure Databricks, loading data to Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics, and controlling and granting database access.

Experienced with cloud platforms such as Amazon Web Services (AWS) and Azure, and with Databricks on both Azure and AWS.

Proficient in RDBMSs such as MySQL, PostgreSQL, Redshift, SQL Server, and Oracle, and in Python for data manipulation and analysis.

Experience in designing interactive dashboards, reports, performing ad-hoc analysis, and visualizations using Tableau, Power BI, and Matplotlib for Business Intelligence (BI) purposes.

Experienced in utilizing Google Cloud Storage, Cloud Dataflow, BigQuery, and Cloud Dataproc for seamless data storage, processing, analytics, and cluster management, respectively, enabling efficient and scalable data solutions.

Knowledge of migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access, and migrating on-premises databases to Azure Data Lake using Azure Data Factory, incorporating Python for data migration and preprocessing.

Experience in developing Hadoop-based applications using HDFS, MapReduce, Spark, Hive, Sqoop, HBase, and Kafka.

Proficient in industry-standard data modeling techniques, including star and snowflake schema, to create scalable and efficient data structures that meet business requirements.

Good experience with different Software Development Life Cycle (SDLC) models, including Waterfall and Agile.

Hands-on experience implementing Continuous Integration (CI) and Continuous Deployment (CD) in diverse applications utilizing Jenkins, Docker, and Kubernetes technologies.

Strong working experience in cloud data migration using AWS and Snowflake.

Expertise in designing and implementing end-to-end Extract, Transform, Load (ETL) pipelines, optimizing performance, and collaborating within Databricks environments.

Designed and implemented data solutions on AWS, utilizing services such as Amazon S3, AWS Glue, AWS Lambda, Amazon Redshift, and Amazon Athena.

Proficient in applying statistical modeling and machine learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, XGBoost) in forecasting and predictive analytics.

Expert in Python libraries such as NumPy for mathematical calculations, Pandas for data preprocessing, wrangling, and preparation, Matplotlib and Seaborn for data visualization, and scikit-learn for machine learning.

Expert in SQL across several platforms, commonly writing against MySQL, PostgreSQL, Redshift, Azure Synapse Analytics, SQL Server, and Oracle Database.

Extensive experience using Python libraries such as PySpark, Pytest, Boto3, NumPy, and Pandas for tasks such as distributed data processing, unit and integration testing, data manipulation, and statistical analysis.

Extensive experience utilizing Databricks and PySpark for data processing and analysis tasks.

Built comprehensive Tableau dashboards to monitor project progress, enabling project managers to track milestones and ensure project success.

Integrated Power BI with relational databases (SQL Server, MySQL) and NoSQL databases (MongoDB, Cassandra) for seamless data extraction and analysis.

Designed and implemented scalable data warehousing solutions using Snowflake on AWS. Leveraged Amazon S3 as a data storage layer, utilizing Snowflake's seamless integration with S3 for efficient data ingestion and data extraction.

Collaborated with cross-functional teams, including Product Managers and business stakeholders, to understand business needs and deliver impactful insights.

Technical Skills:

Programming Languages

SQL, Python, Scala and Java.

Big Data Tools

Hadoop, Apache Spark, Apache Airflow, Apache Hive, Apache Kafka, Sqoop, Oozie.

Cloud Platform

Amazon Web Services (AWS) and Microsoft Azure

SQL DBs

MySQL, SQL Server

NoSQL DBs

MongoDB, DB2, Cosmos DB, Cassandra, HBase

Data Warehouses

Redshift, Azure Synapse, Snowflake and Teradata

SQL Server Components

SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS).

File Formats

Parquet, JSON, CSV, TSV, Avro, PSV, ORC, XML

Data Visualization Tools

Tableau and Power BI

Operating System

Windows, Unix / Linux

Work Experience:

Walgreens, Chicago, Illinois Sep 2023 – Present

Role: Sr. Data Engineer

Responsibilities:

Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation, utilizing Python.

Responsible for loading data from internal servers into S3 buckets and the Snowflake data warehouse.

Developed ETL pipelines into and out of the data warehouse and built major regulatory and financial reports using advanced SQL queries.

Proficient in leveraging AWS services such as EMR, S3, and CloudWatch for executing Hadoop and Spark workloads on Amazon Web Services (AWS).

Implemented migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.

Established seamless connectivity between AWS S3 and Snowflake data warehouse for efficient data transfers and processing.

Proficient in working with Snowflake for efficient storage, management, and analysis of large datasets, utilizing Python for ETL processes.

Developed Extract, Transform, Load (ETL) pipelines in and out of Snowflake using a combination of Python and SnowSQL.
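
Illustrative sketch of the Python-plus-SnowSQL load pattern described above, using the snowflake-connector-python package; the account, warehouse, stage, and table names below are placeholders, not production values.

import snowflake.connector

# Placeholder connection parameters -- real values came from a secrets store.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Stage the extracted file and bulk-load it, mirroring the SnowSQL COPY step.
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS_STG AUTO_COMPRESS=TRUE")
    cur.execute(
        "COPY INTO ORDERS_STG "
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1)"
    )
    # Simple transform-on-load into the reporting table.
    cur.execute(
        "INSERT INTO ORDERS "
        "SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE::DATE, AMOUNT FROM ORDERS_STG"
    )
finally:
    conn.close()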

Worked with Amazon S3 to persist transformed Spark DataFrames in S3 buckets, using Amazon S3 as a data lake for data pipelines running on Spark and MapReduce.

Developed Spark applications in PySpark on Databricks to load large numbers of CSV files with differing schemas into S3.
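
A minimal PySpark sketch of this pattern, assuming the CSV files sit under an S3 prefix and their column sets differ between files; bucket names and paths are illustrative only.

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_s3").getOrCreate()

paths = [
    "s3://raw-bucket/landing/2023/09/part-001.csv",   # placeholder paths
    "s3://raw-bucket/landing/2023/09/part-002.csv",
]

# Read each file separately so Spark infers a schema per file, then union by
# column name, filling columns missing from a given file with nulls.
frames = [
    spark.read.option("header", "true").option("inferSchema", "true").csv(p)
    for p in paths
]
merged = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)

merged.write.mode("overwrite").parquet("s3://curated-bucket/orders/")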

Integrated AWS services (AWS Glue, AWS Lambda, Amazon S3) with Databricks for seamless data movement and processing.

Wrote scripts to move data between internal servers and S3 buckets for data migration.

Designed and developed ETL processes in AWS Glue to migrate data from external sources into Amazon Redshift.
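
A condensed sketch of a Glue job of this shape, assuming a crawled Data Catalog table and a pre-configured Glue connection to Redshift; database, table, and connection names are placeholders.

import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: table registered in the Glue Data Catalog by a crawler.
source = glue_context.create_dynamic_frame.from_catalog(
    database="external_src", table_name="orders_raw"
)

mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "bigint"),
              ("amount", "string", "amount", "double")],
)

# Sink: Redshift via a Glue connection, staged through S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "public.orders", "database": "analytics"},
    redshift_tmp_dir="s3://glue-temp-bucket/redshift/",
)
job.commit()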

Utilized AWS Glue catalog alongside a crawler to retrieve data from S3 and executed SQL queries with AWS Athena for data operations.

Leveraged Python libraries such as Pandas, NumPy, and scikit-learn to perform advanced data analysis and statistical modeling, driving actionable insights.

Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training machine learning models, and deploying them for prediction, contributing to data science workflows in Python.
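
A hedged sketch of the trigger side of this orchestration, using Boto3 to start the Step Functions state machine that wraps the SageMaker training and deployment steps; the ARN and input payload are placeholders.

import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder state machine ARN; the real machine chained S3 publishing,
# SageMaker training, and endpoint deployment states.
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:ml-train-deploy",
    name="train-2024-04-25",
    input=json.dumps({
        "training_data": "s3://ml-bucket/train/",
        "model_output": "s3://ml-bucket/artifacts/",
    }),
)
print(response["executionArn"])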

Applied supervised and unsupervised learning techniques to discover hidden patterns and structures within the data.

Evaluated model performance using metrics such as accuracy, precision, recall, and F1-score, and iteratively improved model performance through feature engineering and hyperparameter tuning.

Collaborated with DevOps to integrate IAM roles and permissions into CI/CD pipelines for automated access control.

Developed data visualizations and dashboards in Tableau to facilitate communication of findings to stakeholders. Proficient in SQL query writing to extract and manipulate data for analysis and reporting, ensuring accurate and relevant insights.

Environment: AWS, S3, AWS Glue, AWS EMR, EC2, Snowflake, SQL, Amazon EKS, Redshift, Databricks, PySpark, IAM, Lambda, Athena, Boto3, DynamoDB, Python, Amazon SageMaker, Exploratory Data Analysis, Pandas, NumPy, Feature Engineering, Machine Learning, Predictive Modeling, Glue Catalog, Elasticsearch, Technical Writing, AWS DataSync, Delta Lake, RDS, dbt, MySQL, Hadoop, HBase, Hive, Sqoop, Tableau, Alteryx, PL/SQL, NoSQL DBs, Git.

Academy Bank, Jersey City, New Jersey (Remote) Aug 2022 – Aug 2023

Role: Sr. Cloud Data Engineer

Responsibilities:

Developed and implemented data integration solutions on both Microsoft Azure and AWS cloud platforms, leveraging services such as Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Azure SQL Database (ADB), Azure Blob Storage, Azure Data Lake Storage (ADLS), AWS Glue, AWS Lambda, AWS S3, and AWS IAM.

Designed and built end-to-end data pipelines in Azure Data Factory (ADF) and AWS Glue, utilizing Linked Services, Datasets, and Pipelines to extract, transform, and load data from diverse sources including Azure SQL, Blob storage, AWS S3, and various third-party APIs.

Utilized PySpark within Azure Databricks and AWS Glue to perform complex data transformations and manipulations, ensuring data quality and consistency before ingestion into Azure and AWS services such as Data Lake, Azure SQL Database, AWS S3, and AWS Redshift.
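
A minimal sketch of the kind of PySpark cleanup applied before ingestion into the lake or warehouse; the storage paths and column names are illustrative assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder landing path in ADLS Gen2.
raw = spark.read.json("abfss://landing@storageacct.dfs.core.windows.net/customers/")

clean = (
    raw.dropDuplicates(["customer_id"])
       .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
       .withColumn("email", F.lower(F.trim("email")))
       .filter(F.col("customer_id").isNotNull())
)

# Land the conformed data as Delta for downstream Synapse / Redshift loads.
clean.write.format("delta").mode("overwrite").save(
    "abfss://curated@storageacct.dfs.core.windows.net/customers/"
)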

Orchestrated data processing workflows across Azure and AWS cloud environments, seamlessly integrating Azure Data Factory (ADF) with AWS Lambda for serverless data processing and automation.

Leveraged Kubernetes for the runtime environment in the CI/CD system, facilitating the creation and deployment of Docker images with multiple microservices across Azure and AWS cloud platforms.

Executed advanced data operations using T-SQL, Spark SQL, and Python within Azure Data Factory, Azure Databricks, AWS Glue, and AWS Lambda, ensuring efficient data processing and analysis in Azure Synapse Analytics, Azure Data Lake Analytics, and AWS Redshift.

Demonstrated expertise in managing data storage and data governance in both Azure and AWS environments, including configuring access controls, data encryption, and data lifecycle management policies.

Collaborated closely with cross-functional teams to gather business requirements, design data architectures, and deliver scalable, cost-effective data solutions aligned with organizational goals and objectives.

Performed basic data cleansing in Power BI Desktop, such as splitting columns, converting data types, and creating conditional columns.

Leveraged Power BI extensively to design and deploy a diverse array of analytical dashboards catering to the specific needs of business users, facilitating rapid data-driven decision-making.

Developed interactive and visually appealing dashboards using Power BI, empowering business stakeholders to gain quick insights into key metrics and trends, enhancing their understanding of complex datasets.

Collaborated closely with cross-functional teams to gather requirements and translate business needs into impactful Power BI dashboards, aligning with organizational objectives and driving strategic decision-making.

Developed custom Power BI visuals and templates to standardize reporting across the organization.

Environment: Azure Cloud, Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Databricks, Azure Synapse Analytics, MySQL, Python, Hive, Spark SQL, Terraform, SQL, Kafka, AWS Cloud, S3, AWS Glue, AWS EMR, EC2, Snowflake, Amazon EKS, Redshift, PySpark, IAM, Lambda, Athena, Boto3, DynamoDB, Azure DevOps, Azure Repos, Delta Lake, Azure SQL Server, Power BI, PostgreSQL, PowerShell, Kinesis, REST APIs, Cosmos DB, MongoDB, Blob Storage, Microservices, ETL Tools, Docker, APIs, Git, GitHub, Jira, Jenkins, Confluence.

Caterpillar, Irving, Texas (Remote from Bangalore office) May 2019 – Mar 2022

Role: Cloud Data Engineer

Responsibilities:

Played a key role in migrating on-premises data systems to the Azure cloud platform, leveraging Azure services such as Azure Blob Storage, Azure SQL Database, and Azure Data Lake Store, while utilizing Python for data manipulation and scripting tasks.

Designed and executed end-to-end data pipelines on Databricks, enabling efficient data processing, transformation, and integration across hybrid cloud environments.

Demonstrated proficiency in Python (Programming Language) for data manipulation, analysis, and scripting tasks.

Orchestrated data workflows using Google Cloud Composer and Azure Data Factory, seamlessly integrating data from on-premises sources (MySQL, Cassandra) and cloud storage (GCS, Blob Storage, Azure SQL DB) prior to ingestion into Azure Synapse, using Python for data preprocessing and integration tasks.
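
An illustrative Cloud Composer (Airflow) DAG of the shape described above; the operators and task callables are simplified placeholders rather than the production pipeline.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_mysql(**context):
    # Placeholder: pull incremental rows from the on-premises MySQL source.
    pass

def load_to_synapse(**context):
    # Placeholder: push the staged files into Azure Synapse.
    pass

with DAG(
    dag_id="onprem_to_synapse",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_mysql", python_callable=extract_from_mysql)
    load = PythonOperator(task_id="load_to_synapse", python_callable=load_to_synapse)
    extract >> load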

Developed data ingestion pipelines on Azure HDInsight Spark clusters, leveraging Azure Data Factory and Spark SQL for scalable processing of large-scale datasets.

Oversaw the migration of extensive data sets to Databricks (Spark), handling cluster administration, data loading, and pipeline configuration, including ingestion from Azure Data Lake Storage Gen2 using Azure Data Factory pipelines.

Leveraged Azure Data Factory for data transformations, integration runtimes, Azure Key Vaults, triggers, and migration of data factory pipelines across environments, encompassing hybrid cloud setups.

Managed Azure Data Lake Storage (ADLS), Databricks, Delta Lake, and integrated these services with other Azure offerings for enhanced data processing and analytics capabilities.

Used Python libraries such as Pandas and NumPy for data wrangling and to make sense of messy datasets.

Conducted data profiling and cleansing activities to ensure data quality and integrity for Power BI dashboards, incorporating Python for data preprocessing and cleaning tasks.

Engaged with various Azure platforms such as Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, and HDInsight for comprehensive data solutions.

Utilized Azure Blob Storage and Data Lake Storage for loading and processing data within Azure Synapse Analytics, accommodating hybrid cloud scenarios.

Automated the provisioning and management of Azure resources within hybrid cloud environments using Terraform for infrastructure as code (IaC).

Demonstrated strong SQL skills for data manipulation, querying, and database management across multiple cloud platforms.

Managed project-specific workspaces on Power BI service, organizing reports for easy access. Transitioned reports from Power BI Desktop to the appropriate workspace within Power BI Services for seamless sharing and collaboration.

Worked on Power BI reports using multiple types of visualizations including line charts, doughnut charts, tables, matrix, KPI, scatter plots, box plots, etc.

Optimized Power BI reports and dashboards for performance and usability by fine-tuning data models, queries, and visualizations. Collaborated closely with stakeholders to gather feedback and iteratively refine reports to meet evolving business needs.

Environment: Azure, Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Databricks, Azure Synapse Analytics, MySQL, Python, Hive, Spark SQL, Terraform, SQL, Kafka, Spark Streaming, Azure SQL Server, PostgreSQL, Kinesis, Azure DevOps, Cosmos DB, MongoDB, Power BI, Azure Event Hubs, Blob Storage, Microservices, ETL, Docker, APIs, Git, GitHub, Jira, Jenkins.

Toshiba, Hyderabad, India July 2016 – Apr 2019

Role: Big Data Engineer

Responsibilities:

Experience in creating ETL data pipelines using Python, PySpark, Redshift, Amazon EMR, and S3.

Designed and set up enterprise data lakes to support various use cases, including data storage, data processing, big data analytics, and reporting of voluminous, rapidly changing data, using various AWS services.

Utilized a combination of Google Cloud Platform (GCP) and Amazon Web Services (AWS) cloud services to architect and implement data solutions, incorporating components such as Google Cloud Storage (GCS), BigQuery, Amazon S3, Amazon Redshift, and AWS Glue.

Created multiple ETL jobs in Glue, processed the data using different transformations, and loaded it into S3, Redshift, and RDS.

Utilized AWS EMR to efficiently transform and transfer large datasets between various AWS data stores and databases, including Amazon S3 and Amazon DynamoDB.

Created Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, and Jenkins.

Created data ingestion modules using AWS Glue for loading data into various layers in S3 and reporting using Athena and QuickSight.

Utilized a combination of Google Cloud Platform (GCP) and Azure cloud services to architect and implement data solutions, incorporating components such as Google Cloud Storage (GCS), BigQuery, Azure Data Lake, Blob Storage, and Azure Synapse.

Created and managed S3 bucket policies and lifecycle rules in line with organizational and compliance guidelines.
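
A small Boto3 sketch of the lifecycle-policy portion of this work; the bucket name and retention windows are illustrative, not the actual compliance policy.

import boto3

s3 = boto3.client("s3")

# Transition objects to infrequent access after 30 days and expire them after
# 365 days, per (illustrative) retention guidelines.
s3.put_bucket_lifecycle_configuration(
    Bucket="company-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-layer",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)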

Established CI/CD tooling such as Jenkins and Bitbucket for code repository management, build, and deployment of the Python code base.

Developed interactive dashboards and visualizations in Tableau to convey complex data insights to business stakeholders.

Integrated Tableau with relational databases and NoSQL databases.

Generated insightful Tableau visualizations to understand customer behavior and preferences, empowering businesses to tailor products and services to meet customer needs effectively.

Developed detailed Tableau reports to analyze financial data, providing valuable insights for budget planning and resource allocation.

Hands-on experience working with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.

Created informative Tableau visualizations to analyze product performance metrics, aiding in product development and enhancement initiatives.

Responsible for maintaining quality reference data in source systems by performing operations such as cleaning and transformation, and ensuring integrity in a relational environment by working closely with stakeholders and the solution architect.

Imported data from SQL Server to Amazon Redshift and utilized PySpark to execute transformations and actions to produce the required outcome.
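
A minimal sketch of the SQL Server-to-Redshift path using PySpark JDBC reads and writes; hostnames, credentials, and table names are placeholders, and the JDBC drivers are assumed to be available on the cluster.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sqlserver_to_redshift").getOrCreate()

# Read the source table from SQL Server over JDBC (placeholder connection details).
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Example transformation: keep shipped orders and standardize amounts.
shipped = (
    src.filter(F.col("status") == "SHIPPED")
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Write to Redshift over JDBC (a COPY-from-S3 path is the higher-throughput alternative).
(
    shipped.write.format("jdbc")
    .option("url", "jdbc:redshift://cluster.abc123.us-east-1.redshift.amazonaws.com:5439/analytics")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .mode("append")
    .save()
)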

Generated Tableau visualizations to analyze sales pipeline data, aiding in sales forecasting and revenue planning.

Worked with big data technologies like Apache Spark and Hadoop ecosystem using Python APIs for distributed data processing.

Built Tableau dashboards to monitor manufacturing processes, enabling organizations to optimize production efficiency and reduce costs.

Environment: AWS Glue, PySpark, S3, IAM, EC2, RDS, Redshift, Lambda, Boto3, DynamoDB, Google Cloud Platform (GCP), Google Cloud Composer, Google BigQuery, Google Cloud Storage, Apache Spark, Kinesis, Athena, Hive, Sqoop, Python, EBS, ELB, EMR, SNS, SQS, VPC, CloudFormation, CloudWatch, Bitbucket, Shell Scripting, Git, Jira, Unix/Linux.

Technoidentity, Hyderabad, India May 2015 – June 2016

Role: Data Analyst

Responsibilities:

Worked in all phases of research, including data cleaning, data mining, validation, visualization, and performance monitoring, with a focus on identifying trends and troubleshooting data-related issues to ensure data accuracy and reliability, using Python for data cleaning and analysis.

Applied analytical skills to interpret complex datasets, identify trends, and troubleshoot data-related issues, ensuring data accuracy and reliability, with Python for data manipulation and analysis.

Involved in generating daily/weekly sales and finance trending reports using Tableau by identifying dimensions, measures, measure values, and level of detail, giving Business Analytics and top executives an overview of current trends.

Developed various reports using Tableau features like Parameters, Filters, Sets, Groups, Actions, etc. to present users with various scenarios of predictive analysis, with Python for data preprocessing and manipulation.

Created interactive Tableau dashboards with intuitive visualizations, allowing stakeholders to easily track and analyze key performance metrics, fostering data-driven decision-making.

Developed comprehensive Tableau reports to identify sales trends, enabling organizations to gain insights into revenue growth drivers and make informed strategic decisions.

Interacted with Subject Matter Experts and End Users and gathered business requirements.

Involved in data collection and data cleaning of large amounts of data using the Python libraries Pandas and NumPy.
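
An illustrative snippet of the Pandas/NumPy cleaning steps referenced above; the input file and column names are placeholders.

import pandas as pd
import numpy as np

sales = pd.read_csv("monthly_sales.csv")          # placeholder input file

# Typical cleaning: drop exact duplicates, normalize text keys,
# coerce dates/amounts, and fill or drop missing values.
sales = sales.drop_duplicates()
sales["region"] = sales["region"].str.strip().str.title()
sales["order_date"] = pd.to_datetime(sales["order_date"], errors="coerce")
sales["amount"] = pd.to_numeric(sales["amount"], errors="coerce")
sales["amount"] = sales["amount"].fillna(sales["amount"].median())
sales = sales.dropna(subset=["order_date"])

print(sales.describe(include=np.number))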

Adept in using Python libraries such as Pandas, NumPy, Seaborn, and Matplotlib to analyze data, perform data cleaning, and build visualizations.

Demonstrated analytical problem-solving skills in identifying and addressing data-related challenges to ensure accurate and reliable insights.

Environment: MySQL, SQL Server, Python, Pandas, NumPy, SQL, NoSQL, Matplotlib, Seaborn, Tableau, Excel, Statistics, Business Requirements.


