
Machine Learning Azure Data

Location:
Farmington, MI, 48336
Posted:
February 10, 2025


PROFESSIONAL SUMMARY:

IT professional with *+ years of extensive hands-on experience in the industry, working with Spark, Scala, Python, machine learning algorithm deployment, AWS, Kafka, and Hadoop components.

My expertise includes transforming complex data into actionable insights and ensuring seamless integration across various platforms.

Experienced in building end-to-end pipelines for real-time data analytics in the cloud using AWS services such as EMR, EC2, DynamoDB, RDS, Athena, S3, Lambda, SNS, and SQS.

Extensive experience across the project life cycle, including data acquisition, data cleaning, data manipulation, data validation, data mining, algorithms, and visualization.

Good understanding of Spark architecture with Databricks and Structured Streaming, including setting up Databricks on AWS and Microsoft Azure, configuring Databricks workspaces for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.

Successfully built and managed ETL processes and data integration using tools such as Apache NiFi, Informatica, Sqoop, AWS Glue, and Azure Data Factory (ADF), ensuring smooth data flow and transformation.

Proficient with various data storage systems including SQL Server, CosmosDB, MongoDB, MySQL, PostgreSQL, Oracle, DynamoDB, Snowflake and Google BigQuery, optimizing data access and performance.

Extensive experience with big data technologies like Apache Spark, Hadoop, Spark MLlib, Flink, Hive, HBase and Pig, enabling efficient handling and analysis of large datasets.

Leveraged AWS (AWS S3, EMR, Kinesis, Lambda and Redshift), Azure (Azure Data Lake Storage, Event Hubs and DevOps) and Google Cloud Platform (Dataflow and Pub/Sub) to create scalable cloud-based solutions.

Utilized Python libraries such as Pandas, NumPy, TensorFlow and Scikit-Learn for advanced data analysis and machine learning, providing valuable insights and predictive capabilities.

Designed and implemented dimensional data models and data warehousing solutions enhancing data organization and accessibility for business intelligence and reporting.

Experienced in setting up CI/CD pipelines with tools like Jenkins, GitLab CI and Azure DevOps and managing code with Git, GitHub and Bitbucket ensuring efficient development processes.

Implemented best practices for data security and compliance using AWS IAM, Azure AD, KMS (AWS), JWT and OAuth ensuring data integrity and adherence to industry standards.

Applied Agile methodologies, including Scrum and Kanban, to contribute effectively to project management and ensure timely delivery of data engineering solutions.

Addressed complex data challenges and communicated effectively with cross-functional teams and stakeholders ensuring alignment with business goals and practical solutions.

Experience in Database Design and development with Business Intelligence using SQL Server, Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP Cubes, Star Schema, and Snowflake Schema.

Experience creating visual reports, graphical analyses, and dashboards using Tableau, Power BI, Looker, Google Data Studio, and Informatica on historical data saved in HDFS, as well as data analysis using Splunk Enterprise.

Able to work in parallel across both GCP and Azure clouds. Experience in creating, debugging, scheduling, and monitoring jobs using Airflow.

TECHNICAL SKILLS

Languages: Python, Java, SQL, MySQL, T-SQL, PostgreSQL, Shell Scripting

Cloud: Azure, AWS

ETL/Reporting Tools: Power BI, SSIS, SSAS, SSRS, Azure Data Factory, Snowflake

Big Data Tools: Hive, Pig, MapReduce, Hadoop, Apache Spark, Apache Kafka, Sqoop, HDFS

Analytics Tools: Tableau, Power BI, Microsoft SSIS, SSAS, and SSRS

OLAP Tools: Business Objects and Crystal Reports 9

Data Modelling Tools: Erwin Data Modeler, ER Studio v17

IDEs: Eclipse, IntelliJ IDEA, PyCharm, Notepad++, and Visual Studio

Operating Systems: Windows, Unix, Linux

CI/CD, DevOps Tools: Git, GitHub, Docker, Jenkins, Kubernetes, Splunk, Grafana

Databases: SQL DB, SQL Server 2019/2016/2014/2012, Oracle 12c/11gR2/10g/9i

Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile

Certifications:

Certified Azure Administrator Associate.

Certified AWS Developer Associate.

PROFESSIONAL EXPERIENCE:

Client: Paccar, Dallas, TX Aug 2023 - Present

Role: Sr. Azure Data Engineer

Responsibilities:

Heavily involved in data architecture and application design using cloud and big data solutions on AWS and Microsoft Azure.

Designed and developed data integration pipelines in Azure Data Factory to ingest 50 TB of data daily from on-prem SQL servers to Azure SQL Data Warehouse.

Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.

Created aggregation and partition logic using PySpark in Databricks to optimize query performance.
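As a rough illustration of that kind of PySpark aggregation and partition logic (the table, column, and output names here are placeholders, not the actual client schema):

    # Minimal PySpark sketch: aggregate raw records and write partitioned output.
    # sales_raw, region, order_date, order_id, and amount are illustrative names.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_aggregation").getOrCreate()

    raw_df = spark.table("sales_raw")

    # Aggregate to one row per region per day to cut the data scanned by BI queries.
    daily_df = (
        raw_df
        .groupBy("region", F.to_date("order_date").alias("order_dt"))
        .agg(
            F.sum("amount").alias("total_amount"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Partition the output by date so downstream queries can prune partitions.
    (
        daily_df
        .repartition("order_dt")
        .write
        .mode("overwrite")
        .partitionBy("order_dt")
        .format("delta")
        .saveAsTable("sales_daily_agg")
    )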

Designed and managed Snowflake data warehouses, improving scalability and performance for large-scale data analytics.

Utilized Snowflake’s SQL capabilities for complex data queries and reporting, enhancing data insights.

Leveraged Azure Synapse Link to enable seamless integration between Azure Synapse Analytics and Azure Cosmos DB.

Conducted performance monitoring and troubleshooting activities within Azure Synapse, proactively identifying and resolving performance bottlenecks and ensuring uninterrupted data processing and analytics operations.

Monitored and managed Airflow DAGs, addressing issues and optimizing pipeline performance.

Designed and implemented Java solutions for integrating with data warehouses such as Snowflake and Azure Synapse, optimizing data loading processes and ensuring efficient data storage and retrieval.

Utilized Java libraries such as Apache POI for handling Excel files and Jackson for JSON processing, facilitating effective data manipulation and transformation as part of ETL workflows.

Implemented monitoring and logging for Java-based data applications using tools such as Log4j and SLF4J, ensuring visibility into application performance and facilitating troubleshooting and issue resolution.

Implemented PostgreSQL performance tuning strategies, including indexing and query optimization, to enhance database efficiency.

Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.
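A minimal sketch of how such a Python-based migration can be structured with pyodbc and the Snowflake connector; connection details, table names, and the chunk size are illustrative assumptions, and the target table is assumed to have been created beforehand (for example via SnowSQL DDL):

    # Illustrative one-time migration: read from SQL Server in chunks with pandas,
    # bulk-load into Snowflake with write_pandas. All names and credentials are placeholders.
    import pandas as pd
    import pyodbc
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    sql_server_conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=onprem-host;"
        "DATABASE=state_db;UID=etl_user;PWD=***"
    )

    sf_conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGE",
    )

    # Stream the source table in chunks so the migration never holds it all in memory.
    # The Snowflake target table STATE_LEVEL_DATA is assumed to exist already.
    for chunk in pd.read_sql("SELECT * FROM dbo.state_level_data", sql_server_conn,
                             chunksize=100_000):
        write_pandas(sf_conn, chunk, table_name="STATE_LEVEL_DATA")

    sql_server_conn.close()
    sf_conn.close()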

Developed interactive notebooks in Databricks for collaborative data exploration and analysis.

Designed and implemented Snowflake Data Warehouse solutions for scalable and performant analytics. Implemented Data Vault modeling for the creation of an enterprise data warehouse.

Implemented robust data quality checks and validations within ETL processes to ensure accuracy and completeness of healthcare data, minimizing errors and discrepancies in downstream analytics.

Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.

Leveraged Azure Synapse's integration with Azure Data Factory to automate and orchestrate data workflows, reducing manual intervention and improving overall data pipeline efficiency.

Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that processes the data using the SQL activity. Involved in gathering the requirements, designing, development and testing.

Managed version control for data engineering projects using Git, ensuring code integrity and collaboration.

Integrated Git with CI/CD pipelines to automate testing and deployment processes.

Implemented Jenkins for monitoring and alerting on data pipeline statuses and build processes.

Integrated Snowflake with various data sources and BI tools, streamlining data access and visualization.

Optimized Snowflake data warehouse performance through efficient data modeling, indexing strategies, and query optimization techniques, improving analytics query response times and overall system efficiency.

Environment: Hadoop, Azure Data Factory, Azure Data Lake, Azure Storage, Azure SQL, Azure Data Warehouse, Azure Databricks, Azure PowerShell, Azure Synapse, MapReduce, Hive, Spark, Python, YARN, Tableau, Kafka, Sqoop, Scala, HBase.

Client: United Airlines, Chicago, IL Jul 2022 – Jul 2023

Role: Sr. Data Engineer

Responsibilities:

Involved in complete Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementing, and deploying to business users.

Developed ETL pipelines using AWS Glue to integrate diverse data sources.

Used Amazon RDS (Relational Database Service) to manage relational databases such as MySQL and PostgreSQL, handling structured data from various sources.

Used Kinesis Firehose for efficient delivery of processed data to Amazon S3 and Redshift.
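An illustrative sketch of delivering processed records to a Firehose stream with boto3; the stream name, region, and record layout are placeholders:

    # Push a small batch of processed records to a Kinesis Data Firehose delivery
    # stream that fans out to S3 and Redshift. Names here are not the production ones.
    import json
    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    def deliver(records):
        """Send a small batch of dict records to the delivery stream."""
        firehose.put_record_batch(
            DeliveryStreamName="example-processed-events",  # placeholder stream name
            Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
        )

    deliver([{"event_id": 1, "status": "ok"}, {"event_id": 2, "status": "retry"}])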

Designed and implemented infrastructure as code (IaC) using Terraform to automate the provisioning and configuration of Kinesis streams, ensuring consistent deployment across environments.

Proficient in Terraform for automating and managing AWS infrastructure deployments.

Configured Jenkins pipelines for building, testing, and deploying data applications, ensuring high code quality.

Implemented Hadoop ecosystem tools (Hive, Pig) for advanced data querying and transformation.

Designed and enforced IAM policies to control access to AWS resources based on the principle of least privilege.

Designed and deployed multi-tier applications with an emphasis on high availability, fault tolerance, and auto scaling on AWS CloudFormation, utilizing AWS services such as EC2, AWS Glue, Athena, Lambda, S3, RDS, DynamoDB, SNS, SQS, and IAM.

Created advanced data analysis scripts using Python libraries (Pandas, NumPy), enhancing data insights and decision-making.
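A small illustrative example of the kind of Pandas/NumPy analysis script involved; the file path and column names are made up for the example:

    # Illustrative pandas/NumPy analysis: cleanse, summarize, and compute a trend.
    # flight_delays.csv, carrier, flight_date, and delay_minutes are placeholder names.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("flight_delays.csv", parse_dates=["flight_date"])

    # Basic cleansing: drop rows with missing delays and clip extreme outliers.
    df = df.dropna(subset=["delay_minutes"])
    df["delay_minutes"] = df["delay_minutes"].clip(upper=df["delay_minutes"].quantile(0.99))

    # Summary metrics per carrier, plus a 7-day rolling mean for trend analysis.
    per_carrier = df.groupby("carrier")["delay_minutes"].agg(["mean", "median", "std"])
    daily = df.set_index("flight_date")["delay_minutes"].resample("D").mean()
    rolling_trend = daily.rolling(window=7, min_periods=1).mean()

    print(per_carrier.head())
    print(rolling_trend.tail())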

Monitored and maintained Kubernetes environments, addressing issues and ensuring reliable operations.

Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
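A minimal Airflow sketch of such a daily extract-and-load DAG; the DAG id, schedule, and the body of the load function are placeholders rather than the real pipeline:

    # Skeleton of a daily extract-and-load DAG in Airflow 2.x.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_and_load(**context):
        # Placeholder: the real task read from a relational source (e.g. via a DB hook)
        # and wrote into the warehouse staging schema.
        print("extracting for", context["ds"])

    with DAG(
        dag_id="rds_to_warehouse_daily",   # placeholder DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_task = PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
        )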

Designed and deployed several applications that make use of practically all AWS services, with an emphasis on high availability, fault tolerance, and auto-scaling in AWS CloudFormation, including EC2, Redshift, S3, RDS, DynamoDB, SNS, and SQS.

Developed custom Kafka producers and consumers for a variety of publish and subscribe use cases on Kafka topics.
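A minimal kafka-python sketch of a custom producer and consumer of this kind; the broker address, topic, and consumer group are placeholders:

    # Simple JSON producer and consumer against a placeholder topic.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("booking-events", {"booking_id": 123, "status": "CONFIRMED"})
    producer.flush()

    consumer = KafkaConsumer(
        "booking-events",
        bootstrap_servers="localhost:9092",
        group_id="booking-etl",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.topic, message.offset, message.value)
        break  # single message for the example; a real consumer runs continuously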

Wrote code that optimizes the performance of AWS services used by application teams and provides code-level application security for clients (IAM roles, credentials, encryption, etc.).

Used Amazon EMR for MapReduce jobs, tested locally using Jenkins, and performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.

Created external tables with partitions using Hive, AWS Athena, and Redshift. Developed PySpark code for AWS Glue jobs and for EMR.
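A rough sketch of the general shape of such a PySpark Glue job, reading raw parquet from S3, consolidating it, and writing partitioned output that external tables in Athena can query; bucket names, paths, and columns are illustrative only:

    # Skeleton of a Glue PySpark job script; paths and columns are placeholders.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    raw = spark.read.parquet("s3://example-raw-bucket/adobe/")  # placeholder path

    # Consolidate to daily hit counts per page.
    daily = (
        raw.withColumn("event_dt", F.to_date("event_ts"))
           .groupBy("event_dt", "page")
           .agg(F.count("*").alias("hits"))
    )

    # Partitioned parquet output so Athena/Redshift Spectrum external tables can prune.
    daily.write.mode("overwrite").partitionBy("event_dt").parquet(
        "s3://example-curated-bucket/adobe_daily/"
    )

    job.commit()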

Good understanding of other AWS services such as S3, EC2, IAM, and RDS; experienced with orchestration and data pipeline services such as AWS Step Functions, Data Pipeline, and Glue.

Environment: Hive, Spark, Python, YARN, Tableau, Kafka, Sqoop, Scala, HBase, AWS, EC2 (Elastic Compute Cloud), S3, RDS, Glue, Lambda, Redshift, CloudWatch, Snowflake, SQL, PySpark, ETL.

Client: CVS Health, Monroeville, PA Aug 2020 – June 2022

Role: Data Engineer

Responsibilities:

Designed, deployed, maintained, and led the implementation of cloud solutions using Microsoft Azure and underlying technologies.

Leveraged Azure Functions to develop serverless applications, integrating HTTP Triggers and Application Insights for enhanced monitoring and load testing via Azure DevOps Services.

Established CI/CD pipelines using Docker, Jenkins, TFS, GitHub, and Azure Container Services, achieving streamlined deployments and operational efficiency.

Automated Azure infrastructure provisioning using Terraform, optimizing resource management for virtual machine scale sets in production environments.

Utilized Ansible for comprehensive configuration management, including infrastructure setup and application deployments. Integrated monitoring solutions using Nagios and ELK stack for real-time operational insights.

Deployed the initial Azure components like Azure Virtual Networks, Azure Application Gateway, Azure Storage and Affinity groups.

Designed and implemented Java solutions for integrating with data warehouses such as Snowflake and Azure Synapse, optimizing data loading processes and ensuring efficient data storage and retrieval. Utilized Java libraries such as Apache POI for handling Excel files and Jackson for JSON processing, facilitating effective data manipulation and transformation as part of ETL workflows.

Implemented monitoring and logging for Java-based data applications using tools such as Log4j and SLF4J, ensuring visibility into application performance and facilitating troubleshooting and issue resolution.

Developed and deployed microservices architectures in Kubernetes for modular and flexible data solutions.

Monitored and maintained Kubernetes environments, addressing issues and ensuring reliable operations.

Configured Jenkins pipelines for building, testing, and deploying data applications, ensuring high code quality.

Integrated machine learning models into production systems, driving data-driven decision-making.

Ensured compliance with HIPAA regulations throughout ETL processes, maintaining data security and confidentiality standards required for healthcare data.

Knowledgeable in databases including PostgreSQL, MySQL, SQL Server, and Oracle. Conducted capacity planning and architecture of storage accounts.

Developed Databricks notebooks using PySpark, extracted multiple source files, and loaded them into SQL tables.
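An illustrative Databricks notebook cell for this pattern, reading multiple source files and loading them into a SQL table over JDBC; the mount path, JDBC URL, and table names are placeholders:

    # Read several CSV extracts at once and append them to a SQL staging table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

    # Spark unions all files matching the glob into one DataFrame.
    source_df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/landing/claims/*.csv")   # placeholder mount path
    )

    jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=staging"

    (
        source_df.write
        .format("jdbc")
        .option("url", jdbc_url)
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("dbtable", "dbo.claims_stage")   # placeholder target table
        .option("user", "etl_user")
        .option("password", "***")
        .mode("append")
        .save()
    )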

Optimized Snowflake performance through query tuning and data partitioning strategies.

Implemented Databricks Delta for reliable data lake operations, enabling ACID transactions and versioning.

Prepared capacity and architecture plan to create the Azure Cloud environment to host migrated IaaS VMs and PaaS role instances for refactored applications and databases.

Implemented PostgreSQL database solutions for automotive applications, ensuring efficient data management and high availability.

Managed PostgreSQL databases for various client applications, optimizing performance and troubleshooting issues as needed.

Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy.

Used Terraform Cloud/Enterprise to manage infrastructure in a collaborative and scalable manner.

Environment: Hadoop, Azure Data Factory, Azure Data Lake, Azure Storage, Azure SQL, Azure Data Warehouse, Azure Databricks, Azure PowerShell, Azure Synapse, MapReduce, Hive, Spark, Python, YARN, Tableau, Kafka, Sqoop, Scala, HBase.

Client: Bank of America, India. Nov 2017 – Dec 2019

Role: Data Engineer

Responsibilities:

Used AWS Athena extensively to ingest structured data from S3 into other systems such as RedShift or to produce reports.

Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
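A rough sketch of loading nested JSON from an S3 stage into a Snowflake VARIANT column with COPY INTO via the Python connector; the stage, table, warehouse, and credentials are placeholder names:

    # Load nested JSON files from an external S3 stage into a VARIANT column.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="LOAD_WH", database="RAW", schema="LANDING",
    )
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS events_raw (payload VARIANT)")

    # The external stage @s3_events_stage is assumed to be defined already.
    cur.execute("""
        COPY INTO events_raw
        FROM @s3_events_stage/2019/
        FILE_FORMAT = (TYPE = 'JSON')
    """)

    # Nested attributes can then be queried with path notation, e.g.
    # SELECT payload:user.id::STRING FROM events_raw;
    cur.close()
    conn.close()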

Worked on the code transfer of a quality monitoring program from AWS EC2 to AWS Lambda, as well as the creation of logical datasets to administer quality monitoring on Snowflake warehouses.

Used the Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kinesis in near real time.

Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.

Built external table schemas in Hive, the primary query engine on EMR, for the data being processed.

Using AWS Glue, I designed and deployed ETL pipelines on S3 parquet files in a data lake.

Created, developed, and tested environments for different applications by provisioning Kubernetes clusters on AWS using Docker, Ansible, and Terraform.

Optimized PySpark jobs through performance tuning and efficient resource management.

Worked on deployment automation of all the micro services to pull image from the private Docker registry and deploy to Docker Swarm Cluster using Ansible.

Integrated Snowflake with various data sources and BI tools, streamlining data access and visualization.

Worked on scalable distributed data system using Hadoop ecosystem in AWS EMR.

Migrated on-premises database structures to the Confidential Redshift data warehouse.

Loaded data into this application using Hadoop technologies such as Pig and Hive.

Used a JSON schema to define table and column mappings from S3 data to Redshift.
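An illustrative example of such a COPY, mapping S3 JSON data to Redshift columns through a JSONPaths file (the cluster endpoint, bucket, table, and IAM role are placeholders):

    # Run a Redshift COPY that maps JSON fields to columns via a JSONPaths file.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="***",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY public.learner_events (event_id, user_id, event_ts, event_type)
            FROM 's3://example-bucket/learner-events/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
            FORMAT AS JSON 's3://example-bucket/jsonpaths/learner_events.json';
        """)
    conn.close()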

Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS, and Lambda functions.

Monitored and tuned Hadoop clusters to maintain performance and resource efficiency.

Used multi-node Redshift technology to implement columnar data storage, advanced compression, and massively parallel processing.

Environment: AWS, EC2, S3, RDS, Lambda, Redshift, CloudWatch, Snowflake, SQL, Python, Apache Airflow, AWS Glue, Talend, Java, Informatica, Apache NiFi, Microsoft Azure Data Factory, Apache Spark, Fivetran, Stitch, Matillion, dbt, DataStage, Apache Flink.

Client: Media Mint, India. Apr 2015 – Oct 2017

Role: Data Analyst

Responsibilities:

Actively involved in gathering requirements from end users, involved in modifying various technical & functional specifications.

Designed and implemented data models and reports in Power BI to help clients analyze data to identify market trends, competition, and customer behaviors.

Closely worked with ETL to implement Copy activity, Custom Azure Data Factory Pipeline Activities for On-cloud ELT processing. Created Azure DevOps pipeline for Power BI report deployments.

Worked on the design and development of end-user applications for data presentation and analytics, including scorecards, dashboards, reports, monitors, and graphic presentations using Power BI and Tableau.

Used Microsoft Power BI to design dashboards and publish them to the server via the data gateway.

Developed and published reports and dashboards using Power BI and wrote DAX formulas and expressions.

Utilized Power Query in Power BI to pivot and unpivot the data model for data cleansing and data massaging.

Created several user roles and groups for end users and provided row-level security to them. Worked with table and matrix visuals and with different levels of filters, such as report-level, visual-level, and page-level filters.

Developed various solution driven views and dashboards by developing different chart types including Pie Charts, Bar Charts, Tree Maps, Circle Views, Line Charts, Area Charts, and Scatter Plots in Power BI.

Worked on data transformations such as adding calculated columns, managing relationships, creating different measures, removing rows, replacing values, splitting columns, and handling date and time columns.

Involved in designing and building complex reports and dashboards using filters (slicers), drill-down reports, sub-reports, and ad hoc reports in Power BI Desktop.

Provided continued maintenance and development of bug fixes for the existing and new Power BI Reports.

Environment: Power BI, Azure, RDS, Snowflake, SQL, Python, Apache Airflow, AWS Glue, Talend, Java, Informatica, Apache NiFi, Microsoft.

Educational Qualification:

Bachelor’s degree in Business Administration, Osmania University, India.


