

Ali Mohammed

Phone: +1-779-***-****

Email: ad0w8c@r.postjobfree.com

Chicago, IL

Professional Summary:

Over 9 years of strong experience designing, developing, and maintaining complex data systems that process, store, and analyze vast amounts of data.

Led the modernization of on-premises data infrastructure by implementing cutting-edge cloud-based solutions, increasing the efficiency and scalability of data processing and analysis.

Extensive experience developing PySpark applications to process large-scale datasets and transform them into meaningful insights.

Worked on various Azure and GCP data services, such as Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Data Explorer, Google BigQuery, GCS, Cloud Composer, and Dataproc, to deliver end-to-end data solutions.

Expertise with Spark Core and Spark SQL for data processing. Used Apache Spark 2.0 and explored DataFrames, RDDs, Spark SQL, and Spark Streaming.
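
For illustration, a minimal PySpark sketch of this kind of DataFrame and Spark SQL usage; the input path, column names, and aggregation are hypothetical placeholders, not taken from any specific project:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session
spark = SparkSession.builder.appName("sales-aggregation").getOrCreate()

# Hypothetical input: a CSV of sales events
df = spark.read.csv("/data/raw/sales.csv", header=True, inferSchema=True)

# DataFrame API: daily revenue per region
daily = (
    df.withColumn("event_date", F.to_date("event_ts"))
      .groupBy("region", "event_date")
      .agg(F.sum("amount").alias("revenue"))
)

# Equivalent Spark SQL over a temporary view
df.createOrReplaceTempView("sales")
daily_sql = spark.sql("""
    SELECT region, to_date(event_ts) AS event_date, SUM(amount) AS revenue
    FROM sales
    GROUP BY region, to_date(event_ts)
""")

daily.show(10)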

Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.

Used Azure Data Explorer to analyze ingested data in the pipeline.

Created an Event Grid ingestion method in Azure Data Explorer (ADX) to ingest data from ADLS.

Updated retention policies on Azure Data Explorer staging tables.

Wrote PL/SQL code for Oracle databases using functions and stored procedures.

Experience in implementing Continuous Integration using Jenkins and Git.

Managed GitHub repositories and permissions including branching, merging, and tagging.

Experience using business intelligence tools like Tableau, Power BI, and TIBCO Spotfire.

Expertise includes data modeling, database design, data cleansing, data validation, data mapping identification & documentation, data extraction and load processes from multiple data sources, data verification, data analysis, transformation, integration, data import/export, and the use of multiple ETL tools.

Strong ability to reshape data using techniques like pivoting, melting, and merging for analysis and modeling.

Familiarity with libraries such as Pandas for efficient data manipulation in Python.
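
For illustration, a short pandas sketch of the reshaping techniques mentioned above (merging, pivoting, melting); the tables and column names are invented examples:

import pandas as pd

# Illustrative sales and region lookup tables
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100, 120, 90, 95],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})

# Merge: enrich sales with the region of each store
enriched = sales.merge(regions, on="store", how="left")

# Pivot: one row per store, one column per month
wide = enriched.pivot_table(index="store", columns="month", values="revenue", aggfunc="sum")

# Melt: back to long format for analysis and modeling
long = wide.reset_index().melt(id_vars="store", var_name="month", value_name="revenue")
print(long)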

Proficient in using cloud-based data manipulation tools and platforms (e.g., AWS, Azure)

Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

Extensive experience developing and optimizing stored procedures and views consumed by daily and weekly reports. Strong experience with all phases, including requirement analysis, design, coding, testing, support, and documentation.

Strong experience with Oracle Exadata for data migration.

Proficiency in data migration tools and methodologies.

Excellent problem-solving skills and attention to detail.

Proficiency in PL/SQL programming.

Experience with database migrations and ETL processes.

Knowledge of database performance tuning.

Experience in designing, developing, and maintaining data pipelines.

Strong understanding of data integration and ETL processes.

Knowledge of SQL and database management.

Experience with Azure NoSQL database services, including Cosmos DB or MongoDB.

Strong understanding of NoSQL database architecture and design.

Knowledge of data modeling and optimization for NoSQL databases.

Familiarity with data security best practices.

Developing and maintaining data pipelines and ETL processes to ensure timely and accurate data ingestion, transformation, and delivery using Azure Data Factory/Synapse Analytics.

Support the execution of Power BI projects, working alongside expert Principal Consultants and Solution Architects.

Expertise in loading data into Snowflake DB in the cloud from various sources. Validated the data feed from the source system to Snowflake DW Cloud platform.

Good understanding of Big Data Hadoop and YARN architecture, along with various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).

Experience in Database Design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP cubes, Star Schema and Snowflake Schema.

Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.

Strong skills in visualization tools: Power BI and Excel (formulas, Pivot Tables, charts, and DAX commands).

Developed core modules in large cross-platform applications using JAVA, JSP, Servlets, Hibernate, RESTful, JDBC, JavaScript, XML, and HTML.

Experience in analyzing data using HiveQL, and MapReduce Programs.

Experienced in ingesting data into HDFS from various relational databases such as MySQL, Oracle, DB2, Teradata, and Postgres using Sqoop.

Experienced in importing real-time streaming logs and aggregating the data into HDFS using Kafka and Flume.

Experience developing Spark applications using Spark SQL, PySpark, and Delta Lake in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
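
As a brief, hedged sketch of this kind of Databricks job, assuming a Spark session where the Delta format is available (as it is on Databricks); the mount paths and columns are placeholders:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks a session already exists

# Read raw data in different formats and union the common columns
json_df = spark.read.json("/mnt/raw/events_json/")
parquet_df = spark.read.parquet("/mnt/raw/events_parquet/")
events = json_df.select("user_id", "event_type", "event_ts").unionByName(
    parquet_df.select("user_id", "event_type", "event_ts")
)

# Aggregate usage patterns and persist the result as a Delta table
usage = events.groupBy("user_id", "event_type").agg(F.count("*").alias("event_count"))
usage.write.format("delta").mode("overwrite").save("/mnt/curated/usage_patterns")

# Read it back with the Delta reader
spark.read.format("delta").load("/mnt/curated/usage_patterns").show(5)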

Good understanding of Spark Architecture, MPP Architecture, including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.

Experienced in implementing SAN TR migrations, including host-based and array-based migrations.

Hands on Experience in performing Host based online SAN migrations.

Experienced working as Cloud Administrator on Microsoft Azure, involved in configuring virtual machines, storage accounts, resource groups.

Experience with MS SQL Server Integration Services (SSIS), T-SQL skills, stored procedures, triggers.

Hands-on experience with Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, and relational data ingestion.

Leveraged relational databases with Tableau, Power BI, QuickSight to create effective visualizations.

Mastered development of applications and tools using Python and worked with several Python packages, including NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, SciPy, PyTables, etc.

Used traditional databases like MS SQL Server, MySQL, and Oracle to move data into Hadoop by using HDFS and performed Data Profiling and Data Analysis using SQL on different extracts.

Customized and branded Jira (server and cloud) and used Jira bash shell scripting. Documented the connection between Jira Align and Jira using align connectors.

Supported and managed NoSQL databases: installed, configured, administered, and supported multiple database environments.

Experience handling heterogeneous data sources, IBM DB2, and XML files using SSIS.

Experience in RDBMS database design, data warehousing, performance tuning, optimization, client requirement analysis, logical design, development, testing, deployment, and support.

Extremely attentive, strong team player with an ability to take on new roles.

Technical Expertise:

Big Data Tools: Spark, Hive, Hadoop, Airflow, Kafka

Programming Languages: Python, SQL, PL/SQL, Scala, Java, C++, HTML5, jQuery

Cloud Services: Azure Data Lake Storage, Azure Data Factory, Blob Storage, Azure SQL DB, Azure Data Explorer (ADX), Azure Databricks, Azure Event Hubs, Azure Key Vault, Google Cloud Storage, Pub/Sub, Composer, Dataproc

Databases/Data Warehouses: Google BigQuery, Azure Synapse Analytics, MySQL, SQL Server, Oracle, PL/SQL, Ab Initio, DataStage, Snowflake, Teradata, Informatica, Talend, Alteryx, Trifacta, SSIS

Visualization Tools: Power BI, Tableau

Version Control & Containerization: Azure DevOps, Git, Jenkins, Kubernetes, Docker

Operating Systems: Unix, Linux, Windows, macOS

Certification:

Microsoft Certified: Azure Fundamentals (AZ-900)

Microsoft Certified: Azure Data Engineer Associate (DP-203)

Professional Work Experience:

Client: Walgreens, Chicago, Illinois. Nov 2019 – Present

Role: Senior Azure Data Engineer

Responsibilities:

Designed and built automated Azure cloud services for processing and storing data on a daily basis in Azure Data Lake using Azure Databricks Spark, enabling the availability of 60 TB of structured data for data science projects.

Developed scalable, fault-tolerant pipelines from existing workflows into Azure Data Factory pipelines for ingesting large volumes of data.

Refactored complex Alteryx workflows into Azure Databricks notebooks with PySpark and Pandas, developed data quality rules and end-to-end transformation logic, created Azure Data Factory pipelines for data-flow orchestration, and implemented CI/CD pipelines with branching strategies using Azure DevOps.
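
A minimal sketch of the kind of PySpark data-quality rule such a refactored notebook might apply; the table paths, columns, and 1% threshold are assumptions for illustration only:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.format("delta").load("/mnt/bronze/orders")  # hypothetical input table

# Rule 1: required keys must not be null; Rule 2: order_amount must be non-negative
violations = orders.filter(
    F.col("order_id").isNull()
    | F.col("customer_id").isNull()
    | (F.col("order_amount") < 0)
)

total, bad = orders.count(), violations.count()
if total > 0 and bad / total > 0.01:  # fail the run if more than 1% of rows violate rules
    raise ValueError(f"Data quality check failed: {bad}/{total} rows violate rules")

# Keep only the clean rows and write them to the next layer
clean = orders.filter(
    F.col("order_id").isNotNull()
    & F.col("customer_id").isNotNull()
    & (F.col("order_amount") >= 0)
)
clean.write.format("delta").mode("overwrite").save("/mnt/silver/orders")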

Used Apache Kafka to integrate with Databricks clusters for streaming data.

Utilized Oracle Exadata to manage data migration from on-premises environments to cloud-based platforms.

Developed and executed Oracle data migration strategies to ensure a seamless transition to the cloud.

Collaborated with cross-functional teams to identify data migration requirements and ensure data integrity throughout the process.

Created efficient and effective PL/SQL scripts, including stored procedures and triggers, to support data migration and ongoing data management.

Configured and managed Azure NoSQL databases, such as Cosmos DB or MongoDB, for efficient data storage and retrieval.

Monitored and optimized NoSQL database performance and scalability.

Implemented data security and access control policies for NoSQL databases.

Collaborated with data engineers and database administrators to optimize SQL queries and improve database performance.

Worked with Azure services, such as Azure Data Factory, to design and implement data migration solutions from on-premises databases to Azure Cloud.

Created data pipelines and workflows for data integration and transformation.

Developed and maintained data migration scripts for seamless data transfer.

Used Azure Data Explorer to analyze ingested data in the pipeline and to generate graphical reports.

Created an Event Grid ingestion method in Azure Data Explorer (ADX) to ingest data from ADLS.

Updated retention policies on Azure Data Explorer staging tables.

Created Clusters in Azure Data Explorer.

Traced and diagnosed issues in Databricks using cluster logs, run logs, event logs, unit logs, cluster history logs, the cluster UI, and monitoring.

Interacted with stakeholders to gather requirements and prioritized tasks on the roadmap, using Epics and Stories in Agile practices such as sprint planning with Kanban dashboards.

Utilized Databricks to process and analyze data as it arrives, allowing for timely insights and actions.

Working on Azure Data Factory pipeline automation for various cases: handling pipeline failures, optimizing tables, introducing automation for data validation, and enhancing pipeline utility with custom triggers for stored procedures.

Managing the full lifecycle of on-premises-to-cloud data migration: using Azure Data Factory and T-SQL for transformation, loading data into Power BI fact and dimension tables with incremental refresh, and establishing pipelines with T-SQL logic and Azure Logic Apps to handle failures and send notifications.

Developed streaming applications using Apache Spark Streaming on Databricks for data analysis and monitoring.
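
A hedged Structured Streaming sketch of the kind of streaming job described (Kafka source, Delta sink), assuming the cluster has the Spark-Kafka connector available; the brokers, topic, and paths are placeholders:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read a stream of messages from a (placeholder) Kafka topic
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
    .option("subscribe", "telemetry")                    # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; cast to string and keep the event timestamp
events = raw.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_ts"),
)

# Continuously append the events to a Delta table with checkpointing
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/chk/telemetry")
    .outputMode("append")
    .start("/mnt/bronze/telemetry")
)
query.awaitTermination()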

Collaborated with DevOps engineers to develop automated CI/CD and test-driven development pipelines using Azure, per client requirements.

Developed automated pipelines to deploy 30+ KPIs for the Supply Chain Customer Service hub, such as fill rate and its associated metrics, including overall cuts in dollars and cases, in-yard/in-house cuts, on-time delivery, unit accuracy, and space utilization.

Automated Azure Data Factory pipeline for data validation, reducing manual effort and ensuring data accuracy with a 99.9% success rate.

Created interactive data visualizations and dashboards using tools like Databricks notebooks, Delta Lake, and integration with visualization libraries.

Optimized tables in the cloud to increase query performance resulting in faster access to critical business information for stakeholders.

Collaborated with cross-functional teams to implement custom triggers for stored procedures, improving pipeline utility and increasing efficiency of data processing.

Developed ELT pipelines on Apache Airflow using Python.

Designed and deployed data pipelines using Data Lake, Databricks, and Apache Airflow.
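
For illustration, a minimal Airflow 2.x-style DAG of the sort referenced above; the task callables and daily schedule are assumptions, not the actual pipeline:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # placeholder: pull data from a source system
    print("extracting")


def load_then_transform():
    # placeholder: load raw data, then run transformations in the warehouse
    print("loading and transforming")


with DAG(
    dag_id="elt_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load_then_transform", python_callable=load_then_transform)

    extract_task >> load_task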

Collaborated on ETL tasks, maintaining data integrity and verifying pipeline stability.

Deployed Data Factory to create data pipelines that orchestrate data into SQL Database.

Working on Snowflake modeling; highly proficient in data warehousing techniques such as data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.

Analytical approach to problem-solving; ability to use technology to solve business problems with Azure Data Factory, Data Lake, and Azure Synapse.

Developed Python scripts to grab data from external sources and ingest it into the Data Lake.
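
A simplified sketch of such an ingestion script, assuming the azure-storage-blob SDK against an ADLS Gen2 account (which also exposes the Blob API); the API URL, container, and connection string are placeholders:

import json
from datetime import datetime, timezone

import requests
from azure.storage.blob import BlobServiceClient

# Hypothetical external API and storage account; in practice the connection string
# would come from configuration or Key Vault, not a literal in the script.
API_URL = "https://api.example.com/v1/orders"
CONN_STR = "<storage-connection-string>"

def ingest_orders() -> None:
    # Pull the latest records from the external source
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    payload = response.json()

    # Land the raw payload in the data lake, partitioned by ingestion date
    blob_path = f"raw/orders/{datetime.now(timezone.utc):%Y/%m/%d}/orders.json"
    service = BlobServiceClient.from_connection_string(CONN_STR)
    blob = service.get_blob_client(container="datalake", blob=blob_path)
    blob.upload_blob(json.dumps(payload), overwrite=True)

if __name__ == "__main__":
    ingest_orders()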

Building data pipelines and applying data transformations for batch and real-time messaging systems.

Worked with large data sets (1M+ records).

Shared knowledge and trained team members on Databricks usage.

Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Environment: Azure Databricks, Data Lake (Gen2), Data Factory (v2), Snowflake, MS SQL, PL/SQL, Cosmos DB, MongoDB, Teradata, Oracle, PostgreSQL, Cassandra, Flume, Apache Flink, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Apache Beam, Apache Airflow, Docker, ADF Pipeline, Power BI, NoSQL.

Client: GEICO, Brooklyn, NY Jan 2018 – Oct 2019

Role: Azure Data Engineer

Responsibilities:

Migrated on-premises SQL Server databases to Azure and re-engineered stored procedure jobs in PySpark.

Worked on Azure Data Lake storage, Azure Data Factory for orchestration, and Azure Databricks, with Spark, Python, and SQL on various tasks.

Utilized Azure Cosmos DB and Azure SQL Database for low-cost storage and used Azure Databricks with Spark SQL for the validation process.

Used Databricks AutoML for automated model tuning and selection.

Implemented Azure Synapse Link to build Power BI reports as a cost-saving mechanism. Integrated Azure Monitor and Azure Log Analytics to monitor logs and resources.

Used Azure Data Factory to implement triggers and logic apps.

Wrote code in Databricks notebooks for data analysis and debugging issues.

Worked on migrating data from Blob storage to Snowflake by implementing Snowpipe.

Achieved 70% faster query run times utilizing the Snowflake search optimization service.

Optimized data storage costs by leveraging micro-partitions in Snowflake.

Created CI/CD pipelines using Azure DevOps to test and deploy solutions.

Used Databricks instance pools and auto-scaling to manage resources efficiently.

Used Azure Repos for version controlling and code reviews and Azure Boards for SDLC tracking.

Designed and developed Airflow DAGs in Python using different Airflow operators.

Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL.
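
A condensed sketch of the kind of load step such a pipeline would run through the Snowflake Python connector; the account, credentials, file path, and table names are placeholders:

import snowflake.connector

# Placeholder credentials; in practice these would come from a secrets store
conn = snowflake.connector.connect(
    account="xy12345",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Stage a local extract into the table stage, then COPY it into the target table
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS_RAW OVERWRITE = TRUE")
    cur.execute(
        "COPY INTO ORDERS_RAW FROM @%ORDERS_RAW "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
finally:
    conn.close()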

Collaborated with team members using Databricks Workspace to share notebooks, code, and insights.

Implemented Docker Swarm to deploy, load balance, scale, and manage Docker containers with multiple namespaced versions, and integrated cluster management with Docker Engine using Docker Swarm.

Developed an in-depth understanding of the terminologies, code sets, and standards of healthcare data.

Collaborated with data analysts, developers, and end users to ensure project requirements are met.

Built extraction and mapping rules for loading data from multiple sources for data warehouse based on MS Azure.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight.

Developed ADF pipelines to load data from on-premises sources to Azure cloud storage and databases.

Created pipelines, data flows, and complex data transformations and manipulations using ADF and SQL with Databricks.

Worked with JIRA to report on projects and to create subtasks for development, QA, and partner validation.

Worked with Azure Data Factory, Azure storage accounts, Data Lake Store, Data Lake Analytics, Azure Automation accounts, Azure services, Azure Databricks, SQL, Oracle, PostgreSQL, Cassandra, CouchDB, and MongoDB.

Used Azure Key Vault to store secrets and configured the ADF pipeline to retrieve connection string secrets from the Key Vault at runtime.
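
In ADF itself this is configured through a Key Vault-backed linked service; the hedged Python sketch below shows the equivalent programmatic lookup with the azure-keyvault-secrets SDK (the vault URL and secret name are placeholders):

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name
VAULT_URL = "https://my-keyvault.vault.azure.net"
SECRET_NAME = "sql-connection-string"

# DefaultAzureCredential picks up a managed identity, CLI login, or environment credentials
credential = DefaultAzureCredential()
client = SecretClient(vault_url=VAULT_URL, credential=credential)

connection_string = client.get_secret(SECRET_NAME).value
print("Retrieved secret of length", len(connection_string))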

Implemented containerized applications on Azure Kubernetes Service (AKS), including Kubernetes clusters responsible for cluster management, a virtual network to deploy agent nodes, an Ingress API gateway, MySQL databases and Cosmos DB for stateless storage of external data, and set up an Nginx reverse proxy in the cluster.

Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines to focus on development.

Used Jupyter notebooks to analyze and connect data from multiple sources.

Developed enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks using Spark Streaming, Apache Kafka, and Apache Flink.

Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).

Environment: Azure HDInsight, Databricks, Data Lake (Gen2), Data Factory (v2), Azure DevOps, MS SQL, Cosmos DB, MongoDB, Teradata, Ambari, Flume, Apache Flink, HDFS, MapReduce, Data Warehousing, Azure Synapse, Advanced SQL, YARN, Spark, Hive, Sqoop, Python, Scala, Git, JIRA, Jenkins, Apache Beam, Apache Airflow, SVN, Kubernetes, Docker, Ansible, Terraform, UDF, Snappy, LZO, ADF Pipeline, Power BI, NoSQL, Oracle, PostgreSQL, Cassandra.

Client: Optum, Hartford, CT. Mar 2016 – Dec 2017

Role: Data Engineer

Responsibilities:

Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from the source system to the destination system.

Advanced, extensible reporting skills using SQL Server Data Tools 2012 (SSRS).

Designed and created Report templates based on the financial data.

Developed various types of Complex reports like Drill Down and Drill through reports.

Resolved bugs in the pipelines and addressed queries from end users.

Involved in designing Parameterized Reports for generating Ad hoc reports as per client requirements.

Coordinated with the front-end team to implement logic in stored procedures and functions. Experienced in writing complex SQL queries, stored procedures, triggers, views, joins, constraints, DDL, DML, and user-defined functions to implement the business logic; created clustered and non-clustered indexes.

Ingested data from disparate data sources and created data views to be used in Tableau.

Developed an SCRA website.

Developed the Questionnaire, Error Response, Reports, and Admin modules for the SCRA website; developed a secure web application including a login form that authorizes and authenticates users by retrieving their login names and passwords from the database using ADO.NET.

Developed User Control according to the requirement.

Used ADO.NET connection-oriented and disconnected architecture for database connectivity; extensively used ADO.NET objects to populate DataSets, DataGrid, and Repeater controls for display and manipulation of records.

Used Tableau to create dashboards for management.

Created a simple, aesthetic, and consistent user interface with shortcuts, menus, forms, and controls in VB.NET.

Dealt with complex design and maintenance issues for SQL by implementing atomicity, consistency, isolation, and durability.

Used DataGrids, DataSets, DataViews, and DataAdapters to extract data from the backend. Worked extensively on web forms and data-binding controls like GridView, DataList, and drop-down boxes, and mapped page fields to database fields.

Wrote SQL queries to perform backend testing of the front-end application.

Used Web Forms and Server controls in ASP.Net.

Created Dynamic Controls on web pages.

Developed object interfaces for relational database access from the presentation layer.

Created session objects to maintain the session between ASP.NET Pages.

Successfully developed and deployed .NET applications following all coding standards.

Performed code reviews and assisted developers in optimization and troubleshooting.

Validated the complete functionality.

The system was deployed as an ASP.NET web application using VB.NET.

Environment: T-SQL, SQL Server 2012, Microsoft SQL Server Management Studio, SQL Server Integration Services (SSIS), SSRS, SSAS, VB.NET 4.5, and ASP.NET 4.5.

Client: Infosys, India Oct 2013 – Sep 2015

Role: Data Analyst

Responsibilities:

Part of the team that developed Dashboard Applications for the Global clients.

Collaborated with L3 support team to implement proactive measures that reduced production outages by 40%, ensuring consistent application availability.

Collected, cleansed, and provided modelling and analyses of structured and unstructured data used for business initiatives.

Streamlined daily batch file processing by optimizing the Spark ETL process and reducing data loading time by 30%.

Created customized user configuration files to improve ETL filtering accuracy, resulting in a 25% reduction in false positives.

Collaborated with clients to understand their business needs and translated them into actionable Tableau reports.

Created interactive dashboards in Tableau to monitor daily sales data, providing end-users with actionable insights.

Presented monthly Tableau reports to executives, highlighting key performance metrics and suggesting process improvements.

Achieved a 20% reduction in customer service calls received over two months by analyzing and identifying areas for improvement in the data.

Monitored daily jobs that pull data from multiple sources into the Data Warehouse, proactively troubleshooting job failures to ensure data accuracy and completeness.

Conducted ad-hoc data analysis to identify trends and opportunities for process improvements, delivering insights to stakeholders.

EDUCATIONAL QUALIFICATION:

Master’s degree in Information Technology from Wilmington University, USA.


