
Data Engineer

Location:
Toronto, ON, Canada
Posted:
September 25, 2023


Muneeb Adil

(Data Engineer)

Mobile number: +1-647-***-****

Email: adzxg9@r.postjobfree.com

LinkedIn: linkedin.com/in/muneebadil/

INTRODUCTION:

Hello! I am Muneeb Adil, an experienced Data Engineer with 5+ years architecting and deploying robust data solutions on cloud platforms such as AWS and Azure. Proficient in Python, SQL, and ETL processes, ensuring accurate data transformation for strategic decision-making. A dedicated collaborator committed to optimizing data-driven strategies and maintaining quality in dynamic environments.

PROFESSIONAL SUMMARY:

5+ years of experience in Unified Data Analytics with Databricks, including the Databricks Workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.

Good understanding of Spark architecture with Databricks and Structured Streaming.

Set up AWS and Microsoft Azure with Databricks, configured Databricks Workspace for business analytics, and managed clusters in Databricks.

Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, SQS.

Proficiency in multiple databases including MongoDB, MySQL, Oracle, and MS SQL Server.

Worked as the team's JIRA administrator, provisioning access, working assigned tickets, and teaming with project developers to assess product requirements, bugs, and improvements.

Hands-on experience with Test-Driven Development (TDD), Behavior-Driven Development (BDD), and Acceptance Test-Driven Development (ATDD) approaches.

Provided full life cycle support to logical/physical database design, schema management and deployment. Adept at database deployment phase with strict configuration management and controlled coordination with different teams.

Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.

Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy. Experienced in creating and running Docker images containing multiple microservices.

Experienced in building automated regression scripts in Python to validate ETL processes across multiple databases, including Oracle, SQL Server, Hive, and MongoDB.

Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.

TECHNICAL SKILLS:

Cloud Technologies

AWS, Azure, Google Cloud Platform (GCP)

IDEs

IntelliJ, Eclipse, Spyder, Jupyter.

Databases & Warehouses

Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase, NoSQL, MS Access, Teradata

Programming / Query Languages

Java, SQL, PL/SQL, Python, PySpark, NoSQL, Linux shell scripting, Scala

Data Engineer / Big Data Tools / Cloud / Visualization / Other Tools

Databricks, AWS, Azure Databricks, Azure Data Explorer, Salesforce, Linux, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports

Version Controllers

Git, SVN, Bitbucket

Methodologies

Agile, Scrum

WORK EXPERIENCE:

CPP Investments, Toronto, ON July 2022 - Present

Data Engineer

About:

CPP Investments is a renowned investment management organization dedicated to ensuring Canadians' financial security throughout retirement. My position as a member of the Data Engineering team is critical in architecting a strong data infrastructure, integrating diverse datasets, and ensuring data accuracy for well-informed investment decisions across asset classes. Our combined efforts enable CPP Investments to optimize strategies by leveraging data-driven insights to achieve our mission of ensuring Canadians' retirement well-being.

Responsibilities:

Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.

Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.

Valuable experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW).

Extracted, transformed, and loaded data from source systems to Azure Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics U-SQL.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL DB, Azure SQL DW) and processed the data in Azure Databricks.

Experience working on the Snowflake data warehouse.

Moved data from Azure Blob Storage to the Snowflake database.

Designed custom Spark REPL application to handle similar datasets.

Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.

Performed Hive test queries on local sample files and HDFS files.

Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
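
A minimal PySpark sketch of this micro-batching pattern, using the legacy DStream API with an illustrative socket source and word-count job (host, port, and batch interval are placeholders):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="micro_batch_demo")
    ssc = StreamingContext(sc, batchDuration=10)  # group the stream into 10-second micro-batches

    # Illustrative socket source; the production sources were different.
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # each micro-batch is handed to the Spark engine as a small batch job

    ssc.start()
    ssc.awaitTermination()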

Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, Hive, HBase, Spark, and Sqoop.

Built pipelines in Azure Data Factory to copy data from source to destination.

Created stored procedures and scheduled them in the Azure environment.

Experience in using SSIS tools like Import and Export Wizard, Package Installation, and SSIS Package Designer.

Experience in ETL processes involving migrations and synchronization between two databases.

Analyzed data profiling results and performed various transformations.

Hands-on experience creating reference tables using both the Informatica Analyst and Informatica Developer tools.

Wrote Python scripts to parse JSON documents and load the data into the database.
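
A simplified sketch of this kind of script; sqlite3 stands in here for the project databases, and the file, table, and field names are illustrative:

    import json
    import sqlite3  # placeholder for the actual target database

    # Parse a JSON document containing a list of records.
    with open("records.json") as fh:
        records = json.load(fh)

    # Load the parsed records into a staging table.
    conn = sqlite3.connect("staging.db")
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO records (id, name, amount) VALUES (?, ?, ?)",
        [(r["id"], r["name"], r.get("amount")) for r in records],
    )
    conn.commit()
    conn.close()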

Generated various graphical capacity planning reports using Python packages such as NumPy and matplotlib.
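
A small illustration of such a report; the metric and numbers are hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical daily storage growth over a month.
    days = np.arange(1, 31)
    usage_gb = 500 + np.cumsum(np.random.normal(50, 10, size=days.size))

    plt.plot(days, usage_gb, marker="o")
    plt.xlabel("Day of month")
    plt.ylabel("Storage used (GB)")
    plt.title("Capacity planning: storage growth")
    plt.savefig("capacity_report.png")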

Analyzed generated logs and forecast the next occurrence of events using various Python libraries.

Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.

Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
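
A minimal sketch of this pattern with the snowflake-connector-python package; the connection parameters, stage, and table names are placeholders:

    import snowflake.connector  # pip install snowflake-connector-python

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    with conn.cursor() as cur:
        # Load staged files into a staging table, then transform into a reporting table.
        cur.execute("COPY INTO staging.orders FROM @orders_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
        cur.execute("""
            INSERT INTO reporting.daily_orders
            SELECT order_date, COUNT(*) FROM staging.orders GROUP BY order_date
        """)
    conn.close()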

Environment: Azure, ADF, Azure Databricks, Snowflake, Linux, Oracle 11g, SQL, SQL Server, MySQL, SSIS

Trillium Health Partners, Mississauga, ON June 2021 - July 2022

Data Engineer

About:

Trillium Health Partners is a leading healthcare provider committed to providing outstanding patient care and advancing healthcare solutions. As a member of the Data Engineering team at Trillium Health Partners, I was responsible for architecting data solutions, coordinating with cross-functional teams for ensuring data accuracy, and optimizing data processing workflows. Our efforts facilitated seamless data integration and aided the hospital's mission to provide high-quality healthcare to our community.

Responsibilities:

Hands-on in developing a data platform from scratch; took part in the requirement gathering and analysis phase of the project, documenting the business requirements.

Worked closely with data scientists to understand data requirements for their experiments.

Migrated from a SAS application to PySpark.

Refactored the SAS code to PySpark SQL.
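
A simplified illustration of the refactoring pattern: a SAS PROC SQL-style summary rewritten as Spark SQL (the dataset path and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sas_to_pyspark").getOrCreate()

    # Hypothetical input; the real data came from hospital source systems.
    visits = spark.read.parquet("s3://example-bucket/visits/")
    visits.createOrReplaceTempView("visits")

    # Equivalent of a SAS PROC SQL GROUP BY summary, expressed as Spark SQL.
    summary = spark.sql("""
        SELECT department, COUNT(*) AS visit_count, AVG(length_of_stay) AS avg_los
        FROM visits
        GROUP BY department
    """)
    summary.show()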

Used AWS EMR for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, dumping the output Parquet files into S3 for the modelers.

Developed scripts to load data into Hive from HDFS and was involved in ingesting data into the data warehouse using various data loading techniques.

Wrote shell scripts to run the jobs in a Linux environment. Ingested data from RESTful APIs, databases, and CSV files.

Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
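
A condensed sketch of such a task; the sources, join key, and output path are illustrative:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("enrichment").getOrCreate()

    # Read from external sources (here a CSV file and a Parquet dataset as examples).
    events = spark.read.option("header", True).csv("s3://example-bucket/events.csv")
    patients = spark.read.parquet("s3://example-bucket/patients/")

    # Merge, enrich, and load into the target destination.
    enriched = (events.join(patients, on="patient_id", how="left")
                      .withColumn("event_date", F.to_date("event_ts")))
    enriched.write.mode("overwrite").parquet("s3://example-bucket/curated/events/")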

Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
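
A brief sketch of the cleaning and scaling steps in pandas/NumPy; the input file, median imputation, and min-max scaling are illustrative choices:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("raw_data.csv")  # placeholder input

    # Cleaning: drop duplicates and fill numeric gaps with the column median.
    df = df.drop_duplicates()
    num_cols = df.select_dtypes(include=np.number).columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Feature scaling: min-max scale the numeric columns.
    df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())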

Used the Django REST Framework and integrated new and existing API endpoints.

Worked with the Django ORM API to create tables, insert data, and access the database.

Used and customized an NGINX server to serve and verify the developed project.

Implemented Amazon EMR for big data processing on a Hadoop cluster of virtual servers using Amazon EC2 and S3.

Worked on building Docker images and running jobs on a Kubernetes cluster.

Extensive expertise using the core Spark APIs and processing data on an EMR cluster.

Created S3 buckets, managed bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
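
A minimal boto3 sketch of bucket creation and policy management; the bucket name, region, and policy are examples:

    import json
    import boto3

    s3 = boto3.client("s3", region_name="ca-central-1")  # illustrative region
    bucket = "example-backup-bucket"                      # hypothetical bucket name

    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": "ca-central-1"},
    )

    # Example policy that denies non-TLS access to the bucket.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))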

Environment: AWS EC2, AWS S3, AWS EMR, Agile methods, MySQL

ICICI Bank July 2018 - May 2021

Jr. Data Engineer

About:

ICICI Bank is a well-known financial organization that provides a wide range of services. Our responsibilities in the Data Engineering team at ICICI Bank included creating and maintaining reliable data solutions, working with other departments for seamless integration, and facilitating the provision of high-quality financial services to a range of customers.

Responsibilities:

Documented the complete process flow describing program development, logic, testing, implementation, application integration, and coding.

Recommended structural changes and enhancements to systems and databases.

Conducted Design reviews and technical reviews with other project stakeholders.

Was a part of the complete life cycle of the project from the requirements to the production support.

Created test plan documents for all back-end database modules.

Used MS Excel, MS Access, and SQL to write and run various queries.

Worked extensively on creating tables, views, and SQL queries in MS SQL Server.

Collaborated with internal architects and assisted in the development of current and target-state data architectures.

Coordinated with business users to design new reporting in an appropriate, effective, and efficient way, based on user needs and the existing functionality.

Wrote Python scripts to parse JSON documents and load the data into the database.

Generated various graphical capacity planning reports using Python packages such as NumPy and matplotlib.

Analyzed generated logs and forecast the next occurrence of events using various Python libraries.

Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.

Extensively performed large data reads/writes to and from CSV and Excel files using pandas.

Tasked with maintaining RDDs using Spark SQL.

Communicated and coordinated with other departments to collect business requirements.

Used Python APIs for extracting daily data from multiple vendors.

Worked on AWS Data Pipeline to configure data loads from S3 to Redshift.
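
AWS Data Pipeline itself is configured declaratively; as an illustration, the sketch below shows the kind of Redshift COPY statement such a load runs, issued here through psycopg2 (the cluster endpoint, table, bucket, and IAM role are placeholders):

    import psycopg2  # used only to issue the COPY against the Redshift cluster

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="loader", password="***",
    )
    with conn, conn.cursor() as cur:
        # Bulk-load CSV exports from S3 into a Redshift table.
        cur.execute("""
            COPY public.daily_positions
            FROM 's3://example-bucket/exports/daily_positions/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS CSV
            IGNOREHEADER 1;
        """)
    conn.close()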

Environment: SQL, SQL Server 2012, MS Office, MS Visio, Jupyter, R 3.1.2, Python, SSRS, SSIS, SSAS, Business Intelligence Development Studio

EDUCATION:

Osmania University, India

Bachelor of Engineering, Information Technology



