Prav Dhakal
**********@*****.*** +1-425-***-**** LinkedIn
PROFESSIONAL SUMMARY:
Results-driven Data Engineer with 7+ years of experience designing and optimizing scalable data pipelines, ETL workflows, and cloud-based big data solutions. Expertise in Python, Apache Spark (Scala, PySpark), Snowflake (data modeling, optimization, secure sharing), SQL, AWS, and Kafka, with a strong focus on real-time data streaming, data modeling, and performance optimization. Adept at building high-performance distributed systems, implementing dimensional modeling (Star and Snowflake schemas), and ensuring data integrity, quality, and governance. Proven ability to automate workflows, improve data processing efficiency, and deliver actionable business insights using Tableau, Power BI, and advanced SQL techniques. Experienced in cloud infrastructure management, leveraging AWS (EMR, S3, EC2, RDS, Lambda) and Azure for scalable, cost-effective data solutions, and in CI/CD automation and orchestration with Apache Airflow and Jenkins. A collaborative team player with a track record of driving data-driven decision-making and improving operational efficiency in Agile environments.
TECHNICAL SKILLS:
• Programming Languages: Python, SQL, Scala, Java
• Databases: Snowflake, AWS RDS, Teradata, Oracle, MySQL, PostgreSQL
• NoSQL Databases: MongoDB, Apache HBase, Apache Cassandra
• Cloud Platforms: AWS (EMR, S3, EC2, RDS, Lambda, DynamoDB, Athena, Step Functions), Microsoft Azure (ADF, Synapse, Key Vault, Databricks, Azure ML, Azure DevOps)
• Big Data & Processing: Apache Spark (Scala, PySpark), Hadoop, Hive, MapReduce, Kafka
• Querying & ETL Tools: SQL, Jenkins, Apache Airflow, dbt (Cloud & CLI)
• AI/ML Tools & Concepts: Scikit-learn, Azure ML, MLflow, Feature Engineering, Model Deployment, MLOps
• Reporting & Visualization: Tableau, Power BI, Amazon QuickSight
• Operating Systems: macOS, Linux, Windows
EXPERIENCE
Client: Verizon, Irving, TX Oct 2024 – Present
Role: Data Engineer
• Collaborated with multi-disciplinary teams, including IT, business, software engineers, and data scientists, to deliver innovative data engineering solutions within Data Works.
• Developed and managed ETL pipelines using Azure Databricks, leveraging PySpark and Scala for large-scale data processing.
• Obtained data from the system of record and established batch and real-time data feeds to deliver analysis in an automated fashion.
• Designed and maintained near-real-time data pipelines using Kafka, Airflow, and Delta Lake, enabling operational reporting and reducing end-to-end latency.
• Built automated data transformation and validation tools using Python and PySpark, improving data reliability and minimizing manual effort.
• Collaborated cross-functionally with software engineers, product owners, and data scientists to define requirements and develop end-to-end data solutions supporting business KPIs.
• Established data quality and observability standards, leveraging Airflow logging and alerting mechanisms to ensure pipeline reliability and SLA adherence.
• Integrated CI/CD practices into data workflows using Azure DevOps and Jenkins, improving deployment speed and reproducibility.
• Worked with Delta Lake for efficient storage, versioning, and schema evolution in Databricks.
• Designed and implemented a medallion architecture in Azure Databricks using Delta Lake, structuring data across Bronze (raw ingestion), Silver (cleansed and enriched), and Gold (business-ready) layers to support analytics, AI/ML, and reporting use cases (see the illustrative sketch at the end of this role).
• Designed and optimized Teradata SQL queries for high-performance data retrieval and reporting.
• Designed data pipelines for model training and inference using Databricks.
• Designed and implemented a configurable, Python-based data delivery pipeline for scheduled updates to customer-facing data stores.
• Automated advanced SQL queries and ETL routines with Apache Airflow, eliminating repetitive weekly administration tasks.
• Integrated machine learning model outputs into data pipelines to support predictive analytics for customer segmentation and sales forecasting.
• Managed data ingestion via Snowflake shares, tracking report attributes such as codes, state purchased, and delivery destinations, and performing manual code validation.
• Managed and tracked data engineering tasks and projects in JIRA, ensuring timely completion and resolution of issues through well-defined workflows and thorough documentation.
• Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Environment: Python, Scala, PySpark, Azure Databricks, Delta Lake, Azure Synapse Analytics, Azure Data Factory, Azure Key Vault, Azure ML, Teradata, Snowflake, Apache Airflow, SQL, Power BI, Tableau, JIRA, Agile
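Illustrative sketch (not the production pipeline): a minimal PySpark and Delta Lake medallion flow of the kind described above, assuming a Databricks workspace where the bronze, silver, and gold databases already exist; all table names, paths, and columns are hypothetical placeholders.

```python
# Minimal medallion-architecture sketch (PySpark + Delta Lake on Databricks).
# Table, database, path, and column names are illustrative placeholders only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion_sketch").getOrCreate()

# Bronze: ingest raw events as-is, preserving source fidelity.
bronze = spark.read.format("json").load("/mnt/raw/events/")
bronze.write.format("delta").mode("append").saveAsTable("bronze.events_raw")

# Silver: cleanse and enrich (deduplicate, cast types, basic validation).
silver = (spark.table("bronze.events_raw")
          .dropDuplicates(["event_id"])
          .withColumn("event_ts", F.to_timestamp("event_ts"))
          .filter(F.col("event_id").isNotNull()))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.events_clean")

# Gold: business-ready aggregates for reporting and ML feature inputs.
gold = (spark.table("silver.events_clean")
        .groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
        .agg(F.count("*").alias("event_count")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_customer_events")
```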
Client: 7-Eleven, TX Sep 2022 – Sep 2024
Role: Data Engineer
• Involved in the full Software Development Life Cycle (SDLC), including requirements analysis, technical design, data analysis, and deployment, ensuring efficient data workflows.
• Developed and optimized Spark jobs using Scala for data cleansing, transformation, and batch processing, reducing processing time by 30% and enhancing pipeline efficiency.
• Designed and implemented ETL workflows, performing data acquisition, transformation, cleansing, and storage in HDFS, ensuring data consistency and integrity.
• Built real-time event processing pipelines using Apache Storm and Kafka, enabling seamless data ingestion and integration across multiple servers.
• Configured Kafka consumers and S3 sink connectors using Kafka Connect on Amazon MSK to stream and persist event data in S3, enabling cost-effective storage, archiving, and downstream processing for analytics and ML use cases (see the configuration sketch at the end of this role).
• Designed and developed dimensional data models (Star and Snowflake Schemas) to optimize storage, querying, and reporting for analytics.
• Created interactive Tableau and Power BI dashboards, incorporating real-time operational metrics, scatter plots, and geographical maps for business decision-making.
• Automated data quality checks and monitoring using Python scripts and Apache Airflow, ensuring data integrity and timely issue resolution across ETL pipelines.
• Explored and integrated AI/ML capabilities to improve data insights, driving innovation in engineering and operational analytics.
• Configured and troubleshot AWS cloud services (EC2, S3, RDS, ELB, CloudWatch, IAM) for efficient server migration and cloud infrastructure management.
• Implemented SQL and PL/SQL stored procedures, optimizing database performance and query execution.
• Worked within Agile and Scrum methodology, participating in code reviews, bug fixing, and sprint planning to enhance development efficiency.
Environment: Scala, Spark, Kafka, Apache Storm, AWS (EC2, S3, Glue, RDS, CloudWatch, MSK, IAM, Athena, Step Functions), Snowflake, Redshift, HDFS, SQL, PL/SQL, Tableau, Power BI, Apache Airflow, Agile, Git
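Illustrative sketch (not the production configuration): registering an S3 sink connector of the kind described above through the Kafka Connect REST API, assuming the Confluent S3 sink connector plugin is installed on a Connect cluster attached to MSK; the endpoint, topic, bucket, and tuning values are hypothetical placeholders.

```python
# Illustrative sketch: register an S3 sink connector via the Kafka Connect REST API.
# Host, topic, bucket, and tuning values are placeholders, not production settings.
import requests

CONNECT_URL = "http://connect.example.internal:8083"  # hypothetical Connect worker endpoint

s3_sink_config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "store-events",                  # hypothetical topic name
    "s3.bucket.name": "example-event-archive", # hypothetical bucket
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",                      # records per S3 object
}

# PUT /connectors/{name}/config creates or updates the connector idempotently.
resp = requests.put(f"{CONNECT_URL}/connectors/s3-sink-store-events/config",
                    json=s3_sink_config, timeout=30)
resp.raise_for_status()
print(resp.json())
```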
Client: Southwest Airlines, TX May 2020 – Aug 2022
Role: Data Engineer
• Involved in requirements gathering, analysis, design, development, deployment, and change management for data engineering projects.
• Developed and optimized Python-based ETL pipelines using PySpark for data extraction, transformation, and aggregation across multiple sources (see the illustrative sketch at the end of this role).
• Designed and implemented Snowflake data models, schemas, and stored procedures to support analytical and reporting needs.
• Built real-time and batch data pipelines using AWS services (EC2, S3, RDS) for scalable data processing.
• Created and maintained Power BI and Tableau dashboards, transforming complex datasets into actionable insights for business teams.
• Developed Scala-based Spark applications for performing data cleansing, data aggregation, and maintaining data pipelines, ensuring efficient ingestion.
• Utilized dimensional modeling techniques (Star Schema, Snowflake Schema) for optimized data warehousing.
• Supported predictive analytics use cases by preparing and structuring data for machine learning models in collaboration with data science teams.
• Developed and maintained PySpark jobs for parallel processing, enhancing data processing efficiency.
• Followed Agile methodologies, actively participating in Scrum meetings, Sprint planning, and iterative reviews to ensure project alignment.
Environment: Python, PySpark, Scala, AWS (Glue, Lambda, EC2, S3, RDS), Snowflake, Redshift, SQL, Tableau, Power BI, Apache Spark, Dimensional Modeling, Agile
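Illustrative sketch (not the production job): a minimal PySpark ETL of the kind described above that extracts from two sources, joins and aggregates them, and loads partitioned results to S3; bucket names, paths, and columns are hypothetical placeholders.

```python
# Illustrative PySpark ETL sketch: extract from two sources, transform, aggregate, load to S3.
# Bucket names, paths, and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl_aggregation_sketch").getOrCreate()

bookings = spark.read.option("header", True).csv("s3://example-raw/bookings/")
customers = spark.read.parquet("s3://example-raw/customers/")

daily_revenue = (
    bookings
    .withColumn("booking_date", F.to_date("booking_ts"))
    .withColumn("fare_amount", F.col("fare_amount").cast("double"))
    .join(customers, "customer_id", "left")
    .groupBy("booking_date", "loyalty_tier")
    .agg(F.sum("fare_amount").alias("total_revenue"),
         F.countDistinct("customer_id").alias("unique_customers"))
)

# Partition by date so downstream BI queries can prune efficiently.
(daily_revenue.write.mode("overwrite")
 .partitionBy("booking_date")
 .parquet("s3://example-curated/daily_revenue/"))
```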
Client: Merck Group, PA Nov 2019 – Apr 2020
Role: Data Engineer
• Executed SQL queries for data profiling and analysis, supporting dimensional and relational data warehouses and ensuring data quality across various environments.
• Designed and implemented a fully automated data pipeline for report creation, QA consistency checks, and delivery to the customer’s SFTP server, with email notifications for data availability.
• Collaborated with business users to gather requirements and define data-driven solutions, ensuring alignment with business objectives.
• Performed data extraction, cleansing, transformation, and validation using SQL and Python to maintain high data quality.
• Developed Spark applications in Python (PySpark) on a distributed environment to load large volumes of CSV files with differing schemas into Hive ORC tables (see the illustrative sketch at the end of this role).
• Designed and developed interactive Tableau dashboards and Power BI reports, incorporating advanced visualizations and KPIs for business insights.
• Executed complex SQL queries for data analysis, reporting, and trend identification in relational and dimensional data warehouses.
Environment: SQL, Python, PySpark, Hive, ORC, Azure Synapse, Azure Data Factory, Azure DevOps, Azure Key Vault, Tableau, Power BI, Excel, Data Warehousing, ETL, Agile
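Illustrative sketch (not the production code): a minimal PySpark job of the kind described above that conforms CSV files with differing schemas to one target schema and appends them to a Hive ORC table; paths, database, table, and column names are hypothetical placeholders.

```python
# Illustrative sketch: load CSV files with differing schemas into a Hive ORC table.
# Paths, table names, and the target column list are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder.appName("csv_to_orc_sketch")
         .enableHiveSupport().getOrCreate())

TARGET_COLUMNS = ["record_id", "site_id", "measure", "value", "loaded_at"]

def conform(df):
    """Align a source DataFrame to the target schema: add missing
    columns as nulls and drop any extras."""
    for col in TARGET_COLUMNS:
        if col not in df.columns:
            df = df.withColumn(col, F.lit(None).cast("string"))
    return df.select(TARGET_COLUMNS)

paths = ["/data/incoming/site_a/", "/data/incoming/site_b/"]  # hypothetical drop zones
frames = [conform(spark.read.option("header", True).csv(p)) for p in paths]

combined = frames[0]
for df in frames[1:]:
    combined = combined.unionByName(df)

# Write as ORC into a Hive-managed table (assumes the target database exists).
(combined.withColumn("loaded_at", F.current_timestamp().cast("string"))
 .write.mode("append").format("orc").saveAsTable("clinical_staging.measurements"))
```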
Client: Western Union, WI Feb 2018 – Oct 2019
Role: Data Engineer
• Executed SQL queries for data profiling and analysis, supporting dimensional and relational data warehouses and ensuring data quality across various environments.
• Collaborated with business users to gather requirements and define data-driven solutions, ensuring alignment with business objectives.
• Developed Hadoop-based ETL pipelines using Pig and Hive, increasing data processing efficiency.
• Performed data extraction, cleansing, transformation, and validation using SQL and Python to maintain high data quality.
• Developed Scala scripts using Spark DataFrames, Spark SQL, Datasets, and RDD/MapReduce for data aggregation and queries, writing results back to the OLTP system through Sqoop.
• Managed NoSQL data processing workflows in Kafka and Spark Streaming, reducing processing time (see the streaming sketch at the end of this role).
Environment: Python, Scala, SQL, Azure Data Factory, Azure Synapse, Hadoop, Pig, Hive, Spark, MapReduce, Sqoop, Kafka, Tableau, Data Warehousing, ETL, Agile
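Illustrative sketch (not the production workflow): a minimal Spark Structured Streaming job of the kind described above that consumes Kafka events and lands them in HDFS, assuming the spark-sql-kafka package is available on the cluster; broker addresses, the topic, schema, and paths are hypothetical placeholders.

```python
# Minimal Spark Structured Streaming sketch: consume Kafka events and land them in HDFS.
# Requires the spark-sql-kafka connector; brokers, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("corridor", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical brokers
       .option("subscribe", "transactions")                             # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers values as bytes; parse the JSON payload into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("txn"))
          .select("txn.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/transactions/")
         .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
         .outputMode("append")
         .start())
query.awaitTermination()
```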