Shumet Nigatu
******.******@*****.***
Summary
Senior Data Engineer with over 8 years of experience designing and optimizing scalable data pipelines and data warehousing solutions. Expertise in Talend, Snowflake, and AWS infrastructure, with a record of delivering reliable, high-performance data solutions. Proven track record of mentoring teams, enhancing ELT processes, and driving data-driven decision-making.
Technical Skills
• Core Skills: data modeling, ETL processes, data warehousing, pipeline development, software development life cycle (SDLC), requirements analysis, data mapping, testing, customer support, collaboration, data ingestion, data transformation, storage, performance tuning, presentation, dimensional modeling, data engineering, SQL queries, views, triggers, stored procedures, generative AI, large language models (LLMs), LangChain, SQL performance tuning, data security
• Operating Systems: Windows, Sun Solaris, Linux (Ubuntu)
• Languages: PL/SQL, Python, Java, Unix Shell Scripting
• Big Data Technologies: Spark, Kafka, Hive
• Version Control: GitLab, GitHub, Azure DevOps
• Cloud Platforms: Microsoft Azure, AWS (EC2, RDS)
• Databases: Oracle, MS SQL, Teradata, PostgreSQL, Hadoop, Snowflake, MongoDB, NoSQL
• ETL Tools: Databricks, Azure Data Factory, AWS Glue, S3, SQS
• Data Visualization: Power BI, Tableau
• Data Modeling Tools: ERWIN, Data Vault
• Development Tools: CI/CD, Orchestration Tools
PROFESSIONAL EXPERIENCE
GEICO Insurance Mar 2020 - Present
Senior Data Engineer Chevy Chase, MD
• Designed and implemented end-to-end data pipelines using Azure Data Factory (ADF) to extract, transform, and load (ETL) data from multiple sources, demonstrating robust data engineering principles aligned with scalable architecture and Snowflake best practices.
• Integrated data from Amazon S3 buckets into Azure, ensuring seamless data ingestion and storage in Azure Blob Storage and Azure Data Lake while maintaining compliance with data governance standards.
• Implemented ADF mapping data flows for code-free data transformations, enhancing development efficiency and reducing time to delivery.
• Monitored and optimized ADF pipelines for performance and cost-efficiency, ensuring smooth data operations, adherence to SLAs, and optimal resource utilization.
• Leveraged ADF's integration with Azure Databricks to perform advanced data transformations and analytics on large datasets, utilizing Python and PySpark for scalable processing.
• Developed and maintained Azure Databricks notebooks using PySpark and Python to ingest, transform, and load large-scale datasets into Delta Lake, ensuring reliability and scalability.
• Utilized Delta Lake to enable ACID transactions, schema enforcement, and time travel capabilities, thereby improving data reliability and auditability.
• Implemented Unity Catalog to centralize data governance, enabling fine-grained access control and data lineage tracking across the organization.
• Developed and maintained automated ETL pipelines to load data from Azure ADLS into Snowflake, applying Snowflake best practices for warehouse sizing and performance tuning.
• Wrote and optimized SQL queries in Snowflake to create, update, and delete records, ensuring data accuracy and consistency.
• Designed and deployed automated Databricks workflows to process and transform streaming and batch data, ensuring timely availability for analytics.
• Developed and maintained data lakes and analytical platforms using Azure Databricks, emphasizing scalability, data security, and infrastructure as code (IaC) automation.
• Developed real-time data streaming applications using Azure Stream Analytics and Apache Kafka, enabling timely insights and responsive decision-making.
Bank of America Jan 2015 - Mar 2021
Data Engineer Charlotte, NC
• Built scalable and reusable ADF pipelines to ingest and process large volumes of structured and semi-structured data from diverse sources, including S3 buckets, SQL databases, and flat files, integrating AWS cloud resources where applicable.
• Improved Databricks job performance by implementing partitioning and bucketing strategies, reducing data skew and optimizing cluster resource usage.
• Integrated REST APIs to fetch and process external data sources, expanding the breadth of available data for analytics.
• Implemented incremental data loading strategies to optimize pipeline performance and reduce processing time.
• Participated in the design, analysis, implementation, testing, and support of ETL processes for staging, ODS, and Mart layers.
• Developed and maintained ADF pipelines to ingest and transform data from CSV files, JSON data, and SQL tables into Snowflake and ADLS, adhering to effective data modeling and schema design principles.
• Automated data ingestion workflows from Salesforce and other sources using ADF triggers, reducing manual intervention and improving operational efficiency.
• Configured ADF pipelines to handle complex data transformations, including data cleansing, aggregation, and enrichment, ensuring high-quality data for downstream analytics.
• Integrated Salesforce data into Azure, enabling seamless data ingestion and storage in Snowflake and ADLS.
• Utilized ADF's mapping data flows for intuitive, code-free transformations that enhanced development efficiency and reduced time-to-delivery.
• Built scalable and reusable ADF data flows to process large volumes of structured and semi-structured data from diverse sources, including Salesforce, SQL Server, and cloud storage.