
Data Warehouse Modeling

Location:
Charleston, IL
Salary:
$50/hr
Posted:
March 01, 2024


Resume:

Name: Laharika

Email: ad31mv@r.postjobfree.com

Phone No: 470-***-****

PROFESSIONAL SUMMARY:

5+ years of experience in the design, development, testing, deployment, and support of various data warehouse technologies including Dataiku, Informatica, DataStage, Oracle, MySQL, UNIX shell scripting, PL/SQL, Autosys, AWS S3, ADF, Python, SAS, and Snowflake. Extensive experience working in Agile methodology and with Git, Bitbucket, and JIRA.

Good understanding of ETL/Informatica standards and best practices, including Slowly Changing Dimensions (SCD Type 1 and Type 2).
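For illustration, a minimal sketch of an SCD Type 2 load, expressed here as Snowflake SQL run through the Python connector rather than as an Informatica mapping; the staging and dimension tables (stg_customer, dim_customer), the tracked columns, and the connection parameters are assumptions, not details from an actual engagement.

import snowflake.connector

# Illustrative SCD2 pattern: expire changed current rows, then insert new versions.
EXPIRE_CHANGED = """
UPDATE dim_customer d
SET    effective_end = CURRENT_TIMESTAMP(), is_current = FALSE
FROM   stg_customer s
WHERE  d.customer_id = s.customer_id
  AND  d.is_current = TRUE
  AND  (d.address <> s.address OR d.segment <> s.segment)
"""

INSERT_NEW_VERSIONS = """
INSERT INTO dim_customer (customer_id, address, segment, effective_start, effective_end, is_current)
SELECT s.customer_id, s.address, s.segment, CURRENT_TIMESTAMP(), NULL, TRUE
FROM   stg_customer s
LEFT JOIN dim_customer d
       ON d.customer_id = s.customer_id AND d.is_current = TRUE
WHERE  d.customer_id IS NULL   -- key is new, or its current row was just expired
"""

conn = snowflake.connector.connect(account="my_account", user="etl_user", password="***",
                                   warehouse="ETL_WH", database="EDW", schema="CORE")
try:
    cur = conn.cursor()
    cur.execute(EXPIRE_CHANGED)       # close out rows whose attributes changed
    cur.execute(INSERT_NEW_VERSIONS)  # add the new current versions
finally:
    conn.close()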

Strong knowledge of cloud data technologies, with a focus on Snowflake, Azure Data Warehouse, and AWS Redshift. Proficient in working with cloud storage solutions like Amazon S3 and Azure Blob Storage for efficient data storage and retrieval.

Expertise in sourcing data from relational database systems such as Oracle and MySQL.

Strong understanding of data warehouse concepts and ETL, with data modeling experience using normalization, business process analysis, re-engineering, dimensional data modeling, and physical and logical data modeling.

Optimized PySpark jobs, resulting in an 85% improvement in overall performance.

Utilized PySpark to process and analyze streaming data, contributing to real-time decision-making capabilities.
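As context for the streaming work, a hedged PySpark Structured Streaming sketch; the Kafka broker, topic name, event schema, and sink are placeholders rather than details of the actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("streaming_sketch").getOrCreate()

# Placeholder schema for the incoming JSON messages.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "orders")                      # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Running count of events per order, with late data bounded by a watermark.
counts = events.withWatermark("event_time", "10 minutes").groupBy("order_id").count()

query = (counts.writeStream
         .outputMode("update")
         .format("console")   # in practice this would be a Parquet or Snowflake sink
         .start())
query.awaitTermination()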

Worked with various data sources, including HDFS and external databases, ensuring seamless data integration.

Proven experience in designing and building Software Development Kits (SDKs) and Application Programming Interfaces (APIs).

Demonstrated ability to create robust and user-friendly SDKs/APIs for seamless integration with external systems. Familiar with distributed compute platforms, with particular expertise in Spark and Dask.

Proven ability to leverage distributed computing for scalable and parallel processing. Strong understanding of database system internals, including database design, architecture, and optimization strategies.

Experience in designing and implementing efficient database systems. Proficient in Snowpark framework for data integration and processing.

Hands-on experience with SQL, Stored Procedures, and Python for Snowflake utilities.

In-depth knowledge of Snowflake as a cloud-based data warehouse platform. Strong collaboration skills, working effectively with cross-functional teams.

Experienced in developing ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL.

Developed and implemented data processing pipelines using Snowpark, enabling complex data transformations within Snowflake, resulting in improved data processing efficiency.
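A small, hypothetical Snowpark example of the kind of in-Snowflake transformation described above; the connection parameters and tables (RAW.ORDERS, CORE.ORDER_DAILY) are assumptions.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Placeholder connection parameters.
params = {"account": "my_account", "user": "etl_user", "password": "***",
          "warehouse": "ETL_WH", "database": "EDW", "schema": "RAW"}
session = Session.builder.configs(params).create()

# Aggregate shipped orders by day; the work is pushed down into Snowflake.
daily = (session.table("RAW.ORDERS")
         .filter(col("STATUS") == "SHIPPED")
         .group_by(col("ORDER_DATE"))
         .agg(sum_(col("AMOUNT")).alias("TOTAL_AMOUNT")))

daily.write.mode("overwrite").save_as_table("CORE.ORDER_DAILY")
session.close()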

Designed and deployed real-time data ingestion pipelines using Snowpipe, ensuring continuous and automated loading of data into Snowflake from external sources.
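A hedged sketch of how such a Snowpipe can be defined through the Python connector; the pipe, stage, landing table, and credentials are illustrative only.

import snowflake.connector

# Single CREATE PIPE statement; AUTO_INGEST ties the pipe to S3 event
# notifications via the pipe's SQS notification channel.
CREATE_PIPE = """
CREATE PIPE IF NOT EXISTS raw.orders_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO raw.orders_landing
FROM @raw.s3_orders_stage
FILE_FORMAT = (TYPE = 'JSON')
"""

conn = snowflake.connector.connect(account="my_account", user="etl_user",
                                   password="***", role="SYSADMIN", warehouse="ETL_WH")
try:
    conn.cursor().execute(CREATE_PIPE)
finally:
    conn.close()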

Collaborated with data engineering team to optimize Snowflake usage, leveraging Snowpark and Snowpipe to streamline data workflows and enhance overall data pipeline performance.

Used ETL methodologies and best practices to create Talend ETL jobs. Followed and enhanced programming and naming standards.

Write highly tuned, performant SQL on various database platforms, including MPPs.

Develop highly scalable, fault tolerant, maintainable ETL data pipelines.

Good working experience in Hadoop, HDFS, MapReduce, Hive, Tez, Python, and PySpark.

Worked with the cloud architect to set up the environment; experienced in building Snowpipe.

Operationalize data ingestion, data transformation and data visualization for enterprise use.

Define architecture, best practices, and coding standards for the development team.

Provides expertise in all phases of the development lifecycle from concept and design to testing and operation.

Skilled in working with Hive, SparkSQL, Kafka, and Spark Streaming for ETL tasks and real-time data processing.

Interface with business customers, gathering requirements and delivering complete Data Engineering solution.

In-depth knowledge of the data warehousing life cycle and dimensional data modeling (star/snowflake schema), covering all types of dimensions and facts in a data warehouse/data mart, using Extraction, Transformation and Loading (ETL) with Informatica, DataStage, and Dataiku.

Experienced in migrating other databases to Snowflake and in building Snowpipe, Snowflake multi-cluster warehouses, the Snowflake cloud data warehouse, and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON and Parquet data into Snowflake tables.
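As an illustration of handling the nested JSON mentioned above, a hypothetical statement (kept here as a Python string and executed via the connector, as in the earlier sketches) that shreds a VARIANT column into a relational table; all object and column names are assumptions.

# Shred nested JSON from a VARIANT column (v) into a typed table.
SHRED_ORDER_ITEMS = """
INSERT INTO core.order_items (order_id, sku, quantity)
SELECT v:order_id::string,
       item.value:sku::string,
       item.value:qty::number
FROM   raw.orders_landing,
       LATERAL FLATTEN(input => v:items) item
"""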

Performance tuning and optimization of complex SQL in SnowSQL and MySQL.

My experience extends to understanding enterprise-wide data warehouses, operational data stores (ODS), data marts, data sources, data extraction, and data standardization. I am proficient in integrating data from various data sources (S3, DB2, MS SQL Server, Oracle 10g/11g/19c) into data staging areas and target data warehouses (with cloud databases like Redshift and Snowflake).

Developed RPDs and created analyses and dashboards in OBIEE 12c/11g.

Experienced in Agile (Scrum) methodology; participated in daily scrum meetings and was actively involved in sprint planning and product backlog creation.

Worked in every phase of the BI data warehouse life cycle, including analysis, design, implementation, testing, deployment, documentation, production support, training, and maintenance.

TECHNICAL SKILLS:

Data Integration Tools: Informatica PowerCenter, DataStage

AI/ML Tools: Dataiku

Databases: Oracle, MySQL, SQL Server, DB2

Programming Languages: PL/SQL, Python, R, SAS, XML, Java, UNIX shell scripting

Cloud Technologies: Snowflake, SnowSQL, Snowpark, Amazon Web Services (AWS), S3, SNS, Azure Data Factory

Data Modeling Tools: Toad Data Modeler, Erwin

Reporting: Power BI, Dataiku reporting, Excel reports

Tools: TOAD, SQL Developer, Autosys, Bitbucket, Confluence, Jira, PuTTY, WinSCP

Project Execution Methodologies: Agile

Operating Systems: UNIX, Windows

PROFESSIONAL EXPERIENCE:

Client: NorthHighland, Atlanta, GA Oct 2022 – Present

Role: Snowflake/ETL Developer

Responsibilities:

Member of the data integration team, responsible for designing and developing pipelines to build the data warehouse.

Responsible for integrating data from different systems of record (SORs) to build the data warehouse on Snowflake.

Implement data migration pipelines to move the on-premises data warehouse to Snowflake.

Build the data lake on Snowflake by sourcing data from different SORs, creating pipelines to bring data from S3 using Snowpipe and SNS.

Create Tasks, Streams, and Procedures to build dimensions and facts in the data warehouse.
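A sketch of the stream-plus-task pattern behind those dimension and fact loads; the object names, schedule, and merged columns are placeholders, and the statements would be run via SnowSQL.

# A stream captures changes on the landing table; a scheduled task merges
# them into the dimension only when the stream actually has data.
STREAM_AND_TASK = """
CREATE OR REPLACE STREAM core.customer_changes ON TABLE raw.customer_landing;

CREATE OR REPLACE TASK core.load_dim_customer
  WAREHOUSE = ETL_WH
  SCHEDULE  = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('core.customer_changes')
AS
MERGE INTO core.dim_customer d
USING core.customer_changes s
   ON d.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET d.address = s.address
WHEN NOT MATCHED THEN INSERT (customer_id, address) VALUES (s.customer_id, s.address);

ALTER TASK core.load_dim_customer RESUME;
"""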

Create Views to share data to downstream users

Employ ETL (Extract, Transform, Load) processes, particularly using DATAIKU, to seamlessly transfer and transform data between different systems.

Utilize Snowflake as the cloud data warehouse to efficiently ingest, store, and manage large volumes of data.

Demonstrated track record in the creation of Software Development Kits (SDKs) and Application Programming Interfaces (APIs), showcasing expertise in their design and construction. Proven ability to develop robust and user-friendly SDKs/APIs, ensuring smooth integration with external systems.

Proficient in distributed computing platforms, particularly skilled in Spark and DASK, and adept at utilizing distributed computing for scalable and parallel processing.

Comprehensive understanding of database system internals, encompassing database design, architecture, and optimization strategies. Hands-on experience in the design and implementation of efficient database systems.

Proficiency in utilizing the Snowpark framework for data integration and processing, coupled with practical expertise in SQL, Stored Procedures, and Python for Snowflake utilities.

In-depth knowledge of Snowflake as a cloud-based data warehouse platform, with a focus on leveraging its capabilities. Strong collaboration skills, enabling effective teamwork across diverse functions to deliver successful SDKs and APIs. Clear and concise communication of technical concepts to both technical and non-technical stakeholders, fostering understanding and collaboration within the team and beyond.

Create pipelines for data ingestion tasks to pull source data to create data lake in Snowflake

Workflow Automation: Creating custom workflow components in Dataiku to automate data pipelines, model training, and deployment processes, ensuring efficient and consistent data processing and model updates.

Furthermore, I have utilized Dataiku's visualization capabilities to present data insights and findings in a visually compelling manner. By leveraging the recipe components, I have developed streamlined workflows, automated data processing tasks, and implemented data governance and security measures to ensure compliance and data integrity.

Performing thorough data profiling to lead database design, ETL design, and BI solutions.

Handing over conceptual models to data modelers based on data profiling and analysis and working closely to convert them to physical models.

Worked closely with the analytics team to develop high-quality data pipelines in Snowflake.

Provided reliable, structured data to meet different business needs.

Responsible for integrating data from different Systems of Record (SORs) and provisioning data to downstream users in the bank and to federal regulators.

Implemented required changes; involved in the migration of the Oracle data warehouse and in loading data from S3 files into Snowflake tables.

Worked on SnowSQL and Snowpipe; created Snowpipes for continuous data loading.

Created internal and external stages and transformed data during load; redesigned views in Snowflake to increase performance.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
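A minimal Airflow DAG sketch of that daily automation; the DAG id, schedule, and shell script paths are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_snowflake_load",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",   # once a day at 02:00
    catchup=False,
) as dag:
    # Each task wraps one of the existing shell scripts (placeholder paths).
    extract = BashOperator(task_id="extract_to_s3", bash_command="bash /opt/etl/extract_to_s3.sh")
    load = BashOperator(task_id="load_snowflake", bash_command="bash /opt/etl/load_snowflake.sh")
    extract >> load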

Implemented a one-time data migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
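A hedged sketch of such a one-time SQL Server to Snowflake copy using pandas as the bridge; the ODBC connection string, source table, target table, and chunk size are assumptions.

import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder connections for source (SQL Server) and target (Snowflake).
src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=sqlsrv01;DATABASE=Sales;Trusted_Connection=yes;")
tgt = snowflake.connector.connect(account="my_account", user="etl_user", password="***",
                                  warehouse="ETL_WH", database="EDW", schema="STAGE")

# Stream the source table in chunks so large tables do not have to fit in memory.
for chunk in pd.read_sql("SELECT * FROM dbo.StateSales", src, chunksize=100_000):
    write_pandas(tgt, chunk, table_name="STATE_SALES", auto_create_table=True)

src.close()
tgt.close()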

Experience working with relational databases (RDBMS) such as Snowflake, MySQL, PostgreSQL, and SQLite, and with the NoSQL database MongoDB.

Responsible for data cleaning, feature scaling, and feature engineering using NumPy and pandas in Python.

Did extensive ETL testing, including data completeness, data transformation, and data quality checks for the various data feeds coming from source systems.

Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.

Utilize Snowflake to write reusable functions, optimizing data processing and storage efficiency.

Ensure compliance with data privacy and security regulations, including data encryption, access controls, and data masking techniques.

Utilize Python programming to perform complex logical operations and build insightful reports and visualizations.

Provide expertise and guidance to clients and internal teams regarding data management best practices, data governance, and compliance.

Environment: Snowflake, SAS, Snowpark, AWS (S3), Dataiku, Python, SQL, Oracle, UNIX, Git, Bitbucket, Jira, Informatica, Power BI, Data Warehouse.

Client: Neiman Marcus, Dallas, TX May 2021 – Aug 2022

Role: Snowflake Developer

Responsibilities:

Designed and implemented AWS solutions using various services such as EC2, S3, Lambda, and CloudFormation.

Developed and maintained automation scripts using Python and AWS CLI to automate infrastructure deployment and configuration.

Conducted security audits and implemented security best practices such as IAM roles, encryption, and access control.

Worked with cross-functional teams to identify business requirements and design technical solutions.

Provided support for production environments and resolved issues related to AWS infrastructure and applications.

Developed Python scripts to back up EBS volumes using AWS Lambda and CloudWatch.
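An illustrative boto3 Lambda handler for that EBS backup job; the Backup tag convention is an assumption, and the scheduling would come from a CloudWatch Events/EventBridge rule.

import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # Snapshot every EBS volume tagged Backup=true (placeholder convention).
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Volumes"]

    for vol in volumes:
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description=f"Automated backup of {vol['VolumeId']}",
        )
        print(f"Created snapshot {snap['SnapshotId']} for {vol['VolumeId']}")

    return {"snapshots_created": len(volumes)}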

Created roles and access-level privileges and handled Snowflake admin activities.

Implemented data pipelines and workflows to automate data integration and processing.

Conducted performance tuning and query optimization to improve query performance and reduce query times.

Ensured data security by implementing role-based access control, encryption, and auditing.

Collaborated with cross-functional teams to understand business requirements and translate them into technical solutions.

Provided production support and troubleshooting for Snowflake data warehouse and ETL.

Involved in creating new stored procedures and optimizing existing queries and stored procedures.

Created scripts for system administration and AWS using languages such as Bash and Python.

Created DAX queries to generate computed columns and published reports and dashboards using Power BI.

Designed and developed Snowflake data models, schemas, tables, views, and stored procedures to support business requirements.

Developed ETL processes using Matillion and Informatica to extract, transform, and load data from various sources into Snowflake.

Created reports in Looker based on Snowflake connections and validated Looker reports against the Redshift database.

Worked on replication and data mapping of ODS tables to Guidewire Claim Center type lists and entities.

Consulted on Snowflake data platform solution architecture, design, development, and deployment, focused on bringing a data-driven culture across the enterprise.

Evaluated Snowflake design considerations for any change in the application and built the logical and physical data models for Snowflake as per the required changes.

Worked on the Initial Data Load (IDL) strategy and production checkouts to ensure the reliability of loaded data, and provided on-call production support across all subject areas based on incidents/tickets in CA Service Desk.

Validated data from SQL Server to Snowflake to ensure source-to-target consistency.

Drove the replacement of other data platform technologies with Snowflake at the lowest TCO, with no compromise on performance, quality, or scalability.

Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
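A sketch of the partitioned external-table pattern, expressed here through SparkSQL with Hive support; the HDFS location, columns, and staging table are placeholders, and bucketing is omitted for brevity.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive_ddl_sketch").enableHiveSupport().getOrCreate()

# External table over an existing HDFS location, partitioned by date.
spark.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
    order_id STRING,
    amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
LOCATION 'hdfs:///data/warehouse/sales_ext'
""")

# Dynamic partitioning routes rows to partitions based on the order_date value.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
INSERT INTO TABLE sales_ext PARTITION (order_date)
SELECT order_id, amount, order_date FROM staging_sales
""")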

Implemented Apache Pig scripts to load data into Hive.

Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation, using Jenkins along with Python and shell scripts to automate jobs.

Created pre-commit hooks in Python/shell/Bash to validate a JIRA ID pattern when committing code in SVN, to limit file size and file type, and to restrict what the development team can check in at commit time.
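A hypothetical Python commit-message check of the kind described; the JIRA key regex and error message are assumptions, and the same logic can be wired into an SVN pre-commit script.

import re
import sys

JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")   # e.g. DATA-1234 (placeholder project key style)

def main(msg_file: str) -> int:
    with open(msg_file, encoding="utf-8") as f:
        message = f.read()
    if not JIRA_KEY.search(message):
        print("Commit rejected: message must reference a JIRA issue key (e.g. DATA-1234).")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))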

Integrated Puppet with Apache and developed load testing and monitoring suites in Python.

Developed microservice onboarding tools leveraging Python and Jenkins, allowing for easy creation and maintenance of build jobs and Kubernetes deployments and services.

Environment: Snowflake, Redshift, SQL Server, AWS, Azure, Python, ETL, Talend, SVN, Git, Jenkins, SQL.

Client: Nextbrain Technologies Pvt ltd, India April 2020 – April 2021

Role: Snowflake Developer

Responsibilities:

Created data pipelines for ingestion, aggregation, and loading of consumer response data from an AWS S3 bucket into Hive external tables.

Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from an S3 bucket.
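A minimal sketch of that API Gateway to Lambda to DynamoDB path; the table name and payload shape are placeholders.

import json
import boto3

# Placeholder DynamoDB table backing the API.
table = boto3.resource("dynamodb").Table("consumer_responses")

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    # Persist the API payload keyed by response_id (assumed partition key).
    table.put_item(Item={"response_id": body["response_id"], "payload": body})
    return {"statusCode": 200, "body": json.dumps({"status": "stored"})}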

Developed workflows in SSIS to automate the tasks of loading data into HDFS and processing it using Hive.

Experience in Python development and scientific programming, using NumPy and pandas in Python for data manipulation.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.

Implemented a one-time data migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.

Experienced in performing CRUD operations using the HBase Java client API and the Solr API.

Created conceptual, logical, and physical models for OLTP, data warehouse Data Vault, and data mart star/snowflake schema implementations.

Worked with Terraform templates to automate Azure IaaS virtual machines using Terraform modules and deployed virtual machine scale sets in the production environment.

Experience working with relational databases (RDBMS) such as Snowflake, MySQL, PostgreSQL, and SQLite, and with the NoSQL database MongoDB.

Responsible for data cleaning, feature scaling, and feature engineering using NumPy and pandas in Python.

Working experience with Kimball Methodology & Data Vault Modeling.

Did extensive ETL testing, including data completeness, data transformation, and data quality checks for the various data feeds coming from source systems.

Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.

Implemented microservices, application development, and migration using AWS/Azure services such as Azure DevOps, Azure Kubernetes Service (AKS), Container Registry, Cosmos DB, Grafana, Azure Pipelines, Azure Monitor, RBAC, AWS EKS, and the Kubernetes API to run workloads.

Set up full CI/CD pipelines so that each commit a developer makes goes through the standard software lifecycle process and is tested thoroughly before it reaches production.

Validated the Hadoop data load process using HiveQL queries.

Worked on migrating data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).

Created a metadata validation plan and guided the team in creating scripts to run and compare two different environments.

Environment: Snowflake, Matillion, ETL, SQL Server 2017, Snowpark, Hadoop, Hive, Jenkins, Kubernetes, dbt, Python, Azure, AWS.

Client: Accolite Software India Pvt Ltd Jan 2019 – March 2020

Role: ETL Developer

Responsibilities:

Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.

Worked on Informatica Power Center tools- Designer, Repository Manager, Workflow Manager, and Workflow Monitor.

Responsible for Business Analysis and Requirements Collection.

Parsed high-level design specifications to simple ETL coding and mapping standards.

Used various transformations like Source Qualifier, Filter, Aggregator, Lookup, Sorter, Expression, Normalizer, Router, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.

Created mapping documents to outline data flow from sources to targets.

Tuned the Informatica mappings for optimal load performance.

Identified sources, targets, mappings, and session bottlenecks and tuned them to improve performance.

Created SQL queries for validating the target data against source data.

Extracted data from different sources such as Oracle, flat files, and SQL Server databases into the staging area and then into the fact tables.

Extensively involved in writing queries for testing the target data against source data.

Supported during QA/UAT/PROD deployments and bug fixes.

Developed mappings to load into staging tables and then to Dimensions and Facts.

Modified existing mappings for enhancements of new business requirements.

Created Materialized Views using the Dimension and Fact tables.

Involved in code Reviews as per ETL/Informatica standards and best practices.

Followed Informatica recommendations, methodologies and best practices.

Environment: Informatica PowerCenter/Developer 9.6.1(HF2), SQL Server Management Studio, UNIX, Oracle 11g, Oracle 12c, SQL, PL/SQL, SQL developer, SQL SERVER, Linux, Microsoft Visio Standard 2012.


