
Data Analyst Engineer

Location:
Columbus, OH
Posted:
December 15, 2022

Rahila

Email: adt2qj@r.postjobfree.com

LinkedIn: linkedin.com/in/rahila-r-91606324a

Phone: 234-***-****

Data Engineer

PROFILE SUMMARY

As a data engineer, I have 8+ years of experience in analysis, design, development, and implementation.

Expert in delivering ETL solutions across a variety of business models.

Extensive working knowledge of big data technologies such as HDFS, MapReduce, YARN, Apache Cassandra, NoSQL, Spark, Python, Scala, Sqoop, HBase, Hive, Oozie, Impala, Pig.

Developed Spark applications using Spark-SQL and PySpark in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
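
For illustration, a minimal PySpark sketch of this kind of multi-format extraction and aggregation (paths, column names, and the usage-pattern grouping are hypothetical, not drawn from the actual project):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Hypothetical sources; the same approach applies to JSON, CSV, or Parquet inputs.
events_json = spark.read.json("/mnt/raw/usage_events/")
events_csv = spark.read.option("header", True).csv("/mnt/raw/legacy_usage/")

# Normalize both sources to a common schema before combining them.
events = events_json.select("customer_id", "feature", "event_ts").unionByName(
    events_csv.select("customer_id", "feature", "event_ts")
)

# Aggregate to surface customer usage patterns.
usage_patterns = (
    events.withColumn("event_date", F.to_date("event_ts"))
          .groupBy("customer_id", "feature")
          .agg(F.count("*").alias("events"),
               F.countDistinct("event_date").alias("active_days"))
)

usage_patterns.write.mode("overwrite").parquet("/mnt/curated/usage_patterns/")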

Developed a variety of read/write strategies and worked on migrating Cassandra and the Hadoop cluster to AWS Glue.

Knowledge of scripting, data pipeline construction, and Unix/Linux systems.

Extensive expertise with Amazon Web Services such as Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB.

Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
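
As a sketch of that serverless pattern, the Lambda handler below triggers a Glue crawler so new S3 data lands in the Glue Data Catalog and becomes queryable from Athena; the crawler name and trigger are hypothetical, and a real pipeline might instead register tables or partitions directly.

import boto3

glue = boto3.client("glue")

# Hypothetical crawler; it writes table definitions to the Glue Data Catalog,
# which makes the underlying S3 data queryable from Athena.
CRAWLER_NAME = "usage-events-crawler"

def lambda_handler(event, context):
    """Invoked (for example, by an S3 put event) to refresh the Glue Data Catalog."""
    glue.start_crawler(Name=CRAWLER_NAME)
    return {"status": "crawler started", "crawler": CRAWLER_NAME}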

Performed analytics and cloud migration from on-premises systems to AWS Cloud with AWS EMR, S3, and DynamoDB.

Involved in data modeling and dimensional modeling with 3NF, Star and Snowflake schemas for OLAP and ODS (Operational data store) applications.

Expertise in data warehouse ETL skills with Informatica 9.1/8.6.1/8.5/8.1/7.1 PowerCenter Client tools: Mapping Designer, Repository Manager, and Workflow Manager/Monitor and Server tools: Informatica Server, Repository Server.

Built real-time streaming pipelines utilizing Kafka, Spark Streaming, and Redshift.

Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and Lambda.
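
A minimal sketch of such a Python-defined Airflow DAG (DAG id, task names, and callables are placeholders; the import path matches the Airflow 1.x versions listed below):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

def extract(**kwargs):
    pass  # placeholder extract step

def load(**kwargs):
    pass  # placeholder load step

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="daily_etl",  # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
)

extract_task = PythonOperator(task_id="extract", python_callable=extract,
                              provide_context=True, dag=dag)
load_task = PythonOperator(task_id="load", python_callable=load,
                           provide_context=True, dag=dag)

extract_task >> load_task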

Orchestration experience using Azure Data Factory and Airflow 1.8/1.10 on multiple cloud platforms, with an understanding of how to leverage Airflow operations.

Implemented BI Data Warehouse standards at various SDLC levels: designed and customized data models for a data warehouse that supports real-time data from a variety of sources.

Experienced with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Big Data Technologies (Apache Spark), and Databricks.

Excellent understanding of technologies on systems that include huge amounts of data and run in a highly distributed fashion in Cloudera, Hortonworks Hadoop distributions, and Amazon AWS.

Extensive experience developing and implementing cloud architecture on Microsoft Azure.

Experience in creating and managing reporting and analytics infrastructure for internal business clients using AWS services including Athena, Redshift, Redshift Spectrum, EMR, and QuickSight.

Possesses strong SQL and programming knowledge to deal with data transformations/translations.

In-depth knowledge of relational data modeling, dimensional data modeling, and design; extensive experience in data analysis using SQL and MS Excel.

Experience with data quality testing; validating source and target databases using data mapping/transformation rules.

Proficient in SQL Server and T-SQL (DDL and DML) in constructing tables and normalization techniques on database tables.

Developing complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML, and user-defined functions to implement the business logic.

Worked on GUI-based ETL processes using Teradata Studio Express, Snowflake, and Oracle SQL.

Experience in DW, Business Intelligence, and ETL, with hands-on experience in Teradata and SnowSQL scripting.

Utilize data visualization tools like Tableau and business intelligence tools like Business Objects.

Extensive expertise in scheduling and process automation scripting using Python and shell.

Involved in various projects related to data modeling and system/data analysis; led design and development for both OLTP and Data Warehousing (OLAP) environments.

Knowledge of working on the SSIS platform to create ETL packages to validate, extract, transform, and load data into data warehouse/data mart databases.

Knowledge of how to import and export data from RDBMS to HDFS and Hive with Sqoop.

Knowledge of SSRS, SSIS, and T-SQL on Microsoft SQL Server.

Worked on production issues in data warehouses such as reloading data, transformations, and translations.

Led operations that helped increase revenue, reduce costs, reduce risk, increase quality, increase efficiency, and solve complex business problems.

TECHNICAL SUMMARY

Database Programming: T-SQL, Dynamic SQL, MDX, DAX, SnowSQL

Development Tools: BIDS, SSDT, SSMS

Integration Tool: SSIS

Analysis Services: SSAS, OLAP Cubes, Tabular Model

Reporting Tools: SSRS, Power BI, Excel Power BI, Jira

Source Control & Collaboration: Team Foundation Server (TFS), SharePoint

Cloud Experience: Azure, AWS

Database: SQL Server 2017, 2016, 2014, 2012, Azure SQL

SDLC: Agile, Scrum, Waterfall, and Spiral

Data Modelling: Erwin, MS Visio

Snowflake: SnowSQL, Snowpipe, Zero-copy Cloning, Time Travel, Stages, RM

Programming: Python, PySpark, Scala, Spark, Perl

PROFESSIONAL EXPERIENCE

Client: Cummins, Columbus, Indiana January 2021 to Present

Data Engineer

Identified the business requirements by going over the business process and interacting with the business analyst and management.

Developing and maintaining end-to-end operations of the ETL data pipeline and working with large data sets in Azure Data Factory.

Building data pipelines using Azure services such as Data Factory to load data from legacy SQL Server systems into Azure Data Warehouse using Data Factory and Databricks notebooks.

Consumed data from Kafka into Databricks to perform various transformations and manipulations with the data using Python and PySpark scripts.

Developed various data ingestion frameworks to load data using SSIS (SQL Server Integration Services) into Azure SQL Server.

Designed, built, and implemented analytical and enterprise applications using machine learning, Python, and Java.

Directed full lifecycle project management for projects with a special focus on data management, data warehousing, and business intelligence solutions.

Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.

Queried Snowflake using SQL and performed data analysis in Snowflake by creating tables and views.

Design and development of various Azure Data Lake solutions using Blob Storage, ADF (Azure Data Factory), and Azure SQL Database.

Worked on developing Kafka producers and Kafka consumers capable of streaming millions of events per second.
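
For illustration, a minimal producer/consumer sketch using the kafka-python client (broker address, topic, consumer group, and payload are hypothetical; the actual work used whichever client library the project standardized on):

import json

from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]  # placeholder broker list
TOPIC = "events"              # placeholder topic

# Producer: serialize events as JSON and publish them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"event_id": 1, "type": "page_view"})
producer.flush()

# Consumer: read events back and hand them to downstream transformations.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="etl-consumers",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # replace with the real transformation logic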

Developed data mapping, data governance, transformation, and cleansing rules involving OLTP and OLAP.

Resolved multiple data governance issues to support data consistency at the enterprise level.

Responsible for technical data governance, data modeling, and database design.

Developed complex end-to-end Azure Data Factory pipelines to read data from Blob Storage and load it into Azure SQL Data Warehouse.

Created Azure SQL databases and performed monitoring and restoring of Azure SQL databases. Performed migration of Microsoft SQL Server to Azure SQL Database.

Worked on creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing, loading data into Snowflake for analytical processing.

Designed and developed components using Python to retrieve and manipulate data.

Led data analysis for existing data warehouse and changed the internal schema for performance.

Used Erwin tool for dimensional modeling of the staging database as well as the relational data warehouse.

Designed critical staging procedures to load data marts and report databases with multiple features.

Led development and automation of data reconciliation T-SQL queries and comparison against business reports to validate the overall data migration.

Implemented ETL pipelines within and outside of the data warehouse using Python and Snowflake's SnowSQL.
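
A minimal sketch of that kind of Python-driven load using the Snowflake Python connector (connection values, stage, and table names are placeholders, and the SQL is illustrative rather than the project's actual SnowSQL):

import snowflake.connector

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="...",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # Load staged files into a staging table.
    cur.execute(
        "COPY INTO STAGING.USAGE_EVENTS FROM @RAW_STAGE/usage/ "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    # Simple in-warehouse transformation into a reporting table.
    cur.execute("""
        INSERT INTO REPORTING.DAILY_USAGE
        SELECT customer_id, TO_DATE(event_ts) AS event_date, COUNT(*) AS events
        FROM STAGING.USAGE_EVENTS
        GROUP BY customer_id, TO_DATE(event_ts)
    """)
finally:
    cur.close()
    conn.close()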

Fine-tuned the performance of SQL queries using Windows Performance Monitor and SQL Profiler in SQL Server.

Designed complex T-SQL queries, and user-defined functions; stored procedures and triggers followed by thorough analysis and testing of those database objects before deployment to the production server.

Worked on developing and designing data integration solutions using Informatica PowerCenter and Teradata utilities for handling large volumes of data.

Development of scripts using Python for loading, extracting, and transforming data.

Client: Macy's, New York, NY September 2018 to December 2020

BI Data Engineer

Development of scripts using Python for loading, extracting, and transforming data.

Extensively used AWS Athena to import structured data from S3 into other systems such as Redshift or to generate reports.

Worked with Spark to improve the speed and optimization of Hadoop's current algorithms.

Migrated an existing on-premises application to AWS. AWS services such as EC2 and S3 were used for data set processing and storage. Experienced in maintaining a Hadoop cluster on AWS EMR.

Created a Data Pipeline utilizing Processor Groups and numerous processors in Apache Nifi for Flat File, RDBMS as part of a Proof of Concept (POC) on Amazon EC2.

Developed data pipeline using Sqoop, HQL, Spark and Kafka to ingest Enterprise message delivery data into HDFS.

Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and Lambda.

The Spark-Streaming APIs were utilized to perform on-the-fly transformations and actions for the common learner data model, which gets data from Kinesis in near real-time.

Performed end-to-end architecture and implementation evaluations of different AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis. Created external Hive table schemas for the data being processed, with Hive as the primary query engine on EMR.

Created Apache Presto and Apache Drill configurations on an AWS EMR (Elastic MapReduce) cluster to integrate different databases such as MySQL and Hive. This allowed for the comparison of outcomes such as joins and inserts on many data sources controlled by a single platform.

Created AWS RDS (Relational Database Service) instances to act as a Hive metastore so that metadata from 20 EMR clusters could be integrated into a single RDS instance, preventing data loss even if an EMR cluster was terminated.

Developed and implemented ETL pipelines on S3 parquet files in a data lake using AWS Glue.
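
As a sketch of such a Glue ETL job (the catalog database, table, and S3 path are hypothetical), a typical PySpark-based Glue job script looks roughly like this:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read parquet data from the data lake through the Glue Data Catalog.
source = glueContext.create_dynamic_frame.from_catalog(
    database="datalake", table_name="raw_events"
)

# Example transformation: drop fields that should not reach the curated zone.
cleaned = source.drop_fields(["debug_payload"])

# Write the result back to S3 as parquet.
glueContext.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/events/"},
    format="parquet",
)

job.commit()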

Developed a CloudFormation template in JSON format to utilize content delivery with cross-region replication using Amazon Virtual Private Cloud.

Used an AWS CodeCommit repository to preserve programming logic and scripts, which were subsequently replicated to new clusters.

Implemented Columnar Data Storage, Advanced Compression, and Massive Parallel Processing using the Multi-node Redshift technology.

Involved in the development of the new AWS Fargate API, which is comparable to the ECS run task API.

Worked on the code transfer of a quality monitoring application from AWS EC2 to AWS Lambda, as well as the construction of logical datasets to administer quality monitoring on Snowflake warehouses.

Proficient with container systems such as Docker and container orchestration tools such as EC2 Container Service, Kubernetes, and Terraform. Worked on creating workloads HDFS on Kubernetes clusters to mimic the production workload for development purposes.

Client: Molina Healthcare, Irving, TX February 2017 to September 2018

Business Data Analyst / Power BI Developer

Conducted design reviews with the Business Analysts and Content Developers to create a proof of concept for the reports.

Developed distributed high-performance systems with Spark and Scala.

Created Scala apps for loading/streaming data into NoSQL databases (MongoDB) and HDFS.

Performed T-SQL tuning and query optimization for SSIS packages.

Used the Maestro tool to load data into SQL Server by creating cost jobs.

Developed distributed algorithms for detecting and successfully processing data trends.

Created an SSIS package to import data from SQL tables into various Excel sheets.

Used Spark SQL to pre-process, clean, and combine big data sets.

Developed the SQL server database system to optimize performance.

Performed migration of databases from conventional data warehouses to Spark clusters.

Performed frequent cleaning and integrity tests to ensure that the data warehouse was loaded only with high-quality entries.

Designed ETL packages dealing with different data sources (SQL Server, Flat Files) and loaded the data into target data sources by performing different kinds of transformations using SSIS.

Modified scripts to handle automated loading, extraction, and transformation (ETL) of data using SSIS.

Responsible for coding SSIS processes to import data into the Data Warehouse from Excel spreadsheets, Flat Files and OLEDB Sources.

Configured Azure Encryption for Azure Storage and Virtual Machines, Azure Key Vault services to protect and secure the data for cloud applications.

Worked on various tasks and transformations like Execute SQL Task, Execute Package Task, and Conditional split, Script Component, Merge and Lookup while loading the data into Destination.

Ingested data into Azure Blob storage and processed the data using Databricks. Involved in writing Spark Scala scripts and UDFs to perform transformations on large datasets.

Implemented SSIS packages for use in a SQL Agent job on on-premises SQL Server, connecting to Azure SQL Database using an encrypted connection.

Performed Incremental load with several Dataflow tasks and Control Flow Tasks using SSIS.

Developed SQL queries to extract data from existing sources and validate format correctness.

Created automated tools and dashboards for collecting and displaying dynamic data.

Power BI Based Patient Registration Monitoring Application.

Created a Power BI data model based on analysis of the end-user workflow data provided by the client.

Imported data from SQL Server DB, Azure SQL DB to Power BI to generate reports.

Developed analysis reports and visualization using DAX functions like table function, aggregation function, and iteration functions.

Produced custom visualizations for the client to monitor all patient registration-related workflows, giving their supervisor team access to monitor departments with high error rates.

Presented the application to the hospital's executive team and put the application into a production system for them to use.

SQL & Tableau Projects

Designed, developed, tested, and maintained Tableau functional reports based on client requirements.

Extracted data from SQL Server database tables into flat data files and Excel sheets using table export and BCP for easy data migration.

Involved in performance tuning using Activity Monitor, Performance Monitor, SQL Profiler, SQL Query Analyzer, and Index tuning wizards.

Used COSMOS streams to load data into databases using Maestro and generated reports from that data based on the stakeholders' requirements.

Implemented batch processing using Jobs and DTS.

Involved in Mock testing, Unit Testing, and Integration testing in SSIS, SSAS.

Developed advanced SQL queries with multi-table joins, group functions, subqueries, set operations, and T-SQL stored procedures and user-defined functions (UDFs) for data analysis.

Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.

Designed the user interface screens and implemented application logic under the Microsoft .NET framework using C# to use .NET features powered with CLR.

Client: Careator Technologies Pvt Ltd, Hyderabad, India May 2014 to November 2016

SQL Database Developer

Conducted design reviews with the Business Analysts and Content Developers to create a proof of concept for the reports.

Designed & maintained databases using Python; troubleshot, fixed, and deployed Python bug fixes successfully.

Transformed logical model based on business requirements and defined various constraints using Erwin.

Analyzing the current Teradata usage and designing the integration plan.

Implemented dynamic SQL to develop customizable queries, answerable by the OLTP Server.

Optimized queries by modifying T-SQL queries and eliminated redundant data.

Involved in Migrating Objects from Teradata to Snowflake.

Maintained SQL Server Security, creation of logins and users, configured permissions, and assigned roles to users.

Worked on programming, installing, configuring, managing, monitoring, and troubleshooting SQL Server.

Clarified business requirements pertaining to database applications from JRD sessions with Business Users.

Developed stored procedures, functions, and database triggers and maintained referential integrity, and implemented complex business logic.

Made technical changes to existing BI systems to enhance their functionality.

Wrote transformation scripts in Java to facilitate the migration in an effective way, not compromising data quality.

Participated in the requirement analysis, database design, and modeling.

Analyzed, designed, and implemented data marts, data warehouses, and operational databases.

Supported database applications by writing T-SQL scripts such as stored procedures/user-defined functions.

Changed the business logic and implemented it in the application using SQL Server programming techniques.

Identified the causes of SQL job failures and reported them to management and the respective development teams.

Resolved rendering issues with PDF and Excel rendering formats for SSRS reports.

Education: Bachelor’s

JNTUH - 2014 - Computer Science


