
Data Engineer Analyst

Location:
Charlotte, NC
Salary:
$75 per hour
Posted:
February 21, 2023

NISHITHA

Email: advgx0@r.postjobfree.com | Phone: 443-***-****

Sr. Data Engineer

Professional Summary

Over 8 years of IT experience in Data Engineering and Data Analysis, with high proficiency in developing Data Warehouse and Business Intelligence solutions.

Extensive experience in Relational and Dimensional Data modeling for creating Logical and Physical Design of Database and ER Diagrams using multiple data modeling tools like Erwin and ER Studio.

Experienced in performing structural modifications using MapReduce, analyzing data using Hive, and visualizing results in Tableau dashboards.

Experienced in Data Management solutions covering DWH/data architecture design, data governance implementation, and Big Data.

Experienced working with Big Data technologies including Hadoop, Spark, PySpark, Hive, HDFS, and NoSQL platforms.

Experienced in all phases of the software development life cycle (SDLC), from requirements definition through implementation, and in supporting models through the transformation and analysis phases.

Experienced in dimensional data modeling, including star schema modeling, fact and dimension tables, and both logical and physical data modeling.

Experienced with data conversion, data quality, data profiling, performance tuning, system testing, and implementing RDBMS features.

Experienced with business process modeling, process flow modeling, and data flow modeling.

Expertise in implementing security models for dashboards at the row, object, role, and dashboard levels.

Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.

Excellent experience in Extract, Transform, and Load (ETL) processes using tools such as DataStage, Informatica, Data Integrator, and SSIS for data migration and data warehousing projects.

Experienced in the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.

Expertise in Data Analysis using SQL on Oracle, MS SQL Server, DB2, Teradata, and Netezza.

Good knowledge of Normalization and De-normalization techniques for optimum performance in relational and dimensional database environments.

Strong experience working with conceptual, logical, and physical data modeling in line with metadata standards.

Experience working with both Agile and Waterfall methodologies.

Experience with both the Ralph Kimball and Bill Inmon data warehousing approaches.

Expertise in UML (class diagrams, object diagrams, use case diagrams, state diagrams, sequence diagrams, activity diagrams, and collaboration diagrams) as a business analysis methodology for application functionality designs using Rational Rose and MS-Visio.

Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments.

Facilitated data requirement meetings with business and technical stakeholders and resolved conflicts to drive decisions.

Experience in data transformation, data mapping from source to target database schemas, and data cleansing.

Experience in performance analysis, including creating partitions, indexes, and aggregate tables where necessary.

Experience with DBA tasks involving database creation, performance tuning, creation of indexes, creating and modifying table spaces for optimization purposes.

Performed extensive Data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.

Very good exposure to ETL tools such as Informatica.

Excellent communication skills, self-starter with ability to work with minimal guidance.

TECHNICAL KNOWLEDGE:

Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, AWS, Kafka, PySpark, Airflow, Pig, Hive, Flume, YARN, Oozie

Scripting Languages: HTML5, CSS3, C, C++, XML, SAS, MATLAB, Scala, Python, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Development Tools: Microsoft SQL Studio, Azure Databricks, Eclipse, NetBeans

Public Cloud: EC2, S3, Auto Scaling, CloudWatch, EMR, Redshift

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Power BI, Tableau

Databases: Microsoft SQL Server, MySQL, Oracle, Teradata, Netezza

ETL Tools: Informatica

Operating Systems: Windows (all versions), UNIX, Linux, macOS

Professional Experience

Sr. Data Engineer

Macy's, New York, NY February 2022 to Present

Project Description:

As a Senior Data Engineer at Macy's, responsibilities include designing and deploying scalable, highly available, and fault-tolerant systems on Azure; leading estimation, reviewing estimates, and identifying complexities; and defining business objectives comprehensively through discussions with stakeholders. The role covers migrating on-premises environments to the cloud using MS Azure, performing data ingestion of incoming web feeds into the Data Lake store, designing the business requirement collection approach, and migrating data warehouses to Snowflake. It also involves developing and maintaining data pipelines, writing Spark applications for data validation, cleansing, transformations, and custom aggregations, creating data pipelines to migrate data from Azure Blob Storage to Snowflake, designing and generating dashboards and reports using various Power BI visualizations, developing purging scripts and routines, and maintaining data storage in Azure Data Lake.

Responsibilities

Designed and deployed scalable, highly available, and fault tolerant systems on Azure.

Involved in the complete SDLC of the big data project, including requirement analysis, design, coding, testing, and production.

Led estimation, reviewed estimates, identified complexities, and communicated them to all stakeholders.

Defined business objectives comprehensively through discussions with business stakeholders and functional analysts and by participating in requirement collection sessions.

Migrated the on-premises environment to the cloud using MS Azure.

Performed data ingestion of incoming web feeds, containing both structured and unstructured data, into the Data Lake store.

Designed the business requirement collection approach based on the project scope and SDLC (Agile) methodology.

Migrated data warehouses to Snowflake Data warehouse.

Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.

Installed and configured Hadoop Ecosystem components.

Defined virtual warehouse sizing for Snowflake for different types of workloads.

Extensively used Agile methodology, with daily scrums to discuss project-related information.

Worked on data ingestion from multiple sources into Azure SQL Data Warehouse.

Transformed and loaded data into Azure SQL Database.

Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations (see the sketch at the end of this section).

Developed Hive scripts to transfer data to and from HDFS.

Implemented Hadoop based data warehouses, integrated Hadoop with Enterprise Data Warehouse systems.

Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.

Developed and maintained data pipelines on the Azure analytics platform using Azure Databricks.

Created Airflow scheduling scripts (DAGs) in Python (a minimal example follows the tools list below).

Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.

Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.

Created Data Pipeline to migrate data from Azure Blob Storage to Snowflake.

Worked on Snowflake modeling; highly proficient in data warehousing techniques including data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.

Maintained a NoSQL database to handle unstructured data; cleaned the data by removing invalid records, unifying formats, and restructuring it for loading in subsequent steps.

Participated in database maintenance with Azure SQL DB.

Worked with Kafka and built use cases relevant to our environment.

Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.

Optimized and updated UML Models (Visio) and Relational Data Models for various applications.

Wrote Python scripts to parse XML documents and load the data into the database.

Wrote DDL and DML statements for creating and altering tables and for converting character data into numeric values.

Translated business concepts into XML vocabularies by designing XML Schemas with UML.

Worked on data loads using Azure Data Factory with an external table approach.

Automated recurring reports using SQL and Python and visualized them on BI platforms such as Power BI.

Designed and generated dashboards and reports using various Power BI visualizations.

Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools.

Developed purging scripts and routines to purge data on Azure SQL Server and Azure Blob storage.

Developed Python scripts for automation and component unit testing using the Azure emulator.

Involved in writing and optimizing T-SQL queries in SQL Server.

Maintained data storage in Azure Data Lake.
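
For illustration only, below is a minimal PySpark sketch of the kind of validation, cleansing, and custom-aggregation work described above (see the Spark bullet earlier in this list). The feed layout, column names, and data lake paths are hypothetical placeholders, not the actual Macy's pipeline.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("web_feed_cleansing").getOrCreate()

# Read a raw web feed landed in the data lake (container and path are assumptions).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/web_feeds/")

# Validation and cleansing: drop rows missing the business key, normalize types,
# and de-duplicate on that key.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .dropDuplicates(["order_id"])
)

# Custom aggregation: daily sales totals by store.
daily = (
    clean.groupBy(F.to_date("order_ts").alias("order_date"), "store_id")
         .agg(F.sum("amount").alias("total_sales"),
              F.countDistinct("order_id").alias("order_count"))
)

# Write the curated output back to the lake for downstream loads (e.g. into Snowflake).
daily.write.mode("overwrite").parquet("abfss://curated@examplelake.dfs.core.windows.net/daily_sales/")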

Tools: Hadoop 3.0, Hive, Zookeeper, Erwin 9.8, SQL, PL/SQL, Agile, Snowflake, Azure Data Lake, Azure Data factory, MDM, XML, Azure Databricks, T-SQL.
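
Similarly, a minimal Airflow DAG sketch of the kind of Python scheduling script mentioned above; the DAG id, schedule, and the commands it triggers are hypothetical placeholders, assuming Airflow 2.x.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def validate_feed(**context):
    # Placeholder validation step; the real pipeline ran Spark-based checks.
    print("validating web feed for", context["ds"])

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="web_feed_daily_load",       # hypothetical
    start_date=datetime(2022, 2, 1),
    schedule_interval="0 2 * * *",      # daily at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    validate = PythonOperator(task_id="validate_feed", python_callable=validate_feed)
    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="python /opt/jobs/load_to_snowflake.py --date {{ ds }}",  # hypothetical script
    )
    validate >> load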

Sr. Data Engineer

Truist Bank, Charlotte, NC November 2019 to January 2022

Project Description:

As a Data Engineer for Truist, I reviewed business requirements and created data mapping documents, used Agile methods for daily scrum, and worked on Big Data initiatives and engagements. I developed data pipelines with Amazon AWS and Hadoop, designed data flows, and normalized data. I also worked on disaster recovery and backup on Cassandra Data, created HBase tables, and used SQL and Python queries in Snowflake. Additionally, I created data visualizations using Python and Tableau, wrote and executed unit and system scripts, and performed data cleaning and manipulation.

Responsibilities

Heavily involved in the Data Engineer role, reviewing business requirements and composing source-to-target data mapping documents.

Extensively used Agile methodology, with daily scrums to discuss project-related information.

Provided a summary of the project's goals, the specific expectations of business users from BI, and how they align with the project goals.

Suggested implementing multitasking for the existing Hive architecture in Hadoop and also suggested UI customizations in Hadoop.

Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in HDFS.

Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.

Responsible for setting up a 5-node development cluster for a proof of concept, which was later implemented as a full-scale project by Fortune Brands.

Architected and designed the data flow for the consolidation of 4 legacy data warehouses into an AWS data lake.

Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.

Connected to AWS Redshift through Tableau to extract live data for real time analysis.

Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.

Used AWS S3 buckets to store files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines (see the load sketch at the end of this section).

Worked on complex SnowSQL and Python queries in Snowflake.

Configured the above jobs in Airflow.

Resolved AML-related issues to ensure adoption of standards and guidelines in the organization; worked with users and the testing team on day-to-day issues and fraud-incident-related tickets.

Used Erwin tool to develop a Conceptual Model based on business requirements analysis.

Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.

Loaded real-time data from various data sources into HDFS using Kafka.

Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).

Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.

Worked on configuring and managing disaster recovery and backup on Cassandra Data.

Implemented Kafka producers with custom partitions, configured brokers, and implemented high-level consumers to build the data platform.

Designed and developed an entire change data capture (CDC) module in Python and deployed it on AWS Glue using the PySpark library (a minimal sketch follows the tools list below).

Involved in writing APIs for AWS Lambda to manage some of the AWS services.

Updated Python scripts to match training data with our database stored in AWS Cloud.

Normalized the database based on the newly developed model, bringing the data warehouse tables into 3NF.

Ran the entire Big Data workload on AWS environments.

Involved in extensive Data validation by writing several complex SQL queries.

Performed data cleaning and data manipulation activities using NZSQL utility.

Designed the data marts in Erwin using Ralph Kimball's dimensional data mart modeling methodology.

Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.

Worked with the MDM systems team on technical aspects and report generation.

Extracted large volumes of data from AWS Redshift and the Elasticsearch engine using SQL queries to create reports.

Worked on data governance, data quality, and data lineage establishment processes.

Executed change management processes surrounding new releases of SAS functionality.

Worked in importing and cleansing of data from various sources.

Performed data cleaning, feature scaling, and feature engineering using Python packages.

Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.

Created various types of data visualizations using Python and Tableau.

Wrote and executed unit, system, integration, and UAT scripts in a data warehouse project.
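
As an illustration of the S3-to-Snowflake loads described above, here is a minimal sketch that runs a COPY INTO statement through the Snowflake Python connector. The account, stage, table, and file-format names are hypothetical, and the continuous loads themselves were handled by Snowpipe rather than a script like this.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical
    user="LOAD_USER",            # hypothetical
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # An external stage pointing at the S3 bucket is assumed to exist, e.g.
    # CREATE STAGE weblog_stage URL='s3://example-bucket/weblogs/' ...
    cur.execute("""
        COPY INTO STAGING.WEBLOG_EVENTS
        FROM @weblog_stage
        FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())        # per-file load results
finally:
    conn.close()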

Tools: Erwin 9.7, Agile, OLTP, OLAP, Snowflake, SnowSQL, AWS, EC2, MDM, SAS, SQL, PL/SQL.
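
Below is a minimal PySpark sketch of a hash-compare change data capture (CDC) step of the kind described in the CDC bullet above. Paths, the business key, and the tracked columns are hypothetical; the actual module ran as an AWS Glue job, which wraps this sort of Spark logic in a GlueContext.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdc_example").getOrCreate()

key = "customer_id"
tracked_cols = ["name", "email", "address"]

prev = spark.read.parquet("s3://example-bucket/curated/customers/")   # prior snapshot
curr = spark.read.parquet("s3://example-bucket/incoming/customers/")  # new extract

def with_hash(df):
    # Hash the tracked columns so changed rows can be detected with one comparison.
    return df.withColumn("row_hash", F.sha2(F.concat_ws("||", *tracked_cols), 256))

prev_h = with_hash(prev).select(key, "row_hash")
curr_h = with_hash(curr)

joined = curr_h.join(prev_h.withColumnRenamed("row_hash", "prev_hash"), on=key, how="left")

inserts = joined.filter(F.col("prev_hash").isNull())
updates = joined.filter(F.col("prev_hash").isNotNull() &
                        (F.col("row_hash") != F.col("prev_hash")))

# Deletes: keys present in the prior snapshot but missing from the new extract.
deletes = prev_h.join(curr_h.select(key), on=key, how="left_anti")

inserts.drop("prev_hash").write.mode("overwrite").parquet("s3://example-bucket/cdc/inserts/")
updates.drop("prev_hash").write.mode("overwrite").parquet("s3://example-bucket/cdc/updates/")
deletes.write.mode("overwrite").parquet("s3://example-bucket/cdc/deletes/")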

Sr. Data Modeler

Ditech, Fort Washington, PA March 2017 to October 2019

Responsibilities

Worked as a Sr. Data Modeler generating data models using Erwin and developing relational database systems.

Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.

Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.

Developed Big Data solutions focused on pattern matching and predictive modeling.

Worked on analyzing Hadoop stack and different big data analytic tools.

Extensively used Agile methodology, the organization standard, to implement the data models.

Worked in an environment of Amazon Web Services (AWS) provisioning and used AWS services.

Created semantically rich logical data models (non-relational/NoSQL) that define the Business data requirements.

Converted conceptual models into logical models with detailed descriptions of entities and dimensions for Enterprise Data Warehouse.

Involved in creating Pipelines and Datasets to load the data onto Data Warehouse.

Coordinated with data architects to design Big Data and Hadoop projects and to provide idea-driven design input.

Created database objects in AWS Redshift and followed AWS best practices to convert data types from Oracle to Redshift.

Worked on NoSQL databases such as Cassandra.

Identified data organized into logical groupings and domains, independent of any application or system.

Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.

Created S3 buckets in the AWS environment to store files, some of which were used to serve static content for a web application (see the sketch at the end of this section).

Developed Model aggregation layers and specific star schemas as subject areas within a logical and physical model.

Identified fact and dimension tables and established the grain of the fact for dimensional models.

Configured inbound/outbound rules in AWS security groups according to the requirements.

Established measures to chart progress related to the completeness and quality of metadata for enterprise information.

Developed the data dictionary for various projects, providing standard data definitions for data analytics.

Managed storage in AWS using Elastic Block Store and S3; created volumes and configured snapshots.

Generated the DDL of the target data model and attached it to the Jira to be deployed in different Environments.

Conducted data modeling JAD sessions and communicated data-related standards.

Environment: Erwin 9.7, NoSQL, Sqoop, Cassandra 3.11, AWS, Hadoop 3.0, SQL, Pl/SQL
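
For illustration, a minimal boto3 sketch of the S3 work described above (bucket creation and file upload). The bucket name, key, and region are hypothetical, and IAM permissions plus any public-access or CloudFront setup for static content are assumed to be handled separately.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

bucket = "example-static-content-bucket"   # hypothetical

# Create the bucket (us-east-1 does not take a CreateBucketConfiguration).
s3.create_bucket(Bucket=bucket)

# Upload a local file; serving it as static web content would also need
# public-access or CloudFront configuration, which is omitted here.
s3.upload_file("index.html", bucket, "site/index.html")

# List what landed in the bucket.
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"], obj["Size"])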

Data Modeler

Cybage Software Private Limited, Hyderabad, India August 2015 to December 2016

Responsibilities

Responsible for data modeling design delivery, data model development, review, approval, and data warehouse implementation.

Extensively used Agile methodology, the organization standard, to implement the data models.

Provided a consultative approach with business users, asking questions to understand the business need and deriving the data flow, logical, and physical data models based on those needs.

Extracted large volumes of data from Amazon Redshift and the Elasticsearch engine using SQL queries to create reports.

Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop (see the sketch at the end of this section).

Created physical and logical data models from the conceptual model and converted them into the physical database with DDLs using forward engineering options in Erwin.

Used Erwin Model Mart for effective model management, sharing, dividing, and reusing model information and designs to improve productivity.

Used the Model Manager option in Erwin to synchronize the data models in the Model Mart approach.

Designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.

Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.

Worked on implementing and executing enterprise data governance and data quality framework.

Completed enhancements for MDM (Master Data Management) and suggested the implementation of a hybrid MDM approach.

Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.

Created T-SQL queries as per the business requirements.

Stored and loaded data from HDFS to AWS S3 for backup, and created tables in the AWS cluster with S3 storage.

Tools: Erwin 9.6, Oracle 11g, HDFS, Hive, Sqoop, DDL, MDM, MapReduce, AWS, Redshift, ODS, OLTP, UNIX, Agile.
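
The Oracle-to-HDFS/Hive imports above were done with Sqoop, which is a command-line tool rather than a Python library. As a comparable illustration in Python, here is a minimal PySpark sketch that reads an Oracle table over JDBC and writes it to a Hive table; the host, service, schema, and table names are hypothetical and the Oracle JDBC driver jar is assumed to be on the Spark classpath.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oracle_to_hive")
    .enableHiveSupport()
    .getOrCreate()
)

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")  # hypothetical
    .option("dbtable", "SALES.ORDERS")                               # hypothetical
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("fetchsize", "10000")
    .load()
)

# Land the data in a Hive table (backed by HDFS) for downstream Hive queries.
orders.write.mode("overwrite").saveAsTable("staging.orders")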

Data Analyst

Yana Software Private Limited, Hyderabad, India April 2014 to July 2015

Responsibilities

Understood business processes, data entities, data producers, and data dependencies.

Conducted meetings with the business and technical team to gather necessary analytical data requirements in JAD sessions.

Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and led the effort toward building a continuous integration system.

Involved in the Complete Software development life cycle (SDLC) to develop the application.

Used and supported database applications and tools for extraction, transformation and analysis of raw data.

Assisted in defining business requirements for the IT team and created BRD and functional specifications documents along with mapping documents to assist the developers in their coding.

Involved in building logic to handle Slowly Changing Dimensions, change data capture, and deletes for incremental loads into the data warehouse (a minimal SCD sketch follows this section).

Involved in designing fact, dimension and aggregate tables for Data Warehouse Star Schema.

Performed reverse engineering of the legacy application using DDL scripts in Erwin and developed logical and physical data models for central model consolidation.

Monitored the data quality of daily processes and ensured data integrity was maintained for the effective functioning of the departments.

Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.

Extracted data from various sources such as DB2, CSV, XML, and flat files into DataStage.

Developed and programmed test scripts to identify and manage data inconsistencies and to test ETL processes.

Created data masking mappings to mask the sensitive data between production and test environment.

Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.

Tools: Erwin 7.3, PL/SQL, DB2, OLAP, XML, OLTP, Flat Files, UNIX.
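
For illustration, a minimal pandas sketch of Slowly Changing Dimension (Type 2) handling for an incremental load, of the kind referenced in the responsibilities above. The column names and the single tracked attribute are hypothetical simplifications of the warehouse logic.

import pandas as pd

TODAY = pd.Timestamp("2015-06-01")
OPEN_END = pd.Timestamp("9999-12-31")

# Current dimension rows (only the open versions matter for comparison).
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Pune", "Delhi"],
    "valid_from": [pd.Timestamp("2015-01-01")] * 2,
    "valid_to": [OPEN_END] * 2,
    "is_current": [True, True],
})

# Incoming batch: customer 2 moved, customer 3 is new.
incoming = pd.DataFrame({"customer_id": [2, 3], "city": ["Mumbai", "Hyderabad"]})

current = dim[dim["is_current"]]
merged = incoming.merge(current[["customer_id", "city"]],
                        on="customer_id", how="left", suffixes=("", "_old"))

changed_ids = merged.loc[merged["city_old"].notna() &
                         (merged["city"] != merged["city_old"]), "customer_id"]
new_ids = merged.loc[merged["city_old"].isna(), "customer_id"]

# Expire the old versions of changed customers.
expire_mask = dim["customer_id"].isin(changed_ids) & dim["is_current"]
dim.loc[expire_mask, "valid_to"] = TODAY
dim.loc[expire_mask, "is_current"] = False

# Append new versions for changed customers and first versions for new customers.
new_rows = incoming[incoming["customer_id"].isin(changed_ids) |
                    incoming["customer_id"].isin(new_ids)].copy()
new_rows["valid_from"] = TODAY
new_rows["valid_to"] = OPEN_END
new_rows["is_current"] = True

dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim.sort_values(["customer_id", "valid_from"]))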


