
Data Engineer / Data Scientist

Location:
Woodbridge, NJ
Posted:
July 02, 2024


Resume:

Vinay Kumar

Title:

Data Architect / Data Engineer / Data Scientist / Data Analyst

Visa Type:

H1B

Email ID:

***************@*****.***

Phone:

+1-201-***-****

Experience Summary

Vinay is a Data Architect, Data Engineer, and Data Scientist with 13 years of experience in the software industry. His technical expertise covers data engineering and data architecture, including data migration, ETL, BI analytics, and data science problem statements. He has functional experience across the data migration, fleet management, healthcare, renewable energy, and finance domains, and has also worked as a Data Architect on data profiling and data quality. Vinay has broad multicultural working experience, with on-site client engagements in India, the USA, Switzerland, Australia, and the UK.

He has expertise in managing projects and teams, with strong interpersonal and client-interfacing skills and the ability to quickly understand technical and business scenarios and formulate efficient solutions to meet complex technology needs.

KEY SKILLS:

Functional

Agile Methodology (Rally & Jira), Team Leadership, Strategic/Tactical Planning, Business Analysis and Development, Excellent Communication and Interpretation Skills, Oracle ERP, Salesforce, SAP, Oracle Primavera, Renewable Energy, Healthcare, Finance, Media Industry, Data Catalog

Technical

Solution Architecture, Data Architecture, Data Integration, Data Governance, Data Science & Data Analytics, DevOps (CI/CD), Data Lake, Data Warehouse, OLTP, OLAP, Data Cleansing, Data Wrangling, ETL, ELT, Scrum, Kanban, Agile, Hadoop, Sqoop

Programming Languages

Python (Pandas, Spark, NLP, NumPy, etc.), Terraform, SQL, PL/SQL, Perl, JavaScript, T-SQL, BigQuery, NoSQL, Java

Cloud Platform

Azure cloud computing, Azure Data Factory, Azure Databricks, Azure Storage, AWS cloud computing, Azure Synapse, Azure Logic Apps, Azure Data Lake, Azure Blob Storage, Airflow

Database

AWS Redshift, Greenplum, Postgres, SQL Server, Oracle, Cassandra, Databricks, MongoDB, MySQL, Snowflake, AWS S3, Azure Blob Storage, Azure Data Lake Gen2, DB2, Oracle 10g, Oracle 11g, Oracle 12c

Tools

GitHub, Octopus, Talend, Tableau, Cognos BI, Cognos TM1, Dataiku, SSIS, SSRS, Bitbucket, TeamCity, Cognos Report Studio, Cognos Query Studio, Cognos Connection, Cognos Data Manager, Informix, Toad, Power BI, Attunity, Natural Language Processing (NLP), Predictive Analytics, OpenAI, Splunk, Visio, Erwin, draw.io, Manta, Microsoft Fabric, dbt, Alation

Training, Certifications & Affiliations

TOGAF Foundation Level Certification (Level 1) & TOGAF Certified Level Certification (Level 2)

AWS Cloud Practitioner Certificate

AZ-900 - Azure Fundamentals

DP-203 - Azure Data Engineer

Machine Learning A-Z course from Udemy

Capgemini Automation Academy Certification

Developed data lineage and tabdeploy tools that are leveraged by multiple projects.

Highest Level

Master in Computer Applications with 8.47 CGPA (2011 batch) from ICFAI

Bachelor of Computer Applications (2007 batch) from IGNOU

Recent Projects

01

Project Name:

BBC, New York, USA

Project Description:

The best of the BBC, with the latest news and sport headlines, weather, TV & radio highlights, and much more from across the whole of BBC Online. The project covers data engineering and analytics on a cloud platform.

Role/Title:

Data engineer / Data Architect

Project Duration:

July 2023 – June 2024

Responsibilities:

Analyze current business practices, processes, and procedures, and identify future business opportunities for leveraging Microsoft Azure Data & Analytics Services.

Engage and collaborate with customers to understand business requirements/use cases and translate them into detailed technical specifications.

Design and develop data architecture, data modelling, data management, data lake, data cleansing, data quality checks, data validation, etc.

Create organization charts, network diagrams, and data models using Visio, Erwin, and draw.io for Salesforce, SAP, and HR domain data.

Design and build data pipelines using Azure Data Factory, Azure Synapse, and streaming ingestion methods for systems such as Salesforce, SAP, finance, and HR.

Experience delivering solutions using Azure Stream Analytics, SQL & PL/SQL data warehouses, Analysis Services, Azure Functions, serverless architecture, ARM templates, Azure Data Factory, Azure Synapse, Azure SQL, Azure Data Lake, Microsoft Fabric, dbt, and Azure App Service.

Load transformed data into storage and reporting structures in destinations including data warehouses, real-time reporting systems, and analytics applications using Azure services, SQL, PL/SQL, and Snowflake.

CI/CD for Azure services such as Azure Synapse, Azure Data Factory, Synapse and Data Factory pipelines, Azure storage accounts, serverless Synapse analytics, and Spark pools, using Terraform.

Automation of manual work performed by the business.

Analysis for the Tableau-to-Power BI report migration project.

Design and develop a solution that converts articles and documents from English text to audio, translates the content into other languages, and saves the translated output as audio files (see the sketch after this list).

Other responsibilities include extracting data, troubleshooting, JIRA maintenance, and maintaining the data warehouse.
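
A minimal Python sketch of the text-to-audio flow described above, assuming the gTTS library for speech synthesis; the actual project may have used a different engine (e.g. an Azure cognitive service), and translate_text() is a hypothetical placeholder for whatever translation service the pipeline calls:

```python
# Hypothetical sketch of the text-to-audio translation flow.
# gTTS is assumed as the text-to-speech engine; translate_text() is a
# placeholder for the translation API used in the real pipeline.
from gtts import gTTS

def translate_text(text: str, target_lang: str) -> str:
    """Placeholder: call a translation service here (assumption)."""
    raise NotImplementedError

def article_to_audio(article_text: str, langs: list[str], out_dir: str = ".") -> None:
    # Produce the English narration first.
    gTTS(text=article_text, lang="en").save(f"{out_dir}/article_en.mp3")
    # Then translate the text and synthesize one audio file per language.
    for lang in langs:
        translated = translate_text(article_text, lang)
        gTTS(text=translated, lang=lang).save(f"{out_dir}/article_{lang}.mp3")
```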

Operating System:

Azure cloud computing

Software / Languages:

Python, Oracle, Azure Data Factory, Azure Synapse, Azure Storage, Azure Blob Storage, Terraform, Octopus, JIRA, SQL, PL/SQL, Oracle 10g, Oracle 11g, Oracle 12c

02

Project Name:

GE Renewables, Schenectady, NY, USA

Project Description:

GE Renewable Energy is a division of General Electric, headquartered in Paris, France, focusing on the production of energy from renewable sources. Its portfolio of products includes wind (onshore and offshore), hydroelectric, and solar (concentrated and photovoltaic) power generation solutions.

Role/Title:

Data Architect / Data Engineer / Data scientist

Project Duration:

Sep 2017 – June 2023

Responsibilities:

Provide technical leadership and thought leadership as a senior member of the Analytics Practice in areas such as data access & ingestion, data processing, data integration, data modelling, database design & implementation, data visualization, and advanced analytics.

Proficient in creating and managing data pipelines using Azure Data Factory and Mapping Data Flows to facilitate efficient extract, transform, and load (ETL) processes, ensuring seamless data flow and integrity.

Additionally, proficient in implementing error handling, data validation, and monitoring functionality to ensure reliability in the integration process.

Designed and implemented pipelines in Azure Data Factory with Azure Data Lake Gen2 as the storage layer, using Databricks for data transformation.

Utilized Agile (Rally & JIRA) methodology to streamline project development and enhance team collaboration.

Design and develop data migration solutions from legacy, on-prem source databases (Oracle, DB2, SQL Server) to cloud platforms and Salesforce by analyzing the data and PL/SQL code, analyzing business process flows, and creating data models, data mappings, data pipelines, data transformations, data cleansing routines, etc.

Used Sqoop for data integration within the Hadoop platform.

Proficient in integrating Delta Lake as storage with Azure Databricks for advanced data processing and analytics, leveraging the collaborative Spark-based environment for scalable data engineering solutions (see the sketch after this list).

Reduced manual intervention by 50% by developing automated ETL pipelines using Azure Data Factory and optimizing data processing workflows in SQL and PL/SQL, minimizing human errors.

Enhanced data processing throughput by 50% through parallel execution and workload balancing techniques, maximizing utilization of cluster resources.

Strong command of SQL and PL/SQL for comprehensive data querying and analysis across relational database systems, ensuring data accuracy, consistency, and performance optimization in databases such as Postgres, Oracle, DB2, Greenplum, Redshift, Snowflake, Oracle P6, and Oracle 10g/11g/12c.

Proactive in identifying opportunities for automation and optimization within data engineering processes, leveraging Azure services and tools to streamline operations and reduce manual efforts.

Data lineage, data catalog, data cleansing, and data governance using Manta, Alation, and in-house solutions, with analytics in multiple domains such as healthcare, Oracle ERP, HR, Salesforce, and Oracle P6.

Gather business requirements and ensure deliverables are on track.

Work on data science use cases using Python to build data analytics for business users.

Gather requirements, then design and develop analytics using Tableau.

Work with business users to understand problem statements, provide data science analytics as a resolution, and support key decisions.

Automated data lineage tracing from Tableau reports through the database to the source system using Python and data science concepts.

Automated the deployment of Tableau solutions.

Identify manual work in the business and develop innovative solutions to automate it.

Architecture and design of a self-service platform for business users to perform data analytics using Dataiku.

Analyze, design, and develop data integration solutions on AWS across databases such as Postgres, Oracle, DB2, Greenplum, Redshift, and Snowflake.

Design conceptual, logical, and physical data models.

Develop ETL solutions using Talend and integrate data from DaaS, APIs, Smartsheet, etc.

Experience in data analysis, data blending, and data wrangling in databases such as Postgres, Greenplum, DB2, Oracle, Redshift, and Snowflake using SQL and PL/SQL procedures and functions.

Experience in data governance, including data privacy regulations such as HIPAA, ePrivacy, CCPA, and GDPR.

Experience in solution deployment both manually and via GitHub.

Document solutions, designs, data architecture, data guidelines, data governance, data security, etc.
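
A minimal PySpark sketch of the Delta Lake pattern referenced above (read raw data, cleanse it, write it back as a Delta table on Databricks). The paths and column names are illustrative assumptions, not details from the actual project:

```python
# Minimal Delta Lake read-cleanse-write sketch for Azure Databricks.
# Paths and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delta-cleanse").getOrCreate()

# Read the raw Delta table from the data lake.
raw = spark.read.format("delta").load("/mnt/datalake/raw/turbine_readings")

cleansed = (
    raw.dropDuplicates(["reading_id"])                # basic data cleansing
       .filter(F.col("reading_value").isNotNull())    # simple quality check
       .withColumn("load_ts", F.current_timestamp())  # audit column
)

# Write the curated layer back as Delta.
(cleansed.write.format("delta")
         .mode("overwrite")
         .save("/mnt/datalake/curated/turbine_readings"))
```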

Operating System:

Unix in AWS cloud computing, Azure cloud computing

Software / Languages:

Talend, Python, Dataiku, Greenplum, Redshift, AWS cloud computing, Oracle, SQL Server, SQL, PL/SQL, Tableau, GitHub, Azure Data Factory, Azure Synapse, Azure Databricks, Postgres, Snowflake, DB2, Rally, JIRA, Oracle 10g, Oracle 11g, Oracle 12c

03

Project Name:

Custom Fleet, Australia.

Project Description:

Custom Fleet, Australia provides expert leasing and fleet management services that help mining operations. The project aims to create a central repository for a reporting solution and for future data science and machine learning use cases from 300+ source tables.

Role/Title:

Technical Solution Lead/ Data Analyst

Project Duration:

Aug 2016- Sep 2017

Responsibilities:

Project and team management. Used JIRA for project management and generated reports to track the project for the leadership team. Mentored and guided the team.

Worked with the business to understand the existing system and gathered requirements for the data model and reports.

Played a key role in the architecture, design, and development of a data migration project from legacy, on-prem systems to a cloud platform.

Designed the data model and reporting solution. Developed ETL and reports.

Business and data analysis, data cleansing, and data quality checks (see the sketch after this list). Created the physical data model in SQL Server and Oracle using SQL and PL/SQL.

Created stored procedures in SQL Server and Oracle PL/SQL to transform the data. Performed performance tuning of Oracle PL/SQL and T-SQL.

Prepared technical documents such as the data mapping and metadata documents.

Built and tested the reporting solution and deployed it to production with a handover document.
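
An illustrative Python sketch of the kind of data cleansing and quality checks described above, using pandas; the column names and rules are hypothetical examples, not the project's actual checks:

```python
# Hypothetical fleet-data quality gate: fail fast before loading the
# physical model. Column names and rules are illustrative assumptions.
import pandas as pd

def quality_checks(df: pd.DataFrame) -> pd.DataFrame:
    issues = []
    if df["vehicle_id"].isna().any():
        issues.append("null vehicle_id values")
    if df.duplicated(subset=["vehicle_id", "lease_start"]).any():
        issues.append("duplicate vehicle/lease rows")
    if (df["lease_end"] < df["lease_start"]).any():
        issues.append("lease_end precedes lease_start")
    if issues:
        # Surface every failure at once rather than one at a time.
        raise ValueError("data quality failures: " + "; ".join(issues))
    return df
```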

Operating Systems:

Unix in On-Prem environment

Software / Languages:

Attunity, Oracle, MS SQL Server, SSIS and SSRS, DB2, SQL, PL/SQL, Oracle 10g, Oracle 11g, Oracle 12c

04

Project Name:

Custom Fleet, Australia

Project Description:

The project aims to deliver data migration from the existing data warehouse to a Cassandra-based solution, along with inbound/outbound interface solutions.

Role/Title:

Data Migration Consultant

Project Duration:

Feb 2016- Aug 2016

Responsibilities:

Worked with stakeholders to understand the existing system.

Involved in the architecture design of the data migration project from the legacy system.

Designed the ETL and inbound/outbound solution using Talend, SQL, and PL/SQL.

Performed data analysis of the source system. Contributed to data cleansing, data profiling, and data modelling in Oracle and Cassandra.

Prepared the ETL data mapping and inbound/outbound interface documents.

Developed the Cassandra- and Oracle-based solution using ETL processes and inbound/outbound interfaces (see the sketch after this list).

Prepared design and development artefacts.
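
A hedged Python sketch of the warehouse-to-Cassandra load referenced above, using the DataStax cassandra-driver; the contact point, keyspace, table, and column names are illustrative only:

```python
# Illustrative load step of a warehouse-to-Cassandra migration.
# Keyspace, table, and columns are hypothetical assumptions.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])      # contact points for the target ring
session = cluster.connect("fleet")    # hypothetical keyspace

# Prepared statement: parsed once, executed per row.
insert = session.prepare(
    "INSERT INTO vehicles (vehicle_id, make, model, lease_start) "
    "VALUES (?, ?, ?, ?)"
)

def load_rows(rows):
    # rows: iterable of tuples extracted from the legacy warehouse
    for row in rows:
        session.execute(insert, row)
```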

Operating Systems:

Unix in On-Prem environment

Software / Languages:

Cassandra, Oracle, Talend, CLEO, SQL, PL/SQL, Oracle 10g, Oracle 11g, Oracle 12c

05

Project Name:

GE Capital and Healthcare, Schenectady, USA.

Project Description:

COBRA is an application that addresses the need to significantly improve budgeting, forecasting, and resource allocation. The project also includes data analytics in the healthcare domain covering medical claims and healthcare products.

Role/Title:

Technical lead / Developer

Project Duration:

July 2014- Feb 2016

Responsibilities:

Responding to a change in the client's IT direction, promptly prepared for construction of the planning and forecasting tool.

Published packages and complex reports based on user requirements through Cognos Connection.

Created list and crosstab reports in Report Studio using query subjects and query items to build the queries.

Designed complex reports with functionality such as drill up, drill down, drill through, and master-detail relationships, and created dashboards using charts.

Design, develop, and implement Cognos TM1 applications for Financial Planning and Budgeting models with complex rules.

Developed BI reports in the healthcare domain to help business users make decisions; the annual premium vs. claim vs. profit analysis displays the company's overall progress.

Data analytics on healthcare domain applications ranging from patient registration and electronic health records to admission, bed allocation, discharge, and billing.

Data analytics and report development in the healthcare domain across member, provider, broker, and claim systems to ensure data transparency, data security, data cleansing, and data quality.

Provided TM1 training and assisted team members.

Built TI processes to write back to the database. Admin activities: creation and maintenance of TM1 users, roles, and security. Implemented email notification functionality in TM1.

Created applications using TM1 Contributor and TM1 Perspectives (Active Forms and slices).

Integrated ASP web pages with TM1 to provide a web interface, file upload functionality, etc.

Scheduled chores and monitored data loads in a timely manner.

Worked with TM1 Contributor and used MDX queries to retrieve results (see the sketch after this list).

Performance tuning and unit testing.

Regular interaction with the client.
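
A small Python sketch of the MDX-query pattern mentioned above. TM1py is assumed here purely for illustration (the project itself used native TM1 tooling such as TI processes and Perspectives); the server details, cube, and dimension names are hypothetical:

```python
# Hypothetical TM1py sketch: run an MDX query against a TM1 cube.
# Connection details, cube, and dimensions are illustrative assumptions.
from TM1py import TM1Service

MDX = """
SELECT
  {[Version].[Actual]} ON COLUMNS,
  NON EMPTY {[Account].Members} ON ROWS
FROM [Budget]
"""

with TM1Service(address="tm1.example.com", port=12354,
                user="admin", password="secret", ssl=True) as tm1:
    cells = tm1.cells.execute_mdx(MDX)   # dict keyed by member tuples
    for elements, cell in cells.items():
        print(elements, cell["Value"])
```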

Operating Systems:

Windows XP SP3

Software / Languages:

Cognos TM1, Cognos EP, Oracle, SQL Server, MySQL, SQL, PL/SQL

06

Project Name:

Education First, UK

Project Description:

Poseidon Finance addresses the need to significantly improve budgeting and forecasting of sales, management, pricing, etc., and allows operations managers to participate in a process to understand the market for forecasting and budgeting software.

Role/Title:

Senior Developer

Project Duration:

Jun 2013 -July 2014

Responsibilities:

Requirement gathering

Responding to a change in the client's IT direction, promptly prepared for construction of the planning and forecasting tool.

Design, develop, and implement Cognos TM1 applications for Financial Planning and Budgeting models with complex rules.

Developed Turbo Integrator (TI) processes to load and retrieve data from database tables and other data sources, and to create dimensions, hierarchies, and cubes.

Creation and Maintenance of TM1 users, roles, and security.

Created reports and performed admin activities.

Built a Java framework for monitoring, logging, testing, validation, and deployment.

Created temporary views and subsets using Turbo Integrator and populated actuals data.

Created complex Active Forms and enhanced the performance of existing Active Forms.

Scheduled chores and monitored data loads in a timely manner.

Created a TI process to transfer currency data to the currency cube.

Used the TM1 rules editor to create and edit rules for various calculations.

Used replication to copy cubes from one server to another.

Developed ETL processes based on complex SQL queries retrieving data from various sources to create dimensions and cubes, loading the data using Turbo Integrator (see the sketch after this list).

Imported data from ODBC sources and text files using TM1 Turbo Integrator.

Performance monitoring using TM1Top.

Performed unit testing and user acceptance testing.

Production support, which involved solving user problems.

Regular interaction with the client and customers.
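
An illustrative Python equivalent of the ODBC extract that a Turbo Integrator process performs, as referenced above: pull rows with SQL, then shape them into (dimension elements, value) records ready for a cube load. The DSN, query, and column names are assumptions, not the project's actual sources:

```python
# Hypothetical ODBC extract mirroring a TI data-source step.
# DSN, credentials, table, and columns are illustrative assumptions.
import pyodbc

conn = pyodbc.connect("DSN=finance_dw;UID=etl;PWD=secret")
cursor = conn.cursor()
cursor.execute("""
    SELECT cost_centre, account, period, SUM(amount)
    FROM gl_actuals
    GROUP BY cost_centre, account, period
""")

records = [
    # ((dimension elements ...), measure value) — the shape a cube load wants
    ((cc, acct, period), float(amount))
    for cc, acct, period, amount in cursor.fetchall()
]
conn.close()
```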

Operating Systems:

Windows XP SP3

Software / Languages:

Cognos TM1, Cognos BI, Oracle, SQL Server, MySQL, Informix, Toad, DB2, SQL, PL/SQL

07

Project Name:

ADP, USA.

Project Description:

The Total Absence Management Program minimizes the impact of employee absenteeism and allows you to manage the various levels of employee absence that you may experience, including short-term and long-term disability services, injury management and workers' compensation, FMLA services, return-to-work services, etc. We developed the software to understand different markets and prepare forecasts.

Role/Title:

Developer

Project Duration:

Nov 2010 -Jun 2013

Responsibilities:

Analyzed the requirements and designed the application to meet user needs.

Created a model in Framework Manager forming a metadata layer, using data source query subjects, SQL, PL/SQL, and query items across multiple data sources such as Oracle and SQL Server.

Created model data relationships using star schema modeling for reporting.

Published packages and complex reports based on user requirements through Cognos Connection.

Created list and crosstab reports in Report Studio using query subjects and query items to build the queries.

Designed complex reports with functionality such as drill up, drill down, drill through, and master-detail relationships, and created dashboards using charts.

Used macros in SQL queries. Optimized reports by tuning SQL and PL/SQL.

Used JavaScript in HTML objects in reports. Scheduled reports and jobs to run periodically.

Built a Java framework for data ingestion, transformation, validation, and deployment.

Production support, which involved solving user problems.

Created various Report Studio reports using calculations, filters, prompts, conditional formatting, and drill-through.

Provided technical assistance, training, and mentoring to users and business partners.

Used the SQL and PL/SQL objects in Report Studio to write freehand SQL and PL/SQL after analyzing the data model.

Bursting, creating views, and scheduling reports.

Analyzed, designed, and coded ETL processes to populate star and snowflake schemas, including fact tables, dimension tables, and job streams, using PL/SQL procedures and functions (see the sketch after this list).

Created connections and user-defined functions in Cognos Data Manager for the ETL process.

Worked in Cognos Data Manager, SQL, and PL/SQL to create ETL processes, ensuring data cleansing, data quality checks, data validation, etc.

Regular interaction with on-site coordinators.
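
A compact Python sketch of the star-schema load logic described above: build a dimension with generated surrogate keys, then resolve those keys into the fact table. Table and column names are illustrative, not from the actual project:

```python
# Hypothetical star-schema build: dimension with surrogate keys, then a
# fact table that references them. Columns are illustrative assumptions.
import pandas as pd

source = pd.DataFrame({
    "employee": ["A. Smith", "B. Jones", "A. Smith"],
    "absence_type": ["FMLA", "STD", "FMLA"],
    "days": [3, 10, 2],
})

# Dimension: distinct members plus a generated surrogate key.
dim_absence = (source[["absence_type"]].drop_duplicates()
               .reset_index(drop=True))
dim_absence["absence_key"] = dim_absence.index + 1

# Fact: join back to the dimension to swap natural keys for surrogate keys.
fact = (source.merge(dim_absence, on="absence_type")
              [["employee", "absence_key", "days"]])
print(fact)
```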

Operating Systems:

Windows XP SP3

Software / Languages:

Cognos TM1, Report Studio, Query Studio, Cognos Connection, Data Manager, Oracle, SQL Server, MySQL, Informix, Toad, DB2, SQL, PL/SQL, Oracle 10g, Oracle 11g


