Vijay Bhangale +1-551-***-****
***************@*****.***
SKILLS
GCP tools like Dataflow, Composer, Dataproc, BigQuery, Pub/Sub
Big Data technologies like Spark with Scala, Hive, Oozie, Sqoop, MongoDB, HBase, Pig, MapReduce on both Hortonworks and Cloudera platforms
Multi-cloud experience across GCP, Azure Databricks, and AWS
Relational databases like DB2, Oracle, MySQL, SQL Server
Linux and Unix scripting
Programming languages: Python, SAS, Scala, COBOL
Automation tools like Tidal, Control-M
ETL tools like Informatica, OWB
Version control tools like GitHub, SVN
EXPERIENCE SUMMARY
Google Cloud Certified Professional Data Engineer with 4+ years of multi-cloud experience (GCP, Azure Databricks, and AWS).
Cloudera Certified Hadoop Developer with 4+ years of working experience in Big Data technologies.
Built complex batch and streaming data pipelines dealing with huge volumes of data.
Vast experience in performance tuning of ETL/harmonization data pipelines.
Played various roles including Staff Data Engineer, Project Manager, Technical Lead, Architect, and Application Service Manager.
Extensively worked on projects managed with Agile methodology.
Expert in writing Unix shell scripts to automate end-to-end processes.
Experience working with clients from different domains such as Healthcare, Banking & Financial Services, Insurance, and Manufacturing.
Worked extensively on SAS-based technologies for analysis, reporting, development, support, administration, and performance improvement.
EXPERIENCE
July 2018 – Present    Staff Data Engineer, CVS – Irving, TX
Designed, developed, and automated batch and streaming data pipelines for Interoperability on Google Cloud Platform using Python, Dataflow, Dataproc, Composer, Unix, BigQuery, Pub/Sub, Apache Beam, and Whistle transformations, and loaded the data into FHIR stores.
Fine-tuned and improved performance of batch pipelines to load large volumes of historical data in a timely manner.
Created Composer workflows (DAGs) for handling job dependencies and end-to-end execution (a Composer sketch follows this role).
Performed cost optimization for Dataflow jobs, reducing job cost by approximately 20%.
Updated all jobs to use CMEK (Customer Managed Encryption Keys) for enhanced security.
Developed batch pipelines to retry data that failed to load into the FHIR store and scheduled them to run every 15 minutes.
Enabled different Implementation Guides (US Core 3.1.1 and CARIN BB) on the FHIR store.
Created a rule-based framework (IOR strategic solution) for loading data from external providers into the FHIR store.
Designed the detailed solution for the CMS Audit – Data Merge Performance Improvement, with the unique requirement of merging 1-3 million delta claims transactions into 38 TB of base data (an incremental 3.7 billion records every year) within 2 hours on a daily basis, supporting partitions based on fill dates and event dates in the Hadoop Data Lake to facilitate rapid retrieval of merged data using Kyvos Insights.
Implemented a reconciliation component to verify merged data using an audit table.
Fine-tuned Spark cluster parameters and Spark SQL to support elasticity on demand and generate optimum results.
Improved overall performance of the Data Merge/Upsert from 4 hours to 1.5 hours by using an internal bucketing structure and applying various Spark performance constructs (a bucketing sketch follows this role).
Set up the Infoworks ingestion process to load claims data into partitioned structures.
Implemented an archival process to archive ingested claims data.
Automated the ingestion, merge, and archival processes using Tivoli/Maestro jobs and shell scripts.
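The following is a minimal sketch of how a Cloud Composer (Airflow) DAG can chain the ingestion, merge, and archival steps described in this role. The DAG and task names, the schedule, and the placeholder commands are illustrative assumptions, not the actual pipeline definitions.

```python
# Minimal Cloud Composer (Airflow) DAG sketch chaining ingest -> merge -> archive.
# All names and commands are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="claims_batch_pipeline",      # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",       # assumed daily run at 02:00
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_claims",
        bash_command="echo 'trigger Infoworks ingestion'",  # placeholder command
    )
    merge = BashOperator(
        task_id="merge_delta_claims",
        bash_command="echo 'spark-submit daily merge job'",  # placeholder command
    )
    archive = BashOperator(
        task_id="archive_ingested_data",
        bash_command="echo 'archive ingested claims data'",  # placeholder command
    )

    # End-to-end dependency chain: merge waits for ingest, archive waits for merge.
    ingest >> merge >> archive
```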
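Below is a minimal PySpark sketch of the bucketing idea behind the Data Merge/Upsert improvement: persisting the large base claims table bucketed and sorted on its join key so the small daily delta can be overlaid without reshuffling the full base data on every run. The table names, the join key (claim_id), and the bucket count are illustrative assumptions, not the actual configuration.

```python
# PySpark sketch of a bucketed base table plus a daily delta overlay (upsert).
# Table/column names and the bucket count are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucketed_merge_sketch")
         .enableHiveSupport()
         .getOrCreate())

# One-time: persist the base data bucketed and sorted by the claim key.
(spark.table("claims_base_raw")                  # hypothetical source table
      .write
      .bucketBy(512, "claim_id")                 # assumed bucket count and key
      .sortBy("claim_id")
      .mode("overwrite")
      .saveAsTable("claims_base_bucketed"))

# Daily: overlay the delta on the bucketed base; the bucket layout lets Spark
# avoid shuffling the large base table for the join.
base = spark.table("claims_base_bucketed")
delta = spark.table("claims_delta_daily")        # hypothetical daily delta table

merged = (base.join(delta, "claim_id", "left_anti")  # keep base rows not in delta
              .unionByName(delta))                   # add the updated/new rows

merged.write.mode("overwrite").saveAsTable("claims_base_merged")
```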
For the Refrigerator Temperature Data Server (IoT), designed the detailed solution in Azure Databricks for near-real-time acquisition, transformation, loading, and aggregation of IoT sensor data generated by 90,000 devices mounted in refrigerators and coolers in pharmacies, Minute Clinics, and front stores across the USA, to avert costly medicine/drug losses incurred due to fluctuations in refrigerator temperatures.
Assisted in implementing the acquisition layer to capture Sensor, Gateway, Device Metadata, and Configuration raw data from a third-party provider; assisted in implementing the loading of raw Sensor and Gateway data and type 2 updates to Device Metadata and Configuration data by applying different quality rules.
Developed semantic/aggregate views on top of the raw data to generate 50+ metrics, including slope, temperature differences across the last six readings, most recent ambient temperature, out-of-range temperature, and temperatures beyond various thresholds over 30-minute, 60-minute, 2-hour, 4-hour, 12-hour, 24-hour, and 48-hour windows (an aggregation sketch follows this role).
Assisted in designing and implementing a gap analysis component for sensor data.
Involved in automating the acquisition, loading, transformation, and aggregation layers using Control-M.
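A minimal PySpark sketch, assuming a Databricks environment, of the kind of windowed aggregate view described in this role: rolling per-sensor temperature metrics over a 30-minute window. The table and column names (sensor_readings, sensor_id, reading_ts, temp_c) and the 2-8 °C safe range are illustrative assumptions, not the actual schema or thresholds.

```python
# PySpark sketch: 30-minute per-sensor temperature metrics.
# Table, column names, and thresholds are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sensor_metrics_sketch").getOrCreate()

readings = spark.table("sensor_readings")   # hypothetical raw readings table

# 30-minute tumbling-window metrics per sensor: average temperature and a
# count of readings outside an assumed 2-8 C safe range.
metrics_30m = (
    readings
    .groupBy("sensor_id", F.window("reading_ts", "30 minutes"))
    .agg(
        F.avg("temp_c").alias("avg_temp_30m"),
        F.sum(
            F.when((F.col("temp_c") < 2) | (F.col("temp_c") > 8), 1).otherwise(0)
        ).alias("out_of_range_count_30m"),
    )
)

# Persist as an aggregate view/table for downstream reporting.
metrics_30m.write.mode("overwrite").saveAsTable("sensor_metrics_30m")
```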
Jan 2008 – June 2018    Data Engineer, Cognizant – NJ
Worked on the Toyota Big Data Platform migration (Toyota TBDP) project to migrate the customer C360 application onto Toyota's Big Data Platform and onboard all analytic applications to Toyota's single enterprise Big Data platform. Worked as Technical Lead and carried out activities such as working with the Hadoop admin to set up the Enterprise Data Lake, working with the security team to set up secure privileged access using the Centrify tool, and participating in end-to-end data and code migration as well as Unit, SIT, and Integration testing.
Worked on the Enterprise Customer Survey Center (Toyota ECSC) project to bring the offline reports generated by a vendor in house. Reports were generated at various levels (dealer, regional, district, and national) for various time frames (monthly, quarterly, and yearly). Involved in requirement gathering and implementation using the Hadoop framework. Managed the offshore team on technical aspects, reviewed the code, performed Unit and SIT testing, worked with the customer to perform UAT, and was involved in production deployment and post-deployment support.
April 2001 – Dec 2007    IT Analyst, Tata Consultancy Services – Mumbai, India
Designed and developed a data warehouse for International Masters Publishers using SAS-based technology. IMP had decided to retire its Siebel implementation, including the Siebel data warehouse, while important business functions supported by that data warehouse had to be continued, so SAS was used to replace the Siebel implementation. Held working sessions with the client to understand the business requirements and technical discussions to finalize the infrastructure. Developed SAS programs for ETL, automated the jobs to update the data warehouse on a daily basis, and performed performance tuning of the loading process.
Worked as team lead for the Credit Card module of the SAP-ETL project for ABSA Bank. Source data was extracted from the mainframe into flat files and placed on a UNIX server; OWB was used as the ETL tool, and the data was finally loaded into BAPI.
Analyzed MTN's existing SAS-based ETL and reporting system. MTN had two ETL systems (SAS and Informatica based), which was causing problems for customers. Analyzed their existing SAS programs, SAS datasets, and other source systems.
Performed statistical analysis and reporting using SAS for Pfizer India Ltd. The reports were created in a web-based system and developed in Base SAS in a Unix environment conforming to FDA (Food and Drug Administration, USA) standard guidelines. The raw data is stored in the Oracle Clinical database, and extraction code written in SAS is used to extract the data onto a UNIX server; the raw datasets are then converted to summarized and aggregated datasets using SAS programs.
CERTIFICATION
Google Cloud Certified Professional Data Engineer
Cloudera Certified Hadoop Developer
Base SAS
PMP (expired; not renewed after 2015)
EDUCATION
Bachelor of Electrical Engineering from Government College of Engineering, Karad