
Senior Data Engineer

Location:
Guntur, Andhra Pradesh, India
Posted:
March 21, 2024


RaghuNathChowdary.Kolla

Mail Id: ad4hxm@r.postjobfree.com | Mobile No: +91-897-***-****
Senior Data Engineer

Professional Synopsis

• Senior Data Engineer with 7.6 years of IT experience, with core expertise in solutioning, architecting, designing, and developing parallel-processing big data pipelines using Spark, cloud services, and Hadoop ecosystem components.

• Cloud-certified data engineer: AWS Certified Data Analytics - Specialty.

• Extensive experience in data modelling, data processing, and building data pipelines using data warehouse concepts for big data application modernization, using frameworks such as Spark with the programming languages Python, Scala, and Java.

• Main areas of expertise are building pipelines, data modelling, data analytics, and developing and executing scripts using Spark, Hive, AWS EMR, AWS Glue, EC2, S3, Lambda, CFT, ElastiCache, Sqoop, NiFi, Impala, Athena, Kylo, and Pandas.

• Recent assignments focused on gathering requirements and designing and developing a PySpark framework in AWS for the processing and analysis of disease and drug data, and on assessing functional and technical architecture, working closely with healthcare and life sciences clients to define future-state models and strategic roadmaps.

• Maintained code quality and security to the standards enforced by tools such as SonarQube and Veracode, with experience in writing unit test cases to maintain code coverage.

• Good working knowledge of shell scripting, Kafka, HBase, and Spark Streaming.

• Proficient in Agile sprint methodology, Jira boards, Git branching strategies, code deployment, Jenkins, Docker, and the Informatica and Ab Initio ETL tools.

• Experienced in working across industry domains including Healthcare, Life Sciences, Banking & Financial Services, and Technology.

Key Projects

EPI Dashboard

Role: Senior Data Engineer

Technologies Used: PySpark, AWS Glue, S3, Athena, AWS SSM, Jenkins, Docker, SonarQube, Veracode, ElastiCache, PostgreSQL.

Programming Languages: Python and SQL

Description: Epidemiology disease-case data and drug information for those diseases are collected from Snowflake and pushed to S3, while additional drug data is collected by calling an API. The S3 data is catalogued in a Glue catalog table; Glue jobs read from the catalog, apply aggregations and transformations with PySpark, and load the results into PostgreSQL tables with fact-dimension relationships. The API JSON data is modelled as required and stored in PostgreSQL, which serves as the source for Power BI visualizations and UI applications. (An illustrative Glue job sketch follows the bullets below.)

• Involved in gathering technical requirements, design, and development.

• Developed PySpark code to model data using transformation and aggregation rules, performing data analysis and processing and storing data with fact-dimension relationships.

• Applied data cleansing, standardization, and validation rules to maintain data quality standards in storage and for use as a visualization source.

• Maintained code quality by passing the SonarQube quality gate without issues, ensured security using Veracode, and handled code deployment through Jenkins.

• Wrote unit test cases to achieve the code coverage required to pass the quality gate.

• Handled incremental data using Delta Lake and by enabling Glue job bookmarks.

• Modelled API data in the required format and stored it as jsonb in PostgreSQL, which backend services use as the source to feed data for visualization in the UI.


• Created a Glue workflow for the Glue jobs and scheduled it as required.

• Used the Athena query engine to query data stored in the Glue catalog.
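
Illustrative sketch only (not project code): a minimal PySpark Glue job of the kind described above, reading a catalogued table, aggregating, and writing a fact table to PostgreSQL over JDBC. The database, table, and connection names here are hypothetical placeholders.

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    # Glue job setup.
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Read the catalogued S3 data (hypothetical database/table names).
    cases = glue_context.create_dynamic_frame.from_catalog(
        database="epi_db", table_name="disease_cases").toDF()

    # Example aggregation: case counts per disease and year.
    fact_cases = (cases
        .groupBy("disease_id", "year")
        .agg(F.count("*").alias("case_count")))

    # Write to PostgreSQL as a fact table (hypothetical connection details).
    (fact_cases.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://example-host:5432/epi")
        .option("dbtable", "fact_disease_cases")
        .option("user", "example_user")
        .option("password", "example_password")
        .mode("append")
        .save())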

Collaborate Development and Management

Role: Senior Data Engineer

Technologies Used: PySpark, AWS EMR, Lambda, Athena, CFT, SM, SQS, S3, AWS CodePipeline, CloudWatch, SQL.

Programming Languages: Python and SQL

Description: Asset, trade, transaction, and party data collected from source S3 buckets is analyzed and processed using PySpark and stored back in S3. Once source data lands in the S3 bucket, accelerator code runs data movement, data quality analysis, and match-merge processes, and the results are written to the various S3 layers. Code deployment is managed with AWS CodePipeline, and job applications are managed with AWS Lambda functions. (An illustrative Lambda sketch follows the bullets below.)

• Involved in gathering technical requirements, design, and development.

• Developed PySpark accelerator code to apply transformation and aggregation rules and to perform data analysis and processing.

• Performed data wrangling, cleansing, standardization, and validation with PySpark to improve data quality.

• Developed AWS Lambda functions to manage various AWS services and to trigger execution on the EMR cluster based on events received from SQS.

• Built CloudFormation templates to spin up AWS services.

• Used the Athena query engine to query data stored in S3.

• Executed PySpark scripts on the spun-up EMR cluster after code deployment.

• Developed and executed pytest test cases to maintain code quality and the required percentage of code coverage.
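
Illustrative sketch only (not project code): a minimal AWS Lambda handler of the kind described above, consuming SQS events and submitting EMR steps with boto3. The cluster ID, script path, and message fields are hypothetical placeholders.

    import json
    import boto3

    emr = boto3.client("emr")

    # Hypothetical values; in practice these come from configuration or the event.
    CLUSTER_ID = "j-EXAMPLE123"
    SCRIPT_PATH = "s3://example-bucket/scripts/accelerator.py"

    def handler(event, context):
        """Triggered by SQS: submit one EMR step per received message."""
        steps = []
        for record in event["Records"]:
            body = json.loads(record["body"])
            steps.append({
                "Name": "accelerator-" + body.get("dataset", "unknown"),
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", SCRIPT_PATH, body.get("input_path", "")],
                },
            })
        if steps:
            emr.add_job_flow_steps(JobFlowId=CLUSTER_ID, Steps=steps)
        return {"submitted_steps": len(steps)}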

Government Programs Division

Role: Data Engineer

Technologies Used: PySpark, Python, Hive, Phoenix, SQL.

Programming Languages: Python and SQL

Description: Institutional and professional source JSON files from various vendors, relating to Medicare claims, are processed with JSON parsing in PySpark for various Medicare encounter processes. Data analysis is performed on the processed data to calculate the count of encounters per claim, and the parsed, analyzed encounter counts are generated as a report. Encounter claim data is also stored in Hive using PySpark for future analysis. (An illustrative parsing sketch follows the bullets below.)

• Involved in gathering requirements and in the design and development of the PySpark framework for the processing and analysis of Medicare data.

• Developed JSON Parsing Scripts using PySpark code.

• Performed data analysis and transformations on the processed data per business inventory rules, such as aggregations, standardization, and analytical data modelling, using Python and Spark functions.

• Read data from Phoenix tables and performed joins per business rules.

• Loaded the cleansed, aggregated data into Hive tables using PySpark.

• Prepared Hive queries to load data into the data mart with fact-dimension relationships.
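
Illustrative sketch only (not project code): a minimal PySpark JSON-parsing flow of the kind described above, flattening a nested encounters array and counting encounters per claim. The paths, field names, and Hive table name are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("claims_parsing_sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read vendor JSON files (hypothetical path and schema).
    claims = spark.read.json("s3://example-bucket/medicare/claims/")

    # Flatten the nested encounters array and count encounters per claim.
    encounters = claims.select("claim_id", F.explode("encounters").alias("encounter"))
    encounter_counts = (encounters
                        .groupBy("claim_id")
                        .agg(F.count("*").alias("encounter_count")))

    # Persist for future analysis (hypothetical Hive table name).
    encounter_counts.write.mode("overwrite").saveAsTable("claims_db.encounter_counts")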


DROID

Role: Data Engineer.

Technologies Used: Spark, Hive, Impala, NiFi, Kylo, Sqoop.

Programming Languages: Scala, R, and SQL

Description: A leading pharmaceutical organization manufactures and markets various drugs. Data on drug sales, prescriptions, and branding is collected, processed with Spark, and ingested into a data mart, where analytics and reporting are performed. Data from various sources is ingested using the Kylo and NiFi ingestion tools; analytics and modelling are then performed on the ingested data, which is processed into the Hive data mart using Spark. (An illustrative date-series sketch follows the bullets below.)

• Involved in gathering requirements, design, and development.

• Wrote Spark code with transformations to model the data, perform analytics, and process it into the data mart.

• Wrote Scala UDFs to generate date series and transform data as required for loading into the data mart.

• Made the Spark code compatible with Impala tables so that operations could be performed on the loaded data in the Impala shell.

• Analyzed, enhanced, and ran R scripts for a few data applications to model and generate the files required for ingestion.

• Developed NiFi and Kylo pipelines to collect data in various formats from source systems and ingest it into the raw layer via scheduled triggers.

• Performed operations in Hive with dimension tables to load data into the fact table.

• Developed Spark code to perform analytics on raw data per requirements and processed it to the business layer.

• Provided input feeds to Kylo using templates to ingest the data into the data mart.
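
Illustrative sketch only: the date-series step above was implemented as a Scala UDF; this is a PySpark analogue of the same idea, expanding a per-drug date window into one row per day. The frame and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("date_series_sketch").getOrCreate()

    # Hypothetical drug-level frame with a reporting window per drug.
    sales = spark.createDataFrame(
        [("drug_a", "2023-01-01", "2023-01-05")],
        ["drug", "start_date", "end_date"])

    # Expand each window into one row per calendar day.
    series = sales.withColumn(
        "day",
        F.explode(F.sequence(F.to_date("start_date"),
                             F.to_date("end_date"),
                             F.expr("interval 1 day"))))
    series.show()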

Dun & Bradstreet HMRC VAT

Role: Developer

Environment: Spark 2.2, Java 1.8, AWS EMR, MySQL.

Cloud Platform: AWS.

Description: Data is received from the UK stock exchange as an HMRC VAT file stored in AWS S3 buckets; the files are processed from S3 into a MySQL database using Spark and Java. File data is collected, the required validations are applied to the file, and the processed data is written to the database. The processed data is then used for matching against the available HMRC data: file data is sent to the matching API as a request, validated for matching data, and the response is stored back in MySQL. (An illustrative S3-to-MySQL sketch follows the bullets below.)

• Involved in gathering requirements, design, and development.

• Involved in technical discussions and architecture design meetings.

• Created EMR clusters to run the Spark jobs and moved files between S3 and local storage.

• Processed data from S3 to MySQL database by running Spark jobs using Java on EMR.

• Created JSON files using Java from the input file data.

• Validated the VAT file against the required file formats and data requirements.

• Logged processing data to files, including records that were not valid per the requirements. Used JDBC to connect to and query the database.
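
Illustrative sketch only: the project above used Spark with Java; this PySpark analogue shows the same S3-to-MySQL pattern of reading a VAT file, applying a simple validation, and writing over JDBC. The paths, columns, and connection details are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hmrc_vat_sketch").getOrCreate()

    # Read the VAT file from S3 (hypothetical bucket and layout).
    vat = spark.read.option("header", "true").csv("s3://example-bucket/hmrc-vat/")

    # Minimal validation: keep only rows with a non-null VAT number.
    valid = vat.filter(vat["vat_number"].isNotNull())

    # Write validated records to MySQL (hypothetical connection details).
    (valid.write
        .format("jdbc")
        .option("url", "jdbc:mysql://example-host:3306/hmrc")
        .option("dbtable", "vat_records")
        .option("user", "example_user")
        .option("password", "example_password")
        .mode("append")
        .save())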


Professional Highlights:

• Gathered, understood, and analyzed requirements, and designed and built applications using Spark scripts for data analysis, data processing, and data modelling with scalable frameworks, covering data integration, data engineering, data ingestion, and data governance.

• Developed PySpark scripts for parsing JSON files into structured format for the processing and analysis of Medicare data, and assessed functional and technical architecture.

• Performed complex transformations, data manipulation, and data analysis on raw data, processed it into various data marts for storage, and wrote unit test cases to maintain code coverage.

• Developed Lambda functions for managing various services in the AWS cloud.

• Maintained code quality to quality gate standards, free of security threats, excess complexity, and issues.

• Developed Glue workflows and scheduled Glue jobs end to end: reading from S3, performing analytics, and loading to storage.

Skill Set

Hadoop Ecosystem: Spark, HDFS, MapReduce, Hive, NiFi, Sqoop, Pig, Oozie, Impala

Hadoop Distributions: Hortonworks, Cloudera

Programming Languages: Python, Scala, Java, and basics of R

Scripting: Unix

Databases: MySQL, PostgreSQL, Oracle, SQL Server

Tools: Kylo, Informatica, RStudio, Eclipse, Notepad++, SonarQube, Veracode

AWS Services: EMR, Glue, Lambda, S3, EC2, CFT, SM, SQS, SSM, ElastiCache, Athena, Redshift

Project Name | Domain | Organization

EPI Slicer | Health Care & Life Sciences | EPAM Systems

Collaborate Development and Management | Banking and Financial Services | Deloitte

Government Programs Division | Health Care | Deloitte

Droid | Health Care & Life Sciences | Cognizant

HMRC | Banking and Financial Services | Cognizant

Claims | Health Care | Cognizant

Certifications

Certification: AWS Certified Data Analytics - Specialty

Validation Number: RFRQN1XCZJBQ18W0

Reference Link: https://www.credly.com/users/raghunath-chowdary-kolla/badges

Educational Background

B.Tech, JNTUK, 2015, 74%


