
RUDRA TEJA ANNANGI

adz5ly@r.postjobfree.com | 660-***-**** | LinkedIn

SUMMARY

More than 5 years of experience as a Data Engineer with demonstrated professional working proficiency in SQL, Python, and cloud platforms such as GCP, Azure, and AWS. Passionate about leveraging data to drive business value and committed to staying up to date with emerging technologies and industry trends. Adept at collaborating with cross-functional teams, troubleshooting data-related issues, and implementing data governance policies and security protocols.

TECHNICAL SKILLS

Programming Languages: Python, Java, C++, Scala, SQL

Amazon Web Services: S3, EC2, Glue, EMR, Redshift, SQS, SNS, Lambda, Snowflake/DW

Azure: Blob Storage, Databricks, Data Factory, Data Lake, Azure Synapse

Big Data: Hadoop, Hive, Apache Spark, PySpark, HBase, Kafka, YARN, Sqoop, Impala, Oozie, MapReduce, ZooKeeper, Flume

Databases: Oracle, SQL (Postgres, MySQL, SQL Server), NoSQL (MongoDB), Teradata

Scripting and tools: Shell scripting, Eclipse, IntelliJ

File formats: Parquet, Avro, JSON, CSV

Methodologies: Agile, Waterfall (SDLC)

Build Tools and Version Control: Maven, Jenkins, Bitbucket, Git, and GitHub

Operating Systems: macOS, UNIX, Linux, Windows XP/2003/7/8/10

ETL, BI & Data Visualization: Informatica, SSIS, Talend, Excel, Tableau, Power BI, Google Sheets

PROFESSIONAL EXPERIENCE

Data Engineer, Cigna Health Care, Bloomfield, Connecticut | March 2023 – Present

Successfully executed data ingestion, cleansing, and transformation processes using AWS Lambda, AWS Glue, and Step Functions.

Implemented a serverless architecture utilizing API Gateway, Lambda, and DynamoDB, with seamless deployment of Lambda code from S3 buckets.

Configured comprehensive monitoring, alarms, notifications, and logs for Lambda functions and Glue Jobs using CloudWatch.

Implemented CI/CD using tools such as Jenkins and Bitbucket for efficient code repository management, build automation, and deployment of the Python codebase.

Developed PL/SQL packages, database triggers, and user procedures, and prepared comprehensive user manuals for new programs.

Developed Kafka consumer API in Python for consuming data from Kafka topics.

Worked with streaming ingest services, combining batch and real-time processing using Spark Streaming and Kafka.

Specialized in building data lakes, data hubs, and data pipelines using Big Data technologies and tools such as Apache Hadoop, Cloudera, HDFS, MapReduce, Spark, YARN, Delta Lake, and Hive.

Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, Spark MLlib, Data Frames, and Pair RDDs.

Proficient in using Python for data manipulation, cleaning, and analysis, leveraging libraries such as pandas and NumPy.

Used Control-M to monitor the performance of data processing workflows.

Integrated Lambda with SQS and DynamoDB using Step Functions to enable efficient message iteration and status updates.

Extracted EDI files and performed data transformation to convert the raw EDI data into a JSON format.

Extracted data from multiple source systems, including S3, Redshift, and RDS, and created tables/databases in Glue Catalog using Glue Crawlers.

Developed Glue ETL jobs in Glue Studio for streamlined data processing and performed transformations before loading data into S3, Redshift, and RDS.

Worked on performance tuning for data transformation and ETL processes.

Used AWS MSK to replicate data and to support ETL workflows that transform and move data.

Used HVR to replicate data from EHRs to Amazon S3 and to Amazon Redshift.

Conducted comprehensive architecture and implementation assessments of key AWS services such as Amazon EMR, Redshift, and S3.

Utilized AWS EMR for large-scale data transformation and movement of data to and from AWS S3.

Configured S3 event notifications, SNS topics, SQS queues, and Lambda functions to enable efficient Slack message notifications.

Automated data storage from streaming sources to AWS data lakes (S3, Redshift, RDS) using AWS Kinesis (Data Firehose).

Worked with BI tools such as Tableau to create daily, weekly, and monthly dashboards and reports in Tableau Desktop and published them to the HDFS cluster.

Prototyped the analysis and joining of customer data using Spark in Scala and wrote the processed results to HDFS.

Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics for streamlined streaming data capture, processing, and storage in S3, DynamoDB, and Redshift.

Developed Terraform scripts for seamless automated deployment of AWS resources, including EC2 instances, S3, EFS, EBS, IAM Roles, Snapshots, and Jenkins Server.

Proficient in working with different file formats, such as CSV, JSON, flat, and fixed-width files, for efficient data loading from diverse sources into raw tables.

Data Engineer, Truist Bank, Charlotte, North Carolina | October 2021 – July 2022

Worked with different data feeds such as JSON, CSV, XML, and DAT, and implemented the data lake concept.

Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a DynamoDB table, and to load the transformed data into another data store.

Developed the PySpark code for AWS Glue jobs and for EMR.

Used Spark to perform various transformations and actions; the resulting data was saved back to HDFS and then loaded into the target Snowflake database.

Wrote and executed several complex SQL queries in AWS Glue for ETL operations on Spark DataFrames using Spark SQL.

Defined AWS Lambda functions for making changes to Amazon S3 buckets and updating the Amazon DynamoDB table.

Good exposure to the IRI end-to-end analytics service engine and big data platform (Hadoop loader framework, big data Spark framework, etc.).

Used Docker and Kubernetes to manage microservices in support of continuous integration and continuous delivery.

Integrated dbt with AWS Step Functions to create automated data pipelines.

Developed diverse and complex MapReduce and Streaming jobs using Scala and Java for data cleansing, filtering, and aggregation, and possess a comprehensive understanding of MapReduce.

Extensive experience in developing stored procedures, functions, views, and triggers, and complex SQL queries using SQL Server T-SQL and Oracle PL/SQL.

Used Kafka producer to ingest raw data into Kafka topics and run the Spark Streaming app to process clickstream events.

Responsible for gathering data from multiple sources such as Teradata and Oracle, and created Hive tables to store the processed results in tabular format.

Worked on migration of large amounts of data from OLTP to OLAP using ETL packages.

Created DataFrames from different data sources such as existing RDDs, structured data files, JSON datasets, Hive tables, and external databases.

Loaded terabytes of raw data at different levels into Spark RDDs for computation to generate output responses.

Configured Hive metastore with MySQL, which stores the metadata for Hive tables.

Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.

Used Scala components to implement the credit line policy based on conditions applied to Spark DataFrames.

Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.

Used Python to automate Hive jobs and read configuration files.

Data Engineer, IBM India Pvt Ltd / PayPal, Bangalore, India | April 2019 – May 2021

Experienced in implementing the Data warehouse on AWS Redshift.

Involved in designing and deploying multi-tier applications using AWS services (EC2, S3, RDS, DynamoDB, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.

Used EC2 instances to execute the ETL jobs and publish data to S3 buckets for external vendors.

Evaluated Snowflake Design considerations for any change in the application and defined virtual warehouse sizing for Snowflake for different types of workloads.

Created and managed the data catalog using AWS Glue to ensure accurate and consistent metadata for data in the AWS Data Lake.

Built the Logical and Physical data model for Snowflake as per the changes required.

Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments, and defined Terraform modules such as Compute, Network, Operations, and Users for reuse across environments.

Created, modified, and executed DDL for AWS Redshift and Snowflake tables to load data.

Worked with Spark to improve the performance and optimization of the existing algorithms in Hadoop.

Worked on performance tuning for data storage and query performance optimization.

Developed Scala-based applications for big data processing using Apache Spark.

Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that gets the data from Kafka in real-time and persists it to Cassandra.

Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing to load data into Snowflake for analytical processing.

Extensive experience with Terraform, mapping complex dependencies, identifying network issues, and implementing key Terraform features such as infrastructure as code, execution plans, resource graphs, and change automation.

Worked on Microservices for Continuous Delivery environment using Docker and Jenkins.

Developed a pre-processing job using Spark DataFrames to flatten JSON documents into a flat file.

Experienced in writing live Real-time processing and core jobs using Spark Streaming with Kafka as a Data pipeline system.

Leveraged Scala and related libraries, such as Apache Spark, for data processing, analytics, and distributed computing tasks.

Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loading data into Hive external tables.

Developed solutions for import/export of data from Teradata, Oracle to HDFS, S3, and S3 to Snowflake.

Data Engineer, Wipro IT Corporation, Bangalore, India | June 2017 – February 2019

Developed and performed Sqoop import from Oracle to load the data into HDFS.

Created partitions and buckets based on state for further processing using bucket-based Hive joins.

Created Hive tables to store the processed results in a tabular format.

Loaded and transformed data using scripting languages and tools (e.g., Python, Linux shell, Sqoop).

Imported data sets with data ingestion tools like Sqoop, Kafka, and Flume.

Extensively worked on moving data across cloud architectures including Redshift, Hive, and S3 buckets.

Developed HBase Java client API for CRUD operations.

Involved in working with Hive for the data retrieval process and writing Hive queries to load and process data in the Hadoop file system.

Developed data pipeline using Flume, Sqoop, Hive, and Spark to ingest subscriber data, provider data, and claims into HDFS for analysis.

Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.

Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.

Designed and implemented Extract, Transform, Load (ETL) pipelines using Scala and AWS services.

Collected data from AWS S3 buckets using Spark Streaming in near real time and performed the necessary transformations and aggregations on the fly to build the common learner data model and persist the data in HDFS.

Confident and well-experienced as a SQL Server database administrator.

Deployed the packages from Solution Explorer to the SSIS catalog in SQL Server Management Studio.

Used a constraints table in the data warehouse during data loads to catch errors such as data type violations, NULL constraint violations, foreign key violations, and duplicate data.

Generated reports to maintain zero percent errors in all the data warehouse tables.

Developed SSIS Packages for migrating data from the Staging Area of SQL Server 2008 to SQL Server 2012.

EDUCATION

Trine University, Detroit, MI | Aug 2021 – Dec 2022

Master of Science in Information Systems

Jawaharlal Nehru Technological University, Kakinada, India | Aug 2013 – May 2017

Bachelor of Technology in Information Technology


