Data Engineer Processing

Location:

DeKalb, IL

Posted:

January 26, 2024

Resume:

PRUDHVI RAJ VUTUKURU

Data Engineer

**************@*****.*** 815-***-**** www.linkedin.com/in/pr-v

PROFESSIONAL EXPERIENCE

Fidelity – Contract Southlake, TX

Data Engineer January 2023 – Current

• Orchestrated the development and rollout of a cloud-based database infrastructure, amplifying data accessibility by 80% and slashing operational expenses by 50%.

o Managed and monitored AWS resources—EC2 instances, S3 buckets, and Redshift clusters—to optimize performance and cost-effectiveness.

o Built and ran PySpark data pipelines in AWS Glue and AWS EMR, yielding a 30% increase in processing speed.

• Built Sqoop jobs to incrementally load Hive tables, resulting in 10% faster data processing and improved data integrity.

• Engineered an ETL pipeline handling over 10 million rows, culminating in an 80% improvement in data retrieval for analysts.

o Orchestrated an end-to-end ETL framework using AWS services: Glue, PySpark scripts, Spark SQL, S3, EMR, Data Pipeline, Athena, SNS, EC2, Kinesis, Redshift, IAM, and VPC.

o Wrote AWS Lambda functions in Python (Boto3 API) to dynamically activate data pipelines.

• Architected and deployed a robust streaming infrastructure leveraging Kafka, HDFS, HBase, and Hive. Optimized data flow, resulting in a 50% increase in real-time analytics efficiency and a 40% reduction in processing time.

• Extracted data from diverse systems (S3, Redshift, RDS), and populated Glue Catalog tables/databases via Glue Crawlers.

• Optimized Power Query (Power BI) for interactive visualizations, resulting in a 20% acceleration in report generation.

• Developed stored procedures and views in Snowflake and facilitated data sharing between two Snowflake accounts.

Tata Consultancy Services Hyderabad, India (Offshore)

Data Engineer September 2018 – December 2020

• Streamlined ETL (Extract, Transform, and Load) pipelines via Azure Data Factory, Azure Databricks and Azure Analytics, culminating in a 75% reduction in overall execution duration.

• Leveraged Microsoft Azure services including HDInsight clusters, Azure SQL, Blob Storage, Azure Data Lake Storage, Azure Synapse, Data Factory, and Azure Databricks to optimize data processing and storage workflows, reducing data ingestion time and improving data accuracy for reporting and analytics.

• Migrated data from an on-premises Salesforce server to cloud databases (Azure Synapse Analytics/Data Warehouse and Azure SQL DB) and Data Lake Storage.

o Used Python and PySpark within Databricks to transform data from SQL Server to Azure Data Lake Storage.

o Devised stored procedures tailored to store data as required within SQL Server.

• Orchestrated the development and implementation of data pipelines using PySpark, significantly reducing data processing time and enabling real-time data analysis for the BI team.

• Crafted Power BI reports employing an array of visualization techniques, including line charts, doughnut charts, tables, matrices, KPIs, scatter plots, and box plots.

Broadcom Hyderabad, India (Offshore)

Big Data Engineer May 2017 – September 2018

• Designed and implemented an end-to-end Big Data flow (ingesting upstream data into HDFS and processing it for analysis), reducing data processing time by 50%.

• Built the data ingestion system using NiFi to process data in Avro and JSON formats with Snappy and bzip2 compression, doubling data processing speed.

• Amplified data ingestion efficiency by crafting optimized Spark-SQL and Python queries for loading JSON data, generating schema RDD, and populating Hive tables. This refinement significantly enhanced data accuracy and fortified business insights.

• Fine-tuned volumes and managed EC2 instances across multiple VPCs.

• Built data ingestion modules, supporting both real-time and batch data loads, into various layers in S3, Redshift, and Snowflake using AWS Kinesis, AWS Glue, AWS Lambda, and AWS Step Functions.

PROJECTS

Business Intelligence (Power BI, Visual Studio, SQL Database) August 2022 – December 2022

Analysis and visualization of real-time data sets to understand business needs and improvements.

Big Data Analysis for Business (RStudio, Spark) January 2022 – May 2022

Used R programming to visualize a car data set, Spark for million-row operations, and a graphical approach to finalize the findings.

Business System Analysis January 2022 – May 2022

Applied the agile approach; created a test case and use cases, a Five Factor evaluation report, and defect reports.

TECHNICAL SKILLS / INTERESTS / MISCELLANEOUS

Big Data Ecosystem: HDFS, Yarn, MapReduce, Spark, Kafka, Hive, Airflow, Sqoop, HBase, Flume, Pig, Oozie.

Hadoop Distributions: Hadoop, Cloudera CDP.

Cloud Environment: AWS (EMR, EC2, EBS, RDS, S3, Athena, Lambda, SQS, DynamoDB, CloudTrail, and Redshift), Azure (Data Factory, Databricks, ADLS, Kubernetes, Synapse, and Blob).

Scripting Languages: Python, PySpark, MySQL, NoSQL, MSSQL, TSQL, Scala, PostgreSQL, Shell Script, Pig Latin, HiveQL.

ETL/BI: Snowflake, SSIS, Talend, SSRS, SSAS, Tableau, Power BI.

Operating Systems: Linux (Ubuntu, CentOS, RedHat), Unix, and Windows.

Others: Docker, JIRA.

Languages: English, Telugu, Hindi.

Interests: Cooking, Exercise, Gaming, and Travel.

Miscellaneous: Volunteering Chi-Care.

EDUCATION

Northern Illinois University, M.S. (3.83), USA. January 2021 – December 2022

Master of Science in Operations Management & Information Systems.

Vellore Institute of Technology (7.93), India. January 2014 – May 2018

Bachelor of Engineering in Electronics and Communications.

CERTIFICATIONS

• Business Process Integration with SAP S/4 HANA 1809, SAP Certified Associate.

• Azure Fundamentals: AZ-900 Certification.

• Snowflake Decoded – Fundamentals and hands-on Training.
