
Azure Data Engineering

Location:
Chicago, IL
Posted:
January 13, 2024



Venkata Narayana Reddy Vemireddy

Email: ad2qqm@r.postjobfree.com

LinkedIn: https://www.linkedin.com/in/narayanareddyv/

Professional Summary

2+ years of diverse experience in the data space, including requirement gathering, design, development, and implementation of various applications, as well as experience as a Data Engineer with cloud platforms such as AWS, Azure, GCP, Snowflake, and Databricks. Proficient in Druid and other big data technologies, using Spark for data cleansing, data analysis, structured and unstructured data transformations, data modeling, data warehousing, and data visualization with PySpark, Spark SQL, Python, SQL, Airflow, Kafka, SSIS, Sqoop, Oozie, Hive, Tableau, and Power BI. In addition, I hold the DP-203 Azure Data Engineering certification.

Experience in designing and developing Azure cloud solutions, specializing in data migrations, business intelligence, ETL, ELT, data integration, and BI report development.

Experience with Apache Spark and Apache Hadoop components such as RDDs, DataFrames, Spark Streaming, HDFS, MapReduce, Hive, HBase, Pig, and Sqoop.

Proficient in working with Azure non-relational data solutions such as Azure Data Factory, Azure Cosmos DB, large-scale data processing with Azure Data Lake Storage, data streaming with Azure Stream Analytics, Azure Synapse Analytics, and data engineering with Databricks, Spark, and languages such as Python.

Experience in data ingestion and processing, utilizing a combination of Azure services including Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.

Responsible for designing, developing, and maintaining data transformation pipelines using DBT, enabling data-driven decision-making for the organization.

Experience in developing Spark applications using Spark SQL/PySpark in Databricks for ETL across multiple file formats (Parquet, CSV, Avro), analyzing and transforming data to derive customer usage patterns.
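For illustration, a minimal PySpark sketch of this kind of multi-format ETL on Databricks; the paths, column names, and aggregation are hypothetical, and unionByName with allowMissingColumns requires Spark 3.1+:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi-format-etl").getOrCreate()

# Read the same raw event data landed in three formats (hypothetical paths).
parquet_df = spark.read.parquet("/mnt/raw/usage/parquet/")
csv_df = spark.read.option("header", True).option("inferSchema", True).csv("/mnt/raw/usage/csv/")
avro_df = spark.read.format("avro").load("/mnt/raw/usage/avro/")  # spark-avro ships with Databricks

# Align columns by name and combine the three sources.
events = (parquet_df
          .unionByName(csv_df, allowMissingColumns=True)
          .unionByName(avro_df, allowMissingColumns=True))

# Derive a simple usage pattern: daily event counts per customer.
usage = (events
         .dropDuplicates(["customer_id", "event_time"])
         .withColumn("event_date", F.to_date("event_time"))
         .groupBy("customer_id", "event_date")
         .agg(F.count("*").alias("event_count")))

usage.write.mode("overwrite").parquet("/mnt/curated/customer_usage/")
```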

Working experience in creating ADF pipelines using linked services, datasets, and activities to extract and load data from sources such as Azure SQL, ADLS, Blob Storage, and Azure SQL Data Warehouse/Synapse.

Successfully migrated data from on-premises SQL Server to Azure Synapse Analytics (DW) and Azure SQL DB, enabling streamlined access to data from the cloud. Experience in migrating Jenkins to Harness using Kubernetes and Terraform for CI/CD in Azure DevOps.

Well-versed in PySpark, Databricks clusters, Delta Lake, Unity Catalog, Workflows, SQL DWH endpoints, and Azure Data Factory's dynamic expression language, with a strong understanding of how to use them in data migration, processing, and analysis tasks.

Data migration specialist with hands-on experience successfully executing large-scale data migration projects from on-premises to Google Cloud Platform (GCP) using Talend.

Proficient in building and executing SSIS packages within Azure Data Factory using SSIS integration runtime, facilitating seamless data movement across on-premises and cloud environments.

Proficient in working with GitHub and Azure DevOps Repos; successfully deployed the Azure DevOps release pipeline with ADF ARM templates and Databricks notebooks.

Utilized PySpark within Azure Databricks to process and analyze data stored in one or more Azure services, providing actionable insights for stakeholders.

Experienced in Azure Purview, data governance, risk management, and data flows, providing a comprehensive approach to data management.

Implemented Data Lineage in Azure Purview from ADF Pipelines.

Created data sources such as Oracle, Azure Data Lake Storage Gen2, and Snowflake in Azure Purview.

Implemented various Glossary creations in Azure Purview.

Experience in managing Azure resource groups, subscriptions, Azure Blob and File Storage, and Azure Active Directory users, groups, and service principals.

Hands-on experience with Apache Airflow, an open-source platform for orchestrating and scheduling data workflows. Developed and maintained Airflow DAGs (Directed Acyclic Graphs) for automating data pipelines, including task dependencies, scheduling, and monitoring.
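As a minimal sketch of such a DAG (Airflow 2.x style; the DAG name, schedule, and task bodies are hypothetical placeholders):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from source")      # placeholder task body

def transform():
    print("clean and reshape data")     # placeholder task body

def load():
    print("write data to warehouse")    # placeholder task body

with DAG(
    dag_id="daily_etl",                 # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",         # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Task dependencies: extract -> transform -> load
    t_extract >> t_transform >> t_load
```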

Experience in configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS, with expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
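A minimal Structured Streaming sketch of this Kafka-to-HDFS pattern (the broker address, topic, and paths are hypothetical, and the job needs the spark-sql-kafka package on the classpath):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic (hypothetical broker and topic names).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast them to strings for downstream use.
events = raw.select(col("key").cast("string"), col("value").cast("string"), "timestamp")

# Persist the stream to HDFS as Parquet; the checkpoint tracks offsets for fault tolerance.
query = (events.writeStream.format("parquet")
         .option("path", "hdfs:///data/events/")
         .option("checkpointLocation", "hdfs:///checkpoints/events/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```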

Experienced in optimizing virtual warehouses and SQL queries for cost in Snowflake.

Well-versed with Hadoop distribution and ecosystem components like HDFS, YARN, MapReduce, Spark, Sqoop, Hive, and Kafka.

Experienced in utilizing AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).

Proficient in working with Amazon EC2 to provide a solution for computing, query handling, and storage across a wide range of applications.

Experienced in utilizing AWS S3 to support data transfer over SSL and ensure automatic encryption of data whenever it is uploaded.

Proficient in Docker containerization technology, leveraging it to package applications and dependencies into lightweight, portable containers.

Experienced in deploying Docker containers to streamline application development, deployment, and scalability processes.

Skilled in Kubernetes orchestration platform, enabling the deployment, scaling, and management of containerized applications across clusters.

Knowledgeable in leveraging Kubernetes to achieve high availability, fault tolerance, and automated scaling for distributed systems.

Expertise in configuring and deploying Hadoop clusters on AWS EC2 instances.

Proficient in using AWS Glue to extract, transform, and load data from various sources to target destinations.
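A skeleton Glue ETL job in that spirit (the catalog database, table, field names, and S3 path are hypothetical):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read a table registered in the Glue Data Catalog (hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# Transform: keep only rows that carry an order id.
cleaned = source.filter(lambda row: row["order_id"] is not None)

# Load: write the result to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```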

Experienced in dimensional modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).

Experienced in working with Google Cloud Platform, Cloud Functions, and OKTA integration. Developed and maintained data pipelines using Airflow, reducing data processing time.

Strong RDBMS concepts: created and maintained views, stored procedures, user-defined functions, and system functions using SQL Server and T-SQL, and worked on the design of star and snowflake schemas.

Technical Summary

Programming: Python, PySpark, Scala, Java, Shell Script, Perl, TCL, PowerShell, SQL, JavaScript, HTML, CSS

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Informatica, Teradata, Cosmos DB, Neo4j

Cloud Technologies: Microsoft Azure, Azure Data Factory v2, Databricks, Azure Purview, Azure Analysis Services, Azure Data Lake Store, Blob Storage, Azure SQL DB, Power BI, AWS (Lambda, EC2, Amazon S3, Redshift, Glue, DynamoDB, CloudWatch), Google Cloud Platform

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, ZooKeeper, YARN, Apache Spark, Flume

Frameworks: Flask, Django

Tools: Jupyter, Databricks Notebook, PyCharm, Eclipse, DBT, Visual Studio, OKTA, SQL*Plus, SQL Developer, SQL Navigator, Query Analyzer, SQL Server Management Studio, Talend, SSIS, Postman

Versioning Tools: SVN, Git, GitHub (Version Control)

Network Security: Kerberos

Database Modelling: Dimensional Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling

Monitoring Tools: Apache Airflow, Agile

Visualization: Tableau, ggplot2, Matplotlib, Power BI

Web Technologies: HTML, CSS, Bootstrap, JavaScript

Machine Learning: Pandas, NumPy, SparkML

Experience

Data Scientist Sep 2021-June 2022

Infowiz Software Solution, India

Key Responsibilities

Conduct regression analyses of the relationship between company stock prices and industry trends, achieving 15% more accurate performance predictions than in previous years

Utilize web scraping techniques to extract and organize competitor data

Update company data warehousing techniques such as data recall and segmentation, resulting in a 20% increase in usability for non-technical staff members

Modernize data streamlining processes, resulting in a 47% redundancy reduction

Identify valuable data sources and automate collection processes

Undertake preprocessing of structured and unstructured data

Analyze large amounts of information to discover trends and patterns

Build predictive models and machine-learning algorithms

Combine models through ensemble modeling

Present information using data visualization techniques

Propose solutions and strategies to business challenges

Collaborate with engineering and product development teams

Create analytics solutions for businesses by combining various tools, applied statistics, and machine learning

Lead discussions and assess the feasibility of AI/ML solutions for business processes and outcomes

Stay current with the newest tools, trends, and technologies to boost overall effectiveness and performance

Knowledge of various programming languages, such as Python, Perl, C/C++, SQL, and Java, which helps data scientists organize unstructured data sets.

Understanding analytical tools such as SAS, Hadoop, Spark, Hive, Pig, and R is one of the most helpful data scientist skills for extracting valuable information from an organized data set.

Adept at working with unstructured data.

Web scraping: the automated process of extracting data from webpages.

Problem-solving skills: the capacity to evaluate challenging issues and develop workable solutions.

Statistics and probability: the study of randomness and uncertainty, and the application of mathematical tools to decision-making.

Multivariate calculus and linear algebra: advanced mathematical concepts used in machine learning and data analysis.

Database management: the process of organizing, storing, and accessing data in a database system.

Microsoft Excel: a spreadsheet program used for data display and analysis.

ETL (extraction, transformation, and loading): collecting, cleansing, and preparing data for analysis.

Data structures and algorithms: fundamental computer science concepts that underpin efficient data storage, retrieval, and computation.

Data Scientist January 2021 – June 2021

CLOUDYML, India

Key Responsibilities

Worked with management to develop and apply digital marketing plans with a focus on driving acquisition and conversion

Devised and implemented robust digital acquisition plans, ensuring precision in financial reporting, budgets, and forecasts

Increased conversions by 15% from paid sources (PPC, Grant, Display, and VOD)

Enhanced conversion rates by 12% via A/B testing landing pages for a better performing conversion funnel

Analyzed statistical data on large data sets and identified trends using advanced mathematical and computer science skills.

Led research projects to extract valuable information from big data; skilled in technology, mathematics, business, and communications.

Solved real-world problems by analyzing large amounts of business data, defining new metrics and business cases, designing simulations and experiments, creating models, and collaborating with colleagues.

Data Engineer Feb 2021 - May 2021

PwC, Bangalore, India

Key Responsibilities

Built a GCP cost optimization dashboard and automated GCP instance shutdown/startup based on Stackdriver metrics.

Created a dashboard using a Python Flask RESTful API.
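A minimal sketch of what such a Flask RESTful endpoint can look like; the route, data, and field names are hypothetical stand-ins for the real cost metrics:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory records standing in for real GCP cost metrics.
COSTS = [
    {"instance": "vm-1", "monthly_usd": 142.50},
    {"instance": "vm-2", "monthly_usd": 87.10},
]

@app.route("/api/costs", methods=["GET"])
def get_costs():
    # Serve the cost records as JSON for the dashboard front end.
    return jsonify(COSTS)

if __name__ == "__main__":
    app.run(debug=True)
```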

Integrated the Flask application with OKTA.

Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Python API.

Used Spring AOP to implement distributed declarative transactions throughout the application.

Designed and developed Java batch programs in Spring Batch.

Proficient in writing Stored Procedures, User Defined Functions, and Views as per project requirements.

Experience in database testing to ensure data accuracy and consistency.

Developed datasets using stored procedures and created reports with multi-value parameters.

Deployed SSIS packages and reports in Dev and QA environments and created deployment documentation.

Created SSRS reports and dashboards based on customer specifications, ensuring satisfaction with data visualization.

Generated technical documentation, gathered business requirements, and analyzed data for SSIS and SSRS development.

Deployed SQL databases in the cloud (Azure) and designed SSIS packages to import data from multiple sources, controlling the upstream and downstream flow of data into the SQL Azure database.

Environment: Python, Flask, JavaScript, MySQL

ACADEMIC PROJECTS

Movie Recommendation Using Content-Based, Collaborative, and Hybrid Algorithms

This project predicts users' interests and tastes in movies and recommends movies to them accordingly.

The project also incorporates ratings and reviews from IMDB, which helps in recommending movies to users.

Technologies: Data Science / ML / Image Processing / Deep Learning / Docker, Power BI / Tableau / AWS / Azure / GCP / Data Modeling, PHP / PySpark / Java / Python, MySQL / SQLite3 / MongoDB / ETL, Git, Data Mining

Brain ECG Signal Detection Using Machine Learning

Electrocardiography (ECG) has become a trusted approach for monitoring heart health.

In this analysis, researchers compare cardiac output and brain activity during times of physical and mental stress. Accuracy values of 97% or higher on the validation set show a high level of efficiency.

This paper proposes a method for diagnosing drowsiness using a combination of electroencephalogram (EEG) and heart rate (HR) data.

PAPER PUBLICATIONS

Geospatial Crime and Accident Analysis Using Advanced Data Analytics - ICOEI Conference, June 2022

We utilize AI and data analytics techniques to synthesize our results.

The principal purpose of the decision-making process is to extract knowledge from the data and convert it into actionable information.

The framework automatically monitors the nature of each accident and crime, and the model predicts hot zones based on the various offenses committed.

CERTIFICATIONS

Certified Data Scientist - proficient in advanced data analysis, machine learning, and data engineering.

Educational Qualifications

Master's in Computer Science, University of Central Missouri, Kansas City, 2024

Bachelor of Technology in Computer Science and Engineering, Sathyabama Institute of Science and Technology, Tamil Nadu, India, 2022

Board of Intermediate Education, Vijayawada, 2018

Board of Secondary Education, Vijayawada, 2016


