
Azure Cloud Software Development

Location:
San Francisco, CA, 94102
Posted:
February 05, 2024

Name: Priya Reddy Email: ad3d7t@r.postjobfree.com Phone: 614-***-****

LinkedIn: https://www.linkedin.com/in/sai-mallu-abc

PROFESSIONAL SUMMARY:

●5+ years of professional experience in the IT industry, specializing in data warehousing, decision support systems, and full-lifecycle data engineering projects, with extensive Hadoop/Big Data experience in storing, querying, processing, and analyzing data.

●Software development experience on cloud computing platforms such as Amazon Web Services (AWS) and Microsoft Azure.

●Hands-on experience with AWS services such as S3 for storage, EMR for running Spark jobs and Hive queries, Glue for ETL pipelines, and Athena for creating external tables.

●Skilled in data cleansing and preprocessing using Python and in creating data workflows with SQL queries.

●Designed and implemented solutions on Azure by creating pipelines with Azure Data Factory (ADF), linked services, datasets, Azure Blob Storage, Azure Synapse, and Azure Databricks.

●Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.

●Led successful migration initiatives from Google Cloud Platform (GCP) to Amazon Web Services (AWS), overseeing end-to-end processes from initial assessment to implementation.

●Strong experience migrating databases from other platforms to Snowflake; in-depth knowledge of Snowflake database, schema, and table structures.

●Strong knowledge of various data warehousing methodologies and data modeling concepts.

●Experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Zookeeper and Flume.

●Experience in analyzing data using Spark, Python, and SQL.

●Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems like Teradata, Oracle, SQL Server and vice-versa.

●Developed Apache Spark jobs using Python in a test environment for faster data processing and used Spark SQL for querying.

●Managed application code using Azure Repos (Git) with required security standards for .NET and Java applications.

●Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.

●Experienced with performing real-time analytics on NoSQL databases like HBase and Cassandra.

●Worked on AWS EC2, EMR, and S3 to create clusters and manage data using S3.

●Experienced with Dimensional modelling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.

●Strong understanding of the AWS product and service suite, primarily EC2, S3, Lambda, Redshift, EMR (Hadoop), and related monitoring services, including their applicable use cases, best practices, implementation, and support considerations.

●Experience in developing enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks (Spark Streaming, Apache Kafka, and Apache Flink).

●Extensive experience in designing and implementation of continuous integration, continuous delivery, continuous deployment through Jenkins.

●Installed and configured Apache Airflow for workflow management and created workflows in Python (a minimal DAG sketch follows this list).

●Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
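
A minimal, illustrative Airflow sketch of the kind of workflow referenced above; the DAG id, schedule, task names, and spark-submit path are hypothetical placeholders rather than details from an actual project.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_and_clean():
    # Placeholder Python task that would pull and cleanse source data.
    print("extracting and cleansing source data")


# The dag_id, schedule, and spark-submit command are illustrative only.
with DAG(
    dag_id="daily_ingest_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_and_clean",
        python_callable=extract_and_clean,
    )
    transform = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/transform.py",
    )
    extract >> transform

In practice the Bash task would submit a PySpark job of the kind described elsewhere in this résumé, and the DAG would be scheduled and monitored from the Airflow UI.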

TECHNICAL SKILLS:

Big Data Technologies

Hadoop (YARN), HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Zookeeper, Oozie.

Programming Languages

SQL, HQL, MS SQL, Python, PySpark, Java

Distributed computing

Amazon EMR (Elastic MapReduce), Hortonworks (Ambari), Cloudera (Hue), PuTTY

Relational Databases

Oracle 11g/10g/9i, MySQL, PostgreSQL, SQL Server 2005/2008

NoSQL Databases

HBase, MongoDB, Cassandra

Cloud Environments

AWS (EC2, EMR, S3, Kinesis, DynamoDB), Amazon Redshift, Azure Data Lake Storage, Azure Data Factory, Azure Databricks

Data File Types

JSON, CSV, PARQUET, AVRO, TEXTFILE

PROFESSIONAL EXPERIENCE:

Client: Mitchell International, CA June 2022 – Present

Role: Data Engineer

Responsibilities:

●Involved in loading and transforming structured, semi-structured, and unstructured datasets and analyzed them by running Hive queries and Spark SQL.

●Involved in migrating SQL databases to Azure Data Lake, Data Lake Analytics, Databricks, and Azure SQL Data Warehouse.

●Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around Snowflake Data Warehouse.

●Hands-on experience with Azure cloud services (PaaS and IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.

●Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.

●Demonstrated ability to design, build, and maintain RESTful interfaces, ensuring secure and reliable data interactions.

●Comprehensive experience spanning data mart, warehousing, and OLTP environments.

●Migrated data from an on-premises SQL database to Azure Synapse Analytics using Azure Data Factory and designed an optimized database architecture.

●Proficient in using Azure Data Factory to perform incremental loads from Azure SQL DB to Azure Synapse.

●Involved in the development of real time streaming applications using PySpark, Apache Flink, Hadoop Cluster.

●Ingested data into one or more Azure services, processed it in Azure Databricks, and wrote the output as text and Parquet files.

●Created RDDs and DataFrames for the required input data and performed data transformations using PySpark.

● Developed custom Python scripts to perform data transformation, cleaning, manipulation, and automation based on specific project requirements.

● Collaborated with cross-functional teams to integrate Python scripts into ETL pipelines for enhanced functionality.

● Developed and executed complex SQL queries and stored procedures for data extraction, transformation, and loading (ETL) processes.

● Experience in seamlessly integrating REST APIs within data engineering workflows, facilitating efficient communication and data exchange between diverse applications and systems.

● Proficient in leveraging Databricks for advanced analytics and data engineering tasks, including the efficient processing and transformation of large datasets.

● Experience with Delta Live Tables, showcasing expertise in managing and optimizing data lakes, ensuring real-time data updates and maintaining transactional consistency.

● Advanced proficiency in Python programming for data engineering tasks, including data manipulation, ETL processes, and automation of data workflows.

● Extensive experience working with MongoDB, demonstrating skills in NoSQL database design, data modeling, and querying for scalable and flexible data storage solutions.

● Proficient in designing, implementing, and optimizing data solutions within the Snowflake cloud-based data warehousing platform.

● Demonstrated ability to leverage Snowflake's features for efficient data modeling, querying, and management of complex data structures.

● Extensive experience in constructing end-to-end ETL/ELT pipelines using industry-leading tools such as Informatica, Oracle Data Integrator (ODI), and Azure Data Factory.

● Proven track record of orchestrating seamless data workflows, ensuring the efficient extraction, transformation, and loading of diverse datasets.

● Hands-on involvement in the complete lifecycle of ETL/ELT processes, including designing, building, testing, and migrating systems, showcasing adaptability to varied data engineering challenges.

● Adept at utilizing various data movement tools to create and maintain scalable and high-performance data interfaces.

● Demonstrated capability in optimizing and fine-tuning data movement processes to meet performance and scalability requirements.

●Involved in supporting a cloud-based data warehouse environment such as Snowflake.

●Involved in requirement analysis, design, coding, and implementation. Used linked services to connect to SQL Server and Teradata and load the data into ADLS and Blob storage.

●Optimized SQL queries for better performance, utilizing indexing and query execution plans. Proficient in developing complex stored procedures, triggers, and functions using T-SQL.

●Designed and implemented ETL workflows using Informatica PowerCenter to extract, transform, and load data from heterogeneous sources.

● Familiarity with Redis, showcasing the ability to implement in-memory data structures and utilize caching mechanisms to enhance data retrieval performance.

●In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames.

●Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of how to integrate them with other Azure services.

●Developed Spark applications using Python (PySpark) to transform data according to business rules (a minimal sketch follows this section).

●Involved in creating Hive scripts for performing ad hoc data analysis required by the business teams.

●Used GitHub for branching, tagging, and merging, Confluence for documentation.

Environment: Azure Data Factory (ADF), Azure DataBricks, Alteryx, Azure Data Lake Storage (ADLS), Snowflake, Blob storage, Java, Delta Lake, Python, SSIS, Flink 1.14, PySpark.
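
As a rough illustration of the Databricks/ADLS work described above, the sketch below reads raw CSV data from ADLS, applies simple business-rule style transformations, and writes partitioned Parquet back to the lake; the storage account, container, and column names are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

# Storage account, container, and column names below are illustrative placeholders.
spark = SparkSession.builder.appName("curate_claims").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .csv("abfss://raw@examplestorage.dfs.core.windows.net/claims/")
)

# Example business-rule transformations: type casting, filtering, derived columns.
curated = (
    raw.withColumn("claim_amount", F.col("claim_amount").cast("double"))
       .filter(F.col("claim_amount") > 0)
       .withColumn("load_date", F.current_date())
)

# Write the curated output back to the lake as partitioned Parquet files.
(
    curated.write
    .mode("overwrite")
    .partitionBy("load_date")
    .parquet("abfss://curated@examplestorage.dfs.core.windows.net/claims/")
)

On Databricks the same logic would typically run in a notebook or job cluster, with the output registered as a table for downstream querying.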

Client: Change Healthcare, Chicago, Illinois Mar 2021 – May 2022

Role: Data Engineer

Responsibilities:

●Built data warehouse structures and created fact, dimension, and aggregate tables using dimensional modeling with Star and Snowflake schemas.

●Exported and analyzed data in Snowflake for visualization and to generate reports for the BI team. Used Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism.

●Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables. Implemented partitioning and bucketing in Hive.

●Implemented automated data pipelines using AWS Glue, reducing manual intervention and improving overall data processing efficiency.

●Responsible for building scalable distributed data solutions using a Snowflake cluster environment with Amazon EMR. Created external tables in Hive and Athena for data stored in S3 buckets.

●Experienced in performance tuning of Spark applications by setting the correct level of parallelism and tuning memory.

●Expert in designing ETL data flows by creating mappings/workflows to extract data from SQL Server and performing data migration and transformation from Oracle, Access, and Excel sheets using SQL Server SSIS.

●Utilized AWS Glue for automated schema evolution, ensuring flexibility in handling changes to data structures.

●Orchestrated complex workflows and data pipelines using AWS Step Functions. Integrated Step Functions with other AWS services to create scalable and fault-tolerant workflows.

●Developed complex Spark applications to denormalize datasets and create a unified data analytics layer for downstream teams.

●Used Kafka capabilities such as distribution, partitioning, and the replicated commit log to maintain messaging feeds.

●Implemented continuous integration and deployment using CI/CD tools such as Jenkins, Git, and Maven.

●Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers (a minimal producer sketch follows this section).

●Used automation tools such as Airflow for scheduling jobs, running Python applications with the Bash, S3, and Python operators.

Environment: AWS S3, EMR, Athena, Druid, IAM, Teradata, PySpark, Oracle, Airflow, Flink, Snowflake, Hadoop
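
A minimal sketch of the REST-to-Kafka loading pattern mentioned above, using the kafka-python client; the endpoint URL, topic name, and broker address are hypothetical placeholders.

import json

import requests
from kafka import KafkaProducer  # kafka-python client

# Endpoint URL, topic name, and broker address are illustrative placeholders.
ENDPOINT = "https://api.example.com/v1/events"
TOPIC = "raw-events"

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Pull a batch of records from the REST endpoint and publish each one to the topic.
response = requests.get(ENDPOINT, timeout=30)
response.raise_for_status()

for record in response.json():
    producer.send(TOPIC, value=record)

producer.flush()

A production version would add retries, batching, and delivery callbacks, but the producer/serializer pattern is the same.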

Client: CapGemini, Hyderabad, India June 2018 – Dec 2020

Role: Big Data Developer

Responsibilities:

●Involved in Requirement gathering, writing design specification documents and identified the appropriate design pattern to develop the application.

●Worked on Oracle PL/SQL, a procedural programming language embedded in the database, along with SQL itself.

●Created Talend jobs to load data into various Oracle tables; utilized existing Oracle stored procedures and wrote a few new ones.

●Worked on ER/Studio for Conceptual, logical, physical data modeling and for generation of DDL scripts.

●Analyzed current processes and technologies, contributing to the delivery and integration of new solutions.

●Hands-on experience optimizing SQL statements and PL/SQL blocks by analyzing execution plans; created and modified triggers, SQL queries, and stored procedures for better performance.

●Designed use case diagrams, class diagrams, sequence diagrams, and object diagrams.

●Involved in designing user screens using HTML as per user requirements.

●Used Spring-Hibernate integration in the back end to fetch data from Oracle and MYSQL databases.

●Installed WebLogic Server for handling HTTP requests and responses.

●Used Subversion for version control and created automated build scripts.

Environment: CSS, HTML, JavaScript, Oracle, Eclipse IDE, Java, MYSQL, Stored Procedures, DevOps, Triggers, PL/SQL.

Education:

Completed Master's in Data Science from the University of New Haven, Jan 2021 – Aug 202

GPA: 3.4

Completed Bachelor's in ECE from JNTUH, June 2015 – July 2019

GPA: 7.5


