PRANAY KUMAR REDDY
******.***@*****.***
Professional Summary:
With 5+ years in the software industry and a specialization as an Azure Data Engineer, I have extensive experience in real-time data processing using Kafka and Flume and in building data pipelines with Azure Data Factory and Databricks. I have managed and migrated SQL databases and on-premises systems to Azure services including Data Lake, SQL Database, and Synapse Analytics. My skills encompass Spark application development using Spark-SQL and PySpark, data visualization with Power BI, and handling JSON and Parquet formats. I have strong knowledge across the full tech stack, including Java, JavaScript, and a range of databases (SQL Server, NoSQL, HBase, Cassandra). I am proficient in ETL processes, cloud solutions on Azure, and BI tools, with a solid grasp of Agile methodologies, CI/CD in Azure DevOps, and MDM solutions.
Education:
Master's in Data Science - Pace University
Technical Skills:
Microsoft Azure
Azure Databricks, Azure Data Factory, Synapse Analytics, HDInsight, ADLS, Azure Storage, Data Explorer, Azure Functions, Event Hub, IoT Hub, Logic Apps, Stream Analytics, Azure Web App, Azure Analysis Services, Application Insights, Azure Active Directory, Key Vault, Angular, Hadoop, React
Hadoop/Big Data Technologies
Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP, HDFS, MapReduce, Sqoop, Hive, Oozie, Spark, Zookeeper, Kafka, Airflow, Flume
Scripting Languages
Python, SQL, Scala, Spark Scala, Snowflake, Pig, HiveQL, C
NOSQL Database
Azure Cosmos DB, MongoDB, DynamoDB, HBase, Cassandra, (SSIS/SSRS/SSAS)
Relational Databases
Azure SQL Database, SQL Server, Oracle, MySQL, PL/SQL, RDBMS, API
Monitoring/Reporting
Power BI, Tableau, Azure Monitor, Log Analytics, Custom shell scripts
Development/Build Tools
PyCharm, VS Code, IntelliJ, Log4j
Operating Systems
Linux, Unix, Windows 10, Windows 8, Windows 7
Version Control
GIT, Bitbucket, SVN
Methodologies
Agile, Waterfall
Data Modeling
Erwin, TOD, Fact and Dimensional Data Modeling, UML, Logical data modeling and Physical data modeling
Professional Experience:
Client: CVS Health, Dallas, TX Dec 2023 - Present
Role: Azure Data Engineer
Leveraged Azure Data Factory, Data Lake, HDInsight, Synapse Analytics, Cosmos DB, and Databricks for comprehensive data analysis and management within a Big Data framework.
Worked with the analysis and management teams and supported them based on their requirements.
Migrated SQL databases to Azure platforms using DDL statements and configured Linked Servers for data transfer between SQL servers.
Developed Spark applications in Databricks for data extraction, transformation, and aggregation, enhancing ETL processes.
Utilized Python, PySpark, and Linux shell scripting for data loading, transformation, and integration, ensuring seamless workflows.
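Below is a minimal, illustrative PySpark sketch of the kind of Databricks ETL step described above; the storage account, container paths, and column names (claim_id, claim_date, claim_amount, plan_code) are assumptions for illustration, not actual project artifacts.

```python
# Illustrative PySpark ETL sketch (paths and columns are placeholders, not production code).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

# Extract: read raw CSV files landed in ADLS (path is a placeholder).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/claims/"))

# Transform: basic cleansing and a daily aggregate.
daily = (raw.dropDuplicates(["claim_id"])
         .withColumn("claim_date", F.to_date("claim_date"))
         .groupBy("claim_date", "plan_code")
         .agg(F.sum("claim_amount").alias("total_amount"),
              F.count("claim_id").alias("claim_count")))

# Load: write the curated output back to the lake in Parquet, partitioned by date.
(daily.write
 .mode("overwrite")
 .partitionBy("claim_date")
 .parquet("abfss://curated@examplelake.dfs.core.windows.net/claims_daily/"))
```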
Automated cluster creation in Azure HDInsight with PowerShell scripts, streamlining deployment processes.
Designed, developed, and maintained scalable data pipelines using AWS services such as EMR, Glue, and Redshift to support real-time analytics.
Implemented data lake solutions on AWS S3, integrating data from multiple sources for big data analytics and machine learning models.
Orchestrated batch and real-time data processing using Apache Spark on AWS EMR, improving data processing speeds by 40%.
Developed serverless data processing solutions using AWS Lambda and Kinesis, enabling near real-time data streaming and analytics.
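A hedged sketch of this serverless pattern: an AWS Lambda handler consuming a Kinesis trigger. The standard Kinesis event structure is used as-is, but the message fields (e.g. the amount key) are assumed for illustration.

```python
# Hypothetical AWS Lambda handler for a Kinesis trigger (message fields are assumptions).
import base64
import json

def lambda_handler(event, context):
    processed = 0
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded inside the record.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)

        # Placeholder transformation: flag high-value events for downstream analytics.
        message["high_value"] = message.get("amount", 0) > 1000
        processed += 1

    return {"statusCode": 200, "processed": processed}
```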
Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
Designed and deployed production-grade data solutions on platforms like Snowflake Data Warehouse and Azure Data Lake, enhancing data analytics capabilities.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) using Informatica and processed the data in Azure Databricks.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
Responsible for monitoring Azure Data Factory jobs, troubleshooting failures, and providing resolutions for failed ADF jobs.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Worked with the Parquet file format and various other file types.
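For illustration, a small Spark-SQL sketch over mixed Parquet and JSON sources of the kind described in the two bullets above; the mount paths, view names, and columns are assumptions.

```python
# Sketch of Spark-SQL over mixed file formats (paths, views, and columns are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage_patterns").getOrCreate()

# Register Parquet and JSON sources as temporary views.
spark.read.parquet("/mnt/curated/usage_events/").createOrReplaceTempView("usage_events")
spark.read.json("/mnt/raw/customer_profiles/").createOrReplaceTempView("customers")

# Aggregate usage by customer segment with plain Spark SQL.
insights = spark.sql("""
    SELECT c.segment,
           COUNT(*)            AS events,
           AVG(u.session_secs) AS avg_session_secs
    FROM usage_events u
    JOIN customers c ON u.customer_id = c.customer_id
    GROUP BY c.segment
""")

# Persist the aggregated insights back to the lake as Parquet.
insights.write.mode("overwrite").parquet("/mnt/curated/usage_by_segment/")
```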
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
Worked with APIs and unstructured data.
Involved in database design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP, RDBMS cubes, Informatica, and star and snowflake schemas.
Extensively utilized SSIS packages to create complete ETL processes and load data into databases consumed by Reporting Services.
Identified dimension and fact tables and designed the data warehouse using a star schema. Developed multidimensional objects (cubes, dimensions) using SQL Server Analysis Services (SSAS).
Client: Deutsche Bank, India Jun 2018 – Aug 2022
Role: Azure Data Engineer
Extensively used Azure DevOps for code check-ins and checkouts for version control.
Utilized GCP with Python for Big Data Analytics and Machine Learning, implementing Spark ML for enhanced insights.
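A minimal Spark ML pipeline sketch of the kind of model training this could involve; the dataset path, feature columns, and label column are placeholders.

```python
# Minimal Spark ML pipeline sketch (training path, features, and label are assumptions).
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark_ml_demo").getOrCreate()
df = spark.read.parquet("/data/training/")  # placeholder training set

# Assemble raw numeric columns into a single feature vector.
assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b", "feature_c"],
    outputCol="features")

# Simple binary classifier over the assembled features.
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show(5)
```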
Designed SQL database structure with Django Framework using agile methodology, ensuring scalability and flexibility.
Built dataflow pipelines to migrate HEDIS medical data from multiple sources to target data platforms efficiently.
Developed complex SSIS packages using SQL Server and Teradata, optimizing data integration processes.
Designed a custom Spark REPL application to handle similar datasets. Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Databricks, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
Utilized AWS Athena for querying large datasets directly from S3, reducing the need for complex data warehousing solutions.
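A hedged boto3 sketch of querying S3-resident data through Athena as described above; the database, table, and results-bucket names are placeholders.

```python
# Sketch of an Athena query over S3 data via boto3 (database/table/bucket are placeholders).
import time
import boto3

athena = boto3.client("athena")

run = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS events FROM web_logs GROUP BY event_date",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = run["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```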
Implemented CI/CD pipelines for data engineering projects using AWS CodePipeline and CloudFormation, ensuring seamless deployment and version control. Configured and optimized AWS Redshift clusters, ensuring cost-effective and high-performance query execution for complex datasets.
Involved in database design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP, RDBMS cubes, Informatica, and star and snowflake schemas.
Managed Microsoft SQL Servers and implemented maintenance jobs for 10 instances, ensuring data reliability and performance.
Used Python and PySpark to extract weekly information from XML files.
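An illustrative sketch of weekly XML extraction with PySpark, assuming the third-party spark-xml connector is available; the rowTag, field names, and paths are assumptions.

```python
# Sketch of weekly XML extraction with PySpark (assumes the spark-xml package is on the cluster).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("weekly_xml_extract").getOrCreate()

# spark-xml parses each <record> element into a row; rowTag and fields are illustrative.
records = (spark.read
           .format("com.databricks.spark.xml")
           .option("rowTag", "record")
           .load("/mnt/landing/weekly/*.xml"))

# Derive the ISO week number and count records per week.
weekly = (records
          .withColumn("week", F.weekofyear(F.to_date("report_date")))
          .groupBy("week")
          .count())

weekly.write.mode("overwrite").parquet("/mnt/curated/weekly_counts/")
```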
Implemented real-time data ingestion using Apache Kafka.
Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
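The two bullets above could look roughly like the sketch below: Spark Structured Streaming ingesting a Kafka topic into HDFS, followed by a Spark SQL query over the landed data. The broker address, topic, and HDFS paths are placeholders, and the Kafka connector package (spark-sql-kafka) is assumed to be on the classpath.

```python
# Sketch: Kafka -> HDFS with Structured Streaming, then Spark SQL over the landed Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

# Ingest: read the Kafka topic as a stream and keep the message value as text.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "transactions")
          .load()
          .select(F.col("value").cast("string").alias("payload")))

# Land the stream in HDFS as Parquet; a checkpoint directory is required.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/transactions/")
         .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
         .start())

# Separately (e.g. in a batch job), query the landed data with Spark SQL.
spark.read.parquet("hdfs:///data/raw/transactions/").createOrReplaceTempView("transactions")
spark.sql("SELECT COUNT(*) AS total FROM transactions").show()
```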
Utilized Git for version control, JIRA for project management, and Jenkins for continuous integration and deployment (CI/CD) processes.
Experienced in writing shell scripts on Linux and integrating them with other solutions.
Implemented ETL processes from various sources like Kafka, NiFi, Teradata, and DB2 using Hadoop and Spark, enhancing data processing capabilities.
Environment: ETL, Kafka 1.0.1, Hadoop 3.0, HDFS, MDM, Agile, Airflow, MS Azure, Apache NiFi, Tableau, Spark 2.3, Spark Scala, PySpark, SQL, Python, Erwin 9.8, CI/CD, Hive 2.3, NoSQL, HBase 1.2, Pig 0.17, OLTP, MySQL, Sqoop 1.4, OLAP