Banda Saiprasad Reddy
*********************@*****.***
470-***-**** Sr. Azure/Data Engineer
PROFESSIONAL SUMMARY:
9+ years of broad IT experience in data engineering, including Hadoop, Java/J2EE development, Informatica, data modeling, and systems analysis, across the banking, finance, healthcare, insurance, and retail industries.
Worked across multiple stages of the Software Development Life Cycle (SDLC), including development, component integration, performance testing, deployment, and support/maintenance.
Knowledgeable in the Hive query language (HiveQL), with experience applying static partitioning, dynamic partitioning, bucketing, and parallel execution to improve Hive performance.
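As an illustration of those Hive tuning techniques, here is a minimal sketch; the database, table, and column names are hypothetical, the DDL is standard HiveQL issued through spark.sql with Hive support enabled, and bucketed-insert semantics can differ slightly between Hive and Spark.

```python
from pyspark.sql import SparkSession

# Illustrative only: database, table, and column names are hypothetical.
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic partitioning and parallel stage execution for this session.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("SET hive.exec.parallel=true")

# Partitioned and bucketed target table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_part (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: the order_date partition is derived from the data.
spark.sql("""
    INSERT OVERWRITE TABLE sales_db.orders_part PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales_db.orders_staging
""")
```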
Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), as well as Microsoft Azure.
Hands-on experience with VPN, PuTTY, WinSCP, Eclipse, SBT, etc.; responsible for creating Hive tables based on business requirements.
Proficient with container systems such as Docker and container orchestration platforms such as Amazon EC2 Container Service and Kubernetes; worked with Terraform.
Experience working with PySpark DataFrames to perform data validation and data analytics in the cloud, using Python 3.12/3.11 third-party libraries such as pandas and NumPy.
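A minimal sketch of that kind of DataFrame-level validation; the storage path, column names, and sampling fraction are hypothetical.

```python
import numpy as np
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("validation-sketch").getOrCreate()

# Hypothetical input: a staged extract in cloud storage.
df = spark.read.parquet("abfss://staging@account.dfs.core.windows.net/orders/")

# Basic validations: row count, null checks, and duplicate keys.
row_count = df.count()
null_amounts = df.filter(F.col("amount").isNull()).count()
dup_keys = (df.groupBy("order_id").count()
              .filter(F.col("count") > 1).count())

# Summary statistics on a sample, using pandas/NumPy on the driver.
sample_pdf = df.select("amount").sample(fraction=0.01).toPandas()
p95 = np.percentile(sample_pdf["amount"].dropna(), 95) if not sample_pdf.empty else None

print(f"rows={row_count}, null_amounts={null_amounts}, dup_keys={dup_keys}, p95_amount={p95}")
```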
Worked on a supply-chain KPI dashboard for executive management, ingesting sales, manufacturing, and finance KPI data from Teradata and SharePoint; data from these sources was ingested and staged into Azure Blob Storage, with some KPIs landing in ADLS Gen2, using Azure Data Factory (ADF).
Created an Azure SQL Database and managed its monitoring and restoration; performed a migration from Microsoft SQL Server 2022 to Azure SQL Database.
Expertise in core Java, J2EE, multithreading, JDBC, and shell scripting; proficient with Java APIs such as Collections, Servlets, and JSP for application development.
Experience in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JSON, XML, REST, SOAP web services, Groovy, MVC, Eclipse, and WebLogic, WebSphere, and Apache Tomcat servers.
Expertise in Python 3.12 scripting; worked with statistical functions in NumPy and used pandas for organizing and visualizing data.
Practical knowledge of the AWS family of services, including Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, and EMR.
Experience building data warehouse applications with Hadoop, Informatica, Oracle, Teradata 17.2/17, and MS SQL Server 2022/2019/2017/2016 on UNIX and Windows platforms, as well as expertise building complex mappings with a variety of transformations and developing Extraction, Transformation, and Loading (ETL) strategies with Informatica 9.x/8.x.
Proven ability to extract, manipulate, and load data from a variety of heterogeneous systems, including flat files, Excel, Oracle, Teradata 17.2/17, and MSSQL Server 2022/2019/2017/2016.
Worked extensively with data engineering tools and platforms such as Python 3.12, PySpark, SQL, shell scripting, Apache Airflow, and Microsoft Azure.
Hands-on experience with big data Hadoop ecosystem components such as Spark, PySpark, Hive, Sqoop, Kafka, ZooKeeper, HBase, Oozie, Airflow, and MapReduce.
Extensive understanding of Hadoop architecture, workload management, schedulers, scalability, and components such as YARN and MapReduce; strong SQL programming, including HiveQL for Hive and HBase.
Experience with the Snowflake cloud data warehouse and AWS S3 buckets, integrating data from multiple source systems, including nested JSON-formatted data, into Snowflake tables.
Experience in server infrastructure development on AWS Cloud, with extensive use of Virtual Private Cloud (VPC), IAM, EC2, RDS, S3, SQS, SNS, and Lambda, focusing on high availability, fault tolerance, and auto-scaling with CloudWatch monitoring.
TECHNICAL SKILLS:
Analytical Tools: Tableau (Desktop/Server/Reader), SSRS 16.0/15.0/14.0, SSAS 16.0/15.0/14.0, OBIEE, Business Objects, Google Analytics (Adobe Site Catalyst)
ETL Tools: Informatica, Talend, QlikView, BI Reporting and Analytics, Alteryx, SSIS 16.0/15.0/14.0
Process: Development, Support, Dynamic Parameters, Dashboard Optimization and Maintenance
Reporting Tools: Spotfire, Tableau 2019.1/2/3, 2018.1/2/3, 10.5, 9.3/9.0, 8.2/8.1/8.0, 7.0
Programming Languages: SQL, PL/SQL, T-SQL, JavaScript, HTML5, XML, C, Python 3.12/3.11, Scala, Java, Pig Latin, HiveQL, shell scripting (Linux)
Databases: Teradata 17.2/17, Oracle 12c/10g, DB2 UDB, MS SQL Server 2022/2019/2017/2016, MS Access, MySQL, MongoDB, HBase
Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
Java Technologies: Core Java, Servlets, JSP, JDBC, Java Beans, J2EE
Business Tools: Tableau, Power BI
Operating Systems: Windows, UNIX (HP-UX, Solaris, AIX), Linux
PROFESSIONAL EXPERIENCE:
Client: Lumen Technologies, Denver, CO Jul 2022 – Present
Role: Sr. Azure/Data Engineer
Responsibilities:
Created pipeline jobs, scheduled triggers, and built mapping data flows using Azure Data Factory, using Azure Key Vault to store credentials.
Set up data sharing between two Snowflake accounts and redesigned views in Snowflake to improve performance.
Hands-on experience developing Azure cloud data warehouses and data lakes using Azure technologies such as Spark (Scala or Python), ADF, Azure Databricks, Terraform, and CI/CD.
Worked with ETL operations in Azure Databricks by connecting to different relational databases using Kafka, and used Informatica for creating, executing, and monitoring sessions and workflows.
Developed scalable and fault-tolerant data pipelines using Apache Airflow, enabling workflow orchestration, scheduling, and monitoring of data processing tasks.
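A minimal Airflow DAG sketch of this kind of orchestration; the DAG id, schedule, retry settings, and task callables are hypothetical placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Hypothetical extract step, e.g. pulling an increment from a source system.
    pass


def transform(**context):
    # Hypothetical transform step, e.g. a PySpark or pandas job.
    pass


def load(**context):
    # Hypothetical load step, e.g. writing to a warehouse table.
    pass


default_args = {
    "owner": "data-eng",
    "retries": 2,                       # fault tolerance: retry failed tasks
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_ingest_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```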
Strong experience with Azure SQL Database, Azure Data Factory, Azure SQL Data Warehouse, Azure Synapse, and Azure Storage accounts.
Rapidly created models in Python 3.12 using pandas, NumPy 1.26, scikit-learn, and Plotly for data visualization.
Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
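A minimal PySpark sketch of that ingest-and-process pattern as it might run in Databricks; the storage account, container names, file layout, and columns are hypothetical, and credentials are assumed to be configured on the cluster (e.g. via Key Vault), not hard-coded.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-etl-sketch").getOrCreate()

# Hypothetical ADLS Gen2 locations.
raw_path = "abfss://raw@mystorageacct.dfs.core.windows.net/sales/2024/"
curated_path = "abfss://curated@mystorageacct.dfs.core.windows.net/sales_daily/"

# Ingest raw CSV extracts from the landing zone.
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv(raw_path))

# Simple transformation: type the date, aggregate to a daily grain.
daily = (raw
         .withColumn("order_date", F.to_date("order_date"))
         .groupBy("order_date", "region")
         .agg(F.sum("amount").alias("total_amount"),
              F.countDistinct("order_id").alias("order_count")))

# Write the curated output back to the lake, partitioned by date.
(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet(curated_path))
```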
Designing complex SQL queries to extract and analyze data from various relational databases, including Oracle, MySQL, PostgreSQL, and Teradata 17.2.
Proficient in administering Microsoft Azure IaaS/PaaS services such as Azure Virtual Machines (VMs), Virtual Network (VNet), Azure Storage, SQL Database, Azure Active Directory (AAD), monitoring, DNS, autoscaling, and load balancing.
Experience in developing and designing scalable systems using Hadoop technologies, including HDFS, MapReduce, and Hive.
Experience in big data integration and analytics based on Hadoop, PySpark, and NoSQL databases such as HBase and MongoDB.
Designing and implementing data warehouses and data marts using components of Kimball Methodology, like Data Warehouse Bus, Conformed Facts & Dimensions, Slowly Changing Dimensions, Surrogate Keys, Star Schema, Snowflake Schema, etc.
Implemented migration of multi-state-level data from SQL Server 2022 to Snowflake using Python, Spark, and SnowSQL.
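One way such a migration can be wired up is with the Spark JDBC reader and the Spark-Snowflake connector; this is a sketch only, the hosts, credentials, table names, and warehouse are hypothetical, and the Snowflake connector JAR is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssql-to-snowflake-sketch").getOrCreate()

# Read the source table from SQL Server over JDBC (hypothetical host/credentials).
src = (spark.read.format("jdbc")
       .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=claims")
       .option("dbtable", "dbo.policy_claims")
       .option("user", "etl_user")
       .option("password", "********")
       .load())

# Snowflake connection options for the Spark-Snowflake connector (hypothetical).
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGE",
    "sfWarehouse": "LOAD_WH",
}

# Write into the Snowflake target table.
(src.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "POLICY_CLAIMS")
    .mode("overwrite")
    .save())
```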
Responsible for identifying and implementing new SQL Server 2022 features designed to improve query performance; this included utilizing new T-SQL language enhancements, performing in-memory optimization, and performance tuning with Extended Events.
Hands on experience in Azure Development, worked on Azure web application, App services, Azure storage, Azure SQL Database, Virtual machines, Azure AD, Azure search, and notification hub.
Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using Hadoop.
Environment: Azure, PySpark, DynamoDB, S3, Lambda, SQL Server 2022, Airflow, ETL, Snowflake, Python 3.12, Hadoop, SNS, EC2, SQS, NumPy 1.26, MySQL, NoSQL, Pandas, Hive, EMR, Redshift, Glue, Kinesis, Athena, Data Warehouse, MapReduce.
Client: Verra Mobility, Mesa, AZ Apr 2020 – Jun 2022
Role: Azure/Data Engineer
Responsibilities:
Worked on a high impact and high visibility project that enables data availability, encompasses data analytics, machine learning, and petabyte scale datasets, and provides reliable and timely access to thousands of data sources.
Responsible for ingesting data into the data lake and providing frameworks and services for operating on that data, including the use of Spark and Databricks.
Design, develop, and test dimensional data models using Star and Snowflake schema methodologies under the Kimball method.
Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase and SQL Server 2019.
Experienced in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic.
Created DataFrames from different data sources such as existing RDDs, structured data files, JSON datasets, Hive tables, and external databases.
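A minimal sketch of creating DataFrames from those kinds of sources; the paths, table names, and JDBC connection details are hypothetical.

```python
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("dataframe-sources-sketch")
         .enableHiveSupport()
         .getOrCreate())

# From an existing RDD of Rows.
rdd = spark.sparkContext.parallelize([Row(id=1, name="a"), Row(id=2, name="b")])
df_from_rdd = spark.createDataFrame(rdd)

# From structured data files and JSON datasets (hypothetical paths).
df_from_parquet = spark.read.parquet("/data/events/")
df_from_json = spark.read.json("/data/raw/events.json")

# From a Hive table (hypothetical database/table).
df_from_hive = spark.sql("SELECT * FROM analytics.customers")

# From an external database over JDBC (hypothetical connection details).
df_from_jdbc = (spark.read.format("jdbc")
                .option("url", "jdbc:mysql://dbhost:3306/sales")
                .option("dbtable", "orders")
                .option("user", "reader")
                .option("password", "********")
                .load())
```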
Led a major new initiative focused on media analytics and forecasting to measure the sales lift associated with customer marketing campaign initiatives.
Developed Custom ETL Solution, Batch processing and Real-Time data ingestion pipeline to move data in and out of Hadoop using Python 3.11 and shell Script.
Skilled in configuring the Cluster Autoscaler for Azure Kubernetes Service (AKS) using Terraform; worked with scheduling, deploying, and managing pods and replicas in AKS.
Worked on design, architecture and support of new and existing data and ETL pipelines and recommended improvements and modifications.
Created optimal data pipeline architecture and systems using Apache Airflow.
Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data for dealer efficiency and open-table counts from IoT-enabled poker and other pit tables.
Good knowledge of RDBMS concepts (Oracle 12c/11g, MS SQL Server 2019) and strong SQL query-writing skills (using Erwin and SQL Developer tools), stored procedures, and triggers; experienced in writing complex SQL queries involving multiple tables with inner and outer joins.
Environment: Azure, DynamoDB, Apache Airflow, BigQuery, Python 3.11, Databricks, Redshift, Lambda, API Gateway, ETL, JSON, S3, Aurora, Hive, Shell Script, Snowflake, Spark, SQL Server 2019.
Client: Symphony Software Pvt. Ltd, India Jan 2017 – Jul 2019
Role: Data Engineer
Responsibilities:
Loaded data from source systems and sent it to a JMS queue for loading into target systems using XML Generator and Parser transformations.
Created various UNIX shell scripts for job automation of data loads across all phases of the SDLC, from requirements and design through development and testing.
Worked with Informatica Designer, Repository Manager, Repository Server, Workflow Manager/Server Manager, and Workflow Monitor.
Reviewed design and requirements documents with architects and business analysts to finalize the Azure design; built mapplets for reuse in mappings, saving valuable design time and effort.
Created logical objects in the Informatica Developer tool and exported them to Informatica PowerCenter 9.1 for use in PowerCenter mappings.
Interacted with clients to understand their requirements and created DataStage architectural flows to help them move to Azure.
Built common rules in the Analyst tool for analysts to use in mapping specifications and table profiling; used pre-/post-session tasks to save the last generated surrogate key (SK) numbers.
Used Informatica Workflow Manager, Workflow Monitor, and log files to detect errors; applied SQL overrides in Sorter, Filter, and Source Qualifier transformations.
Employed normal join, full outer join, detail outer join, and master outer join in the Joiner transformation; extensively worked on various reusable tasks, workflows, worklets, mapplets, and reusable transformations.
Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive.
Developed workflow sequences to control the execution order of various DataStage jobs and to email support personnel; worked on Slowly Changing Dimension Type 2.
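For reference, a minimal PySpark sketch of the Slowly Changing Dimension Type 2 pattern mentioned above (independent of the DataStage implementation; the dimension, staging table, and column names are hypothetical).

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Hypothetical current dimension and incoming staging data.
dim = spark.table("dw.customer_dim")          # customer_id, name, address, eff_date, end_date, is_current
stg = spark.table("stage.customer_updates")   # customer_id, name, address

load_date = F.current_date()
current = dim.filter("is_current = 1")

# Identify changed rows by joining current dimension records to the staging feed.
changed = (current.alias("d")
           .join(stg.alias("s"),
                 F.col("d.customer_id") == F.col("s.customer_id"))
           .filter((F.col("d.name") != F.col("s.name")) |
                   (F.col("d.address") != F.col("s.address"))))

# Expire the old versions of changed rows.
expired = (changed.select("d.*")
           .withColumn("end_date", load_date)
           .withColumn("is_current", F.lit(0)))

# Insert new versions for changed rows (brand-new keys handled similarly).
new_versions = (changed.select("s.*")
                .withColumn("eff_date", load_date)
                .withColumn("end_date", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(1)))

# Unchanged rows plus expired and new versions form the refreshed dimension.
unchanged = current.join(
    changed.select(F.col("d.customer_id").alias("customer_id")),
    "customer_id", "left_anti")
result = unchanged.unionByName(expired).unionByName(new_versions)
```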
Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations to create data flow pipelines; involved in transforming data from legacy tables to Hive, HBase tables, and S3 buckets for handoff to business and data scientists to build analytics over the data.
Responsible for writing Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering, and some pre-aggregations before storing data in HDFS; developed Vertica UDFs to pre-process the data for analysis.
Involved in unit testing and documenting the jobs and workflows; set standards for Azure DataStage naming conventions and best practices for Informatica mapping development.
Used database objects such as sequence generators and stored procedures to handle complex logic.
Environment: SQL Server 2017 Enterprise, SSRS, SSIS, Azure, Crystal Reports, Windows Enterprise Server 2000, Python, DTS, SQL Profiler, Snowflake, and Query Analyzer.
Client: DigiCollect, India Sep 2014 – Dec 2016
Role: Software Developer
Responsibilities:
Experienced in the support and management of databases, systems, and applications, as well as application development, production-environment hardware requirements, and application support.
Worked extensively on installation, upgrade, and configuration of MS SQL Server 2017 and MS SQL Server 2016 databases/servers.
Implemented point-in-time backup and recovery plans for all databases and maintained documentation; scheduled administrative tasks such as database integrity checks and maintenance plans.
Proficient in SQL Server and T-SQL (DDL and DML), constructing tables and applying normalization/denormalization techniques on DataStage tables.
Experienced in creating and using Stored Procedures, Triggers, Views, User Defined Functions, Sub-Queries and Joins for complex queries involving multiple tables and exception handlers.
Experienced in error and event handling: precedence constraints, breakpoints, checkpoints, and logging; archived data files from different legacy systems in a SQL Server 2016 environment, deployed the data, and scheduled jobs to execute these tasks periodically.
Extensive exposure to the creation of databases, objects, stored procedures, triggers, and security, as well as DTS (import and export), BCP, Transact-SQL, SQL Server Agent, and SQL Server Profiler.
Experienced in database administration tasks such as creating roles and DataStage users and granting permissions on databases, tables, functions, etc.
Experienced in planning and implementing replication of MS SQL Server 2016 databases and database backup/recovery strategies.
Implemented disaster recovery for SQL Server production DataStage systems using failover or standby servers; used log shipping as a warm-backup solution for MS SQL Server databases.
Experienced in maintaining production servers, monitoring performance, and troubleshooting and resolving production issues.
Environment: Advanced Server, SQL Server 2016, T-SQL, Enterprise Manager, Databases, DataStage, Management Studio, SQL Profiler, SQL Query Analyzer.
EDUCATION:
Master’s in Computer Science from Texas Tech University, USA 2021
Bachelor’s in Computer Science from CMR Engineering College, India 2014