Siva Ramakrishna
Lead Data Engineer – Azure, AWS, Python
Email Id: ***************@*****.*** Contact number: +1-571-***-****
PROFESSIONAL SUMMARY:
• Experienced Data and Analytics Engineering Leader with a proven track record in designing and developing robust data solutions that drive business growth and enable informed decision-making.
• 15+ years of IT experience, specializing in designing and implementing scalable data architectures with AWS, Azure, and GCP services, including S3, Redshift, Azure Data Factory, Cosmos DB, and GCP Dataflow.
• Expertise includes developing efficient ETL/ELT pipelines, managing data warehouses and lakes, ensuring compliance with data governance and security standards, and leveraging Terraform and CloudFormation for infrastructure deployment.
• Led and mentored teams of 15+ members, driving performance and delivering client-focused, data-driven insights and solutions.
• Expertise in planning and executing projects with budgets of up to $5M while meeting quality and timeline goals.
• Proficient in managing relationships with 10+ stakeholders, resolving conflicts, and driving alignment.
• Implemented data integration solutions with AWS Glue, AWS Lambda, Azure Data Factory, Azure Functions, and GCP Dataflow/Dataproc, reducing processing time by 30%.
• Enforced data governance and security best practices, ensuring compliance with internal policies and industry standards.
• Optimized cloud resources, achieving a 20% reduction in costs while enhancing performance and scalability across AWS, Azure, and GCP platforms.
• Managed data quality, leveraging data quality tools and ETL/ELT frameworks to ensure integrity and consistency across 100+ data sets.
• Proficient in SQL, Python, Java, and Scala, with experience in Hadoop, Spark, and Kafka, processing over 1 PB of data annually.
• Mentored and collaborated with 15+ cross-functional teams; troubleshot, monitored, and optimized 30+ data pipelines and workflows, improving system reliability and performance by 40%.
TECHNICAL SKILLS
Cloud Platforms: AWS (Amazon Web Services), Azure, Google Cloud Platform (GCP)
Data Storage Solutions: AWS S3, Redshift, RDS, DynamoDB, Azure Data Lake Storage, Azure Cosmos DB, Azure SQL DB, GCP Cloud Storage
ETL/ELT Tools: AWS Glue, AWS Lambda, Azure Data Factory, Azure Functions, GCP Dataproc, Dataflow
Data Integration Solutions: AWS Glue, Azure Data Factory, GCP Dataflow, GCP Functions
Data Governance & Security: IAM (Identity and Access Management) roles and policies, Data security best practices
Infrastructure as Code (IaC): AWS CloudFormation, Azure Resource Manager, Terraform
Data Quality Tools: DQ tools, ETL/ELT frameworks for maintaining data quality rules
Data Architecture & Modeling: Data flow diagrams, Data models, Data architecture guidelines
Programming Languages: Python, Java, Scala
Big Data Technologies: Hadoop, Spark, Kafka
WORK EXPERIENCE:
Client: Optum (UHC) Till Date
Role: Principal Data Engineer
Project: Data@Scale
Description:
The Claims Data@Scale project leverages AWS technologies to standardize and streamline the intake, storage, transformation, and validation of health insurance claims data across the organization. Claims data, including patient information, coverage details, and claim amounts, is ingested into Amazon S3 as the primary storage layer, with metadata management handled by AWS Glue Data Catalog.
AWS Glue facilitates scalable ETL jobs to process and transform claims data, supporting both batch and real-time data transformations (a minimal sketch of such a job appears after this project's tools list).
Real-time claims validation and data processing are managed through AWS Lambda, while row-level data checks are performed using AWS Glue and Lambda, ensuring compliance with healthcare regulations and reducing claim errors.
AWS Step Functions orchestrate workflows for claim adjudication, and the transformed data is either stored in S3 or loaded into Amazon Redshift for in-depth analytics and reporting. Security and governance are ensured through IAM roles, KMS encryption, HIPAA compliance, and AWS CloudTrail for auditing, providing full control over sensitive health insurance claims data.
Responsibilities:
Currently working as a Principal Data Engineer at Optum, handling claims data within the Commercial Data Engineering (CDE) team.
Led the design and implementation of data architecture strategies for healthcare claim processing, ensuring alignment with business requirements for intake, adjudication, and payment workflows.
Developed and documented 20+ healthcare data models, data flow diagrams, and architecture guidelines for seamless integration and efficient lifecycle management of medical reimbursements.
Ensured healthcare data architecture adhered to governance and security policies, reducing non-compliance by 25% while safeguarding sensitive patient and billing information.
Optimized cloud resources for healthcare data processing and storage, achieving a 30% cost reduction while maintaining high performance and scalability in reimbursement workflows.
Mentored 15+ team members, guiding them in the development of data pipelines and reporting tools to improve medical reimbursement and fraud detection processes.
Communicated effectively with stakeholders, aligning technical solutions with business needs and operational goals in medical billing and healthcare compliance.
Provided coaching to 10+ team members, boosting reimbursement processing efficiency by 15% and improving overall resolution times for medical claims.
Technologies and Tools: AWS (S3, Redshift, RDS, DynamoDB), Azure Data Factory, Azure Data Lake, Cosmos DB, Snowflake, Databricks, Terraform, Python, Java, SQL, Spark, Hadoop, Kafka, Jenkins, GitHub, Power BI, Azure Synapse, Event Hub, Docker, Kubernetes, IAM, and DevOps services (AWS, Azure).
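Illustrative sketch of the kind of AWS Glue (PySpark) job described above for standardizing and validating claims data. This is not taken from the actual Optum codebase; the catalog database, table name, claim fields, and S3 path are assumptions.

```python
# Hypothetical AWS Glue (PySpark) job: standardize and validate claims records
# landed in S3 and publish curated Parquet for Redshift/analytics consumption.
import sys

from awsglue.transforms import ApplyMapping, Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "curated_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw claims via the Glue Data Catalog (database/table names are assumptions).
raw_claims = glue_context.create_dynamic_frame.from_catalog(
    database="claims_raw", table_name="daily_claims"
)

# Standardize column names and types for downstream adjudication workflows.
mapped = ApplyMapping.apply(
    frame=raw_claims,
    mappings=[
        ("claim_id", "string", "claim_id", "string"),
        ("member_id", "string", "member_id", "string"),
        ("claim_amount", "string", "claim_amount", "double"),
        ("service_date", "string", "service_date", "date"),
    ],
)

# Drop records that fail a basic row-level check (missing or non-positive amounts),
# mirroring the kind of validation handled by Glue/Lambda in the project.
valid = Filter.apply(
    frame=mapped,
    f=lambda row: row["claim_amount"] is not None and row["claim_amount"] > 0,
)

# Write curated Parquet back to S3; Redshift loading could then use COPY or
# glue_context.write_dynamic_frame.from_jdbc_conf(...).
glue_context.write_dynamic_frame.from_options(
    frame=valid,
    connection_type="s3",
    connection_options={"path": args["curated_path"]},
    format="parquet",
)
job.commit()
```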
Client: PWC Limited, Tampa, FL Nov 2021 - Dec 2023
Role: Lead Data Engineer
Project: Information Technology General Controls (ITGC) Automations
Description:
The ITGC (Information Technology General Controls) Automation gives the Trust Solutions business a valuable way to forecast potential control failures within the organization's internal controls framework. By leveraging historical data and advanced analytical techniques, the objective is to proactively identify areas of risk and optimize internal controls testing procedures. This project aims to enhance risk management practices, streamline audit processes, and ensure compliance with regulatory requirements.
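As a hedged illustration only (not the actual PwC implementation; the ADLS paths, table layout, column names, and the 20% threshold are assumptions), a simple PySpark pass over historical control-test results could surface controls with elevated failure rates for targeted testing:

```python
# Hypothetical PySpark sketch: flag internal controls whose historical failure
# rate exceeds a threshold, as a naive stand-in for the forecasting step.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("itgc-control-risk").getOrCreate()

# Assumed schema: control_id, test_date, result ("PASS"/"FAIL")
history = spark.read.parquet(
    "abfss://itgc@datalake.dfs.core.windows.net/control_tests/"
)

failure_rates = (
    history
    .withColumn("failed", (F.col("result") == "FAIL").cast("int"))
    .groupBy("control_id")
    .agg(
        F.count("*").alias("tests"),
        F.avg("failed").alias("failure_rate"),
    )
)

# Controls above a 20% historical failure rate are surfaced for review.
at_risk = failure_rates.filter(F.col("failure_rate") > 0.2).orderBy(
    F.desc("failure_rate")
)
at_risk.write.mode("overwrite").parquet(
    "abfss://itgc@datalake.dfs.core.windows.net/at_risk_controls/"
)
```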
Responsibilities:
Worked as a Technical Architect for IT General Controls (ITGC) within the Assurance, Digital Assurance and Transparency (DAT) team.
Designed and implemented data architecture strategies using Azure tools, aligning with business needs and ensuring compliance with data governance and security standards for over 10 projects.
Developed and documented over 15 data models, flow diagrams, and architecture guidelines to ensure scalability, consistency, and effective communication of technical solutions across teams.
Built, optimized, and maintained more than 20 ETL/ELT pipelines using Azure Data Factory, Databricks, and SQL DB, and implemented data integration solutions via Azure Functions and Logic Apps.
Ensured data quality by creating and maintaining 30+ validation rules, utilizing Azure Data Quality services, Azure Purview for governance, and managing 100+ IAM roles and security policies with Azure Active Directory.
Developed scalable data storage solutions using Azure Data Lake Storage and Cosmos DB, optimizing cost, performance, and scalability for over 50 TB of data, while troubleshooting and optimizing 10+ data pipelines.
Managed multiple tasks, utilizing problem-solving skills to analyze and communicate technical solutions effectively to clients and management.
Mentored junior team members, providing feedback and guidance to ensure timely delivery of high-quality data solutions.
Recommended innovative tools such as Azure AI and Power BI to improve data visualization and support better decision-making for business stakeholders.
Technologies and Tools: Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Data Privacy, Azure DevOps, Power BI, Azure Cosmos DB, Azure Event Hub, Azure Synapse Analytics, Data Modeling, Data Analytics, Python, PySpark, SQL, GitHub, Confluence.
Client: HPE Limited, Bengaluru, India Dec 2017 – Nov 2021
Role: Lead Data Engineer
Project 1: CRISP – Contract Review Integrated System for PRS
Description:
The Contract Audits function focuses on reviewing supplier contracts to identify non-compliance with agreed terms and conditions. It also entails contact with stakeholders across Procurement, the Business, and Suppliers to get their concurrence on audit findings. The benefits of this exercise range from recovering dollars and fixing processes to strengthening the control environment. Early benefits have been in the range of a 60-70% reduction in time spent on data consolidation, giving a massive scalability edge to address a 10x increase in contract audits from the current base of 200 contracts per year, and it is projected to result in significant gains in recovery dollars. It also gives analysts more time to communicate and collaborate with stakeholders rather than worrying about data consolidation issues. The application could also have latent uses in the legal and compliance industry.
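A minimal sketch, assuming hypothetical ADLS paths, file layouts, and column names (not CRISP specifics), of the kind of PySpark consolidation that replaces the manual contract-data consolidation described above:

```python
# Hypothetical PySpark/Databricks sketch: consolidate supplier contract extracts
# from ADLS Gen2 into a single curated table for contract-audit analysis.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("crisp-consolidation").getOrCreate()

# Raw extracts land per supplier as CSV (path and layout are assumptions).
raw = (
    spark.read.option("header", "true")
    .csv("abfss://crisp@adlsaccount.dfs.core.windows.net/raw/contracts/*/")
)

# Normalize the fields the audit team compares against agreed terms.
consolidated = (
    raw.select(
        F.col("contract_id"),
        F.col("supplier_name"),
        F.col("agreed_rate").cast("double"),
        F.col("invoiced_rate").cast("double"),
        F.to_date("invoice_date", "yyyy-MM-dd").alias("invoice_date"),
    )
    .withColumn("rate_variance", F.col("invoiced_rate") - F.col("agreed_rate"))
)

# Potential non-compliance: invoiced above the agreed rate.
consolidated.filter(F.col("rate_variance") > 0).write.mode("overwrite").parquet(
    "abfss://crisp@adlsaccount.dfs.core.windows.net/curated/rate_exceptions/"
)
```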
Responsibilities:
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
Collaborated with data architects and engineers to establish data governance and cataloging for MDM and security (Key Vault, network security, schema-level and row-level security), resource groups, integration runtime settings, integration patterns, and aggregate functions for Databricks development.
Created pipelines in ADF using linked services, datasets, and pipeline activities to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as writing back in the reverse direction.
Migrated the Events system from Cosmos to Azure Databricks and ADLS Gen2; designed and orchestrated the migration.
Designed and developed Scala and PySpark jobs for data ingestion into ADLS.
Provided Azure technical expertise, including strategic design, architectural mentorship, assessments, and data modeling, in support of the inventory supply chain project.
Automated reports to compare data between Teradata and ADLS.
Technologies and Tools: Python, Azure Data Lake Store, ADLS Gen2, Scala, Azure Databricks, Logic Apps, WebJobs, SQL Server, Azure Data Factory, Blob Storage, REST API calls, Azure Log Analytics, CI/CD using VSO, C#, Azure Storage, Cosmos DB, PySpark, SQL, Django Framework.
Client: Mindtree Limited, London, UK Dec 2010 - Dec 2017
Role: Senior Data Engineer
Project: ABG – Car Rental/Reservations Data Analytics
Description:
This project (proof of concept) aimed to perform ETL, data aggregation, and data analysis on car bookings across ABG, Enterprise, and Hertz. ABG is one of the top car rental companies in the UK. The dataset contains millions of records about car bookings and reservations. The source data resided in an Oracle system; we migrated it to Hadoop using Sqoop and wrote several MapReduce/Pig programs for ETL and link analysis. The analysis covers booking trends across segments and highlights where ABG needs to adjust its pricing relative to competitors. We used Hive/HBase for ad hoc querying, reporting, and real-time querying.
Responsibilities:
Involved in several client meetings to understand the problem statements and other requirements.
Data Migration from Source Systems to Hadoop.
Wrote MapReduce/Pig programs for ETL and Link Analysis of data.
Applied in-depth knowledge of PySpark and experience in building Spark applications using Python (a minimal sketch appears after this list).
Created Hive/HBase table schemas and handled data loading and report generation.
Optimized pipeline implementations through Databricks workspace configuration and cluster and notebook tuning.
Performed analysis and design for migration, CI/CD, and engineering items.
Delivered reports and gathered customer feedback to enhance the analysis.
Worked with Python, databases (SQL, NoSQL), and big data technologies (Hadoop, HBase, and the Hadoop ecosystem).
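A minimal PySpark sketch of the segment-level booking-trend analysis described above; the Hive table, columns, and grouping fields are assumptions, and the original proof of concept relied on MapReduce/Pig and Hive/HBase for the heavier ETL and querying:

```python
# Hypothetical PySpark sketch: monthly booking counts and average daily rate by
# vehicle segment and brand, over booking data migrated into Hive via Sqoop.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("abg-booking-trends")
    .enableHiveSupport()
    .getOrCreate()
)

bookings = spark.table("rentals.bookings")  # assumed Hive table

trends = (
    bookings
    .withColumn("booking_month", F.date_format("pickup_date", "yyyy-MM"))
    .groupBy("booking_month", "segment", "brand")  # e.g. ABG vs. competitors
    .agg(
        F.count("*").alias("bookings"),
        F.round(F.avg("daily_rate"), 2).alias("avg_daily_rate"),
    )
    .orderBy("booking_month", "segment")
)

# Persist for ad hoc querying and reporting alongside the Hive/HBase layer.
trends.write.mode("overwrite").saveAsTable("rentals.booking_trends")
```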
Technologies and Tools: SQL, Apache Hadoop, HBase, Pig, Hive, Sqoop, Spark.
Client: Infosys Limited, Bengaluru, India Feb 2006 - Dec 2010
Role: Senior Software Engineer
Project: Play.com
Description:
Play.com is the UK's favorite online retailer of DVDs, CDs, books, gadgets, video games, mp3 downloads, and other electronic products. Play.com differentiates itself from competitors by focusing mainly on entertainment and by pioneering free delivery on the internet. Drop Ship Scheduler: the main purpose of the Drop Ship Scheduler is to facilitate drop shipping of customer orders to third-party drop ship suppliers. This is done by converting orders placed via the website into structured messages that can be transmitted to and understood by the supplier.
Responsibilities:
Worked as a Senior Software Engineer on .NET technologies.
Coded modules in ASP.NET and created stored procedures, functions, views, and triggers in SQL Server.
Performed estimation and requirements analysis for enhancement requests.
Produced low-level designs for new enhancements without impacting existing functionality; fixed existing defects and supported the testing, DBA, and onsite teams.
Led the team to project delivery, resolving or escalating technical issues in a timely manner to keep delivery on schedule.
Implemented framework components for handling cross-cutting concerns such as logging and exception handling.
Analyzed and validated user requirements; identified and mapped Active Directory groups to custom SharePoint groups.
Technologies and Tools: ASP.NET 4.0, MVC 4.0, Visual Studio 2010, SQL Server 2008 R2, TFS 2010, HTML5, JavaScript, and XML.