Gunda Sai Kiran    Phone: 203-***-****
Sr. Data Engineer    Email: *****************@*****.***
LinkedIn: www.linkedin.com/in/sai-kiran-23b793355
Summary
Senior data engineer with over 10 years of progressive experience spearheading data engineering initiatives and driving innovative solutions across the healthcare, telecom, manufacturing, retail, and banking domains.
Proficient in architecting scalable BI solutions on the Azure Data Platform by leveraging Azure Data Lake, Azure Data Factory, Data Lake Analytics, and Azure SQL Data Warehouse to optimize reporting and patient analytics in healthcare.
Hands-on expertise in designing and managing logical and physical data models with Erwin Data Modeler and advanced dimensional modeling (using Star Schema, Snowflake Schema, and Kimball Methodology) to ensure data accuracy and regulatory compliance.
Experience in orchestrating seamless data migrations from legacy systems to modern Snowflake environments while establishing robust data conversion strategies.
Proficient in developing comprehensive data models that correlate key performance indicators, driving actionable insights for healthcare administrators.
Hands-on expertise in crafting complex stored procedures, views, and triggers in SQL Server to significantly reduce query execution times for critical data analyses.
Experience in engineering Python-based RESTful APIs to enhance revenue tracking and patient billing analysis in dynamic environments.
Proficient in developing PySpark scripts to encrypt sensitive data using advanced hashing algorithms, ensuring data security and compliance.
Hands-on expertise in designing and implementing robust SSIS packages and interactive SQL Server Reporting Services reports to deliver deep clinical data insights.
Experience in authoring and optimizing Bash scripts for automating data pipeline triggers, thereby reducing manual interventions.
Proficient in configuring and optimizing AWS Data Pipeline and integrating AWS Glue to streamline data transfers from S3 to AWS Redshift efficiently.
Hands-on expertise in executing comprehensive ETL processes using T-SQL for extracting, transforming, and loading diverse clinical data sets.
Experience in integrating and transforming large volumes of data using Azure Data Factory, Spark SQL API, and U-SQL within Azure Data Lake Analytics.
Proficient in facilitating the migration of data from on-premises systems (Oracle, SQL Server, DB2, MongoDB) to modern cloud data platforms with minimal disruption.
Hands-on expertise in leveraging the Spark SQL API within Apache Spark to execute complex queries on vast clinical data sets, enhancing processing efficiency.
Experience in tuning and optimizing SQL queries through refined indexing and execution plan analysis, reducing runtime in high-volume settings.
Proficient in employing Hive and Pig for pre-processing large-scale datasets, improving data readiness for subsequent analytics.
Hands-on expertise in conducting rigorous ETL testing and data validation to ensure high-quality data loads in enterprise data warehouses.
Experience in developing interactive Tableau dashboards and Power BI reports that deliver real-time insights for operational decision-making.
Proficient in architecting robust cloud infrastructures on both Google Cloud Platform and AWS to support secure data management and telecom analytics.
Hands-on expertise in automating ETL workflows using Apache Airflow and designing Directed Acyclic Graphs (DAGs) to boost processing efficiency.
Experience in engineering end-to-end data pipelines and robust ingestion workflows using Cloud Dataflow, Apache Beam, and Google Cloud Functions in manufacturing environments.
Proficient in deploying scalable solutions for retail and banking sectors by leveraging Hadoop, Kafka, Docker, and Kubernetes to enhance data processing and deployment reliability.
Technical Skills
Programming Languages & Scripting: Python, SQL, PL/SQL, T-SQL, Bash, PySpark, U-SQL, Spark SQL API
Databases & Data Warehousing: Oracle 12c, SQL Server 2019, DB2, MongoDB, AWS RDS, AWS Redshift, HBase, Hive, HDFS, Enterprise Data Warehouse (EDW), Azure SQL DW, Snowflake Enterprise Edition
Big Data Technologies & Processing Frameworks: Apache Spark, Hadoop, Sqoop, MapReduce, Apache Beam, Spark Streaming, Apache Airflow, Pig
ETL & Data Integration Tools: Informatica, SQL Loader, SSIS, ETL/ELT Processes, AWS Data Pipeline, AWS Glue, Oozie, GCP Dataplex, BigQuery, RBAC, Java, Python
Data Modeling & Design: Erwin Data Modeler, UML, dimensional modeling (fact modeling, Star Schema, Snowflake Schema, SCD), MB MDR, Kimball Methodology
Reporting, Analytics & BI Tools: Microsoft Power BI (including Power BI Desktop), Tableau, SQL Server Reporting Services (SSRS), Azure SQL Reporting Services, Azure Databricks, SAS, SharePoint
Cloud Platforms & Services:
Amazon Web Services (AWS): EC2, S3, AWS RDS, AWS Redshift, AWS Data Pipeline, AWS Glue
Google Cloud Platform (GCP): Cloud Dataflow, Cloud Functions, BigQuery, Google Pub/Sub, Cloud SQL, Google Cloud Environment, Google Cloud Shell, Google Cloud Storage (GCS)
Microsoft Azure: Azure Data Lake Gen2, Azure Data Factory, Azure Data Lake Analytics, Azure SQL DW, Azure SQL Reporting Services, Azure Databricks, HDInsight
Containerization & CI/CD Tools: Docker, Kubernetes, Git, Jenkins, GitHub, GitHub Actions
Real-Time Data Streaming: Kafka
Operating Systems: UNIX
Methodologies & Project Management: Agile Scrum, Confluence, Jira Software, JAD Sessions
Machine Learning & AI: TensorFlow, GPU computing technologies on AWS and GCP
Additional Skills & Technologies: MDM (Master Data Management), Data Lake architectures, API Integration, Web Scraping, RESTful APIs, Data Migration Strategies, Directed Acyclic Graphs (DAGs)
CompTIA Security+ certified
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
HCA Healthcare    Oct 2023 to Present
Responsibilities:
Architected and implemented scalable BI solutions on the Azure Data Platform for a large healthcare analytics project—leveraging Azure Data Lake Gen2, Azure Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, and HDInsight/Databricks—to optimize reporting and patient data analytics.
Orchestrated data conversion migration from legacy healthcare systems to a target Snowflake Enterprise Edition database, designing dimensional models using Star and Snowflake schemas under the Kimball methodology.
Designed and managed comprehensive logical and physical data models using ERWIN and MB MDR for healthcare datasets, enhancing metadata accuracy and ensuring HIPAA compliance.
Developed a comprehensive data model correlating key performance metrics in patient care and operational efficiency, enabling actionable insights for healthcare administrators.
Developed and maintained complex stored procedures, views, and triggers in SQL Server 2019, reducing query execution times by 15% for critical healthcare data analyses.
Engineered Python-based RESTful APIs to track revenue and analyze patient billing data, enhancing financial reporting precision for the healthcare client.
Developed PySpark scripts with Python to encrypt sensitive patient data using advanced hashing algorithms, ensuring compliance with stringent healthcare data security standards.
Designed and implemented SSIS packages and interactive SSRS reports in SQL Server 2019, delivering drill-down and parameterized dashboards for clinical data insights.
Authored and optimized Bash scripts to automate data stage job triggers in healthcare data pipelines, reducing manual intervention by 50%.
Configured and optimized AWS Data Pipeline and integrated AWS Glue to load patient records from S3 into AWS Redshift, improving data transfer efficiency by 25%.
Executed robust ETL processes using AWS Redshift and T-SQL to extract, transform, and load diverse clinical data sources, boosting data reliability and integrity.
Utilized AWS Data Pipeline connectors to integrate clinical data from Access, Excel, CSV, Oracle and flat files, streamlining multi-source ingestion.
Integrated and transformed healthcare data using Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics, enhancing data ingestion processes.
Facilitated migration of healthcare records from on-premises databases (Oracle, SQL Server) to Azure Data Lake Gen2 and ADLS via advanced Azure Data Factory V2 pipelines.
Leveraged the Spark SQL API in PySpark with Apache Spark to execute complex queries on clinical datasets, accelerating processing by 30%.
Tuned and optimized SQL queries by refining indexes and analyzing execution plans, reducing runtime by up to 40% in high-volume clinical environments.
Employed pre-processing techniques with Hive and Pig to prepare large-scale healthcare datasets for analytics, improving data readiness.
Conducted rigorous ETL testing—including job execution and data validation—to ensure high-quality loads in the healthcare data warehouse.
Created interactive Tableau dashboards and advanced Power BI Desktop reports for ad-hoc analysis, delivering real-time insights into patient care metrics.
Designed and maintained dynamic KPI calculators within SharePoint to monitor healthcare operational performance and clinical outcomes.
Compiled and validated cross-departmental healthcare data sets, presenting actionable insights to the Director of Operations for data-driven decision-making.
Collaborated with service developers to integrate content from existing healthcare reference models, ensuring seamless connectivity and robust data security.
Environment: AWS Data Pipeline, AWS Glue, S3, AWS Redshift, Azure Data Lake Gen2, Azure Data Factory V2, Azure Data Lake Analytics, Azure SQL DW, HDInsight/Databricks, Snowflake Enterprise Edition, Python 3.10, T-SQL, Bash, PySpark, U-SQL, Spark SQL API, Apache Spark 3.3, Hive, Pig, SQL Server 2019, Oracle, DB2, MongoDB, SSIS, ETL Processes, Tableau, Power BI Desktop, SSRS, SharePoint, ERWIN, MB MDR, Dimensional Modeling (Star Schema, Snowflake Schema), Kimball Methodology, RESTful APIs, Bash Scripting, Data Migration Strategies
Sr. Data Engineer
Lumen Technologies, Irvine, TX    June 2021 to Sep 2023
Responsibilities:
Engineered robust cloud infrastructure on the Google Cloud Environment using Google Cloud Shell and Cloud SQL to support the secure management of telecom customer data and network analytics.
Managed real-time data services and movement infrastructures with Google Cloud Functions to ensure secure and reliable telecom data transfers.
Leveraged AWS EC2 and AWS S3 to deliver scalable storage and compute solutions critical for telecom data processing.
Architected ETL transformation layers and optimized Apache Spark jobs to enhance the processing speed of telecom network usage data by 25%.
Designed and implemented efficient ETL solutions and data models using fact dimensional modeling—including Star Schema and Snowflake Schema—to underpin telecom analytics and reporting.
Executed end-to-end ETL processes with Python and Apache Spark, ensuring accurate extraction, transformation, and loading of telecom operational data.
Implemented Apache Airflow for automated scheduling and monitoring of telecom data pipelines, streamlining workflow operations.
Developed multiple Directed Acyclic Graphs (DAGs) with Apache Airflow to automate ETL workflows, achieving a 30% increase in processing efficiency.
Automated scheduled data loading for telecom analytics using Python scripts, ensuring timely and accurate data updates.
Aggregated daily telecom data using Apache Spark clusters to generate executive-level reports for telecom operations teams.
Directed a seamless data migration to Google Cloud Platform (GCP), integrating BigQuery and Google Cloud Storage (GCS).
Led architecture and implementation of an AWS-based data lake for a healthcare SaaS platform used in clinical trials.
Designed secure ingestion pipelines (AWS Glue, S3, Lambda) for EHR, lab, and wearable data while ensuring HIPAA and 21 CFR Part 11 compliance.
Implemented audit-ready logging and data retention policies aligned with FDA validation standards.
Designed a multi-tenant Redshift-based analytics platform for pharma clients to analyze R&D and trial data.
Served as technical lead and customer liaison for platform onboarding and customization, coordinating across product, QA, and compliance teams.
Provided data lineage documentation, validated transformation logic, and supported customer audits.
Collaborated directly with client QA/Validation teams to support CSV documentation, prepare system evidence for FDA audits, and maintain inspection readiness.
Participated in mock audits, producing audit logs, risk assessments, and access control policies.
Ensured SOP alignment with data pipeline procedures and change control processes.
Environment: Python 3.9, Google Cloud Environment, Google Cloud Shell, Cloud SQL, Google Cloud Functions, Google Cloud Platform (GCP), AWS EC2, AWS S3, BigQuery, Google Cloud Storage (GCS), Apache Airflow 2.3, Directed Acyclic Graphs (DAGs), Apache Spark 3.1, Star Schema, Snowflake Schema, fact dimensional modeling and slowly changing dimensions (SCD), TensorFlow 2.6, GPU computing technologies on AWS and GCP, Confluence, Jira Software, GitHub Actions
Sr. Data Engineer
Electrolux, India    Dec 2018 to Dec 2020
Responsibilities:
Architected and designed end-to-end data pipelines for a leading manufacturing firm, enabling real-time production monitoring and quality control using GCP and AWS.
Developed robust data ingestion workflows with Cloud Dataflow and Apache Beam to process real-time production, sensor, and supply chain data.
Designed comprehensive data models—including fact and dimensional modeling (Star, Snowflake, and SCD)—to underpin critical manufacturing analytics and process optimization.
Engineered complex SQL scripts and optimized PL/SQL procedures to validate data integrity across manufacturing datasets, ensuring 99.5% accuracy in production metrics.
Performed data cleansing and transformation using Python, improving the reliability of operational manufacturing data by 20%.
Built predictive models with Python and Scikit-learn to forecast equipment maintenance needs and identify production bottlenecks.
Developed custom scripts for web scraping and API integration to aggregate data from manufacturing execution systems and ERP platforms.
Leveraged Apache Spark and Spark Streaming for high-speed processing of large-scale sensor data, reducing latency by 30% in production analytics.
Engineered secure data migration solutions using AWS EC2 and AWS S3 to transfer critical production and quality control data between cloud environments.
Automated data ingestion processes via Google Cloud Functions with Python, cutting manual intervention by 40% in production monitoring.
Orchestrated ETL workflows by designing Directed Acyclic Graphs (DAGs) in Apache Airflow, streamlining manufacturing data processing by 35%.
Developed a continuous integration and delivery pipeline using Docker and GitHub to accelerate deployment of manufacturing analytics solutions.
Integrated near real-time data processing with Google Pub/Sub and BigQuery to ensure timely data availability for production line optimization.
Optimized database performance by creating materialized views and implementing advanced indexing strategies, reducing query response times by 30% in supply chain analytics.
Enhanced deployment reliability and scalability by integrating Kubernetes for container orchestration within the manufacturing data environment.
Environment: Python 3.8, SQL, PL/SQL, Apache Beam 2.25.0, Apache Spark 2.4, Spark Streaming 2.4, Apache Airflow 1.10, Hadoop, Hive, Google Cloud Platform (GCP), Cloud Dataflow, Cloud Functions, BigQuery, Google Pub/Sub, Amazon Web Services (AWS), EC2, S3, Docker 19.03, Kubernetes 1.15, GitHub, ETL/ELT Processes, Data Modeling (fact, star schema, snowflake schema, SCD), API Integration, Web Scraping
Data Engineer
Spencer's, India    Sep 2016 to Nov 2018
Responsibilities:
Architected and designed a retail sales data mart using a star schema in Erwin Data Modeler, enhancing sales analytics and reporting capabilities.
Engineered robust data pipelines with Python to efficiently extract, transform, and load retail data into a centralized Data Lake.
Implemented comprehensive ETL and ELT processes for seamless integration of diverse retail system data.
Extracted and ingested data from multiple retail databases into Hive using Sqoop import queries, ensuring high data accuracy.
Processed and transformed large volumes of structured and unstructured retail data using Hadoop and Big Data frameworks.
Ingested real-time retail transaction data into HDFS via Kafka, enabling timely analytics and decision making.
Developed and optimized MapReduce jobs in Python for effective retail data cleanup and transformation.
Built complex data processing jobs using Hive and Pig to support scalable retail analytics operations.
Bulk-loaded cleaned retail data into HBase after processing in HDFS, improving data retrieval performance.
Conducted ad-hoc analysis on the Azure Databricks platform to generate actionable insights for retail performance improvement.
Created interactive reports using Azure SQL Reporting Services 2017 and dashboards in Tableau to drive retail business decisions.
Translated business requirements into SAS code to develop predictive models supporting retail analytics.
Developed and optimized SQL scripts and Oracle PL/SQL routines for backend database operations in the retail project.
Managed and scheduled Hadoop jobs using the Oozie workflow scheduler to streamline retail data workflows.
Debugged production issues in SSIS packages and participated in JAD sessions to refine ETL processes within the retail environment.
Integrated MDM hubs with retail data warehouses to ensure data consistency across systems.
Leveraged Apache Spark for accelerated in-memory processing of retail data transformations.
Utilized Docker for containerizing and deploying retail data engineering workloads in isolated environments.
Environment: Python 3.6, Hadoop 3.0, Apache Spark 2.3, Sqoop 1.4.7, MapReduce, ETL/ELT Processes, Hive 2.3, HBase 1.2, HDFS, Oracle 12c, PL/SQL, SQL, Erwin Data Modeler 9.8, Star Schema, Snowflake Schema, Azure SQL Reporting Services 2017, Azure Databricks, Tableau 10, SAS 9.4, Oozie 4.3, SSIS, JAD Sessions, Kafka 1.1, Docker 17.09, Master Data Management (MDM), Data Lake
Data Analyst/Engineer
HSBC, India    June 2014 to Aug 2016
Responsibilities:
Spearheaded the architecture and development of a unified Enterprise Data Warehouse (EDW) solution for a major banking client, leveraging Jenkins for deployment automation.
Designed and implemented dimensional data models using Erwin to support banking analytics and regulatory reporting.
Developed subject area data models aligned with strategic data architecture for banking operations.
Translated complex business requirements into standardized XML vocabularies by designing XML Schemas with UML to ensure seamless data exchange across banking systems.
Executed advanced data analysis and profiling using complex SQL queries on Oracle 12c databases, driving insights for risk management.
Authored robust SQL scripts for critical database modifications, optimizing banking data workflows under tight deadlines.
Automated static table repopulation in the data warehouse using advanced PL/SQL procedures and SQL Loader, improving data refresh rates for timely reporting.
Formulated and optimized complex Hive queries to support advanced analytics and visualization of key banking performance metrics.
Engineered and implemented HBase table structures to efficiently ingest large volumes of structured, semi-structured, and unstructured financial data from UNIX and NoSQL systems.
Developed scalable data ingestion pipelines by loading data into Hive tables from HDFS, enabling real-time analytical access to banking transactions.
Created and refined Informatica mappings, mapplets, sessions, and workflows—integrating Apache Spark for enhanced real-time data processing.
Conducted a comprehensive ATS diagnosis and optimized data models on AWS Redshift and AWS RDS, utilizing Python for automation tasks.
Leveraged Agile Scrum methodologies and integrated Git version control to facilitate rapid development cycles in compliance with banking standards.
Managed the end-to-end SSIS lifecycle, including package creation, deployment, and execution across diverse environments.
Collaborated with the MDM systems team to enforce data governance policies and generate technical reports, incorporating Microsoft Power BI for enhanced visualization.
Developed interactive dashboards and reports using Tableau, empowering banking executives with real-time insights for decision-making.
Designed and deployed interactive reports using SQL Server Reporting Services (SSRS), incorporating advanced data slicing with Pivot tables to monitor key banking metrics.
Defined and implemented comprehensive data governance, transformation, and cleansing rules to ensure high data integrity and compliance with banking regulations.
Performed extensive data profiling on multiple financial data sources to address critical business inquiries and support strategic decision-making.
Environment: SQL, PL/SQL, Python 3.5, Oracle 12c, AWS Redshift, AWS RDS, HBase, Hive, HDFS, Enterprise Data Warehouse (EDW), Erwin 9.7, UML, Informatica, SQL Loader, SSIS, Apache Spark 1.6, Git 2.0, Jenkins 2.0, Microsoft Power BI, Tableau 9.0, SQL Server Reporting Services (SSRS), UNIX, Agile Scrum, MDM (Master Data Management).