Sai Manoj Alluri
512-***-**** | **************@*****.*** | linkedin.com/in/saialluri | github.com/saimanoj45
Professional Summary
Data Engineer with 6 years of experience delivering robust, cloud-native data platforms across finance, healthcare, and insurance domains. Specialized in architecting end-to-end ETL pipelines and real-time data processing using Apache Airflow, Kafka, and Spark. Proficient in modern data warehousing with Snowflake and Redshift, and in building analytics solutions using Power BI, Tableau, and Looker. Skilled in Python, SQL, and Scala, with hands-on experience across AWS, Azure, and GCP cloud platforms. Experienced in Agile methodologies, sprint planning, and using tools such as JIRA to streamline development processes and enhance business outcomes.
Experience
Citibank May 2024 – Present
Azure Data Engineer Dallas, TX, USA
• Designed resilient ETL pipelines using Apache Airflow, Python, and Azure Data Factory to ingest securities and trading data from Oracle and PostgreSQL, adding logging, retries, and failover to ensure high reliability.
• Implemented real-time data processing by setting up Spark Structured Streaming with Azure Databricks, ingesting financial data from Kafka and reducing latency by 30%.
• Engineered SCD Type 2 logic in PySpark on Azure Databricks to track historical changes and migrated data from SQL to PostgreSQL and MongoDB, improving compliance, retrieval efficiency, and availability for banking applications.
• Validated data integrity across 15+ ETL pipelines in Airflow and Azure Data Factory using unit tests, improving data accuracy and reducing downstream data errors.
• Optimized data consistency and historical tracking in trading pipelines by leveraging Delta Lake on Azure Databricks to enable ACID-compliant transactions, schema evolution, and time travel features.
• Established data governance practices by integrating data lineage, audit logging, and role-based access controls across Azure Synapse and Databricks, ensuring regulatory compliance, data integrity, and secure access for trading and compliance teams.
• Deployed ML models in Azure Databricks to detect trading anomalies and built Power BI dashboards using Azure SQL and PostgreSQL to visualize compliance metrics, enabling real-time fraud detection and data-driven decision-making.
• Used Python and Azure Monitor for data quality checks and anomaly detection in ETL pipelines, triggering alerts for schema drift, null spikes, and volume anomalies, which improved data reliability and reduced quality issues.
• Automated CI/CD workflows using Git, Azure DevOps, and Bash to streamline ETL deployments and Azure environment management, improving release consistency and pipeline reliability.
• Integrated Unity Catalog with Azure Databricks to centralize data governance, enforce access controls, and simplify auditing across structured and unstructured trading datasets, improving compliance and cross-team collaboration.
Baylor Scott and White Health Jun 2023 – Apr 2024
AWS Data Engineer Dallas, TX, USA
• Developed robust ETL pipelines using AWS Glue, Lambda, and Python to ingest healthcare data from HL7 feeds and RESTful APIs, ensuring accurate mapping to Snowflake and HIPAA-compliant EMR processing.
• Leveraged Apache Flink to analyze clinical telemetry streams, reducing analytics latency and improving response time for patient monitoring insights by 40%.
• Built data transformation workflows using DBT and Spark on AWS EMR to process hospital and telemetry data, reducing ETL runtime and enabling efficient analytics in Redshift.
• Architected a Snowflake-based data warehousing solution to centralize clinical and telemetry data, enabling secure, high-performance analytics for patient outcomes, operational metrics, and regulatory compliance in healthcare.
• Managed metadata and schema definitions using AWS Glue Data Catalog, improving data discoverability and enabling seamless integration across healthcare ETL workflows.
• Automated near real-time data ingestion into Snowflake via Snowpipe, supporting continuous loading from staging areas and timely healthcare analytics.
• Configured AWS CloudWatch with custom Python alerts to monitor Glue and Lambda workflows, ensuring issue detection and reliable ETL operations.
• Devised MLOps workflows on AWS SageMaker to train, deploy, and monitor models analyzing patient records and hospital operations data, enabling scalable predictions and improving healthcare decision-making efficiency.
• Delivered Tableau dashboards to visualize patient outcomes and hospital metrics, supporting real-time monitoring and data-driven decision-making across clinical teams.
• Created CI/CD pipelines using Jenkins, Docker, and Kubernetes, and provisioned AWS infrastructure with Terraform for services like S3, Glue, RDS, and IAM to enable consistent, automated deployments.
American International Group Nov 2020 – Dec 2022
Data Engineer Hyderabad, India
• Executed batch ETL workflows using Talend on Azure to transform underwriting and claims data, supporting seamless integration across insurance systems.
• Orchestrated real-time streaming workflows with Apache Kafka on Azure to capture underwriting, claims, and policyholder transactions, enabling scalable, event-driven data processing.
• Developed SSIS packages to integrate financial risk and claims data from Oracle and SQL Server into a centralized SQL-based data mart for regulatory reporting.
• Authored Python scripts for data profiling and validation, reducing reporting errors and enhancing audit readiness across financial reporting workflows.
• Optimized Azure Databricks jobs to process actuarial data with large joins and window functions, improving model preparation speed and pipeline stability.
• Integrated Snowflake to support large-scale analytics and reporting for insurance data, improving query performance and scalability.
• Delivered self-service Looker dashboards for actuaries and underwriters to track claims, premium trends, and risk scores, improving decision-making and reducing dependency on data teams.
• Introduced Datadog monitoring to track performance and health of data pipelines and infrastructure, enabling real-time alerting, root cause analysis, and issue resolution.
• Collaborated with data science and machine learning teams in Agile sprints to deliver reliable data pipelines and feature-ready datasets for underwriting risk models and claims prediction.
Glenmark Pharmaceutical Dec 2018 – Oct 2020
Jr Data Engineer Mumbai, India
• Constructed batch processing workflows using Informatica to ingest and transform large volumes of pharmaceutical data from multiple source systems, ensuring data accuracy and timely delivery.
• Used Hadoop HDFS to manage large pharmaceutical datasets, enhancing data access and storage efficiency.
• Assisted in configuring incremental load strategies in Informatica, ensuring efficient processing of updated records and minimizing resource usage during daily batch runs.
• Ran simple SQL and Spark SQL queries to verify data and supported QA teams during UAT for new data pipelines.
• Utilized Hive for querying and summarizing large datasets in HDFS, improving query efficiency and enabling timely insights for pharmaceutical operations.
Technical Skills
Programming Languages: Python, SQL, Scala.
Big Data Technologies: Hadoop, Spark, Hive, HBase, Kafka, Delta Lake.
Cloud Platforms: AWS (S3, EMR, EKS, Glue, Lambda, Redshift, Athena, RDS, SageMaker), Azure (Data Factory, Databricks, Synapse, Blob Storage, Cosmos DB, Microsoft Fabric, Azure SQL, Azure Monitor, Unity Catalog), GCP.
Databases: MySQL, PostgreSQL, Microsoft SQL Server, MongoDB, Cassandra, DynamoDB, Oracle.
ETL & Data Processing Tools: Apache Airflow, Talend, Informatica, DBT (Data Build Tool).
Data Warehousing & Analytics Platforms: Snowflake, Redshift, Databricks.
Business Intelligence & Analytics: Power BI, Tableau, Looker, MSBI (SSIS, SSAS, SSRS).
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, Terraform.
Certifications
• Microsoft Certified: Fabric Data Engineer Associate.
• AWS Certified Data Engineer Associate.
Education
Masters in Computer and Information Science, Sacred Heart University, Fairfield, CT Jan 2023 – Mar 2024
Bachelors in Computer Science and Engineering, GITAM University, Visakhapatnam, India Jun 2016 – Apr 2020