Sairam Kollu
• Sunnyvale, CA, United States • ************@*****.*** • +1-913-***-**** • LinkedIn • Open to Relocate
Data Engineer with 5+ years of experience designing and building complex Big Data platforms and applications, with hands-on work in Cloud Data Engineering, Data Warehousing, ETL/ELT processes, Reporting, and BI.
SKILLS
Programming Languages: Python (PySpark, Pandas, NumPy, Matplotlib, Seaborn), SQL (T-SQL, PL/SQL, Spark SQL), Scala, Java
Big Data Technologies: Spark, Airflow, Kafka, HDFS, Hive, Presto, Hadoop, YARN, MapReduce, HBase, Sqoop
ETL & Visualization: Airflow, Azure Data Factory, Dagster, DBT, Databricks, Fivetran, Power BI, Tableau, Looker, Athena
Databases & Storage: Snowflake, PostgreSQL, Redshift, ADLS, Azure Synapse, AWS S3, MySQL, MS SQL, MongoDB (NoSQL)
Other Tools: Git, Agile, JIRA, Docker, VS Code, Kubernetes, CI/CD Pipelines, MLflow
EXPERIENCE
Data Engineer May 2024 – Present
T-Mobile
• Improved reporting speed by 35% by building ELT pipelines for 10+ sources using Medallion architecture and Data Mesh principles, enabling Azure-based Data Lakehouse migration and self-serve analytics.
• Enabled seamless downstream reporting by orchestrating daily incremental batch pipelines from a centralized data lake to ADLS using Azure Data Factory, transforming data with DLT in Databricks, and loading into Snowflake.
• Reduced system dependency and enhanced marketing segmentation by designing a scalable ADLS data system that integrated 4+ external sources and enriched business logic for targeted strategies.
• Reduced data discrepancies by 15% by deploying 5+ live Power BI dashboards powered by curated KPIs from the Delta Lakehouse, ensuring accurate and actionable business insights.
• Achieved a 25%+ reduction in failed app installs by analyzing device-level performance in Snowflake, identifying vendor inefficiencies, and recommending vendor contract upgrades.
Data Engineer Jul 2023 – Apr 2024
Fifth Third Bank
• Executed the migration of over 10TB of FP&A data from Teradata to Microsoft Azure, orchestrating workflows with Azure Data Factory and performing transformations in Azure Databricks using Delta Lake’s Medallion architecture (Bronze, Silver, Gold). Delivered curated datasets through an Azure Synapse layer for consumption by analytics and finance teams.
• Converted legacy Teradata stored procedures into efficient PySpark-based transformations within Azure Databricks notebooks, achieving a 65% boost in processing performance and improving pipeline maintainability.
• Refined Databricks compute strategy to reduce costs by 35% by implementing autoscaling, leveraging spot instances, and aligning job schedules with the financial close calendar, improving resource efficiency without impacting SLAs.
• Integrated Azure Logic Apps with Azure Data Factory to automate real-time alerts for pipeline failures, data quality issues, and SLA breaches, reducing response time by 75%, and delivered insights via Power BI dashboards to enhance visibility across teams.
Data Engineer Jun 2020 – Dec 2022
ICICI Prudential Life Insurance
• Enabled scalable, cross-departmental data access by spearheading the development of a centralized data lake on GCP, integrating BigQuery, Cloud Storage, and Cloud Composer for teams across Marketing, Finance, and Operations.
• Optimized large-scale data performance by engineering and maintaining dynamic ETL pipelines using SSIS, ODI, and Teradata, ensuring timely and accurate delivery for department-specific analytics.
• Improved customer retention strategies by performing topic modeling on 6,000+ feedback records, identifying sentiment trends that guided the creation of 3+ targeted engagement initiatives.
• Implemented monitoring and alerts in Cloud Composer using Stackdriver, reducing pipeline downtime by 40%.
Data Engineer Jun 2019 – May 2020
Hetero Healthcare
• Designed and deployed 25+ scalable ETL pipelines using AWS Glue, integrating data from MongoDB, MySQL, WooCommerce, and APIs into MS SQL Server and Amazon Redshift, enabling real-time analytics for pharma operations across 100+ data tables.
• Reduced data ingestion latency by 25% through real-time data pipelines built with Apache Kafka and Amazon Kinesis, enabling parallel processing and accelerating data availability for analytics and operations.
• Implemented incremental load logic and event-based orchestration using modular, Lambda-style triggers, reducing daily pipeline runtime by over 60% and improving system throughput without downtime.
• Revamped internal Tableau dashboards by optimizing SQL transformations in Redshift, reducing data redundancy by 38%, improving refresh speeds by 24%, and contributing to cost optimization through more efficient resource utilization.
EDUCATION
University of Central Missouri, Master of Science in Computer Science Warrensburg, MO
CERTIFICATIONS
Azure AI Data Fundamentals, Microsoft