Sirimalla Teja
Email: ****.*******@*****.***
Mobile: 770-***-****
LinkedIn: https://www.linkedin.com/in/ravit-teja/
Senior Data Engineer
PROFESSIONAL SUMMARY
Senior Data Engineer with 6 years of experience delivering production-grade data platforms and analytics solutions across the life sciences, banking, and insurance domains.
Specializes in building scalable batch and streaming pipelines with Python, SQL, and Spark on cloud-native platforms including Azure, AWS, and GCP.
Expert in modern lakehouse and warehouse architectures using Databricks, Snowflake, BigQuery, Azure Synapse, Delta Lake, and Fabric to support advanced analytics and BI.
Strong background in orchestration and data modeling using Airflow, dbt, Azure Data Factory, and SSIS, plus leading ingestion tools such as Fivetran, Matillion, and NiFi.
Proven track record of collaborating with product, analytics, and business stakeholders to improve data quality, shorten delivery cycles, and enable self-service analytics in regulated environments.
Facilitated team meetings using excellent written and oral communication skills, enhancing collaboration and project outcomes.
Implemented process automation to streamline workflows, resulting in a 20% increase in productivity.
Led continual process improvement initiatives, reducing operational costs by 15% and boosting efficiency.
TECHNICAL SKILLS
Programming And Scripting - Python, SQL, PySpark, Scala, Shell, Perl
Cloud And Data Platforms - Azure, AWS, GCP, Microsoft Fabric, Azure Synapse, Databricks, Snowflake, BigQuery, Redshift, EMR
Data Engineering And Pipelines - Apache Spark, PySpark, Kafka, Kinesis, Airflow, Azure Data Factory, SSIS, Fivetran, Matillion, NiFi, dbt, Delta Lake, Lakehouse patterns, ETL, Informatica
Databases And Storage - Snowflake, PostgreSQL, MySQL, SQL-based warehouses, S3, data lakes, Delta Lake
Analytics And Business Intelligence - Power BI, Tableau, Looker, Oracle, Oracle Exadata
DevOps And Infrastructure - Git, GitHub, GitLab, Jenkins, CI/CD practices, Terraform, CloudFormation
Data Management And Governance - Data quality frameworks, metadata management, Collibra, Alation
System Administration And Infrastructure - Linux-based processes, Unix file systems
PROFESSIONAL EXPERIENCE
Pfizer March 2024 – Present
Senior Data Engineer
Designed and implemented end-to-end clinical and commercial data pipelines using Azure, Databricks, PySpark, and Delta Lake, providing reliable curated datasets for regulatory reporting and medical affairs analytics.
Built reusable data lakehouse layers in Snowflake and Azure Synapse using SQL and dbt, enabling analytics teams to access governed subject-area models for real-world evidence and patient outcomes analysis.
Orchestrated complex ingestion and transformation workflows for trial operations data with Azure Data Factory and Airflow, reducing manual handoffs and improving refresh timeliness for portfolio monitoring dashboards.
Implemented streaming data processing for device telemetry and pharmacovigilance feeds using Kafka, Spark Structured Streaming, and Delta Lake, accelerating the detection of safety signals for pharmacovigilance teams.
Established robust data quality checks and validation rules with Python and SQL embedded in Databricks jobs, improving confidence in clinical metrics consumed by statisticians and study leads.
Partnered with data scientists and medical stakeholders to productionize machine-learning-ready feature pipelines on Databricks and Microsoft Fabric, simplifying deployment of models into downstream analytics and reporting tools such as Power BI.
Developed and optimized Shell and Perl scripts, reducing data processing time by 40% and enhancing system efficiency.
Managed Oracle and Oracle Exadata databases, improving data retrieval speeds by 25% and ensuring high availability for critical business applications.
Designed and implemented data warehousing and ETL solutions using Informatica, streamlining data integration processes and increasing data accuracy by 30%.
Utilized Linux-based processes and Unix file systems to enhance system reliability and performance, achieving a 99.9% uptime for critical applications.
HSBC PLC Nov 2022 – Dec 2023
Senior Data Engineer
Engineered scalable financial data ingestion pipelines from core banking, payments, and risk systems using AWS S3, Glue, and EMR, creating a unified data platform for regulatory and management reporting.
Modeled warehouse structures for risk, liquidity, and compliance reporting using Snowflake and Redshift with SQL and dbt, providing consistent curated layers for finance and risk analytics teams.
Developed near-real-time, event-driven ingestion flows for transaction and fraud monitoring data using Kafka, Kinesis, and Spark, improving alerting speed for fraud operations.
Automated orchestration of complex ELT workflows using Airflow and AWS-native services, reducing manual scheduling effort and improving adherence to daily and intraday service-level commitments.
Introduced infrastructure-as-code practices for data workloads using Terraform and CloudFormation, streamlining environment provisioning and increasing consistency across development, test, and production.
Collaborated with BI and finance teams to publish trusted marts and views from Snowflake into Power BI and Tableau, simplifying self-service reporting and reducing dependency on manual extracts.
Led backend-focused Agile projects, reducing development cycle time by 20% and improving team collaboration.
Drove system and architecture improvements and toolset enhancements, boosting system performance by 35% and reducing operational costs.
Automated ETL and database load/extract processes, increasing data-handling efficiency by 50% and reducing manual intervention.
Optimized relational databases and data warehouses, enhancing data flows and improving query performance by 40%.
MassMutual May 2019 – Jun 2022
Data Engineer
Built foundational batch data pipelines for policy, claims, and customer data using GCP BigQuery, Dataflow, and Spark, creating a central analytics repository for actuarial and marketing teams.
Implemented ingestion from source applications and external providers using Fivetran, Matillion, and NiFi, reducing custom integration effort and standardizing landing patterns for new datasets.
Developed transformation logic and dimensional models in BigQuery using SQL and dbt, improving usability of datasets for pricing, retention, and customer value analysis.
Created automated data quality checks and reconciliation routines in Python and SQL, reducing data defects observed by downstream reporting teams across actuarial and finance functions.
Partnered with BI developers to design semantic models that surfaced curated metrics into Tableau and Looker, enabling business users to explore policy performance and customer behavior with minimal technical support.
Supported migration of legacy SSIS-based workloads into cloud-native pipelines on GCP using Spark and orchestration tools, simplifying operations and aligning data engineering practices with modern cloud standards.
Configured mount types and permissions, ensuring secure and efficient access to data, which reduced unauthorized access incidents by 25%.
Implemented standard tools and pipes to streamline data processing workflows, resulting in a 30% increase in data throughput.
Demonstrated excellent written and oral communication skills, facilitating cross-functional collaboration and improving project delivery timelines.
Showcased passion for automation and continual process improvement, leading to a 20% increase in operational efficiency.
CERTIFICATIONS
Azure Data Engineer Associate (DP-203)
Fabric Analytics Engineer Associate (DP-600)
Snowflake SnowPro Core Certification (COF-C02)
EDUCATION
Master's in Business Analytics - Trine University
Bachelor's in Engineering - JNTUH University