Sirimalla Teja
Email: ****.*******@*****.***
Mobile: 770-***-****
LinkedIn: https://www.linkedin.com/in/ravit-teja/
Senior Data Engineer
PROFESSIONAL SUMMARY
Senior Data Engineer with 6 years of experience delivering production-grade data platforms and analytics solutions across life sciences, banking, and insurance domains.
Specializes in building scalable batch and streaming pipelines with Python, SQL, and Spark on cloud-native platforms including Azure, AWS, and GCP.
Expert in modern lakehouse and warehouse architectures using Databricks, Snowflake, BigQuery, Azure Synapse, Delta Lake, and Microsoft Fabric to support advanced analytics and BI.
Strong background in orchestration and data modeling using Airflow, dbt, Azure Data Factory, SSIS, and leading ingestion tools such as Fivetran, Matillion, and NiFi.
Proven track record of collaborating with product, analytics, and business stakeholders to improve data quality, shorten delivery cycles, and enable self-service analytics in regulated environments.
Facilitated team meetings using excellent written and oral communication skills, enhancing project clarity and collaboration.
Implemented innovative solutions with a passion for automation and continual process improvement, boosting operational efficiency by 20%.
TECHNICAL SKILLS
Programming And Scripting - Python, SQL, PySpark, Scala, Shell, Perl
Cloud And Data Platforms - Azure, AWS, GCP, Microsoft Fabric, Azure Synapse, Databricks, Snowflake, BigQuery, Redshift, EMR
Data Engineering And Pipelines - Apache Spark, PySpark, Kafka, Kinesis, Airflow, Azure Data Factory, SSIS, Fivetran, Matillion, NiFi, dbt, Delta Lake, Lakehouse patterns, ETL, Informatica
Databases And Storage - Snowflake, PostgreSQL, MySQL, Oracle, Oracle Exadata, SQL-based warehouses, S3, data lakes, Delta Lake
Analytics And Business Intelligence - Power BI, Tableau, Looker
DevOps And Infrastructure - Git, GitHub, GitLab, Jenkins, CI/CD practices, Terraform, CloudFormation
Data Management And Governance - Data quality frameworks, metadata management, Collibra, Alation
System Administration And Infrastructure - Linux, Unix
PROFESSIONAL EXPERIENCE
Pfizer March 2024 – Present
Senior Data Engineer
Designed and implemented end-to-end clinical and commercial data pipelines using Azure, Databricks, PySpark, and Delta Lake, which provided reliable, curated datasets for regulatory reporting and medical affairs analytics.
Built reusable data lakehouse layers in Snowflake and Azure Synapse using SQL and dbt, which enabled analytics teams to access governed subject-area models for real-world evidence and patient outcomes analysis.
Orchestrated complex ingestion and transformation workflows for trial operations data with Azure Data Factory and Airflow, which reduced manual handoffs and improved refresh timeliness for portfolio monitoring dashboards.
Implemented streaming data processing for device telemetry and pharmacovigilance feeds using Kafka, Spark Structured Streaming, and Delta Lake, which accelerated safety signal detection for pharmacovigilance teams.
Established robust data quality checks and validation rules with Python and SQL embedded in Databricks jobs, which improved confidence in clinical metrics consumed by statisticians and study leads.
Partnered with data scientists and medical stakeholders to productionize machine-learning-ready feature pipelines on Databricks and Fabric, which simplified deployment of models into downstream analytics and reporting tools such as Power BI.
Developed and optimized Shell and Perl scripts to automate data processing, reducing manual workload by 40% and increasing efficiency in data handling operations.
Engineered Oracle and Oracle Exadata solutions to enhance data warehousing capabilities, leading to a 25% improvement in query performance and data retrieval times.
Implemented Informatica ETL processes to streamline data flows, resulting in a 30% reduction in data load times and improved data accuracy.
Utilized Linux and Unix systems to configure and enhance backend processes, ensuring robust system performance and 99.9% uptime.
HSBC PLC Nov 2022 – Dec 2023
Senior Data Engineer
Engineered scalable financial data ingestion pipelines from core banking, payments, and risk systems using AWS S3, Glue, and EMR, which created a unified data platform for regulatory and management reporting.
Modeled warehouse structures for risk, liquidity, and compliance reporting using Snowflake and Redshift with SQL and dbt, which provided consistent, curated layers for finance and risk analytics teams.
Developed near-real-time, event-driven ingestion flows for transaction and fraud monitoring data using Kafka, Kinesis, and Spark, which improved alerting speed for fraud operations.
Automated orchestration of complex ELT workflows using Airflow and AWS native services, which reduced manual scheduling effort and improved adherence to daily and intraday service level commitments.
Introduced infrastructure-as-code practices for data workloads using Terraform and CloudFormation, which streamlined environment provisioning and increased consistency across development, test, and production.
Collaborated with BI and finance teams to publish trusted marts and views from Snowflake into Power BI and Tableau, which simplified self-service reporting and reduced dependency on manual extracts.
Collaborated within Agile teams to drive system and architecture improvements, accelerating deployment cycles and increasing project delivery speed by 20%.
Designed and maintained file systems, mount types, and permissions to ensure data integrity and security, reducing unauthorized access incidents by 15%.
Leveraged standard command-line tools and Unix pipes to optimize data flows, significantly improving data processing speed and reliability.
MassMutual May 2019 – Jun 2022
Data Engineer
Built foundational batch data pipelines for policy, claims, and customer data using GCP BigQuery, Dataflow, and Spark, which created a central analytics repository for actuarial and marketing teams.
Implemented ingestion from source applications and external providers using Fivetran, Matillion, and NiFi, which reduced custom integration effort and standardized landing patterns for new datasets.
Developed transformation logic and dimensional models in BigQuery using SQL and dbt, which improved usability of datasets for pricing, retention, and customer value analysis.
Created automated data quality checks and reconciliation routines in Python and SQL, which reduced data defects observed by downstream reporting teams across actuarial and finance functions.
Partnered with BI developers to design semantic models that surfaced curated metrics into Tableau and Looker, which enabled business users to explore policy performance and customer behavior with minimal technical support.
Supported migration of legacy SSIS-based workloads into cloud-native pipelines on GCP using Spark and orchestration tools, which simplified operations and aligned data engineering practices with modern cloud standards.
Configured and enhanced load/extract processes within relational databases, achieving a 35% increase in data processing efficiency.
Demonstrated excellent written and oral communication skills by documenting toolsets, scripts, and processes, facilitating knowledge transfer and team collaboration.
Exhibited a passion for automation and continual process improvement by implementing innovative solutions, resulting in a 50% reduction in operational costs.
CERTIFICATIONS
Azure Data Engineer Associate (DP-203)
Fabric Analytics Engineer Associate (DP-600)
Snowflake SnowPro Core Certification (COF-C02 / 2N0-111)
EDUCATION
Master's in Business Analytics - Trine University
Bachelor's in Engineering - JNTUH University