Data Engineer Real-Time

Location:

Tampa, FL

Posted:

July 08, 2025

Resume:

SAI RAMYA ANNAPUREDDY

DATA ENGINEER

Location: FL, USA *******************@*****.*** Phone: +1-941-***-**** LinkedIn

SUMMARY

Data Engineer with around 4 years of experience in designing and optimizing ETL pipelines, building data models, and implementing automation.

Expertise in processing large-scale datasets (over 5 terabytes) using Hadoop and integrating structured and unstructured data into cloud-based data lakes and warehouses on AWS and Azure platforms.

Knowledge in data versioning and lineage using tools like Apache Atlas and AWS Glue Catalog, ensuring traceability and governance of data assets.

Incorporated real-time data processing and stream analytics with Apache Flink, enabling low-latency data ingestion and real-time analytics for operational decision-making.

Skilled in applying Agile/Scrum methodologies to ensure timely delivery of data solutions, collaborating effectively with cross-functional teams.

Knowledgeable in developing data pipelines using Apache Kafka and implementing workflows to ensure compliance with industry standards, including HIPAA and GDPR.

Proficient in Python, SQL, HiveQL, and Pig Latin, building data transformations, ensuring data accuracy and delivering insights through interactive dashboards in Tableau and Power BI.

Familiar with GCP services like BigQuery and Cloud Storage, aligning with modern data warehousing and analytics needs.

SKILLS

Programming and Scripting Languages: Python (NumPy, Pandas, PySpark, Scikit-learn, TensorFlow), SQL, HiveQL, Pig Latin, R, Java, TypeScript, C, Shell Scripting (Linux/Unix)

Visualization Tools: Tableau, Power BI

Version Control & CI/CD: Git, JIRA, Jenkins, Confluence

ERP Systems: SAP MM (Materials Management), SAP ChaRM

Data Analysis: Data Encryption, Data Security, GDPR Compliance

Data Modelling: Star Schema, Snowflake Schema, Dimensional Modeling

Databases: MySQL, PostgreSQL, MS SQL Server, MongoDB, Oracle DB

Machine Learning & Statistical Analysis: Supervised/Unsupervised Learning, Spark ML, Predictive Modeling, Statistical Analysis

Data Engineering & ETL Tools: Apache Airflow, AWS Glue, SSIS, Informatica, Snowflake, DBT, Redshift

Big Data Technologies: Hadoop (HDFS, Hive, MapReduce), PySpark, Apache Kafka, Spark Framework

Cloud Platforms: AWS (S3, Lambda, RDS, Glue, Athena, EC2, Kinesis, IAM, EMR), Azure (Data Factory, Databricks, Synapse Analytics, Blob Storage), Terraform

EXPERIENCE

Lorhan Corporations Inc Middlesex, NJ

Data Engineer Jan 2025 – Current

Managed AWS cloud infrastructure resource provisioning with AWS services such as EC2, RDS, and S3, reducing deployment times and ensuring the reliability of financial data systems critical for trading, risk management, and compliance.

Orchestrated real-time data processing using AWS Kinesis and Apache Kafka to synchronize transactional data across distributed systems, supporting over 200 financial transactions per second for real-time fraud detection and payment processing.

Conceptualized AWS EMR, Hadoop, and Spark to process and analyse million records per day, ensuring efficient handling of high-volume datasets and providing insights into market trends, portfolio management, and risk assessment.

Developed and maintained ETL workflows with Apache Spark and AWS Glue, ensuring seamless data transformations and increased processing speeds for large-scale datasets.

Migrated 10TB+ of financial data from legacy systems to Snowflake, leveraging dbt for scalable transformations and high-performance querying to support real-time analytics for investment decisions and compliance reporting.

Automated routine data operations and system monitoring tasks using Unix shell scripts, improving operational efficiency and ensuring 24/7 availability of critical financial systems for real-time data analysis and reporting.

Utilized Python and SQL to create data transformation scripts for cleaning, processing, and enriching transaction data, ensuring accuracy and integrity in downstream analytics.

Developed and optimized ETL pipelines using Scala, dbt, and Apache Spark to process large-scale datasets (10TB+), improving data transformation efficiency by 30%.

Integrated Apache Kafka to handle over 500 events per minute, ensuring reliable real-time streaming of financial data with minimal latency, supporting high-frequency trading platforms and fraud detection systems.

Accenture India

Application Development Associate Aug 2021 – July 2022

Designed and deployed 5+ scalable data pipelines leveraging Java, AWS (Glue, S3, Lambda), and Redshift, enabling daily ETL/ELT processing of 2TB+ data from 3+ heterogeneous sources (databases, APIs, logs) to support enterprise analytics.

Automated CI/CD workflows for data infrastructure using Jenkins, Terraform, and Azure DevOps, reducing pipeline deployment cycles by 50 hours/year and manual intervention by 120 hours/year.

Engineered real-time streaming solutions with Kafka, processing 50,000+ events/minute to power dashboards and downstream applications, improving data freshness for inventory and procurement analytics.

Integrated SAP MM (procurement/inventory) and SAP BW data into cloud analytics ecosystems by building automated extraction jobs, ensuring real-time synchronization of financial and operational datasets for reporting.

Developed Tableau dashboards for cross-functional teams by modeling complex datasets into 10+ KPIs, reducing time-to-insight by 30% and enabling data-driven decisions for leadership.

Optimized data storage and query performance in Redshift through partitioning, indexing, and cost-efficient schema design, cutting warehouse costs by 15% while maintaining SLA compliance.

Implemented containerized data workflows using Docker and Kubernetes, reducing deployment time by 40% and enhancing scalability.

Mathashree Healthcare Narasaraopet, India

Data Engineer Jan 2019 – May 2021

Resolved incremental load workflows by transitioning healthcare data from traditional RDBMS to Azure Data Lake, ensuring daily synchronization of healthcare data, such as patient records, treatment histories, and clinical observations, across systems.

Developed RESTful APIs using Spring Boot, enabling secure data exchange and processing 50,000+ requests daily for real-time healthcare analytics.

Executed ETL processes with Azure Data Factory, T-SQL, and U-SQL to transfer data between more than 5 source systems and Azure Storage services.

Leveraged Databricks for scalable data processing, optimizing ETL workflows and integrating healthcare data, including EHR systems and patient records, into a unified data platform, processing over 1TB of data daily.

Devised and employed SQL and NoSQL databases (SQL Server, MongoDB) for healthcare applications, ensuring the efficient processing of 10+ terabytes of patient and healthcare operational data.

Enhanced query performance by implementing optimized Snowflake data models and dbt transformations.

Coordinated ETL workflows using Apache Airflow to automate data pipelines, ensuring secure, compliant data processing per HIPAA, and streamlining data integration, reducing processing time by 250 hours annually for analytics and reporting.

Incorporated healthcare data from MongoDB and MS SQL using Azure Data Factory, consolidating data from over 100 sources and managing up to 500GB of data daily to support analytics and reporting.

ACADEMIC PROJECTS

Library Management System Database

Developed a library management database in Microsoft SQL, optimizing 900+ book records and reducing retrieval time by 40%.

Designed a web-based analytics dashboard using React and JavaScript, hosted on AWS EC2, and integrated with Azure Blob Storage for real-time data visualization and distributed storage.

Prediction of Heart Disease using Machine Learning

Used Python libraries (Matplotlib, Scikit-learn, and NumPy) to build a machine learning model predicting heart disease.

Achieved high accuracy and precision through algorithms like Logistic Regression, Random Forest, and Decision Tree.

EDUCATION

Indiana Wesleyan University IN, USA

Master of Science in Data Analytics Jan 2023 – Aug 2024

SIR CRR College of Engineering and Technology Hyderabad, India

Bachelor of Technology in Computer Science and Engineering Aug 2017 – May 2021

CERTIFICATIONS

AWS Data Engineer Associate, AWS

HackerRank Certified: SQL (Advanced)

Python for Data Science

Understanding the Statistics for Data Science by Internshala
