Srinath Palleboina
Data Engineer
Athens, OH | 740-***-**** | ******************@*****.*** | LinkedIn

SUMMARY
Data Engineer with 5+ years of experience architecting and optimizing large-scale data pipelines, real-time streaming systems, and cloud-native data platforms across AWS, Azure, and on-premise environments. Skilled in building robust ETL/ELT workflows with Apache Spark, dbt, Airflow, and Snowflake for high-volume data processing. Proficient in designing event-driven architectures with Kafka, Flink, and Python for low-latency analytics. Experienced in developing data warehouses and feature stores on PostgreSQL, ClickHouse, and Redshift to support advanced analytics and ML pipelines. Adept at implementing MLOps with Terraform, Docker, Kubernetes, and CI/CD automation, improving model deployment reliability and speed. Strong expertise in feature engineering, LLM fine-tuning, and predictive modeling with XGBoost and TensorFlow. Known for driving data quality, cost optimization, and self-service analytics with Looker and Power BI in Agile environments.
PROFESSIONAL EXPERIENCE
Apple, CA | Senior Data Engineer | September 2024 – Present
• Designed robust ETL/ELT workflows with dbt, Snowflake, and Apache Airflow, transforming large volumes of data daily, reducing pipeline execution time by 25%, and improving data observability with Great Expectations (orchestration sketched below).
• Built and optimized real-time, event-driven data pipelines using Python, PySpark, Kafka, and AWS services (Glue, Lambda, SQS), increasing data throughput by 40% and enabling low-latency analytics (streaming pattern sketched below).
• Automated CI/CD workflows using Terraform and Jenkins, reducing deployment time by 40% and enhancing infrastructure reliability through Infrastructure as Code (IaC).
• Conducted advanced feature engineering (e.g., aggregations, embeddings, anomaly detection) and model optimization using XGBoost and TensorFlow, improving fraud detection precision by 15% and reducing false positives by 20%.
• Developed and maintained a scalable on-premise data warehouse using PostgreSQL and ClickHouse, implementing partitioning, indexing, and materialized views to reduce query response time by 30%.
• Actively participated in Agile/Scrum ceremonies, including sprint planning, daily stand-ups, and retrospectives, improving team collaboration and delivery efficiency.
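An illustrative sketch of the dbt/Airflow orchestration pattern above; this is a minimal example rather than the production pipeline, and the DAG id, schedule, project path, and dbt target are hypothetical placeholders.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal ELT orchestration: run dbt models, then gate downstream use on dbt tests.
# dag_id, schedule, paths, and the "prod" target are hypothetical placeholders.
with DAG(
    dag_id="daily_elt_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2024, 9, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --target prod",
    )
    dbt_run >> dbt_test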
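Similarly, a minimal sketch of the Kafka/PySpark streaming pattern above, assuming a hypothetical broker, topic, and window size; the Kafka source also requires the spark-sql-kafka connector package on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event_stream").getOrCreate()

# Read raw events from Kafka; broker address and topic name are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

# Per-minute event counts; the watermark bounds state for late-arriving data.
counts = (
    events.withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

query = counts.writeStream.outputMode("append").format("console").start()
query.awaitTermination()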

Goldman Sachs, NY | Data Engineer | December 2023 – August 2024
• Built and maintained real-time data pipelines using Apache Flink and Apache Kafka, reducing trade settlement latency by 50% and enabling millisecond-level risk analytics for fixed-income and equities trading systems.
• Architected and optimized AWS data infrastructure (Redshift, Glue, EMR) for high-frequency trading analytics, implementing auto-scaling and cost-efficient storage tiers that reduced cloud spend by 40%.
• Fine-tuned large language models (LLMs), including GPT and LLaMA, to extract structured insights from unstructured financial documents (earnings reports, SEC filings, news), increasing data extraction accuracy by 35%.
• Created SQL-based feature stores from 10TB+ of historical market data by cleaning, normalizing, and deduplicating time-series data, enabling efficient hyperparameter tuning and improving model convergence during backtesting by 30% (feature preparation sketched below).
• Automated trade data reconciliation using Python (Pandas, NumPy) and created real-time dashboards with Looker, reducing trade breaks by 30% and saving over 15 hours weekly in manual validation (reconciliation logic sketched below).
• Streamlined CI/CD pipelines for containerized ML microservices using GitOps, Docker, Kubernetes (EKS), and ArgoCD, enabling zero-downtime deployments with automated rollback and reducing production incidents by 60%.
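An illustrative sketch of the time-series preparation behind the feature-store bullet above, shown in Pandas rather than SQL for brevity; the file, column names, and 1-minute grid are hypothetical choices.

import pandas as pd

# Hypothetical tick data with a datetime "ts" column, a "symbol", and a "price".
raw = pd.read_parquet("market_ticks.parquet")

# Deduplicate repeated ticks, then normalize each symbol onto a 1-minute grid.
per_minute = (
    raw.drop_duplicates(subset=["symbol", "ts"])
       .set_index("ts")
       .sort_index()
       .groupby("symbol")["price"]
       .resample("1min")
       .last()
)

# Forward-fill gaps within each symbol so the grid is dense for model training.
features = per_minute.groupby(level="symbol").ffill().reset_index()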
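A minimal sketch of the reconciliation logic above, assuming hypothetical feed files and column names for the internal and counterparty sides.

import pandas as pd

internal = pd.read_csv("internal_trades.csv")
external = pd.read_csv("counterparty_trades.csv")

# Full outer join keeps trades that appear on only one side.
merged = internal.merge(
    external, on="trade_id", how="outer",
    suffixes=("_int", "_ext"), indicator=True,
)

# A break is a one-sided trade or a both-sided trade with mismatched economics.
one_sided = merged[merged["_merge"] != "both"]
mismatched = merged[
    (merged["_merge"] == "both")
    & ((merged["qty_int"] != merged["qty_ext"])
       | (merged["price_int"] != merged["price_ext"]))
]

breaks = pd.concat([one_sided, mismatched])
breaks.to_csv("trade_breaks.csv", index=False)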

Accenture, India | Data Engineer | October 2017 – December 2021
• Developed and automated ETL workflows for fraud detection using Azure Data Factory and Informatica, integrating data from Oracle, APIs, and flat files; improved processing efficiency by 40% and streamlined data transformations.
• Implemented a hybrid big data architecture leveraging Azure Synapse, Hadoop, HDFS, and Hive, optimizing query performance by 30% through effective partitioning and caching strategies.
• Designed and deployed interactive dashboards and reporting tools using Power BI and Python visualization libraries (Plotly, Seaborn), empowering executives with real-time business insights (dashboard sketch below).
• Engineered hybrid database environments combining SQL systems (Oracle, PostgreSQL) and NoSQL databases (MongoDB, Cassandra), reducing API response times by 45% with optimized query structures.
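An illustrative sketch of the Plotly reporting pattern above; the extract file and column names are hypothetical stand-ins for the Oracle, API, and flat-file sources.

import pandas as pd
import plotly.express as px

# Hypothetical monthly revenue extract with "month", "revenue", and "region" columns.
df = pd.read_csv("monthly_revenue.csv", parse_dates=["month"])

fig = px.line(
    df, x="month", y="revenue", color="region",
    title="Monthly Revenue by Region",
)

# Standalone HTML output that can be embedded in a reporting portal.
fig.write_html("revenue_dashboard.html")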

TECHNICAL SKILLS
Programming & Scripting: Python, Java, Scala, SQL (PL/SQL, T-SQL), NoSQL, Shell Scripting
Big Data Technologies: Apache Spark, Hadoop (Hive, Pig, Sqoop, MapReduce, HDFS), Kafka, Kinesis, AWS EMR, Apache Flink
ETL / Data Pipelines: dbt, Apache Airflow, AWS Glue, Talend, Informatica, SSIS, Luigi, Prefect, Oozie, Azure Data Factory
Data Warehousing & Databases: Snowflake, ClickHouse, PostgreSQL, SQL Server, Oracle, MySQL, MongoDB, DynamoDB, Cassandra
Cloud Platforms: Amazon Web Services (EC2, S3, Lambda, Glue, Redshift, SQS), Microsoft Azure (Synapse, Databricks, Event Hubs)
DevOps & Infrastructure as Code: Git, GitLab CI/CD, Jenkins, Docker, Kubernetes (EKS), Terraform, IAM, VPC, GitOps, ArgoCD
Machine Learning & MLOps: TensorFlow, PyTorch, Scikit-learn, MLflow, Feature Store, Model Deployment, Model Monitoring, XGBoost
Data Analytics & Visualization: Power BI, Tableau, AWS QuickSight, Looker, SSRS, Excel, Python (Matplotlib, Seaborn, Plotly)
Frameworks & Libraries: Pandas, NumPy, SciPy, Keras, NLTK, PyMC3, Scrapy
Development Methodologies: Agile/Scrum, SDLC, CI/CD, ModelOps, Data Observability (Great Expectations)

EDUCATION
Master’s in Computer Science, Ohio University, Athens, Ohio, USA
Bachelor’s in Computer Science, Jawaharlal Nehru Technological University, Hyderabad, India