Post Job Free

Senior Data Engineer - Cloud-Native Data Platforms Expert

Location:
Charleston, SC
Posted:
April 21, 2026

Contact this candidate

Resume:

Saritha Reddy Kamasani

*****************@*****.*** +1-940-***-****

https://www.linkedin.com/in/saritha-reddy-kamasani-24572b227

PROFESSIONAL SUMMARY:

Data Engineer with 5+ years of experience designing and building scalable, cloud-native data platforms across healthcare, finance, and enterprise environments. Expertise in developing batch and real-time data pipelines, modern lakehouse architectures, and analytics-ready data models using Databricks, Apache Spark, Airflow, and dbt. Strong experience across AWS, Azure, and GCP, with a deep focus on data governance, observability, and performance optimization. Proven ability to deliver reliable, high-quality data solutions that power enterprise analytics, BI, and AI/ML initiatives, including LLM-driven use cases.

SKILLS:

● Programming Languages: Python, SQL, Scala, R, Java, C++, C#, JavaScript, YAML, Bash

● Database Management Systems (DBMS): PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, Amazon RDS, DynamoDB, Elasticsearch, NoSQL, OLTP, RDBMS

● Data Warehousing & Lakehouse: Snowflake, Amazon Redshift, Google BigQuery, Teradata, Apache Hive, Delta Lake, Apache Iceberg

● ETL / ELT & Data Transformation: Apache Spark (PySpark, Spark SQL), Apache Airflow, dbt (Core & Cloud), Azure Data Factory, AWS Glue, Apache NiFi, Talend, Informatica, Fivetran, Stitch, Airbyte

● Big Data & Streaming Technologies: Apache Kafka, Kafka Streams, Apache Flink, Apache Hadoop, HBase, Apache Druid, Databricks, Amazon Kinesis, EMR, Cloudera

● Cloud Platforms & Ecosystems: AWS, Azure (Microsoft Fabric, Azure Synapse Analytics, ADLS Gen2), GCP, Amazon S3, Azure Data Lake Storage, Google Cloud Storage

● Version Control & CI/CD: Git, GitHub, GitHub Actions, Azure DevOps, CI/CD pipelines for data workflows

● Data Visualization & BI: Tableau, Power BI, Qlik Sense, Cognos, OBIEE

● Machine Learning / AI & Advanced Analytics: TensorFlow, PyTorch, LLM pipelines, RAG architectures, Vector Databases (FAISS, Pinecone), Feature Stores (Databricks Feature Store, Feast)

● Operating Systems: Linux/Unix, Windows, macOS

● Containerization & Orchestration: Docker, Kubernetes

● Tools & IDEs: IntelliJ, Visual Studio Code, Jupyter Notebook, PyCharm, ER Studio, JIRA, Confluence

● Data Governance, Quality & Observability: Unity Catalog, Azure Purview, Great Expectations, OpenLineage, RBAC/ABAC, Data Masking

● Monitoring & Logging: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), AWS CloudWatch

EXPERIENCE

Data Engineer

Abbott – Chicago, IL 07/2025 – Present

● Designed and developed scalable ETL/ELT pipelines using Azure Data Factory and Databricks to process large-scale healthcare and life sciences datasets.

● Built and optimized PySpark and SQL-based transformations to support batch and near real-time data ingestion and processing.

● Implemented dimensional data models (star and snowflake schemas) to enable analytics-ready datasets for enterprise reporting.

● Managed and optimized data storage in Azure Data Lake (ADLS Gen2), ensuring performance, scalability, and cost efficiency.

● Designed and implemented modern lakehouse solutions using Microsoft Fabric, leveraging OneLake, Dataflows, and integrated Power BI semantic models to support data integration and analytics.

● Developed and optimized pipelines and dedicated SQL pools in Azure Synapse Analytics for high-performance transformations and large-scale analytics.

● Enforced data quality, validation, and reconciliation checks using automated frameworks to meet regulatory and audit requirements.

● Applied data governance and access controls using Unity Catalog and Azure Purview, ensuring compliance with HIPAA, GxP, and SOX standards.

● Automated pipeline deployments using CI/CD practices with GitHub Actions and Azure DevOps.

● Designed and implemented AI/ML-driven data pipelines, enabling feature engineering, model-ready datasets, and integration of LLM-based analytics to support predictive insights and advanced decision-making.

● Collaborated with cross-functional teams to translate business requirements into scalable and maintainable data solutions.

● Optimized semantic layers and datasets for Power BI and Qlik dashboards, improving performance and user experience.

● Monitored, troubleshot, and tuned pipelines using observability tools to ensure high availability and reliability in production environments.
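The automated validation and reconciliation checks mentioned above can be sketched roughly as follows. This is a minimal, stand-alone Python illustration, not the production framework: the `reconcile` helper, the field names, and the float tolerance are all assumptions for the example.

```python
# Hypothetical sketch of an automated reconciliation check: compare row
# counts and a numeric checksum between a source extract and its target
# load, so a pipeline run can fail fast on silent data loss.

from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    row_count_match: bool
    checksum_match: bool

    @property
    def passed(self) -> bool:
        return self.row_count_match and self.checksum_match


def reconcile(source_rows: list, target_rows: list,
              amount_field: str) -> ReconciliationResult:
    """Validate that a load preserved row counts and numeric totals."""
    source_total = sum(r[amount_field] for r in source_rows)
    target_total = sum(r[amount_field] for r in target_rows)
    return ReconciliationResult(
        row_count_match=len(source_rows) == len(target_rows),
        # Compare totals with a tolerance to absorb float rounding noise.
        checksum_match=abs(source_total - target_total) < 1e-9,
    )


source = [{"claim_id": 1, "amount": 120.5}, {"claim_id": 2, "amount": 80.0}]
target = [{"claim_id": 1, "amount": 120.5}, {"claim_id": 2, "amount": 80.0}]
result = reconcile(source, target, "amount")
```

In practice a check like this would run as a post-load task in the orchestrator, writing its result to an audit table so failures are traceable for regulatory review.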

Data Engineer

Goldman Sachs – New York, NY 01/2024 – 06/2025

● Engineered scalable data pipelines using Python, Scala, and Apache Spark to process high-volume financial datasets.

● Built event-driven ETL pipelines using AWS Lambda for real-time and near real-time data ingestion from DynamoDB to Redshift.

● Developed ingestion frameworks using AWS Glue to process structured and semi-structured data from Amazon S3 into Snowflake and Redshift.

● Designed and optimized Snowflake architecture including staging, ODS, and dimensional layers following Kimball modeling principles.

● Implemented Medallion Architecture (Bronze, Silver, Gold layers) to organize data pipelines, improving data quality and enabling efficient downstream analytics.

● Implemented dbt and Snowpark-based transformation pipelines to standardize data modeling and improve pipeline efficiency.

● Orchestrated complex ETL workflows using Apache Airflow deployed on Kubernetes for scalable scheduling and orchestration.

● Developed high-performance PL/SQL procedures and optimized queries using indexing, partitioning, and materialized views.

● Designed RESTful and gRPC-based data services using Flask and Django to enable data access and microservices integration.

● Built centralized logging and observability platform using ELK stack, Kinesis, and CloudWatch for monitoring pipeline health.

● Managed Fivetran pipelines and connectors for automated ingestion into Snowflake and Redshift.

● Integrated large-scale customer and marketing datasets enabling segmentation, attribution, and advanced analytics use cases.

● Prepared datasets for AI/ML and LLM workloads including feature engineering, anonymization, and metadata enrichment.

● Automated infrastructure provisioning using Terraform and implemented DevOps best practices for data platforms.

● Delivered dashboards and insights using Tableau, Qlik, and Splunk to support business decision-making.

Data Engineer

IBM – India 12/2020 – 07/2022

● Designed and implemented enterprise data warehouse and BI solutions using AWS services including S3, Redshift, Lambda, API Gateway, DynamoDB, and AWS Glue.

● Built scalable ETL pipelines using Apache Spark, PySpark, and Scala (RDDs, DataFrames, Spark SQL) for large-scale data transformation and aggregation.

● Developed automated data ingestion frameworks integrating data from APIs, Amazon S3, Teradata, and Snowflake using Python and Scala.

● Engineered batch and real-time data pipelines to process multi-channel customer engagement and marketing analytics datasets.

● Implemented CI/CD pipelines to integrate data engineering workflows with DevOps practices and improve deployment automation.

● Designed and maintained Apache Airflow DAGs to orchestrate complex ETL workflows and scheduled data pipelines.

● Built serverless data processing pipelines using AWS Lambda integrated with API Gateway and DynamoDB for event-driven architectures.

● Processed structured, semi-structured, and unstructured datasets while implementing data profiling and quality validation using Python and SQL.

● Developed Snowflake schemas and ingestion pipelines using Snowpipe and Matillion to process batch and streaming data from AWS S3 data lakes.

● Leveraged Spark Streaming, Hadoop (HDFS, Hive), and GCP services including Dataproc, GCS, Cloud Functions, and BigQuery for distributed data processing and analytics.
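The data profiling and quality validation described above can be illustrated with a short, self-contained Python sketch. The records, field names, and `profile` helper are invented for the example and do not come from any actual IBM dataset; a real implementation would run at Spark scale rather than over in-memory lists.

```python
# Illustrative column profiler for semi-structured records: count nulls
# and distinct non-null values per column, the kind of summary used to
# flag quality problems before data lands in the warehouse.

from collections import defaultdict


def profile(records: list) -> dict:
    """Return per-column null counts and distinct non-null value counts."""
    nulls = defaultdict(int)
    distinct = defaultdict(set)
    # Union of keys handles records with missing/extra fields.
    columns = {key for row in records for key in row}
    for row in records:
        for col in columns:
            value = row.get(col)
            if value is None:
                nulls[col] += 1
            else:
                distinct[col].add(value)
    return {
        col: {"nulls": nulls[col], "distinct": len(distinct[col])}
        for col in sorted(columns)
    }


rows = [
    {"channel": "email", "clicks": 3},
    {"channel": "web", "clicks": None},
    {"channel": "email", "clicks": 5},
]
stats = profile(rows)
# stats["clicks"] -> {"nulls": 1, "distinct": 2}
```

Thresholds on these counts (for example, rejecting a batch whose null rate jumps) turn a passive profile into an automated quality gate.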

Big Data Developer

General Electric – India 06/2018 – 05/2020

● Collaborated across multiple projects and cross-functional teams to deliver scalable analytics and cloud-based data solutions.

● Designed and built scalable data pipelines on Azure using Azure Data Factory, Databricks, and related services.

● Developed and optimized PySpark applications on Azure Databricks for large-scale data processing and transformation.

● Built end-to-end ETL/ELT pipelines using Azure Data Factory, SSIS, and Databricks for batch and near real-time workloads.

● Engineered real-time data ingestion pipelines using Azure Event Hub and streaming frameworks for continuous data processing.

● Delivered interactive Power BI dashboards providing actionable insights into marketing performance and operational KPIs.

● Developed distributed data processing solutions using Apache Spark, Hive, Python, and Scala for large datasets.

● Optimized Spark and Hive workloads through performance tuning techniques including partitioning, caching, and query optimization.

● Orchestrated and automated workflows using Apache Airflow and Oozie for reliable job scheduling and pipeline management.

● Integrated Talend with on-premise systems and Azure SQL for seamless data migration and hybrid data workflows.

● Implemented CI/CD pipelines using Jenkins, improving deployment efficiency and code quality.

● Built serverless automation solutions using AWS Lambda to eliminate manual processes and improve operational efficiency.

● Deployed and managed data streaming applications using Docker and StreamSets for real-time ingestion into HDFS.

EDUCATION

● Master of Science in Computer Science - University of North Texas


