
SATHVIK REDDY NANDYALA

**************@*****.*** | 561-***-**** | LinkedIn

OBJECTIVE

Dynamic Data Engineer with 5 years of experience designing and deploying scalable ETL/ELT pipelines, real-time streaming solutions, and cloud-native platforms across AWS, GCP, and Azure. Proficient in AWS (S3, Redshift, Glue, Lambda, SageMaker), experienced in GCP (BigQuery, Dataflow, Pub/Sub), and skilled in Azure (Databricks, Synapse, Data Factory). Strong expertise in Apache Spark, Kafka, Airflow, SQL, and Python, with a proven track record of optimizing data systems, enabling real-time analytics, and delivering business-critical insights.

PROFILE SUMMARY

•5 years of experience building end-to-end ETL/ELT pipelines and scalable data platforms across AWS, GCP, and Azure.

•Proficient in Apache Spark, PySpark, and Kafka for large-scale data ingestion, batch processing, and real-time streaming.

•Skilled in AWS (S3, Glue, Redshift, Lambda, SageMaker), GCP (BigQuery, Dataflow, Pub/Sub), and Azure (Databricks, Synapse, Data Factory), delivering optimized, cloud-native solutions.

•Experienced in data modeling, schema design, and lakehouse architectures (Delta Lake, Snowflake, Redshift) supporting structured and unstructured datasets.

•Automated workflows with Apache Airflow and CI/CD pipelines (Jenkins, GitHub Actions, Docker), improving deployment speed.

•Developed feature pipelines, ML model deployment workflows (MLflow, SageMaker), and LLM-based solutions (GPT-4, BERT).

•Delivered actionable insights with SQL, Tableau, Power BI, and QuickSight, enabling faster business decision-making.

•Applied predictive modeling, anomaly detection, NLP, and forecasting on operational and business datasets.

•Implemented RBAC, data validation, lineage tracking, and monitoring (CloudWatch, Stackdriver, Azure Monitor), ensuring compliance and trust in data.

•Partnered with cross-functional teams to transform raw data into analytics-ready datasets, accelerating insights and innovation.

EDUCATION

University of North Texas, Denton, Texas

Master of Science, Computer and Information Sciences | GPA: 3.81 | August 2023 - May 2025

Relevant Coursework: Big Data Analytics, Data Warehousing, Cloud Computing, Machine Learning, Distributed Systems, Database Systems, Data Mining, Algorithms, Computer Networks

TECHNICAL SKILLS

Programming & Scripting: Python (NumPy, Pandas, PySpark, FastAPI), SQL (T-SQL, PL/SQL), Java, R, Shell Scripting, Scala

Big Data & ETL Frameworks: Apache Spark, PySpark, Hadoop (HDFS, MapReduce), Apache Kafka, Informatica, IBM DataStage, Talend, Apache Beam, Flink

Data Warehousing & Databases: Snowflake, PostgreSQL, MySQL, SQL Server, Teradata, MongoDB, Cassandra, DynamoDB, Oracle

Cloud Platforms:

•AWS: S3, EC2, Lambda, Glue, Redshift, EMR, Athena, Kinesis, DynamoDB, CloudFormation, Step Functions, SageMaker

•GCP: BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Composer, GCS, Stackdriver (Ops Agent), Vertex AI

•Azure: Data Lake, Databricks, Synapse Analytics, Data Factory, Cosmos DB, Event Hub, Azure Functions, Purview, Azure ML

Data Formats & Storage: Parquet, ORC, Avro, JSON, CSV, XML, Delta Lake, Apache Iceberg, Hudi

Workflow Orchestration & Automation: Apache Airflow, Oozie, Dagster, Luigi, CI/CD (Jenkins, GitHub Actions, GitLab CI, Azure DevOps)

DevOps & Infrastructure: Docker, Kubernetes (EKS, AKS, GKE), Terraform, Ansible, Helm, Cloud Monitoring & Logging (CloudWatch, Azure Monitor, GCP Ops), Prometheus, Grafana

Messaging & Streaming Systems: Kafka, Kinesis, RabbitMQ, ActiveMQ, Pub/Sub, Pulsar, AWS MSK

Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket, SVN, Jira, Confluence, Azure Boards

Data Governance & Quality: dbt (Data Build Tool), Collibra, Apache Atlas, Data Catalogs, Lineage Tracking, RBAC, IAM, GDPR, HIPAA

Visualization & Analytics: Tableau, Power BI, Looker, Amazon QuickSight, Matplotlib, Seaborn, Excel (Advanced), Google Data Studio

Testing & Monitoring: Integration Testing, Unit Testing (pytest, JUnit), Data Quality Checks, Great Expectations, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk

Machine Learning & AI Integration: MLflow, AWS SageMaker, Azure ML, Vertex AI, TensorFlow, PyTorch, Scikit-learn, Feature Engineering, Model Deployment, LangChain, LLM Integration (GPT-4, BERT, LLaMA, Hugging Face Transformers)

WORK EXPERIENCE

Data Engineer – Barclays, Whippany, NJ | May 2024 - Present

•Designed and maintained scalable ETL/ELT pipelines in Python and Java, ingesting 10M+ daily transactions from multiple financial systems into AWS S3 and Redshift.

•Built real-time ingestion pipelines with Amazon Kinesis and Amazon MSK (Managed Streaming for Apache Kafka), reducing downstream analytics latency by 30%.

•Engineered optimized data storage solutions in Redshift and DynamoDB, applying schema tuning and partitioning strategies that improved query performance by 35%.

•Automated end-to-end ML model deployment using Amazon SageMaker, Step Functions, and MLflow, accelerating predictive model release cycles from weeks to days.

•Developed metadata enrichment workflows leveraging Amazon Comprehend (NER, topic modeling) to enhance search and transaction log indexing.

•Implemented comprehensive data quality validation and lineage tracking with Great Expectations and AWS Glue Data Catalog, ensuring trust and auditability.

•Orchestrated event-driven pipelines with AWS Lambda, Glue, and Step Functions, enabling seamless serverless data workflows.

•Built real-time BI dashboards in Amazon QuickSight, empowering financial teams with low-latency insights into key KPIs.

•Applied RBAC with AWS IAM policies and encryption (KMS, SSE-S3) to secure sensitive financial and PII data in compliance with GDPR and SOC2.

•Partnered with data scientists and business analysts to transform raw datasets into analytics-ready formats on AWS Lake Formation, reducing report generation time by 40%.

•Designed and deployed data lakehouse architecture on AWS (S3 + Redshift Spectrum + Glue), centralizing structured and semi-structured data for analytics and compliance.

•Integrated AWS CloudWatch, CloudTrail, and custom logging to monitor pipeline health, detect anomalies, and improve incident response times.

•Optimized costs by implementing lifecycle policies for S3 storage tiers (Standard, IA, Glacier), reducing monthly storage expenses.

•Conducted performance benchmarking with Redshift Workload Management (WLM), improving query concurrency and reducing SLA violations.

Environment:

AWS (S3, Redshift, Glue, Lambda, Kinesis, MSK, DynamoDB, Step Functions, SageMaker, Lake Formation, CloudWatch, CloudTrail), Python, Java, SQL, PySpark, MLflow, Great Expectations, Apache Airflow, Tableau, Amazon QuickSight, GitHub Actions, Docker, Terraform, IAM/KMS

Data Engineer – TD Bank, Mount Laurel, NJ | October 2023 - April 2024

•Built and maintained scalable ETL pipelines with Apache Spark, PySpark, and Cloud Composer (Airflow on GCP), enabling near real-time processing of customer transactions across multiple banking systems.

•Automated ingestion from APIs, flat files, and relational databases into Google Cloud Storage (GCS) and Dataflow, reducing manual integration work by 45%.

•Designed BigQuery data models with clustering, partitioning, and materialized views, improving reporting query performance.

•Developed Pub/Sub streaming pipelines to capture customer behavior data, enabling proactive fraud detection and real-time monitoring of suspicious activity.

•Partnered with regulatory and compliance teams to provision analytics-ready datasets in BigQuery for AML and KYC reporting, ensuring faster regulatory submissions and full audit compliance.

•Implemented data governance practices including RBAC, column-level encryption, and lineage tracking in Data Catalog, strengthening security and ensuring PCI-DSS/GDPR compliance.

•Built a data validation framework using Great Expectations and Python within GCP workflows to enforce quality rules, preventing bad data from reaching downstream analytics systems.

•Streamlined deployments with CI/CD workflows (Jenkins, GitHub Actions, Docker, Cloud Build), reducing release times by 35% and minimizing production errors.

•Collaborated with data science teams to engineer feature pipelines for credit risk and fraud detection models on BigQuery ML, improving accuracy and reducing false positives by 20%.

Environment:

Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Composer, GCS, Stackdriver, Cloud Build), Python, SQL, PySpark, Apache Spark, Airflow, Great Expectations, Docker, Jenkins, GitHub Actions, Looker Studio, Data Catalog

Data Engineer – Cognizant, Hyderabad, India | January 2021 - July 2023

•Designed and orchestrated ETL workflows in Azure Data Factory and Databricks (PySpark, Python), improving pipeline reliability and scalability.

•Built ingestion pipelines to integrate APIs, log files, and relational sources into Azure Data Lake Storage and Synapse Analytics, enabling unified reporting for business stakeholders.

•Developed and optimized RESTful APIs for data exchange with downstream analytics systems, reducing latency in data delivery.

•Implemented incremental load strategies and partitioning in Azure SQL Database and Synapse, enhancing performance for large-scale transactional datasets.

•Leveraged Azure Cache for Redis for in-memory optimization, reducing repeated query overhead and improving data accessibility.

•Automated data validation and quality checks with PySpark, SQL, and Great Expectations, ensuring accuracy in downstream analytics.

•Migrated legacy ETL pipelines into Azure Data Factory and ADLS Gen2-based workflows, reducing infrastructure costs and simplifying maintenance.

•Designed schema evolution and change data capture (CDC) pipelines in Azure Databricks, ensuring smooth integration of frequently changing source systems.

•Implemented monitoring and alerting using Azure Monitor, Log Analytics, and custom dashboards, improving incident response and system visibility.

Environment:

Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage Gen2, Azure SQL Database, Cosmos DB, Event Hub, Azure Functions, Azure Cache for Redis, Azure Monitor, Log Analytics, Purview), Python, SQL, PySpark, REST APIs, Great Expectations, Git, Azure DevOps, Docker, Jira, Confluence

CERTIFICATIONS

•AWS Certified Developer – Associate

•Microsoft Certified: Azure Developer Associate (AZ-204)


