Senior Data Engineer with 7+ Years of Experience Building Cloud-Powered Data Platforms

Location:
Columbus, OH
Posted:
January 20, 2026

Resume:

INDRAKANTH REDDY RUDRARAM

SENIOR DATA ENGINEER

Gmail: **************@*****.*** Contact: +1-210-***-****

LinkedIn: linkedin.com/in/indrakanth18

PROFESSIONAL SUMMARY

Senior Data Engineer with 7+ years of experience designing and delivering large-scale, secure, cloud-native data platforms across AWS, Azure, and hybrid environments.

Expertise in the AWS ecosystem including Glue, S3, Redshift, Lambda, EMR, Athena, Step Functions, IAM, and KMS, with deep hands-on experience in Python and SQL optimization.

Skilled in building scalable ETL/ELT pipelines using AWS Glue, Azure Data Factory, Databricks, Airflow, and event-driven architectures (SQS, SNS, Lambda).

Strong experience in Snowflake, Redshift, Synapse, and BigQuery, including schema design, clustering, materialized views, and performance tuning.

Proficient with DBT (models, tests, documentation, lineage) for modern transformation workflows and semantic modeling.

Hands-on expertise with Delta Lake and Apache Iceberg, implementing ACID-compliant lakehouse architectures and high-performance data models.

Extensive experience developing real-time and batch pipelines using PySpark, Spark SQL, Spark Structured Streaming, and EMR/Synapse/Databricks.

Strong background integrating Kafka with Spark and cloud services for streaming ingestion, event processing, and low-latency analytics (see the sketch following this summary).

Experience deploying and orchestrating data workflows using Apache Airflow, Step Functions, Azure Functions, and CI/CD automation.

Skilled in Kubernetes (EKS/AKS) and Docker for containerized ETL workloads and scalable data services deployments.

Proficient in DataOps practices, including CI/CD for data pipelines, infrastructure as code (Terraform), automated testing (PyTest), and observability.

Advanced SQL expertise across PostgreSQL, Oracle, and MySQL (complex joins, stored procedures, tuning, and query performance optimization), with additional experience in MongoDB.

Experienced in building API-driven ingestion pipelines (REST, JSON, XML, streaming APIs) with secure auth (OAuth2, API Keys).

Strong experience implementing data governance using Collibra, Alation, and Glue Catalog (metadata, lineage, schema versioning, quality rules).

Skilled in Python for building reusable ETL frameworks, automation utilities, validation engines, and integration with third-party systems.

Extensive experience delivering Power BI, QuickSight, and Tableau dashboards, including DAX modeling, semantic layer design, and performance tuning.

Hands-on with Big Data/Hadoop components (HDFS, Hive, HBase, Sqoop, Flume, Oozie) for distributed data processing in large clusters.

Proven track record optimizing cost and performance using partitioning, file formats (Parquet/Avro), Z-ordering, caching, and resource tuning.

Strong collaborator with DevOps, BI, and ML teams; experienced in supporting feature pipelines, ML-ready datasets, and MLOps-adjacent workflows.

Mentor and team lead with experience training engineers in Spark optimization, Glue, Python best practices, and cloud engineering standards.
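
A minimal illustration of the Kafka-to-Spark streaming pattern referenced in the summary, assuming a hypothetical broker, topic, schema, and S3 sink (and the spark-sql-kafka connector on the classpath); it is a sketch of the general approach, not production code from any specific engagement.

    # Consume JSON events from Kafka and land them as Parquet (illustrative names only).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-ingest-demo").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
        .option("subscribe", "events")                       # hypothetical topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers the payload as bytes; cast to string and parse the JSON body.
    parsed = raw.select(from_json(col("value").cast("string"), event_schema).alias("e")).select("e.*")

    query = (
        parsed.writeStream.format("parquet")
        .option("path", "s3a://demo-bucket/events/")                          # hypothetical sink
        .option("checkpointLocation", "s3a://demo-bucket/checkpoints/events/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()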

TECHNICAL SKILLS

Programming & Scripting: Python, SQL, PySpark, Scala, Shell Scripting

Cloud Platforms: AWS (Glue, S3, Redshift, Lambda, EMR, Step Functions, IAM, CloudWatch); Azure (Data Factory, Databricks, Synapse, ADLS Gen2, Event Hubs, Azure SQL); GCP (BigQuery, Google Cloud Storage, Dataflow – working knowledge)

Data Engineering & Lakehouse: Apache Spark, Delta Lake, Apache Iceberg, Lakehouse Architecture, Data Modeling (Star/Snowflake), DBT (Models, Tests, Lineage), Feature Pipelines

Big Data & Streaming: Kafka, Spark Structured Streaming, Hadoop (HDFS, Hive, HBase, Sqoop), Flume

Orchestration & Workflow Automation: Apache Airflow, AWS Step Functions, Azure Data Factory, Azure Functions

Data Warehousing: Snowflake, Redshift, Azure Synapse, BigQuery, PostgreSQL, MySQL, Oracle

DevOps, CI/CD & Infrastructure: Terraform, Jenkins, Git, Docker, Kubernetes (EKS/AKS), GitHub Actions, PyTest (unit/integration tests), DataOps practices

Data Governance & Metadata: Collibra, Alation, AWS Glue Data Catalog, Schema Versioning, Lineage Tracking

Monitoring & Observability: CloudWatch, Prometheus, Grafana, Log Analytics, Cost Optimization

Visualization & BI: Power BI (DAX, Data Modeling), Amazon QuickSight, Tableau

Methodologies: Agile Scrum, CI/CD, TDD, SDLC

PROFESSIONAL EXPERIENCE

Interstate Batteries, Dallas, TX July 2025 – Current

Senior Data Modeler / Data Engineer

Led Project Green, an enterprise-wide sustainability initiative focused on optimizing analytical data models and enhancing reporting for operational and environmental metrics.

Assessed and redesigned existing Snowflake data models to improve schema efficiency, reduce query latency, and enhance downstream analytics performance.

Enhanced Snowflake schemas by adding new datasets, calculated fields, and business measures to support expanding BI and reporting needs.

Designed and implemented dimensional models (star and snowflake) to support Power BI self-service analytics across finance, operations, and sustainability domains.

Built modular DBT models (staging, intermediate, marts layers) to standardize transformation logic across Snowflake and Redshift environments.

Implemented DBT tests, documentation, and lineage tracking, improving data quality, auditing, and transparency for analytics teams.

Automated DBT execution using CI/CD pipelines, ensuring version-controlled, reproducible transformations aligned with enterprise DataOps practices.

Conducted detailed data profiling, validation, and lineage analysis to ensure data accuracy, completeness, and consistency across integrated systems.

Applied governance best practices using Collibra / Alation, documenting datasets, business definitions, lineage, and metadata to strengthen data discoverability.

Leveraged Google Cloud Platform (GCP) services including BigQuery and GCS to enable scalable data storage and analytics for geospatial-data solutions supporting the dealer network.

Designed and built ingestion pipelines to GCS and BigQuery for high-volume location and sales data, enabling near-real-time insights and interactive dashboards across mobile and web channels (see the sketch following this role).

Integrated Google Maps Platform APIs with the internal analytics architecture, enabling location intelligence and routing analytics for 150,000+ dealers, with data stored and processed via GCP-native services for scalability.

Implemented data quality rules, metadata tagging, schema versioning, and catalog enhancements using both Snowflake and AWS Glue Data Catalog.

Optimized ingestion and transformation pipelines to reduce Snowflake query execution time and overall data latency for critical dashboards.

Developed and optimized Power BI data models, DAX measures, and semantic layers using DAX Studio to improve dashboard refresh times and user experience.

Collaborated with data architects, analysts, and BI teams to translate business requirements into scalable data structures and reliable semantic models.

Created data architecture standards, modeling guidelines, and documentation for ETL logic, ensuring long-term platform maintainability.

Worked closely with QA and production teams to validate pipeline enhancements, maintain data accuracy, and support stable deployments.

Participated in Agile sprint planning, reviews, and retrospectives, contributing to faster delivery cycles and improved coordination across analytics teams.

Supported business stakeholders by developing actionable insights and presenting Power BI dashboards highlighting operational and sustainability KPIs.
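
A minimal sketch of the GCS-to-BigQuery loading step referenced in the ingestion bullet above, using the google-cloud-bigquery client; the project, bucket, and table names are placeholders rather than the actual dealer-network resources.

    # Load Parquet files staged in GCS into a BigQuery table (illustrative names only).
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")        # hypothetical project

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(
        "gs://example-bucket/dealer_locations/*.parquet",      # hypothetical GCS path
        "example-project.analytics.dealer_locations",          # hypothetical target table
        job_config=job_config,
    )
    load_job.result()                                          # block until the load finishes

    table = client.get_table("example-project.analytics.dealer_locations")
    print(f"Loaded table now has {table.num_rows} rows")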

JPMorgan Chase, New York, NY Jan 2023 – June 2025

Senior AWS Data Engineer

Designed and maintained enterprise-scale AWS Glue ETL pipelines using PySpark/Python to process 2+ TB of daily credit and loan data, ensuring accurate ingestion into Amazon Redshift.

Refactored legacy SQL logic and optimized Redshift queries & materialized views, reducing financial reporting runtimes and improving analytics performance.

Automated S3 data ingestion using Python/Boto3, implementing dynamic partitioning, lifecycle policies, and schema evolution to improve efficiency and reduce storage costs.

Built event-driven Lambda workflows triggered by S3 events and integrated with SNS/SQS to reduce manual intervention and improve pipeline responsiveness.

Implemented DataOps CI/CD pipelines using Jenkins/GitHub Actions to automate testing, packaging, and deployment of Glue, Airflow, and PySpark workloads.

Developed comprehensive data validation frameworks with PyTest for unit, integration, schema, and quality checks embedded directly into ETL flows.

Created and orchestrated complex Apache Airflow DAGs with SLA management, retries, and alerting, improving pipeline reliability and observability (see the sketch following this role).

Engineered reusable Terraform modules to provision and manage AWS infrastructure including VPCs, IAM roles, Redshift clusters, Glue jobs, and S3 buckets.

Designed ML-ready datasets and feature pipelines, collaborating with data science teams and integrating with Databricks Feature Store/SageMaker Feature Store concepts.

Implemented strict IAM/KMS security policies and multi-account access strategies in alignment with JPMorgan’s enterprise data governance and compliance requirements.

Built automated lineage tracking and metadata management for Glue Data Catalog, improving transparency and auditability across data consumers.

Developed multi-step AWS Step Functions workflows with conditional branching and automated error handling to support complex ETL orchestration.

Implemented monitoring and alerting using CloudWatch logs, metrics, and SNS notifications, reducing incident response times and improving operational stability.

Containerized Python/PySpark workloads using Docker and deployed scalable data services on Kubernetes (EKS) with auto-scaling and secure secret management.

Built CI/CD automation for container image builds, vulnerability scans, and controlled rollouts to support reliable production deployments.

Optimized Glue job performance through Spark configuration tuning and custom transformation scripts, reducing overall processing time and AWS spend.

Delivered technical documentation, runbooks, and cross-team training on Glue, PySpark, and Redshift, while mentoring junior engineers to raise team capability.

Partnered with analytics, risk, and business teams to design data models supporting predictive risk scoring, operational dashboards, and real-time decisioning.
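
A minimal Airflow 2.x sketch of the retry, SLA, and alerting configuration described in the DAG bullet above; the DAG id, schedule, and callback body are illustrative, and the alert callback would normally publish to SNS or a paging tool rather than print.

    # Airflow DAG with retries, a per-task SLA, and a failure callback (illustrative only).
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def notify_failure(context):
        # Placeholder alert hook; wire this to SNS/Slack/PagerDuty in practice.
        print(f"Task {context['task_instance'].task_id} failed")

    def extract():
        pass  # placeholder extract step

    def load():
        pass  # placeholder load step

    default_args = {
        "owner": "data-eng",
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=1),
        "on_failure_callback": notify_failure,
    }

    with DAG(
        dag_id="daily_credit_pipeline_demo",     # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task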

CVS Health, Woonsocket, RI Aug 2020 – Dec 2022

AWS Data Engineer

Built and optimized AWS Glue and Apache Spark ETL pipelines to process multi-source healthcare claims, patient, and prescription data at scale, supporting analytics and regulatory reporting.

Improved Redshift performance through workload optimization, query rewrites, distribution/sort key tuning, and cluster configuration adjustments, accelerating dashboard refreshes and analytics turnaround.

Designed secure, HIPAA-compliant data pipelines using IAM roles, KMS encryption, VPC endpoints, and fine-grained access controls to safeguard PHI data.

Automated ingestion from on-premise clinical systems and SFTP sources into S3 using Python, enabling timely and reliable data availability for enterprise analytics.

Developed robust Python-based validation frameworks to detect anomalies, schema drift, missing data, and quality issues, significantly improving pipeline accuracy.

Engineered complex SQL transformations, cleansing processes, and aggregations in Redshift to support claims adjudication, quality-of-care analytics, and operational reporting.

Designed and maintained scalable API ingestion pipelines for REST/JSON, XML, and streaming feeds, with Python clients handling pagination, rate limits, OAuth2, and token-based authentication (see the sketch following this role).

Built Lambda-driven event-based workflows to reduce ingestion latency and enable near real-time processing for time-sensitive healthcare data.

Delivered business dashboards using Amazon QuickSight to visualize claims status, utilization metrics, patient outcomes, and operational KPIs.

Implemented centralized monitoring using CloudWatch, Prometheus, and Grafana, with distributed logging and alerting systems that reduced MTTD/MTTR for pipeline failures.

Created Glue Data Catalog definitions and schema versioning processes, improving metadata governance and dataset discoverability across analytics teams.

Orchestrated multi-step ETL workflows using AWS Step Functions, implementing retries, branching logic, and error handling for improved reliability.

Designed S3 lifecycle policies and archiving solutions to reduce storage costs while maintaining audit-ready historical data retention.

Developed reusable SQL, Python, and Glue components to standardize common ETL patterns and accelerate onboarding for new data pipelines.

Participated in cloud migration efforts, modernizing legacy ETL workloads to AWS-native services for improved scalability and maintainability.

Performed root cause analysis and implemented automated corrective actions for failed data jobs, significantly improving operational stability.

Collaborated with analysts, data scientists, clinical teams, and compliance officers to translate regulatory and business requirements into scalable data models and reporting workflows.

Mentored junior engineers in PySpark, Glue best practices, data security principles, and overall cloud data engineering methodologies.
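
A minimal sketch of the paginated, token-authenticated ingestion client described in the API bullet above; the endpoint, grant flow, and response shape are assumptions for illustration, not the actual CVS integration.

    # Paginated REST ingestion with OAuth2 client credentials and basic rate-limit handling.
    import time
    import requests

    API_BASE = "https://api.example.com"              # hypothetical endpoint
    TOKEN_URL = f"{API_BASE}/oauth/token"

    def get_token(client_id: str, client_secret: str) -> str:
        """Fetch an OAuth2 client-credentials token (token endpoint is illustrative)."""
        resp = requests.post(
            TOKEN_URL,
            data={
                "grant_type": "client_credentials",
                "client_id": client_id,
                "client_secret": client_secret,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["access_token"]

    def fetch_all_claims(token: str, page_size: int = 500) -> list:
        """Walk a paginated claims endpoint, honoring Retry-After on HTTP 429."""
        headers = {"Authorization": f"Bearer {token}"}
        page, records = 1, []
        while True:
            resp = requests.get(
                f"{API_BASE}/claims",
                params={"page": page, "per_page": page_size},
                headers=headers,
                timeout=30,
            )
            if resp.status_code == 429:               # rate limited: back off and retry
                time.sleep(int(resp.headers.get("Retry-After", "5")))
                continue
            resp.raise_for_status()
            batch = resp.json().get("results", [])
            if not batch:                             # empty page signals the end
                return records
            records.extend(batch)
            page += 1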

BrightMart Retail Solutions, Austin, TX Sep 2018 – Jul 2020

Azure Data Engineer / Big Data Engineer

Designed and implemented scalable Spark and PySpark ETL pipelines processing 1.5+ TB daily of retail sales, inventory, and customer data to support enterprise analytics and forecasting.

Built optimized SQL queries, views, and stored procedures in Azure Synapse Analytics, significantly reducing query response times for merchandising and supply chain reporting.

Developed end-to-end ingestion workflows using Azure Data Factory, integrating on-premise, cloud, API, and streaming sources into ADLS Gen2 with near real-time freshness.

Designed and optimized Delta Lake / Apache Iceberg tables with partitioning, Z-ordering, and schema evolution to support high-performance analytical workloads.

Implemented ACID-compliant incremental ingestion frameworks using Delta/Iceberg, enhancing reliability for ML pipelines and downstream reporting systems (see the sketch following this role).

Automated compaction, vacuuming, and metadata optimization workflows to reduce storage costs and improve lakehouse query latency.

Built clean, ML-ready datasets and collaborated with data scientists on feature engineering for customer churn, demand prediction, and forecasting models.

Implemented robust data quality frameworks with validation rules, anomaly detection, and reconciliation checks, reducing data errors by 30%.

Developed Azure Functions and Python-based orchestration scripts for workflow automation, monitoring, alerting, and pipeline health visibility.

Leveraged Azure Monitor and Log Analytics to track pipeline performance, optimize compute usage, and proactively identify bottlenecks.

Led migration of legacy batch jobs to Azure Event Hubs + Spark Structured Streaming, reducing data latency from hours to minutes.

Designed dimensional models and star schemas in Azure Synapse to support BI teams using Power BI for advanced retail analytics.

Created reusable PySpark modules and transformation frameworks to accelerate ETL development across multiple engineering teams.

Implemented secure access patterns with Azure AD, RBAC, and data masking to protect sensitive customer data and maintain GDPR compliance.

Built partitioned, indexed tables in Azure SQL Data Warehouse to optimize storage and speed up analytical workloads.

Automated deployment of ADF pipelines and Spark workloads using Azure DevOps CI/CD, reducing deployment errors and improving release cadence.

Performed root-cause analysis for failed workflows and implemented improved retry, error-handling, and documentation processes to reduce downtime.

Mentored junior engineers on Spark optimization, Azure best practices, and SQL standards, strengthening overall team productivity and technical capability.
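
A minimal sketch of the incremental Delta Lake upsert pattern referenced above; the storage paths, merge key, and session configuration assume the open-source delta-spark package and are illustrative rather than the actual BrightMart implementation.

    # Upsert an incremental batch into a Delta table with MERGE (illustrative names only).
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("delta-upsert-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Hypothetical ADLS Gen2 paths for the incremental source and curated target.
    updates = spark.read.parquet("abfss://raw@demolake.dfs.core.windows.net/sales/incremental/")
    target = DeltaTable.forPath(spark, "abfss://curated@demolake.dfs.core.windows.net/sales/")

    (
        target.alias("t")
        .merge(updates.alias("s"), "t.order_id = s.order_id")   # illustrative merge key
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )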

EDUCATION

Master’s in Computer Science, Campbellsville University.


