
Senior Data Engineer with Cloud Data Platforms

Location:
Fairborn, OH
Salary:
80000
Posted:
February 27, 2026


Resume:

Goutham K

Email: ************@*****.***

Mobile: 937-***-****

LinkedIn: https://www.linkedin.com/in/goutam2001/

Senior Data Engineer

PROFESSIONAL SUMMARY

Data Engineer with 5+ years of experience designing and implementing scalable data pipelines, ETL/ELT workflows, and data warehousing solutions that support business intelligence and analytics initiatives.

Proven expertise in building and optimizing large-scale cloud data platforms (AWS, GCP, Azure) and integrating both relational and NoSQL systems to handle high-volume, high-velocity datasets.

Skilled in data modelling, data architecture, and metadata governance, ensuring data integrity, lineage tracking, and quality across complex organizational data ecosystems.

Strong proficiency in programming (Python, SQL, Scala) and big data frameworks (Spark, Kafka) for real-time stream processing and batch analytics, accelerating decision-making and insight generation.

Guided teams to achieve project milestones, enhancing collaboration and boosting productivity by 25%.

Mentored junior staff, fostering leadership skills and improving overall team performance and morale.

Collaborated with cross-functional teams, resolving conflicts and streamlining processes for efficient outcomes.

TECHNICAL SKILLS

Programming and Scripting Languages - Python, SQL, Scala, Java, Bash, PL/SQL

Data Engineering and Processing - Apache Spark, PySpark, Hadoop, Hive, Kafka, Flink, Beam, Airflow, Luigi, ETL/ELT Pipelines, Metadata-Driven Frameworks, ETL Automation

Cloud Platforms - Amazon Web Services (S3, Redshift, Glue, Lambda, EMR, Kinesis), Microsoft Azure (Data Factory, Synapse, Databricks, Event Hub, ADLS Gen2, Microsoft Fabric), Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Composer), OpenShift

Data Warehousing and Storage - Snowflake, Azure Synapse, Amazon Redshift, Google BigQuery, Delta Lake, Teradata, PostgreSQL, MySQL, Oracle, MongoDB, DynamoDB

Data Modeling and Architecture - Dimensional Modeling, Star & Snowflake Schemas, Data Lakehouse Design, Schema Evolution, Partitioning & Clustering, Data Governance, Large-Scale Architecture Initiatives, Enterprise Rollouts

Orchestration and Automation - Apache Airflow, Cloud Composer, Azure Data Factory, Jenkins, GitHub Actions, CI/CD for Data Pipelines

DevOps and Infrastructure - Docker, Kubernetes, Terraform, CloudFormation, Monitoring (Prometheus, CloudWatch, Azure Monitor), CI/CD, Containerized Deployments

Data Governance and Quality - Azure Purview, AWS Lake Formation, GCP Data Catalog, Great Expectations, Deequ, Data Lineage, Data Validation

Analytics and BI Tools - Power BI, Tableau, Looker, Databricks SQL Analytics, QuickSight, Alteryx, RapidMiner, Tableau Prep

Version Control and Collaboration - Git, GitHub, GitLab, Bitbucket, Jira, Confluence, Agile/Scrum

CI/CD Management - CI/CD pipeline design, automation, and release management

Technical Support and Troubleshooting - Performance troubleshooting and issue resolution

PROFESSIONAL EXPERIENCE

Chase July 2024 – Present

Senior Data Engineer

Architected and deployed data lake and warehouse solutions on AWS (S3, Redshift, Glue) for consumer banking analytics, enabling trusted high-volume transaction data access and reducing time-to-insight by 35%.

Designed and implemented real-time streaming ingestion pipelines using Kinesis Data Streams, Lambda and Step Functions, supporting fraud detection, payments and credit card processing with sub-second latency.
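
The streaming bullet above can be sketched in miniature as a Lambda-style handler for the standard Kinesis-to-Lambda event shape. The fraud check here (a flat amount threshold) and all field names are illustrative placeholders, not the production logic:

```python
import base64
import json

def handler(event, context=None):
    """Decode Kinesis records and flag suspiciously large transactions.

    The event follows AWS's Kinesis-to-Lambda payload shape; the
    10,000 threshold and field names are illustrative placeholders,
    not a real fraud rule.
    """
    flagged = []
    for record in event["Records"]:
        # Kinesis delivers record data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        txn = json.loads(payload)
        if txn.get("amount", 0) > 10_000:
            flagged.append(txn["transaction_id"])
    return {"flagged": flagged, "processed": len(event["Records"])}
```

In a real pipeline this handler would sit behind a Kinesis event source mapping, with Step Functions coordinating downstream enrichment and alerting.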

Developed and maintained ETL/ELT workflows using Spark on EMR, Glue jobs and Python/Java, transforming and curating multi-terabyte datasets for risk modeling and regulatory reporting across multiple business lines.

Built production-ready data modelling and metadata frameworks, creating star/snowflake schemas, partitioning strategies and data-lineage documentation to ensure audit readiness and support enterprise reporting.
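
The star-schema pattern named above can be illustrated with an in-memory SQLite example: one fact table keyed to a date dimension, rolled up by a dimension attribute. Table and column names are hypothetical, not the actual warehouse design:

```python
import sqlite3

# In-memory star-schema sketch: a transaction fact table joined to a
# date dimension. All names are illustrative, not the real model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        calendar_date TEXT,
        fiscal_quarter TEXT
    );
    CREATE TABLE fact_transactions (
        txn_id INTEGER PRIMARY KEY,
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, "2024-01-01", "Q1"),
                  (20240401, "2024-04-01", "Q2")])
conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?)",
                 [(1, 20240101, 120.0),
                  (2, 20240101, 80.0),
                  (3, 20240401, 200.0)])

# A typical star-schema rollup: aggregate facts by a dimension attribute.
rows = conn.execute("""
    SELECT d.fiscal_quarter, SUM(f.amount)
    FROM fact_transactions f
    JOIN dim_date d USING (date_key)
    GROUP BY d.fiscal_quarter
    ORDER BY d.fiscal_quarter
""").fetchall()
# rows -> [('Q1', 200.0), ('Q2', 200.0)]
```

The same shape scales up directly: in Redshift the dimension keys become distribution/sort keys, and the rollup query is what BI tools generate against the schema.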

Streamlined data preparation and workflow automation, enhancing data processing efficiency by 40% and reducing manual intervention by implementing robust automation frameworks.

Led architecture design and data processing automation initiatives, resulting in a 50% improvement in processing speed and scalability across enterprise systems.

Implemented performance optimization and code quality practices, boosting application reliability by 35% and reducing error rates significantly.

Enhanced system scalability and reliability through strategic architecture improvements, supporting a 300% increase in user load without performance degradation.

CVS Health May 2023 – June 2024

Data Engineer

Designed and delivered high-throughput data-ingestion pipelines on Google Cloud Dataflow and Apache Beam for healthcare claims and member data, integrating with BigQuery and Pub/Sub, enabling near-real-time analytics across 150+ million records.
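
The core idea behind the windowed aggregations such Beam/Dataflow pipelines perform can be shown in plain Python: assign each timestamped event to a fixed-size tumbling window and count per (window, key). Event types and window size here are made up for illustration:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Assign each (timestamp, key) event to a fixed-size tumbling
    window and count events per (window, key) -- the basic operation
    Beam expresses as FixedWindows + Count.perKey."""
    counts = defaultdict(int)
    for ts, key in events:
        # The window start is the timestamp rounded down to the
        # nearest window boundary.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical claim/member events with timestamps in seconds.
events = [(0, "claim"), (5, "claim"), (12, "member"), (61, "claim")]
out = tumbling_window_counts(events, 60)
# out -> {(0, 'claim'): 2, (0, 'member'): 1, (60, 'claim'): 1}
```

Beam adds what this sketch omits: distributed execution, watermarks for late data, and triggers, but the windowing arithmetic is the same.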

Built microservice-based data-platform components using Python and Java on GCP, including containerized services (GKE) and managed workflow orchestration (Cloud Composer), to support modular analytics and self-serve data products.

Developed schema design and partitioning strategies in BigQuery (clustering, partitioning, query-tuning) to optimize performance and cost for large-scale data models supporting clinical insights, member engagement, and risk stratification.
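
Why date partitioning cuts query cost can be modeled in a few lines: when the filter matches the partition column, the engine scans only the matching partitions and prunes the rest. This is a toy model of BigQuery's behavior, with hypothetical column names:

```python
from collections import defaultdict
from datetime import date

def partition_by_day(rows):
    """Group rows into per-day partitions, mimicking a date-partitioned table."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["event_date"]].append(row)
    return parts

def query_range(parts, start, end):
    """Scan only partitions inside [start, end] -- the pruning a
    warehouse applies when the filter hits the partition column.
    Returns the matching rows and how many partitions were touched."""
    scanned = [d for d in parts if start <= d <= end]
    rows = [r for d in scanned for r in parts[d]]
    return rows, len(scanned)

# Hypothetical member events across three days.
rows = [
    {"event_date": date(2024, 1, 1), "member_id": 1},
    {"event_date": date(2024, 1, 2), "member_id": 2},
    {"event_date": date(2024, 2, 1), "member_id": 3},
]
parts = partition_by_day(rows)
hits, partitions_scanned = query_range(parts, date(2024, 1, 1), date(2024, 1, 31))
# Only the two January partitions are touched; February is pruned.
```

In BigQuery the same effect shows up directly in bytes billed: a `WHERE event_date BETWEEN ...` filter on the partition column skips every partition outside the range, and clustering further narrows the scan within each partition.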

Implemented robust data-governance, security and compliance frameworks leveraging IAM roles, KMS encryption, data-lineage tracking and HL7/FHIR-aware pipelines, ensuring adherence to HIPAA-regulated analytics in a sensitive health-services environment.

Developed Operational Insights dashboards, providing real-time analytics that improved decision-making efficiency by 25% across departments.

Demonstrated leadership skills by mentoring and guiding teams, fostering a collaborative environment that increased project delivery speed by 20%.

Collaborated with cross-functional teams using PL-SQL and Alteryx to reduce data processing time by 30% through optimized query execution.

Automated complex data workflows with RapidMiner and Tableau Prep, cutting data preparation time by 50% and enhancing data accuracy.

Bank Of America March 2021 – December 2022

Data Engineer

Built analytical data models and dashboards in Azure Synapse Analytics and Power BI, leveraging DAX expressions and materialized views to deliver high-performance business insights for credit risk, liquidity, and financial forecasting.

Architected and developed scalable data pipelines in Azure Data Factory (ADF) and Databricks (PySpark, SQL) for ingestion, transformation, and integration of multi-terabyte financial data, improving data availability and reducing load time by 40%.

Designed and optimized lakehouse architectures on Azure Data Lake Storage Gen2 with Delta Lake, applying partitioning, clustering, and schema evolution to manage both structured and unstructured datasets efficiently.
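
The partitioned layout such a lakehouse relies on can be sketched locally: Delta Lake and ADLS tables store data in `key=value` subdirectories (the Hive-style convention) so that readers can prune whole directories. Local folders stand in for cloud storage here, and the column names are invented:

```python
import csv
import tempfile
from pathlib import Path

def write_partitioned(root, records, partition_key):
    """Write records into key=value subdirectories -- the Hive-style
    layout Delta Lake uses so engines can prune whole partitions."""
    root = Path(root)
    for rec in records:
        part_dir = root / f"{partition_key}={rec[partition_key]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out = part_dir / "part-0000.csv"
        is_new = not out.exists()
        with out.open("a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=sorted(rec))
            if is_new:
                writer.writeheader()
            writer.writerow(rec)
    return sorted(p.name for p in root.iterdir())

root = tempfile.mkdtemp()
parts = write_partitioned(root, [
    {"region": "emea", "amount": 10},
    {"region": "amer", "amount": 7},
    {"region": "emea", "amount": 3},
], "region")
# parts -> ['region=amer', 'region=emea']
```

Delta Lake layers a transaction log, schema enforcement, and time travel on top of this layout; the directory convention itself is what makes partition pruning possible.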

Implemented data governance and compliance frameworks using Azure Purview, RBAC, Key Vault, and encryption (KMS), ensuring adherence to SOX, FFIEC, and GDPR standards across all financial data systems.

Deployed applications on OpenShift using containerized deployments, ensuring seamless scalability and reducing deployment times by 40%.

Utilized ETL automation to streamline data integration processes, reducing data transfer errors by 60% and improving data reliability.

Managed and automated CI/CD pipelines, decreasing deployment cycles by 35% and increasing release frequency.

Led large-scale architecture initiatives and enterprise rollouts, successfully implementing solutions that supported business growth and innovation.

Groww July 2020 – February 2021

Data Engineer

Led the modernization of Groww’s investment analytics platform by engineering distributed data pipelines in Azure Data Factory (ADF) and Databricks (PySpark, SQL), consolidating data from trading, KYC, and payment systems into a unified analytics layer.

Devised a dynamic data lakehouse ecosystem on Azure Data Lake Storage Gen2 and Delta Lake, incorporating schema evolution, hierarchical partitioning, and time-travel features to support intraday trade analysis and performance reporting.

Optimized data warehousing and query performance in Azure Synapse Analytics, designing fact-dimension models, materialized views, and workload management strategies that accelerated reporting workloads by 60%.

Engineered self-service BI solutions through Power BI and Synapse Serverless Pools, enabling real-time visibility into investor activity, mutual fund flows, and market sentiment metrics across multiple business units.

Optimized the shared-services environment and enterprise-level governance, ensuring compliance and improving operational efficiency by 30%.

Tuned task dependencies and scheduling, reducing project completion times by 20% and enhancing resource allocation efficiency.

Troubleshot and resolved performance issues, improving application response times by 40% and enhancing user satisfaction.

Led scrum teams to deliver high-quality software solutions, achieving a 25% increase in project delivery success rates.

EDUCATION

Master's in Computer Science - Wright State University

Bachelor's in Electronics and Communication Engineering - MLR Institute of Technology
