Goutham K
Email: ************@*****.***
Mobile: 937-***-****
LinkedIn: https://www.linkedin.com/in/goutam2001/
Senior Data Engineer
PROFESSIONAL SUMMARY
Data Engineer with 5+ years of experience designing and implementing scalable data pipelines, ETL/ELT workflows, and data warehousing solutions that support business intelligence and analytics initiatives.
Proven expertise in building and optimizing large-scale cloud data platforms (AWS, GCP, Azure) and integrating both relational and NoSQL systems to handle high-volume, high-velocity datasets.
Skilled in data modeling, data architecture, and metadata governance, ensuring data integrity, lineage tracking, and quality across complex organizational data ecosystems.
Strong proficiency in programming (Python, SQL, Scala) and big data frameworks (Spark, Kafka) for real-time stream processing and batch analytics, accelerating decision-making and insight generation.
Facilitated seamless collaboration by leveraging excellent written and oral communication skills, enhancing team synergy.
Implemented process automation to streamline workflows, boosting efficiency and reducing operational costs by 20%.
Led continual process improvement initiatives, fostering innovation and increasing overall project success rates.
TECHNICAL SKILLS
Programming and Scripting Languages - Python, SQL, Scala, Java, Bash, Shell, Perl
Data Engineering and Processing - Apache Spark, PySpark, Hadoop, Hive, Kafka, Flink, Beam, Airflow, Luigi, ETL/ELT Pipelines, Metadata-driven Frameworks
Cloud Platforms - Amazon Web Services (S3, Redshift, Glue, Lambda, EMR, Kinesis); Microsoft Azure (Data Factory, Synapse, Databricks, Event Hub, ADLS Gen2, Microsoft Fabric); Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Dataproc, Cloud Composer)
Data Warehousing and Storage - Snowflake, Azure Synapse, Amazon Redshift, Google BigQuery, Delta Lake, Teradata, PostgreSQL, MySQL, Oracle, MongoDB, DynamoDB, Oracle Exadata
Data Modeling and Architecture - Dimensional Modeling, Star & Snowflake Schemas, Data Lakehouse Design, Schema Evolution, Partitioning & Clustering, Data Governance
Orchestration and Automation - Apache Airflow, Cloud Composer, Azure Data Factory, Jenkins, GitHub Actions, CI/CD for Data Pipelines
DevOps and Infrastructure - Docker, Kubernetes, Terraform, CloudFormation, Monitoring (Prometheus, CloudWatch, Azure Monitor), CI/CD, Linux, Unix
Data Governance and Quality - Azure Purview, AWS Lake Formation, GCP Data Catalog, Great Expectations, Deequ, Data Lineage, Data Validation
Analytics and BI Tools - Power BI, Tableau, Looker, Databricks SQL Analytics, QuickSight
Version Control and Collaboration - Git, GitHub, GitLab, Bitbucket, Jira, Confluence, Agile/Scrum
Data Integration Tools - Informatica
PROFESSIONAL EXPERIENCE
Chase July 2024 – Present
Senior Data Engineer
Architected and deployed data lake and warehouse solutions on AWS (S3, Redshift, Glue) for consumer banking analytics, enabling trusted high-volume transaction data access and reducing time-to-insight by 35%.
Designed and implemented real-time streaming ingestion pipelines using Kinesis Data Streams, Lambda and Step Functions, supporting fraud detection, payments and credit card processing with sub-second latency.
Developed and maintained ETL/ELT workflows using Spark on EMR, Glue jobs and Python/Java, transforming and curating multi-terabyte datasets for risk modeling and regulatory reporting across multiple business lines.
Built production-ready data modeling and metadata frameworks, creating star/snowflake schemas, partitioning strategies, and data-lineage documentation to ensure audit readiness and support enterprise reporting.
Developed and optimized Shell and Perl scripts to automate data processing workflows, enhancing efficiency and reducing manual intervention by 40%.
Implemented Informatica and Oracle Exadata solutions to streamline data integration processes, resulting in a 25% increase in data processing speed.
CVS Health May 2023 – June 2024
Data Engineer
Designed and delivered high-throughput data-ingestion pipelines on Google Cloud Dataflow and Apache Beam for healthcare claims and member data, integrating with BigQuery and Pub/Sub, enabling near-real-time analytics across 150+ million records.
Built microservice-based data-platform components using Python and Java on GCP, including containerized services (GKE) and managed workflow orchestration (Cloud Composer), to support modular analytics and self-serve data products.
Designed schemas and partitioning strategies in BigQuery (clustering, partitioning, query tuning) to optimize performance and cost for large-scale data models supporting clinical insights, member engagement, and risk stratification.
Implemented robust data governance, security, and compliance frameworks leveraging IAM roles, KMS encryption, data-lineage tracking, and HL7/FHIR-aware pipelines, ensuring adherence to HIPAA requirements for analytics in a sensitive health-services environment.
Utilized Linux and Unix environments to manage file systems, mount points, and permissions, ensuring robust security and operational reliability.
Led Agile methodology initiatives to improve backend development processes, achieving a 30% reduction in project delivery time.
Bank of America March 2021 – December 2022
Data Engineer
Built analytical data models and dashboards in Azure Synapse Analytics and Power BI, leveraging DAX expressions and materialized views to deliver high-performance business insights for credit risk, liquidity, and financial forecasting.
Architected and developed scalable data pipelines in Azure Data Factory (ADF) and Databricks (PySpark, SQL) for ingestion, transformation, and integration of multi-terabyte financial data, improving data availability and reducing load time by 40%.
Designed and optimized lakehouse architectures on Azure Data Lake Storage Gen2 with Delta Lake, applying partitioning, clustering, and schema evolution to manage both structured and unstructured datasets efficiently.
Implemented data governance and compliance frameworks using Azure Purview, RBAC, Key Vault, and encryption (KMS), ensuring adherence to SOX, FFIEC, and GDPR standards across all financial data systems.
Designed and maintained toolsets and scripts for seamless data flows, leveraging standard command-line tools and pipes to enhance data accuracy and consistency.
Demonstrated excellent written and oral communication skills, facilitating cross-functional collaboration and ensuring clear and effective information dissemination.
Groww July 2020 – February 2021
Data Engineer
Led the modernization of Groww’s investment analytics platform by engineering distributed data pipelines in Azure Data Factory (ADF) and Databricks (PySpark, SQL), consolidating data from trading, KYC, and payment systems into a unified analytics layer.
Devised a dynamic data lakehouse ecosystem on Azure Data Lake Storage Gen2 and Delta Lake, incorporating schema evolution, hierarchical partitioning, and time-travel features to support intraday trade analysis and performance reporting.
Optimized data warehousing and query performance in Azure Synapse Analytics, designing fact-dimension models, materialized views, and workload management strategies that accelerated reporting workloads by 60%.
Engineered self-service BI solutions through Power BI and Synapse Serverless Pools, enabling real-time visibility into investor activity, mutual fund flows, and market sentiment metrics across multiple business units.
Exhibited passion for automation and continual process improvement, driving a 20% increase in operational efficiency through innovative solutions.
EDUCATION
Master's in Computer Science - Wright State University
Bachelor's in Electronics and Communication Engineering - MLR Institute of Technology