GOWTHAM TEJA
DATA ENGINEER
Phone: +1-615-***-**** Email Id: *************@*****.*** LinkedIn
Professional Summary
Data Engineer with 6+ years of experience designing, implementing, and optimizing data solutions that support business objectives. Proficient in leveraging a diverse set of technologies and tools to build scalable, efficient data pipelines that enable data-driven decision-making and business insights. Skilled in managing and processing large volumes of structured and unstructured data, with a strong focus on data quality, reliability, and performance. Demonstrated expertise in cloud platforms such as Azure, AWS, and Snowflake, as well as Informatica, with proficiency in technologies including Python (NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn), SQL, Apache Airflow, Azure Synapse, and Scala. Proven ability to collaborate effectively with cross-functional teams to deliver impactful data solutions and drive organizational success.
Education
Master's in Information Systems Jan 2021 – May 2022
University of Memphis
Work Experience
Senior Data Engineer May 2024 - Current
Medica Insurance Company, MN
• Developed a cloud-native healthcare data lakehouse using Azure Data Lake Storage Gen2, Blob Storage, and Delta Lake, integrating data from claims, eligibility, provider, and EHR systems along with legacy Oracle sources.
• Developed scalable PySpark transformation frameworks in Azure Databricks to cleanse, normalize, and enrich multi-terabyte healthcare datasets for actuarial, clinical, and operational analytics.
• Implemented Delta Lake for ACID transactions, schema evolution, and incremental CDC ingestion, ensuring historical accuracy and auditability of healthcare data.
• Integrated Azure Event Hubs for real-time ingestion of claim transactions and member activity, enabling near real-time analytics and operational monitoring.
• Designed metadata-driven orchestration pipelines using Azure Data Factory (ADF) for batch ingestion and dependency management across multiple data domains.
• Implemented Apache Airflow for complex, cross-platform orchestration, managing event-driven workflows and Databricks job scheduling, complementing ADF’s batch orchestration.
• Established a Bronze–Silver–Gold Delta architecture in Databricks to manage raw, curated, and analytics-ready layers, improving data quality and usability.
• Optimized Spark performance using Adaptive Query Execution (AQE), Dynamic Partition Pruning, broadcast joins, and autoscaling clusters, cutting compute costs by over 40%.
• Integrated Azure Synapse Analytics for federated querying and analytics, leveraging serverless SQL pools for cost-effective ad hoc reporting and financial modeling.
• Automated CI/CD pipelines for Databricks notebooks, ADF workflows, and Synapse artifacts using Azure DevOps, YAML, and ARM templates, ensuring consistent deployments across environments.
• Configured Azure Key Vault, Managed Identities, and RBAC policies for end-to-end data security, ensuring compliance with HIPAA and PHI governance standards.
• Established monitoring and observability frameworks using Azure Monitor, Log Analytics, and Databricks audit logs, ensuring proactive alerting and SLA compliance.
• Collaborated with data scientists and analysts to operationalize healthcare insights, supporting predictive modeling, utilization analysis, and member experience optimization.
Data Engineer Jul 2022 - Apr 2024
Change Healthcare, TN
• Designed and implemented real-time ingestion pipelines in Snowflake leveraging Snowpipe to load claims, eligibility, and provider data, enabling near real-time analytics for healthcare operations.
• Built data transformation pipelines using Snowpark (Python & Scala APIs) to process semi-structured healthcare data (JSON/Parquet), enabling ML-ready datasets inside Snowflake without external ETL tools.
• Leveraged Snowpark DataFrame APIs to implement complex business logic (eligibility rollups, renewal month calculations) directly in Snowflake, reducing dependency on external scripting.
• Optimized large-scale queries by implementing automatic clustering on high-cardinality columns (e.g., GRP_ID, MBR_ID), improving scan performance on multi-terabyte healthcare fact tables.
• Applied micro-partitioning strategies to minimize data skew and optimize pruning, resulting in significantly reduced query execution times.
• Tuned warehouse configurations (scaling policies, auto-suspend, multi-cluster settings) to balance performance with cost efficiency during peak claim processing.
• Developed dynamic tables in Snowflake to replace traditional stored procedures, simplifying transformation logic and improving data load times and pipeline efficiency.
• Automated batch orchestration and monitoring using Automic Web Interface (AWI), ensuring SLA compliance and reducing manual interventions for production healthcare data loads.
• Managed incident tracking and workflow tickets using Apptio, streamlining issue resolution and improving cross-team communication.
• Developed shell scripts for file processing, log monitoring, and error handling across Linux environments, enhancing operational reliability.
• Used Azure DevOps to build CI/CD pipelines for ETL job deployments, version control, and automated environment promotions in compliance with healthcare governance.
• Migrated legacy Oracle databases into SQL Server (SSMS) using Informatica IDMC (IICS), creating a robust healthcare data mart to support actuarial, underwriting, and financial reporting requirements.
• Designed and built ETL workflows for claims, eligibility, and group data, applying rigorous data validation and transformation logic to ensure high data quality and regulatory compliance.
• Integrated the SQL Server data mart with Optum’s StepWise application, enabling automated underwriting, rating, and quoting for health insurance products, significantly reducing manual processing.
• Collaborated with underwriting and actuarial teams to ensure business-ready data feeds, accelerating decision-making and reducing quote turnaround time.
• Developed parameterized, reusable Informatica mappings to streamline ongoing migrations and reduce development effort by ~30%.
• Delivered optimized SQL queries, stored procedures, and star-schema models in SSMS to enhance StepWise reporting performance and support downstream analytics.
Data Engineer Aug 2018 - Dec 2020
Kelly Maxson, India
• Designed and implemented large-scale data ingestion and transformation pipelines using AWS EMR (Spark, Scala) and Amazon S3 to process structured and semi-structured datasets efficiently.
• Built real-time streaming pipelines leveraging Amazon Kinesis and Apache Kafka integrated with AWS Glue for low-latency event processing and enrichment.
• Developed PySpark and Scala batch jobs on EMR, pulling raw data from S3, transforming it, and writing curated datasets back to the data lake in optimized Parquet format.
• Automated data ingestion from RDBMS and APIs into S3 using AWS Glue Jobs and Glue Crawlers, establishing an automated metadata-driven data catalog.
• Orchestrated ETL pipelines to populate Amazon Redshift using Matillion and AWS Lambda for on-demand transformations and trigger-based executions.
• Containerized Spark jobs with Docker and deployed them on Amazon EKS (Elastic Kubernetes Service) for scalable execution of data workloads.
• Implemented asynchronous message processing using AWS SQS and Lambda to decouple microservices and ensure reliable, fault-tolerant data delivery.
• Automated provisioning and lifecycle management of EMR clusters using Step Functions, Lambda, and tagging strategies to optimize compute usage and cost.
• Applied S3 data partitioning, versioning and lifecycle policies to improve performance, storage management, and archival compliance.
• Set up proactive monitoring and alerting with Amazon CloudWatch, SNS, and EKS metrics, ensuring visibility and end-to-end pipeline reliability.
Jr. Data Engineer June 2017 – May 2018
Hetero Drugs, India
• Established CI/CD pipelines for data projects using AWS DevOps, reducing release time by 40% and doubling deployments.
• Collaborated with cross-functional teams to craft efficient data solutions, achieving a 95% satisfaction rate.
• Implemented data cataloging and metadata management with AWS Glue and Lake Formation, cutting data duplication by 80%.
• Maintained GitLab repositories for version control of Glue and ETL scripts; proficient in Agile and Waterfall methodologies.
• Orchestrated data workflows with Apache Airflow in AWS, reducing pipeline execution time by 30%.
• Developed an end-to-end data analytics framework utilizing Amazon Redshift, Glue, and Lambda, enabling the business to obtain KPIs faster at reduced cost.
• Worked closely with business analysts and stakeholders to gather report requirements and ensure delivery of accurate and timely reports.
• Developed SQL scripts for 20+ data extraction processes, reducing manual efforts by 80% and improving data accuracy.
Technical Skills
Programming Languages: Python, Scala, SQL, Shell Scripting.
Big Data Ecosystem: HDFS, Pig, Hive, Oozie, Sqoop, Kafka, Spark Streaming, Spark SQL, DataFrames.
Cloud Ecosystem: Snowflake, Azure (Azure Data Factory, Azure Databricks, Azure Synapse, ADLS Gen2), AWS (EC2, EMR, Lambda, Kinesis, Glue, Redshift, and S3).
Databases: MySQL Server, Oracle DB, HiveQL, Spark SQL, HBase, Redshift, Snowflake.
Orchestration/Tools: Informatica, Automic Web Interface, Airflow, Oozie, Jira, Git, Matillion, Postman, Power BI, Docker, Jenkins.
Streaming: Spark Streaming, Kafka.
IDE: Databricks, PyCharm, Anaconda, IntelliJ.