
Data Engineer Solutions

Location:
Little Elm, TX, 75068
Salary:
90000
Posted:
September 10, 2025

GOWTHAM TEJA

DATA ENGINEER

Phone: 615-***-**** Email Id: *************@*****.***

Professional Summary

Data engineer with 6+ years of experience designing, implementing, and optimizing data solutions to support business objectives. Proficient in leveraging a diverse set of technologies and tools to build scalable and efficient data pipelines, enabling data-driven decision-making and business insights. Skilled in managing and processing large volumes of structured and unstructured data, with a strong focus on data quality, reliability, and performance. Demonstrated expertise in cloud platforms such as Azure, AWS, and Snowflake, as well as Informatica, with proficiency in technologies like Python (NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn), SQL, Apache Airflow, Azure Synapse, and Scala. Proven ability to collaborate effectively with cross-functional teams to deliver impactful data solutions and drive organizational success.

Education

Master's in Information Systems, University of Memphis Jan 2021 – May 2022

Work Experience

Senior Data Engineer May 2024 - Current

Medica Insurance Company, MN

• Designed and implemented real-time ingestion pipelines in Snowflake leveraging Snowpipe to load claims, eligibility, and provider data, enabling near real-time analytics for healthcare operations.
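
The following is a minimal, hypothetical sketch of the kind of Snowpipe setup described above, written with the snowflake-connector-python library; the stage, pipe, and table names (CLAIMS_STAGE, CLAIMS_PIPE, RAW.CLAIMS_LANDING) and credentials are placeholders, not details from this role.

import snowflake.connector

# Placeholder connection settings; real credentials would come from a secrets manager.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="HEALTHCARE", schema="RAW",
)

# Define a pipe that auto-ingests staged JSON claim files as they land in cloud storage.
conn.cursor().execute("""
    CREATE PIPE IF NOT EXISTS CLAIMS_PIPE
      AUTO_INGEST = TRUE
      AS COPY INTO RAW.CLAIMS_LANDING
         FROM @CLAIMS_STAGE
         FILE_FORMAT = (TYPE = 'JSON')
""")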

• Built data transformation pipelines using Snowpark (Python & Scala APIs) to process semi-structured healthcare data (JSON/Parquet), enabling ML-ready datasets inside Snowflake without external ETL tools.

• Leveraged Snowpark DataFrame APIs to implement complex business logic (eligibility rollups, renewal month calculations) directly in Snowflake, reducing dependency on external scripting.
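
A hedged Snowpark (Python) sketch of an eligibility rollup along the lines described above; the session settings, the table name ELIGIBILITY_FACT, and the columns ELIG_DATE and ACTIVE_MEMBERS are assumptions, while GRP_ID and MBR_ID echo column names mentioned elsewhere in this resume.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, count_distinct, date_trunc

# Placeholder connection parameters for the Snowpark session.
connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "TRANSFORM_WH", "database": "HEALTHCARE", "schema": "CURATED",
}
session = Session.builder.configs(connection_parameters).create()

# Roll member eligibility up to one row per group per month.
eligibility = session.table("ELIGIBILITY_FACT")
monthly_rollup = (
    eligibility
    .with_column("ELIG_MONTH", date_trunc("month", col("ELIG_DATE")))
    .group_by(col("GRP_ID"), col("ELIG_MONTH"))
    .agg(count_distinct(col("MBR_ID")).alias("ACTIVE_MEMBERS"))
)
monthly_rollup.write.save_as_table("ELIGIBILITY_MONTHLY_ROLLUP", mode="overwrite")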

• Optimized large-scale queries by implementing automatic clustering on high-cardinality columns (e.g., GRP_ID, MBR_ID), improving scan performance on multi-terabyte healthcare fact tables.
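
As a sketch of the clustering change described above, the statement below assumes a fact table named CLAIMS_FACT (hypothetical) and reuses a snowflake-connector-python connection like the one in the Snowpipe sketch earlier.

# Cluster the multi-terabyte fact table on the high-cardinality keys named above.
conn.cursor().execute("ALTER TABLE CLAIMS_FACT CLUSTER BY (GRP_ID, MBR_ID)")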

• Applied micro-partitioning strategies to minimize data skew and optimize pruning, resulting in significantly reduced query execution times.

• Tuned warehouse configurations (scaling policies, auto-suspend, multi-cluster settings) to balance performance with cost efficiency during peak claim processing.

• Developed dynamic tables in Snowflake to replace traditional stored procedures, simplifying transformation logic and improving data load times and pipeline efficiency.
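
A hedged sketch of a dynamic table of the kind mentioned above, again issued through snowflake-connector-python; the table name, target lag, warehouse, and aggregation logic are illustrative assumptions.

# Reuses the snowflake.connector connection from the earlier sketches.
conn.cursor().execute("""
    CREATE OR REPLACE DYNAMIC TABLE CURATED.CLAIMS_MONTHLY_SUMMARY
      TARGET_LAG = '15 minutes'
      WAREHOUSE = TRANSFORM_WH
      AS
      SELECT GRP_ID,
             DATE_TRUNC('month', SERVICE_DATE) AS SERVICE_MONTH,
             SUM(PAID_AMT) AS TOTAL_PAID
      FROM RAW.CLAIMS_LANDING
      GROUP BY GRP_ID, DATE_TRUNC('month', SERVICE_DATE)
""")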

• Automated batch orchestration and monitoring using Automic Web Interface (AWI), ensuring SLA compliance and reducing manual interventions for production healthcare data loads.

• Managed incident tracking and workflow tickets using Apptio as a ticketing tool, streamlining issue resolution and improving cross-team communication.

• Developed shell scripts for file processing, log monitoring, and error handling across Linux environments, enhancing operational reliability.

• Used Azure DevOps to build CI/CD pipelines for ETL job deployments, version control, and automated environment promotions in compliance with healthcare governance.

• Migrated legacy Oracle databases into SQL Server (SSMS) using Informatica IDMC (IICS), creating a robust healthcare data mart to support actuarial, underwriting, and financial reporting requirements.

• Designed and built ETL workflows for claims, eligibility, and group data, applying rigorous data validation and transformation logic to ensure high data quality and regulatory compliance.

• Integrated the SQL Server data mart with Optum’s StepWise application, enabling automated underwriting, rating, and quoting for health insurance products, significantly reducing manual processing.

• Collaborated with underwriting and actuarial teams to ensure business-ready data feeds, accelerating decision-making and reducing quote turnaround time.

• Developed parameterized, reusable Informatica mappings to streamline ongoing migrations and reduce development effort by ~30%.

• Delivered optimized SQL queries, stored procedures, and star-schema models in SSMS to enhance StepWise reporting performance and support downstream analytics.

Data Engineer Jul 2022- Apr 2024

Change Healthcare, TN

• Gathered requirements for designing and developing a common architecture for storing retail data within the enterprise and building a data lake in the Azure cloud.

• Developed Spark applications in Azure Databricks notebooks for data extraction, transformation, and aggregation from multiple systems, storing the results on Azure Data Lake Storage.

• Developed PySpark applications to integrate data from sources such as FTP feeds and CSV files, processing it in Azure Databricks and writing it into Snowflake.
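
A minimal PySpark sketch of this kind of CSV-to-Snowflake flow on Databricks, assuming the Spark–Snowflake connector is installed; the ADLS path, connector options, and target table are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("csv_to_snowflake").getOrCreate()

# Read CSV files landed from FTP into ADLS (path is hypothetical).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://landing@<storageaccount>.dfs.core.windows.net/ftp_drop/"))

cleaned = raw.dropDuplicates().withColumn("load_ts", F.current_timestamp())

# Standard Spark–Snowflake connector options; all values are placeholders.
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "STAGING",
    "sfWarehouse": "LOAD_WH",
}

(cleaned.write
 .format("snowflake")
 .options(**sf_options)
 .option("dbtable", "FTP_CSV_STAGE")
 .mode("append")
 .save())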

• Wrote unzip and decode functions using Spark with Scala, parsing XML files into Azure Blob Storage.

• Developed PySpark scripts to ingest data from source systems such as Azure Event Hubs into Delta tables in Databricks in reload, append, and merge modes.
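
A hedged Databricks sketch of the merge-mode ingestion described above, reading Event Hubs through its Kafka-compatible endpoint (an assumption) and upserting into a Delta table; the namespace, topic, schema, and table name are placeholders.

from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("eventhub_merge").getOrCreate()

# Pull a batch of events from the Kafka-compatible Event Hubs endpoint
# (placeholders; SASL/auth options for the endpoint omitted for brevity).
events = (spark.read
          .format("kafka")
          .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
          .option("subscribe", "claims-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"),
                              "claim_id STRING, status STRING, updated_at TIMESTAMP").alias("e"))
          .select("e.*"))

# Merge mode: upsert incoming events into the Delta table keyed on claim_id.
target = DeltaTable.forName(spark, "bronze.claims")
(target.alias("t")
 .merge(events.alias("s"), "t.claim_id = s.claim_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())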

• Designed and developed ETL processes in Databricks to migrate campaign data from external sources such as Azure Data Lake Storage Gen2 in ORC, Parquet, and text file formats.

• Optimized PySpark applications on Databricks, yielding significant cost reductions.

• Created Pipelines in Azure Data Factory to copy parquet files from ADLS Gen2 location to Azure Synapse Analytics Data Warehouse.

• Replaced existing Hive scripts with Spark DataFrame transformations and actions for faster analysis of the data.

• Developed PySpark scripts that reduced organizational costs by 30% by migrating customer data from SQL Server to Hadoop.

• Handled JSON datasets and wrote custom Python functions to parse JSON data using Spark.

• Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.

• Built data pipelines to load data from web servers using Kafka and the Spark Streaming API.
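
A small Spark Structured Streaming sketch in the spirit of the pipeline above; the broker address, topic, and one-minute aggregation are illustrative assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("weblog_stream").getOrCreate()

# Subscribe to the web-server log topic (broker and topic names are placeholders).
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "web-logs")
        .load()
        .selectExpr("CAST(value AS STRING) AS line", "timestamp"))

# Example transformation: count log lines per one-minute window.
counts = logs.groupBy(F.window("timestamp", "1 minute")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("memory")            # in-memory sink, for illustration only
         .queryName("requests_per_minute")
         .start())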

• Used Airflow for orchestration and scheduling of the ingestion scripts.
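
A minimal Airflow DAG sketch showing how such ingestion scripts might be scheduled; the DAG id, schedule, and script paths are hypothetical.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ingestion_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Each task wraps one ingestion script (paths are placeholders).
    extract = BashOperator(
        task_id="extract_to_adls",
        bash_command="python /opt/jobs/extract_to_adls.py",
    )
    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="python /opt/jobs/load_to_snowflake.py",
    )
    extract >> load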

• Generated weekly operational reports, customer goals reports, and mobile scan-and-pay goals and usage reports using Power BI.

Data Engineer Aug 2018 - Dec 2020

Kelly Maxson, India

• Worked as a Data Engineer with Big Data and Hadoop ecosystem components.

• Involved in converting Hive/SQL queries into Spark transformations using Scala.

• Created Spark data frames using Spark SQL and prepared data for data analytics by storing it in AWS S3.

• Responsible for loading data from Kafka into HBase using REST API.

• Developed batch scripts to fetch the data from AWS S3 storage and perform required transformations in Scala using the Spark framework.

• Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that gets data from Kafka in near real time and persists it to HBase.

• Created Sqoop scripts to import and export customer profile data from RDBMS to S3 buckets.

• Developed various enrichment applications in Spark using Scala for cleansing and enrichment of clickstream data with customer profile lookups.

• Troubleshot Spark applications for improved error tolerance and reliability.

• Used Spark Data frame and Spark API to implement batch processing of Jobs.

• Well-versed with Pandas data frames and Spark data frames.

• Used Apache Kafka and Spark Streaming to ingest data from Adobe Livestream REST API connections.

• Automated creation and termination of AWS EMR clusters.

• Worked on real-time data streaming using AWS Kinesis, EMR, and AWS Glue.

• Worked on fine-tuning and performance enhancement of various Spark applications and Hive scripts.

• Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
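
A short PySpark sketch of the broadcast-join and caching pattern referenced above; the S3 paths and join key are placeholders (dynamic allocation itself is a cluster setting, e.g. spark.dynamicAllocation.enabled, not shown here).

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream_enrich").getOrCreate()

clicks = spark.read.parquet("s3://<bucket>/clickstream/")            # large fact data
profiles = spark.read.parquet("s3://<bucket>/customer_profiles/")    # small lookup table

# Broadcast the small profile lookup so the join avoids a full shuffle, and cache the
# enriched result because several downstream jobs reuse it.
enriched = clicks.join(F.broadcast(profiles), on="customer_id", how="left").cache()

enriched.count()   # materialize the cache
enriched.write.mode("overwrite").parquet("s3://<bucket>/enriched_clickstream/")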

• Imported hundreds of structured tables from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.

• Created data partitions on large data sets in S3 and DDL on partitioned data.

• Improved efficiency by modifying existing data pipelines in the Matillion ETL tool to load data into AWS Redshift.

• Identified source systems, their connectivity, related tables, and fields; ensured data suitability for mapping; prepared unit test cases; and supported the testing team in fixing defects.

• Defined HBase tables to store various data formats of incoming data from different portfolios.

• Developed the verification and control process for daily data loading.

Jr. Data Engineer June 2017 – May 2018

Hetero Drugs, India

• Established CI/CD pipelines for data projects using AWS DevOps, reducing release time by 40% and doubling deployments.

• Collaborated with cross-functional teams to craft efficient data solutions, achieving a 95% satisfaction rate.

• Implemented data cataloging and metadata management with AWS Glue and Lake Formation, cutting data duplication by 80%.
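
A hedged boto3 sketch of registering a Glue crawler for data cataloging as described above; the crawler name, IAM role ARN, database, and S3 path are all placeholders.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create a crawler that catalogs the curated S3 prefix into a Glue database.
glue.create_crawler(
    Name="curated_zone_crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="curated_catalog",
    Targets={"S3Targets": [{"Path": "s3://<data-lake-bucket>/curated/"}]},
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE",
                        "DeleteBehavior": "LOG"},
)
glue.start_crawler(Name="curated_zone_crawler")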

• Maintained GitLab repositories for version control of Glue scripts and ETL scripts; proficient in Agile and Waterfall methodologies.

• Orchestrated data workflows with Apache Airflow in AWS, reducing pipeline execution time by 30%.

• Developed an end-to-end data analytics framework utilizing Amazon Redshift, Glue, and Lambda, enabling the business to obtain KPIs faster at reduced cost.

• Worked closely with business analysts and stakeholders to gather report requirements and ensure delivery of accurate and timely reports.

• Developed SQL scripts for 20+ data extraction processes, reducing manual efforts by 80% and improving data accuracy.

Technical Skills

Programming Languages: Python, Scala, SQL, Shell Scripting.

Big Data Ecosystem: HDFS, Pig, Hive, Oozie, Sqoop, Kafka, Spark Streaming, Spark SQL, DataFrames.

Cloud Ecosystem: Snowflake, Azure (Azure Data Factory, Azure Databricks, Azure Synapse, ADLS Gen2), AWS (EC2, EMR, Lambda, Kinesis, Glue, Redshift, and S3).

Databases: MySQL Server, Oracle DB, HiveQL, Spark SQL, HBase, Redshift, Snowflake.

Orchestration/Tools: Informatica, Automic Web Interface, Airflow, Oozie, Jira, Git, Matillion, Postman, Power BI, Docker, Jenkins.

Streaming: Spark Streaming, Kafka.

IDEs: Databricks, PyCharm, Anaconda, IntelliJ.


