
Data Engineer - Azure/Snowflake Cloud Data Architect

Location:
India
Salary:
70000-80000
Posted:
December 01, 2025


Vamshi D

Dallas, TX, USA. +1-469-***-**** **********@*****.***

Summary

Data Engineer with 4 years of experience designing, building, and optimizing scalable, production-ready data pipelines, data models, and cloud architectures across healthcare, financial, and telecom domains. Proficient in Python, SQL, Airflow, and Snowflake, with hands-on expertise in AWS and Azure ecosystems for ETL/ELT workflows and orchestration. Skilled in Kimball data modeling (star/snowflake schemas) and delivering cost-efficient, reliable, and compliant data solutions. Experienced in developing self-service analytics and BI dashboards (Power BI, Tableau) that enable stakeholders to make data-driven decisions. Recognized for collaborating with cross-functional teams, solving complex business problems, and quickly adapting to modern data technologies and cloud-native platforms.

Technical Skills

•Programming Languages: Scala, Python, R, SQL, C

•IDEs: PyCharm, Jupyter Notebook

•Big Data Ecosystem: Hadoop, MapReduce, Hive, Pig, HDFS, Spark, Kafka, PySpark, Apache Airflow, Zookeeper, Apache Flink

•Visualization & BI Tools: Power BI, Tableau, Amazon QuickSight

•Machine Learning: Linear Regression, Logistic Regression, Decision Trees, SVM, K-Means, Random Forest

•Cloud Technologies: AWS (S3, EMR, EC2, Glue, Lambda, SDK, DynamoDB, Elasticsearch, QuickSight, Kinesis, Athena, VPC, Redshift), Docker, Azure (Data Lake, Data Factory, Databricks, Logic Apps, HDInsight, Synapse Analytics, Stream Analytics)

•Packages & Data Processing: NumPy, Matplotlib, Seaborn, TensorFlow, Plotly, PySpark, Data Pipelines, Jenkins

•Version Control & Databases: GitHub, Git, SQL Server, PostgreSQL, MongoDB, MySQL, Snowflake

•Operating Systems: Windows, macOS, Linux

Experience

Progressive Jul 2024-Present

Data Engineer Mayfield, OH

•Utilized Azure Databricks and Spark to process large-scale financial and policy data, optimizing both batch and near real-time data pipelines for enterprise reporting and regulatory compliance.

•Designed reusable, production-ready ETL pipelines in Azure Data Factory and AWS Glue, automating claims and customer data ingestion and reducing manual effort by 10%.

•Designed and implemented Snowflake data warehouses with star/snowflake schemas to support high-performance querying, risk analytics, and regulatory reporting.

•Developed Spark applications with in-memory processing in Databricks, reducing transformation time by 40% and ensuring reliable, observable software in a cloud environment.

•Authored and tuned SQL queries to improve response times by 30%, enabling faster insights into customer accounts, claims, and financial transactions.

•Applied Python for data validation, orchestration, and automation, leveraging OOP principles along with unit and integration testing for data quality assurance.

•Engineered financial and insurance data pipelines focused on quality, compliance, and issue resolution, incorporating automation best practices, DevOps principles, and Step Functions for orchestration.

•Collaborated with business stakeholders to build dashboards for credit risk, loan default prediction, claims analysis, and profitability tracking, ensuring clear acceptance criteria and Agile participation.

•Integrated data from core insurance systems, payment platforms, and CRM applications into unified models, including structured and semi-structured data sources (APIs, JSON, XML, CSV, unstructured formats).

•Developed Snowflake role-based access controls (RBAC) and enforced governance/security standards to support regulatory and research reporting.
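
For illustration, the Python validation work described above might look like the following minimal sketch; the field names (policy_id, claim_amount) and rules here are invented, not taken from the resume:

```python
# Hypothetical claim-record validation step; field names and rules are
# illustrative only, not the candidate's actual pipeline code.
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def validate_claims(records):
    """Split claim records into valid and rejected, recording a reason."""
    result = ValidationResult()
    for rec in records:
        if not rec.get("policy_id"):
            result.rejected.append((rec, "missing policy_id"))
        elif not isinstance(rec.get("claim_amount"), (int, float)) or rec["claim_amount"] < 0:
            result.rejected.append((rec, "invalid claim_amount"))
        else:
            result.valid.append(rec)
    return result


sample = [
    {"policy_id": "P-100", "claim_amount": 250.0},
    {"policy_id": "", "claim_amount": 90.0},
    {"policy_id": "P-101", "claim_amount": -5},
]
result = validate_claims(sample)
print(len(result.valid), len(result.rejected))  # 1 2
```

A function like this is straightforward to cover with unit tests, which is consistent with the testing practice the bullet describes.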

Verizon Aug 2021 - Jul 2023

Data Engineer Hyderabad, India

•Designed and optimized scalable data pipelines using AWS S3, EMR, and Glue to process large volumes of telecom usage, customer, and billing data in both batch and streaming modes.

•Developed and maintained robust ETL workflows with Python and SQL, loading data into Amazon Redshift and Snowflake to support enterprise reporting and analytics.

•Consolidated data from multiple sources including MySQL, PostgreSQL, HBase, and APIs (JSON/XML/CSV) into centralized repositories, ensuring accuracy and consistency for cross-departmental analytics.

•Built and managed real-time streaming pipelines using Kafka and Flume, enabling continuous monitoring of network performance and customer interactions.

•Leveraged Hadoop, EMR, and Spark for batch processing and large-scale analytics, supporting insights into churn, network reliability, and operational performance.

•Oversaw AWS infrastructure with CloudFormation, maintaining high availability and scalability of data storage and processing systems.

•Streamlined CI/CD pipelines using Git, Maven, Jenkins, and AWS CodePipeline, aligning with DevOps principles to reduce deployment efforts.

•Scheduled, monitored, and debugged ETL workflows using Apache Airflow and Step Functions, maintaining reliability, accuracy, and cost-efficient pipeline performance.

•Performed data profiling, governance, and validation checks to improve accuracy, reliability, and compliance of enterprise datasets.
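
As a sketch of the S3 batch-ingestion layout such pipelines typically rely on, the snippet below builds Hive-style date partitions of the kind Glue and Athena can discover; the bucket prefix and source names are invented for illustration:

```python
# Hypothetical date-partitioned S3 key builder; "raw" and "billing" are
# placeholder names, not actual buckets from the resume.
from datetime import date


def partition_key(prefix: str, source: str, run_date: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    return (
        f"{prefix}/{source}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/"
        f"{filename}"
    )


key = partition_key("raw", "billing", date(2023, 7, 1), "usage.csv")
print(key)  # raw/billing/year=2023/month=07/day=01/usage.csv
```

Partitioning by run date this way lets downstream queries prune to a single day's data instead of scanning the full history.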

Molina Healthcare Nov 2020 - Jul 2021

Data Analyst India

•Migrated legacy SQL Server and MySQL claims databases to Snowflake and PostgreSQL, improving query performance by 40% and accelerating claims reporting.

•Performed data conversion and validation on healthcare claims data, ensuring compliance with audit requirements and healthcare data standards (HIPAA).

•Conducted ETL transformations in Python (Pandas, NumPy) to preprocess claims and patient data, improving accuracy in eligibility and segmentation analysis by 18%.

•Built and automated ETL pipelines using AWS Glue and Lambda, reducing manual claims processing time and enabling scalable ingestion from multiple provider systems.

•Created optimized SQL queries and views to streamline reporting workflows, improving efficiency in regulatory and internal analytics use cases.

•Developed interactive Tableau dashboards connected to Snowflake and PostgreSQL, providing executives with real-time visibility into KPIs such as claims cost, provider utilization, and member demographics.

•Implemented data validation scripts to ensure quality and consistency of patient and claims data across multiple systems.

•Collaborated with business analysts, compliance officers, and data science teams to document data models, mapping requirements, and workflows for audits and onboarding.
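
A Pandas preprocessing step of the kind described above might be sketched as follows; the column names (claim_id, claim_amount) and the high-cost threshold are assumptions for illustration:

```python
# Hypothetical Pandas claims-preprocessing step; columns and threshold are
# illustrative, not the candidate's actual schema.
import pandas as pd


def preprocess_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate claims, coerce amounts to numeric, flag high-cost claims."""
    out = df.drop_duplicates(subset=["claim_id"]).copy()
    out["claim_amount"] = pd.to_numeric(out["claim_amount"], errors="coerce")
    out = out.dropna(subset=["claim_amount"])
    out["high_cost"] = out["claim_amount"] > 10_000
    return out


sample = pd.DataFrame({
    "claim_id": ["C1", "C1", "C2", "C3"],
    "claim_amount": ["1200", "1200", "not_a_number", "15000"],
})
clean = preprocess_claims(sample)
print(clean[["claim_id", "high_cost"]])
```

Coercing bad values to NaN and dropping them (rather than failing the whole load) is one common way such pipelines keep malformed provider records from blocking reporting.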

Education

Eastern Illinois University, IL, USA

Master's, Computer Technology

Sreenidhi Institute of Science and Technology, Telangana, India

Bachelor's, Electronics and Communication Engineering


