Shashank R
Email: *****************@*****.*** Contact: 541-***-****
PROFESSIONAL SUMMARY
● 5+ years of hands-on experience in data engineering, delivering scalable, secure, and high-performance data solutions across financial, healthcare, and insurance domains.
● Designed and optimized 60+ end-to-end data pipelines using AWS Glue and Azure Data Factory, processing datasets exceeding 50 million records monthly.
● Architected cloud-native data platforms on AWS and Azure with robust implementations of Synapse and Redshift, supporting analytics for 100+ stakeholders.
● Built real-time and batch data pipelines using Kafka, Kinesis, and Lambda, achieving ingestion rates over 150,000 events/hour with sub-minute latency.
● Implemented data governance and compliance frameworks, securing over 10 TB of PII and financial data through IAM policies, encryption, masking, and audit trails to meet HIPAA and PCI-DSS standards.
● Developed predictive models using SageMaker and Azure ML with 90%+ accuracy, integrating machine learning into production pipelines supporting risk scoring and fraud detection.
● Mentored junior engineers, implemented CI/CD workflows, and led cost optimization efforts that reduced query costs and processing time by up to 60% across enterprise environments.
TECHNICAL SKILLS
Big Data: Hadoop, Apache Kafka, Kinesis, PySpark, Spark, AWS EMR, GCP Dataflow, Databricks, MapReduce, Hive
Business Intelligence Tools: Tableau, Power BI, AWS QuickSight, Azure Power BI Integration, GCP Looker, SSRS
Programming Languages: Python, SQL, PySpark, R, Shell Scripting
Containerization & Deployment: Docker, Kubernetes (AKS, GKE, EKS), AWS Lambda, AWS CodePipeline, Azure DevOps, GCP Cloud Build, Jenkins, CI/CD Pipelines, Terraform, IaC
Amazon Web Services (AWS): AWS Glue, AWS S3, AWS Athena, AWS Redshift, AWS Step Functions, AWS Lambda, AWS RDS, AWS CloudWatch, AWS EMR, AWS EC2, AWS VPC, AWS SageMaker, AWS IAM, AWS EKS, AWS QuickSight, AWS CodePipeline, AWS Kinesis, AWS AppFlow
Databases: SQL Server, Snowflake, Azure SQL Database, AWS RDS, AWS Redshift, GCP Cloud SQL, MongoDB, PostgreSQL, MySQL
Microsoft Azure: Azure Data Factory, Azure Synapse Analytics, Azure SQL, Azure ML, Azure Blob Storage, Azure Functions, Azure Active Directory, Azure VM, Azure VNet, Azure Databricks, Azure Key Vault, Azure DevOps
ETL & Data Warehousing: Snowflake, Redshift, Data Modeling, AWS Glue, Azure Data Factory, Databricks, Data Lineage, Change Data Capture
Methodologies: Agile, DevOps, CI/CD, Scrum, Waterfall, ITIL
PROFESSIONAL EXPERIENCE
Credit One Aug 2023 - Present
Data Engineer Las Vegas, NV
● Built and deployed 25+ Glue-based ETL pipelines to transform financial data from multiple systems into Redshift, processing over 2 TB of data weekly for compliance and analytics teams.
● Automated 10+ serverless ingestion pipelines with Lambda, Step Functions, and S3 event triggers to replace legacy scripts, reducing operational load by 100%.
● Designed 100+ Redshift tables with clustering keys and column-level masking to support secure, performant analytics for 40+ financial analysts.
● Developed real-time processing framework using Kinesis and Lambda to ingest and process 150,000+ transaction records hourly, enabling sub-minute fraud alerts.
● Created 180+ validation checkpoints across Redshift tables using SQL procedures and test harnesses, improving data accuracy scores to 99.2%.
● Reduced Redshift compute cost by 42% through query tuning, materialized views, workload isolation, and auto-suspend policies for underutilized warehouses.
● Configured IAM roles, S3 bucket policies, and VPC endpoint controls to maintain full PCI-DSS compliance across all financial datasets.
● Integrated Snowpipe to automate ingestion of semi-structured files from S3, reducing ingestion lag from 60 minutes to under 5 minutes.
● Trained two SageMaker models for credit risk classification with 93% accuracy, automating daily scoring for $50M+ in loans across three portfolios.
● Provisioned infrastructure using Terraform modules for Glue, S3, IAM, and Redshift, achieving complete IaC coverage and ensuring consistent environments.
● Developed and maintained 75+ modular dbt models for Snowflake transformations, reducing SQL duplication and enabling continuous data testing.
● Implemented column-level lineage tracking using Redshift’s information schema, Git commits, and dbt metadata to enhance traceability and audit readiness.
● Delivered 20+ data product releases via CodePipeline with CloudFormation, integrated rollback procedures, and multi-stage approvals to ensure stability.
● Diagnosed and resolved 50+ data issues across Redshift and Glue using CloudWatch, query logs, and Athena traces, reducing mean time to resolution to under 4 hours.
● Designed and published Tableau and QuickSight dashboards with cross-source joins and row-level security, serving 60+ users across compliance, credit, and marketing.
● Mentored 6 junior data engineers and analysts in Redshift optimization, data governance practices, and cost controls, reducing team backlog by 30% over 2 quarters.
United HealthGroup Jan 2020 - Dec 2021
Data Engineer Dallas, TX
● Built 20+ enterprise-grade ADF pipelines to ingest, transform, and stage 500+ GB of daily healthcare data from SQL Server into Azure Synapse, ensuring consistent SLA adherence.
● Consolidated and modernized 60+ legacy SSIS packages into reusable ADF dataflows, decreasing maintenance cycles by 45% and improving job visibility through pipeline-level logging.
● Modeled 120+ normalized and denormalized tables in Synapse to support provider analytics, encounter metrics, and claims audits across 5 business domains.
● Implemented scalable lakehouse architecture by integrating ADLS Gen2 with Blob Storage, reducing cross-environment data retrieval latency by 55% and centralizing 18 data sources.
● Deployed CI/CD pipelines using Azure DevOps for 18 code repositories, automating build-validation-deploy cycles and enabling 3x faster environment promotion.
● Designed and implemented Azure ML pipelines to train, evaluate, and operationalize patient risk models, achieving 87% prediction accuracy and influencing 3 care programs.
● Integrated Azure AD with hybrid identity federation, enabling SSO and role-based access control for 1,200+ enterprise users across cloud and on-prem environments.
● Refactored 400+ stored procedures, optimized indexes for 30+ high-load tables, and introduced query caching, reducing clinical reporting runtimes by over 70%.
● Delivered 12 executive-level Power BI dashboards sourced from SSAS models to visualize cost, claim, and engagement metrics across 4 functional departments.
● Scheduled 30+ recurring batch jobs using Azure Automation and ADF triggers, automating 95% of previously manual data refresh operations.
● Streamed over 5,000 patient and claim events per day using Kafka and Azure Event Hubs into Azure Databricks for real-time aggregation and alerting.
● Enabled Change Data Capture on 10 SQL Server instances, decreasing data replication lag from 8 hours to under 45 minutes using incremental load logic.
● Tuned Synapse query performance by creating 40+ materialized views and restructuring resource groups, tripling data throughput for analytics teams.
● Applied HIPAA-compliant security configurations using private endpoints, Azure Key Vault encryption, and conditional access policies across 7 production systems.
● Monitored 50+ production pipelines using Azure Monitor and Log Analytics with custom alerts, reducing incident response time from 4 hours to under 30 minutes.
● Directed migration of 12 mission-critical SQL workloads from on-prem to Azure SQL Managed Instances with near-zero downtime using replication and cutover planning.
Hartford Insurance May 2018 - Dec 2019
Data Engineer Chennai, India
● Constructed 12+ ETL pipelines using Python and AWS Glue to transform policy and claim data across five business units, handling over 50 million records monthly with fault-tolerant logic.
● Migrated 8 mission-critical data workloads from on-premise SQL Server to AWS using Amazon Redshift, S3, and Amazon RDS, reducing compute costs by $6,000 monthly and improving latency by 65%.
● Designed and implemented star-schema models in Amazon Redshift comprising 50+ optimized tables to support scalable analytical workloads and enable 25+ downstream dashboards.
● Engineered real-time streaming ingestion with Amazon Kinesis and AWS Lambda to capture and process 100,000+ insurance events per hour, supporting fraud detection with sub-second insights.
● Configured granular IAM roles, service accounts, and security groups to protect 10+ datasets with PII, achieving full compliance with internal security policies and audit requirements.
● Developed 15 interactive Amazon QuickSight dashboards backed by federated Redshift and Athena sources, enabling real-time policy performance metrics for claims managers across four departments.
● Orchestrated 25+ interdependent batch workflows using Amazon Managed Workflows for Apache Airflow (MWAA), reducing pipeline failure rates by 40% and increasing job transparency for developers and stakeholders.
● Diagnosed and resolved 100+ ingestion errors and schema mismatches using Amazon CloudWatch, AWS X-Ray, and AWS CloudTrail, cutting data downtime by 70% over a 12-month span.
● Deployed three machine learning models on Amazon SageMaker to detect anomalous claims behavior with 92% precision, generating daily scoring reports consumed by underwriting systems.
● Automated CI/CD processes for data pipelines using AWS CodeBuild, CodeCommit, and CodePipeline, shortening release cycle times from 5 days to under 1 day.
● Implemented over 200 schema and data integrity validation rules across ingestion pipelines, raising data quality scores from 84% to 98% within six months.
● Provisioned 10+ secure Amazon EC2 instances with custom VPC networking and access configurations to support data science experimentation and ETL compute tasks.
● Authored 25+ custom SQL UDFs in Amazon Redshift to encapsulate reusable transformation logic, reducing SQL duplication and maintenance overhead by 30%.
● Enforced encryption-at-rest and in-transit using AWS KMS and VPC endpoints with PrivateLink, securing over 10 TB of sensitive claims and policy data.
● Tuned Amazon Redshift performance by restructuring 30+ tables with distribution styles and sort keys, decreasing average query cost by $4,000/month and runtime by 55%.
● Led migration of 30+ virtual machines using AWS Application Migration Service (MGN), decommissioning aging infrastructure and achieving a 70% reduction in system administration tasks.
EDUCATION
University of North Texas
Master of Science in Data Science