SAI BGV
Senior Data Engineer
Email: ********@*****.***
PROFESSIONAL SUMMARY
● Experienced Data Engineer with over 5 years of expertise in developing, optimizing, and managing scalable ETL pipelines across cloud environments such as AWS, Azure, and GCP, specializing in financial, healthcare, and insurance data.
● Proficient in data migration, including transitioning on-premises SQL Server systems to cloud-based platforms such as Amazon Redshift and Snowflake, reducing infrastructure costs and improving scalability.
● Skilled in big data technologies including Hadoop, Spark, Kafka, and Kinesis, with a strong focus on real-time streaming and batch processing to improve data throughput and reduce latency.
● Expertise in machine learning and data modeling, leveraging tools like TensorFlow, DBT, and Google Cloud AI Platform to drive predictive analytics and optimize data workflows for better decision-making.
● Adept at implementing CI/CD pipelines, infrastructure as code (IaC) using Terraform, and containerization with Docker and Kubernetes, ensuring faster deployments and streamlined cloud infrastructure management.
TECHNICAL SKILLS
Programming Languages: Python, SQL, Java
Amazon Web Services (AWS): AWS Glue, AWS Lambda, AWS EMR, Amazon Redshift, Amazon S3, AWS DMS, Amazon EC2, Amazon SNS, Amazon SQS, ECS, EKS, CloudWatch, CloudTrail, CodePipeline, Kinesis, Athena, IAM, VPC, Glacier, QuickSight
Microsoft Azure Services: Azure Data Factory, Azure Databricks, Azure Event Hubs, Stream Analytics, Azure Synapse Analytics, Azure SQL Database, Azure Blob Storage, Logic Apps, Azure Kubernetes Service (AKS), Azure Data Lake Storage
Google Cloud Services: Google Cloud AI Platform, Google Cloud Vertex AI, Google Cloud AutoML, Google Cloud Storage, Google Cloud BigQuery, TensorFlow
Methodologies: Agile Methodologies, CI/CD, Infrastructure as Code (IaC), Scrum
ETL & BI Tools: AWS Glue, Azure Data Factory, Databricks, Snowflake, DBT, Power BI, Tableau, Glue DataBrew, QuickSight
Machine Learning: TensorFlow, Google Cloud AutoML, Google Cloud Vertex AI, Google AI Platform
Databases: Amazon Redshift, Snowflake, Azure SQL Database, SQL Server (on-prem), Hive (Hadoop), DynamoDB
Big Data: Apache Spark, PySpark, Kafka, Airflow, Kinesis, Flink, Hadoop, Hive
Version Control & CI/CD: Git, Jenkins, AWS CodePipeline, Azure DevOps
Containerization & Deployment: Docker, Kubernetes, AWS ECS, Azure Kubernetes Service (AKS)
PROFESSIONAL EXPERIENCE
Snowflake Inc. Dallas, TX
Senior Data Engineer (May 2023 – Present)
● Developed scalable ETL pipelines using AWS Glue and AWS EMR to process 10 TB of financial data daily, improving processing speed by 25% and optimizing data loads into Amazon Redshift.
● Optimized Snowflake ELT workflows using DBT, improving data quality checks by 30% for complex financial datasets.
● Migrated on-premises SQL Server databases to Amazon Redshift and Snowflake, reducing infrastructure costs by 20%.
● Improved job execution on AWS EMR by optimizing PySpark scripts, increasing processing speed by 40% and reducing resource consumption by 15%.
● Designed event-driven architectures with AWS Lambda, SNS, and SQS, reducing transaction alert processing latency by 40%.
TCS (Tata Consultancy Services) Hyderabad, India
Data Engineer (Jan 2021 – Jul 2022)
● Engineered cross-cloud integration between Azure and GCP to synchronize data pipelines for real-time processing and analysis.
● Built real-time and batch ETL pipelines using Azure Data Factory, Databricks, Event Hubs, and Stream Analytics, processing 10 TB of healthcare data daily.
● Developed and optimized stored procedures in Azure SQL Database, improving query performance by 50%.
● Processed 5 TB of healthcare data daily using PySpark on Azure Databricks, reducing ETL processing time by 60%.
● Implemented partitioning and indexing in Azure Synapse Analytics, cutting query times by 70%.
Infosys Hyderabad, India
Data Engineer (Mar 2019 – Dec 2020)
● Engineered scalable ETL data pipelines using AWS Glue, Lambda, and Python scripts, automating processing of 10+ TB of insurance data monthly.
● Developed efficient data storage solutions with S3 and Redshift, reducing data retrieval time by 30%.
● Managed AWS IAM security and access control for sensitive insurance data.
● Designed and implemented star schema models in Redshift, boosting query performance by 40%.
● Integrated real-time data processing using Kinesis and DynamoDB, reducing transaction latency by 50%.
EDUCATION
University of Texas at Dallas – May 2024
Master’s in Computer Science