CONTACT
*****************.**@*****.***
Aubrey, Texas
linkedin.com/in/varun-ch
SKILLS
Big Data & Streaming: Apache Spark (Scala, Java, PySpark), Kafka, Hadoop (Hive, MapReduce), Apache Airflow
AWS: Glue, Redshift, EMR, Athena, Lambda, S3, Step Functions, Kinesis, DynamoDB, RDS, IAM, CloudWatch, CloudFormation, Lake Formation, EKS
GCP: BigQuery, Dataflow, Pub/Sub, Dataform, Data Fusion, Cloud Storage, Cloud Composer
Azure: ADF, Synapse, Databricks, Azure Functions, Cosmos DB, Azure Blob Storage
Snowflake (DBT, Snowpark, Streams & Tasks), Informatica, Talend, Matillion, SQL Server, PostgreSQL, MongoDB
Jenkins, Docker, Kubernetes, Terraform, Git, CI/CD Pipelines
RBAC, IAM Policies, Encryption, Data Governance (GDPR, HIPAA), Vertex AI
VARUN CHINTHAKINDI
SENIOR DATA ENGINEER
PROFESSIONAL SUMMARY
12 years of experience in Data Engineering with expertise in building scalable data pipelines, cloud data warehousing, and big data technologies. Proficient in end-to-end ETL design, cloud migrations, and real-time data processing using AWS, Azure, and GCP. Adept at collaborating with global clients and leading cross-functional teams to deliver high-quality, production-ready solutions.
WORK EXPERIENCE
Citibank Oct 2023 – Present
Senior Data Engineer
Designed scalable data architectures using AWS Glue, EMR, Lambda, Redshift, Step Functions, SNS, EventBridge, and DynamoDB. Built ETL workflows with AWS Glue and Lake Formation, leveraging PySpark, Snowflake, and DBT for complex data transformations. Optimized Snowflake queries for large-scale datasets, improving performance by 40%.
Raymond James Dec 2021 – Sep 2023
Senior Data Engineer
Designed end-to-end architecture for real-time and batch data systems using GCP Dataproc, Dataflow, BigQuery, and Looker Studio. Leveraged Snowflake and DBT for data transformation, with orchestration in Airflow. Improved data pipeline reliability and performance through PySpark and Apache Beam.
Cambia Health Feb 2019 – Nov 2021
Data Engineer
Developed and optimized ETL/ELT pipelines using Databricks, ADF, Snowflake, and dbt, enhancing performance and scalability for financial data processing. Implemented real-time streaming analytics with Spark Streaming, Delta Live Tables, and Snowflake Streaming, enabling instant insights for financial decision-making.
EDUCATION
Master of Business Administration, 2011 - 2013
Arvindaksha Educational Society’s Group of Institutions
Bachelor of Technology, Computer Science, 2007 - 2011
Sana Engineering College, Kodad
CITIBANK, FL OCT 2023 TO PRESENT
SENIOR DATA ENGINEER
Key Responsibilities:
Architected and implemented scalable, high-performance data engineering solutions using big data technologies such as PySpark, Scala, and Java on Amazon EMR, AWS Glue, and AWS EKS, ensuring optimal ETL pipeline efficiency and real-time processing.
Designed and developed robust real-time data streaming solutions leveraging Amazon Kinesis, Apache Kafka, and AWS Lambda, optimizing low-latency event-driven architectures.
Engineered high-performance data warehousing solutions using Amazon Redshift and Snowflake, implementing advanced partitioning, clustering, materialized views, and query optimization techniques to handle petabyte-scale datasets.
Optimized and automated ETL processes and the data lake using AWS Glue bookmarks, crawlers, Lake Formation, and DMS, ensuring incremental data loads and efficient schema evolution.
Migrated on-premise databases (Oracle, MS SQL Server) to PostgreSQL, Amazon Aurora, and Snowflake on AWS, utilizing AWS DMS, Schema Conversion Tool (SCT), and automation scripts for seamless transitions.
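A minimal sketch of the bookmark-driven incremental load pattern described above, written as a Glue PySpark job; the database, table, column, and S3 path names are placeholders, not the actual production objects:

    # Hypothetical Glue job: incremental load driven by job bookmarks.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # transformation_ctx ties this read to the job bookmark, so only new
    # files/partitions are picked up on each run.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_raw",
        table_name="orders",
        transformation_ctx="orders_src",
    )

    mapped = ApplyMapping.apply(
        frame=orders,
        mappings=[("order_id", "string", "order_id", "string"),
                  ("amount", "double", "amount", "double"),
                  ("updated_at", "timestamp", "updated_at", "timestamp")],
        transformation_ctx="orders_mapped",
    )

    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-curated/orders/"},
        format="parquet",
        transformation_ctx="orders_sink",
    )

    job.commit()  # commits the bookmark state for the next incremental run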
Developed and productionized large-scale ETL workflows using SQL Server Integration Services (SSIS), enhancing data transformation logic, reducing latency, and improving system reliability.
Built and maintained complex data ingestion pipelines using Apache Sqoop, Python JDBC, Pandas, NumPy, Iceberg, Spark connectors, and Fivetran, ensuring efficient data integration across multiple sources.
Developed and deployed ML models within an MLOps framework, integrating machine learning workflows into data pipelines using AWS SageMaker, Glue ML Transforms, DataRobot, Vertex AI, and Streamlit, ensuring scalable and automated inference.
Designed and implemented Matillion-based ETL pipelines, transforming and processing large-scale datasets and reducing processing times by ~30%.
Developed end-to-end data pipelines using Python, AWS Lambda, and Snowflake, enabling automated data ingestion, transformation, and validation while building interactive data visualization dashboards with Streamlit.
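An illustrative sketch of the Lambda-to-Snowflake ingestion step described above, using the snowflake-connector-python client; credentials are read from environment variables for brevity, and all stage and table names are placeholders:

    # Hypothetical Lambda handler: load newly staged files into Snowflake.
    import os
    import snowflake.connector

    def handler(event, context):
        conn = snowflake.connector.connect(
            account=os.environ["SF_ACCOUNT"],
            user=os.environ["SF_USER"],
            password=os.environ["SF_PASSWORD"],
            warehouse="LOAD_WH",
            database="ANALYTICS",
            schema="RAW",
        )
        try:
            cur = conn.cursor()
            # COPY INTO loads only files not yet ingested from the stage.
            cur.execute(
                "COPY INTO RAW.ORDERS FROM @RAW.ORDERS_STAGE "
                "FILE_FORMAT = (TYPE = PARQUET)"
            )
            result = cur.fetchall()  # one row per staged file processed
            return {"files_processed": len(result)}
        finally:
            conn.close()

In practice the credentials would come from AWS Secrets Manager rather than plain environment variables; the handler shape stays the same.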
Extensive experience in cloud-based data warehousing with Snowflake, implementing stored procedures, Snowpark, and Streams & Tasks, optimizing transformations and modeling with DBT (Data Build Tool), and orchestrating workflows with Apache Airflow DAGs.
RAYMOND JAMES, FL DEC 2021 TO SEPT 2023
SENIOR DATA ENGINEER
Key Responsibilities:
Architected and implemented enterprise-scale data systems on Google Cloud Platform (GCP), ensuring scalability, reliability, and high availability by leveraging BigQuery, Cloud SQL, Google Cloud Storage (GCS), and Dataproc.
Designed, developed, and optimized real-time and batch data processing workflows using GCP Dataflow, Dataproc, Cloud Run, Databricks, and Apache Beam, enabling efficient ETL and real-time streaming pipelines with low-latency event processing.
Led the migration of critical financial and accounting data from on-premise databases to GCP, leveraging Google Database Migration Service (DMS), Change Data Capture (CDC), and custom scripts for seamless data transfer while ensuring zero data loss and minimal downtime.
Designed and implemented dimensional data models, including conceptual, logical, and physical models, ensuring data integrity, lineage tracking, and governance compliance across financial datasets.
Developed and enforced data governance policies, including data lineage tracking, metadata management, RBAC (Role-Based Access Control), and audit logging, ensuring compliance with SOX, GDPR, and other industry regulations.
Implemented advanced query optimization techniques on BigQuery and Cloud SQL, utilizing partitioning, clustering, indexing, and materialized views to enhance performance and reduce query execution time by ~25%.
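A brief sketch of the partitioning and clustering setup referenced above, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are illustrative only:

    # Create a date-partitioned, clustered BigQuery table to cut scan costs.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    schema = [
        bigquery.SchemaField("trade_id", "STRING"),
        bigquery.SchemaField("account_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("trade_ts", "TIMESTAMP"),
    ]

    table = bigquery.Table("example-project.finance.trades", schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="trade_ts",          # queries filtering on trade_ts prune partitions
    )
    table.clustering_fields = ["account_id"]  # co-locates rows for selective filters

    client.create_table(table, exists_ok=True)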
CAMBIA HEALTH IMPL. THROUGH CITIUS TECH, MUMBAI
DATA ENGINEER FEB 2019 TO NOV 2021
Key Responsibilities:
Designed and implemented end-to-end data solutions leveraging Databricks, Azure Synapse Analytics, Snowflake, and Azure Data Factory (ADF) to support enterprise-scale data warehousing and financial accounting systems.
Migrated and optimized low-code pipelines from Azure Synapse and ADF into high-performance, fully coded solutions using Databricks, PySpark, and dbt (Data Build Tool), enhancing data processing efficiency by 40%.
Developed advanced watermarking systems using PySpark and Python for incremental data processing, reducing data latency and improving reliability in real-time data pipelines.
Architected and deployed real-time streaming analytics solutions using Apache Spark Streaming, Delta Live Tables, and Snowflake Streaming, enabling near-instantaneous insights for financial decision-making.
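A simplified sketch of the high-water-mark (watermark) pattern described in the first bullet above; the control table, source table, and column names are assumptions, not the actual pipeline objects:

    # Incremental batch load: process only rows newer than the last watermark.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("incremental_claims").getOrCreate()

    # Last successfully processed timestamp, persisted by a previous run.
    prev_watermark = (
        spark.read.table("etl_control.watermarks")
        .filter(F.col("source") == "claims")
        .agg(F.max("high_water_mark"))
        .collect()[0][0]
    )

    source = spark.read.table("raw.claims")
    increment = source.filter(F.col("updated_at") > F.lit(prev_watermark))

    # Process only new or changed rows, then advance the watermark.
    increment.write.mode("append").saveAsTable("curated.claims")

    new_watermark = increment.agg(F.max("updated_at")).collect()[0][0]
    if new_watermark is not None:
        spark.createDataFrame(
            [("claims", new_watermark)], ["source", "high_water_mark"]
        ).write.mode("append").saveAsTable("etl_control.watermarks")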
Applied windowing and aggregation techniques on streaming data to deliver actionable insights, supporting real-time business intelligence and reporting (see the sketch below).
Designed and implemented scalable ETL/ELT processes using ADF, Databricks, Snowflake, and dbt, ensuring seamless data ingestion, transformation, and storage for large-scale financial datasets.
Built and optimized data models in Snowflake and Azure Synapse Analytics using dbt, ensuring efficient storage, retrieval, and query performance for complex financial data.
Orchestrated and automated data workflows using ADF, Apache Airflow, Terraform, and dbt Cloud, achieving 99.9% pipeline reliability and reducing manual intervention.
Optimized Spark jobs (PySpark, Scala, Java) and Snowflake queries through partitioning, clustering, and caching, improving job performance by 30% and reducing resource consumption.
Implemented data governance and metadata management solutions using Azure Purview, Unity Catalog, and Snowflake’s data governance features, ensuring compliance with financial data standards and end-to-end traceability.
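A short sketch of the streaming windowing and aggregation pattern referenced in the first bullet above, using Spark Structured Streaming; the table names, window sizes, and checkpoint path are placeholders:

    # Windowed aggregation over a streaming source with late-data tolerance.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("streaming_metrics").getOrCreate()

    events = spark.readStream.table("raw.transactions")

    windowed = (
        events
        .withWatermark("event_time", "10 minutes")        # tolerate late data
        .groupBy(F.window("event_time", "5 minutes"),      # 5-minute tumbling window
                 F.col("account_id"))
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("txn_count"))
    )

    query = (windowed.writeStream
        .outputMode("append")        # a window is emitted once the watermark passes it
        .format("delta")
        .option("checkpointLocation", "/chk/streaming_metrics")
        .toTable("curated.txn_metrics_5m"))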
Collaborated with cross-functional teams to translate business requirements into technical solutions, effectively communicating complex concepts to both technical and non-technical stakeholders.
Demonstrated expertise in real-time programming and problem-solving, utilizing Python, Databricks, Snowflake, dbt, and modern data tooling to deliver innovative data engineering solutions.
Built interactive dashboards and ML model front-ends using Streamlit for rapid prototyping and business stakeholder engagement.
Designed and implemented robust data validation and quality frameworks, leveraging dbt-utils, dbt-expectations, Great Expectations, and Apache Griffin to ensure data accuracy, consistency, and completeness across pipelines.
Developed CI/CD pipelines using GCP Cloud Build, Terraform, and Jenkins, automating deployment, monitoring, and security compliance for data engineering workflows.
Conducted comprehensive code reviews and implemented best practices in PySpark, Scala, Java, and SQL to enhance code efficiency, maintainability, and scalability.
Expertise in large-scale distributed data processing using Apache Spark (PySpark, Scala, Java), Apache Beam, and Airflow DAGs, ensuring real-time streaming analytics and event-driven architectures.
Implemented security best practices using IAM policies, encryption (KMS), VPC configurations, and automated security scans to protect sensitive financial data, ensuring alignment with SOX and GDPR compliance requirements.
Built interactive dashboards in Looker Studio, integrating BigQuery datasets to visualize KPIs such as claim anomalies, processing lag, and fraud patterns—empowering operations teams with self-serve analytics and real-time insights.
Orchestrated complex workflows using Apache Airflow and Google Cloud Composer, ensuring efficient dependency management, failure recovery, and SLA monitoring for ETL pipelines.
Proficient in utilizing GCP-native services such as Google Kubernetes Engine (GKE), BigQuery ML, Looker, and Dataform, integrating machine learning capabilities and advanced analytics into data pipelines.
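A minimal Airflow DAG sketch of the orchestration pattern described above; the DAG id, schedule, SLA, scripts, and dbt commands are placeholders rather than the production configuration:

    # Daily ELT orchestration: extract -> dbt run -> dbt test, with retries and an SLA.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=1),   # SLA misses surface in Airflow's SLA report
    }

    with DAG(
        dag_id="daily_finance_elt",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract = BashOperator(task_id="extract_to_lake",
                               bash_command="python extract_to_gcs.py")
        transform = BashOperator(task_id="dbt_run",
                                 bash_command="dbt run --profiles-dir /opt/dbt")
        validate = BashOperator(task_id="dbt_test",
                                bash_command="dbt test --profiles-dir /opt/dbt")

        extract >> transform >> validate   # explicit dependency chain

Chaining the tasks explicitly keeps failure isolation per step, so a retry or manual rerun touches only the task that failed.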
Key Responsibilities:
Develop high-level and detailed solution architectures for data engineering projects, ensuring alignment with business objectives and scalability.
Design solutions using big data technologies like PySpark on Amazon EMR, AWS Glue, and others.
Implement real-time data processing using services like Amazon Kinesis.
Design and optimize data warehouses using services like Amazon Redshift, and implement strategies to optimize query performance for large datasets.
Design and implement data models that meet business needs and ensure efficient data storage and retrieval
Set up monitoring and logging for data pipelines and warehouses.
Diagnose and resolve issues in data pipelines and architectures.
Effectively communicate technical decisions, challenges, and solutions to both technical and non-technical stakeholders.
Extensive hands-on experience leveraging PySpark, Spark with Scala and Java on Amazon EMR, and Hadoop with Java for large-scale data processing, ensuring efficient ETL workflows and optimal resource utilization.
Performed data ingestion using Apache Sqoop, Python JDBC applications, Spark connectors, and Fivetran.
Led the migration of complex ETL pipelines and data workflows from AWS to GCP, including Redshift to BigQuery and EMR to Dataproc, ensuring seamless data integration, optimized performance, and minimal downtime.
VERIZON IMPL. THROUGH SIGMOID ANALYTICS, BANGALORE
DATA ENGINEER AUG 2016 TO DEC 2018
Key Responsibilities:
Developed data pipelines using Azure Data Factory and Snowflake to integrate data from MySQL and Cassandra into Azure Synapse Analytics for enterprise reporting.
Implemented data ingestion and transformation workflows on Azure Databricks and HDInsight, utilizing Spark SQL and Python to process data from Azure Data Lake Storage (ADLS Gen2) and Cosmos DB.
Automated real-time data ingestion processes using C# APIs and Azure Data Factory, ensuring efficient data flow and validation into Azure Synapse Analytics.
Worked on real-time and batch data streaming with Azure Event Hubs and Databricks Spark Streaming, processing vehicle sensor data to support predictive maintenance.
Developed Power BI dashboards and visualizations, providing insights into supply chain performance and production metrics for business teams.
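An illustrative Spark sketch of the JDBC ingestion and transformation flow described in the bullets above; connection details, paths, and table names are placeholders, and the final load into Synapse is assumed to happen downstream:

    # Parallel JDBC read from MySQL, aggregate, and stage to ADLS Gen2 as Parquet.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("mysql_ingest").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://example-host:3306/ops")
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "<secret>")          # in practice, pulled from a key vault
        .option("partitionColumn", "order_id")   # parallelize the read
        .option("lowerBound", "1")
        .option("upperBound", "10000000")
        .option("numPartitions", "8")
        .load()
    )

    daily = (
        orders.withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date", "status")
        .agg(F.count("*").alias("order_count"))
    )

    # Staged for downstream reporting loads into Azure Synapse Analytics.
    daily.write.mode("overwrite").parquet(
        "abfss://curated@exampleaccount.dfs.core.windows.net/orders_daily/"
    )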
BCBS IMPL. THROUGH SIGMOID ANALYTICS, BANGALORE
DATA ENGINEER MAR 2013 TO JULY 2016