Chandra Sai Kiran Kammari
Data Engineer
**********@*****.***
Highly skilled and results-driven Data Engineer with 4+ years of experience designing, developing, and maintaining data pipelines and large-scale data systems. Proficient in cloud platforms (AWS, Azure, GCP), data integration, ETL processes, and big data technologies such as Hadoop and Spark.
PROFILE SUMMARY:
4+ years of experience in designing, developing, and maintaining data pipelines and architectures for large-scale data systems.
Expertise in building and optimizing ETL pipelines, data models, and data storage solutions using SQL, Python, and big data technologies (Hadoop, Spark, Kafka).
Proficient in cloud platforms such as AWS, Azure, and Google Cloud Platform (GCP), implementing scalable and efficient data solutions.
Extensive experience with AWS services (S3, Lambda, Redshift, Glue, and EC2) for data storage, processing, and automation.
Proficient in leveraging AWS tools for data ingestion, transformation, and loading into data lakes and data warehouses.
Experience with Azure Data Lake, Azure SQL Data Warehouse, Azure Databricks, and Azure Synapse Analytics.
Skilled in using Azure DevOps for CI/CD pipelines to deploy and monitor data engineering workflows.
Hands-on experience with GCP services such as BigQuery, Dataflow, and Cloud Storage.
Built data lakes and ETL pipelines on GCP to process structured and unstructured data.
Extensive experience with relational databases; expertise in designing and developing data models, including schema design for data warehouses and data lakes.
Proficient in big data frameworks such as Apache Spark, Hive, and Hadoop for large-scale data processing.
Experience with building and optimizing data pipelines to handle high volumes of data efficiently.
Implemented fault-tolerant, scalable, and flexible data pipelines to ensure high availability and reliability.
Proficient in using Tableau and Power BI for building interactive dashboards and automating reporting systems.
Designed real-time reporting systems to enable data-driven decision-making for business stakeholders.
Proficient in SQL and Python for complex queries, stored procedures, and data transformation scripts.
Experience with columnar databases such as Snowflake and Amazon Redshift for high-performance analytics.
Experience with batch and real-time data pipelines using tools such as Spark Streaming and Flink.
Designed and maintained streaming data architectures for real-time analytics using Kafka and Kinesis.
Skilled in automating workflows with Apache Airflow, Kubernetes, and Jenkins for CI/CD in data engineering.
Developed automated data quality checks using custom Python scripts and integrated them into pipelines (a sketch of such a check follows this summary).
Proficient in using Java for building scalable data processing applications, integrating with frameworks like Apache Kafka, Apache Spark, and Hadoop MapReduce, ensuring efficient handling of large datasets.
Experienced in Snowflake for cloud-based data warehousing, including performance optimization through clustering keys and partitioning.
Proficient in leveraging Terraform and CloudFormation for infrastructure as code (IaC) to automate cloud resource provisioning.
Hands-on expertise with Docker and Kubernetes for containerizing and orchestrating scalable data engineering workflows.
Implemented advanced analytics using Pandas, NumPy, and Scikit-learn, enabling predictive modeling and statistical analysis.
Expertise in deploying CI/CD pipelines for data workflows using GitHub Actions, Jenkins, and Azure DevOps.
Proficient in developing and deploying machine learning models, utilizing frameworks such as TensorFlow, Scikit-learn, and PyTorch to create predictive models, classification algorithms, and data-driven solutions across diverse industries.
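The automated data quality checks mentioned above can be illustrated with a short, self-contained example. This is a minimal sketch only, assuming a PySpark environment; the table name, column names, and thresholds are hypothetical and not taken from any actual project.

# data_quality_check.py -- hypothetical sketch of an automated data quality check
# integrated into a pipeline; table and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def run_quality_checks(spark, table_name):
    df = spark.table(table_name)
    total = df.count()
    # Check 1: no null primary keys.
    null_keys = df.filter(F.col("transaction_id").isNull()).count()
    # Check 2: amounts must be non-negative.
    bad_amounts = df.filter(F.col("amount") < 0).count()
    failures = []
    if null_keys > 0:
        failures.append(f"{null_keys} rows with null transaction_id")
    if total > 0 and bad_amounts / total > 0.01:  # tolerate under 1% bad rows
        failures.append(f"{bad_amounts} rows with negative amount")
    if failures:
        # Raising makes the orchestrator (e.g. an Airflow task) mark the run as failed.
        raise ValueError("Data quality checks failed: " + "; ".join(failures))

if __name__ == "__main__":
    spark = SparkSession.builder.appName("data_quality_check").getOrCreate()
    run_quality_checks(spark, "finance.transactions")
    spark.stop()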
TECHNICAL SKILLS:
Programming & Scripting: Python, Java, Scala, SQL, Bash, R
Big Data Technologies: Hadoop, Apache Spark, Hive, HDFS, Presto, Pig, Flink
ETL/ELT Tools: Informatica, Talend, Apache NiFi, dbt, SSIS
Data Warehousing: Snowflake, Redshift, Azure Synapse, BigQuery, Teradata, Vertica
Data Integration & Orchestration: Apache Airflow, Luigi, AWS Step Functions, Azure Logic Apps, GCP Data Fusion
Streaming & Real-Time Processing: Apache Kafka, AWS Kinesis, Azure Event Hubs, Google Pub/Sub, Spark Streaming
Database Systems: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, DynamoDB, Cosmos DB, HBase
Business Intelligence Tools: Tableau, Power BI, Looker, Amazon QuickSight, Google Data Studio
DevOps & CI/CD: Jenkins, GitLab CI/CD, Azure DevOps, AWS CodePipeline, Docker, Kubernetes
Machine Learning Integration: Amazon SageMaker, Azure ML, GCP Vertex AI
Infrastructure as Code & Automation: Terraform, CloudFormation, Ansible
Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket
Cloud Platforms:
AWS: Redshift, S3, Glue, Lambda, EMR, Athena, DynamoDB, Kinesis, CloudFormation
Azure: Azure Data Factory, Synapse Analytics, Azure Storage, Databricks, Cosmos DB, Logic Apps, Azure Functions
Google Cloud (GCP): BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Composer, Cloud Functions
WORK EXPERIENCE:
Advent Health Orlando, Orlando, FL, USA
Azure Data Engineer, Oct 2023 – Present
OBJECTIVE: Advent Health Orlando is a leading non-profit healthcare system and hospital in Florida, providing advanced medical care, research, and wellness services with a focus on whole-person health: body, mind, and spirit. Designed and implemented scalable financial data solutions on Azure, leveraging Synapse, Databricks, Data Factory, and Power BI for real-time analytics, ETL automation, and high-performance cloud-based data processing.
Key Responsibilities and Achievements:
Designed and implemented Azure Data Lake solutions for storing and processing large-scale financial data, ensuring efficient integration and management of data flows across systems using Azure Synapse Analytics and Azure Databricks.
Developed real-time financial data processing solutions using Azure Stream Analytics for actionable insights, enabling faster decision-making in dynamic financial environments.
Utilized Power BI to create interactive dashboards and reports, providing real-time financial analysis and performance tracking for stakeholders.
Automated deployment and testing processes with Azure DevOps, streamlining continuous integration and continuous delivery (CI/CD) pipelines for finance applications and data workflows.
Managed and optimized Azure SQL Databases for high-performance analytics, ensuring scalable solutions for complex financial data queries and reporting.
Employed PySpark and Scala on Azure Databricks to perform advanced data modeling, transformations, and financial data analysis, enhancing decision-making capabilities (a sketch follows this role's environment list).
Configured Azure Cosmos DB for globally distributed, low-latency data storage, ensuring financial data availability and accessibility across multiple regions.
Implemented Apache Hadoop and Apache Spark clusters on Azure HDInsight for distributed big data processing, enabling faster financial data analysis at scale.
Integrated Apache Kafka with Azure Event Hubs for stream processing, enabling real-time financial data ingestion and rapid event-driven decision-making.
Optimized financial data pipelines using Apache Airflow on Azure, scheduling and orchestrating complex workflows to automate data processing tasks.
Developed and managed ETL pipelines using Azure Data Factory and Apache NiFi, ensuring seamless extraction, transformation, and loading of financial data from various sources into a centralized cloud data warehouse for analytics and reporting.
Integrated Sqoop for bulk data transfer between on-premise databases and the Hadoop ecosystem, used Impala for high-performance SQL queries on large datasets, and leveraged Zookeeper for managing distributed systems and Flume for reliable, scalable data ingestion from various financial data sources.
Managed version control using Git for collaborative development and integrated Maven for build automation, ensuring smooth CI/CD pipelines with Jenkins to streamline the deployment and testing of financial data applications and services in a cloud environment.
ENVIRONMENT: Azure Data Lake, Azure Synapse Analytics, Azure Databricks, Azure Stream Analytics, Power BI, Azure DevOps, CI/CD pipelines, Azure SQL Database, PySpark, Scala, Azure Cosmos DB, Terraform, Azure Kubernetes Service, Apache Hadoop, Apache Spark, Apache Kafka, Apache NiFi, Sqoop, Impala, Zookeeper, Git, Maven, Jenkins.
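To illustrate the PySpark work on Azure Databricks referenced in this role, here is a minimal sketch under assumed inputs; the Data Lake path, Delta format choice, and column names are hypothetical.

# databricks_transform.py -- hypothetical PySpark aggregation of the kind run on
# Azure Databricks above; the ADLS path and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("financial_aggregation").getOrCreate()

# Read raw transactions from an Azure Data Lake Storage Gen2 path (assumed to exist).
raw = spark.read.format("delta").load(
    "abfss://finance@examplelake.dfs.core.windows.net/raw/transactions"
)

# Aggregate daily totals per department for downstream Power BI reporting.
daily_totals = (
    raw.withColumn("txn_date", F.to_date("txn_timestamp"))
       .groupBy("department", "txn_date")
       .agg(
           F.sum("amount").alias("total_amount"),
           F.count("*").alias("txn_count"),
       )
)

# Write the curated table back to the lake, partitioned by date.
(daily_totals.write.format("delta")
    .mode("overwrite")
    .partitionBy("txn_date")
    .save("abfss://finance@examplelake.dfs.core.windows.net/curated/daily_totals"))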
Brown & Brown Insurance, Daytona Beach, FL, USA
AWS Data Engineer, Jan 2023 – Aug 2023
OBJECTIVE: Brown & Brown Insurance is a leading insurance brokerage firm that provides risk management solutions, employee benefits, and insurance services to businesses, individuals, and government entities. Design, develop, and optimize scalable data pipelines, real-time streaming, and machine learning solutions on AWS, leveraging services such as AWS Glue, EMR, Lambda, Redshift, and Snowflake for efficient big data processing and analytics.
Key Responsibilities and Achievements:
Design, develop, and maintain data pipelines and workflows on AWS using services such as AWS Glue, AWS Lambda, Amazon EMR, Amazon S3, and Amazon Redshift.
Develop and deploy serverless solutions using AWS Lambda to automate data ingestion and processing.
Implement real-time data streaming solutions using Amazon Kinesis and Apache Kafka for processing high-velocity data.
Implement CI/CD pipelines for data workflows using tools such as AWS CodePipeline, AWS CodeBuild, and GitHub Actions.
Use AWS Athena for ad-hoc querying and analysis of large datasets stored in S3.
Perform advanced data transformations using Python and PySpark within AWS Glue or EMR environments (a sketch follows this role's environment list).
Utilize Amazon QuickSight to build interactive dashboards and visualizations for business insights.
Ensure scalability and high availability of data infrastructure by leveraging Auto Scaling and AWS Elastic Load Balancer (ELB).
Design, optimize, and maintain relational databases using SQL on platforms such as Amazon RDS (PostgreSQL, MySQL), Amazon Redshift, and DynamoDB, ensuring high performance, scalability, and data integrity.
Process and analyze large-scale datasets using Hadoop ecosystem technologies such as HDFS, Apache Hive, and Apache Spark on Amazon EMR, enabling efficient big data processing and insights.
Develop and manage ETL workflows using tools like AWS Glue, Apache NiFi, and Talend to extract, transform, and load data from diverse sources into data lakes and data warehouses.
Build and optimize scalable big data solutions using technologies such as Apache Spark, Hadoop, Amazon EMR, and Kinesis to process and analyze massive datasets efficiently.
Integrate and deploy machine learning models using Amazon SageMaker, TensorFlow, and PyTorch, enabling predictive analytics and advanced data-driven decision-making.
Design and implement data pipelines to ingest, transform, and analyze data in Snowflake, leveraging its multi-cluster architecture, SQL capabilities, and seamless integration with AWS services such as S3 and Glue.
Develop and deploy serverless data processing workflows using AWS Lambda to automate real-time data ingestion, transformation, and event-driven tasks with seamless integration across AWS services.
Implement CI/CD pipelines using tools like Jenkins, AWS Code Pipeline, and GitLab CI/CD to automate the deployment and testing of data workflows and infrastructure changes.
Manage and optimize data infrastructure on Linux and Windows operating systems, ensuring seamless integration with AWS services, high availability, and robust security configurations.
Develop interactive and insightful Power BI dashboards and reports by integrating data from AWS Redshift, SQL databases, and Snowflake.
ENVIRONMENT: AWS Glue, AWS Lambda, Amazon EMR, Amazon S3, Amazon Redshift, Amazon Kinesis, Apache Kafka, Python, PySpark, DynamoDB, HDFS, Apache Hive, Amazon SageMaker, TensorFlow, PyTorch, Snowflake, CI/CD pipelines, Jenkins, GitLab CI/CD, Linux, Power BI.
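As a companion to the Glue/EMR transformation work above, the following is a minimal PySpark sketch under assumed inputs; the S3 bucket names, prefixes, and claim fields are hypothetical.

# glue_claims_transform.py -- hypothetical PySpark job of the kind submitted to
# AWS Glue or EMR above; bucket names, prefixes, and columns are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_transform").getOrCreate()

# Read raw claims data landed in S3 as JSON by an upstream ingestion process.
claims = spark.read.json("s3://example-insurance-raw/claims/2023/")

# Standardize types and derive a simple claim-age metric for reporting.
curated = (
    claims.withColumn("claim_amount", F.col("claim_amount").cast("double"))
          .withColumn("filed_date", F.to_date("filed_date"))
          .withColumn("claim_age_days", F.datediff(F.current_date(), F.col("filed_date")))
          .filter(F.col("claim_amount").isNotNull())
)

# Write curated output as partitioned Parquet for Athena / Redshift Spectrum queries.
(curated.write.mode("overwrite")
    .partitionBy("filed_date")
    .parquet("s3://example-insurance-curated/claims/"))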
Visa Inc., Bangalore, India
GCP Data Engineer, Nov 2020 – Dec 2021
OBJECTIVE: Visa Inc. is a global financial services company that facilitates electronic payments and operates one of the world's largest digital payment networks. Design and implement scalable data pipelines on Google Cloud Platform (GCP) for real-time and batch data processing, analytics, and machine learning integration.
Key Responsibilities and Achievements:
Design and implement scalable data pipelines on Google Cloud Platform (GCP) using Google Cloud Dataflow, Apache Beam, and Google Cloud Pub/Sub to process and stream real-time data (a sketch follows this role's environment list).
Utilize Google Cloud Dataproc and Apache Spark to process and analyze large volumes of data, enabling batch and real-time data analytics.
Integrate machine learning models using Google AI Platform and TensorFlow to implement predictive analytics and automated decision-making for insurance claims and risk assessments.
Implement and maintain real-time data streaming solutions using Google Cloud Pub/Sub and Google Cloud Dataflow for timely insurance data processing and event-driven applications.
Automate the deployment and testing of data pipelines using Google Cloud Build and Terraform to ensure CI/CD practices in data infrastructure.
Leverage Cloud SQL to manage and analyze transactional data for customer profiles, claims history, and policy details in a secure, scalable environment.
Process and analyze large-scale insurance datasets using Hadoop technologies such as HDFS, Apache Hive, and Apache Spark on Google Cloud Dataproc, enabling distributed data processing and efficient querying for business insights.
Design and implement ETL workflows using Google Cloud Dataflow, Apache Beam, and Cloud Dataproc to extract, transform, and load insurance data from multiple sources into BigQuery and Google Cloud Storage (GCS) for analytics and reporting.
Design and manage data pipelines to ingest, transform, and analyze large datasets in Snowflake, leveraging its multi-cloud architecture, SQL capabilities, and seamless integration with Google Cloud Storage and BigQuery.
ENVIRONMENT: Google Cloud Platform, Google Cloud Pub/Sub, Google Cloud Dataflow, Apache Beam, machine learning models, Google AI Platform, TensorFlow, Terraform, CI/CD, Cloud SQL, Hadoop, HDFS, Apache Hive, BigQuery.
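The streaming pipeline work described in this role can be sketched with a small Apache Beam (Python) example; the Pub/Sub subscription, BigQuery table, and schema below are hypothetical and not actual project resources.

# beam_txn_pipeline.py -- hypothetical Apache Beam pipeline of the kind run on
# Cloud Dataflow above; the subscription, table, and fields are illustrative only.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    # Pub/Sub delivers raw bytes; decode and keep only the fields we report on.
    event = json.loads(message.decode("utf-8"))
    return {
        "transaction_id": event["transaction_id"],
        "amount": float(event["amount"]),
        "currency": event.get("currency", "USD"),
    }


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/transactions-sub")
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="example-project:payments.transactions",
                schema="transaction_id:STRING,amount:FLOAT,currency:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()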
Amazon, Bangalore, India
Data Engineer, Jul 2019 – Oct 2020
OBJECTIVE: Amazon is a global technology giant specializing in e-commerce, cloud computing (AWS), AI, and digital streaming. As a data engineer, created and maintained scalable data pipelines, ensured data quality, and optimized data processing performance, managing data solutions with cloud platform technologies to support JTEKT, development, and business operations.
Key Responsibilities and Achievements:
Design and implement scalable data pipelines to process and manage large volumes of financial data for analytical insights using Apache Spark, Hadoop, and ETL tools.
Optimize data integration and processing workflows using SQL Server, PostgreSQL, and cloud solutions such as Azure Synapse Analytics and Google BigQuery.
Develop and maintain real-time streaming solutions using Apache Kafka and Azure Stream Analytics for real-time financial transaction processing.
Use Python, PySpark, and Scala to develop ETL pipelines for transforming and loading data from heterogeneous sources into centralized storage systems.
Leverage SQL, MongoDB, and Cassandra for managing structured and unstructured financial data and performing complex queries to generate business insights.
Automate data workflows and job scheduling using Apache Airflow and Azure Data Factory for seamless data integration and movement (a sketch follows this role's environment list).
Implement CI/CD pipelines using Jenkins, Git, and Maven to automate the deployment and testing of data engineering applications.
Utilize Databricks and Apache Hive to perform large-scale data analytics and advanced transformations for banking data processing and reporting.
Collaborate with data scientists and analysts to identify data needs and provide optimized data solutions for advanced analytics and business intelligence using Power BI and Tableau.
Apply machine learning models using Azure ML Studio to predict financial trends, credit scoring, and fraud detection.
Leverage Sqoop for importing large datasets from relational databases into Hadoop HDFS for batch processing, use Impala for high-speed SQL queries on Hadoop, implement Zookeeper for maintaining coordination and synchronization across distributed data services, and utilize Flume for efficiently streaming data from various sources like logs and financial transactions.
ENVIRONMENT: Apache Spark, Hadoop, ETL tools, PostgreSQL, Azure Synapse Analytics, Google BigQuery, Apache Kafka, Azure Stream Analytics, Python, PySpark, Scala, MongoDB, Cassandra, Apache Airflow, CI/CD, Jenkins, Git, Maven, Apache Hive, machine learning, Power BI, Tableau, Sqoop, Zookeeper, Flume, Impala.
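The Airflow scheduling described in this role can be illustrated with a minimal DAG sketch; the DAG id, schedule, spark-submit command, and quality-check logic are hypothetical.

# etl_dag.py -- hypothetical Airflow DAG of the kind used above to schedule daily
# ETL and data-quality tasks; DAG id, schedule, and task logic are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def check_row_counts(**context):
    # Placeholder data-quality gate; a real check would query the warehouse.
    print("Validating row counts for", context["ds"])


with DAG(
    dag_id="daily_financial_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="0 2 * * *",   # run daily at 02:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract_and_transform = BashOperator(
        task_id="spark_etl",
        bash_command="spark-submit /opt/jobs/etl_job.py --run-date {{ ds }}",
    )
    data_quality = PythonOperator(
        task_id="data_quality_check",
        python_callable=check_row_counts,
    )

    extract_and_transform >> data_quality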
EDUCATION:
Master's in Information Technology and Management, The University of Tampa, USA, Aug 2023