Name: Kusuma M
Email: ************@*****.***
Ph#: +1-970-***-****
Professional Summary:
Nearly 5 years of extensive experience in data engineering, specializing in designing, implementing, and optimizing scalable data pipelines to support complex enterprise-level analytics and reporting across multiple cloud platforms such as AWS, Azure, and GCP.
Proven expertise in advanced SQL optimization, query tuning, and efficient database schema design, driving significant improvements in data retrieval speeds, performance, and efficiency for large-scale data operations.
Skilled in automating and streamlining ETL workflows using tools like Apache Airflow, Talend, Informatica, and Python (PySpark, Pandas), reducing manual intervention and improving data accuracy, consistency, and processing efficiency.
Advanced experience in migrating large-scale on-premises databases to cloud platforms (AWS, Azure, GCP), ensuring data integrity, security, and seamless integration across environments, including working with Azure SQL Database and Snowflake.
Proficient in developing interactive dashboards and visualizations using Power BI and Tableau, enabling stakeholders to gain real-time business insights and make data-driven decisions to improve operational efficiency.
Comprehensive knowledge of big data frameworks, including Apache Hadoop, Spark, Kafka, and Flink, for efficient large-scale data processing, real-time data streaming, and the orchestration of data flows in distributed environments.
Extensive experience in data warehousing with solutions such as Snowflake, Redshift, Teradata, and BigQuery, optimizing schema designs and enhancing data performance, ensuring fast, reliable query execution for reporting and analytics.
Implemented serverless computing and event-driven architectures using AWS Lambda, Azure Functions, and AWS Kinesis, improving scalability, reducing operational costs, and ensuring high availability for real-time data processing.
Designed and maintained complex data models using Star, Snowflake, and Data Vault schemas, consolidating data sources into centralized data lakes for better integration, access, and analytics.
Implemented real-time data streaming solutions (Kafka, AWS Kinesis, Azure Event Hubs) to facilitate continuous data ingestion and analytics, enabling real-time insights for decision-making.
Deep expertise in utilizing cloud-native data services, including AWS Glue, Azure Data Factory, Azure Synapse, and GCP BigQuery, to streamline data pipelines, accelerate data workflows, and ensure high-quality, reliable data across various business systems.
Proven track record in optimizing data storage and improving performance by managing AWS S3, Azure Data Lake, and NoSQL databases (MongoDB, Cassandra, HBase), reducing costs and ensuring fast, scalable data access.
Implemented infrastructure as code (IaC) using Terraform, CloudFormation, and Kubernetes, ensuring consistent, automated, and scalable cloud deployments, enhancing DevOps practices, and promoting a seamless integration process for cloud infrastructure.
Automated data validation processes using Python, ensuring data consistency and quality across disparate data sources, significantly improving data accuracy, reliability, and trustworthiness for analytics.
Implemented rigorous data governance and security frameworks, adhering to best practices and ensuring compliance with industry standards such as GDPR, HIPAA, and SOC 2 for data privacy and security.
Successfully mentored junior engineers and cross-functional teams, conducting training on cloud data engineering, automation, and best practices, enhancing the technical capabilities of team members.
Played a key role in integrating machine learning models into data pipelines using Python, supporting predictive analytics and enabling enhanced business intelligence solutions for customer behavior forecasting.
Created and maintained CI/CD pipelines using Jenkins, GitHub Actions, and CircleCI, automating data pipeline deployments and improving version control for data engineering workflows.
Developed and implemented monitoring and alerting solutions using tools like Splunk, AWS CloudWatch, Azure Monitor, and Datadog, ensuring the operational health of data pipelines and cloud-based applications.
Collaborated closely with business stakeholders to understand data requirements, translating business needs into scalable data solutions and supporting strategic decision-making across the organization.
Improved data pipeline reliability and reduced downtime by implementing robust monitoring frameworks and proactive issue-resolution strategies, ensuring uninterrupted data flow and minimal business impact.
Streamlined cloud-based data solutions, ensuring cost-efficiency, scalability, and secure data handling in complex, multi-cloud environments.
Consistently improved system performance and reduced operational overhead by fine-tuning data processing workflows and automating the data engineering lifecycle from ingestion to reporting.
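Representative sketch (illustrative only): a minimal Airflow DAG of the kind used to orchestrate the ETL automation described above. The DAG, task, and path names are hypothetical placeholders rather than client artifacts, and Airflow 2.4+ is assumed.

```python
# Hypothetical daily ETL DAG: extract a raw file, then transform and load it.
# All names and paths are placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the day's raw file from object storage.
    print(f"extracting raw/{context['ds']}.csv")


def transform_and_load(**context):
    # Placeholder: clean the extracted data and load it into the warehouse.
    print(f"transforming and loading partition {context['ds']}")


with DAG(
    dag_id="daily_sales_etl",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(
        task_id="transform_and_load", python_callable=transform_and_load
    )
    extract_task >> load_task
```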
Technical Skills
Programming Languages: Python, SQL, Scala, Java, Go, R, Bash, C++
Data Warehousing: Snowflake, Redshift, BigQuery, Teradata, Vertica, Greenplum
Databases: PostgreSQL, MySQL, Oracle, NoSQL (MongoDB, Cassandra, DynamoDB, HBase, CouchDB)
ETL/ELT Tools: Apache NiFi, Airflow, Talend, dbt, Informatica, SSIS
Big Data Frameworks: Apache Spark, Hadoop, Flink, Hive, Pig, Dremio, Drill
Cloud Platforms: AWS (S3, Glue, Redshift, Athena, Lambda, EMR), GCP (BigQuery, Dataflow, Dataproc), Azure (Data Factory, Synapse, Databricks)
Data Modeling: Star Schema, Snowflake Schema, OLAP, OLTP, Kimball, Inmon, Data Vault, Fact-Dimension Modeling
Streaming: Kafka, Apache Flink, Kinesis, Pulsar, Storm, RabbitMQ, Azure Event Hubs
Orchestration: Apache Airflow, Luigi, Prefect, Oozie, Dagster
Infrastructure as Code: Terraform, CloudFormation, Kubernetes, Ansible, Helm
DevOps & CI/CD: Docker, Kubernetes, Git, Jenkins, GitHub Actions, CircleCI, Argo CD, Travis CI
Business Intelligence Tools: Tableau, Power BI
Professional Experience:
Client: Valley National Bank, Wayne, NJ. Duration: Jan 2024 – Present
Role: Data Engineer
Responsibilities:
Developed and maintained data pipelines using Azure Data Factory and Databricks, ensuring seamless data ingestion and transformation.
Implemented ETL workflows to process structured and unstructured data from multiple sources into Snowflake for analytics and reporting.
Optimized SQL queries and database performance, reducing execution time and improving efficiency in data retrieval.
Assisted in migrating on-premises databases to Azure SQL Database and Azure Data Lake, ensuring data integrity and accessibility.
Designed and implemented data models to support reporting, analytics, and business intelligence using Snowflake and Azure Synapse Analytics.
Automated ETL workflows using Python (Pandas, PySpark) to reduce manual intervention and enhance data accuracy.
Built and managed Azure Data Lake Storage, optimizing storage performance for high-volume data processing.
Integrated real-time data streaming using Kafka, facilitating real-time data ingestion and processing.
Utilized Apache Airflow to schedule and monitor data pipeline workflows, improving automation and efficiency.
Developed PySpark applications for distributed data processing, enabling scalable big data analytics.
Assisted in configuring Azure Analysis Services, supporting OLAP solutions and enhancing data aggregation.
Ensured data governance, security, and compliance by implementing role-based access controls (RBAC) and encryption in Azure environments.
Built Power BI dashboards by structuring and preparing datasets to enable meaningful visual insights for stakeholders.
Supported real-time monitoring of data pipelines using Azure Monitor and CloudWatch, ensuring system stability.
Maintained and optimized Snowflake schemas, improving query performance and cost efficiency.
Implemented Azure Functions to handle serverless event-driven data processing, reducing operational overhead.
Created and maintained documentation for data pipelines, transformation logic, and workflows to support team knowledge sharing.
Worked closely with data analysts and business stakeholders to understand data requirements and deliver scalable solutions.
Assisted in developing CI/CD pipelines using Jenkins and GitHub Actions to automate data pipeline deployments.
Conducted performance tuning of big data processing jobs in Databricks, optimizing cluster configurations and execution speeds.
Environment: Azure, PySpark, ETL, Databricks, ADF, Airflow, Snowflake, Scala, Power BI, Python, Sqoop, Hive, Pig, MapReduce, Spark, Kafka, AWS, HBase, MongoDB, Cassandra, NoSQL, Flume, and Windows.
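Illustrative sketch of the Databricks-to-Snowflake pattern described in the responsibilities above: a PySpark job that reads raw events from Azure Data Lake Storage, aggregates them, and writes the result to Snowflake. Paths, table names, and connection options are hypothetical, and a Databricks cluster with the Snowflake Spark connector is assumed.

```python
# Hypothetical PySpark job: ADLS Gen2 -> aggregate -> Snowflake.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw_to_snowflake").getOrCreate()

# Hypothetical ADLS Gen2 path; storage credentials are assumed to be configured
# on the cluster.
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/")

# Daily event counts per customer.
daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "customer_id")
       .agg(F.count("*").alias("event_count"))
)

# Hypothetical Snowflake connection options for the Spark connector
# (the "snowflake" short format name is available on Databricks).
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

(daily.write.format("snowflake")
      .options(**sf_options)
      .option("dbtable", "DAILY_CUSTOMER_EVENTS")
      .mode("overwrite")
      .save())
```

In practice the credentials would come from a secret scope or managed identity rather than inline options; they are shown inline here only to keep the sketch self-contained.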
Client: DXC Technologies, India. Duration: Mar 2022 – Jul 2023
Role: Data Engineer
Responsibilities:
Designed and implemented scalable data pipelines using Azure Data Factory and Databricks, enabling enterprise-wide analytics for diverse use cases.
Migrated large-scale on-premises databases to Azure SQL Database, ensuring seamless data integrity and optimized system performance.
Optimized complex SQL queries and database structures, reducing execution times by 40%, improving data retrieval efficiency for business-critical reports.
Developed interactive and real-time Power BI dashboards, providing actionable business insights and improving decision-making processes.
Automated ETL workflows using Python (Pandas, PySpark), reducing manual intervention and enhancing data accuracy and efficiency.
Standardized data integration processes, enabling advanced analytics and improving the ability to derive insights from structured and unstructured data.
Defined and implemented data architecture strategies, collaborating with cross-functional teams to ensure seamless data flow and governance compliance.
Designed and implemented a robust monitoring framework, reducing pipeline downtime by 20% and improving system performance tracking.
Conducted technical training sessions for junior engineers, sharing best practices in data engineering, cloud-based ETL development, and automation.
Assisted in the integration of machine learning models into data pipelines, supporting predictive analytics and customer behavior forecasting.
Developed and maintained data validation processes, ensuring high-quality and consistent data across multiple business applications.
Managed data warehousing solutions using Snowflake, optimizing schema designs for enhanced performance and query efficiency.
Implemented best practices in ETL process optimization, reducing processing times and improving overall data pipeline reliability.
Performed data lineage and impact analysis, ensuring traceability and accuracy in data movement across cloud-based environments.
Developed and deployed Azure-based data solutions, ensuring scalability, security, and cost-efficiency in cloud operations.
Collaborated with business stakeholders to translate data requirements into scalable solutions, improving overall analytics capabilities and reporting.
Environment: Azure, PySpark, ETL, Databricks, ADF, Airflow, Snowflake, SQL, Power BI, Python, Kafka, Azure SQL Database, Azure Data Lake, Spark, Hive, MapReduce, NoSQL, AWS, Redshift, MongoDB, Terraform, GitHub Actions, Windows.
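Illustrative sketch of the kind of data-validation step referenced above: lightweight checks run against a batch before it is loaded. Column names, rules, and the staging path are hypothetical placeholders.

```python
# Hypothetical pre-load validation for an orders batch.
import pandas as pd


def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality issues found in the batch."""
    issues = []
    if df["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    if df["order_date"].isna().any():
        issues.append("missing order_date values")
    if (df["amount"] < 0).any():
        issues.append("negative order amounts")
    return issues


# Hypothetical staging file produced by an upstream extract step.
batch = pd.read_parquet("staging/orders.parquet")
problems = validate_batch(batch)
if problems:
    raise ValueError("Batch failed validation: " + "; ".join(problems))
```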
Client: Haritha IT Services, India. Duration: Sep 2019 – Mar 2022
Role: Data Engineer
Responsibilities:
Designed and implemented ETL pipelines using AWS Glue, Informatica, and Apache Spark for efficient data ingestion and transformation.
Developed data processing workflows in PySpark and SQL, optimizing performance and reducing execution time for large-scale data operations.
Built and maintained data warehouses with Snowflake and Redshift, optimizing schema designs for faster queries and better performance.
Automated ETL workflows to minimize manual intervention and ensure accurate, efficient data processing.
Created data models using Star and Snowflake schemas, enhancing the scalability and performance of data storage and retrieval.
Implemented real-time data streaming solutions using Kafka and AWS Kinesis, enabling seamless data integration and analysis.
Managed AWS S3 and Azure Data Lake for structured and unstructured data storage, improving data accessibility and scalability.
Optimized SQL queries and indexing strategies, significantly improving data processing speed and query performance.
Developed interactive Power BI and Tableau dashboards, providing actionable insights and visualizing key business metrics for stakeholders.
Created and managed orchestration workflows using Apache Airflow and AWS Step Functions, automating the ETL processes.
Implemented data validation frameworks to ensure data quality, consistency, and accuracy across various sources.
Supported cloud migrations, transferring on-premises databases to AWS Redshift and Azure SQL Database, ensuring data integrity during the transition.
Established monitoring and alerting mechanisms using Splunk and AWS CloudWatch, ensuring system stability and proactive issue resolution.
Developed and enforced data security and governance policies, ensuring compliance with GDPR and other industry standards.
Collaborated with cross-functional teams to define data architecture, ensuring alignment with business objectives and improving data accessibility.
Environment: AWS, PySpark, ETL, Informatica, Snowflake, Redshift, Power BI, Python, Tableau, SQL, Kafka, AWS Glue, AWS S3, Azure Data Lake, Apache Airflow, AWS Step Functions, AWS Kinesis, Splunk, GitHub Actions, CloudWatch, Linux, Windows.
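Illustrative sketch of the real-time ingestion pattern referenced above: publishing events to an AWS Kinesis data stream with boto3. The stream name, region, and event shape are hypothetical placeholders.

```python
# Hypothetical Kinesis producer for clickstream-style events.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is illustrative


def publish_event(event: dict) -> None:
    """Send one event to the (hypothetical) ingestion stream."""
    kinesis.put_record(
        StreamName="example-clickstream",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),  # keeps a user's events in order
    )


publish_event({"user_id": 42, "action": "page_view", "ts": "2021-06-01T12:00:00Z"})
```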
References: Available upon request.