GANESH SURISETTY
Data Engineer +1-972-***-**** ****************@*****.*** LinkedIn Cincinnati, Ohio
PROFESSIONAL SUMMARY:
Proficient IT professional with 9+ years of experience in Data Engineering, ETL Development, and Software Engineering, specializing in cloud technologies (GCP, Azure, Oracle Cloud), large-scale data pipelines, database management, and data analytics.
Adept at creating and managing data pipelines, optimizing ETL processes, and designing robust data models for large-scale data systems. Expert in Databricks for large-scale data processing, data engineering, and AI-powered analytics.
Hands-on experience in building and maintaining cloud infrastructure for data analytics, including expertise in Google Cloud Platform (GCP) and Microsoft Azure. Experienced with Oracle Integration Cloud (OIC) for enterprise application integration.
Skilled in Python and SQL, and in building real-time and batch data pipelines with tools such as Databricks, Apache Kafka, Cloud Dataflow, Azure Data Factory, and Google Cloud Composer.
Expertise in working with Oracle Cloud, Azure, and GCP to implement data-driven solutions, optimize database performance, and support data migrations.
Experienced in performing data analysis, designing data lakes and data warehouses, and producing actionable insights for business decision-making.
Strong background in Databricks, Delta Lake, DBT, and CI/CD automation using Terraform, Airflow, GitHub Actions, Azure DevOps, and Google Cloud Build.
Expert in creating insightful dashboards and visualizations using Power BI, integrating diverse data sources like BigQuery, SQL Server, Snowflake, and Excel for business analytics.
Adept at cross-functional collaboration, project management, and team leadership, with experience mentoring junior engineers in agile environments.
TECHNICAL SKILLS:
Programming Languages & Frameworks: Python (NumPy, Pandas, TensorFlow, PyTorch), SQL, Java, C, C++, JavaScript, HTML, CSS, Groovy Script, Spring Boot, Hibernate, React, Node.js
Data Engineering & Analysis: Apache Spark (PySpark), Apache Kafka, Hadoop (HDFS, MapReduce), ETL, Databricks, Data Pipelines, Data Cleansing, Data Transformation, Airflow.
Databases: SQL (MySQL, PostgreSQL, Oracle DB), NoSQL (MongoDB, DynamoDB), Google BigQuery, Snowflake, Azure SQL, Azure Synapse, Cloud Databases
Cloud Platforms & Tools: Google Cloud (BigQuery, Dataflow, Dataproc, Pub/Sub, GCS, Composer), Docker, Kubernetes, Microsoft Azure (ADF, Synapse, Blob Storage).
Data Technologies: ETL Pipelines, Databricks Delta Lake, Data Modeling, Data Warehousing, Data Integration, Power BI, Apache Hive, Presto, Apache Drill, Snowflake.
DevOps & Automation: CI/CD Pipelines (Jenkins, GitLab, Azure DevOps), Terraform, Git, Docker, Shell Scripting.
Business Intelligence & Reporting: Power BI, Tableau, Google BigQuery.
Data Integration: REST API, SOAP, OIC, Apache Kafka, Apache NiFi, Talend, SSIS, SAP BODS
CI/CD Tools: Jenkins, GitLab, Azure DevOps, Docker, Kubernetes
Tools & Frameworks: HubSpot CRM, Swagger, Skyvia, JIRA, Confluence, Pandas, Selenium, Postman, Visual Studio Code.
Version Control & Collaboration: Git, GitHub, Bitbucket, GitLab, JIRA, Confluence
CERTIFICATIONS:
Oracle Database SQL Certified Associate
Python for Data Science
Google Cloud Professional Data Engineer
Microsoft Certified Azure Data Engineer Associate
PROFESSIONAL EXPERIENCE:
CVS Pharmacy, TX September 2024 – Present
GCP Data Engineer
Responsibilities:
Developed and optimized ETL/ELT pipelines using Databricks, PySpark, and Spark SQL to process large-scale data efficiently.
Designed and implemented cloud-based data solutions on GCP, leveraging BigQuery, Cloud Storage (GCS), Cloud Data Fusion, and Cloud Functions for scalable data storage and processing.
Optimized Hive ETL logic to handle complex transformations, improving data quality and reducing processing time.
Developed real-time data streaming applications using Apache Kafka and Spark Streaming to facilitate instant data insights.
Managed Kubernetes clusters (GKE) for deploying scalable data processing workloads, ensuring high availability and fault tolerance.
Created and maintained data models, STM documentation, and data mapping sheets to align business requirements with data strategies.
Automated ETL workflows using Control-M, Autosys, and CI/CD pipelines with GitHub and Jenkins, reducing manual interventions and errors.
Enhanced Snowflake data warehouse performance through partitioning, indexing, and query optimization.
Developed complex SQL queries and optimized database performance across Postgres, MSSQL, and Oracle databases.
Integrated Cloud Data Fusion for ETL automation, enabling seamless data migration and transformation for analytics.
Developed and maintained Databricks workflows to support large-scale data transformation, analytics, and machine learning pipelines (see the PySpark sketch at the end of this role).
Implemented data governance strategies, ensuring compliance with security policies and regulatory standards in cloud environments.
Developed Teradata BTEQ scripts for efficient batch processing and reporting, enabling seamless integration with downstream analytics dashboards.
Created stored procedures and optimized SQL workloads on Teradata to support real-time business reporting and reduce query latency.
Designed and optimized data transformation workflows using DBT to streamline ETL processes and improve data quality.
Configured and tuned Google Cloud Pub/Sub topics and subscriptions for efficient real-time data streaming and processing.
Designed and deployed containerized microservices on GCP Cloud Run, integrating Cloud SQL for scalable database management.
Built and managed data pipelines using GCP services like BigQuery, Cloud Composer, Dataflow, and GCS to support scalable and cost-efficient data analytics.
Integrated Databricks on GCP for advanced data science workflows, enabling collaborative analytics and ML processing.
Leveraged Teradata for querying and reporting on enterprise-scale datasets, optimizing complex SQL queries for performance and accuracy.
Environment: HDFS, Hive, Spark, Apache Kafka, SQL, GCP, Hadoop, HBase, GitHub, Oozie, GCS, GCE, Cloud Data Fusion, Control-M, Jenkins, Kubernetes (GKE), BigQuery, DBT, Cloud Run, Cloud SQL, Cloud Functions, Unix, Firestore, DevOps, Power BI, Snowflake, Teradata, BTEQ, Python, Agile Methodologies, Databricks (GCP).
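Illustrative PySpark sketch for this role (referenced above): a minimal GCS-to-BigQuery batch transformation of the kind these pipelines performed. Bucket, project, dataset, and column names are hypothetical, and the job assumes a Databricks or Dataproc cluster with the spark-bigquery connector installed.
    # Hypothetical PySpark job: raw JSON in GCS -> cleansed DataFrame -> BigQuery table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims_daily_load").getOrCreate()

    # Read raw landing-zone files from GCS (path and schema are illustrative).
    raw = spark.read.json("gs://example-raw-bucket/claims/2024-09-01/")

    # Basic cleansing: drop rows missing keys, standardize types, stamp the load date.
    curated = (
        raw.dropna(subset=["claim_id", "member_id"])
           .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
           .withColumn("load_date", F.current_date())
    )

    # Write to BigQuery via the spark-bigquery connector; a staging bucket holds temp files.
    (curated.write.format("bigquery")
        .option("table", "example-project.curated.claims_daily")
        .option("temporaryGcsBucket", "example-staging-bucket")
        .mode("append")
        .save())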
American Express, AZ April 2023 – August 2024
GCP Data Engineer
Responsibilities:
Developed ETL frameworks using Spark, Python, and Google Cloud Dataflow, enabling efficient data ingestion and processing.
Built cloud ingestion automation between Google Cloud Storage (GCS) and HDFS, improving data integration across cloud and on-premises systems.
Designed and managed ETL workflows using SSIS to extract, transform, and load data across multiple environments.
Implemented Power BI dashboards integrating SQL Server, BigQuery, and Excel data sources, providing real-time business insights.
Optimized data lake storage and Delta Lake performance, ensuring cost-effective and scalable analytics solutions.
Developed REST APIs and integrated data pipelines using Google Cloud Functions and Workflows for seamless data exchange.
Conducted AMI data analysis for predictive modeling and reporting, supporting energy consumption insights.
Implemented metadata management and data cataloging with Alation, improving data discoverability and lineage tracking.
Designed dimensional data models using star and snowflake schemas, optimizing data warehouse efficiency.
Developed and optimized Databricks notebooks for ETL processing, real-time analytics, and machine learning applications.
Developed Databricks Jobs for batch and streaming pipelines using Auto Loader, Delta Live Tables, and Structured Streaming for real-time analytics (see the Auto Loader sketch at the end of this role).
Implemented Unity Catalog to manage data access, governance, and fine-grained permissions across multiple workspaces in Databricks.
Optimized notebook performance in Databricks with cluster autoscaling, caching strategies, and parameterization for reusable pipeline components.
Automated infrastructure deployment and monitoring using Terraform and Airflow, ensuring operational stability.
Designed and maintained data governance frameworks to enforce security, compliance, and data access policies.
Configured BigQuery and Snowflake environments for high-performance querying and analytics workloads.
Developed CI/CD pipelines using Cloud Build, Cloud Deploy, and Terraform to streamline data infrastructure deployments.
Integrated GCP services including Cloud Storage, Cloud Functions, and Dataflow to support hybrid cloud data processing and real-time analytics.
Developed ETL jobs on Cloud Dataflow and automated job orchestration using Google Cloud Composer for seamless cloud-based data workflows.
Implemented cross-cloud data synchronization within GCP environments to improve data availability and disaster recovery.
Environment: Python, Spark, SQL, SSIS, Power BI, Google Cloud Platform (GCP), Google Cloud Storage (GCS), Google Cloud Dataflow, BigQuery, Google Cloud Functions, Google Cloud Composer, Google Cloud Workflows, Databricks, Delta Lake, Delta Live Tables, Structured Streaming, Unity Catalog, Terraform, Airflow, Cloud Build, Cloud Deploy, Snowflake, HDFS, Alation, REST APIs, Excel, SQL Server, Star Schema, Snowflake Schema.
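Illustrative Databricks Auto Loader sketch for this role (referenced above): a minimal incremental ingestion stream from GCS into a Delta table using Structured Streaming. Paths, schema location, and the target table name are hypothetical; spark is the ambient SparkSession provided by a Databricks notebook.
    # Hypothetical Auto Loader stream: incrementally ingest JSON files from GCS into a
    # Delta table. `spark` is the SparkSession a Databricks notebook provides.
    from pyspark.sql import functions as F

    stream = (
        spark.readStream.format("cloudFiles")                  # Auto Loader source
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", "gs://example-bucket/_schemas/transactions")
            .load("gs://example-bucket/landing/transactions/")
            .withColumn("ingest_ts", F.current_timestamp())
    )

    (stream.writeStream
        .option("checkpointLocation", "gs://example-bucket/_checkpoints/transactions")
        .trigger(availableNow=True)                            # incremental, batch-style run
        .toTable("analytics.bronze_transactions"))             # managed Delta table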
Dollar General, Nashville, TN September 2021 – March 2023
Senior Data Engineer
Responsibilities:
Developed ETL frameworks using Spark, Python, and Azure Data Factory, enabling efficient data ingestion and processing.
Built cloud ingestion automation between Azure Blob Storage and HDFS, improving data integration across cloud and on-premises systems.
Designed and managed ETL workflows using SSIS to extract, transform, and load data across multiple environments.
Implemented Power BI dashboards integrating SQL Server, Azure, and Excel data sources, providing real-time business insights.
Optimized data lake storage and Delta Lake performance, ensuring cost-effective and scalable analytics solutions.
Developed REST APIs and integrated data pipelines using Azure Logic Apps and Functions for seamless data exchange.
Conducted AMI data analysis for predictive modeling and reporting, supporting energy consumption insights.
Implemented metadata management and data cataloging with Alation, improving data discoverability and lineage tracking.
Designed dimensional data models using star and snowflake schemas, optimizing data warehouse efficiency.
Developed and optimized Databricks notebooks for ETL processing, real-time analytics, and machine learning applications.
Automated infrastructure deployment and monitoring using Terraform and Airflow, ensuring operational stability (see the Airflow orchestration sketch at the end of this role).
Designed and maintained data governance frameworks to enforce security, compliance, and data access policies.
Configured Azure Synapse and Snowflake environments for high-performance querying and analytics workloads.
Developed CI/CD pipelines using Azure DevOps and Terraform to streamline data infrastructure deployments.
Implemented enterprise-scale Azure-based data lake architecture, integrating Azure Data Lake Storage Gen2 with Azure Synapse and Databricks for end-to-end data engineering solutions.
Utilized Azure Monitor and Log Analytics for system diagnostics and proactive performance monitoring across Azure resources.
Environment: Hadoop, Spark, Apache Kafka, MongoDB, SQL, NoSQL, Azure Blob Storage, Azure Data Lake Storage Gen2, Azure Synapse, Azure Monitor, Azure Log Analytics, IICS, ADF, Power BI, DBT, Azure Functions, Databricks, Snowflake, Terraform, Airflow, Azure Pipelines.
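Illustrative Airflow orchestration sketch for this role (referenced above): a minimal DAG that triggers an Azure Data Factory pipeline on a nightly schedule. It assumes the apache-airflow-providers-microsoft-azure package and a configured azure_data_factory_default connection; the pipeline, resource group, and factory names are hypothetical.
    # Hypothetical Airflow DAG that triggers an Azure Data Factory pipeline nightly.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.microsoft.azure.operators.data_factory import (
        AzureDataFactoryRunPipelineOperator,
    )

    with DAG(
        dag_id="daily_sales_ingest",
        start_date=datetime(2022, 1, 1),
        schedule_interval="0 2 * * *",   # 2 AM daily
        catchup=False,
    ) as dag:
        run_adf_pipeline = AzureDataFactoryRunPipelineOperator(
            task_id="run_sales_copy_pipeline",
            azure_data_factory_conn_id="azure_data_factory_default",
            pipeline_name="pl_copy_sales_to_adls",      # hypothetical ADF pipeline
            resource_group_name="rg-data-platform",     # hypothetical resource group
            factory_name="adf-retail-prod",             # hypothetical factory
        )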
Oracle Solution Services Private Limited, IND June 2018 – August 2021
ERP Data Engineer
Responsibilities:
Developed and maintained Oracle-based applications, focusing on data integration and transformation within Oracle ERP and Oracle Integration Cloud (OIC).
Designed and implemented high-performance ETL pipelines to process large volumes of transactional data for financial reporting.
Built complex SQL queries to improve database performance and reduce query execution times, leading to a 25% improvement in data retrieval speed.
Built Delta Live Tables in Databricks to streamline large-scale data transformations.
Designed Databricks job clusters to optimize data processing and reduce costs by 20%.
Created interactive dashboards using Power BI and integrated them with Azure Synapse and Azure SQL Database to provide real-time insights into operational performance.
Played a key role in database optimization and data transformation to support analytics and reporting for client operations.
Developed scripts to automate reconciliation processes, reducing manual workload by 60% (see the reconciliation sketch at the end of this role), and leveraged Azure Data Factory for orchestrating data movement between hybrid systems.
Managed the deployment of custom RESTful APIs and SOAP integrations to facilitate communication between enterprise applications.
Led the preparation of technical specifications for interfaces with external systems, ensuring high standards of data integrity and security.
Collaborated on implementing cloud migration strategies, utilizing Azure Blob Storage and Azure Logic Apps to modernize legacy ERP data workflows.
Performed incremental data loads and change data capture (CDC) strategies using Azure Data Factory and SQL procedures, ensuring near real-time data availability for reporting and analytics.
Environment: Python, SQL, REST API, SOAP, Skyvia, Oracle Integration Cloud (OIC), Oracle ERP, Azure Data Factory, Azure SQL Database, Azure Synapse, Azure Blob Storage, Azure Logic Apps, Power BI, Databricks (Delta Live Tables, Job Clusters), Agile.
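Illustrative reconciliation sketch for this role (referenced above): a minimal pandas comparison of an ERP invoice extract against its warehouse copy, flagging missing rows and amount mismatches. File and column names are hypothetical, and CSV inputs stand in for the actual ERP and warehouse extracts.
    # Hypothetical reconciliation script: compare an ERP invoice extract against the
    # data-warehouse copy and report missing or mismatched records.
    import pandas as pd

    erp = pd.read_csv("erp_invoices.csv")        # extract from the source system
    dwh = pd.read_csv("dwh_invoices.csv")        # extract from the warehouse

    # Full outer join on the business key, tracking which side each row came from.
    merged = erp.merge(dwh, on="invoice_id", how="outer",
                       suffixes=("_erp", "_dwh"), indicator=True)

    # Records present on only one side.
    missing = merged[merged["_merge"] != "both"]

    # Records present on both sides but with differing amounts.
    both = merged[merged["_merge"] == "both"]
    amount_mismatch = both[(both["amount_erp"] - both["amount_dwh"]).abs() > 0.01]

    missing.to_csv("recon_missing.csv", index=False)
    amount_mismatch.to_csv("recon_amount_mismatch.csv", index=False)
    print(f"{len(missing)} missing, {len(amount_mismatch)} amount mismatches")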
Wipro Technologies, Hyderabad, IND April 2015 – May 2018
Software Engineer
Responsibilities:
Designed and implemented Oracle Customer Cloud Service (CCS) solutions for various business modules, including billing, credit collections, and accounting, with integration into Oracle ERP.
Built data ingestion pipelines using Apache Kafka to enable real-time data processing across different enterprise applications.
Developed SQL-based data transformation scripts to cleanse and normalize incoming data, reducing data inconsistency issues by 40%.
Utilized SQL and Python to automate data transformation tasks, improving workflow efficiency and reducing manual data entry by 30%.
Designed and optimized stored procedures to enhance data processing efficiency across large datasets.
Managed the development of microservices-based applications using Java, Spring Boot, and REST APIs to support data-driven enterprise applications.
Implemented CI/CD pipelines with Jenkins and Kubernetes to streamline the development and deployment of data systems.
Worked with stakeholders to define business requirements and provided data-driven insights to improve operational decision-making.
Designed automated anomaly detection models using Python to identify discrepancies in financial transactions (see the sketch at the end of this role).
Collaborated with cross-functional teams to implement Oracle Integration Cloud (OIC) workflows, enabling seamless communication between Oracle CCS and external systems, which enhanced system interoperability and reduced manual intervention.
Environment: Oracle Cloud, Oracle CCS, Oracle Integration Cloud (OIC), Oracle ERP, SQL, Python, Java, Spring Boot, REST APIs, Apache Kafka, Jenkins, Kubernetes, GitLab, Agile Methodology.
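Illustrative anomaly-detection sketch for this role (referenced above): a minimal z-score rule in pandas that flags transactions far from an account's historical mean. Column names and the threshold are hypothetical stand-ins, not the production model.
    # Hypothetical anomaly-detection sketch: flag transactions whose amount deviates
    # sharply from the account's historical mean (simple z-score rule).
    import pandas as pd

    txns = pd.read_csv("transactions.csv")   # expects: account_id, txn_id, amount

    # Per-account mean and standard deviation of transaction amounts.
    stats = txns.groupby("account_id")["amount"].agg(["mean", "std"]).reset_index()
    txns = txns.merge(stats, on="account_id", how="left")

    # Z-score per transaction; accounts with one transaction get std = NaN, treated as 1.
    txns["z_score"] = (txns["amount"] - txns["mean"]) / txns["std"].fillna(0).replace(0, 1)

    # Flag anything more than 3 standard deviations from the account mean for review.
    anomalies = txns[txns["z_score"].abs() > 3]
    anomalies.to_csv("flagged_transactions.csv", index=False)
    print(f"Flagged {len(anomalies)} of {len(txns)} transactions for review")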
EDUCATION:
Bachelor of Technology, Computer Science and Engineering June 2011 – April 2015
GITAM University – Visakhapatnam, Andhra Pradesh, India
CGPA: 9.12/10