
Data Engineer, Dataiku DSS, Azure Synapse, Hadoop, Hive

Location: Jacksonville, FL
Posted: July 24, 2025

Contact this candidate

Resume:

Avinash Gopinath

*******.**********@*****.***

+1-904-***-****

Professional Summary

Big Data Engineer/Dataiku Developer with 11+ years of experience in designing scalable data architectures, optimizing ETL pipelines, and automating workflows for high-performance analytics.

Expertise in Big Data processing, distributed computing, and cloud-based data solutions, leveraging Dataiku DSS, Python, and Snowflake for efficient data integration and transformation.

Architected large-scale ETL pipelines using Python, SQL, and Spark, reducing data processing time by 50% and improving operational efficiency across finance, healthcare, and telecommunications sectors.

Designed and optimized cloud-based data warehouses (Snowflake, Redshift, BigQuery, Azure SQL), implementing partitioning and clustering strategies to enhance query performance by 60%.

Expert in Dataiku DSS, managing and optimizing large-scale clusters to handle high-volume data warehousing and analytics workloads, ensuring high availability and fault tolerance.

Developed serverless data pipelines in Dataiku DSS, enabling cost-effective, scalable ETL solutions for structured and unstructured data sources.

Strong expertise in data governance, security, and compliance, implementing RBAC policies, encryption, and data quality monitoring frameworks to align with GDPR and HIPAA.

Automated data workflows using Alteryx and MS Power Automate, reducing manual data processing efforts by 80%.

Developed interactive dashboards in Power BI, Tableau, and Spotfire, integrating real-time data streams and optimizing query execution for improved analytics.

Led predictive ML model automation, collaborating with data scientists to streamline model outputs, parse large Parquet files, and enhance model accuracy.

Conceptualized and developed an in-house resource management tool, leveraging SharePoint, Dataiku, and Power Automate, saving $60,000 annually.

Automated Excel reporting workflows with Python and VBA, cutting manual processing time by 80% and improving data accuracy.

Optimized ETL workflows for high-performance analytics, integrating Apache Spark, SQL, and Python-based transformations for structured and unstructured data processing.

Triaged and resolved data pipeline failures in production, ensuring minimal downtime and seamless business operations.

Developed full-stack applications using Java, JSP, and Servlets, integrating data processing APIs for real-time business intelligence and reporting.
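The Excel reporting automation noted above centered on replacing manual aggregation with Python scripts. A minimal, hypothetical sketch of that pattern using only the standard library — the file layout, column names, and figures here are illustrative, not taken from the resume:

```python
import csv
import io
from collections import defaultdict

def summarize_report(raw_csv_text):
    """Aggregate raw transaction rows into per-region totals.

    A simplified stand-in for the Excel/VBA reporting automation
    described above; the columns 'region' and 'amount' are
    hypothetical examples.
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    for row in reader:
        totals[row["region"]] += float(row["amount"])
    return dict(totals)

# Illustrative input in place of a real exported workbook.
raw = "region,amount\nEast,100.5\nWest,50.0\nEast,25.0\n"
report = summarize_report(raw)
```

In practice the same aggregation would read the exported workbook data and write a formatted summary sheet; the point is that a few lines of scripted grouping replace repeated manual pivoting.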

Technical Skills:

Cloud & Big Data Technologies: Dataiku DSS, AWS (Glue, S3, Redshift), Hadoop, Hive, Snowflake.

ETL & Data Engineering: Dataiku DSS, Alteryx, Apache Spark, Python (Pandas, PySpark), SQL, Airflow.

Database Management & Query Optimization: T-SQL, PL/SQL, Redshift, Snowflake, Oracle, MySQL, Postgres, Stored Procedures, Data Modeling, Performance Tuning.

Data Warehousing & Pipelines: Dataiku DSS, Snowflake, AWS Redshift, Azure Synapse, Delta Lake, Kafka.

Reporting & Visualization: Power BI, Tableau, MicroStrategy, Spotfire, Excel (Power Query, VBA).

Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Java, VBA, Shell Scripting.

Data Governance & Security: Data Quality, Compliance, GDPR, HIPAA, Role-Based Access Control (RBAC), Encryption Standards.

Professional Experience:

Trane Technologies – Jacksonville, FL May 2025 – Present

Senior Big Data Developer – Dataiku DSS

Project: Alteryx to Dataiku Migration Initiative

Objective: Led the enterprise-wide migration of data workflows from Alteryx to Dataiku DSS, modernizing the data analytics infrastructure to enhance scalability, governance, and automation across business units.

Responsibilities:

Spearheaded the end-to-end migration of legacy Alteryx workflows to Dataiku DSS, ensuring functional parity, performance optimization, and data governance compliance.

Redesigned existing pipelines using Dataiku Visual Recipes, Python, and SQL, improving maintainability and scalability.

Developed custom plugins and reusable components in Dataiku to replicate Alteryx macros and streamline repetitive tasks.

Implemented validation frameworks using Dataiku Metrics, Checks, and Scenarios to ensure data integrity and automate quality assurance.

Created parameterized Dataiku Applications to encapsulate complex workflows, enabling self-service analytics for business users.

Conducted performance benchmarking using Spark and Snowflake, reducing processing time by over 40%.

Delivered training sessions and documentation to upskill teams on Dataiku best practices, accelerating adoption and reducing legacy tool dependency.

Collaborated with IT and data governance teams to establish standardized development practices, version control, and deployment pipelines.

Environment: Dataiku DSS (Code Recipes, Scenarios, Metrics, Checks, Plugins), Alteryx Designer & Server, Snowflake, Tableau, Python, SharePoint
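The validation framework described above relies on Dataiku Metrics and Checks; the underlying idea can be sketched in plain Python — compute simple dataset metrics, then fail a check when a threshold is breached. This is a hedged illustration of the concept, not Dataiku's API; the thresholds and column names are assumptions:

```python
def compute_metrics(rows, required_columns):
    """Compute row count and per-column null rate over a list of
    dicts, in the spirit of Dataiku Metrics (plain-Python sketch)."""
    n = len(rows)
    null_rate = {
        col: (sum(1 for r in rows if r.get(col) in (None, "")) / n) if n else 0.0
        for col in required_columns
    }
    return {"row_count": n, "null_rate": null_rate}

def run_checks(metrics, min_rows=1, max_null_rate=0.1):
    """Return a list of failed checks, mimicking Dataiku Checks;
    the default thresholds are illustrative."""
    failures = []
    if metrics["row_count"] < min_rows:
        failures.append("row_count below minimum")
    for col, rate in metrics["null_rate"].items():
        if rate > max_null_rate:
            failures.append(f"null rate too high in column {col!r}")
    return failures

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": ""}]
metrics = compute_metrics(rows, ["id", "name"])
failures = run_checks(metrics, min_rows=1, max_null_rate=0.25)
```

In DSS, a Scenario would run such checks after each build and halt the flow on failure, which is what makes the migration's quality assurance automatic rather than manual.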

Standard Chartered GBS – Chennai, IN October 2023 – March 2025

Senior Big Data Developer

Project: Operations Volume Data Integration and Metrics Dashboard

Objective: Designed and implemented a comprehensive data integration platform using Dataiku DSS and Power BI to streamline operational volume data across multiple business groups and countries, enabling real-time volume metrics for key stakeholders.

Responsibilities:

Led the design and implementation of scalable data pipelines in Dataiku DSS, integrating operational volume data from various TP systems across multiple geographies.

Automated processes to extract, transform, and load (ETL) operational data into Azure SQL-based data warehouses, enabling real-time analytics and reducing manual interventions by 80%.

Optimized SQL queries and partitioning strategies, improving data retrieval performance by 60%.

Managed and optimized Dataiku DSS clusters to handle large-scale operational data, ensuring high availability and fault tolerance for global operations.

Leveraged Apache Spark within Dataiku to process and aggregate millions of records, reducing data transformation times by 50%.

Architected a data processing pipeline on Azure, integrating Dataiku DSS, Azure SQL, and Power BI, ensuring scalable and reliable data flow.

Integrated Azure Data Factory for seamless data orchestration and implemented MS Power Automate for backend process automation, saving 60+ hours/month.

Designed normalized and denormalized data models to support fast analytical queries and interactive dashboards in Power BI.

Applied incremental data loading techniques, reducing ETL processing times by 40% and optimizing storage efficiency.

Implemented data governance frameworks ensuring data quality, lineage, and compliance with regulatory standards across multiple regions.

Configured role-based access controls (RBAC) and Azure IAM policies, ensuring secure data access for business groups.

Developed interactive dashboards in Power BI, enabling real-time monitoring of operational volume metrics for stakeholders, reducing reporting time by 70%.

Conceptualized, designed, and developed a resource management tool using SharePoint, Dataiku, and Power Automate, saving the department $60,000 annually.

Automated data collection and backend processing using Python, reducing manual interventions by 50%.

Environment: Dataiku DSS, Power BI, Azure SQL, Python, Apache Spark, Azure Data Factory, MS Power Automate, Tableau, MicroStrategy.
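The incremental data loading mentioned above typically keys off a high-watermark timestamp: only rows changed since the last run are pulled, which is what cuts ETL time versus full reloads. A minimal sketch of that pattern under those assumptions — the record shape and timestamps are hypothetical:

```python
def incremental_load(records, last_watermark):
    """Return only the records newer than the watermark, plus the
    new watermark — the core of watermark-based incremental ETL.
    Timestamps are ISO-8601 strings, which compare correctly as text."""
    new_records = [r for r in records if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in new_records), default=last_watermark
    )
    return new_records, new_watermark

# Illustrative source rows, as if read from a TP system extract.
records = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00"},
]
delta, watermark = incremental_load(records, "2024-01-01T00:00:00")
```

In the actual pipeline the watermark would be persisted between runs (e.g., in a control table) and the delta merged into the warehouse, but the filter-and-advance logic is the whole trick.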

i2econsulting – Groton, CT August 2020 – October 2023

Technical Lead – Big Data

Project: ML Model Output Parsing and Data Visualization

Objective: Designed and implemented a scalable data processing solution for parsing and transforming ML model outputs, provided in complex zip files containing JSON mappings and large Parquet datasets. The processed data was normalized, loaded into Snowflake, and used for interactive Power BI dashboards, delivering actionable insights.

Responsibilities:

Architected and implemented an end-to-end ETL pipeline to process ML model outputs, handling complex nested JSON mappings and Parquet files.

Developed Python-based ETL scripts to extract, normalize, and transform data, ensuring seamless integration into Snowflake.

Designed incremental ETL workflows using Dataiku DSS, reducing processing time by 50% and enhancing data freshness.

Optimized data parsing and transformation workflows using Apache Spark, ensuring efficient processing of terabyte-scale ML model outputs.

Utilized AWS Glue for serverless ETL processing, enhancing scalability and reducing operational overhead.

Implemented PySpark-based transformations to clean and enrich raw ML model data for analytics.

Built a cloud-native data architecture on AWS, integrating S3, Glue, and Snowflake, ensuring high availability and cost efficiency.

Developed partitioning and clustering strategies in Snowflake, optimizing query performance by 60%.

Automated data ingestion from AWS S3, SharePoint, and NAS Drives, ensuring seamless cross-platform data availability.

Designed efficient data models for ML output storage in Snowflake, improving analytics performance and storage efficiency.

Wrote and optimized complex SQL queries for high-performance analytics, reducing execution time by 40%.

Implemented data governance best practices for ML model output processing, ensuring data quality, security, and lineage tracking.

Created custom JSON-based data visualizations for advanced analytics in Tableau and Spotfire.

Designed and deployed fully automated ETL processes in Alteryx, eliminating manual intervention and improving reliability.

Developed Python-based automation scripts for data transformation, report generation, and anomaly detection.

Monitored data pipeline performance, triaging production issues and implementing root cause fixes with minimal downtime.

Designed error-handling mechanisms for data integrity checks and anomaly detection, improving data reliability.

Environment: Dataiku DSS, Power BI, AWS Glue, Apache Spark, Python, Snowflake, Alteryx, Tableau, Spotfire, Oracle, Redshift, MySQL
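The project above parses ML outputs delivered as zip archives of JSON mappings and Parquet datasets. A simplified standard-library sketch of the JSON extraction step — the archive layout is an assumed example, and the Parquet handling (which would need pyarrow or Spark) is omitted:

```python
import io
import json
import zipfile

def extract_json_mappings(zip_bytes):
    """Read every .json member of a zip archive and return the
    parsed mappings keyed by member name. Simplified: the real
    payloads also contained Parquet files, handled separately."""
    mappings = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                mappings[name] = json.loads(zf.read(name))
    return mappings

# Build a tiny in-memory archive to demonstrate the parser;
# the member name and fields are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model_a.json", json.dumps({"feature": "x", "score": 0.9}))
result = extract_json_mappings(buf.getvalue())
```

Downstream, the parsed mappings drive the normalization of the accompanying Parquet tables before loading into Snowflake.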

Cloud Big Data Technologies LLC – Dallas, TX February 2019 – December 2019

Senior Big Data Engineer

Objective: Developed and managed a highly scalable predictive analytics platform using Dataiku DSS, Power BI, and distributed computing frameworks, integrating machine learning models into production workflows to enable data-driven decision-making.

Responsibilities:

Designed and developed end-to-end data pipelines for predictive ML models, automating data ingestion, transformation, and feature engineering in Dataiku DSS.

Built real-time and batch ETL workflows using Hive, SQL, and Python, optimizing data flow across AWS S3, Redshift, Teradata, and Oracle.

Integrated structured and unstructured data sources into a unified data warehouse, improving data processing efficiency and enabling seamless analytics.

Developed and optimized high-performance data pipelines on Hadoop and Hive, leveraging custom UDFs (User-Defined Functions) in Python for advanced data processing.

Automated large-scale data transformations using PySpark and Hive queries, improving query execution speed by 50%.

Implemented real-time data parsing of nested JSON structures, using custom Hive functions and Python-based transformations.

Automated end-to-end ML data pipeline orchestration using Jenkins and Shell scripting, ensuring continuous integration and deployment (CI/CD).

Developed serverless data ingestion workflows using AWS S3 and Redshift, enabling scalable storage and processing for predictive modeling.

Designed efficient data models in Redshift, Hive, and Azure SQL, ensuring optimized storage and retrieval for ML-ready datasets.

Optimized SQL queries for data retrieval and model training, reducing query execution time by 40%.

Created master data tables by aggregating Teradata, MySQL, and Oracle data sources, enabling enhanced feature engineering for ML models.

Collaborated with data scientists to create feature-engineered datasets, improving model accuracy and efficiency.

Developed interactive Power BI dashboards to visualize ML model predictions and insights, enabling real-time monitoring and decision-making.

Implemented model monitoring pipelines to track prediction drift and model performance over time.

Implemented data security policies within Dataiku DSS and cloud environments, ensuring compliance with GDPR and industry standards.

Enforced role-based access controls (RBAC) and encryption protocols to secure sensitive business data.

Environment: Hadoop, Dataiku DSS, Power BI, Hive, SQL, Azure SQL, Redshift, Teradata, AWS S3, PySpark, Jenkins, Shell Scripting.
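The real-time parsing of nested JSON structures mentioned above — done there with custom Hive UDFs and Python transformations — boils down to flattening nested objects into columnar fields. A generic, hedged sketch of that flattening; the dotted-key separator and input shape are illustrative choices, not the project's actual schema:

```python
def flatten_json(obj, parent_key="", sep="."):
    """Recursively flatten a nested dict into a single-level dict
    with dotted keys — the core operation behind turning nested
    JSON into rows a Hive table or warehouse column can hold."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten_json(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# Hypothetical nested record, e.g., one event from a JSON feed.
nested = {"user": {"id": 7, "geo": {"country": "US"}}, "active": True}
flat = flatten_json(nested)
```

Wrapped as a UDF, the same function can run per-row inside a Hive or PySpark job, which is how the batch and streaming variants share one parsing implementation.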

Fahrenheit IT – Indianapolis, IN November 2018 – January 2019

Senior Operations Engineer

Responsibilities:

Automated reporting processes for the Operations team using VBA, significantly reducing manual effort and increasing data-driven decision-making capabilities.

Built and optimized SQL queries to support data analysis and reporting requirements.

Interactions LLC – Indianapolis, IN September 2016 – August 2018

Senior Operations Analyst

Responsibilities:

Designed and developed high-performance dashboards in Power BI and Tableau, enabling senior executives to gain real-time insights for data-driven decision-making.

Automated data extraction and transformation processes using AWS Redshift and SQL, ensuring efficient and scalable data reporting.

Integrated data from Teradata, Redshift, and Oracle, optimizing data pipelines to reduce query execution time by 40%.

Implemented ETL pipelines for cost monitoring in AWS Redshift, identifying overage expenses and enabling monthly savings of $25,000.

Developed automated cost-tracking reports, ensuring timely analysis of vendor expenses and performance optimization.

Saved $90,000 per quarter by designing a scalable process automation framework, improving operational efficiency and maintaining 95%+ performance benchmarks.

Conducted in-depth data analysis using Redshift, SQL, and Teradata, extracting actionable insights that streamlined business operations.

Designed diagnostic monitoring solutions, enabling proactive issue resolution and minimizing downtime for critical business functions.

Partnered with quality managers and senior executives to investigate, analyze, and resolve key business challenges, ensuring data integrity and compliance.

Provided ad-hoc analytical insights and deep-dive investigations using Redshift and SQL, supporting on-demand executive reporting.

Communicated complex analytical findings in a clear, concise, and actionable manner, fostering collaboration across business, vendor, and technical teams.

Environment: AWS Redshift, Power BI, Tableau, Teradata, SQL, Oracle, Google Sheets, ProModel.
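The cost-monitoring pipeline above flags vendor overage expenses against budgets; in Redshift this would be a join-and-compare query, but the logic can be sketched in plain Python. The vendor names, field shapes, and amounts here are hypothetical stand-ins:

```python
def find_overages(spend, budgets):
    """Return vendors whose actual spend exceeds their budget, with
    the overage amount — a plain-Python stand-in for the Redshift
    cost-monitoring query described above."""
    return {
        vendor: amount - budgets[vendor]
        for vendor, amount in spend.items()
        if vendor in budgets and amount > budgets[vendor]
    }

# Illustrative monthly figures, not real vendor data.
spend = {"vendor_a": 12000.0, "vendor_b": 8000.0}
budgets = {"vendor_a": 10000.0, "vendor_b": 9000.0}
overages = find_overages(spend, budgets)
```

Surfacing only the exceptions, rather than the full spend table, is what makes the monthly review fast enough to act on.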

Neptune Information Solutions Pvt. Ltd. – India July 2012 – June 2014

Senior Software Engineer

Responsibilities:

Developed and optimized scalable applications using Java, JSP, and Servlets, ensuring high performance and reliability in enterprise-level environments.

Led cross-functional collaboration, gathering detailed business requirements from internal stakeholders to align development with strategic goals.

Architected and implemented system design, executing code reviews, integration testing, and performance optimizations to improve application efficiency.

Enhanced data collection and processing workflows, optimizing data pipelines for real-time and scheduled reporting at weekly, monthly, and quarterly intervals.

Designed and developed executive-level reports and dashboards, delivering data-driven insights for senior management to support strategic decision-making.

Environment: Java, JSP, Servlets, SQL, Cloud-Based Data Processing, Automation, Data-Driven Analytics.

Certification:

Successfully completed training for Developer Certificate.

Dataiku Core Designer and Dataiku Advanced Designer certifications.

Currently pursuing the AWS Cloud Practitioner certification.

Project Management, Project Management Institute (PMI) certified. License No.: #2669IME614

Educational Qualification:

Master of Science – Industrial Engineering from Western Michigan University, USA

Master of Science – Mechanical Engineering from Indian Institute of Technology Kharagpur, India

Bachelor of Science – Mechanical Engineering from Osmania University, India


