
Data Engineer - Business Intelligence

Location: Dallas, TX

Posted: October 16, 2025


Nagamohan Dande

AWS Data Engineer

Phone: +1-972-***-****

Email: ************@*****.***

PROFESSIONAL SUMMARY:

• Over 7 years of experience in Data Engineering and Python Development, specializing in designing, developing, and optimizing enterprise-level data models, ETL pipelines, and data warehousing solutions to enhance business intelligence and decision-making.

• Expertise in Big Data storage, processing, and analysis across AWS, Azure, and GCP, ensuring scalable and efficient data workflows.

• Extensive experience with distributed computing architectures, including Hadoop, Snowflake, Apache Spark, and Python-based data processing, leveraging MapReduce and SQL for managing large datasets.

• Strong proficiency in AWS cloud services, including EC2, S3, RDS, IAM, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB, with expertise in designing, configuring, and managing cloud environments. Skilled in CloudFormation for automating infrastructure deployment using Infrastructure as Code (IaC) principles.

• Hands-on experience in real-time data streaming and processing with Apache Kafka and Spark Streaming, ensuring seamless data ingestion, transformation, and analysis.

• Proficient in data preprocessing, cleaning, and transformation using Python libraries such as Pandas, NumPy, and PySpark, ensuring high data quality for analytics and reporting.

• Strong expertise in SQL and PL/SQL, working extensively with Teradata, Oracle, and NoSQL databases, optimizing queries, stored procedures, and database schemas.

• In-depth knowledge of data modeling techniques, including dimensional modeling, Star & Snowflake Schema, OLTP/OLAP systems, and normalization vs. denormalization strategies to enhance data storage and retrieval efficiency.

• Experience in ETL orchestration and automation using tools like Talend, Informatica, and Apache Airflow, enabling seamless data integration across multiple sources and platforms.

• Skilled in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, HBase, Storm, Oozie, Sqoop, and Zookeeper, managing large-scale data storage and distributed processing.

• Experience in containerizing data processing applications with Docker and orchestrating them using Kubernetes to ensure scalability and fault tolerance.

• Proficiency in Apache Spark components, including Spark Core, Spark SQL, Spark Streaming, DataFrames, Datasets, and Spark ML, for building high-performance data pipelines and machine learning applications.

• Strong background in data ingestion from various sources using Sqoop, efficiently transferring structured and unstructured data into HDFS and Hive tables for analytics and processing.

• Expertise in cloud infrastructure automation using CloudFormation, Terraform, and CI/CD pipelines to streamline cloud-based data environment deployment and management.

• Hands-on experience with Jenkins, GitHub Actions, and AWS CodePipeline for automating deployment, testing, and monitoring of ETL pipelines.

• Extensive experience in cloud architecture, development, and data analytics within the AWS ecosystem.

• Proficient in version control and collaboration tools, including Git, BitBucket, and SVN, ensuring seamless code management and integration.

• Strong experience with C# for software development and data platform integration, enabling seamless data processing.

• Exceptional analytical and problem-solving skills, optimizing data workflows, improving system performance, and driving data-driven decision-making for enterprise solutions.

• Comprehensive knowledge of Big Data Analytics, including installation, configuration, and utilization of ecosystem components such as Hadoop MapReduce, HDFS, HBase, Zookeeper, Cloud Functions, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, Spark, Oozie, and Airflow.

• Proficiency in Relational Databases like MySQL, Oracle, and MS SQL Server, as well as NoSQL databases such as MongoDB, HBase, and Cassandra.

EDUCATION: Master’s in Business Analytics, East Texas A&M University, 2024

TECHNICAL SKILLS:

AWS: EC2, S3, Glacier, Redshift, RDS, EMR, Lambda, Glue, CloudWatch, Kinesis, CloudFront, Route53, DynamoDB, CodePipeline, EKS, Athena, QuickSight

ETL Tools: AWS Glue, Azure Data Factory, Airflow, Spark, Sqoop, Flume, Apache Kafka, Spark Streaming

Programming and Scripting: Spark Scala, Python, Java, MySQL, PostgreSQL, Shell Scripting, Pig, HiveQL

Data Warehouse: AWS Redshift, Snowflake, Teradata

SQL and NoSQL Databases: Oracle DB, Microsoft SQL Server, PostgreSQL, MongoDB, Cassandra

Monitoring Tools: Splunk, Chef, Nagios, ELK

Source Code Management: Bitbucket, Nexus, GitHub

Containerization: Docker, Kubernetes, OpenShift

Hadoop Tools: HDFS, HBase, Hive, YARN, MapReduce, Pig, Apache Storm, Sqoop, Oozie, Zookeeper, Spark, SOLR, Atlas

Build & Development Tools: Jenkins, Maven, Gradle, Bamboo

Methodologies: Agile/Scrum, Waterfall

PROFESSIONAL EXPERIENCE:

NextEra Energy, Juno Beach, FL September 2022 – Present

AWS Data Engineer

Responsibilities:

• Designed and implemented real-time data streaming pipelines using AWS Kinesis to collect and process IoT device and field sensor data.

• Utilized Apache Kafka and Spark Streaming for high-throughput data ingestion and real-time analytics of operational metrics.

• Integrated Flume and Sqoop to facilitate seamless data movement between structured and unstructured sources.

• Optimized large-scale oil and gas data queries using Teradata for efficient storage and retrieval.

• Engineered Python-based ETL pipelines with AWS Glue and PySpark to process operational data at scale.

• Established a data lake architecture on Amazon S3 and Glacier, enabling scalable storage for structured and unstructured datasets.

• Leveraged AWS Redshift, Snowflake, and Teradata for efficient operational data storage and retrieval.

• Scheduled and managed batch jobs with AutoSys to ensure timely data ingestion and processing.

• Implemented error-handling mechanisms in Shell scripts for failure tracking, alerting, and logging.

• Managed DynamoDB to provide low-latency access to well and equipment information, improving operational efficiency.

• Used Sqoop to transfer data between Teradata and Hadoop for analytics purposes.

• Designed and optimized ETL pipelines using AWS Glue and Airflow for data cleaning and transformation.

• Developed data transformation scripts with Pandas and NumPy to ensure high data quality and consistency.

• Leveraged Apache Spark on AWS EMR for large-scale data transformations, accelerating processing for well and production metrics.

• Implemented DBT (Data Build Tool) for modular and scalable data transformations within Redshift and Snowflake.

• Developed Python-based data movement applications to enhance system performance with low latency and high throughput.

• Automated ETL workflow orchestration using Apache Airflow for scheduling and monitoring.

• Utilized Amazon Athena and Redshift to query massive datasets and execute complex SQL queries.

• Created interactive dashboards with AWS QuickSight, Tableau, and Splunk to visualize production metrics, equipment performance, and real-time operational insights.

• Processed sensor logs and telemetry data for predictive analytics, enhancing proactive maintenance strategies.

• Built real-time data streaming solutions using Kafka and Spark Streaming, with Python-based producer/consumer services.

• Enforced security policies with AWS IAM, KMS, and CloudTrail, ensuring role-based access control, encryption, and audit logging.

• Configured AWS CloudWatch and Nagios for real-time monitoring and alerting on data pipeline performance.

• Implemented data governance frameworks using AWS Lake Formation and Apache Atlas to ensure compliance with industry standards.

• Automated infrastructure provisioning using AWS CloudFormation, Terraform, and Kubernetes (EKS/OpenShift) for scalable deployments.

• Deployed containerized ETL applications with Docker, Kubernetes, and OpenShift, ensuring consistency and portability.

• Managed CI/CD automation using AWS CodePipeline and Jenkins to streamline data pipeline deployment and updates.

• Monitored system health and optimized data pipelines using AWS CloudWatch, ELK Stack, and Splunk.

• Optimized SQL queries and PL/SQL procedures for PostgreSQL, Oracle DB, and Microsoft SQL Server to improve execution efficiency.

• Configured PostgreSQL replication and clustering to ensure high availability and fault tolerance for critical data.

Fifth Third Bank, Chicago, IL December 2019 – August 2022

Cloud Data Engineer

Responsibilities:

• Developed a centralized platform to integrate key operational data, including equipment statuses, well details, and production metrics.

• Designed and developed an enterprise-grade cloud data warehouse to enhance risk assessment and business intelligence (BI) capabilities, ensuring efficient data storage, processing, and reporting for financial risk management.

• Built a scalable data warehouse using Amazon Redshift, Google BigQuery, and Snowflake to store financial and risk-related data.

• Integrated AWS S3, Google Cloud Storage, and Azure Blob Storage for secure storage of structured and unstructured data.

• Configured RDS (PostgreSQL, MySQL) and NoSQL databases like DynamoDB and MongoDB to optimize data retrieval and storage.

• Implemented Python-based SQL query execution within ETL workflows to optimize transformations in Redshift, Snowflake, and BigQuery.

• Developed ETL pipelines using AWS Glue, Apache Airflow, and Azure Data Factory to streamline data ingestion, transformation, and loading.

• Configured and scheduled AutoSys jobs for orchestrating ETL workflows, reducing manual effort in pipeline execution.

• Processed large-scale datasets using Apache Spark (PySpark/Scala), SQL, and Hadoop-based tools (HDFS, Hive, Pig) to enhance performance and reliability.

• Leveraged Kafka and Kinesis for real-time data streaming and event-driven architectures.

• Maintained and optimized UNIX-based scripts for system monitoring, reducing downtime and improving operational efficiency.

• Developed interactive dashboards and reports using Tableau, Power BI, and AWS QuickSight to provide real-time insights into financial risks and operational efficiency.

• Enabled self-service BI by integrating Amazon Athena and Google BigQuery for ad hoc querying and analytics.

• Enhanced query performance using indexing, partitioning, materialized views, and caching strategies in Redshift, BigQuery, and Snowflake.

• Implemented IAM roles, encryption mechanisms (KMS), and fine-grained access controls to ensure data security and compliance with regulatory standards like GDPR, CCPA, and PCI-DSS.

• Configured AWS CloudTrail and CloudWatch for auditing, monitoring, and anomaly detection in data workflows.

• Designed Python-driven monitoring solutions using CloudWatch, Splunk, and ELK for real-time anomaly detection.

• Automated data pipeline orchestration using Apache Airflow, AWS Lambda, and Kubernetes (EKS/OpenShift) to ensure high availability and fault tolerance.

• Managed CI/CD pipelines for data workflows using Jenkins, GitHub Actions, and AWS CodePipeline to ensure seamless deployment and version control.

• Deployed containerized workloads with Docker and Kubernetes, ensuring portability and scalability of data processing applications.

• Utilized Splunk, ELK (Elasticsearch, Logstash, Kibana), and AWS CloudWatch for real-time monitoring, log analysis, and proactive incident resolution.

• Implemented disaster recovery (DR) and backup strategies using AWS S3 Glacier, Cross-Region Replication (CRR), and automated snapshots to ensure business continuity.

Vizient, Inc., Irving, TX September 2017 – November 2019

Data Engineer

Responsibilities:

• Developed and optimized automated underwriting and pricing models using big data analytics, ETL workflows, and real-time BI dashboards to improve policy pricing strategies.

• Designed and built scalable ETL pipelines with Apache Airflow, DBT, and AWS Step Functions to process and transform underwriting data efficiently.

• Ingested and processed large volumes of structured and unstructured data from SQL databases (PostgreSQL, SQL Server, Oracle), NoSQL databases (MongoDB, DynamoDB), and cloud storage solutions (AWS S3, Azure Blob, GCS).

• Developed Python-based risk assessment models utilizing Pandas, NumPy, and Scikit-learn to enhance underwriting accuracy.

• Automated data integration from insurance claims, customer records, and financial transactions using AWS Glue, Apache Kafka, and Spark Streaming.

• Leveraged Hadoop (HDFS, Hive, Pig) and Apache Spark (PySpark, Scala) to analyze large-scale historical claims data, identifying patterns and trends for risk assessment.

• Containerized Python applications with Docker and orchestrated them using Kubernetes (EKS) to ensure high availability and scalability.

• Integrated shell scripts with AutoSys to trigger real-time event-driven workflows, ensuring seamless execution of data pipelines.

• Implemented machine learning models with AWS SageMaker and Spark MLlib to predict underwriting risks and optimize pricing strategies.

• Utilized Amazon Athena and Redshift Spectrum for ad hoc querying and real-time analytics on insurance datasets.

• Designed and developed real-time BI dashboards and reports using Tableau, Power BI, and AWS QuickSight to visualize underwriting risk factors and policy pricing trends.

• Enabled self-service analytics by integrating Redshift and Snowflake with BI tools, empowering underwriters with actionable insights.

• Created custom reports and visualizations to track risk exposure, claim trends, and pricing adjustments across different policy categories.

• Optimized data pipeline performance through indexing, partitioning, and caching in Amazon Redshift, Snowflake, and BigQuery.

• Ensured compliance with HIPAA, GDPR, and SOC 2 by implementing robust security measures, including IAM, AWS KMS encryption, and role-based access controls.

• Configured AWS CloudWatch, Splunk, and the ELK Stack to monitor data workflows, detect anomalies, and trigger automated alerts.

• Streamlined CI/CD workflows for data pipelines using Jenkins, GitHub Actions, and AWS CodePipeline.

• Deployed containerized ETL jobs using Docker and Kubernetes (EKS, OpenShift) to enhance scalability and portability.

• Implemented disaster recovery (DR) and backup strategies leveraging AWS S3 Glacier, automated snapshots, and cross-region replication.


