
Data Engineer Real-Time

Location:
North Richland Hills, TX
Salary:
$120,000
Posted:
May 02, 2025


Resume:

Sushil Bhandari

******.*****@*****.*** 682-***-**** LinkedIn

Professional Summary – Data Engineer

Results-driven Data Engineer with over 6 years of experience designing and implementing scalable data solutions on cloud platforms including AWS, Google Cloud, and Microsoft Azure, specializing in end-to-end data analytics, back-end and front-end development, and full-stack solutions. Proven expertise in developing and optimizing ETL pipelines, managing data warehouses and data lakes, and ensuring data security and compliance with standards such as HIPAA and GDPR in fast-paced environments. Skilled in data structures and database management, including non-relational databases, and in leveraging tools such as Apache Spark, AWS Glue, Databricks, and Tableau to enhance real-time data processing, business intelligence, and data storytelling.

Adept at building and optimizing data architecture, implementing data governance, and applying machine learning for predictive analytics to drive operational improvements, performance management, and business strategy. Experienced in troubleshooting, problem solving, code review, root cause analysis, technical support, and data mining, with attention to customer needs, business objectives, product features, and financial markets; communicates effectively across teams, drawing on advanced analytics, mathematics, statistics, caching, cookies, and a robust data strategy to deliver actionable insights and support strategic planning.

Experienced in delivering cloud-native solutions across IaaS, PaaS, and SaaS models, with a strong focus on AWS security, infrastructure management, and service delivery, using Infrastructure-as-Code (IaC) tools such as Terraform and YAML-based configurations for scalable, secure, and automated deployments. Supports end-to-end data science workflows, from data ingestion and feature engineering through model deployment and monitoring, while maintaining code quality and collaboration through version control with Git and GitHub. Brings a strong growth mindset, effective problem solving, and excellent verbal and written communication to cross-team collaboration and impactful data solutions.

Technical Skills:

Cloud Infrastructure & Big Data Technologies: Apache Spark, PySpark, Hadoop, Hive, HDFS, MapReduce, Apache Flink, Databricks, Snowflake, BigQuery, Redshift, ECS, EMR, Datadog, AWS CloudWatch, Azure Data Lake, GCS, Dataproc, AWS Services, Azure Services, GCP Services

Programming, Scripting & Libraries: Python, Scala, Java, R, SQL (PL/SQL, T-SQL), Bash scripting, Shell Scripting, HTML, Pandas, NumPy, C++, ASP.NET, C#

Data Engineering & ETL: Apache Airflow, dbt, Informatica, Azure Data Factory, AWS Glue, AWS Step Functions, GCP Composer, ETL Pipelines, Dimensional Modeling

Databases & Storage: MySQL, PostgreSQL, Oracle, Teradata, Azure Synapse Analytics, Star Schema, Snowflake Schema, HBase, Cassandra, MongoDB, NoSQL, DynamoDB

File Formats: Parquet, JSON, Avro, ORC, CSV, XML

Streaming & Messaging: Apache Kafka, AWS Kinesis, AWS MSK, GCP Pub/Sub

DevOps, Containerization & CI/CD: Kubernetes, Docker, Terraform, AWS CloudFormation, GCP Deployment Manager, Azure DevOps, Jenkins, Git, CI/CD, Prometheus, and related microservices tooling

Data Visualization & Analytics: Power BI, Tableau, Grafana, Looker, Qlik, Amazon QuickSight, Google Data Studio, Google Analytics, SAP, Radius, Excel, Matplotlib

Machine Learning & AI: TensorFlow, PyTorch, Scikit-learn, Spark ML, BigQuery ML, Predictive Analytics, LangChain

APIs & Metadata Management: RESTful APIs, Azure Purview, AtScale

Project Management & Compliance: Agile (Scrum), Jira, ServiceNow, SOX Compliance

Operating Systems: Windows, Linux/Unix, macOS

Professional Experience

Client: Spring Health, New York, NY Dec 2022 – Present

Role: Data Engineer

Responsibilities:

Developed, deployed, and optimized ETL pipelines using Python, AWS Glue, Lambda, Step Functions, and Snowflake for behavioral health analytics, ensuring efficient data ingestion, transformation, and integration.
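
As an illustration of this kind of pipeline, a minimal AWS Glue job sketch in PySpark might look like the following; the job argument names, S3 paths, and patient-event schema are hypothetical stand-ins, not the client's actual code:

import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Job parameters supplied by the Glue trigger (names are hypothetical).
args = getResolvedOptions(sys.argv, ["SOURCE_PATH", "TARGET_PATH"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Ingest raw JSON events from S3, normalize timestamps, and drop malformed rows.
raw = spark.read.json(args["SOURCE_PATH"])
clean = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropna(subset=["patient_id", "event_ts"])
)

# Land date-partitioned Parquet that a downstream Snowflake load can pick up.
clean.write.mode("append").partitionBy("event_date").parquet(args["TARGET_PATH"])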

Designed event-driven, real-time data pipelines with Apache Airflow, AWS Step Functions, and Kafka, enabling seamless streaming and batch data processing from healthcare provider platforms.

Built and optimized data models using SQL, dbt (Data Build Tool), and Snowflake, applying data modeling techniques that reduced query complexity by 50% and improved report generation in Looker, Tableau, and Mode Analytics.

Integrated AWS and GCP for hybrid cloud data storage, using Google Cloud Storage (GCS) for backup and leveraging AWS S3 and Lake Formation for structured and semi-structured data storage.

Managed and optimized an AWS Redshift-based data warehouse, improving query performance by 35%, ensuring high-performance analytical workloads for clinical and business reporting.

Automated data quality validation using Great Expectations, AWS Glue DataBrew, and Python-based validation scripts, increasing data accuracy by 40% across patient engagement and mental health assessment datasets.
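
A minimal sketch of this style of validation using Great Expectations' classic (pre-1.0) pandas API; the column names and score range are illustrative assumptions:

import great_expectations as ge
import pandas as pd

# Stand-in engagement data; real records would come from the assessment pipeline.
df = pd.DataFrame({
    "patient_id": ["p-001", "p-002", "p-003"],
    "phq9_score": [4, 27, 12],
})

gdf = ge.from_pandas(df)

# Each expectation returns a validation result whose `success` flag marks pass/fail.
checks = [
    gdf.expect_column_values_to_not_be_null("patient_id"),
    gdf.expect_column_values_to_be_between("phq9_score", min_value=0, max_value=27),
]

failed = [c for c in checks if not c.success]
if failed:
    raise ValueError(f"{len(failed)} data quality check(s) failed")

Wiring a script like this into the pipeline lets bad batches fail loudly before they reach reporting tables.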

Designed scalable, fault-tolerant data lake architecture using AWS S3, Parquet, Avro, JSON, and Delta Lake formats, ensuring efficient storage and retrieval of large-scale behavioral health records.

Implemented data security and governance controls using AWS IAM, KMS, Snowflake role-based access control (RBAC), and row-level security, ensuring 100% compliance with HIPAA and GDPR.
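
On the Snowflake side, role-based grants of this sort can be scripted with the snowflake-connector-python package; the account, database, schema, and role names below are illustrative only:

import snowflake.connector

# Connection parameters are placeholders; in practice they would be pulled
# from AWS Secrets Manager rather than hard-coded.
conn = snowflake.connector.connect(
    account="example_account",
    user="deploy_user",
    password="***",
    role="SECURITYADMIN",
)

cur = conn.cursor()
# Least-privilege analyst role: read-only access to the curated schema.
cur.execute("CREATE ROLE IF NOT EXISTS ANALYST_RO")
cur.execute("GRANT USAGE ON DATABASE ANALYTICS TO ROLE ANALYST_RO")
cur.execute("GRANT USAGE ON SCHEMA ANALYTICS.CURATED TO ROLE ANALYST_RO")
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.CURATED TO ROLE ANALYST_RO")
cur.close()
conn.close()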

Automated CI/CD pipeline deployments using Terraform, AWS CodePipeline, Docker, and Kubernetes (EKS), reducing deployment time from 3 days to 3 hours while maintaining robust DevOps best practices.

Optimized batch and streaming data processing using Apache Spark on AWS Glue, reducing data processing time for large datasets by 60%.

Designed scalable data ingestion pipelines integrating Fivetran and Kafka, ensuring seamless data extraction from electronic health records (EHRs), clinical APIs, and third-party mental health applications.
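
A minimal consumer sketch for the Kafka leg of such ingestion, using the kafka-python client; the topic, broker address, and event fields are assumptions:

import json

from kafka import KafkaConsumer

# Topic and broker address are illustrative.
consumer = KafkaConsumer(
    "ehr-events",
    bootstrap_servers="broker-1:9092",
    group_id="ingestion-service",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Hand each decoded event to the landing-zone writer (stubbed here).
    print(event.get("record_type"), message.offset)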

Developed and managed Airflow DAGs to schedule and orchestrate data pipeline workflows, aligning data structure with pipeline execution to optimize data movement across AWS Glue, Redshift, and Snowflake, supporting task automation.
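
A skeletal Airflow DAG of the kind described, with stubbed tasks and an illustrative dag_id, to show how the dependency order is expressed:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull raw files into S3 (stub)

def transform():
    ...  # run the Glue/Spark transformation (stub)

def load():
    ...  # copy curated data into Redshift/Snowflake (stub)

with DAG(
    dag_id="behavioral_health_etl",  # name is illustrative
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Encode the dependency order: extract -> transform -> load.
    t_extract >> t_transform >> t_load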

Collaborated with Data Scientists and Machine Learning Engineers, ensuring feature engineering pipelines efficiently transform data for AI-driven mental health treatment recommendations.

Optimized cloud infrastructure cost and performance, leveraging AWS Cost Explorer, Snowflake query optimization, and Redshift auto-scaling, reducing cloud expenses by 15%.

Worked in an Agile development environment, participating in daily standups, sprint planning, and retrospectives, ensuring efficient team collaboration via Jira, Confluence, and GitHub Actions.

Maintained documentation and knowledge sharing through Confluence and API specifications, ensuring clarity on data schema, transformations, business analytics reporting and governance policies.

Key Technologies: AWS Glue, AWS Lambda, AWS S3, AWS Redshift, AWS Step Functions, AWS IAM, AWS KMS, AWS SNS, AWS CloudWatch, AWS CodePipeline, AWS Secrets Manager, Snowflake, Google Cloud Storage (GCS), Apache Airflow, Apache Kafka, Apache Spark (AWS Glue), Fivetran, dbt, PostgreSQL, MySQL, Redshift Spectrum, Parquet, Avro, JSON, Delta Lake, Terraform, Docker, Kubernetes (EKS), Helm, Looker, Tableau, Mode Analytics, Jira, Confluence, Git, Pandas, PySpark, Great Expectations, Datadog, Prefect, Jenkins, Django, Data Modeling, Data Structures, Data Science

Client: Wells Fargo, Washington, DC Feb 2020 – Sep 2022

Role: Data Engineer

Responsibilities:

Led migration of data infrastructure to Microsoft Azure and Google Cloud, reducing cloud infrastructure costs by 20% while enhancing scalability and performance for financial datasets.

Developed and optimized ETL pipelines using Apache Spark, Azure Data Factory, Google Cloud Dataflow, and Apache Kafka, decreasing data processing times by 35% and increasing pipeline reliability by 40%.
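
As a sketch of the streaming half of such a pipeline, Spark Structured Streaming can consume a Kafka topic and land typed records; the broker, topic, schema, and storage paths here are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Requires the spark-sql-kafka package on the classpath.
spark = SparkSession.builder.appName("txn-stream").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers raw bytes; decode the JSON payload into typed columns.
parsed = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("txn")
).select("txn.*")

# Continuously append parsed transactions to a Parquet landing zone.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://example-bucket/landing/transactions/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions/")
    .start()
)
query.awaitTermination()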

Built and managed data warehousing solutions on Azure Synapse Analytics, Google BigQuery, and Amazon Redshift, enhancing reporting speed by 25% and enabling real-time data insights for business units.

Designed multi-cloud data lakes with Azure Data Lake and Google Cloud Storage, optimizing storage efficiency and reducing retrieval latency by 30%.

Ensured compliance with security protocols by implementing encryption and IAM roles, using AWS KMS, Azure Key Vault, and Google Cloud IAM, resulting in zero security breaches during the cloud transition.

Enabled real-time data processing and analytics using Databricks, Apache Flink, and Azure Synapse, leveraging serverless compute resources to improve financial forecasting and risk management by 40%.

Built CI/CD pipelines using Azure DevOps, Terraform, Docker, and GitHub Actions, reducing deployment time by 50% and accelerating the release cycle for cloud-based data solutions.

Integrated data orchestration with Apache Airflow and Google Cloud Composer, automating workflows and improving data pipeline monitoring and management.

Collaborated with data scientists and analysts to develop predictive models for fraud detection and financial forecasting, increasing the accuracy of financial risk models by 30%.

Enhanced data quality using dbt (Data Build Tool), Great Expectations, and AWS Glue DataBrew, reducing inconsistencies in datasets by 25% and improving data validation processes.

Developed data models and materialized views in SQL, Snowflake, and BigQuery, enabling efficient data querying and reducing report generation time by 30%.
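
For instance, a materialized view can be created from Python with the google-cloud-bigquery client so reports read precomputed aggregates; the dataset, table, and column names are illustrative:

from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Materialized view over a hypothetical transactions table; BigQuery
# refreshes it incrementally, so the dataset must already exist.
ddl = """
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_txn_summary AS
SELECT DATE(txn_ts) AS txn_date, account_id, SUM(amount) AS total_amount
FROM analytics.transactions
GROUP BY txn_date, account_id
"""
client.query(ddl).result()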

Mentored junior engineers on best practices for cloud technologies, data pipelines, and automation, boosting team productivity by 20%.

Integrated machine learning models into production using TensorFlow, MLflow, and Azure ML, reducing false positives in fraud detection by 15% and enhancing risk mitigation strategies.
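
A minimal MLflow tracking-and-registration sketch; the experiment name and the stand-in model below are illustrative, not the fraud models themselves:

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data; real features would come from the fraud pipeline.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

mlflow.set_experiment("fraud-detection")  # experiment name is illustrative

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)

    # Track parameters and metrics so candidate runs can be compared.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_accuracy", model.score(X, y))

    # Log the fitted model; downstream services load it from the registry.
    mlflow.sklearn.log_model(model, artifact_path="model")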

Key Technologies: Microsoft Azure, Google Cloud Platform (GCP), Amazon Web Services (AWS), Azure Synapse Analytics, Google BigQuery, Amazon Redshift, Azure Data Lake, Google Cloud Storage (GCS), Amazon S3, Apache Spark, Databricks, Apache Flink, Azure Data Factory (ADF), Google Cloud Dataflow, Apache Kafka, Apache Airflow, Google Cloud Composer, AWS Key Management Service (AWS KMS), Azure Key Vault, Google Cloud IAM, Azure DevOps, Terraform, Docker, GitHub Actions, dbt (Data Build Tool), Great Expectations, AWS Glue DataBrew, SQL, Snowflake, TensorFlow, MLflow, Azure Machine Learning (Azure ML)

Client: Texas Mutual Insurance Company, Austin, TX Dec 2018 – Jan 2020

Role: Data Engineer

Responsibilities:

Designed and implemented ETL workflows using AWS Glue, Apache Spark, and Python, optimizing data processing pipelines for claims and underwriting data, improving processing speed by 30%.

Developed and managed data warehouse solutions on Amazon Redshift, enabling real-time reporting and enhancing data retrieval performance by 25%.

Built and automated data pipelines for data integration from multiple sources, ensuring seamless data flow between legacy systems and cloud platforms like AWS S3, reducing integration time by 35%.

Created interactive dashboards and reports using Tableau, empowering business stakeholders with real-time insights and supporting decision-making for claims and risk management teams.

Ensured data security and compliance with HIPAA, GDPR, and AWS best practices by implementing role-based access controls (AWS IAM) and data encryption using AWS KMS.
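
The code footprint of such encryption is small; a boto3 sketch with a hypothetical CMK alias and placeholder payload:

import boto3

# Region and key alias are placeholders; access is governed by IAM role policy.
kms = boto3.client("kms", region_name="us-east-1")

ciphertext = kms.encrypt(
    KeyId="alias/claims-data",  # hypothetical CMK alias
    Plaintext=b"member_record_payload",
)["CiphertextBlob"]

# Decryption succeeds only for principals whose IAM role grants kms:Decrypt.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]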

Collaborated with cross-functional teams, including data scientists and business analysts, to define data requirements and deliver high-quality data-driven solutions.

Migrated legacy data systems to cloud-based infrastructure, improving system scalability, reducing maintenance costs by 20%, and enhancing overall system performance.

Implemented data governance and quality practices, leveraging industry standards to improve data consistency and data quality across multiple data sources and platforms.

Participated in Agile development cycles, contributing to sprint planning and backlog prioritization to ensure the timely delivery of data engineering solutions.

Key Technologies: AWS Glue, Apache Spark, Python, Amazon Redshift, AWS S3, AWS IAM, AWS KMS, AWS Lambda, AWS CloudWatch, AWS Step Functions, AWS Secrets Manager, AWS Athena, AWS RDS, AWS DMS, Tableau, ETL, Data Warehousing, Data Pipeline Automation, Data Integration, Role-Based Access Control (RBAC), Real-Time Reporting, Data Security, Compliance (HIPAA, GDPR), Cloud Migration, Agile, Data Governance, Data Quality, Legacy System Integration, Sprint Planning, Backlog Prioritization, Cross-Functional Collaboration, Mentorship, System Scalability Optimization, Business Intelligence (BI).

Education & Certifications

Master’s in Information Technology Management – Webster University, San Antonio, TX

Languages

English, Japanese (N2)


