Senior Cloud Data Engineer with 6 Years Experience

Location:
Prosper, TX
Posted:
December 11, 2025

Resume:

Name: Ravi A

Email: *********.**@*****.***

Contact: 903-***-****

Senior Data Engineer

Professional Summary:

Accomplished Senior AWS Data Engineer with around 6 years of experience architecting and delivering scalable, high-performance data and machine learning solutions across AWS and Azure. Proven expertise in big data engineering, streaming analytics, and multi-cloud integration, with deep proficiency in Apache Spark, Apache Kafka, Airflow, and Snowflake. Adept at designing end-to-end data pipelines that enable real-time analytics and predictive modelling. Skilled in building HIPAA-compliant architectures that ensure data security, integrity, and governance. Recognized for bridging data engineering and data science to drive actionable business insights, improve decision-making, and optimize operational efficiency. Also brings solid working knowledge of statistics and ML regression models.

Technical Skills:

Big Data Ecosystems

Hadoop, Spark (batch/Structured Streaming), Hive, Kafka, HBase, Cassandra, Kubernetes, Databricks, Storm, Sqoop, Pig, Oozie, Hue, and Flume.

Scripting/Programming Languages

Python, Scala, Java, C, C#, R, Bash, Shell scripting, PL/SQL, SQL, Regular Expressions, JavaScript, HTML, CSS.

Databases

Data warehouses, RDBMS, NoSQL (MongoDB certified), Oracle, PostgreSQL, MySQL.

Tools

Eclipse, JDeveloper, MS Visual Studio, Microsoft Azure HDInsight, Microsoft Hadoop cluster, JIRA, NetBeans

Methodologies

Agile, Scrum, Waterfall, Kanban

Operating Systems

Unix/Linux, Windows

Machine Learning Skills (MLlib)

Feature Extraction, Dimensionality Reduction, Model Evaluation, Clustering, Regression, TensorFlow, PyTorch

Cloud

AWS (Glue, S3, Redshift, Athena, EMR, Lambda), MS Azure (ADF, ADLS), Snowflake

Professional Experience:

Client name: USBank July 2024 to Present

Role: Software Engineer/ Data Engineer

Project description: Contributed as a Senior Data Engineer in developing a cloud-native data Lakehouse and real-time fraud analytics platform on AWS. Built scalable batch and streaming pipelines using Kinesis, Spark, EMR, and Glue to ingest, transform, and govern large volumes of customer, account, and transaction data. Implemented secure, metadata-driven workflows, automated orchestration, and governance controls while enabling near real-time fraud detection and advanced analytical insights.

Responsibilities:

Contributed to building a cloud-based Enterprise Data Lake to consolidate data from multiple domains including customers, accounts, transactions, and loans.

Leveraged Kinesis for near real-time data ingestion, enhancing operational and analytical data workflows.

Designed and implemented Kafka-based streaming pipelines to process high-volume transaction and customer event data for near real-time fraud detection.
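
As a minimal illustration of this kind of pipeline, the sketch below reads transaction events from Kafka with Spark Structured Streaming and flags high-value activity per account. The broker address, topic, schema, and threshold are illustrative assumptions, not the client's actual configuration.

```python
# Hedged sketch of a Structured Streaming fraud pipeline; all names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("fraud-stream-sketch").getOrCreate()

txn_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw transaction events from an assumed Kafka topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "card-transactions")           # assumed topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), txn_schema).alias("txn"))
    .select("txn.*")
)

# Aggregate spend per account over a sliding window and flag large totals.
suspicious = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes", "1 minute"), "account_id")
    .agg(F.sum("amount").alias("total_amount"))
    .filter(F.col("total_amount") > 10000)               # illustrative threshold
)

# Console sink stands in for the real alerting store.
query = suspicious.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```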

Developed real-time and batch data processing workflows using Apache Spark, PySpark, and AWS EMR for high-performance analytics.

Managed and optimized Amazon S3-based data lakes with Delta Lake for structured governance, versioning, and schema enforcement.

Configured and integrated Databricks Unity Catalog to manage access control, schema validation and lineage tracking for curated datasets.

Built scalable ingestion pipelines using metadata-driven patterns across multiple source systems. Developed reusable data transformation logic using AWS Glue and stored the data into S3 and Redshift.
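
A condensed sketch of what such a Glue job can look like is below; the catalog database, job arguments, and S3 target are placeholder names, and in a metadata-driven setup the column mappings would come from pipeline metadata rather than being hard-coded.

```python
# Hedged sketch of an AWS Glue PySpark job; database/table/path names are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_table", "target_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source via the Glue Data Catalog so schema changes stay metadata-driven.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_banking",            # assumed catalog database
    table_name=args["source_table"],
)

# Example reusable mapping; real column lists would be supplied by pipeline metadata.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("acct_id", "string", "account_id", "string"),
              ("txn_amt", "double", "amount", "double")],
)

# Land curated Parquet in S3; a separate step would COPY the data into Redshift.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": args["target_path"]},
    format="parquet",
)
job.commit()
```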

Integrated Snowflake with AWS S3 for seamless data access, ingestion, and analytics.

Implemented AWS Step Functions to orchestrate multi-step data processing workflows with error handling and retries.
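
The snippet below illustrates the retry-and-catch pattern in a Step Functions definition, expressed as a Python dict and registered with boto3; the Lambda ARNs, IAM role, and state names are placeholders, not the actual workflow.

```python
# Illustrative Step Functions state machine with retries and a failure branch.
import json
import boto3

definition = {
    "StartAt": "TransformBatch",
    "States": {
        "TransformBatch": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-batch",  # placeholder ARN
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 30,
                       "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-failure",   # placeholder ARN
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="data-pipeline-sketch",
    roleArn="arn:aws:iam::123456789012:role/example-sfn-role",   # placeholder role
    definition=json.dumps(definition),
)
```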

Managed AWS Glue Data Catalog for metadata management, schema tracking, and dataset discovery.

Reduced fraud detection latency from 5 minutes to under 1 minute using Kinesis and EMR autoscaling.

Managed job orchestration using Apache Airflow ensuring SLA adherence and automated recovery.
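
For reference, a stripped-down Airflow DAG with the SLA and retry behavior described here could look like the following; the DAG id, schedule, and task body are illustrative only.

```python
# Minimal Airflow DAG sketch showing the SLA/retry pattern; names are assumptions.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_ingest(**context):
    # Placeholder for the real ingestion step.
    print("ingesting batch", context["ds"])

default_args = {
    "owner": "data-eng",
    "retries": 3,                          # automated recovery on transient failures
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),             # surface SLA misses for follow-up
}

with DAG(
    dag_id="fraud_lake_ingest_sketch",
    start_date=datetime(2024, 7, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(task_id="ingest_transactions", python_callable=run_ingest)
```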

Collaborated with QA and governance teams to implement data quality validations, deduplication logic, and lineage tracking.

Used Scikit-learn to prepare datasets, engineer features, and fine-tune models to improve detection accuracy.
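
A small example of this scikit-learn workflow is sketched below; the dataset path, feature names, and model choice are assumptions rather than the production setup.

```python
# Hedged sketch: feature prep plus hyperparameter tuning with scikit-learn.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

df = pd.read_parquet("transactions_features.parquet")      # assumed curated dataset
X = df[["amount", "txn_count_1h", "merchant_risk_score"]]   # hypothetical features
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(class_weight="balanced", random_state=42)),
])

# Grid search over a small parameter space; average precision suits rare fraud labels.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [200, 400], "model__max_depth": [8, 16]},
    scoring="average_precision",
    cv=5,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_, "test AP:", search.score(X_test, y_test))
```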

Automated compliance auditing and reporting pipelines using AWS CloudWatch Logs and AWS Config.

Deployed Spark clusters on AWS EMR with autoscaling to optimize performance and reduce costs.

Created Tableau dashboards on the fraud data for monitoring and reporting.

Environment: AWS (S3, Redshift, Kinesis, EMR, Glue), NoSQL, MapReduce, Hive, Qlik Replicate, HBase, Kafka, Impala, Spark SQL, Spark, Spark Streaming, Python, Eclipse, Jira, Scala, JSON, Oracle, Teradata, PL/SQL

Client Name: Mindtree Aug 2021 to Jan 2022

Role: Data Engineer

Project description: Worked as a Data Engineer modernizing aviation operations data systems by migrating legacy pipelines to an AWS Lakehouse architecture. Designed scalable ETL/ELT pipelines using AWS Glue, PySpark, and Databricks, reducing ETL runtimes by 50% and cutting infrastructure costs by 30%. Built real-time ingestion workflows with Kinesis and Lambda for live gate monitoring, improving operational visibility and reducing gate wait times. Developed optimized Redshift data marts, a production feature store for ML workloads (boosting prediction accuracy by 15%), and implemented Lake Formation governance to reduce data trace time by 80%. Collaborated with operations, revenue management, and data science teams to support forecasting, analytics, and efficiency improvements across the airline ecosystem.

Responsibilities:

Optimized the flight operations data pipeline, reducing ETL job runtime by 50% through PySpark code refactoring and leveraging Databricks cluster auto-scaling, enabling faster post-flight analysis.

Migrated legacy on-premises data warehouses to AWS, architecting a scalable data lake on S3 and implementing ETL pipelines using AWS Glue, which cut infrastructure costs by 30% and improved data accessibility.

Orchestrated complex, event-driven data workflows using AWS Step Functions and Lambda, automating the ingestion and processing of diverse data sources including FAA alerts and weather APIs.

Developed and maintained a serverless real-time dashboard using Amazon Kinesis Data Streams and DynamoDB to monitor live airport gate availability, reducing average gate wait times by 10%.
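
The Lambda at the core of such a pipeline can be as simple as the sketch below, which decodes Kinesis records and upserts the latest gate status into DynamoDB; the table and field names are assumed for illustration.

```python
# Hedged sketch of a Kinesis-triggered Lambda writing gate status to DynamoDB.
import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("gate_status")          # hypothetical table name

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded inside the event record.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        table.put_item(Item={
            "gate_id": payload["gate_id"],
            "status": payload["status"],        # e.g. OCCUPIED / AVAILABLE
            "updated_at": payload["event_time"],
        })
    return {"processed": len(event["Records"])}
```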

Engineered a scalable data mart in Amazon Redshift, optimizing schemas and distribution styles to accelerate query performance for the revenue management team's reporting and forecasting tools by 40%.

Optimized PySpark jobs with auto-scaling, caching, and broadcast joins, cutting ETL runtime by 50% and improving SLA adherence; tuned SQL queries for Hive-compatible performance.
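
The tuning pattern referred to here mostly comes down to caching reused DataFrames and broadcasting small dimension tables to avoid shuffle-heavy joins, as in this hedged sketch; the paths and column names are illustrative.

```python
# Condensed PySpark tuning example: cache a reused DataFrame, broadcast a small dimension.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("flight-etl-sketch").getOrCreate()

flights = spark.read.parquet("s3://example-lake/flights/")        # assumed path
airports = spark.read.parquet("s3://example-lake/dim_airports/")  # small dimension table

# Cache the filtered fact table because several downstream aggregations reuse it.
flights = flights.filter(F.col("flight_date") >= "2021-01-01").cache()

# Broadcast join keeps the small airports table on every executor, skipping a shuffle.
enriched = flights.join(F.broadcast(airports), on="airport_code", how="left")

daily_delays = enriched.groupBy("flight_date", "airport_code").agg(
    F.avg("arrival_delay_min").alias("avg_delay")
)
daily_delays.write.mode("overwrite").parquet("s3://example-lake/marts/daily_delays/")
```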

Built a production feature store with 50+ engineered variables (weather, traffic, maintenance) integrated with MLflow and SageMaker, improving delay prediction accuracy by 15%.

Containerized data applications with Docker and orchestrated on Kubernetes for portable, consistent deployments across cloud/on-prem.

Built unified data catalog and lineage in AWS Glue and Lake Formation, cutting trace time 80% and enabling governed, self-service analytics across teams.

Collaborated with revenue management and Data Science teams to evolve complex data models feeding forecasting tools, aligning with business goals for top-line metrics.

Automated the extraction and validation of baggage handling system logs, creating a reliable dataset that identified key bottlenecks and informed a 20% improvement in baggage transfer efficiency.

Environment: Python, PySpark, NoSQL, AWS (S3, Glue, Redshift, Kinesis), Kafka, Spark SQL, Spark Streaming, Eclipse, Jira, Scala, JSON, Oracle, Teradata, CI/CD, PL/SQL, UNIX Shell Scripting.

Client: Accenture Apr 2018 to Aug 2021

Role: Data Engineer

Project description: Contributed to the development of a large-scale clinical data warehouse on Azure for a major integrated healthcare provider, consolidating patient, claims, and EHR data from multiple hospital networks into a unified analytics platform. Built scalable ingestion frameworks with Azure Data Factory, Azure Data Lake Storage, Databricks, and HDInsight for processing structured, semi-structured, and unstructured healthcare data. Created Spark-based transformations in Scala and Python, integrating with Hive, Snowflake, and MongoDB for advanced analytics. Leveraged Delta Lake for historical tracking and slowly changing dimension (SCD) management. Automated ETL processes and data quality checks via Unix shell scripting to improve reliability and reduce errors. Implemented event-driven ingestion through Azure Event Grid and deployed predictive analytics in Azure Synapse to forecast patient readmissions and optimize resource allocation. Maintained strict HIPAA compliance through encryption, role-based access controls, and continuous audit logging.

Responsibilities:

Designed end-to-end scalable architectures to solve business problems using Azure components such as HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.

Specialized in designing and implementing clinical data warehouses that consolidate patient data from multiple sources for comprehensive analysis and reporting.

Designed and developed batch ingestion pipelines using Azure Data Factory and Databricks to load clinical, EHR, and claims data from large Oracle systems into Azure Delta Lake.
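
Incremental loads into Delta Lake of this kind typically use a MERGE upsert; the sketch below shows the pattern with placeholder paths and keys, not the client's actual schema (the real loads also tracked SCD history).

```python
# Hedged sketch of a Delta Lake upsert for an incremental clinical batch load.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clinical-merge-sketch").getOrCreate()

# Batch staged by ADF; path and format are assumptions.
updates = spark.read.format("parquet").load("/mnt/raw/claims_batch/")

# Curated Delta table in the lake; placeholder path.
target = DeltaTable.forPath(spark, "/mnt/delta/claims")

# Merge keyed on claim_id: update changed rows, insert new ones (type-1 style upsert).
(target.alias("t")
 .merge(updates.alias("s"), "t.claim_id = s.claim_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```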

Worked on data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure DW) and processed the data in Azure Databricks.

Used Azure Databricks as a fast, easy, and collaborative Spark-based platform on Azure.

Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.

Maintained and optimized Oracle database connectivity and schema integrity to ensure reliable, high-volume transfer of patient data into the unified clinical data warehouse on Azure.

Tuned Spark configurations (e.g., batch duration, trigger intervals, memory overhead) to achieve low-latency stream processing.

Designed Tableau dashboards to visualize outputs from predictive analytics models deployed in Azure Synapse, specifically tracking and reporting on key metrics like patient readmission forecasts and resource allocation optimization.

Managed Snowflake and MongoDB databases, ensuring data integrity, security, and scalability.

Proficient in SQL for querying and managing relational databases, facilitating data analysis and reporting.

Integrated SAP HANA and BW into Azure Synapse using ADF for unified healthcare analytics.

Experienced in HiveQL for big data processing, enabling effective management of large datasets.

Utilized Scala for scalable data processing solutions, improving overall system performance and reliability.

Used Azure Event Grid, a managed event routing service, to easily manage events across many different Azure services and applications.

Used Azure Key Vault for managing secrets and securely accessing REST API credentials within ADF pipelines.
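
Where the same credentials need to be fetched from code (for example, in a Databricks notebook) rather than through an ADF linked service, a sketch like the following works; the vault URL and secret name are placeholders.

```python
# Brief sketch: fetch a REST API credential from Key Vault with the Azure SDK for Python.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://example-vault.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

api_key = client.get_secret("ehr-api-key").value        # hypothetical secret name
```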

Used Azure Data Catalog to organize data assets and help teams get more value from existing data investments.

Environment: Azure (Storage, DW, ADF, ADLS, Databricks), AWS Redshift, Ubuntu 16.04, Hadoop 2.0, Spark (PySpark, Spark Streaming, Spark SQL, Spark MLlib), MapReduce, Hive, NiFi, Jenkins, Pig 0.15, Python 3.x (NLTK, Pandas), AI/ML, Tableau 10.3, Git.

CERTIFICATION – AWS Certified Data Engineer - Associate (DEA-C01)

Link: https://www.credly.com/badges/d6e6455a-a6c2-4dce-81a9-f17c5aff964b/public_url


