Devi Ruwali, Data Engineer (DataOps)
Levittown, NY 11756
347-***-****, ****************@*****.*** https://www.linkedin.com/in/devi-ruwali/
Objective
Seeking a Senior Data Engineer role in a forward-thinking organization that rewards creativity, integrity, skill, and performance, bringing 6+ years of expertise in cloud-native data pipelines, distributed architectures, and DevOps best practices to deliver impactful, data-driven business outcomes.
Summary
Designed cloud-native data pipelines in AWS and GCP, leveraging Terraform for infrastructure as code and enabling automated, scalable deployments across multiple environments.
Engineered Big Data ETL workflows using Spark (PySpark/Scala) and Hive, efficiently processing multi-terabyte data sets and cutting query times from hours to minutes.
Employed containerization with Docker and Kubernetes to create fault-tolerant microservices, minimizing overhead and simplifying the deployment process in dev/test/prod.
Built DataOps workflows in Airflow and AWS Step Functions, reducing data latency by 50% and accelerating near real-time analytics for high-impact use cases.
Automated CI/CD pipelines with Jenkins, GitHub Actions, and Git, implementing comprehensive testing and validation to streamline release cycles.
Enforced data governance via Collibra and Great Expectations, maintaining lineage, metadata, and compliance standards in regulated environments.
Secured enterprise data with IAM roles/policies, encryption at rest/in transit, and OAuth2, ensuring robust RBAC to meet strict compliance and prevent unauthorized access.
Deployed and optimized Snowflake data warehouses, integrating advanced partitioning and clustering strategies to halve query durations and maximize performance.
Integrated Kafka and RabbitMQ for real-time streaming and asynchronous message flows, boosting operational responsiveness and facilitating rapid event-driven processing.
Aligned with Agile/Scrum practices, collaborating closely with cross-functional teams (Data Scientists, Risk, Analytics) to deliver solutions that meet evolving business needs.
Utilized multiple languages—Python, Scala, Java, SQL, and JavaScript—to develop full-stack solutions, from backend processing to frontend integrations.
Guided junior engineers on DevSecOps best practices, code optimization, and data engineering frameworks, cultivating a dynamic culture of continuous improvement.
Developed self-service infrastructure through Terraform, cutting manual configuration by 80% and empowering teams to provision their own resources.
Optimized high-volume Spark clusters through partitioning, caching, and job tuning, enabling consistent handling of large-scale datasets at near real-time processing speeds (a minimal sketch follows this summary).
Created robust testing strategies (Postman, Rest Assured, TestNG, and Cucumber) for RESTful APIs, microservices, and SQL validations to ensure data integrity and code reliability.
Provided ML-ready datasets and real-time dashboards (e.g., Power BI), driving data-driven decisions for varied stakeholders and accelerating analytics initiatives.
Embraced serverless compute (AWS Lambda) alongside Docker/Kubernetes to optimize costs, scale processing on demand, and support hybrid cloud environments.
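A minimal, illustrative PySpark sketch of the partition-and-cache approach referenced in the Spark tuning bullet above; the bucket paths, column names, and filter date are hypothetical placeholders, not production details.
```python
# Illustrative only: partition-aware PySpark ETL with caching.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-tuning-sketch").getOrCreate()

# Read a large Parquet dataset and prune rows/columns early to cut shuffle volume.
trades = (
    spark.read.parquet("s3://example-bucket/trades/")          # assumed path
    .filter(F.col("trade_date") >= "2024-01-01")
    .select("account_id", "trade_date", "symbol", "notional")
)

# Cache the pruned frame because several downstream aggregates reuse it.
trades.cache()

daily_totals = (
    trades.groupBy("trade_date", "symbol")
          .agg(F.sum("notional").alias("total_notional"))
)

# Write partitioned by date so downstream queries can prune partitions.
daily_totals.write.mode("overwrite").partitionBy("trade_date").parquet(
    "s3://example-bucket/curated/daily_totals/"
)
```
Caching only the pruned projection and partitioning the output by the most common filter column are the two choices that typically account for the largest query-time reductions.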
Technical Skills
Programming Languages
Python, Scala, SQL, Java, JavaScript
Data & Analytics
Spark (PySpark, Scala), Hive, Kafka, RabbitMQ, dbt, ML (Forecasting, Analytics), Airflow, AWS Step Functions, Great Expectations, Power BI
Cloud & Infrastructure
AWS (S3, Redshift, EMR, Lambda), GCP, Terraform, Docker, Kubernetes
Databases & Warehouses
Oracle, DB2, Snowflake, Redshift
DevOps & CI/CD
Jenkins, GitHub Actions, Git, CI/CD pipelines, Maven, Microservices Architecture
Security & Governance
IAM roles/policies, Encryption (at rest/in transit), OAuth2, Role-Based Access Control (RBAC), Collibra (Data Governance), Data Lineage Tracking
Testing & Automation
Postman, Rest Assured, TestNG, Cucumber, JUnit, Page Object Model (POM), Automated Scripts, Root-cause Analysis
Frameworks
Java Spring (backend), Flask/FastAPI (Python), React (front-end), Containerized Services (Docker, Kubernetes), Serverless (AWS Lambda)
Project Methodologies
Agile (Scrum), DevSecOps, Continuous Improvement, Sprint Planning, Retrospectives
Professional Experience
Vanguard, Malvern, PA, April 2023 – Present
Data Engineer, Specialist
Portfolio Analytics Tool is a real-time, cloud-native platform for on-demand risk/return diagnostics, portfolio comparisons, and hypothetical performance modeling. It integrates streaming pipelines (Kafka, Spark), ML-driven forecasting, and a self-service portal for scenario analyses, all secured with IAM controls, encryption, and Collibra-based governance to ensure compliance.
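For illustration only, a minimal sketch of how the Kafka-to-Spark streaming leg of such a platform might look; the topic name, message schema, broker address, and in-memory sink are assumptions rather than the actual implementation, and a real job would also need the Spark Kafka connector package.
```python
# Hypothetical sketch of a Kafka -> Spark Structured Streaming aggregation.
# Topic, schema, broker, and sink are placeholders; requires the
# spark-sql-kafka connector package at runtime.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("portfolio-stream-sketch").getOrCreate()

position_schema = StructType([
    StructField("portfolio_id", StringType()),
    StructField("symbol", StringType()),
    StructField("market_value", DoubleType()),
])

# Read position updates from an assumed Kafka topic.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "portfolio-positions")
    .load()
)

positions = raw.select(
    F.from_json(F.col("value").cast("string"), position_schema).alias("p")
).select("p.*")

# Maintain a rolling per-portfolio exposure that a diagnostics UI could query.
exposure = positions.groupBy("portfolio_id").agg(
    F.sum("market_value").alias("total_exposure")
)

query = (
    exposure.writeStream
    .outputMode("complete")
    .format("memory")              # stand-in sink for the sketch
    .queryName("exposure_sketch")
    .start()
)
```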
Responsibilities:
Collaborate with cross-functional stakeholders (Product, Risk, Analytics) to translate complex business requirements into technical designs and production-ready code, delivering on-demand risk analyses and cutting turnaround time for ad-hoc reports by 30%.
Develop and deploy highly scalable infrastructure in AWS (S3, Redshift, EMR, Lambda) using Terraform, enabling self-service infrastructure deployments and reducing manual configuration by 80%.
Automate ingestion from multiple sources (Oracle, DB2, Kafka, REST APIs) and orchestrate ETL workflows in Airflow (see the orchestration sketch following this role), reducing data latency by 50% and enabling near real-time availability for downstream analytics.
Leverage containerization (Docker, Kubernetes) and serverless compute to minimize costs, auto-scale resources, and streamline deployment across dev/test/prod environments.
Implement CI/CD pipelines (Jenkins, GitHub Actions) to build, test, and deploy data transformations (Python, PySpark, Scala, SQL, dbt) with comprehensive code reviews and branching strategies.
Optimize Spark (PySpark/Scala) and Hive jobs for high-volume processing, employing partitioning, caching, and cluster tuning for performance gains and reduced query times.
Enforce data governance via Collibra, Great Expectations, and lineage tracking, ensuring data quality, metadata management, and compliance with enterprise standards.
Employ robust security (IAM roles/policies, encryption at rest/in transit) and role-based access controls to safeguard data across AWS environments and pipelines.
Monitor pipelines with tools like CloudWatch, Datadog, and Splunk, setting proactive alerts and conducting root-cause analyses for failures or data anomalies.
Collaborate in agile squads, partnering with Data Scientists, DevOps, and Analysts to deliver real-time dashboards and ML-ready datasets.
Lead all phases of solution development, including thorough testing for accuracy, explain technical considerations to stakeholders, review final deliverables with clients, and provide data analysis guidance as needed.
Mentor junior engineers in data engineering best practices, DevSecOps principles, and code optimization, fostering a culture of continuous improvement.
Environment: AWS (S3, Redshift, EMR, Lambda), Terraform, Oracle, DB2, Kafka, Airflow, AWS Step Functions, Docker, Kubernetes, Jenkins, GitHub Actions, Python, Scala, SQL, dbt, Spark (PySpark), Hive, Collibra, Great Expectations, IAM, Encryption, Role-Based Access Control, CloudWatch, Datadog, Splunk, DevSecOps
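A minimal Airflow sketch of the ingest-validate-load orchestration pattern described in this role; the DAG id, schedule, and task bodies are illustrative assumptions, with placeholder functions standing in for the actual Oracle/DB2 extracts, Great Expectations checkpoints, and Redshift load.
```python
# Illustrative Airflow DAG: ingest -> validate -> load. Task bodies are
# placeholders standing in for the real extract, data-quality, and load steps.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_sources(**_):
    # Placeholder: pull incremental rows from Oracle/DB2/REST sources to S3.
    print("extracting source data")


def run_quality_checks(**_):
    # Placeholder: run Great Expectations checkpoints against the staged data.
    print("running data quality checks")


def load_to_warehouse(**_):
    # Placeholder: COPY validated files into Redshift.
    print("loading to Redshift")


with DAG(
    dag_id="portfolio_ingest_sketch",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_sources)
    validate = PythonOperator(task_id="validate", python_callable=run_quality_checks)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> validate >> load
```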
Aegis Software, Horsham, PA, January 2020 – March 2023
Software Engineer (Data Engineer Focus)
Responsibilities:
Translated complex data needs into robust Spark (Scala) ETL pipelines, ingesting multi-terabyte datasets in near real time and improving analytics performance.
Developed Snowflake data warehouses, reducing query durations by 50% through partitioning, clustering, and advanced optimizations, while enforcing strict governance to meet enterprise compliance standards and ensure reliable insights.
Automated ingestion from SQL databases, Kafka, and REST APIs, leveraging Airflow for near real-time data pipelines.
Deployed Docker and Kubernetes microservices for Java and Python applications, integrating with AWS services to support scalable, fault-tolerant solutions across multiple environments.
Employed AWS Lambda for serverless computing, using Terraform to implement infrastructure as code and accelerate deployments, thereby significantly reducing manual overhead and configuration complexities.
Upgraded legacy SQL systems to Spark-centric frameworks, boosting scalability, concurrency, and overall performance for massive data workloads through distributed processing.
Constructed event-driven pipelines with Kafka, supporting asynchronous message flows, real-time data streaming, and rapid alerting to enhance operational responsiveness and reliability (see the consumer sketch following this role).
Implemented rigorous data validation, lineage tracking, and version control, ensuring data integrity and audit readiness across evolving analytics demands and complex transformations.
Developed and optimized data pipelines for seamless data integration, enabling Power BI dashboards to deliver real-time insights into key metrics and trends and supporting faster decision-making for stakeholders across multiple business units.
Implemented secure authentication with OAuth2, encryption at rest and in transit, plus role-based access control to effectively safeguard sensitive data and meet stringent compliance requirements.
Collaborated with distributed teams, conducting thorough testing using Postman for REST endpoints, while facilitating seamless integration between front-end React components and back-end microservices across multiple environments.
Environment: Apache Spark (Scala), Snowflake, Kafka, RabbitMQ, Python, Java, JavaScript, Docker, Kubernetes, AWS Lambda, Terraform, GCP, Apache Airflow, Power BI, OAuth2, Git, CI/CD Pipelines, Microservices Architecture.
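An illustrative Python consumer (using the kafka-python client) for the event-driven alerting pattern mentioned above; the topic name, latency threshold, and print-based alert are placeholders for the real alerting channel.
```python
# Sketch of an event-driven alerting consumer; topic, threshold, and alert
# channel are illustrative placeholders.
import json

from kafka import KafkaConsumer  # kafka-python client

consumer = KafkaConsumer(
    "order-events",                               # hypothetical topic
    bootstrap_servers="broker:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

LATENCY_THRESHOLD_MS = 500  # assumed SLA, for demonstration only

for message in consumer:
    event = message.value
    latency = event.get("processing_latency_ms", 0)
    # Flag slow events so a downstream alerting hook (pager, Slack) can react.
    if latency > LATENCY_THRESHOLD_MS:
        print(f"ALERT: event {event.get('event_id')} took {latency} ms")
```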
Verisk Analytics, Jersey City, NJ, June 2019 – January 2020
Associate Software Engineer
Responsibilities:
Developed and maintained scalable Java-based applications, adhering to industry best practices and design patterns.
Collaborated with cross-functional teams to clarify requirements, produce technical specifications, and deliver solutions aligned with business objectives.
Wrote and refactored clean, efficient, and testable code, proactively troubleshooting existing codebases to enhance performance and reliability.
Participated in Agile ceremonies (sprint planning, daily scrums, retrospectives) for iterative development and rapid feedback loops.
Designed test plans, test cases, and scripts, performing both manual and automated testing across diverse environments.
Developed automated scripts using Java, Maven, TestNG, and Cucumber, leveraging Page Object Model (POM) for structured test automation.
Utilized CI/CD pipelines (Jenkins) for nightly builds, employing Git for version control and faster feedback.
Conducted RESTful API testing with Postman and Rest Assured, alongside SQL-based back-end validations, to maintain data integrity (a Python analogue is sketched below).
Performed Android/iOS app testing with Appium on emulators and simulators, ensuring a consistent user experience across mobile devices.
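For consistency with the other sketches, a Python analogue of the REST-endpoint-plus-SQL validation described above; the production suite used Postman and Rest Assured, and the endpoint URL, table, and SQLite stand-in database here are placeholders.
```python
# Python analogue of a REST + SQL back-end validation; URL, query, and the
# SQLite stand-in database are placeholders (the real suite used Rest Assured).
import sqlite3

import requests


def test_policy_endpoint_matches_backend():
    # 1. Call the REST endpoint under test (hypothetical URL).
    response = requests.get("https://api.example.com/policies/123", timeout=10)
    assert response.status_code == 200
    api_premium = response.json()["premium"]

    # 2. Fetch the same record from the back-end store and compare.
    conn = sqlite3.connect("example.db")
    row = conn.execute(
        "SELECT premium FROM policies WHERE policy_id = ?", ("123",)
    ).fetchone()
    conn.close()

    assert row is not None
    assert abs(api_premium - row[0]) < 0.01  # tolerate rounding differences
```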
Education & Certifications
Bachelor of Science in Computer Science and Information Technology, Tribhuvan University (GPA: 3.7)
AWS Certified Data Engineer - Associate
Databricks Certified AWS Platform Architect
HashiCorp Certified: Terraform Associate
Personal Interests
Tech Meetups & Conferences: Stay current with emerging trends and network with industry experts by attending local and national events on cloud, DevOps, and big data.
Open-Source Contributions: Contribute to GitHub repositories related to Python data libraries, enhancing code quality and documentation.
Reading & Research: Passionate about reading technical blogs and books on distributed systems, AI ethics, and scalable architectures.
STEM Volunteering: Mentor aspiring programmers at local code camps and hackathons, sharing knowledge on data engineering and DevOps practices.
Hiking & Travel: Pursue outdoor adventures to recharge and gain fresh perspectives, often exploring national parks and scenic trails.
References Available Upon Request