Raksho Umasankar
Data Engineer - Nikola Corporation, Phoenix AZ
********@*****.*** 716-***-**** www.linkedin.com/in/raksho
Experienced Data Engineer with a track record of developing rich analytics applications and visualization reports, designing big data processing pipelines, and optimizing ETL workflows for timeseries data.
Adept at working cross-functionally with diverse stakeholders across engineering & IT organizations to gather & define clear requirements.
Driven by a passion for improving data accuracy, processing uptime, and reliability of data applications to help drive engineering business decisions and product strategies with high-quality data.

Areas of Expertise
● Software Development
● User Experience & Visualization
● Application Development
● Automated ETL Pipeline Systems
● Timeseries Metrics
● Edge Computing
● Data Engineering
● Analysis & Data Modelling
Technical Skills
Web: Spring, Angular 8, REST, GraphQL
DevOps: Airflow, GitLab, Docker, Jira, Maven, Tomcat
Programming: Python, Java, PostgreSQL
AWS Cloud Services: EC2, CloudWatch, S3, Glue, ECR, ECS, Lambda, SNS
Python Libraries: Polars, Pandas, NumPy, Multiprocess, Matplotlib, Dash, SQLAlchemy
Big Data Analytics: PySpark, TimescaleDB, DuckDB, Dask, Jupyter Notebook
Reporting & Visualization: PowerBI, Grafana, Apache Superset

Career Experience
Nikola Corporation, Phoenix, AZ (Feb 2024 – Present)
Data Engineer – Vehicle Telematics Group
● Developed an ETL pipeline for metrics processing using Python and Polars, serving as the backend for customer fleet reporting; collaborated with engineering teams to capture the metrics logic.
● Developed an Apache Airflow-based pipeline system to process data collected from various sources (data loggers), including RDS databases and data lakes, at a high-resolution frequency of 10 ms for engineering analysis projects.
● Implemented Spark SQL and a staging function for processing complex Parquet files, enabling dynamic ingestion of varying DBC schemas.
● Implemented a SQL procedure to dynamically adopt newly added signals and load data into the reporting table based on CAN signal priority.
● Deployed Airflow applications within a Docker containerized environment to ensure isolated execution of discrete analysis jobs.
● Collaborated with the DevOps team to automate deployment of the metrics pipeline image to Amazon Elastic Container Registry (ECR).
● Utilized AWS Lambda with S3 triggers to implement data validation protocols, flagging mismatches such as outliers and addressing discrepancies in metrics algorithms.
● Developed unit tests using Python's mocking library to verify pipeline accuracy, with seeding functions that generate large dataframes; collaborated with SMEs to define precise data ranges.
● Implemented an automated alert system using CloudWatch and Microsoft Teams, integrated with the PagerDuty API for real-time incident management, ensuring uptime of critical data pipelines.

Key Projects:
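The S3-triggered validation bullet above can be illustrated with a minimal, self-contained sketch. This is not the production logic: the event shape is abbreviated, the payload is assumed to be inlined rather than fetched from S3, and the z-score outlier check is a hypothetical stand-in for the real metrics validation.

```python
import statistics

def find_outliers(values, z_thresh=3.0):
    """Flag values more than z_thresh standard deviations from the mean
    (a simple illustrative check, not the pipeline's actual rule set)."""
    if len(values) < 2:
        return []
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_thresh]

def handler(event, context=None):
    """Hypothetical Lambda entry point fired by an S3 object-created trigger.
    In the real pipeline each object would be downloaded and parsed; here the
    metric values are assumed to arrive inline under a 'payload' key."""
    reports = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        values = record.get("payload", [])
        outliers = find_outliers(values)
        reports.append({"key": key, "outliers": outliers, "ok": not outliers})
    return reports
```

A validation report per object key makes it straightforward to route failures to the CloudWatch/PagerDuty alerting path mentioned above.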
PySpark-based ETL pipeline for large-scale automated processing of Parquet files into a timeseries database:
● Enhanced the pipeline that processes large volumes of daily truck logger data by integrating SQL caching, the Spark DataFrame API, and multithreading on AWS Glue.
● Created a local replica of the AWS cloud-based ETL pipeline using PySpark, integrating the AWS Glue kernel in Visual Studio for failure troubleshooting and root-cause analysis.
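The multithreaded fan-out from the first bullet can be sketched in simplified standard-library form. The Spark/Glue specifics are intentionally omitted, and `process_file` is a hypothetical stand-in for the per-file work (read Parquet, transform, load):

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Stand-in for the per-file Spark/Glue work; here it simply echoes
    # the path so the fan-out logic can be exercised in isolation.
    return path

def run_batch(paths, worker=process_file, max_workers=8):
    """Fan a batch of logger-file paths out across a thread pool.
    pool.map preserves input order, so results line up with paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

Threads suit this shape of job because the per-file work is dominated by I/O (S3 reads and database writes), where the GIL is released.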
Database Migration Framework for Version Control of End-Customer Reporting/Production Level Analytics Applications
● Developed a scalable framework using Alembic configuration to recover back-end databases in the event of failure.
● Set up auto-generation of Alembic migrations from SQLAlchemy models and orchestrated the process with a GitLab CI/CD pipeline.

Stream Processing System for Near-Real-Time Data Sharing with a Supplier (for warranty purposes)
● Developed scripts to consume AWS Kinesis streams and publish topics to the supplier’s Apache Kafka stream-processing system using AWS Lambda functions.
● Set up automated, near-real-time validation of recorded signals against the requested signal list at specific logging rates.

Nikola Corporation, Phoenix, AZ (Aug 2023 – Jan 2024)
Data Engineer – Propulsion Engineering Group
● Collaborated cross-functionally with engineering teams to define metrics and design data models that support business insights and decision-making.
● Developed aggregation functions and SQL materialized views with refresh policies for low-latency visualization of time-series metrics.
● Developed engineering dashboards in PowerBI to visualize aggregated metrics and in Grafana to visualize timeseries trends.
● Developed OLAP solutions with DuckDB and a Crunchy analytics cluster for ad-hoc analysis of archived Parquet files in AWS S3.
● Automated PowerBI dataset refreshes using REST API calls in Python to enable event-driven dashboard updates.

Key Achievements:
● Data Storytelling and Visualization: Developed high-impact dashboards in PowerBI and Grafana, presenting complex data in a digestible format for non-technical users, aiding product strategy and operational improvements.
● Improved Stakeholder Collaboration: Established data communication practices with stakeholders, clarifying analytics processes and ensuring data models aligned with business requirements.

Cognizant Technology Solutions, Chennai, India (2019 – 2022)
Senior Software Developer
● Developed and managed advanced T-SQL queries, stored procedures, and views using functions and CTEs for complex data operations
● Built a real-time dashboard with Angular 8, integrating data from RESTful services for improved charts and visualization
● Created RESTful services using Spring Boot and streamlined data management with Hibernate mappings for efficient data access

Cognizant Technology Solutions, Chennai, India (2018 – 2019)
Software Developer
● Optimized a single-page application using Angular 8 & enhanced UX efficiency through advanced features like lazy loading and lifecycle hooks
● Ensured application security with robust user authentication via Spring Boot and Spring Security
● Conducted precise data cleaning and validation with SSIS, maintaining data integrity for SQL Server

Education
M.S. in Computer Science & Engineering
State University of New York At Buffalo, NY, USA
Coursework: Distributed Systems, Data Modelling & Query Language, Design & Analysis of Algorithms, Machine Learning, Blockchain, IoT & Big Data Systems
B.S. in Computer Science & Engineering
Anna University, Chennai, India