Raksho Umasankar
Data Engineer - Nikola Corporation, Phoenix AZ
********@*****.*** 716-***-**** www.linkedin.com/in/raksho
Experienced Data Engineer with a track record of developing rich analytics applications and visualization reports, designing big data processing pipelines, and optimizing ETL workflows for timeseries data.
Adept at working cross-functionally with diverse stakeholders across engineering & IT organizations to gather & define clear requirements.
Driven by a passion for improving data accuracy, processing uptime, and reliability of data applications to help drive engineering business decisions and product strategies with high-quality data.

Areas of Expertise
● Software Development
● User Experience & Visualization
● Application Development
● Automated ETL Pipeline Systems
● Timeseries Metrics
● Edge Computing
● Data Engineering
● Analysis & Data Modelling
Technical Skills
Web: Spring, Angular 8, REST, GraphQL
DevOps: Airflow, GitLab, Docker, Jira, Maven, Tomcat
Programming: Python, Java, PostgreSQL
AWS Cloud Services: EC2, CloudWatch, S3, Glue, ECR, ECS, Lambda, SNS
Python Libraries: Polars, Pandas, NumPy, Multiprocess, Matplotlib, Dash, SQLAlchemy
Big Data Analytics: PySpark, TimescaleDB, DuckDB, Dask, Jupyter Notebook
Reporting & Visualization: PowerBI, Grafana, Apache Superset

Career Experience
Nikola Corporation, Phoenix, AZ (Feb 2024 – Present)
Data Engineer – Vehicle Telematics Group
● Developed an ETL pipeline for metrics processing using Python and Polars, serving as the backend for customer fleet reporting; collaborated with engineering teams to capture the metrics logic.
● Developed an Apache Airflow-based pipeline system to process data collected from various sources (data loggers), including RDS databases and data lakes, at a high-resolution frequency of 10 ms for engineering analysis projects.
● Implemented Spark SQL and a staging function for processing complex Parquet files, enabling dynamic ingestion of varying DBC schemas.
● Implemented a SQL procedure to dynamically adopt newly added signals and load data into the reporting table based on CAN signal priority.
● Deployed Airflow applications within a Docker containerized environment to ensure isolated execution of discrete analysis jobs.
● Collaborated with the DevOps team to automate deployment of the metrics pipeline image to Amazon Elastic Container Registry (ECR).
● Utilized AWS Lambda with S3 triggers to implement data validation protocols, flagging mismatches such as outliers and addressing discrepancies in metrics algorithms.
● Developed unit tests using Python's mocking library to verify pipeline accuracy, with seeding functions that generate large dataframes; collaborated with SMEs to define precise data ranges.
● Implemented an automated alert system using CloudWatch and Microsoft Teams, integrated with the PagerDuty API for real-time incident management, ensuring uptime of critical data pipelines.

Key Projects:
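The S3-triggered validation bullet above can be illustrated with a minimal, self-contained sketch. This is not the production logic: the event shape is abbreviated, the payload is assumed to be inlined rather than fetched from S3, and the z-score outlier check is a hypothetical stand-in for the real metrics validation.

```python
import statistics

def find_outliers(values, z_thresh=3.0):
    """Flag values more than z_thresh standard deviations from the mean
    (a simple illustrative check, not the pipeline's actual rule set)."""
    if len(values) < 2:
        return []
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_thresh]

def handler(event, context=None):
    """Hypothetical Lambda entry point fired by an S3 object-created trigger.
    In the real pipeline each object would be downloaded and parsed; here the
    metric values are assumed to arrive inline under a 'payload' key."""
    reports = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        values = record.get("payload", [])
        outliers = find_outliers(values)
        reports.append({"key": key, "outliers": outliers, "ok": not outliers})
    return reports
```

A validation report per object key makes it straightforward to route failures to the CloudWatch/PagerDuty alerting path mentioned above.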
PySpark-based ETL pipeline for large-scale automated processing of Parquet files into a timeseries database:
● Enhanced the pipeline that processes large volumes of daily truck logger data by integrating SQL caching, the Spark DataFrame API, and multithreading on AWS Glue.
● Created a local replica of the AWS cloud-based ETL pipeline using PySpark, integrating the AWS Glue kernel in Visual Studio for failure troubleshooting and root-cause analysis.
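The multithreaded fan-out from the first bullet can be sketched in simplified standard-library form. The Spark/Glue specifics are intentionally omitted, and `process_file` is a hypothetical stand-in for the per-file work (read Parquet, transform, load):

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Stand-in for the per-file Spark/Glue work; here it simply echoes
    # the path so the fan-out logic can be exercised in isolation.
    return path

def run_batch(paths, worker=process_file, max_workers=8):
    """Fan a batch of logger-file paths out across a thread pool.
    pool.map preserves input order, so results line up with paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

Threads suit this shape of job because the per-file work is dominated by I/O (S3 reads and database writes), where the GIL is released.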
Database Migration Framework for Version Control of End-Customer Reporting/Production Level Analytics Applications
● Developed a scalable framework using Alembic configuration to recover back-end databases in the event of failure.
● Set up auto-generation of Alembic migrations from SQLAlchemy models and orchestrated the process with a GitLab CI/CD pipeline.

Stream Processing System for Near-Real-Time Data Sharing with a Supplier (for warranty purposes)
● Developed scripts to consume AWS Kinesis streams and publish topics to the supplier’s Apache Kafka stream-processing system using AWS Lambda functions.
● Set up automated, near-real-time validation of recorded signals against the requested signal list at specific logging rates.

Nikola Corporation, Phoenix, AZ (Aug 2023 – Jan 2024)
Data Engineer – Propulsion Engineering Group
● Collaborated cross-functionally with engineering teams to define metrics and design data models that support business insights and decision-making.
● Developed aggregation functions and SQL materialized views with refresh policies for low-latency visualization of time-series metrics.
● Developed engineering dashboards in PowerBI to visualize aggregated metrics and in Grafana to visualize timeseries trends.
● Developed OLAP solutions with DuckDB and a Crunchy analytics cluster for ad-hoc analysis of archived Parquet files in AWS S3.
● Automated PowerBI dataset refreshes using REST API calls in Python to enable event-driven dashboard updates.

Key Achievements:
● Data Storytelling and Visualization: Developed high-impact dashboards in PowerBI and Grafana, presenting complex data in a digestible format for non-technical users, aiding product strategy and operational improvements.
● Improved Stakeholder Collaboration: Established data communication practices with stakeholders, clarifying analytics processes and ensuring data models aligned with business requirements.

Cognizant Technology Solutions, Chennai, India (2019 – 2022)
Senior Software Developer
● Developed and managed advanced T-SQL queries, stored procedures, and views using functions and CTEs for complex data operations
● Built a real-time dashboard with Angular 8, integrating data from RESTful services for improved charts and visualization
● Created RESTful services using Spring Boot and streamlined data management with Hibernate mappings for efficient data access

Cognizant Technology Solutions, Chennai, India (2018 – 2019)
Software Developer
● Optimized a single-page application using Angular 8 & enhanced UX efficiency through advanced features like lazy loading and lifecycle hooks
● Ensured application security with robust user authentication via Spring Boot and Spring Security
● Conducted precise data cleaning and validation with SSIS, maintaining data integrity for SQL Server

Education
M.S. in Computer Science & Engineering
State University of New York At Buffalo, NY, USA
Coursework: Distributed Systems, Data Modelling & Query Language, Design & Analysis of Algorithms, Machine Learning, Blockchain, IoT & Big Data Systems
B.S. in Computer Science & Engineering
Anna University, Chennai, India