Embedded AI Kernel Developer

Company:
Acara Solutions, An Aleron Company
Location:
Rollingwood, TX, 78716
Posted:
April 30, 2025

Description:

Our AI Technology Group enables state-of-the-art ML and DL model development across our hardware portfolio, using sophisticated model compression and acceleration techniques to deploy previously impractical AI tasks to battery-powered environments. Our team identifies the neural architectures best suited to our customers’ needs, selects the models most amenable to deployment on our platform, trains them carefully while tuning for memory, compute, and energy tradeoffs, and deploys them using AI runtimes optimized for our hardware platform. Finally, we publish and socialize our findings via conferences, workshops, and publications.

Beyond a healthy obsession with computational efficiency, the successful candidate will be comfortable operating in a ‘version zero’ environment, marshaling internal, open-source, and third-party resources to solve our customers' problems quickly and elegantly.

Specific Responsibilities

Optimize embedded AI runtimes such as TensorFlow Lite for Microcontrollers to use our hardware products efficiently.

Develop advanced inference performance profiling tools to help customers identify optimization targets and solutions.

Develop novel ahead-of-time AI model inference compilers that achieve better power, latency, and memory performance by incorporating state-of-the-art pruning and quantization techniques.

Develop training-side tools and libraries to help AI developers identify neural architectures that run optimally on our platforms.

Publish and maintain these tools, including documentation and other assets our customers need to bootstrap their internal AI features.

Socialize these achievements via conferences, meetups, workshops, and publications.

Requirements

Education

A bachelor’s degree in computer science or a related field, with at least 2 years of relevant experience, is required. A master’s degree or PhD in a related topic is highly desirable.

Required Skills/Abilities

Experience writing CPU kernels leveraging vector accelerators such as Arm Helium, Arm Neon, or Intel AVX. Past work with CUDA, OpenCL, or other low-level kernel development environments is a plus.

Experience with AI model performance profiling.

Experience with embedded C or C++.

Experience with Keras and TensorFlow (TFLite, TFLite for Microcontrollers).

Bonus Qualifications

Experience with compiler development

Experience with developing for embedded NPUs

Past TinyML/EdgeAI involvement or experience

Experience developing and optimizing for TFLite for Microcontrollers

Experience with model-to-binary compilers (IREE, MicroTVM, etc.)

Experience with ONNX, TOSA, JAX, LLVM, and/or MLIR

Experience with optimizing for heterogeneous AI compute (e.g., CPU+NPU+DSP)
