HW/SW Co-design

This page describes functionality and features that may not necessarily exist at the time of writing. HW/SW co-design is meant to showcase Memoria as an application development framework; it will become available gradually as it matures.

Basic information

By hardware/software co-design we mean a system design process in which, for a given piece of functionality, we can choose to implement it in hardware or in software. There is actually not much substantial difference between the two options; the problem is that the corresponding engineering practices are very different. A co-design framework aims to align them with each other and, if possible, unify them into a single seamless process.

There are practically successful examples of existing co-design frameworks: open-source, neural-network-oriented ML frameworks (PyTorch, TensorFlow and others) support custom-designed hardware for accelerating computational graphs. Independent hardware manufacturers can adapt their hardware to the way ML frameworks operate, and ML framework users can design custom hardware to accelerate their specific dataflow graphs.

Memoria follows the same paradigm: it defines a formal computational environment, protocols, data structures and their implementations, domain-specific languages and compilers. However, it focuses on a different class of computations than existing frameworks: latency-sensitive computations and memory parallelism.

The high-level computation model in Memoria is event-driven. Memoria focuses on three corresponding hardware domains:

  • Generic mixed DataFlow and ControlFlow computations. Many practical compute- and IO-intensive applications that can run either on CPUs or on specialized hardware fall into this category.
  • Integrated circuits for fixed logic (ASIC) and reconfigurable logic (FPGA, Structured ASIC). These may be used for the high-performance, low-power stream/mixed-signal processing part of an application, making it possible to handle events with nanosecond-scale resolution.
  • Rule/search-based computations in forward-chaining (Complex Event Processing, Streaming, Robotics) and backward-chaining (SQL/Datalog databases) forms.
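
To make the forward-chaining form concrete, here is a minimal sketch in C++. It is not Memoria code: the `Rule` type and the `forward_chain` function are hypothetical names introduced for illustration, and the naive fixpoint loop stands in for whatever incremental matching a real engine would use. New facts are derived from known ones until nothing more can be added.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical rule: if all premises are known facts, the conclusion holds.
struct Rule {
    std::vector<std::string> premises;
    std::string conclusion;
};

// Naive forward chaining: start from the given facts and keep applying rules
// until a full pass over the rule set derives nothing new (a fixpoint).
inline std::set<std::string> forward_chain(std::set<std::string> facts,
                                           const std::vector<Rule>& rules) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Rule& rule : rules) {
            bool ready = true;
            for (const std::string& premise : rule.premises) {
                if (facts.count(premise) == 0) {
                    ready = false;
                    break;
                }
            }
            // insert() reports whether the conclusion was actually new.
            if (ready && facts.insert(rule.conclusion).second) {
                changed = true;
            }
        }
    }
    return facts;
}
```

Backward chaining runs the same rules in the opposite direction: it starts from a goal (a query) and searches for facts that support it, which is how SQL/Datalog-style evaluation is usually described.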

Functions in hardware domains are connected by HRPC, a unified hardware-accelerated RPC+streaming communication protocol that allows seamless intra- and cross-domain communication. HRPC is very much like gRPC (which is used to communicate with services in distributed computing), but it is optimized for direct hardware implementation.

The generalized computational model in Memoria is data-flow over decentralized persistent data structures. The data-flow can be either plain, low-level and tokenized, or high-level, complex-event-driven and RETE-based.
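
As an illustration of the low-level tokenized form, the sketch below models a single two-input dataflow node in C++. The `BinaryNode` class and the `run_until_idle` helper are hypothetical and are not part of Memoria. They only demonstrate the classic firing rule: a node consumes one token from each input and produces an output token only when all of its inputs have a token available.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical two-input dataflow node with one FIFO queue per input port.
class BinaryNode {
public:
    explicit BinaryNode(std::function<int(int, int)> op) : op_(std::move(op)) {}

    void push_left(int token) { left_.push_back(token); }
    void push_right(int token) { right_.push_back(token); }

    // Fires at most once. Returns the output token, or std::nullopt when
    // at least one input port has no token yet.
    std::optional<int> try_fire() {
        if (left_.empty() || right_.empty()) {
            return std::nullopt;
        }
        int a = left_.front();
        left_.pop_front();
        int b = right_.front();
        right_.pop_front();
        return op_(a, b);
    }

private:
    std::function<int(int, int)> op_;
    std::deque<int> left_;
    std::deque<int> right_;
};

// Keeps firing the node until one of its inputs runs dry.
inline std::vector<int> run_until_idle(BinaryNode& node) {
    std::vector<int> outputs;
    while (auto token = node.try_fire()) {
        outputs.push_back(*token);
    }
    return outputs;
}
```

In this model the arrival of a token is the event that drives computation; there is no program counter deciding when the node runs.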

As a co-design framework, Memoria defines two main design extension points:

  1. Custom hardware functions attached via bridges to an HRPC router and visible in the system as HRPC endpoints. This is the loose-coupling design pattern.
  2. Custom ISA extensions for a RISC-V reconfigurable processing unit. This is the tight-coupling design pattern.

Loose coupling of functionality simplifies composition and distribution (of compute within a memory architecture, for example). The tight-coupling pattern allows compute to be concentrated where necessary (matrix multipliers for ANN/HPC, for example).
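
The loose-coupling pattern can be sketched as a router that maps endpoint names to handlers. The C++ below is a toy model only: `ToyRouter`, `Payload`, `Handler` and `checksum_sw` are hypothetical names invented for this illustration and do not reflect the actual HRPC API or wire format. The point is that a caller addresses a function by its endpoint and does not need to know whether the function is implemented in software or behind a hardware bridge.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message and handler types (not the HRPC wire format).
using Payload = std::vector<std::uint8_t>;
using Handler = std::function<Payload(const Payload&)>;

// Toy router: endpoints are registered by name and invoked by name.
class ToyRouter {
public:
    // Attaches a function as an endpoint. Returns false if the name is taken.
    bool attach(const std::string& endpoint, Handler handler) {
        return handlers_.emplace(endpoint, std::move(handler)).second;
    }

    // Calls an endpoint. Returns std::nullopt if no such endpoint exists.
    std::optional<Payload> call(const std::string& endpoint,
                                const Payload& request) const {
        auto it = handlers_.find(endpoint);
        if (it == handlers_.end()) {
            return std::nullopt;
        }
        return it->second(request);
    }

private:
    std::map<std::string, Handler> handlers_;
};

// A software implementation of an 8-bit additive checksum. A hardware
// implementation could be attached under the same endpoint name instead.
inline Payload checksum_sw(const Payload& input) {
    std::uint8_t sum = 0;
    for (std::uint8_t byte : input) {
        sum = static_cast<std::uint8_t>(sum + byte);
    }
    return Payload{sum};
}
```

Swapping `checksum_sw` for a handler that forwards the request to an accelerator would not change any calling code, which is what makes composition and distribution simpler under this pattern.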

At the software level, Memoria is also a (different) software development framework for data-intensive applications such as data platforms, storage systems, networking and IoT/embedded. The whole idea is that, by following Memoria-defined protocols, hardware developers may gain immediate access to a fairly complex software ecosystem.

The co-design part of the framework consists of the following components:

  1. Hermes and HRPC libraries and tools. These are also part of the Memoria Core module, which can be used independently from the rest of Memoria.
  2. DSLEngine provides a runtime-efficient, system-wide Intermediate Representation of the code model for each hardware domain. The IC domain may be lowered to CIRCT (the preferred way).
  3. Clang-based Jenny C++ compiler and associated tools targeting the CF/DF/Rule domains and custom RISC-V kernels. Note that regular Memoria applications don’t need a dedicated compiler: any compliant C++20 compiler should work, although Clang is preferred.
  4. Build and automation tools: a generic programmable event-driven build system, available both from the command line and via an HRPC API. HRPC may have JSON and gRPC bindings for compatibility reasons.

Special features

As a special feature, Memoria provides data containers for compressed searchable sequences with alphabets of 1 to 8 bits per symbol. These data structures are well suited to capturing digital waveforms for further analysis:

  1. A large alphabet is needed when signals may have more than 2 states.
  2. Searchability is needed for locating specific patterns in the signal.
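
The following C++ sketch illustrates both points on a small scale. It is not Memoria's container: `RleWaveform` is a hypothetical class that uses plain run-length encoding with linear scans as a stand-in for whatever compression and indexing the real data structures use. It stores symbols of a configurable width (1 to 8 bits), counts occurrences of a symbol in a prefix, and locates the first transition between two signal states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical run-length-encoded waveform over a 1..8-bit alphabet.
class RleWaveform {
    struct Run {
        std::uint8_t symbol;
        std::size_t length;
    };

public:
    explicit RleWaveform(unsigned bits_per_symbol) : bits_(bits_per_symbol) {
        assert(bits_ >= 1 && bits_ <= 8);
    }

    // Appends `count` samples of `symbol`, merging with the last run if equal.
    void append(std::uint8_t symbol, std::size_t count = 1) {
        assert(symbol < (1u << bits_));
        if (count == 0) return;
        if (!runs_.empty() && runs_.back().symbol == symbol) {
            runs_.back().length += count;
        } else {
            runs_.push_back({symbol, count});
        }
        size_ += count;
    }

    std::size_t size() const { return size_; }
    std::size_t run_count() const { return runs_.size(); }

    // Number of samples equal to `symbol` in positions [0, end).
    std::size_t rank(std::uint8_t symbol, std::size_t end) const {
        std::size_t seen = 0;
        std::size_t hits = 0;
        for (const Run& run : runs_) {
            if (seen >= end) break;
            std::size_t take = std::min(run.length, end - seen);
            if (run.symbol == symbol) hits += take;
            seen += run.length;
        }
        return hits;
    }

    // Position of the first `to` sample that directly follows a `from` sample.
    std::optional<std::size_t> find_transition(std::uint8_t from,
                                               std::uint8_t to) const {
        std::size_t pos = 0;
        for (std::size_t i = 0; i + 1 < runs_.size(); ++i) {
            pos += runs_[i].length;
            if (runs_[i].symbol == from && runs_[i + 1].symbol == to) {
                return pos;
            }
        }
        return std::nullopt;
    }

private:
    unsigned bits_;
    std::vector<Run> runs_;
    std::size_t size_ = 0;
};
```

With 2 bits per symbol, for example, a signal can take four states (such as 0, 1, high-impedance and unknown), and a long idle period costs one run instead of one entry per sample.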