Hybrid AI
What is “Hybrid AI” in Memoria?
Historically, by Hybrid AI people meant something related to Kahneman’s Dual Process Theory, or any combination of “intuitive reasoning” (shallow, fast and wide System 1) and “symbolic reasoning” (deep, slow and narrow System 2) that are expected to complement each other. LLMs turned out to be well-hybridizable with many different technologies, not limited to symbolic reasoners and databases. So, interest in ANNs is fueling secondary interest in technologies that had previously been resting in oblivion.
In Memoria, the meaning of this term is slightly different (but not contradictory). The Memoria project follows the intuition that there is no specific “secret” in human intelligence in particular, or in intelligence in general: at the end of the day (after all possible optimizations have been applied) it’s all about resources – compute and memory. This position should not be confused with Sutton’s The Bitter Lesson. While both are similar in wording and final conclusions, Memoria implies that resources are always limited, and that makes a huge difference: problem-specific optimizations do matter. Ultimately:
- If there is some algorithm, mathematical method, or data structure that can reduce the computational complexity of AI, it’s worth using.
- If there is a custom hardware architecture that can improve raw performance and/or performance per watt, it’s worth using.
- If there is some physical process that we can utilize to improve performance characteristics, it’s worth considering.
- Quantum supremacy? Perfect!
- If we can improve our introspection and get some bits about the inner machinery of the mind that may help us achieve better human-likeness of AI, let’s do it!
- Any useful bits from other disciplines are always welcome!
Memoria is grounded in Algorithmic Information Theory and its compression-based approach to AI. From this perspective, Systems 1 and 2 are just different compressibility domains: System 2 corresponds to the highly-compressible domain, and System 1 corresponds to the low-compressible domain. The distinction between algorithms and data structures, traditional in programming, has the same nature.
There may be many more compressibility domains than just two, so, potentially, we may have Systems 1…N in our AI architecture, where N is pretty large. Even within the same complexity domain there are many sub-domains, which is why methods like Mixture of Experts and Ensemble learning are efficient. These methods work even across distant domains; it’s just a technical question how to make them work efficiently. Those technical questions are the focus of Memoria.
In Memoria, “Hybrid AI” means an architecture spanning multiple different compression domains.
Probabilistic LM-based Hybridization
A probabilistic language model is simply a probability distribution $P(S)$ over a set of strings representing texts in the language, where $S = w_0 w_1 w_2 \ldots w_i$ is a sequence of text elements (usually, tokens). Probabilistic models are used by sampling (generating elements) from them. To sample from a language model we may use an autoregressive scheme: sampling from the conditional distribution $P(w_i \mid w_{i-1} \ldots w_0)$ – the probability of the next element of the string given its prefix.
Autoregressive sampling means that we generate a string element by element, left to right, each time appending the newly sampled element to the prefix. Additional techniques, like beam search, may be used to increase the probability of the generated string. Autoregressive sampling gives us one additional important feature: we can sample strings that are continuations of a given prefix, which is called a prompt.
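To make the scheme concrete, here is a minimal sketch of autoregressive sampling. The `next_token_distribution` function is a hypothetical stand-in for any model of $P(w_i \mid w_{i-1} \ldots w_0)$ – n-gram, VMM or neural – and the toy distribution inside it exists only to keep the sketch runnable.

```python
import random

def next_token_distribution(prefix):
    """Stand-in for a language model: returns P(w_i | prefix) as a dict
    mapping candidate tokens to probabilities. A real model (n-gram, VMM
    or neural) would compute this from its learned parameters."""
    # Hypothetical toy distribution, just to make the sketch runnable.
    return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}

def sample_autoregressive(prompt, max_len=20):
    """Generate a string left to right, appending each sampled token to the prefix."""
    tokens = list(prompt)
    for _ in range(max_len):
        dist = next_token_distribution(tokens)
        token = random.choices(list(dist.keys()), weights=list(dist.values()))[0]
        if token == "<eos>":          # the model decided the string is complete
            break
        tokens.append(token)
    return tokens

print(sample_autoregressive(["a", "prompt"]))
```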
The language model has to be created somehow, and the simplest way is to learn the model inductively from a set of strings drawn from the real language. There are a lot of scientific and technical challenges here, but, basically, there are three notable approaches: statistical n-gram-based, NN-based and VMM-based. In all cases we feed the model a corpus of strings and expect it to predict those (seen) strings correctly. What we really want from the model is to correctly predict unseen strings. In ML this is called generalization. When we are solving AI problems with ML, this is where all the ‘magic’ happens.
It has turned out that some very large neural language models (LLMs) can generalize so well over natural language that they can solve some logical and mathematical problems, follow instructions, reason about some emotions and mental states (sic!), write program code, translate from one language to another, change the style of a text, summarize/elaborate and maintain conversation – all from the natural language (e.g., English).
Of course, LLMs aren’t doing everything well enough (at the expert human level); they make a lot of hard-to-recognize and hard-to-fix mistakes, which seriously limits their practical suitability. They are pretty good at tasks in the low-compressible domain: translation, style transfer, summarization and elaboration, and some others. The reason is probably that in low-compressible domains the role of generalization isn’t that high, and model size/scale is all that ultimately matters.
In highly compressible domains like basic math, logic puzzles, constraint solving, board games, database query execution and logical inference, generalization matters, but generalizability depends on many factors. The most important of them are training data quality, model architecture and learning algorithms. Both model architecture and learning algorithms are fixed for a neural LM. There is no way a single architecture can be good for everything; one may even say that the NFL theorem prohibits this. There is some indirect evidence that the effects behind In-Context Learning in Transformers may help models adapt to specific narrow tasks like basic arithmetic beyond what would be expected from the architecture alone. But those effects are severely limited. Basically, no amount of scaling can make a database engine out of a neural network.
Actually, the latter isn’t an issue if we want to achieve HXL-AI (Human-Expert Level AI), because humans aren’t that good at symbolic tasks either. The point is that mere scaling of LLMs is the wrong direction. Instead, we need to identify certain domains where scaling doesn’t work but different solutions exist, and provide custom solvers – arithmetic, logical reasoning, constraint solving, and so on. A relatively small but fine-tuned LLM may be used here to pre-process the input, find narrow formal problems in it, invoke the corresponding solvers and then post-process the result. Text in a mixture of natural language and structured formats can be seen as an Intermediate Representation for this type of hybrid AI.
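As an illustration of this pattern, here is a hedged sketch of such a routing pipeline. The names `llm_extract`, `llm_render` and the solver registry are hypothetical stand-ins, not real Memoria or vendor APIs; a real system would use a fine-tuned LLM for the extraction and rendering steps and proper solvers in the registry.

```python
# Hypothetical sketch of the LLM-as-front-end pattern described above.
# `llm_extract`, `llm_render` and SOLVERS are stand-ins, not real APIs.

def llm_extract(text):
    """Pretend-LLM pre-processing step: detect a narrow formal problem in the
    user's text and emit a structured intermediate representation (IR)."""
    if "+" in text:
        a, b = text.split("+")
        return {"kind": "arithmetic", "op": "add", "args": [int(a), int(b)]}
    return {"kind": "freeform", "text": text}

SOLVERS = {
    "arithmetic": lambda ir: sum(ir["args"]),   # exact, symbolic solver
}

def llm_render(ir, result):
    """Pretend-LLM post-processing step: turn solver output back into prose."""
    return f"The answer is {result}."

def hybrid_answer(text):
    ir = llm_extract(text)
    solver = SOLVERS.get(ir["kind"])
    if solver is not None:
        return llm_render(ir, solver(ir))
    return "(fall back to plain LLM generation)"

print(hybrid_answer("2 + 3"))   # -> "The answer is 5."
```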
Using LLMs as a human-like interface to various problem solvers greatly increases the solvers’ exposure to the potential audience. Problem solvers, despite having great potential value, are pretty hard to use directly.
Hybrid and Approximate Reasoning
By “reasoning” we mean what is usually meant by “declarative problem solving”: logic (of various types), constraint solving, SAT/SMT, theorem proving, planning and many other types of combinatorial problem solving (CPS). CPS was initially meant as the main purpose of AI because of the value it creates: it literally solves problems and makes our lives better. But there are two main obstacles:
- CPS is rather hard to set up and use: “you need a PhD for that”.
- CPS is very slow. Many important problems are out of our reach.
The complexity of CPS methods may be addressed with specialized LLMs translating from a problem description obtained in a dialogue with a user to a structured problem representation. Right now (2024) it doesn’t work well in many ways, and a lot of improvements are still ahead. But at least this specific direction looks feasible and economically viable.
The computational complexity of CPS is a much harder issue to solve because, basically, there is no workaround. Logic is computationally complex. If we augment an LLM with a reasoner that will handle the logical parts of a query, it may take practically infinite time even for apparently simple problems. Inference time is rather hard to predict.
Approximate reasoning may be useful in some cases. We, humans, aren’t perfect in logic and other types of CPS either, but that’s OK, especially if there are ways to improve a partial or approximate solution. There are two main ways to implement approximate CPS:
- Heuristic (ex: greedy) methods.
- Trading speed for memory.
Heuristic methods are statistically and statically inferred rules that we can use to reduce the computational complexity of a CPS method and produce a “very good” result in many (but not most!) cases.
Trading speed for memory (TSM) is a large family of methods for reducing computational complexity, with dynamic programming as a famous example. TSM may also be used for saving energy, if the energy cost of storing and retrieving a solution is lower than the cost of recomputing it.
TSM can also be viewed as a heuristic method with an unbounded number of heuristics: we accumulate useful heuristics in memory as long as they help reduce computational complexity (even at the expense of precision). Example: heuristic instance-based reasoning.
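A minimal sketch of the speed-for-memory trade, using the classic memoized-recursion example: each stored sub-solution plays the role of an accumulated ‘heuristic’ that replaces recomputation.

```python
from functools import lru_cache

# Plain recursion recomputes the same subproblems exponentially many times.
def fib_slow(n):
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

# Dynamic programming / memoization: store each sub-solution once and reuse it,
# turning exponential time into linear time at the cost of O(n) memory.
@lru_cache(maxsize=None)
def fib_fast(n):
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

print(fib_fast(90))   # instant; fib_slow(90) would effectively never finish
```

The same idea scales up to instance-based reasoning: the ‘cache’ becomes a large, queryable store of previously solved sub-problems.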
The challenge is that the number of instances/heuristics/solutions stored in memory, and their description complexity, may be pretty large. Specifically for that, Memoria provides highly functional solutions: advanced data structures, query engines, the possibility of hardware acceleration, an integrated storage stack from bare metal to high-level computing, decentralization and many other useful features.
Associative memory for LLM
One of the most well-known ways of augmenting an LLM with external memory is RAG. Here, simply speaking, a prompt is translated into one or more queries to external data sources like web pages and databases. The retrieved result is summarized and returned to the user. Another way is to augment a transformer with explicit memory. By doing this, authors have reported a significant reduction in the effective model size required for the same level of performance.
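For concreteness, here is a minimal, hedged sketch of the RAG loop just described. The `embed`, `search` and `llm_summarize` functions are toy stand-ins (word-overlap retrieval, string-concatenation ‘summary’); a real system would use a learned embedding model, a vector index and an actual LLM.

```python
# Hypothetical sketch of a RAG loop; these are stand-ins, not a vendor API.

DOCUMENTS = {
    "doc1": "Memoria provides advanced persistent data structures.",
    "doc2": "MAA is a hardware acceleration architecture.",
}

def embed(text):
    """Toy 'embedding': a bag of lowercased words. A real system would use a
    learned embedding model and an approximate nearest-neighbour index."""
    return set(text.lower().split())

def search(query, k=1):
    """Rank documents by word overlap with the query and return the top k."""
    q = embed(query)
    scored = sorted(DOCUMENTS.items(),
                    key=lambda kv: len(q & embed(kv[1])), reverse=True)
    return [text for _, text in scored[:k]]

def llm_summarize(question, passages):
    """Stand-in for the generation step: a real LLM would summarize the
    retrieved passages conditioned on the question."""
    return f"Q: {question}\nBased on: {' '.join(passages)}"

print(llm_summarize("What is MAA?", search("What is MAA?")))
```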
Implicit, or parametric, memory of LLMs is very expensive: computational costs are on the order of O(N), where N is the number of parameters. The number of parameters itself is quadratic in the number of structural units in attention and fully connected layers. This does not scale well, especially when a model generalizes poorly for objective reasons and needs to memorize more.
On the other hand, database technology provides searchable spatial data structures with logarithmic (on average) lookup time complexity. Memoria has a specially designed associative memory – a multi-ary relation complying with Paul Smolensky’s requirements for compositional neuro-symbolic representations. Unlike traditional database technologies, where relations link points together, associative memory links together sub-volumes, and a point is a special case of a unit volume. Like neural networks splitting space with hyper-planes, associative memory splits space with volumes, and infinitely many actual data points may fit into a single volume.
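As a purely illustrative sketch (not the actual Memoria data structure), the volume-vs-point idea can be shown with sub-volumes modelled as axis-aligned boxes: the relation links boxes rather than individual points, and any point falling inside a box is covered by that box’s links.

```python
# Illustrative sketch only: sub-volumes modelled as axis-aligned boxes.
# The real Memoria associative memory is a different, more general structure;
# this just shows how a relation can link volumes rather than points.

from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    lo: tuple   # lower corner, one value per dimension
    hi: tuple   # upper corner

    def contains(self, point):
        return all(l <= p <= h for l, p, h in zip(self.lo, point, self.hi))

# A "relation" that links two sub-volumes of a 2-D feature space.
relation = [
    (Box((0.0, 0.0), (0.5, 0.5)), Box((0.5, 0.5), (1.0, 1.0))),
]

def related_volumes(point):
    """Return the volumes associated with the one(s) the point falls into."""
    return [b for a, b in relation if a.contains(point)]

print(related_volumes((0.25, 0.25)))   # infinitely many points map to one box
```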
Given those ‘hybrid’ properties of Memoria’s associative memory, it’s a much better candidate for use with connectionist ML than classical graph- and relation-based data structures (like classical RDF-like knowledge graphs).
Running complex queries over classical relational and graph data is a costly process, both in terms of memory and compute. Querying ‘hybrid’ advanced data structures like associative memory may be even more costly, because we need to use sampling-like algorithms for that. While this is nothing special from an algorithmic perspective, we do need specialized hardware to achieve maximal efficiency. The Memoria Acceleration Architecture (MAA) may use associative memory as one of its design and performance targets.
MC-AIXI-CTW
AIXI is a theoretical mathematical formalism for artificial general intelligence. It’s a reinforcement learning agent maximizing the expected total reward received from the environment. AIXI is a clever mathematical trick based on the so-called Universal Prior (UP). It’s universal because it already contains all possible solutions to all problems, packed into the format of a probability distribution. AIXI is an ultimate, universal RL-based agent, but it’s uncomputable, so it’s not feasible in its ultimate form. Nevertheless, it’s a simple and elegant formalism demonstrating how very different algorithms can be made to work together as a single holistic system, by reducing everything to probabilistic string prediction. Auto-regressive LLMs are also just predicting the next token in a text, but a lot of ‘magic’ implicitly happens behind the scenes.
AIXI is infeasible to implement, but surprisingly it can be approximated. One of the known approximations is MC-AIXI-CTW. It approximates the Universal Prior with variable-order Markov models (VMMs), represented as trees. For estimating the probability of unknown strings it uses the Context Tree Weighting (CTW) method.
What is interesting about MC-AIXI-CTW is that:
- It’s based on a language model backed by VMMs. Very much like with NN-based LLMs, intelligence is proportional to the model’s ability to estimate the probability of unknown strings correctly. This is what we call ‘generalization’ of a model.
- It’s an agent acting in an environment according to some RL policy. So, unlike a raw LLM, it’s an almost-ready-to-use AI.
- A VMM, implemented as a tree, is much easier to interpret and hybridize than a neural network.
- A VMM is a database requiring a latency-optimized architecture, and can be an interesting benchmark for MAA.
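For a taste of the machinery, here is a minimal sketch of the KT (Krichevsky–Trofimov) estimator that CTW uses at each node of the context tree. A full MC-AIXI-CTW agent maintains one such estimator per context and mixes their predictions over all suffix depths; this shows only the per-node math.

```python
# Minimal sketch of the KT estimator used at each context-tree node in CTW.
# A full MC-AIXI-CTW agent would mix such estimates over all suffix depths.

class KTEstimator:
    """Sequential probability estimator for a binary source."""
    def __init__(self):
        self.zeros = 0
        self.ones = 0

    def prob_next(self, bit):
        """KT estimate: P(next = bit | counts) = (count(bit) + 1/2) / (n + 1)."""
        count = self.ones if bit == 1 else self.zeros
        return (count + 0.5) / (self.zeros + self.ones + 1)

    def update(self, bit):
        if bit == 1:
            self.ones += 1
        else:
            self.zeros += 1

est = KTEstimator()
for b in [1, 1, 0, 1, 1]:
    est.update(b)
print(est.prob_next(1))   # high, since 1s dominate the observed context
```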
Unfortunately for AIXI approximations, they have been lost in the shadow of the DL revolution of the 2010s. Now, with broad interest in many previously forgotten AI approaches resurging, AIXI may see its second life. What we do need here is specialized hardware (and related software) to accelerate this type of probabilistic models.
Hardware
A neural network is just a bunch of dense arrays – pretty simple data structures. Matrix multiplication generates a simple and predictable memory access pattern. Computations can be easily scheduled statically, ahead of time, and at the scale of an entire cluster.
In the case of Hybrid AI we need a full set of hardware architectures optimized for static and dynamic parallelism, for minimizing memory access latency and for maximizing throughput. There is no way to provide a single architecture capable of all this. Instead, we need a constructor from which to build an architecture specialized for a specific problem class. The Memoria Acceleration Architecture (MAA) is addressing this issue.
Software
NN-oriented ML frameworks are relatively simple. A neural network is a bunch of dense arrays, the only fundamental data structure we need. We also need a powerful optimizing compiler converting the data-flow graph of a program into a sequence of computational kernel invocations, also handling data flow in the process. A specific property of most neural networks is that computations have a highly regular and predictable data flow, so there is a lot of opportunity for static-time optimizations, even cluster-wide.
Symbolic AI, explicitly or implicitly, relies on some search technique in a state space. So, the representation of the state space becomes crucial. The state space may be huge and highly irregular, requiring large, complex data structures and software and hardware that can leverage dynamic parallelism.
A sufficiently general reasoning engine is much more complex than an advanced relational DBMS, and includes it. Relational algebra is a derivative of relational calculus, which is a tractable subset of first-order logic (FOL). RDBMSs sacrifice the expressiveness of FOL for predictable performance; they also lose the deductive component – the ability to infer new facts from existing ones. Reasoning engines may generate large intermediate states and/or results and operate on large datasets, so building them on top of a powerful and generic query engine is essential.
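As a toy illustration of the deductive component that plain RDBMSs give up, here is a naive forward-chaining derivation of an `ancestor` relation from `parent` facts. This is not Memoria’s query engine, just the fixed-point idea in a few lines.

```python
# Toy illustration of deductive inference: derive new facts from a transitive
# rule by naive forward chaining until a fixed point is reached.

facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def forward_chain(facts):
    """ancestor(X,Y) :- parent(X,Y).  ancestor(X,Z) :- ancestor(X,Y), parent(Y,Z)."""
    derived = set(facts)
    derived |= {("ancestor", x, y) for (p, x, y) in facts if p == "parent"}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        new = {("ancestor", x, z)
               for (a, x, y) in derived if a == "ancestor"
               for (p, y2, z) in derived if p == "parent" and y2 == y}
        if not new <= derived:
            derived |= new
            changed = True
    return derived

print(("ancestor", "alice", "carol") in forward_chain(facts))   # True
```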
The Memoria Framework provides the necessary basic and advanced elements for building standalone and hybrid reasoning engines. The main elements of the stack are:
- Hermes data format, deeply integrated with the rest of the framework, including hardware acceleration at the level of MAA if/when necessary. Hermes is flexible enough to support all the necessary structured formats like knowledge graphs; no third-party libraries are needed.
- Hermes-based HRPC accelerated communication protocol.
- Rich set of trivial and advanced data structures. No external database or storage system is ever needed.
- Accelerated query execution engine: a Turing-complete superset of SQL/Datalog and beyond. Backward-chaining mode (query execution) and a RETE-based forward-chaining engine for event-driven computations: robotics, agents, embedded systems and IoT.
- Computational storage layer. Memoria fully redefines the storage and processing stacks compared to traditional ones based on CPUs and monolithic operating systems with integrated complex storage layers (file systems). Complex distributed and heterogeneous architectures become much simpler.
Beyond Reasoning
LLMs are considered to be language models, but language models trained on human-generated textual data are much more than just that. They capture not only the language itself, but also functional approximations of higher mental functions, including agency. A conversation with a sufficiently powerful and properly tuned LLM looks and feels like a conversation with a real, well-educated person with encyclopedic knowledge. The apparent human-likeness of conversations with LLMs is impressive, especially when we take into account its implicit functional aspects (intuitive understanding), but it should not be deceiving. LLMs do not feel or really experience what they say the way we do: generative processes in LLMs are sufficiently different from ours. We should not attribute any of our mental states to them (we may do it, but only with great precaution).
Although the ‘mental states’ of LLMs (if any) are very different from ours, their observable effects are consistent with the corresponding effects of our mental states (provided that there is enough training data). This is the reason why it feels so human-like in conversations. Certain higher mental functions are known to be really hard to formalize; they are nevertheless very important for human-machine integration. This entire topic has been considered largely uninteresting in the AI/ML community. But the recent second dawn of LLM-based multi-agent systems has brought the old question to the table again. There is much more to human cognition than just ‘reasoning’ and ‘learning’, which may be just 1% of all mental activity related to problem solving. If we want our computational agents to be integrable into human-centric interactions, or humans to be integrable into societies of computational agents, both parties need to understand each other at the emotional, intuitive, unconscious levels.
Capturing mental states and higher mental functions in LLMs via machine learning has already been proven efficient, but the real problem is that the internal state of black-box ML models is hardly interpretable in terms of the external objects those models interact with. LLMs work, until they don’t. And when they don’t, there is no specific way to fix them. Moreover, textual datasets are extremely skewed in their descriptions of mental states and higher mental functions, seriously limiting LLMs’ ability to reason about them. Mental states describing external objects are very well represented in textual data. The same is true for certain emotional states, but their descriptions depend on some context, introducing subjectivity into interpretations.
Certain important mental states have no textual expression at all. They may not even be reportable. One of the foundational questions that may trick and freak out programmers is “How do you write programs?”. Only the most experienced programmers notice that the very process of program creation is not really accessible. Less experienced programmers may say something about methodologies and philosophy, but these things are just reflections on generalizations of the process, not how the actual mechanisms of writing a program emerge in our heads. The same is true for any other type of reasoning (and accessible mental activity in general): there is a lot of inaccessible (or intuitive) processing happening in the background. Getting this access may be the key to making our thinking processes efficient and our [ML](https://github.com/victor-smirnov/digital-philosophy/blob/master/Artificial%20Intelligence.md) models economically viable.
It’s yet to be proven that reducing the skewedness of datasets by enriching them with textual descriptions of inner mental processes (intermediate cognitive material) will improve the performance of language models in general and on agent-related tests. Nevertheless, it’s a solid and grounded intuition, because it works this way for humans. In order to get this data from our minds, we need to develop intrapersonal intelligence – the ability to understand yourself, including your thoughts, feelings, motivations, and fears, and to use that understanding to make decisions and communicate – in a way that is compatible with AI. Figuratively speaking, we need to understand our ‘inner machine’ and describe it in terms of algorithms and data structures expressed as scripts for agents.
There is an ongoing multidisciplinary research process around Memoria targeting the unification of the intrapersonal intelligence concept from Psychology (Howard Gardner) with Philosophy of Mind, Physiology, Mathematics and Computer Science. The goal is to build a conceptual bridge between first-person experience, first-person self-psychology and computation, in the form of minimalist computational models of fundamental higher mental functions like the Observer and beingness, feelings, intuition and many others.
Intrapersonal intelligence has been proven to play a crucial role in many aspects of general human intelligence: self-awareness and emotional regulation, motivation and goal-setting, critical thinking and reflection, problem-solving and decision-making, personal growth and adaptability, interpersonal relationships. These and many other aspects of intrapersonal intelligence can positively impact overall cognitive abilities and personal success. The basic idea is to enhance it with advanced computational theories of mind, which are expected to extend conscious access and self-control, because much more complex mind states become accessible and interpretable. The Hard Problem of consciousness isn’t that hard if we can see our ‘inner machine’. Seamless integration with The Machinekind is also much simpler this way.
On the HW/SW side of things, insights from intrapersonal intelligence are expected to yield new advanced algorithms and data structures, integrated circuits, programming language concepts, software development and collaboration patterns and many other cool things. The Memoria Framework will be adopting those things once they are available. The first candidate on this road is the concept of a self-referential Turing Machine implemented with DSLEngine.
Memoria as a Dataset
LLMs create a lot of opportunities for open-source software. A lot of code is currently unmaintained and forgotten, and OSS authors struggle to attract users’ attention to their projects. If this code is used for training LLMs, it will be used. LLMs, in this sense, create a completely new opportunity for idealistically motivated authors: even if their works do not have a sufficient direct audience right now, the message hidden in their works will live on in the models trained on those works.
Memoria, as a project, recognizes this opportunity to provide not only direct software artifacts, but also indirect generative architectural patterns if/when the project is used for LLM training. Memoria itself is going to use this pattern for automation of the project’s evolution. More details on this later…