Colliding Minds
Part 2. Can mental models based on collision detection form the basis for intelligence in animals and machines?
This is the second half of a two-part paper on intelligence in biological and non-biological machines. You can read the first half first, but you don't have to. If you want to skip straight to the ML/AI strategy, you want part 3.
I'm going to jump right in with a theory of mind that I believe holds for biological intelligence, including but not limited to human intelligence, and that can be applied directly to an artificial construct so that it, too, can present an intelligent and self-aware mind.
What follows is the theory itself, described as a set of axioms supported by illustrative examples. We'll see how the theory applies to language, physics, abstraction, simple planning and long-term strategy. I'm no neuroscientist, nor am I a machine learning guru, so this will fall well short of an academic paper and certainly won't follow an academic format.
I do hope that you get something useful out of reading through it though and of course, thank you for taking the time to consider this humble offering.

Experience is fundamental
Talking, hearing, moving, touching, seeing, smelling, tasting, doing, watching, running, breathing, eating, secreting… the sensate experience of being an organic machine. A robust ‘analog’ interaction with one’s self, one’s environment and one’s peers is fundamental to the emergence of intelligence as we know it.
(This sounds self-evident but as usual the details of how this happens and why it’s important will become more clear as the theory is defined.)
Awareness is simulation of self
One’s concept of self is what happens when you replay recorded experiences in a simulation. These simulations are available for introspection, reflection, deconstruction and, most importantly, for modeling predictions.

When consciously choosing an action, a simulation of self is used to predict outcomes and make decisions about how to achieve the goal. More simply put, we imagine doing something in order to decide whether it is what we want to do. What should be obvious, but is nevertheless novel to anyone who hasn't considered it, is that when we 'imagine', we are replaying sensory data from past experiences and fitting that data to a predictive model based on present circumstances. This is the simulation I'm referring to.
This is not a simple representation. This is taking old sensory data and comparing it with new sensory data, within the context of an existing model. The differences between the two data sets provide an additional layer of feedback that is more nuanced than raw data alone, and an incredible opportunity for optimization and learning. And since the new data can be incorporated into the model as even more data comes in, we have arbitrarily fine precision available to us, limited only by time, physics and the will to focus.
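As a concrete (if toy) illustration, here is a minimal sketch in Python of that loop. The names and the exponential-average update rule are my own illustrative choices, not a claim about how brains implement it: a model built from past data predicts the next sensory reading, the prediction collides with the actual reading, and the error folds back into the model.

```python
# Minimal sketch of prediction-error feedback: past experience predicts
# the next reading, the difference between prediction and reality is the
# feedback, and that feedback refines the model.

class ExperienceModel:
    def __init__(self, learning_rate=0.2):
        self.expected = None          # what past experience predicts
        self.learning_rate = learning_rate

    def predict(self):
        return self.expected

    def observe(self, actual):
        if self.expected is None:
            self.expected = actual    # the first experience seeds the model
            return 0.0
        error = actual - self.expected                 # old data colliding with new
        self.expected += self.learning_rate * error    # fold the error back in
        return error

model = ExperienceModel()
for reading in [1.0, 1.2, 0.9, 1.1, 1.0]:   # a stream of sensor readings
    prediction = model.predict()
    error = model.observe(reading)
    print(f"predicted={prediction} actual={reading} error={error:+.3f}")
```

The point is the shape of the loop, not the update rule: each pass through it leaves the model slightly better fitted to what the senses actually report.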
Simulated sensory data is far denser and more diverse than we typically consider as part of our conscious experience. We live at a necessarily superficial level in our own bodies, not fully experiencing the vast amounts of input coming in. You might say that our memory, and therefore much of our experience of self, is 'sparse': we tend to store and replay the peaks and valleys rather than the full gamut of experience. This sparsity is a necessity, as we have mere milliseconds for these electrochemical processes to occur in moment-to-moment, action-oriented thinking. Of course, deep thought and purposeful self-reflection can enhance the level of detail when called upon.
Reflective Prediction through Collision
We've achieved awareness, but to what purpose? We can turn data into information and information into action. Transforming data into action uses up energy. Using energy requires motivation (typically we want to conserve energy). What motivations do we have? What, again, is our purpose?
Survival, growth, gratification, procreation, legacy. All are subject to competition. We seek to use the data available to us in pursuit of these motivations and in the competition that results from them. Success in most scenarios relies upon more than random interactions. We must change our circumstances to improve our chances of success.

Prediction is the most effective way to do this. Prediction is the use of past experience to estimate future potential. Predictions are only possible with the help of historic modeling. Even the most basic forms of life are capable of limited historic modeling: they retain memories of what was tried, and the rate of success for each attempt informs the next.
This is why we study, practice and train ourselves to improve our natural abilities and do so with repetitive yet varied exercises. In this way we are creating the prediction model which will be our benchmark. We can a) replay an idealized version of this data to control our actions and then b) continue to refine those actions against permutations of the model even as the action itself is playing out. A very well trained individual knows when to react and when to follow through regardless of what may be going on around them. This too is based on many attempts, many successes and failures with the resulting model updated and stored for later use.
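A minimal sketch of that record-keeping, with invented technique names and hidden success odds: every attempt updates a running success rate, and the next attempt leans on whatever has worked so far, with a little exploration mixed in.

```python
import random

# Each practiced technique keeps a record of attempts and successes;
# the next attempt favors whatever has worked best so far.
history = {"technique_a": [0, 0], "technique_b": [0, 0]}  # [successes, attempts]
true_odds = {"technique_a": 0.3, "technique_b": 0.7}      # hidden from the learner

def success_rate(technique):
    s, n = history[technique]
    return s / n if n else 0.5    # no data yet: assume even odds

for trial in range(200):
    # mostly exploit the best-known technique, occasionally explore
    if random.random() < 0.1:
        choice = random.choice(list(history))
    else:
        choice = max(history, key=success_rate)
    outcome = random.random() < true_odds[choice]   # attempt, observe the result
    history[choice][1] += 1
    history[choice][0] += outcome

print({t: round(success_rate(t), 2) for t in history})
```

After enough varied repetitions, the stored rates converge on the hidden odds, which is exactly the benchmark model the practice was building.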
Sh*t Just Got Real (or the Conclusion)
Now we have a pretty well defined methodology that can be used to achieve a purpose by improving upon past experiences through predictive modeling using simulations of self in a competitive and reflective process. Yeah. What do we have again? Those are nice words but can we actually make something that does all of that stuff? Sure can.
One proposed mechanism for prediction is differential calculus as it relates to collision detection, or fitness to a function. Put another way: how many attempts does it take to work a tangram puzzle?

The shapes can fit together in many different ways, and not all pieces are the same, so a mental inventory is required. Just manipulating the shapes can require fine motor control. Try it when all you have is a silhouette rather than an image of the pieces shown together. Now try doing it without vision, based on touch alone (with a cut-out of the shapes rather than a flat image as the model). Now imagine coming up with recognizable silhouettes of things without any model to work against at all, just a set of real-world silhouettes and the memory of prior attempts.
By knocking the pieces together enough times and experiencing them through your sensory input system, you can establish the shapes, the sizes, the number, the environment and more. By colliding these objects into each other, the interactions themselves provide ample feedback for working out anything you might want to know. Think of it as a Bayesian trial-and-error approach: Bayesian because it incorporates statistical learning to improve upon purely random attempts.
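Here is a minimal sketch of that Bayesian loop on a toy version of the puzzle. The candidate placements and their hidden fit probabilities are invented for illustration: each placement carries a Beta belief about how likely it is to fit, we sample from those beliefs to pick the next attempt (Thompson sampling), and every collision's outcome updates the belief.

```python
import random

# Bayesian trial and error: each candidate placement carries a
# Beta(alpha, beta) belief about how likely it is to fit; sampling from
# those beliefs picks the next attempt, and each outcome updates them.
placements = {"corner": 0.8, "edge": 0.4, "center": 0.1}  # hidden fit odds
beliefs = {p: [1, 1] for p in placements}                 # uniform priors

for attempt in range(300):
    # sample a plausible fit probability per placement, try the best one
    samples = {p: random.betavariate(a, b) for p, (a, b) in beliefs.items()}
    choice = max(samples, key=samples.get)
    fits = random.random() < placements[choice]   # knock the pieces together
    if fits:
        beliefs[choice][0] += 1   # success raises alpha
    else:
        beliefs[choice][1] += 1   # failure raises beta

for p, (a, b) in beliefs.items():
    print(f"{p}: estimated fit ~ {a / (a + b):.2f} over {a + b - 2} tries")
```

Early on the attempts look nearly random; as evidence accumulates, the sampling concentrates on what actually fits, which is the "statistical learning improving on purely random attempts" in code form.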

This is the current state of the art for machine learning by the way. So we’re getting close, I can feel it.
The same thing happens inside your internal mental state. Sensory data played back leaves an electrochemical signal. This signal can be compared to other sensory data in much the same way you can compare two physical objects with your external senses. Similar signals can be grouped; signals that happen in tandem can be tagged to form relationships. The source of the data adds another dimension, time and location others, among many.
What we end up with is a data structure best described as a signal graph: time-stacked, feature-clustered sensory signal nodes interconnected with other nodes through multiple edge dimensions, all receiving and often generating new data from both external inputs and internal simulations. Where does one section of the graph collide with another? When does it collide with itself? When it's learning and thinking, of course; that is the only true definition of intelligence.
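To make the data structure less abstract, here is a minimal sketch with invented field names and thresholds: nodes carry a timestamp, a modality and a feature vector; edges carry labeled dimensions such as co-occurrence and similarity; and a "collision" is simply a new signal landing close to an existing one.

```python
from dataclasses import dataclass, field

@dataclass
class SignalNode:
    t: float          # when the signal occurred (time-stacked)
    modality: str     # source sense: "sound", "touch", "vision"...
    features: tuple   # clustered sensory features

@dataclass
class SignalGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (node_i, node_j, dimension)

    def add(self, node, window=0.1, threshold=1.0):
        for i, other in enumerate(self.nodes):
            # signals that happen in tandem get a temporal edge
            if abs(node.t - other.t) < window:
                self.edges.append((i, len(self.nodes), "co-occurred"))
            # similar signals get a similarity edge (a "collision")
            dist = sum((a - b) ** 2 for a, b in zip(node.features, other.features))
            if node.modality == other.modality and dist < threshold:
                self.edges.append((i, len(self.nodes), "similar"))
        self.nodes.append(node)

g = SignalGraph()
g.add(SignalNode(0.00, "sound", (0.9, 0.1)))
g.add(SignalNode(0.02, "vision", (0.2, 0.8)))   # in tandem with the sound
g.add(SignalNode(5.00, "sound", (0.85, 0.15)))  # collides with the first sound
print(g.edges)   # [(0, 1, 'co-occurred'), (0, 2, 'similar')]
```

A real version would need many more edge dimensions (source, location, valence, and so on), but the shape is the same: nodes accumulate, and every new signal is tested against what is already there.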
That's it, just three axioms: Experience, Awareness and Reflection. The operative components are analog (Bayesian) prediction functions, simulated experience models and sensory signal graphs.
With these we can build a mind, a self-aware mind, given enough streaming sensor data to work with.
Let’s see some examples
It’s always fun and helpful to illustrate a concept with some practical context. We’ll start off with the typical stuff and progress into ever more complex subjects.
Language.
An experience of self is very likely not enough to provide language. There must be reflection. Self reflection is difficult, time consuming and exhausting. Watching someone else do half the work is way more enjoyable and satisfying. Thus language may have evolved to share work, freeing up energy for other uses.
Until now, language has been studied primarily as an abstraction of the mind, expressed through sound or gesture. I'm sure there are a few studies with a more holistic approach, but really, when you look at cutting-edge linguistics work, it's all about abstractions, phonemes, categories and symbols. I'll say this once: you're doing it the hard way. To be fair, if you are attempting to deduce meaning from sound or pictographs alone, you have to do it the hard way, and in many cases that has been the approach, because the goal was automation rather than intelligence.
So what's the easy way? It's right there inside you and around you. Say it out loud and you'll feel what I mean. Read these words out loud. As you do, pay special attention to the sensation of speaking the words. Notice the movements of your facial muscles, your tongue, your throat and larynx. Your diaphragm and chest muscles are controlling the breath flowing up and out. Sub-vocal sounds travel into your inner ear while your voice leaves as compressed air you've expelled, propagating as sound waves into your environment. Notice your eyes, your neck, your shoulders and posture. Notice, of course, the words themselves on the screen as your eyes absorb light waves interpreted into characters and words you are able to pronounce. Reading this paragraph out loud is an embodied experience generating a staggering number of sensations, most of which still do not register in your consciousness even though you are trying to experience them.
What’s so easy about that? Nothing yet, but I promise that starting with a rich set of data to work with is so much easier than starting with a sparse set of data. We’re talking orders of magnitude easier. The pain is in setting up the collection system and keeping it going. Mining it doesn’t require nearly as much effort, though it is still pretty energy intensive.
We're not done yet though. How does this relate back to awareness and simulation? It turns out that if you can simulate yourself, you can also simulate someone else. You can compare your simulation of yourself with your simulation of the other person, which is a perfect opportunity for two-way reflection and a more efficient means of communication. When shorthand can be used, assuming a shared "reflected" mental model, much of the information being exchanged is already present in the other person, making communication far faster and more efficient (except when there is a disconnect or a misalignment between the models, e.g. one party's assumptions do not line up with the other's, and hilarious consequences result).
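As a toy illustration of that last point, here is a sketch with invented vocabularies: two listeners decode the same shorthand against their own mental models, and the message only survives the trip when the models are aligned.

```python
# Shorthand only carries meaning when both parties' reflected models align.
my_model = {"the usual": "black coffee", "my spot": "corner table"}
aligned_peer = {"the usual": "black coffee", "my spot": "corner table"}
misaligned_peer = {"the usual": "green tea", "my spot": "window seat"}

for message in ["the usual", "my spot"]:
    meant = my_model[message]
    for peer in (aligned_peer, misaligned_peer):
        understood = peer[message]
        status = "shared model" if understood == meant else "hilarious consequences"
        print(f"'{message}': meant {meant!r}, understood {understood!r} -> {status}")
```

The shorthand is tiny precisely because the heavy lifting was done in advance, when the two models were (or were not) brought into alignment.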
Physics.
We experience the world and ourselves by reflection, which is a form of measurement. Measuring our own bodies and internal state is, of course, how we form a model of our self. Millions of sensory nerve cells provide the inputs, while a variety of sense organs, including of course the brain, record and analyze the data coming in.
One example is our ability to determine spatial dimensions by measuring how long it takes different sensory inputs to reach a central processing system. Imagine stimulation happening at two points, A and B; let's assume it is a pressure sensation. The signals from each location travel to a centralized location at a constant rate. The difference between the signals can be described as distance or time.
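Here is a minimal sketch of that measurement. The conduction velocity and arrival times are illustrative values, not physiological data; the point is only that a constant signal speed turns a time difference into a distance.

```python
# Two pressure signals, A and B, travel to a central processor at a
# constant conduction velocity; the arrival-time difference measures
# how much farther one source is than the other.
conduction_velocity = 50.0   # meters/second (illustrative value)

arrival_a = 0.010            # seconds after the stimulus
arrival_b = 0.022

extra_time = arrival_b - arrival_a
extra_distance = extra_time * conduction_velocity
print(f"B is {extra_distance:.2f} m farther from the processor than A")
# B is 0.60 m farther from the processor than A
```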

Now take this simple model and multiply it by trillions of tactile and motor neurons in the body. Let the body interact with itself, bumping, sliding and tapping various inputs. The bending of limbs, the contracting of muscles, the movement of fluids and food and air through the body all generate those inputs. In the case of an artificial form, as much sensory data as possible would be needed. What the lower threshold of inputs might be for this is an interesting problem to consider.
How would such input be stored for use in measurement? Possibly as 3D points in space, relative to each other? Obviously our biology doesn't store data in such a way. Neuroscientists are actively working to determine how we do store it; it's not a matter of if, only of how. In the case of an artificial life form, we have plenty of ways to store such information; factoring the data into useful models is where we have work to do.
Having established dimensions and a perception of space using only touch and internal state, we can infer a whole realm of the physical world. Adding sensors that can pick up and record external energy completes the picture. Both are required to establish accurate mental models of our place within an environment.
The types of external energy we have category terms for include light, sound and heat. Chemical reactivity is another form of input that can be very useful as well. All of this sensory input contributes to our model of the physical world and our ability to represent it with abstract concepts such as gravity, force, acceleration and mass.
Abstraction
Abstract thought is considered to be the pinnacle of organic intelligence. We know so very little about the mechanisms that lead to it, what enables it to occur and how we might try to understand it outside our own experience of it.
To avoid confusion, let's define this concept a little more clearly. An abstraction is something that is not simply a labeled object or event. It exists as an idea or shared concept and may or may not relate to anything real. Much of our shared experience can therefore be described as abstraction. Categories, numbers, units of measurement, time, directionality: these are all abstractions of observable phenomena that are not tied to distinct concrete objects or events. The reality is that all of these are generalizations that exist only in our minds and are mutable within a shared experience and an agreed-upon consensus view.
So how do we come up with this stuff? What leads us to label something as belonging to a category and how do we associate that category with all the other abstractions we use to establish meaning?
Let's use a classic example: red. Is it a frequency of light absorbed by the cones in our retinas, a set of phonemes, or a description of a set of objects? What is red?
It is all of these at once: a cluster of interlinked sensory experiences. We reference that cluster in our signal graph. When we read "Red", "Rojo", "Rouge" or "红", the visual pattern of the written word is recorded and referenced in the graph. When we speak the word aloud, whatever written symbol it came from, the experience of that soundwave is recorded and referenced in the graph as well.
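A sketch of how such an abstraction might settle into the signal graph, with invented node labels: percepts from different modalities that repeatedly co-occur get bound around a shared percept, and that cluster is the abstraction.

```python
from collections import defaultdict

# Multimodal percepts that repeatedly co-occur get bound into one
# abstraction: seeing a red object, reading the word, hearing it spoken.
cooccurrence = defaultdict(int)
percepts_per_moment = [
    ("light~620nm", "word:Red"),      # reading "Red" next to a red swatch
    ("light~620nm", "sound:/red/"),   # hearing "red" while seeing red
    ("word:Rojo", "light~620nm"),     # another language, same percept
]
for moment in percepts_per_moment:
    for a in moment:
        for b in moment:
            if a != b:
                cooccurrence[(a, b)] += 1

# The abstraction "red" is whatever clusters around the shared percept.
red_cluster = {b for (a, b), n in cooccurrence.items() if a == "light~620nm"}
print(red_cluster)   # {'word:Red', 'sound:/red/', 'word:Rojo'}
```

Nothing in the cluster is "red" on its own; the abstraction is the set of edges, which is why it stays mutable as new experiences (new languages, new red things) keep colliding with it.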
Optimizations
Simulating ourselves (and others) within the environment is all about measuring internal and external state. When we factor that data into useful models, we can use them for future comparison and prediction. The measuring part is a never-ending priming process. Changes in measurements, compared against mental models of what is expected, are learning opportunities and allow for the establishment of causation and relationships.
In natural life many of these causal relationships have been codified into sensory organs, in the distributed nervous and limbic systems as well as in the central nervous system. Though emergent from an evolutionary point of view, these organs are now part of a shared genetic library; they do not emerge within an individual life but are already present. The same kinds of capabilities need to be present in an artificial life, though there they need not be emergent and can be provided by design, for instance as software routines or systems on a chip. Establishing which predetermined sensory analytics capabilities are needed will be essential for designing an artificial life with a chance of being aware of, and learning from, its environment.
Multi-sensory reinforcement
I've only got one tool: a brain. It has some variation, a few specialized organs and some pre-processing outsourced to specialty hardware, but for the most part it takes incoming signals and compares them with prior recordings of those same signals. It's pretty flexible, in that any part of it can do the job, but some areas have evolved to receive specific types of signals and are primed to handle them from early development. So again, how can this general-purpose hardware provide such amazing feats?
To address this we'll need an example. How about a coin dropping on a concrete floor? This is an experience we can all imagine, and if not, it can easily be re-experienced in a moment's time. Imagine the signal that happens when you see the coin drop and hit the floor. Another signal happens when you hear the coin hit the floor and bounce. These are your basic inputs from the experience in front of you. If the coin bounced into you, there would also be a tactile response and a mechanoreceptor response.


This is all well and good, but in actual experience an entire cascade of signals fires when you witness an event. You don't really have any way to know from immediate feedback what it is you're looking at. To gain context you must take the data coming in and compare it to existing models: basically, a memory search. However, you do have several vectors to start with, so it should be fairly quick.
Potential search results can come from a variety of experiences. Certainly the input would be compared against past sounds, and against past visual scenes connected to those sounds, unless the visual is itself somehow more distinctive than the sound, in which case it might be the visual that is searched on first. If you recognize the coin you might have a muted motor response, the name of the coin almost forming in your mouth. The tactile feel of the coin could 'cross your mind' and bring with it facts about the coin: it is a piece of metal, metal is a hard rigid substance, the sound of metal hitting something is sharp and may ring. The coin is a small object, and the sound of a small object hitting a concrete floor often reverberates as it bounces and the sound travels through the concrete itself. The value of the coin might come up, if you had been thinking about money recently. You could visualize this activity as a pulse of energy branching out and firing off stored responses somehow related to the initial input, backtesting for relevance.
Depending on other factors this cascade can continue as a "stream of consciousness", which would be better termed a "stream of re-experience". In all cases, as the input sensory patterns collide with existing sensory memories, reinforcement learning is occurring, binding these multiple sensory inputs into an aggregate model of a coin dropping. The most distinct features of this model can then be compared against other, similar multi-sensory experiences for use in learning, labeling and predicting the world around you.
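A minimal sketch of that binding-and-recall loop, with invented signal labels: every co-experienced pair of signals reinforces an association, and later a single cue, the sound alone, pulls up its strongest associates. That retrieval is the cascade.

```python
from collections import defaultdict

# Each time signals are experienced together, the bond between them is
# reinforced; later, one signal alone can pull up its strongest associates.
associations = defaultdict(float)

def experience(*signals):
    for a in signals:
        for b in signals:
            if a != b:
                associations[(a, b)] += 1.0   # reinforcement on each collision

# A few coin drops, experienced through more than one sense at a time:
experience("sight:coin-falling", "sound:metallic-ring", "word:quarter")
experience("sound:metallic-ring", "touch:small-hard-disc")
experience("sight:coin-falling", "sound:metallic-ring")

def recall(cue, top=3):
    matches = [(b, w) for (a, b), w in associations.items() if a == cue]
    return sorted(matches, key=lambda x: -x[1])[:top]

# Hearing the ring alone now cascades into sight, touch and the word:
print(recall("sound:metallic-ring"))
# [('sight:coin-falling', 2.0), ('word:quarter', 1.0), ('touch:small-hard-disc', 1.0)]
```

The most frequently co-experienced signal surfaces first, which is the aggregate "coin dropping" model asserting itself over weaker, one-off associations.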
Conclusion
We haven’t really touched on anything specific to Machine Learning yet. I apologize for the tease. It seems that establishing the axioms for a self-aware, self-teaching intelligent system took longer than expected.
ML/AI are up next though, so if you want to keep reading, jump to the draft here.