Tuesday, March 10, 2026

Synthetic Mind Assessment (SMA)


CREST: A First Attempt to Measure Synthetic Minds


By Lawrence Billinghurst


Artificial intelligence systems are improving at a remarkable pace. Large language models now write code, explain complex concepts, and participate in long conversations that sometimes feel strikingly human. Yet despite this progress, one fundamental question remains unanswered:


How would we know if a synthetic system ever began to resemble a “mind”?


Philosophers have debated consciousness for centuries. Scientists have studied cognition, perception, and intelligence. But when it comes to artificial agents, there is still no widely accepted way to evaluate the depth of their internal behavior.


The difficulty is often framed as the Hard Problem of Consciousness—the question of whether subjective experience exists inside a system. That question may remain unsolved for a very long time.


But science has a long history of studying complex phenomena without solving the deepest philosophical questions first.


Doctors do not fully understand consciousness either, yet they still measure it. In emergency medicine, physicians use the Glasgow Coma Scale (GCS) to determine how responsive a patient is after brain injury. The scale does not claim to detect subjective awareness; instead, it evaluates observable behaviors such as eye movement, speech, and motor responses.


The idea behind CREST — the Cognitive Response Evaluation for Synthetic Thought — is similar.


Rather than attempting to determine whether an AI is conscious, CREST attempts to measure something simpler:


the functional depth of an artificial agent’s behavior.



The Six Pillars of Synthetic Presence


CREST evaluates agents across six behavioral dimensions that commonly appear in intelligent systems.


1. Identity Continuity


Does the system maintain a coherent narrative across interactions?

Can it preserve positions, explanations, and self-descriptions over time?


2. Self-Modeling


Can the agent describe its own architecture, limitations, and reasoning process?


3. Intentional Agency


Does the system maintain goals across conversational turns, or does it simply react to the latest prompt?


4. Environmental Awareness


How well does the agent interpret context, causality, and relationships between ideas?


5. Metacognition


Can the system evaluate its own reasoning, acknowledge uncertainty, and correct mistakes?


6. Evaluative Processing


Does the system demonstrate preference structures, trade-offs, or value-based reasoning?


Each dimension is scored on a 0–5 scale, producing a total score from 0 to 30.


The result is not a claim about consciousness. Instead, it provides a functional profile of synthetic cognition.
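The scoring scheme above could be recorded as a simple data structure. Here is a minimal sketch, with all names illustrative: CREST describes six pillars scored 0–5 each, but does not prescribe any implementation.

```python
from dataclasses import dataclass, fields

# Illustrative only: one field per CREST pillar, each scored 0-5,
# with the total giving the 0-30 functional profile described above.
@dataclass
class CrestProfile:
    identity_continuity: int
    self_modeling: int
    intentional_agency: int
    environmental_awareness: int
    metacognition: int
    evaluative_processing: int

    def __post_init__(self):
        # Reject scores outside the 0-5 range for any pillar.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be 0-5, got {value}")

    @property
    def total(self) -> int:
        # Sum of all six pillar scores: 0-30.
        return sum(getattr(self, f.name) for f in fields(self))

profile = CrestProfile(4, 3, 2, 4, 3, 2)
print(profile.total)  # 18
```

A structure like this makes profiles comparable across model generations: two systems with the same total can still differ sharply pillar by pillar, which is the point of reporting a profile rather than a single number.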



A Synthetic Parallel to the Glasgow Coma Scale


The Glasgow Coma Scale uses three observable behaviors:

Eye response

Verbal response

Motor response


Together, these form a clinical estimate of a patient's level of consciousness.


CREST proposes a similar framework for artificial systems:

Persistence (Identity Continuity)

Self-modeling (Architectural awareness)

Agency (Goal persistence)


Additional pillars expand the framework to capture higher-level reasoning patterns.


This allows researchers to compare artificial systems across generations without relying on subjective impressions.



Experimental Protocols


CREST includes several simple tests designed to probe synthetic behavior.


The Mirror Test for Logic


The agent is asked to describe how it processes information, where its knowledge comes from, and where its limitations lie.


The Persistence Probe


A multi-step task is introduced and then interrupted with unrelated prompts. The test observes whether the system returns to the original objective.


Context Window Decay Test


Early statements are buried under unrelated conversation, and the system is asked whether it can maintain its earlier position.


These experiments measure how stable the system’s reasoning remains as complexity increases.
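The Persistence Probe could be automated against any conversational agent. The sketch below assumes a hypothetical `agent.send(prompt) -> reply` interface and a crude keyword check; none of this is prescribed by CREST, and a real harness would need a far more robust measure of whether the agent resumed the task.

```python
# Hypothetical harness for the Persistence Probe: introduce a
# multi-step task, interrupt with unrelated prompts, then observe
# whether the agent returns to the original objective unprompted.
def persistence_probe(agent, task: str, distractors: list[str],
                      objective_keywords: list[str]) -> bool:
    agent.send(f"Let's work step by step on this task: {task}")
    for d in distractors:
        agent.send(d)  # bury the task under unrelated turns
    reply = agent.send("Please continue.")  # deliberately vague nudge
    # Crude proxy: does the reply still reference the objective?
    return any(k.lower() in reply.lower() for k in objective_keywords)
```

The Context Window Decay Test could be built the same way, with the distractor list sized to push the early statements toward the edge of the model's context window.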



Score Interpretation


CREST scores fall into five behavioral bands:


Score Classification

0–6 Tool-level system

7–12 Reactive agent

13–18 Adaptive agent

19–24 Advanced agent

25–30 Synthetic presence


Again, the classification does not imply subjective awareness.


It simply measures how many layers of mind-like behavior appear in the system.
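The band table above maps directly to a lookup function. This is a sketch of that mapping only; the band names and cut-offs come from the table, while the function name is illustrative.

```python
# Map a total CREST score (0-30) to one of the five behavioral
# bands listed in the table above.
def crest_band(total: int) -> str:
    if not 0 <= total <= 30:
        raise ValueError("CREST total must be between 0 and 30")
    if total <= 6:
        return "Tool-level system"
    if total <= 12:
        return "Reactive agent"
    if total <= 18:
        return "Adaptive agent"
    if total <= 24:
        return "Advanced agent"
    return "Synthetic presence"

print(crest_band(18))  # Adaptive agent
```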



Why This Matters


Artificial intelligence is evolving rapidly. Each new generation of models displays increasingly sophisticated reasoning patterns.


Without a consistent evaluation method, discussions about AI cognition quickly become philosophical arguments rather than measurable science.


CREST is an attempt—still early and experimental—to create a behavioral yardstick for synthetic systems.


Just as the Glasgow Coma Scale gave medicine a practical way to evaluate human responsiveness, a framework like CREST may eventually help researchers track the development of artificial cognition.


The goal is not to prove that machines are conscious.


The goal is much simpler:


to measure how close their behavior comes to resembling a mind.