[Edited November 9, 2015]
My previous post addressed John Searle’s Chinese Room argument. There I remarked that Searle has two related arguments: the Chinese Room argument (CRA) and the Syntax-and-Semantics argument (SSA). This post will address the SSA. In summary, my conclusion is that the SSA, like the CRA, is vacuous, doing no genuine work, but only creating the illusion of an argument through the use of misleading language. There are several misleading elements to clear up, making the refutation quite long. The post is made even longer by some appendices, but you can skip those if you like.
As before, I’ll take as my main source for Searle’s argument his 2009 article at Scholarpedia. The first paragraph of the section “Statement of the Argument” is what I’m calling the CRA. The remainder of that section down to the “Conclusion” is what I’m calling the SSA. The SSA is stated in the form of a “deductive proof”:
(P1) Implemented programs are syntactical processes.
(P2) Minds have semantic contents.
(P3) Syntax by itself is neither sufficient for nor constitutive of semantics.
(C) Therefore, the implemented programs are not by themselves constitutive of, nor sufficient for, minds. In short, Strong Artificial Intelligence is false.
First let me say something about the relationship between the two arguments, since Searle has left that unclear. The CRA and SSA are both arguments against Strong AI. (I’ll adopt without further comment Searle’s sense of the term “Strong AI”.) The SSA states this explicitly. The CRA concludes that no computer can understand Chinese solely “on the basis of implementing the appropriate program for understanding Chinese”, which clearly contradicts Strong AI. In Searle’s 1980 paper (“Minds, Brains and Programs”), his first on the subject, this was his main argument against Strong AI, and it is what most people interested in the subject seem to understand by the term “Chinese Room Argument”. In that first paper, Searle mentioned the subject of syntax and semantics, but it was only later that he developed these thoughts into the 3-premise argument that I’m calling the SSA.
Given that the CRA and SSA both argue for the same conclusion, we might think that they are independent arguments, requiring separate refutations. Indeed, it’s only because I think it’s possible to interpret them that way that I am addressing them in separate posts. However, the Scholarpedia version of the SSA makes them interdependent, though it’s ambiguous as to the nature and direction of the dependency. On the one hand, Searle claims that the CRA is “underlain by” the SSA, which suggests that the SSA supports the CRA. Also, in his response to the Systems Reply, he attempts to use P3 of the SSA (“The principle that the syntax is not sufficient for the semantics”) to justify the CRA’s assumption that nothing in the CR understands Chinese. But on the other hand, the SSA proceeds in the opposite direction, using the alleged absence of understanding of Chinese to support P3. Having it both ways round means that he is arguing in a circle.
Let’s have a closer look at the SSA’s argument in support of P3:
The purpose of the Chinese Room thought experiment was to dramatically illustrate this point [P3]. It is obvious in the thought experiment that the man has all the syntax necessary to answer questions in Chinese, but he still does not understand a word of Chinese.
In the past I’ve questioned whether Searle’s word “illustrate” should be interpreted as claiming support, but now I think no other interpretation makes sense. The SEP entry on this subject also interprets this as a supporting relationship: “The Chinese Room thought experiment itself is the support for the third premise.” Note that there is no theoretical argument here as to why syntax is insufficient for semantics. It relies purely on the alleged absence of understanding of Chinese in the CR. Setting aside any other problems with this argument, it commits exactly the same fallacy as does the CRA. It relies on the same unstated assumption: Since Searle doesn’t understand any Chinese, nothing in the CR understands any Chinese. In my refutation of the CRA, I explained how this move depends on an equivocation over the ambiguous first phrase (“Searle doesn’t understand any Chinese”) followed by an illegitimate jump to the conclusion (“nothing in the CR understands any Chinese”). I pointed out that its conclusion already contradicts Strong AI, and so the remainder of the argument does no significant work. Insofar as the SSA depends on the above-stated argument in support of P3, it is committing just the same fallacy as the CRA, and I consider it already refuted by my earlier post. If the SSA is interpreted that way, I can rest my case here.
Alternatively, we can ignore the SSA’s appeal to the CR, and interpret the SSA as an independent argument from more general principles, not drawing on the CR. I suspect Searle originally intended it that way, and only added the appeal to the CR when he realised he needed some support for the SSA’s major premise, P3. For the remainder of this post, I will proceed on the basis of this interpretation. It leaves P3 with no support, but it seems to me that Searle has contrived to make P3 seem undeniable anyway, as we’ll see later. He seems to consider all his premises so self-evident that he has previously referred to them as “axioms” (“Is the Brain’s Mind a Computer Program?”, Scientific American, January 1990). In that article, he didn’t make the above argument for P3, claiming instead that, “At one level this principle is true by definition.”
Searle describes the SSA as a “deductive proof”. But philosophy is not mathematics. And even in mathematics, little of interest can be proved by a one-step deduction. A philosophical argument may sometimes be clarified by a statement of premises and conclusion. But all the real work still remains to be done in justifying those premises. Any controversial elements in the conclusion are merely transferred to one or more of the premises. We should be able to look below the headline premises and see the substantive argument underneath. But, when we do that with the SSA, we find nothing of substance. I will argue that the SSA relies on vague and misleading use of the words “syntax” and “syntactical”. This vagueness makes it hard to see how the work of the argument is being divided between P1 and P3, and helps to obscure the fact that no real work is being done at all.
Proceeding to address the argument in detail, my first step is to slightly clarify some of the wording of the premises and conclusion. First, the expression “implemented programs” is potentially ambiguous. It could be taken as referring just to the program code in memory, but it makes more sense to take it as referring to the process of program execution, which makes it consistent with Searle’s use of the word “processes” in P1. So I’ll replace “implemented programs” by “execution of programs”. Second, I think the expression “nor constitutive of” is redundant, so I’ll delete it for the sake of brevity. As far as I’m concerned, Searle only needs to show that the execution of programs is not “sufficient for” minds. Finally, I’ll make an insignificant but convenient change from plural to singular. The argument then becomes:
(P1a) The execution of a program is a syntactical process.
(P2a) Minds have semantic contents.
(P3a) Syntax is not sufficient for semantics.
(Ca) The execution of a program is not sufficient for a mind.
Instead of making an argument directly about minds, Searle chooses to make an argument about “semantics”, and then trivially derive a conclusion about minds. Though I have some reservations about his use of the terms “semantic contents” and “semantics”, P2a is relatively uncontroversial, and here I’ll accept it unchallenged. We can then reduce Searle’s argument to the following shorter one:
(P1a) The execution of a program is a syntactical process.
(P3a) Syntax is not sufficient for semantics.
(Cb) The execution of a program is not sufficient for semantics.
This is the core of the SSA, and it works in two steps: (1) it gets us to accept the expression “syntactical process” in place of “the execution of a program”, and then (2) it switches from “syntactical process” to “syntax”. Let’s look at step (2) first. You may think the switch from “syntactical process” to “syntax” is just a minor change of grammar. But it can have a big effect on how we read P3. If this switch hadn’t been made, the premise would have been:
(P3b) A syntactical process is not sufficient for semantics.
Given the premise in this form, a reader might be inclined to ask: “Just what is it to be a syntactical process, and what properties of such a process render it insufficient for semantics?” In other words, with P3b it’s easier for a reader to see that there is a substantive question to be addressed. Switching to P3a makes the premise seem so obviously undeniable that a reader might be inclined to accept it without further reflection. Why? Because P3a lacks any mention of a process, and instead opposes “semantics” directly to “syntax”. The words “syntax” and “semantics” usually refer to two distinct properties of languages (or expressions in languages), and in that sense it’s incoherent (a category error) to talk of one being sufficient for the other. You may say this is not Searle’s intended sense of the words “syntax” and “semantics”. But his unnecessary switch to the words “syntax” and “semantics” (from “syntactical process” and “semantic contents”) encourages such a misreading. He goes even further in this direction later on, when he reformulates the premise as “Syntax is not semantics”. I suggest that Searle himself has sometimes unwittingly conflated different senses of these words, which would help explain his earlier claim that P3 is “true by definition”.
If we look at the full text of the argument, we see a gradual and unexplained slide towards the use of increasingly questionable and leading language to describe the process of program execution:
– “[process] defined purely formally”
– “[process defined purely] syntactically”
– “syntactical process”
– “purely syntactical operations”
The latter terms are more leading, in that the terms alone might be taken as suggesting an inconsistency with semantics. Searle gives no justification or explanation for his use of any of these terms beyond the first, and even that one is not explained clearly. To say that the process of program execution is formally defined is to say no more than that at some level of abstraction (particularly the machine code level) it can be modelled by a precisely specifiable algorithm. Since the computer’s memory states are discrete (just two possible states at the level of binary flip-flops), the state of the computer is precisely specifiable. And the operation of the processor conforms so reliably to the rules of execution that, given the state of the computer at one time, its state after the execution of the next instruction can be reliably and precisely predicted. What more is there to be said than that? What is gained by calling such a process “syntactical”? Let’s call a spade a spade. If there’s nothing more to be said than “precisely specifiable”, let’s just say “precisely specifiable”.
In the text following P1, Searle writes that “the notion same implemented program specifies an equivalence class defined purely in terms of syntactical manipulation”. But the word “syntactical” can perfectly well be replaced here by “precisely specifiable”. To say that two computers are executing the same program is to say no more than that at some level of abstraction they can be modelled by the same precisely specifiable algorithm.
It should be said that Searle is not the only writer to use the word “syntactical” (or “syntactic”) in relation to the execution of programs. It seems to be quite widespread. For example, Daniel Dennett, a staunch critic of the Chinese Room, speaks of “syntactic engines”. I suspect that this usage of these terms has arisen from a misguided association between precisely specified algorithms and mathematical formal systems. I address this point in Appendix 3. At least in Dennett’s case, I don’t think it’s leading him to make any mistakes, but is simply an unfortunate way of expressing himself, which works to his own rhetorical disadvantage. What really matters here is not what word Searle uses but whether he has made any substantive argument. We must ask, what do you mean by “syntactical” and why is such a process insufficient for semantics? He never answers that question.
Another unhelpful term he uses is “formal symbol manipulation”. Under his discussion of P1 he claims that “The computer operates purely by manipulating formal symbols…”. But he doesn’t explain why this is insufficient for semantics. Moreover, I will argue in Appendices 1 and 3 that this claim is misleading.
In the 1990 article cited above, Searle made a different argument, based on the idea that symbols can stand for anything an observer wants:
“The second point is that symbols are manipulated without reference to any meanings. The symbols of the program can stand for anything the programmer or user wants. In this sense the program has syntax but no semantics.” [“Is the Brain’s Mind a Computer Program?”, Scientific American, January 1990]
Again, no genuine argument has been made here, as there’s no explanation of why we should proceed from each sentence to the next, and it’s far from clear that they follow. Moreover, the first two sentences are both misleadingly ambiguous. I’ll address them at greater length in Appendix 2, but here’s a brief response to the second sentence. Can the user of a chess program interpret the on-screen symbols any way he likes? Can he interpret a knight symbol as a rook, or even as a Monopoly house? Not sensibly. The conventional interpretation is forced on him, not only by the appearance of the knight symbol, but also by the way it behaves. It follows the rules for a knight, and not for a rook. The interpretation is forced by the program. What’s true for on-screen states is equally true for memory states. Searle is probably thinking about memory states taken out of context. If we look at a byte of memory on its own, without the context of the program, then there’s a sense in which we can interpret it in any number of ways: as an integer, as an ASCII character, as any chess piece, etc. But why should such context-less interpretations be relevant? Searle doesn’t tell us.
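To make the point about context-less memory states concrete, here is a minimal Python sketch. It is only an illustration: the byte value and the piece-encoding table are invented for the example, and are not taken from Searle or from any real chess program.

```python
# One byte of memory, considered with no program context.
# The value 0x4E is arbitrary, chosen only for illustration.
raw = bytes([0x4E])

# Reading 1: an unsigned integer.
as_integer = raw[0]

# Reading 2: an ASCII character.
as_ascii = raw.decode("ascii")

# Reading 3: a chess piece, under a hypothetical encoding invented here.
PIECE_CODES = {0x4E: "knight", 0x52: "rook"}
as_piece = PIECE_CODES[raw[0]]

print(as_integer, as_ascii, as_piece)
```

Nothing in the byte itself favours one of these readings over the others. What would settle the matter is the code that goes on to use the byte, which is exactly the context that the isolated view leaves out.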
I suggest that the SSA is primarily motivated by the intuition that you can’t get mind or meaning from the mere mindless execution of instructions. This is similar to the more common intuition that you can’t get mind or meaning from the mere mindless interactions of particles. Not satisfied with expressing such an intuition, Searle has tried to find ways of supporting it with an argument, but has failed to come up with anything of substance, let alone the sort of decisive “proof” that he claims. He has been misled by a common usage of terms like “formal symbol manipulation” and “syntactical” in relation to program execution, and wrongly jumped from such usage to the conclusion that program execution cannot give rise to “semantics”. There’s nothing more to his argument than that.
Of course, the passages I’ve quoted here have been only a small sample of Searle’s writing on the subject. I can’t possibly go through every line that he’s written in order to prove a negative, that none of it amounts to anything of substance. But the Scholarpedia article is relatively short, and I think you can verify for yourself that it contains nothing of substance. If you’re satisfied of that, then you have sufficient reason at least to be very skeptical about the CRA and SSA.
There ends my refutation of the SSA. My goal has not been to show that the execution of a program is sufficient for meaning or mind, but merely to show that Searle has given us no reason to accept the contrary. I’ve tried to avoid relying on any positive philosophical position of my own, as I don’t want any reader’s rejection of my own position to get in the way of seeing that Searle’s argument amounts to nothing. However, I’ve added some appendices, in which I express some more positive views.
APPENDIX 1. Levels of abstraction.
It should be understood that, when we talk about the world, we are modelling it. We model the world at various levels of abstraction, and with various types of abstraction (or we could say various types of model). We can talk about cities, about buildings in cities, about bricks in buildings, about molecules in bricks, and so on. It would be pointless (even meaningless) to ask which of these entities (cities, buildings, bricks and molecules) are the “real” ones. Do bricks really exist, or are there only molecules (or atoms, or quarks, or quantum fields)? All of these concepts are abstractions which play a useful role in our useful models. The abstractions I’ve mentioned so far are pretty much physical ones, though “city” is stretching that point. Other abstractions could be cautiously described as “less physical” or “more abstract”. That includes such things as beliefs and desires, or “intentional states” as they are sometimes called.
When we compare different levels of abstraction, we may say such things as “x happens at level X, but y happens at level Y”. For example, at the molecular level there are only chemical interactions, but at the macroscopic level organisms are going about their lives. This way of speaking can create the impression that there are two parallel processes, and lead to a misguided strong emergentism: the entities and properties at the higher level seem to appear rather miraculously from a lower-level process that lacks those properties. But this misguided impression can be avoided by remembering that there is only one process, looked at in different ways. That’s not to deny that it’s sometimes useful to talk of different processes, especially since the different models may be addressing different aspects and parts of the process. There’s nothing wrong with such talk, as long as we don’t let it mislead us.
As with other real-world objects and processes, computers can be modelled with various levels and types of abstraction. In thinking about a computer executing a program, we could think at a hardware or physical level, for example about transistors switching flows of electrons. But mostly when we think about the execution of a program we think at a software or computational level, abstracting out the physical details. One software level of abstraction is that of machine code. But we can also think about higher levels of abstraction. If we’re programming in an interpreted language, like interpreted BASIC, then it’s useful to think at the level of that language. Since the BASIC interpreter is itself a program, when the BASIC program is being executed we have two programs being run. But the programs are not being run side-by-side, as when we run two programs in different windows on a PC. There’s a sense in which execution of the BASIC program just is execution of the interpreter. Putting it more carefully, we are talking about the same process at two different levels of abstraction. We can also speak at different levels of abstraction when talking about a program that’s been written in a modular, hierarchical way, with one subroutine calling other subroutines, which call still others. When we talk about what a higher-level subroutine is doing, we’re talking at a higher level of abstraction. If we say that our high-level subroutine calls the SORT subroutine to sort some list, we are abstracting out all the detail of the sorting work that goes on within the SORT subroutine. Yet another type of high-level abstraction occurs when we talk about what’s going on at the levels at which we typically describe our interaction with the system. For example, we may say that the program (or the system) is playing chess, that it moved P-K4, that it made a good move, etc. All these statements are modelling the program’s execution at a very high level of abstraction.
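The interpreter case can be sketched in a few lines of Python. This is a toy under stated assumptions: the three-statement language (LET, ADD, PRINT) is made up for the example and is not real BASIC, and the `host_steps` counter is just a crude stand-in for "what the interpreter is doing".

```python
def run(program):
    """Interpret a tiny, made-up BASIC-like language: LET, ADD, PRINT."""
    variables = {}
    output = []
    host_steps = 0  # counts dispatch steps at the interpreter's own level
    for line in program:
        op, *args = line.split()
        host_steps += 1  # one dispatch step per statement
        if op == "LET":      # LET X n   means X = n
            variables[args[0]] = int(args[1])
        elif op == "ADD":    # ADD X Y   means X = X + Y
            variables[args[0]] += variables[args[1]]
        elif op == "PRINT":  # PRINT X   records the value of X
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown statement: {line}")
    return output, host_steps

program = ["LET A 2", "LET B 3", "ADD A B", "PRINT A"]
output, host_steps = run(program)
print(output, host_steps)
```

Described at the level of the toy language, the program added 2 and 3 and printed 5. Described at the level of the interpreter, a loop split four strings and updated a dictionary. There was only one run, described in two ways.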
In thinking about computation and AI it’s important to keep these matters in mind. It would, for example, be a mistake to take too literally a claim that a program only executes machine-code instructions, or only engages in “formal symbol manipulation”. That may be all we see when we choose to model the process at the machine code level. But at other levels of abstraction the program is doing more sophisticated things, like playing chess. There may also be a temptation to privilege the machine-code level of abstraction, and say that what’s happening at that level is what’s really happening. To say that would be to make the same mistake as saying that only atoms really exist, and bricks don’t. Or only neurons really exist, and beliefs don’t. There is no such privileged level or type of abstraction. When programming we typically focus on a level at which we can see the execution of precisely specifiable instructions, such as the machine-code level. And that may incline us to assume incorrectly that every computational level of abstraction must be precisely specifiable. The fact that an AI would follow a precisely specifiable algorithm at the machine code level is no more relevant than the fact that the human brain could (in principle) be very precisely simulated.
APPENDIX 2. Meaning.
Let me return to a passage I quoted earlier, and take this as a way into a broader discussion of meaning.
“The second point is that symbols are manipulated without reference to any meanings. The symbols of the program can stand for anything the programmer or user wants. In this sense the program has syntax but no semantics.” [“Is the Brain’s Mind a Computer Program?”, Scientific American, January 1990]
First let’s note an ambiguity in Searle’s second sentence. “The symbols of the program” could refer either to the symbols representing instructions or to the symbols representing data. I assume he means the latter, but I’ll start by addressing the former, as I think that case is easier to understand.
Let’s think about machine code instructions, like the JUMP instruction, which tells the processor to continue execution of the program from an address other than the following one. The JUMP instruction is represented in RAM by a certain state of a memory cell. (Looking at the level of binary flip-flops, we could say that it’s represented by a certain sequence of flip-flops.) When the processor encounters a cell in that state, it JUMPs. We can think of that state as a symbol for the JUMP instruction. It’s no less a symbol than is the printed word “JUMP” in an assembly language listing of the program. One is more easily read by processors, the other is more easily read by humans. But, with the right equipment, a human could read the symbol in RAM. And in principle we could equip a computer with an optical character recognition system so that it could automatically read and execute the program from the printed assembly language listing. In principle a human could execute the program by reading the symbols (on paper or in RAM) and executing them, rather like Searle in the Chinese Room. Whether it’s a human or a processor executing the program, they are both doing the same thing: reading the symbol for JUMP and then JUMPing. Of course, JUMP here is not the ordinary English word “jump”. (What the processor does has some resemblance to “jumping” in the English sense, and that gives the symbol “JUMP” useful mnemonic value. We could give the machine instructions non-mnemonic names, like “ALPHA”, “BETA”, etc, but that would just make them harder to remember.) The meaning of the JUMP symbol lies just in what it tells the executor of the program (human or processor) to do, no more and no less. The symbol has the same meaning to either the human or the processor: it tells the executor to continue execution from a different address. 
The processor is just as capable of acting in accordance with this meaning as is the human, though of course it does it in an automatic, mindless way. There should be no problem accepting this, as long as we don’t fall into the trap of seeing meaning as a quasi-dualistic property which somehow gets added to symbols in a mysterious way, or of conflating meaning with the conscious appreciation of meaning. Our talk of meaning (in the semantic sense, relating to symbols) is a useful way of understanding the role that symbols play.
We might be tempted to say that the meaning of the JUMP symbol is given by the formal specification of the instruction set that the engineers presumably had in hand when they designed the processor. But that’s past history. In the context of discussing the particular processor in front of us, the meaning of the JUMP symbol is given just by the fact that, when the processor encounters that symbol, its consistent behaviour is to continue execution from a different address, i.e. to do what’s usually called “JUMP”. Contrary to Searle’s second sentence (if we apply it to instructions), we can’t take this symbol as standing for anything we want. It would make no sense to take it as standing for ADD (unless we perversely used “ADD” to mean JUMP). ADD and JUMP are two very different operations, and there is a fact as to which is being executed at a given time.
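The following Python sketch of a toy executor may help here. Everything in it is hypothetical: the opcode numbers, the two-cell instruction format and the accumulator are assumptions made for the illustration, not a description of any real processor.

```python
# Hypothetical opcodes for a made-up machine; the numbers are arbitrary.
JUMP, ADD, HALT = 0x01, 0x02, 0x00

def execute(memory):
    """Run the toy machine; return the accumulator and the addresses visited."""
    pc, acc, trace = 0, 0, []
    while True:
        trace.append(pc)
        opcode, operand = memory[pc], memory[pc + 1]
        if opcode == JUMP:
            pc = operand      # continue from a different address
        elif opcode == ADD:
            acc += operand    # add the operand to the accumulator
            pc += 2           # then continue with the following instruction
        elif opcode == HALT:
            return acc, trace
        else:
            raise ValueError(f"unknown opcode {opcode}")

memory = [
    ADD, 5,     # address 0
    JUMP, 6,    # address 2: skip the instruction at address 4
    ADD, 100,   # address 4: never executed
    ADD, 1,     # address 6
    HALT, 0,    # address 8
]
acc, trace = execute(memory)
print(acc, trace)
```

In this sketch the cell state 0x01 counts as the symbol for JUMP only because of what `execute` consistently does on encountering it. Reading that state as ADD instead would leave us unable to explain why address 4 is never visited and why the accumulator ends at 6 rather than 106.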
Although the meaning of the JUMP symbol can be specified, it doesn’t have to be, in the sense that such specifications (or definitions) are just a useful way of describing the behaviour of the executor, or specifying what the executor should do. The executor doesn’t need them. A human executor might attend to such specifications while learning what to do, but eventually he could reach a state in which he is able to execute the machine instructions without attending to the specifications (rather the way that we may learn the grammar of a foreign language from reading rules in a book, but eventually learn to speak fluently without attending to any rules). He would then be in a similar situation to the processor, which does not use any specifications of the instructions it’s executing. Given that a computer doesn’t make use of any specifications of its machine instructions, why should the fact that such instructions can be precisely specified be of any relevance in considering the situation that the computer is in? (Strictly speaking, a processor may execute micro-code, which could be considered using a specification, but we can ignore that complication here.)
We might be tempted to jump from the first thought in this sequence to the others:
1. The execution of an instruction can be precisely specified.
2. The execution of an instruction can be specified without reference to its meaning.
3. An instruction can be executed without reference to its meaning.
Thought #2 is confused. A specification of how to execute an instruction symbol gives the meaning of the symbol. There is nothing more to its meaning than that. Thought #3 is confused too. When the executor executes the instruction, it might attend to a specification (which gives the meaning), as in the case of a human who hasn’t yet learned to execute the instruction automatically. Even when there is no attending (or “reference”) to a specification, the executor is still executing the instruction in the way that is appropriate to the meaning of the symbol. We can call this “acting in accordance with” the meaning of the symbol, as long as we don’t interpret “acting in accordance with” as “attending to”. In other words, a processor doesn’t use any specification of the meaning of the symbol, and in that irrelevant sense it doesn’t make “reference to the symbol’s meaning”, but it does act in accordance with the meaning of the symbol. Searle’s use of the unclear expression “without reference to meaning” fails to observe this important distinction, and so is misleading.
Now let’s proceed to consider data symbols, starting at the machine code level. What should we make of Searle’s assertion that such “symbols are manipulated without reference to any meanings”? I say this is confused in a similar way to thought #3 above. The processor manipulates data symbols in a way that is in accordance with their meaning, i.e. appropriate to their meaning, but it doesn’t need to make any “reference” to their meaning. And I’ll show again that, at the machine code level, our assignment of meanings to symbols is constrained by the processor’s behaviour. Contrary to Searle’s assertion, the symbols can’t reasonably be taken as standing for anything we want.
Since we’re looking at the machine code level, let’s think about the interpretation of flip-flops as 0s and 1s. If we can interpret the states as standing for anything we want, we should be able to interpret both states of the flip-flops as 0s. But that would clearly be absurd. The whole point of our modelling the world is to help us make sense of it. We couldn’t make any sense of what the computer is doing at the level of flip-flops if we treated both states as the same! But though it’s less obvious, we also couldn’t make sense of computer operations at this level if we reversed our usual interpretation of the states, so that the state usually interpreted as 1 is now interpreted as 0, and vice versa. Consider an ADD instruction, which takes two numbers and returns the sum. Under the usual interpretation, one simple case reads as 0000 + 0000 = 0000. After reversing the interpretation, the very same physical states read as 1111 + 1111 = 1111. But 1111 + 1111 does not equal 1111. So, after reversing our usual interpretation, the ADD instruction is no longer adding. OK, you might say, we can change our interpretation of the instruction too, and call it something else. But, as far as I know, the operation the instruction is now doing is not a standard one. There may be no pre-existing name for it. Forcing us to reinterpret ADD as some other, strange operation is working against our goal of making sense of what the computer is doing. And that would be just the start of our problems. We would soon be tying ourselves in knots trying to make sense of the situation. The standard interpretation is fixed not just by the initial decision of the computer’s designers to interpret the binary states that way round, but by the fact that they designed the processor to work on the basis of that interpretation. Now that the processor works that way, the interpretation is locked in. Searle is looking at the memory states in isolation, instead of taking them in the context of the processor. It’s that context that gives the states their meaning, and makes them symbols, not just physical states.
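The effect of reversing the interpretation can be checked mechanically. The Python sketch below assumes a toy 4-bit adder that wraps around on overflow. That is an assumption made for the illustration, as are the function names.

```python
MASK = 0b1111  # four flip-flops per number

def hardware_add(x, y):
    """What the toy processor does to two 4-bit state patterns."""
    return (x + y) & MASK

def reversed_reading(pattern):
    """Read a pattern with the roles of the two flip-flop states swapped."""
    return pattern ^ MASK

def add_reads_as_addition(read):
    """Check whether every ADD result is a sum (mod 16) under a given reading."""
    return all(
        read(hardware_add(x, y)) == (read(x) + read(y)) & MASK
        for x in range(16)
        for y in range(16)
    )

standard = add_reads_as_addition(lambda p: p)       # usual interpretation
reversed_ = add_reads_as_addition(reversed_reading)  # 0s and 1s swapped
print(standard, reversed_)
```

Under the usual reading every result is the sum of the operands. Under the reversed reading that fails already for the all-zero states, which now read as 1111 + 1111 = 1111. In this toy model the reinterpreted operation comes out as the sum plus one, modulo 16, so whatever we choose to call it, it is no longer addition.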
When we focus on the machine code level of abstraction, the only meanings we can see are the simple meanings that arise from the behaviour of the processor. At higher levels of abstraction, the meanings arise from the behaviour of the program (in combination with the processor), and, since there’s no limit in principle to the complexity and sophistication of programs, there’s no limit to the complexity and sophistication of the meanings. At this higher level too, it’s not true that we can interpret symbols however we like. Consider a computer running a chess program, displaying the (virtual) board on its screen. It will be convenient to think about the symbols on the computer’s screen, though we could also think about corresponding internal memory states. We can’t reasonably interpret the screen as showing a game of Monopoly, so we can’t reasonably interpret the knight symbols as Monopoly houses. Nor can we reasonably interpret the knight symbols as rooks, because they don’t behave like rooks. A knight symbol represents a knight because the computer has been programmed to treat it as a knight, and that interpretation is now locked into the system. It would make little sense to say that the knight symbol is manipulated “without reference to any meaning”. The system manipulates the knight symbol in accordance with its meaning, because it manipulates it in just the way that is appropriate for a knight (and not for a rook).
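Here is a small Python sketch of how behaviour constrains the interpretation of a piece symbol. The coordinates, the two observed moves and the simplified move rules (which ignore blocking pieces, captures and so on) are assumptions made for the example, not taken from any real chess program.

```python
def is_knight_move(src, dst):
    """True if the move from src to dst (file, rank pairs) is an L-shaped step."""
    df, dr = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return sorted((df, dr)) == [1, 2]

def is_rook_move(src, dst):
    """True if the move stays on one file or one rank (ignoring blocking pieces)."""
    return (src[0] == dst[0]) != (src[1] == dst[1])

# Moves a hypothetical program makes with the symbol in question: g1-f3, f3-e5.
# Files a..h are numbered 0..7 and ranks 1..8 are numbered 0..7.
observed = [((6, 0), (5, 2)), ((5, 2), (4, 4))]

fits_knight = all(is_knight_move(s, d) for s, d in observed)
fits_rook = all(is_rook_move(s, d) for s, d in observed)
print(fits_knight, fits_rook)
```

The observed moves fit the knight rule and not the rook rule. An observer who insisted on reading the symbol as a rook would have to treat every one of its moves as illegal, which defeats the purpose of interpreting the symbol at all.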
Searle’s mistake is to consider states of memory (or screen) on their own, independently of any context. The relevant context here includes the rest of the program and the processor, which constitute the causal system that produces and/or interprets those memory states. Searle’s attitude is analogous to saying that we could interpret the words of a book as numbers in base-26 notation; or as numbers in binary notation, taking the letter “A” as 0, and other letters as 1; or any number of other pointless interpretations. These interpretations are pointless because they ignore the relevant context, which includes the process by which the book was produced. But note that a computer system is different from a book, in that it is itself a causal system, producing further states, so the meanings of its states can be fixed by its own behaviour, and not just by the process that produced the system in the first place.
Of course, there may be meanings to which a computer is indifferent. A text editor program is indifferent to the meanings of the words that the user is typing. The program does not manipulate the words in accordance with their meanings. An AI program of the sort we have today, such as Siri, uses words in accordance with their usual meanings to some extent, though nowhere near a fully human extent. But consider an English-speaking AI that could use language in just as sophisticated a way as a human. Searle allows that, in principle, an AI could have just the same behaviour as a human, by virtue of executing the right sort of program, e.g. a highly detailed simulation of a human brain. (He says that “I, in the Chinese Room, behave exactly as if I understood Chinese, but I do not.”) That means it would be just as creative as a human, capable of making up jokes, writing original poetry, and perhaps engaging in philosophical discussion. Let’s consider that scenario. It doesn’t matter for present purposes whether such an AI would be conscious or not. If you like, assume that it would be a non-conscious “zombie”. Since we could understand what it was saying, its words must have the same meanings as ours. That kind of program would be sufficient to fix the meanings of its words as the ordinary meanings. If the AI is a full-brain simulation of my brain, the meanings of its words could be considered “derived” from my meanings, but they would be “derived” in much the same sense that my meanings were in turn “derived” from those of the people from whom I’ve learned my linguistic habits. The AI would merely have acquired its linguistic habits in an unusual way. And once it started running, it would thereafter acquire new linguistic habits in the same ways I do, by picking them up from the speech it encounters, occasionally looking up words in dictionaries, and occasionally inventing new words and meanings of its own. 
The language of a community of such AIs would evolve over time, adding new words and meanings, in the same manner that human languages do. If an AI started using a new word correctly as a result of finding its meaning defined in a dictionary, would Searle still insist that the AI is manipulating the word “without reference to any meaning”?
I consider my view of meaning to be much the same as Wittgenstein’s. Roughly speaking, meaning lies in use. I would recommend reading Wittgenstein’s example of the builder and his assistant, at the start of “Philosophical Investigations”. I think that does a similar job to my discussion of executing machine code instructions, in helping us see how ordinary and unmysterious meaning is. In that example, the meanings of the four words of the builder and his assistant lie in nothing more than their habits of using those words. The meanings would be just the same if we replaced the humans with machines having the same habits.
I’ve made no attempt here to explain mind or consciousness. Consciousness is not called the “hard problem” for nothing, so I chose to say nothing about it here. But I’ve said a little about meaning, because that’s not such a hard problem, as long as we don’t conflate it with consciousness. Searle chose to make the SSA an argument about meaning/semantics. He tries to limit this to meaning/semantics “of the sort that is associated with human understanding”, by which he apparently means the understanding of a conscious system. He’s given us no more reason to accept his conclusion about that subset of semantics than to accept a similar conclusion about semantics more broadly. For the purposes of my positive account I’ve widened the discussion to meaning/semantics more broadly, because I think that’s the best way to demystify the concept, and I think Searle is drawing a misleading dichotomy. Meaning is meaning, whether we’re talking about conscious or non-conscious systems. The conscious appreciation of meaning is another matter. In my view, those who insist on conflating the subjects of meaning and consciousness will never understand either of them.
APPENDIX 3. Formal Systems
It seems to me that the tendency to use the terms “syntactical” and “formal symbol manipulation” in relation to computer program execution has arisen from making a misguided association between precisely specified algorithms and mathematical formal systems. It’s true that we can specify an algorithm in the format of a mathematical formal system, but doing so is of little benefit to these discussions. To see that, I’ll proceed by sketching such a formal system.
Let’s say we want to model the process of program execution at the machine code level. Let the well-formed formulas of the system be strings of binary digits representing the possible states of the computer’s memory. We’ll need to include the processor’s internal registers. For example we might let the first 64 bits of each formula correspond to the processor’s program counter, which points to the next instruction to be executed. Then our single “axiom” will correspond to the initial state of the computer, with our program and starting data in memory. Our “theorems” will correspond to subsequent states of the computer, after the execution of each instruction. Our single “rule of inference” will tell us how to execute one instruction, whichever one is currently pointed to by the program counter. This single rule could be broken down into sub-rules, one for each different instruction in the instruction set. But I call it one rule in order to emphasise that there is no choice of rules to be applied, as there is in the case of a mathematical formal system. In the mathematical formal system, it’s open to the mathematician to decide which rule to apply to produce the next theorem, and there are many possible theorems he could produce. That’s why we can’t think of the mathematical system as specifying an algorithm. But in the case of program execution it’s more natural to think in terms of an algorithm than of a set of rules.
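Here is a minimal sketch of the kind of system just described. It makes several simplifying assumptions of my own: the machine is a hypothetical one with three invented instructions, and each “formula” is a tuple (program counter, accumulator, memory) rather than a literal string of binary digits. The function step plays the part of the single “rule of inference”, and theorems lists the states derived from the “axiom”.

```python
# Hypothetical three-instruction machine; the opcodes and memory layout
# are invented for illustration only.
HALT, ADD, JUMP = 0, 1, 2

def step(state):
    """The single 'rule of inference': derive the next state from the current one."""
    pc, acc, memory = state
    opcode, operand = memory[pc], memory[pc + 1]
    if opcode == ADD:
        return (pc + 2, acc + operand, memory)
    if opcode == JUMP:
        return (operand, acc, memory)
    return state  # HALT: the state no longer changes

def theorems(axiom):
    """Yield the 'axiom', then each state derived from it, in order."""
    state = axiom
    while True:
        yield state
        nxt = step(state)
        if nxt == state:   # nothing new can be derived
            return
        state = nxt

# Each instruction occupies two cells: an opcode followed by an operand.
program = (ADD, 2, JUMP, 6, ADD, 100, ADD, 3, HALT, 0)
axiom = (0, 0, program)    # initial state: pc = 0, accumulator = 0
derivation = list(theorems(axiom))
print([(pc, acc) for pc, acc, _ in derivation])
```

Notice that theorems never has to choose which rule to apply: each state has exactly one successor. That is the sense in which the system specifies an algorithm.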
Mathematicians have sometimes formalised an area of mathematics (say number theory) by giving a set of axioms and precisely specified rules of inference. It is then possible, in principle, to derive theorems from the axioms purely by following these formal rules. The application of such rules is sometimes called “formal symbol manipulation”. Since the rules can be applied without using any prior knowledge of the meanings of the symbols, it has sometimes been said that the symbols of the formal system have no meaning, and consequently the word “formal” may have become associated with meaninglessness in some people’s eyes. But it’s not true to say that the symbols have no meaning at all. After all, the very fact that different rules are applicable to different symbols makes it useful to think of them as having different meanings. Different symbols mean different things to the reader, telling the reader what can be done with those symbols. So the axioms and rules confer a meaning on the symbols. And, because the axioms and rules have been chosen to make the symbols correspond to our mathematical practice with the familiar symbols, we can say that the meanings of the symbols are related to the familiar ones, e.g. the meaning of “+” in the formal system is related to the familiar meaning of “+”. I say “related” and not the same, because Gödel showed us that the axioms and rules don’t confer the full meaning that the symbols ordinarily have. There are aspects of our normal mathematical practice with the symbols that are not captured by the axioms and rules. Consequently, there is no concept of truth within the formal system. Nevertheless, the formulas of the formal system have corresponding mathematical statements which may be true or false. It is this very correspondence that allowed Gödel to say that there are formulas that are unprovable in the formal system but which nevertheless correspond to true mathematical statements.
On the other hand, in the case of our formal system specifying program execution, there is no corresponding concept of truth. There is no sense in which the state of a computer can be said to be true or false. (Consequently, Gödel’s result cannot be applied to such formal systems, pace Roger Penrose.) This is the second major difference between the two types of formal system, and explains why it’s peculiar to use the terms “axiom”, “theorem” and “rule of inference” in this context. In short, nothing has been gained by talking about our algorithm in the language of mathematical formal systems, and the terms “formal symbol manipulation” and “syntactical” are unhelpful.
We can now more readily address a particular remark that Searle makes under P1 of the SSA:
“The computer operates purely by manipulating formal symbols, usually thought of as 0s and 1s, but they could be Chinese symbols or anything else, provided they are precisely specified formally.”
In the formal system I’ve described above, at the machine-code level, there is no explicit reference to any symbols apart from binary digits. Within our written specification we might represent these binary digits by the marks “X” and “Y”, or even use “1” and “0” to correspond to the digits 0 and 1 respectively. That would make no difference, except that we would have adopted a more confusing notation. As I explained in Appendix 2, our decision to interpret the flip-flops the usual way round is not arbitrary, and we cannot sensibly interpret them as anything we like. In other words, given that our formal system is intended to model a particular real-world process, we have no choice of symbol interpretations, only an insignificant choice of notation.
This is analogous to the fact that, when mathematicians axiomatise a pre-existing area of mathematics, their formal systems don’t usually use arbitrary symbols. They use the pre-existing mathematical symbols, and choose the axioms and rules of the system in such a way as to ensure an appropriate correspondence between the pre-existing use of those symbols and their use in the formal system. They could make the mark “-” in the formal system correspond to the mark “+” in our pre-existing mathematical practice, but that would only create a confusing notation. If the axioms and rules involving the mark are ones that are appropriate to addition, then the symbol corresponds to addition no matter what mark we choose. We cannot interpret the symbols however we like.
Of course, there needn’t exist any written specification of our computer system. A written specification is only a description of the system. So, when we say that the system is “formally specified” we must mean it in some more abstract sense. All we mean is that the system has the kind of regularity that could be modelled well by a formal specification. In this more abstract sense, the question of notation doesn’t even arise. If we say that the computer manipulates the “formal symbols” 0 and 1, all this really means is that, at a certain level of abstraction, it works with discrete binary states (e.g. flip-flops) which are manipulated in a very regular way, such that we can describe the process well with a precisely specified algorithm, and such that it’s meaningful to assign the states the values 0 and 1.
There is clearly no possibility of interpreting the binary symbols of the machine code model as Chinese characters, since there are only two different symbols! In a sense, strings of binary digits could be interpreted as Chinese characters. But this model doesn’t pick out any such strings. For comparison, note that it does pick out strings representing opcodes, i.e. strings of digits which tell the processor which instruction to execute: e.g. “10010111” for JUMP, and “00111001” for ADD. Our written formal system needn’t include the symbols “JUMP” and “ADD”. It could refer to the binary strings directly. But for the sake of a human reader it might be convenient to define “JUMP” to mean “10010111”, and thereafter write “JUMP” instead of “10010111”. Instead of a rule saying “if the next instruction is 10010111…”, it could then say “if the next instruction is JUMP…”. But there would be no point in defining Chinese characters in this way, because this formal system has no rules that pick out strings and manipulate them in a way that’s appropriate to Chinese characters. There is no useful sense in which Chinese characters are being manipulated in accordance with a formal system for the machine code level.
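To make the point about “JUMP” and “ADD” concrete, here is a small sketch using the two example opcodes from the paragraph above. The table and function names are hypothetical. It shows the same rule stated once in terms of the binary string and once in terms of the defined name, and shows that the model picks out nothing beyond the strings its rules mention.

```python
# The two example opcodes used in the text; a real instruction set would
# have many more, and these bit patterns are illustrative only.
MNEMONICS = {"JUMP": "10010111", "ADD": "00111001"}
BY_BITS = {bits: name for name, bits in MNEMONICS.items()}

def is_jump_binary(instruction_bits):
    """The rule stated directly in terms of the binary string."""
    return instruction_bits == "10010111"

def is_jump_named(instruction_bits):
    """The same rule stated with the mnemonic; only the notation differs."""
    return instruction_bits == MNEMONICS["JUMP"]

def label(instruction_bits):
    """Name a string if the model picks it out, otherwise leave it as bits."""
    return BY_BITS.get(instruction_bits, instruction_bits)

print(label("10010111"))  # JUMP
print(label("00111001"))  # ADD
print(label("01010101"))  # 01010101
```

We could add an entry to the table giving some other string a Chinese character as its label, but no rule in the model would ever consult it, so the label would do no work.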
Suppose our program is one that answers Chinese questions. Then there must be some level of abstraction at which we can talk about what’s happening in terms of Chinese characters. For example, we could say that the system has just answered the question “ABC” with the answer “XYZ” (where for convenience I use “ABC” and “XYZ” to represent sequences of Chinese characters). But such statements don’t constitute an algorithm for answering Chinese questions. There needn’t be any level at which the process can be modelled by a formal system (i.e. a precisely specified algorithm) that picks out Chinese characters, in the way that our machine code model picks out machine code instructions. In other words, there need not exist any formal system that manipulates Chinese characters. There need not be any “formal manipulation” of Chinese characters. Supporters of classical or “symbolic” AI may be looking for a formal system at the level of such high-level symbols. But supporters of “sub-symbolic” AI are not restricting themselves in that way, and doubt that any such system could produce human-level verbal behaviour. Searle seems oblivious to sub-symbolic approaches to AI.