1
Systems

"Contrariwise, if it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic."
--Tweedledee[11]

"Logic is a little tweeting bird chirping in a meadow. Logic is a wreath of pretty flowers, which smell bad."
--Mister Spock[12]

"If you say anything I will cut off your head! If you do not say anything I will cut off your head!"
--Ganto

All of human intellectual history concerns systems in various forms. Artistic and philosophical movements, methods of interpretation, the various "-isms" have all been systems into which humans have tried to cast their perceptions of the world. Even the modern movements and schools that decry systematisation are themselves systems of thought. Surrealism is a reality, and Deconstruction can be deconstructed. Humans have probed the intellectual limits of the systems to which they have confined their thinking, and every so often, after encountering a barrier, someone raises a rallying cry to jump out of the system, to go beyond its perceived limitations. A flood of books and papers is published, the movement grows to dominate the careers of a generation of thinkers and creators, and then somebody hits the new wall, peeps beyond it, and begins the next phase. Yet no matter how many walls we jump, it seems we are always confined within some sort of system, one of Wordsworth's "pensive citadels". It seems that we need the guiding force of a system in which to think, that we are incapable of thought without a system, a mythos, a frame of reference. Pirsig, in another reference to the need for common context as a prerequisite for communication, has written that to transgress the mythos is to become insane. The pervasiveness of systems in both computer science and literature has already been remarked upon in the introductory chapter. This chapter expands the coverage of systematisation.

We begin with a discussion of the earliest known legal system, and then we treat the incautious rush to systematisation that followed Newton's systematisation of mechanics. This movement is the forebear of modern computer science. It is Newton's thought that led to the formalisation of reason in the nineteenth century, to the fascination with symbol manipulation in the mathematics of the late nineteenth and the twentieth centuries, and to the roots of that branch of mathematics now known as computer science. The chapter concludes with an introduction to the method of formal logic known as the propositional calculus, the formalisation of reason that formed the basis of much of modern mathematics, along with some of its applications.

The Code of Hammurabi

The earliest recorded legal system was the code formulated by Hammurabi in Babylon in the eighteenth century B.C. Amidst statements of the divine origin of this code and much bellicose fanfare about his own greatness, he stated in the epilogue of the code his purpose in dictating this set of rules:

...that the strong might not oppress the weak, and that they should give justice to the orphan and the widow... for the pronouncing of judgements in the land, for the rendering of decisions for the land, and for the righting of wrong, my weighty words have I written upon my monument, and in the presence of my image as king of righteousness have I established.[13]

The format of these laws is simple; each law is a conditional sentence, stating an imperative which is to apply if the associated condition is satisfied. For example,

If a woman hate her husband, and say: "Thou shalt not have me," they shall inquire into her antecedents for her defects; and if she have been a careful mistress and been without reproach and her husband have been going about and greatly belittling her, that woman has no blame. She shall receive her dowry and go to her father's house.[14]

Although Hammurabi was not conscious of it, he was working within the same abstract ideal of systematisation that has driven modern science. Hammurabi was conscious that the code that he was creating would exist independently of him, that it was at least capable of outliving him and continuing to be understood and applied long after his death. This is why he caused it to be inscribed on several monuments throughout his kingdom. Such an idea, that this collection of rules, this legal system, should continue to dispense justice, render decisions, and right wrongs long after its creator was gone! Hammurabi's Code was an attempt to reduce traditional standards of behaviour to a simple set of rules, so that in cases where the rules applied, there need be no judge. Anyone who could read and understand the Code of Hammurabi could find the rule or rules corresponding to a given case, and read off the settlement or punishment, mechanically, deterministically. So the process of justice, in the domain in which the rules applied, could be reduced to rote, automated. The conjunction of the Code and its reader would be equivalent to a judge.

Of course, the Code of Hammurabi served only a finite domain, and if one tries to give a general definition of "justice", one leaves oneself open to a Socratic reductio ad absurdum. This is why school children learn that the judicial system interprets laws instead of merely applying laws; the letter can never fully contain the spirit. That legislators attempt to promulgate formal and precise statements can be seen by examining any written law. Laws are written in a peculiar prose style, one that is stilted, often cumbersome and difficult to understand without paying acute attention to every word. And still, despite best efforts at precision, laws have consequences that are unimagined, and sometimes counterproductive, when applied.

The United States' Freedom of Information Act is a germane example, being a product of the Information Age. When this law was enacted in the 1970's, it was intended as a way of promoting civil liberties by specifically allowing public access to many files that the government of the United States maintains. The Freedom of Information Act was proposed in tandem with the Privacy Act, which allows individuals access to most files that the government maintains on themselves in particular. However, in practice, the majority of requests for information under the Freedom of Information Act have come from corporations trying to obtain information collected by the government on their competitors, and the resulting cost of retrieving information in order to comply with the requests has been much greater than originally projected. This has been another timely, practical demonstration that the cost of information retrieval is becoming a limit to information accessibility. This legal example and many other cases demonstrate a need for caution in attempts to reduce whole areas of thought to collections of formal rules. But most of the systematisations in modern western thought have been far from cautious.

Lavoisier: The Systematisation of Language

The urge toward mathematisation came into its own during the Enlightenment. Isaac Newton began a rush toward mathematisation with his Principia Mathematica, converting the empirical science of physics into a mathematical science. Natural philosophers could now treat abstract mathematical properties instead of concrete mechanical properties, and yet arrive at conclusions that held in the mechanical universe. Some people were uncomfortable with this abstraction and felt that the correspondence between mathematics and phenomena in the real world was itself a phenomenon that needed explaining. Nevertheless, mathematical models became all the rage.

All of a sudden, people wanted to mathematise everything. Newton himself did not limit his systematisation to mechanics, but also explored chemistry. He performed some chemical experiments in order to construct his tables of chemical affinities, expressing the tendency of solutes to displace each other in chemical solutions. Newton's ambition for chemistry was to incorporate it into his system of physics by explaining chemical behaviour in terms of mechanical models, reducing chemical reactions to mechanical interactions of particles of matter. He envisioned a single, unified, hidden harmony that generated all the apparent harmonies of natural philosophy, a fundamental law that implies the laws of all these apparently distinct fields and thus unifies them. This is similar to the ambition of contemporary physicists of developing a Grand Unified Theory, a single law behind all the apparently distinct types of force.

Antoine Laurent Lavoisier was an eighteenth-century French chemist who, simply put, did for chemistry what Newton had done for physics a century earlier. Lavoisier laid much of the foundation of modern chemistry. Before Lavoisier, chemistry had been a convoluted, arcane art of apparatus and procedure rooted in medieval alchemy. Lavoisier put chemistry on a sound experimental and theoretical footing. Newton's projected unification of chemistry with physics would have taken a giant step toward the unification of all science into a corpus of knowledge derivable from a simple set of elementary laws. But it was too complicated a scheme to have been practical in the eighteenth century. The mathematician Laplace, a friend of Lavoisier, held that it was possible, in theory, to predict the workings of the universe solely on the basis of microscopic, atomic interactions. In reality, however, a layer of abstraction was needed between physics and chemistry in order to prevent mechanics from running amok and overwhelming chemists with detail. The unification of chemistry and physics did not begin in earnest until the kinetic theory of the nineteenth century, and in the twentieth century it has been seen that chemical behaviour is not understandable in terms of physics without the aid of the modern theory of quantum mechanics.

But what can be the relevance of an eighteenth-century chemist to the study with which we are concerned? Surely eighteenth-century chemistry has very little to do with modern pure mathematics and computer science. What we are concerned with is not Lavoisier's chemistry, but his approach to the study of that chemistry, which can be abstracted away from that particular science. In his Traité élémentaire de chimie, the chemical analogue of Newton's Principia, Lavoisier described, for the first time in an elementary text on chemistry, his oxygen theory and his new method of chemical nomenclature. The oxygen theory, although it suffered from a misidentification of oxygen as the principle of acidity (Lavoisier coined the term "oxygen" from the Greek meaning "I produce acid"), explained the process of combustion and set chemistry on the right track toward an understanding of heat. Although Lavoisier's formulation of the oxygen theory is another beautiful instance of discovering the hidden harmony behind the apparent, to give it the attention it deserves would be a task far beyond the scope of this work, and therefore we mercilessly omit it from further discussion here.

The new method of chemical nomenclature was developed in conjunction with contemporaries, and the need for it had been recognised by many chemists of the time, but, as in most of the joint projects in which he participated, Lavoisier was its principal architect. The scheme was derived from his systematic study of chemical reactions. It is to Lavoisier that we owe the term for the class of compounds "oxides" and the common suffixes "-ous" and "-ic", and it is because of his view of the importance of precision in scientific language that we write chemical "equations", in a pseudo-algebraic notation. In this belief he was a follower of the philosopher Étienne Bonnot, Abbé de Condillac, a contemporary of Lavoisier who enunciated the correspondence between the use of language and the use of analysis and stressed that imagination must have the guidance, the channelling, of analysis. In the introduction to his Traité, Lavoisier wrote:

The impossibility of isolating the nomenclature from the science, or the science from the nomenclature, arises from the fact that every physical science is necessarily formed of three things: the series of facts which constitute the science; the ideas which call those facts to mind; and the words which express them. The word ought to give birth to the idea; the idea ought to depict the fact: they are three impressions of one and the same seal; and, since it is words that preserve ideas and transmit them, it follows that the language cannot be perfected without perfecting the science, nor the science without the language, and that, however certain the facts might be, however just the ideas they might have given rise to, they would still convey only false impressions if we had no exact expressions by which to render them.

This recognition of the isomorphism between the symbols of a language and the concepts that they identify, and between the syntactic and semantic rules of a language and the rules of proper reasoning, was to be the vogue in mathematics in the early twentieth century. Again, we see a foreshadowing of a great era of thought whose full import was unknown by those who were participating in its development.

Lavoisier himself met with ill fortune after the French Revolution, during the Terror. He had been a tax farmer, that is, one who paid the government for the privilege of collecting taxes--in effect, a tax collector working on the basis of commissions. The system of tax farming was, obviously, subject to abuse, although Lavoisier seems not to have been guilty of taking more than his fair share. Nevertheless, he went to the guillotine, after a mockery of a trial which lasted about fifteen minutes. His scientific friends were either powerless or afraid to defend him.

Cuvier: The Systematisation of Biology

In the biology of the eighteenth century we see the difficulties of our own information explosion reflected in miniature. The flood of new species arriving from the New World had strained the traditional, Aristotelian classification system to the breaking point. When Aristotle designed his classification system, he had assumed a static number of species. Thus, just as the growth in the amount of published information in the twentieth century has demanded new classification schemes in libraries, exploration of the New World demanded a new taxonomy for biology. Library classification schemes are arbitrary; it is a matter of convention what subject corresponds to what range of call numbers. Many scientists abhor arbitrariness. Much of science is the discovery of inherent patterns, regularities that seem to exist independently of the operation of the human observer, and so scientists want to believe in some "natural" arrangement, instead of having to concoct an arbitrary one. Some naturalists maintained that there was a "natural" taxonomy that corresponded to the divine plan of life. This construction, if it existed, would be not merely an artifice existing only in the minds of humans, but a plan followed in nature, and thus would have an undeniable, independent meaning. John Ray, a British taxonomist, believed in the existence of some natural scheme but thought that it was beyond the reach of human inquiry. Therefore Ray, to be followed shortly by Linnaeus and others, proposed an arbitrary system of classification. This acceptance of arbitrariness incensed some naturalists, and the argument over the possibility of a natural system was not settled until Darwin linked taxonomic concepts such as species and genus to evolutionary lines of descent.

Into the battle over classification stepped Georges Cuvier, a naturalist originally from the principality of Montbéliard on the Franco-Swiss border. Cuvier had been educated in Germany in preparation for employment as an administrator, but there had been no job open for him upon the completion of his formal education. He had therefore journeyed to Normandy to make his living as a private tutor for the child of an aristocratic family. In Normandy, around the time that Lavoisier was losing his head in Paris, Cuvier had the free time to indulge his hobby of natural history. He eventually turned it into his vocation, taking up a post at the Museum of Natural History in Paris.

Cuvier explored the relationship between anatomy and physiology, and formulated two anatomical rules of inference, which occupied a position in his natural history tenuously analogous to that of Newton's laws of motion in mechanics. Cuvier's rule of Correlation of Parts was the doctrine that every part of an animal is related to every other part, and his rule of Subordination of Characters set forth the determination of an animal's minor characteristics by its major ones. Applied together, these rules allowed the reconstruction of entire animals from single parts, in more or less detail. For example, bones in a leg structured for running imply a powerful musculature, which in turn demands high heart and lung capacity, which places constraints on diet and therefore on the digestive tract, and so on. Such a chain of implication or inference could be followed exhaustively to map out an entire animal from just a few of its parts. Indeed, Cuvier had practical success with reconstructing complete animals from incomplete fossilised specimens recovered from sites around the Paris basin.

This inference of anatomical form and physiological function that Cuvier performed recalls an analogue in modern computer science. Type inference is the determination of what kind of information a variable holds, given examples of its usage. For example, if variable x is assigned value 42, then it can be inferred that x is some type of number, either integer or real, but certainly not a true-or-false value or a set. Again, if y is assigned value false, then y must be a true-or-false value. Now suppose z is assigned the value of y at some point in the program. Then the type of z must be the same as the type of y, namely, true-or-false. This is a simple example of the same style of following chains of inferences that Cuvier used in his reconstructions by Correlation of Parts. Type inference is used in state-of-the-art editors to fill in the declarations of variables automatically at the same time as the programmer is typing the statements of a program into the editor.
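To make this concrete, here is a small sketch, in Python, of the chain-following flavour of type inference. The little routine and its names are my own invention, not a description of any real editor's machinery; it handles only literals and simple variable-to-variable assignments.

def infer_types(assignments):
    # assignments: a list of (variable, right-hand side) pairs, where the
    # right-hand side is either a literal value or the name of another variable.
    types = {}
    for var, rhs in assignments:
        if isinstance(rhs, bool):            # test bool first: True/False are also ints in Python
            types[var] = "true-or-false"
        elif isinstance(rhs, (int, float)):
            types[var] = "number"
        elif isinstance(rhs, str) and rhs in types:
            types[var] = types[rhs]          # follow the chain: z takes on y's type
        else:
            types[var] = "unknown"
    return types

print(infer_types([("x", 42), ("y", False), ("z", "y")]))
# {'x': 'number', 'y': 'true-or-false', 'z': 'true-or-false'}

Given the assignments of the example above, the routine reports x as a number and both y and z as true-or-false values, by exactly the sort of chain of inference just described.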

The Mechanics of Society

It was inevitable that the wave of systematisation triggered by Newton should eventually culminate in the ultimate attempt at mathematisation: the reduction of human judgement to numerical methods. The study of statistics arose in the nineteenth century from mathematical probability, which aimed to be the mathematical science of the reasonable person. Adolphe Quetelet was a Belgian mathematician, educated in France, who aimed to be the Newton of a study that he termed "mécanique sociale"--social mechanics--and later, adopting the terminology of Auguste Comte, the founder of scientific positivism, "physique sociale"--social physics. He had got approval from the government of the Netherlands for an astronomical observatory in Brussels, and he travelled to France in 1823 to study astronomy from the masters. He learned of Laplace's error analysis, which was used to correct for errors in astronomical measurements. He became convinced that what Poisson had dubbed the "Law of Large Numbers" could be applied not only to the study of games of chance and to the observational errors in positional astronomy, but also to social systems. The Law of Large Numbers, today familiar to every beginning student of probability, is the observation that as the number of repetitions of a probabilistic event increases, the observed relative frequencies of the outcomes generally approach more and more closely the respective probabilities of those outcomes. Thus, if a fair coin is flipped three times, it would not be surprising if it always landed heads-up. But if that same coin were flipped three hundred times, we should doubt its fairness if the number of heads were more than, say, one hundred seventy. Each separate coin-flip has a fixed probability, and other than that its particular outcome is unpredictable. But given a large number of coin-flips, one can make reasonably confident predictions.
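The coin-flipping illustration can be rehearsed in a few lines of Python; this is only a simulation sketch, with the numbers of flips and the seed chosen arbitrarily.

import random

random.seed(1)
for flips in (3, 30, 300, 3000, 30000):
    heads = sum(random.random() < 0.5 for _ in range(flips))
    print(flips, "flips:", heads, "heads; observed frequency", round(heads / flips, 3))

As the number of flips grows, the observed frequency of heads drifts toward the probability one half, though any particular short run may stray far from it.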

It was during this time that increasingly bureaucratic governments were making available statistical data such as census records. These were eagerly devoured and helped to establish the new science of statistics. The word "statistics" originally meant the study of political facts and figures; it was only later appropriated to designate a field of mathematics. All sorts of statistical and probabilistic analyses were performed, many of them patently fallacious. Because of its novelty and incomplete mathematisation, statistics was an ideal "scientific" crutch for pet arguments, something to lend them credibility. For example, the Englishman John Arbuthnot founded on probability his argument in support of divine providence. He pointed to an 18/17 ratio of male births to female births and asserted that the only explanation for a lack of any nontrivial year-to-year fluctuation in this ratio was divine action. The slight excess of male births, he held, was evidence that God had provided a compensation for the deaths of men in wars and other hazardous occupations. He was completely ignoring the Law of Large Numbers, and one of the Bernoullis pointed out that the same 18/17 ratio could be obtained by rolling a 35-sided die--there was no need to bring God into it explicitly. Quetelet and his contemporaries were struck by large-scale order in systems that seemed founded on chaotic actions. Vital statistics, occurrence of crime, and even the number of dead letters in the Paris postal system displayed clockwork regularity.

Quetelet, along with some political reformers of the time, saw crime as a manifestation of the unhealthiness of society. Because crime seems a deterministic phenomenon when analysed on the large scale in terms of statistics, they asserted that the free will of the criminal was merely an accidental cause. The real sickness lay within the society of which criminals were parts. The actions of individual criminals were unpredictable, just as the result of any particular coin-toss cannot be known in advance, but the total of all crimes was an indicator of the state of the social system of which the individual criminals were members. Quetelet consciously formed an analogy between social systems and physical systems. Societies experienced restoring forces which tended to maintain the status quo, and perturbational forces which caused unrest. A scientifically organised government would plot its trajectory, attempting to predict the magnitudes and directions of perturbational forces and to apply balancing forces to correct for them. Quetelet's social analogue for the physical centre of mass was l'homme moyen--the hypothetical "average man" of a society, around whom the variations of all its people centred. Societies had stable and unstable equilibria, and possessed quantities of inertia. They could undergo gradual, smooth change or the sharp modifications of revolution. Quetelet was acutely aware of this possibility; his observatory at Brussels had been turned into a fortress in the Belgian revolution of 1830. "Quel sera l'autre Newton qui exposera les lois de cette autre mécanique céleste?"[15]--Who will be the other Newton who will expose the laws of this other celestial mechanics, asked Quetelet. But unlike the celestial mechanics illuminated by Newton and Laplace, the mechanics of society dealt with some qualities and tendencies so nebulously characterised that objective quantification was impossible. Quetelet was aware of this:

Certain moral qualities are in nearly the same case as physical qualities; and one may assess them by granting that they are proportional to the effects they produce: thus one would have no difficulty in saying that one workman has two or three times as much energy as another if, all other things being equal, he does each day double or triple the work done by that other workman. Here the effects are purely physical, like the state of compression of a spring when it was a question of estimating forces; we merely admit the hypothesis that causes are proportional to the effects they produce. But in a great many cases this assessment becomes impracticable. When a man's energy is spent upon immaterial works, for example, what will our measure be, even when works such as books, statues, or paintings are produced, for how are we to assess the research and the meditation they will have required? The number of works could at most give an idea of an author's fecundity, as the number of children brought into the world makes known the fecundity of a woman, that is to say, without any regard for the value of the work produced.[16]

This inherent lack of objectivity prevented the complete realisation of Quetelet's ambition for a mathematics of society. How does one measure the badness of a crime? What method can there be of measuring the magnitude and the direction of the perturbational force exerted on French society by Napoleon? The ambitions of the statisticians of the nineteenth century left their mark upon learning in the form of the disciplines now known as the social sciences: economics, psychology, political science, sociology, anthropology, and so forth.

In 1951, Isaac Asimov published perhaps the most well-known piece of popular fiction founded upon an extension of the mathematics of society, Foundation. This science fiction novel centred on the fictional mathematical science of "psychohistory", whereby historical trends and forces are completely mathematised and these mathematical constructions are used to predict trends and major events of the future. One may say that the predictive power of psychohistory has an even wider scope than that of economics, and at the same time greater objectivity and accuracy. (This is the novel's essential fantasy.) Although Foundation suffers from defects typical of science fiction, such as contrived explanatory dialogue, preoccupation with technological extrapolations and "scientific" language, and often shallow characterisation, its continuing popularity distinguishes it in twentieth-century literature. It is an irony of contemporary education that most of the people who read it know nothing of its roots in the nineteenth-century mechanics of society.

The early social scientists' view of social systems is similar to that of modern computer scientists on computing systems. An especially striking parallel lies in the theory of computer operating systems. A real-world computer is a finite piece of hardware with limited numbers of input/output channels, storage devices, and processors. Ideally each process would run in its own world. In fact, one function of an operating system is to create a "virtual machine" for each process, to allow it to make believe that it has its own world and to minimise intrusion by the outside world. Each process would like to have unlimited resources for its exclusive use, and never to be delayed, preempted, or otherwise interfered with. But there are never enough resources to go around. One process's freedom must end where another's begins. Operating systems are built around basic principles of how resources should be allocated, and whose demands should be satisfied first. These principles are selected so as to maximise efficiency in the environment served by the operating system. An operating system can expect coöperation from its processes, because it is to each process's benefit to adhere to the spirit of the operating system's policies on allocating its resources. Thus, in the ideal computer system, processes recognise that the operating system exists for the common good and organise their work to minimise pressure on the operating system. The system's administrative tasks are light and there is little waste. One characteristic shared by all computer operating systems is the axiom of finite progress: if a process wants to do something, then eventually, perhaps after some finite waiting time, it will be allowed to proceed.
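The axiom of finite progress can be illustrated with a toy scheduler; the sketch below is my own caricature of round-robin time-slicing, not the policy of any particular operating system.

from collections import deque

def round_robin(work):
    # work: a dict mapping each process name to the number of time slices it still needs.
    ready = deque(work)                # the queue of runnable processes
    clock = 0
    while ready:
        name = ready.popleft()         # the next process is given the processor
        work[name] -= 1                # ...and runs for one time slice
        clock += 1
        if work[name] > 0:
            ready.append(name)         # unfinished: it rejoins the back of the queue
        else:
            print("time", clock, ":", name, "has finished")

round_robin({"editor": 2, "compiler": 3, "mailer": 1})

Because every waiting process rejoins the back of a finite queue, each one is guaranteed its turn after a finite wait, however modest its demands.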

A nation is a finite piece of land with limited means of production and natural and human resources. Ideally each individual in the nation would have absolute freedom to organise their world in the way that would please them. But there are never enough resources to go around. One individual's freedom must end where another's begins. Governments are founded on basic values and beliefs about how nations should be run, and how strife and contention among their peoples should be dealt with. Governments cannot expect full coöperation from individuals, because individuals tend to act on a personal level for the good of themselves and their families and friends. This kind of algorithm has a name in computer science. It is known as the greedy algorithm, for obvious reasons. Although there are some problems for which the simple greedy algorithm is efficient, in many cases greater efficiency can be obtained by organising tasks on a higher level. This higher-level organisation, akin to that of a government bureaucracy, reduces or eliminates needless computation and duplication of efforts.
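For the reader who has not met the greedy algorithm, here is a small sketch in Python using the familiar problem of making change; the denominations are chosen only to show that the greedy choice is sometimes best and sometimes not.

def greedy_change(amount, denominations):
    # At every step, take the largest coin that still fits.
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            amount -= d
            coins.append(d)
    return coins

print(greedy_change(30, [1, 5, 10, 25]))   # [25, 5] -- the greedy choice happens to be optimal here
print(greedy_change(30, [1, 10, 25]))      # [25, 1, 1, 1, 1, 1] -- six coins, though 10+10+10 needs only three

The second case shows the point made above: a sequence of locally attractive choices need not add up to the best overall arrangement.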

In his 1888 novel Looking Backward, Edward Bellamy decried the inefficiency of the greedy algorithm that he saw operating in his society. Writing in an era of trusts and robber-barons, he pointed to the waste of human effort caused by the redundancies of competition. Looking Backward is an exposition of Bellamy's (improbable) socialist utopia of the year 2000, framed in the story of a Boston gentleman who falls into a mesmeric trance in 1887 and is discovered and revived by citizens of the utopia in 2000. (It is regrettable that Bellamy chose the pseudo-science of mesmerism as a device for effecting the voyage through time. In any case, the mechanism is incidental.) Bellamy's utopia is the extreme case of what he saw happening in his time. The formation of trusts has been taken to its conclusion, and all production in the United States has been consolidated under one trust, the government, which is administered benevolently. The waste and duplication that come with competition have been eliminated. Looking Backward was one of the most popular novels of its time. "Bellamy clubs" sprang up around the nation. And yet the inertia of society seems to have got the better of Bellamy's views. Bellamy was merely one of Quetelet's minor perturbing forces.

If societies could be analysed statistically, then so could social groups. This was the basis of Francis Galton's programme of eugenics. Galton and the eugenicists originated the now familiar phrase "nature versus nurture", and they were firmly on the side of nature. Since they were dealing with groups and not with individuals, they believed that environmental influences could be ignored as random perturbations that would not much affect the mean type that represented a group. They thought that there was sufficient mobility, at least in English society, so that innate ability or inability would manifest itself despite environmental handicaps or aids. Social and ethnic groups were characterised by their observed tendencies toward crime and other "undesirable" behaviours. Controlled breeding to eradicate these tendencies was advocated.

Galton was anti-religious, and his eugenics programme, like Comte's positivism, aimed to dismantle religion and put a "scientific" substitute in its place. The incentive to morality of eternal life in heaven was to be replaced with the promise of self-perpetuation through breeding privileges. Galton was a cousin of Charles Darwin--this itself was a piece of evidence in favour of the primacy of nature over nurture--and Darwin's Origin of Species removed the last block to Galton's dismantling of Christianity. The Argument from Design argues for the existence of God based on the near-perfect adaptation of organisms to their environments. Darwin had replaced the direct intervention of God with the mechanism of natural selection (although this scheme of selection was still possibly a divine institution), just as Bernoulli had replaced Arbuthnot's God with a 35-sided die. A decade after the publication of the Origin of Species, Galton wrote to Darwin that he felt that religion's last support had been obliterated.[17]

Galton fancied himself fighting for a New World. He took steps to ensure the continuation of his science of eugenics after his death, attempting to engrain it in the universities by endowing a chair in eugenics at the University of London. As an old man, he concluded a lecture on "Probability, the Foundation of Eugenics" with these words:

Enough has been written elsewhere by myself and others to show that whenever public opinion is strongly roused it will lead to action, however contradictory it may be to previous custom and sentiment. Considering that public opinion is guided by the sense of what best serves the interests of society as a whole, it is reasonable to expect that it will be strongly exerted in favour of Eugenics when a sufficiency of evidence shall have been collected to make the truths on which it rests plain to all. That moment has not yet arrived. Enough is already known to those who have studied the question to leave no doubt in their minds about the general results, but not enough is quantitatively known to justify legislation or other action except in extreme cases. Continued studies will be required for some time to come, and the pace must not be hurried. When the desired fullness of information shall have been acquired, then, and not till then, will be the fit moment to proclaim a `Jehad,' or Holy War against customs and prejudices that impair the physical and moral qualities of our race.[18]

The Abstraction of Abstraction: Modern Algebra

What all the personages discussed so far in this section were attempting, with varying degrees of rigour, was the reduction of a fairly well-defined field of thought to a formal system of reasoning. Hammurabi began legal codification. Lavoisier applied Condillac's idea of scientific proof being concomitant with linguistic manipulations that respect semantic rules. Cuvier introduced rules of inference, though very informal ones, to biology. Quetelet saw large-scale social regularities arising from the free choices of individuals and systematised the study of these regularities. Galton took these ideas farther in his attempt to form the applied social science of eugenics. These thinkers by no means constitute a complete array of the tendency toward systematisation; as said previously, systematisation was everywhere after Newton. They are, however, representative examples. Although each of them was working under different circumstances, each applied to his work the idea behind systematisation, that scientific inference can be reduced to a more or less mechanical procedure of applying explicit rules of inference to currently known facts.

It is remarkable that while Newton was an Englishman, all the Newtonian systematisers discussed herein were Continental. This reflects a general tendency. In Great Britain, science was a gentlemanly pursuit; the majority of those who had the leisure time and finances to devote to scientific work were gentlemen of independent means. But science and mathematics were not part of the gentleman's education; he was more concerned with the classics. As a result, most Englishmen who had the opportunity to do science were ignorant of its intellectual tools. They puttered around their laboratories and mixed things together rather haphazardly, and every now and then something interesting would happen, which they would then try to explain. As news of an English discovery would come across the Channel, the French would fit it into a theoretical framework using a precise programme of experimentation. English gentlemen-scientists often seem bumblers, but they did inject an originality into science which the Continental scientists, with the rigidity of their formal education, were unable to provide. Thus every so often an English discovery in practice provided the impetus for a new Continental development in theory.

It is fitting, then, that it was an English insight which transformed the thinking behind this wave of systematisation from intimations in the back of the mind to explicit statements. Much of the genesis of modern algebra took place at Cambridge during the first third of the nineteenth century. In 1812, George Peacock, Charles Babbage, and John Herschel, then undergraduates at Cambridge, formed the Analytical Society, an organisation concerned with the reform of mathematical notation in England. This concern with notation was a sign of things to come; we have seen this mode of thought at work already in Lavoisier during the previous century. Because of the proximity of all these scholars to each other, the notions behind abstract algebra must have been common knowledge among them. Any debate about priority among them, therefore, would be pointless.[19] Although the first textbook on modern, abstract algebra was published by George Peacock in 1830, abstract algebra existed within the Cambridge community far earlier, as is revealed by some unpublished manuscripts of Babbage that predate Peacock's text.[20]

The primary insight behind modern algebra had been in the backs of mathematicians' minds for centuries. Peacock, Babbage, and others merely enunciated it. Historically, the particulars of arithmetic led to the generalisations of algebra. The only numbers available to the first arithmeticians were the positive integers, the "counting numbers". If you think back to the early period in your education during which you first learned about negative numbers, perhaps you may remember how foreign the concept seemed initially. Now that you have internalised it and use it all the time when thinking about algebra, it seems a perfectly natural extension of the integers. But return to your early childhood for the nonce and contemplate: what possible physical significance can a negative number have? If I have only two loaves of bread in my pantry, I can't remove two hundred loaves. Once I've eaten two loaves, I have no more. It seems impossible to subtract two hundred from two. How can the subtraction of a greater quantity from a lesser one have any meaning?

Again, consider a later phase of your education in which you learned about "imaginary numbers". (Perhaps you've not yet studied imaginary numbers. If you haven't, then skip these few sentences; they're not essential.) The obvious etymology of this term says a great deal about the manner in which this new type of number was perceived. Imaginary numbers were initially a means to an end. By inventing a symbol, `i', and setting it equal to the square root of -1, mathematicians were able to overcome the seeming impasse of having produced a term that made reference to an apparently nonsensical quantity. Since there is no number whose square is -1, mathematicians invented such a number, and constructed a consistent mathematical system around it. This business of extending mathematics inductively was enunciated publicly only in 1830 in Peacock's A Treatise on Algebra. He termed it the Principle of the Permanence of Equivalent Forms. What mathematicians were doing, said Peacock (along with his Cambridge colleagues), was not constructing a new type of number, but rather a new symbol. There was no need to think of the symbol `i' as representing any physical quantity, because it was introduced during the process of algebraic manipulation of symbols. It played no role in the construction of mathematical relations from given physical conditions, and it played no role in the final solution. Its appearances and disappearances in equations were strictly the result of pushing symbols around on a piece of paper. Thus algebra, as a science of symbol manipulation, should not be viewed as a generalisation of arithmetic. "Contrariwise", as Tweedledee would say, it is arithmetic that should be viewed as a particularisation of algebra. The only priority that arithmetic has over algebra is a historical one; being an apparent harmony, it was the first to be noticed by humans. Heraclitus would have been delighted at the work of the founders of modern algebra. (Actually, neither algebra nor arithmetic can assume absolute supremacy over the other. In the 1930's, Kurt Gödel observed that although algebraic symbols can represent arithmetical quantities and operations, those same symbols can themselves be numerically encoded, and thereby placed within the domain of arithmetic. So the algebra-arithmetic reduction goes both ways.)
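The algebraists' point, that `i' is a symbol governed by rules rather than a quantity, can be made concrete in a few lines of Python; the pair representation below is my own illustration, not Peacock's notation.

def add(p, q):
    # (a + bi) + (c + di) = (a + c) + (b + d)i, as a rule for pushing pairs around
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, with no appeal to a "square root of -1"
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(mul(i, i))   # (-1, 0): the pair that plays the role of -1

Nothing in these rules asks what `i' "is"; the pair (0, 1) simply behaves, under the stated rules, as a quantity whose square plays the role of -1.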

As always, there were some who felt uncomfortable at the upsetting of the traditional view. The algebraists, they thought, had gone too far. New algebraic entities such as the negative numbers and the imaginary numbers were fine when viewed as means to an arithmetical end, but to give algebra priority over arithmetic would be to abandon reality. What this debate comes down to is the question of what constitutes science. A prominent force in nineteenth-century science was natural theology, the perception of science as an effort to find God's design in Nature. The laws of conventional algebra, as formulated by induction from arithmetic, could not be otherwise. They were the only laws possible. Arithmetic, and hence conventional algebra, must be commutative; 1+2 must be the same number as 2+1. In contrast, the view of algebra as a science of symbol manipulation rendered these laws not natural and ineluctable laws of relationships among numbers, but arbitrary laws of relationships among symbols. Conventional algebra became only one of many possible algebras, and the modelling of its laws upon the laws of arithmetic became only a human arrangement of convenience, not a Divine mandate. One of the most well-known mathematicians of the time, Sir William Rowan Hamilton, wrote of his initial reaction to Peacock's work:

When I first read that work... and indeed for a long time afterwards, it seemed to me, I own... that the author designed to reduce algebra to a mere system of symbols, and nothing more; an affair of pothooks and hangers, of black strokes upon white paper, to be made according to a fixed but arbitrary set of rules: and I refused, in my own mind, to give the high name of Science to the results of such a system; as I should, even now, think it a stretch of courtesy, however it may be allowed by custom, to speak of chess as a "science," though it may well be called a "scientific game."[21]

In the midst of all this educated debate about the construction of algebra there appeared a fair unknown who was to have a great influence on the subject. Unlike most British scientists and mathematicians of the time, George Boole was born into the lower class, the son of a shopkeeper. He struggled to provide himself with more than the usual smattering of education. At sixteen he became a school teacher's assistant, in a position much like that of Dickens's Nicholas Nickleby, though, one would assume, not as abusive. His mathematical education began with his familiarising himself with algebra in preparation for teaching it in his own school. This gave him the perspective for his discovery: it was all new to him. To the Continental mathematicians, algebra was second nature. They did not think about doing algebra; they just did it. Boole recognised that the symbols of algebra were abstractions, that they need not be thought of as representing numbers. Beginning students of algebra learn how to transform problems stated in English into mathematical language. For example: Sixteen years ago, Matthew's age was twice Rachel's. Rachel was two then. How old are Matthew and Rachel now?

Let m be Matthew's current age.
Let r be Rachel's current age.
0.  m-16 = 2.(r-16)          given
1.  r-16 = 2                 given
2.  m-16 = 2.2               substitution [1, 0]
3.  m-16 = 4                 multiplication [2]
4.  4+16 = 4+16              reflexivity
5.  (m-16)+16 = 4+16         substitution [3, 4]
6.  m+(-16+16) = 4+16        associativity of + [5]
7.  m+0 = 4+16               addition [6]
8.  m+0 = 20                 addition [7]
9.  m = 20                   identity for + [8]
10. 2+16 = 2+16              reflexivity
11. (r-16)+16 = 2+16         substitution [10, 1]
12. (r-16)+16 = 18           addition [11]
13. r+(-16+16) = 18          associativity of + [12]
14. r+0 = 18                 addition [13]
15. r = 18                   identity for + [14]
So, from statement [9], Matthew's current age is 20, and from statement [15], Rachel's current age is 18.

Phew! Note that I have chosen to gloss over the details of arithmetic by treating "addition" as a rule of inference; I could get even more picky if I wanted to, by breaking addition down into sub-operations! It has lately been the style in high school algebra texts to batter these nitty-gritty properties into the student's head. Who cares about "reflexivity"? You mean I have to write down that 4+16 = 4+16?! But it's obvious! It's a simple problem; why introduce these overly formal fetters of notation? The motivation for the nitty-gritty algebraic properties that most beginning students of algebra are now compelled to learn has filtered down into the high schools from the universities. At the introductory level it's more of a nuisance than a beautiful simplification. (At least, I remember thinking of it as nothing but a nuisance when I was first learning algebra.) The idea is that using formal rules of inference, we can abstract algebra away from the domain of arithmetic values, and speak not of one canonical algebra but of many possible algebras. Common high school algebra has the properties of reflexivity, commutativity and associativity of + and ., &c. But other algebras do not. In matrix algebra, for example, multiplication of a matrix by another matrix, symbolised by `x', is not commutative.
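The failure of commutativity in matrix algebra is easy to check for oneself; the two matrices below are chosen arbitrarily, purely for illustration.

def matmul(A, B):
    # Row-by-column multiplication of two matrices given as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]] -- not the same product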

Note how the above example was solved. First, a scheme of abbreviation for the problem was introduced. I chose to assign the meaning "Matthew's current age" to the symbol m and the meaning "Rachel's current age" to the symbol r. Consider the following problem:

Shelley and Raymond have each spent sixteen pence on borogoves. Now Shelley has twice as much money left as does Raymond. Raymond has twopence left. How much money did Shelley and Raymond have to begin with?

Sound familiar? This problem is from a different domain, one of money instead of one of age. And yet we already have its solution! All that we have to do is give appropriate meanings to the symbols in the first example above. Let m be the amount of money that Shelley had to begin with, and let r be the amount of money that Raymond had to begin with. Then all the steps in the derivation, which concern the symbols m and r and not any particular meanings assigned to those symbols, still hold, and we can read off the answer to this particular problem using the new scheme of abbreviation: Shelley and Raymond began with twenty pence and eighteen pence, respectively. The rules of algebra can be thought of as merely rules of textual manipulation, rules that tell how to push symbols around on a piece of paper. Arithmetic happens only when one assigns arithmetical meanings to those symbols. This idea of the independence of algebraic rules and symbols from any particular interpretation was the genesis of symbolic logic. The wave of systematisation had finally come crashing back down upon mathematics, and now even mathematics was getting mathematised. The process of reasoning with language was reduced to an algebraic system.
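The point that the derivation concerns only the symbols, and not their interpretations, can be rehearsed mechanically. The sketch below leans on the symbolic-algebra library sympy as a convenience of mine, not anything required by the argument: the two relations are solved once, and the answer is read off under either scheme of abbreviation.

from sympy import symbols, Eq, solve

m, r = symbols("m r")
solution = solve([Eq(m - 16, 2 * (r - 16)), Eq(r - 16, 2)], [m, r])
print(solution)   # {m: 20, r: 18}
# Under one scheme of abbreviation these are Matthew's and Rachel's ages;
# under the other, the pence with which Shelley and Raymond began.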

Boole's text on formal logic, An Investigation of the Laws of Thought, has the tone of a manifesto in places:


That Language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted. It is proposed in this chapter to inquire what it is that renders Language thus subservient to the most important of our intellectual faculties. In the various steps of this inquiry we shall be led to consider the constitution of Language, considered as a system adapted to an end or purpose; to investigate its elements; to seek to determine their mutual relation and dependence; and to inquire in what manner they contribute to the attainment of the end to which, as co-ordinate parts of a system, they have respect....

Definition.--A sign is an arbitrary mark, having a fixed interpretation, and susceptible of combination with other signs in subjection to fixed laws dependent upon their mutual interpretation.

3. Let us consider the particulars involved in the above description separately.

(1.) In the first place, a sign is an arbitrary mark. It is clearly indifferent what particular word or token we associate with a given idea, provided that the association once made is permanent. The Romans expressed by the word "civitas" what we designate by the word "state." But both they and we might equally well have employed any other word to represent the same conception. Nothing, indeed, in the nature of Language would prevent us from using a mere letter in the same sense. Were this done, the laws according to which that letter would be required to be used would be essentially the same with the laws which govern the use of "civitas" in the Latin, and of "state" in the English language, so far at least as the use of those words is regulated by any general principles common to all languages alike.[22]


The first few chapters of The Laws of Thought are well worth reading, although Boole's highbrow prose style may be somewhat daunting. It no doubt arises from his early study of the classics. Boole was addressing himself to a community of thinkers who were already in the habit of reasoning formally using English, so what he accentuated was the correspondence between their informal ideas of logical argument and his algebraic system. Therefore, Boole's examples make a good introduction to formal logic for a student who has been in the habit of thinking about mathematics informally for some time. We shall not give Boole's presentation of the laws of formal reasoning here, because his system, being the first of its kind, appears rather naïve to modern readers.

The Propositional Calculus

When we reason formally, we are employing the notion of incompatibility of statements. Two statements are incompatible when they cannot both be true. For example, the statement "it is raining" is incompatible with the statement "the sidewalk is dry", so if we are told that it is raining we can infer that the sidewalk is not dry. In "natural languages", the languages used by humans, words and their associated syntaxes such as "not", "and", "or", and "if... then..." are employed to express compatibility and incompatibility. These modes of expression can be formalised to yield a basic system of formal logic known as the propositional calculus. Statements are termed "propositions", because they are proposed for consideration, and may turn out to be true or false. Unlike arithmetic, which deals with an infinite universe of values, the propositional calculus concerns only these two values true and false. These are the two truth values. This system is an idealisation of the universe that we deal with when we reason. When we reason, we think not in the binary, black-and-white distinction of true or false, but rather with levels of certainty or confidence, shades of grey. In the propositional calculus, every proposition has a truth value of either true or false. The objective of derivations or proofs in the propositional calculus is to ascertain these truth values. This is a contrived universe, to be sure, without the richness of expression, ambiguities, and power of suggestion characteristic of natural languages. But it is precisely this constraint to rigid formality that makes the propositional calculus such a valuable tool for constructing and analysing logical arguments. It trades expressive power for tractability.

The assertion made when one uses the adverb "not" can be viewed as a statement of the incompatibility of a proposition with itself. To say that some proposition P is incompatible with itself is just a fancy way of saying that P is false, for if P cannot be true at the same time as itself, then it simply cannot be true. It may seem counterproductive to go to the trouble of saying "P is incompatible with P" when all that we want to say is "it is not the case that P", but the motivation here is indeed toward simplicity: we are reducing the many English words that express combinations of compatibilities and incompatibilities to the single, simple concept of incompatibility, and we must fit the word "not" into this paradigm.

Any proposition of the form "P or Q" is true when either or both of P and Q is true, and false only when both are false. Thus it asserts the incompatibility of not-P with not-Q. We have already seen that not-P is equivalent to "P is incompatible with P", and similarly for not-Q, so "P or Q" says that P being incompatible with itself is incompatible with Q being incompatible with itself. Again, this form of expression may seem convoluted, but it is necessary if we are to inquire as to what we mean when we use words such as "not" and "and". So, for example, "it is raining or the sidewalk is dry" says that an absence of rain is incompatible with the sidewalk's not being dry.

Propositions of the form "P and Q" are true only when both P is true and Q is true. This is another way of saying "it is not the case that not-P or not-Q". So if it is raining and the sidewalk is wet, it cannot be the case that neither is it raining nor is the sidewalk wet. This equivalence may seem a triviality, a mere play with words that need not be stated explicitly. The English language makes it seem so. But one must remember that we are treating a system much more basic than any language as complex as English. Thus any statement using "and" may be converted to an equivalent statement using "not" and "or". Since we know how to reduce "not" and "or" to basic relations of incompatibilities, we can therefore reduce "and" to such basic relations. A simpler way of finding an expression for "and" in terms of incompatibilities is to note that "P is incompatible with Q" is false exactly when "P and Q" is true, namely, when both P is true and Q is true. So an "and" is the negation (the "not", to use the word that we have been using) of incompatibility.

Another reduction can be applied to implications, that is, "if... then..." statements. For any propositions P, Q, "if P then Q" has the same meaning as "not-P or Q". Such a statement is false only when its "if" part is true and its "then" part is false, for what it asserts is that whenever the "if" part is true, the "then" part must be true. It makes no claim about the truth of the "then" part when the "if" part is false. Thus, when P is false, "if P then Q" is true, even when Q is false. The only situation in which an implication is false is when its "if" part is true and its "then" part is false. If I can't get you to believe that, I'll eat my hat.
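All of these reductions to incompatibility can be verified mechanically by trying every assignment of truth values to P and Q; the few lines of Python below are only such a check, with the stroke `/' rendered as a small function.

def incompatible(p, q):
    # The stroke `/': P and Q cannot both be true.
    return not (p and q)

for P in (False, True):
    for Q in (False, True):
        assert incompatible(P, P) == (not P)                                          # not
        assert incompatible(incompatible(P, P), incompatible(Q, Q)) == (P or Q)       # or
        assert incompatible(incompatible(P, Q), incompatible(P, Q)) == (P and Q)      # and
        assert incompatible(P, incompatible(Q, Q)) == ((not P) or Q)                  # if... then...
print("Every reduction to incompatibility agrees with the ordinary meaning.")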

Having properly noted how all statements in the propositional calculus can be expressed in terms of incompatibilities, we can now introduce some higher-level symbols that mimic English usage to some extent and therefore will make it easier to translate back and forth between English and the language of the propositional calculus.

Propositions are built from simple propositions and connectives. A simple proposition is a letter that stands for some assertion. For example, one can say, "Let P represent `The Rabbit will be late'" in a scheme of abbreviation. Then the symbol `P' is a simple proposition. Simple propositions are also called atomic propositions, or simply atoms of the language of the propositional calculus, since they cannot be divided into parts. When we begin using formal logic to study mathematical assertions, we will augment the language of the propositional calculus by allowing algebraic statements as simple propositions. (For example, `((x=42)∨(x=54))' would be a legal proposition.) Compound propositions can be built from other compound propositions and atoms using the connectives. Here is a table of connectives and their meanings. Incompatibility is denoted by the operator `/', which can be read as "is incompatible with".
Symbol   English Equivalent   Example    In Terms of Incompatibilities                      Conditions for Truth
∧        and                  (P∧Q)      ((P/Q)/(P/Q))                                      both P and Q are true
∨        or                   (P∨Q)      ((P/P)/(Q/Q))                                      one or both of P, Q is true
~        not                  (~P)       (P/P)                                              P is false
=>       if... then...        (P=>Q)     (P/(Q/Q))                                          Q is true or P is false
<=>      if and only if       (P<=>Q)    (((P/(Q/Q))/(Q/(P/P))) / ((P/(Q/Q))/(Q/(P/P))))    P, Q are both true or both false
This table summarises the syntax and semantics of the propositional calculus. A language's syntax is the way in which its symbols are put together. When you studied parts of speech and sentence diagramming and so forth in school, you were studying the syntax of English. Semantics are rules that say which syntactically correct sentences have meaning, and what meanings they have. For example, "The is half-constructed beard policeman's" is not syntactically correct; its parts of speech are in the wrong places. "The policeman's beard is half-constructed" is syntactically correct, but its semantics are questionable. "The policeman's beard is cut closely" and "The new building is half-constructed" are correct both syntactically and semantically. In the table above, the "Example" column gives the syntax of the connectives, and the "Conditions for Truth" column gives their semantics. Let P represent "the Hatter is mad", let Q represent "the March Hare is mad", and let R represent "it is six o'clock". Then the following symbolic sentences have the following meanings:
Proposition Meaning
(P∧Q) Both the Hatter and the March Hare are mad.
(P∨Q) At least one of the Hatter and the March Hare is mad.
(~(P)) The Hatter is not mad.
(P=>Q) If the Hatter is mad, then the March Hare is mad.
(P<=>Q) The Hatter is mad if and only if the March Hare is mad.
((P∧Q)=>R) If both the Hatter and the March Hare are mad, then it is six o'clock.
Let us consider the connectives one-by-one. But before reading further, use the table to verify for yourself all the relationships that have been stated so far.

`∧' is fairly intuitive. Like the English word "and" when used to connect two clauses, it is a conjunction that goes between the clauses and asserts that both of them are true. `∨', similarly, behaves like the English word "or". It is a mistake to treat `∧' and `∨' as fully equivalent to the English words "and" and "or", respectively. Each of the two conjuncts (propositions joined by an `∧') or disjuncts (propositions joined by an `∨') must be propositions. Thus, the English sentence "x equals forty-two or fifty-four" cannot be translated as `(x=42∨54)', because that is not a well-formed sentence in the language of the propositional calculus; "54" by itself is not a sentence. One must instead write `((x=42)∨(x=54))'. The logical operators `∧' and `∨' are somewhat similar to the arithmetic operators `.' and `+', respectively. See for yourself that `∧' distributes over `∨' just as `.' distributes over `+': (P∧(Q∨R)) = ((P∧Q)∨(P∧R)). Also, both `∧' and `∨' are commutative: (P∧Q) = (Q∧P), and (P∨Q) = (Q∨P). Finally, the identity elements for `∧' and for `∨' are, respectively, true and false, as the identity elements for `.' and `+' are, respectively, 1 and 0: P∧true = P, and P∨false = P.
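
These algebraic properties can also be checked mechanically, by brute force, since each proposition letter has only two possible values. The following short Pascal program is a sketch of such a check (the program and its identifiers are my own, introduced only for illustration); it enumerates every combination of truth values and reports whether the laws just stated hold in each case.

program CheckLaws;
var
  P, Q, R, ok: boolean;
begin
  ok := true;
  for P := false to true do
    for Q := false to true do
      for R := false to true do
      begin
        { distributivity of 'and' over 'or' }
        if (P and (Q or R)) <> ((P and Q) or (P and R)) then ok := false;
        { commutativity of 'and' and of 'or' }
        if (P and Q) <> (Q and P) then ok := false;
        if (P or Q) <> (Q or P) then ok := false;
        { identity elements: true for 'and', false for 'or' }
        if (P and true) <> P then ok := false;
        if (P or false) <> P then ok := false
      end;
  if ok then
    writeln('the laws hold for every combination of truth values')
  else
    writeln('a counterexample was found')
end.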

`~', when placed before a sentence, has the effect of prefixing the English equivalent of the sentence with "It is not the case that". So, for example, using the above scheme of abbreviation, `~(P)' means "It is not the case that the Hatter is mad". Of course, with most propositions, a less cumbersome way of expressing the meaning in English is usually apparent--in this case, "The Hatter is not mad".

`=>' is read as "if... then..." or "implies". The proposition before the arrow (the "if" part) is called the antecedent; the proposition after the arrow is called the consequent. A common misunderstanding of beginning students of formal logic is attributing a causal meaning to statements of implication. Implication is not the same as causation. For example, suppose the sun has just exploded. And suppose, being horribly and irrationally afraid for my person, I have shut myself up inside a heavily shielded room, so that the only event that would be capable of harming me is the explosion of the sun. Suppose also that my teddy bear is in the room with me. The earth being about eight light-minutes from the sun, if, after about eight light-minutes, my teddy bear has ceased to exist, then I will also have ceased to exist. So if we let P represent "my teddy bear has ceased to exist" and let Q represent "I have ceased to exist", P=>Q. But there is no causality in this relationship; my teddy bear didn't kill me, the exploding sun did. All that we are saying is that if my teddy bear has ceased to exist, then it is certain that I have ceased to exist, because the only event that could have destroyed my teddy bear must also have destroyed me.

`<=>', or equivalence, is a shorthand for a double implication. Two propositions are equivalent if they have the same truth value for all possible combinations of truth values of their constituent atoms. For all propositions P, Q, `(P<=>Q)' means `((P=>Q)∧(Q=>P))'. See for yourself that this is what is meant by the English expression "if and only if".
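
One way to see it is to tabulate all four combinations of truth values for P and Q; the column for (P=>Q)∧(Q=>P) agrees with the column for P<=>Q in every row, so the two propositions are equivalent:

P       Q       P=>Q    Q=>P    (P=>Q)∧(Q=>P)    P<=>Q
true    true    true    true    true             true
true    false   false   true    false            false
false   true    true    false   false            false
false   false   true    true    true             true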

The connectives, together with parentheses and the atoms P, Q, ..., constitute the alphabet of the propositional calculus. Just as one can make much more nonsense than words with the alphabet of English, one can make nonsense using the alphabet of the propositional calculus. In English it is possible to violate rules of syntax and semantics in a suggestive manner and thus to give nonsense a sort of meaning. The propositional calculus is too restricted a language to be able to tolerate any nonsense. So, for example `PQ)((R~' is meaningless in the propositional calculus. A string of symbols of the alphabet of the propositional calculus is well-formed if and only if it is meaningful according to the syntactic rules of the propositional calculus. Any well-formed expression can be broken down into a syntax tree according to the syntactic rules that govern its construction. The root node of the tree represents the entire expression, and its children represent the sub-expressions. Similarly, each child has its own children representing its own sub-expressions. Atoms, having no sub-expressions, are represented by leaves in the syntax tree, which have no children. Evaluating a compound expression can be thought of as the process of labelling the sub-trees of its syntax tree with truth values. One is given the truth values of the atoms, so the leaves can be labelled immediately. When all the children of some node in the tree have been labelled, their parent can be labelled, using the semantic rules of the propositional calculus. Using this procedure, one can build up truth values from the leaves to the root. Computer scientists sometimes refer to these labels as attributes of the tree, and to the labelling process as tree attribution or tree decoration.
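
To make the idea of tree attribution concrete, here is a Pascal sketch of a syntax tree for propositions and a function that decorates the tree with truth values from the leaves upward. The type and routine names are my own inventions for this example, not part of the propositional calculus itself.

program AttributeTree;

type
  NodeKind = (AtomNode, NotNode, AndNode, OrNode, ImpliesNode);
  TreePtr  = ^TreeNode;
  TreeNode = record
    kind: NodeKind;
    value: boolean;          { the truth value of an atom (a leaf) }
    left, right: TreePtr     { sub-expressions; right is nil for NotNode }
  end;

var
  HatterMad, HareMad: TreePtr;

function MakeAtom(v: boolean): TreePtr;
var t: TreePtr;
begin
  new(t);
  t^.kind := AtomNode; t^.value := v;
  t^.left := nil; t^.right := nil;
  MakeAtom := t
end;

function MakeNode(k: NodeKind; l, r: TreePtr): TreePtr;
var t: TreePtr;
begin
  new(t);
  t^.kind := k; t^.value := false;
  t^.left := l; t^.right := r;
  MakeNode := t
end;

{ label a syntax tree with a truth value by labelling its sub-trees first }
function Evaluate(t: TreePtr): boolean;
begin
  case t^.kind of
    AtomNode:    Evaluate := t^.value;
    NotNode:     Evaluate := not Evaluate(t^.left);
    AndNode:     Evaluate := Evaluate(t^.left) and Evaluate(t^.right);
    OrNode:      Evaluate := Evaluate(t^.left) or Evaluate(t^.right);
    ImpliesNode: Evaluate := (not Evaluate(t^.left)) or Evaluate(t^.right)
  end
end;

begin
  HatterMad := MakeAtom(true);     { P: the Hatter is mad }
  HareMad   := MakeAtom(false);    { Q: the March Hare is mad }
  { evaluate the tree for ((P∧Q)=>Q), which is true in every case }
  writeln(Evaluate(MakeNode(ImpliesNode,
                            MakeNode(AndNode, HatterMad, HareMad),
                            HareMad)))
end.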

Each of the connectives has a precedence or binding strength. Precedences are used to obviate the need for parentheses in many cases. In high school algebra, multiplication and division have a higher precedence than addition and subtraction, so xy+z has the same meaning as (xy)+z, which is different from x(y+z). Similarly, in the propositional calculus the order of precedences, from highest to lowest, is ~; ∧ and ∨; => and <=>. Here are some propositions with parentheses removed, and the corresponding propositions with parentheses:

P∧Q=>R          ((P∧Q)=>R)
~Q∨R            ((~Q)∨R)
~P∧R<=>Q        (((~P)∧R)<=>Q)

EXERCISES

In exercises 0 to 9, use the following scheme of abbreviation:
B: The butler did it.
M: The maid did it.
D: The dog did it.
Express each of the following propositions in the language of the propositional calculus:

Express each of the following propositions in English:

For exercises 10 to 19, evaluate (i.e., find the truth value of) each of the propositions in the previous exercises for all possible combinations of truth values of B, M, and D. (There are eight possible combinations.)

Proofs in the Propositional Calculus

So far we have been using the propositional calculus only to represent propositions. We can also use it to derive propositions, using inference rules. That is, we can prove theorems in the propositional calculus. An inference rule says that whenever propositions of a certain form are known to be true, then a proposition of another certain form must also be true. Each connective has a use rule and a proof rule. Use rules allow the decomposition of compound propositions into simpler propositions, and proof rules allow building up compound propositions from simpler propositions. When you are writing derivations, you will use a use rule when you have a term bound up in some complex proposition that you want to extract and use. You will use a proof rule when you have derived elements of a compound proposition and want to construct the proposition from them. Here is a table of the inference rules:
∧-proof:    X    Y                  ∧-use:    X∧Y
            --------                          --------
              X∧Y                             X    Y

∨-proof:      Y                     ∨-use:    X∨Y    X=>Z    Y=>Z
            ------------                      --------------------
            X∨Y    Y∨X                                 Z

~-proof:    X=>false                ~-use:    ~~X
            --------                          -----
               ~X                               X

=>-proof:   [assume X               =>-use:   X=>Y    X
               ...                            ----------
               Y]                                  Y
            ---------
              X=>Y

<=>-proof:  X=>Y    Y=>X            <=>-use:  X<=>Y
            ------------                      ------------
               X<=>Y                          X=>Y    Y=>X

false-proof: X    ~X                false-use: false
             --------                          -----
              false                              X
The horizontal bar in each inference rule means "therefore". Whenever we have all the assertions of the forms above the bar, we can derive an assertion of one of the forms below the bar. Examine these inference rules and see that they correspond with an intuitive idea of logical deduction. Derivations are annotated in a style similar to that of the conventional two-column proofs in geometry. At the beginning we set out what we want to prove. Each line is referred to by its number, and each annotation in the right-hand column tells what other lines and which inference rule or theorem were used to derive the assertion on the same line in the left-hand column.

The =>-proof rule is the most complex in terms of notation. Here is a very short sample derivation using the =>-proof rule:
Show P∧Q=>P
0. [assume P∧Q        Show P
1.    P ]             ∧-use, 0
2. P∧Q=>P             =>-proof, 0..1
The idea of =>-proof is to take the lazy way out: all that we have to show is that the consequent is true whenever the antecedent is true. We needn't prove anything about the consequent when the antecedent is false. So we assume that the antecedent is true, and, given that assumption, try to prove that the consequent is true. The lines inside the brackets are called a fantasy. A fantasy is a sequence of lines beginning with an assumption for =>-proof. The "assume" line of the fantasy is annotated with the consequent that we are trying to prove. (See line 0 in the proof above.) When we have established the consequent inside the fantasy, we end the fantasy and then write the implication. Fantasies may be nested, as in the following proof:

Show Q=>(P=>Q)

0. [assume Q              Show P=>Q
1.    [assume P           Show Q
2.       Q]               0
3.    P=>Q]               =>-proof, 1..2
4. Q=>(P=>Q)              =>-proof, 0..3
Confusing fantasy and reality is a fallacy. After a fantasy has been closed, nothing within it can ever be referenced in the derivation. Assertions inside a fantasy may depend upon the fantasy's assumption, which does not hold outside the fantasy. The assumption was made only to establish the consequent of an implication when the antecedent is true. Once the implication has been proved, that assumption and everything that depends upon it must be discarded. Consider the following fallacious attempt at a proof:

P: I have a hammer.
Q: I hammer in the morning.
R: I hammer in the evening.
S: The moon is made of green cheese.
Givens:
0. P=>Q If I have a hammer, then I hammer in the morning.
1. Q=>R If I hammer in the morning, then I hammer in the evening.
2. R=>S If I hammer in the evening, then the moon is made of green cheese.

Show S (This can't be done!)

0. [assume P       Show S
1.    Q            =>-use, given 0, 0
2.    R            =>-use, given 1, 1
3.    S ]          =>-use, given 2, 2
4. P=>S            =>-proof, 0..3
5. S               =>-use, 4, 0    WRONG!
This proof is a legitimate derivation of "If I have a hammer, then the moon is made of green cheese" entwined with the fallacious derivation that the moon is made of green cheese. (Looks like I don't have a hammer.) The final line incorrectly concludes S by referencing the assumption in line 0, which is no longer in effect.

Be careful with assumptions. Don't just assume P when you want to show P. Such a freedom of assumption would circumvent the need for proof. Remember, an assumption introduces a fantasy, and outside the fantasy the assumption is invalid. So such a frivolous assumption would be of no use to you. If all else fails, you can always use the method of proof by contradiction. In this set of inference rules, it's called "~-proof". The same method is used in proving geometry theorems. To prove that an assertion is true, one can assume that it is false (in other words, assume its negation) and then show that such an assumption leads to a contradiction. Don't confuse ~-proof with false-use! This mistake is common among beginners with this system of notation, so be sure that you understand the difference between these two rules. ~-proof involves terminating a fantasy, whereas false-use does not. Thus, these two rules have very different effects upon the structure of a proof.

The rules that involve the special proposition false may seem strange. Whenever we write a proposition on a line within a proof, we are asserting that that proposition is true (within whatever fantasy we may currently be in). So when we write false, we are asserting that false is true. (War is peace! Freedom is slavery! The moon is made of green cheese!) Obviously, false can never be true. When we find that we are able to write false by using the inference rules, therefore, we must have made an assumption that has caused a contradiction. This is how the three inference rules that use false are set up. false can be proven from any two contradictory propositions. false can be used to derive any proposition at all, since, if there is a contradiction, any proposition can be asserted. When false occurs inside a fantasy, the assumption upon which the fantasy is founded must be contradictory, and therefore the fantasy can be closed and its assumption negated, using ~-proof.

As in the examples here, each time you enter a fantasy, annotate the proof with what you are currently trying to show. Remember, in an =>-proof fantasy, one assumes the antecedent and tries to prove only the consequent. This sub-proof is simpler than the main proof in which it is embedded. Within the fantasy, think not of what you are attempting to establish at the conclusion of the entire proof, but only of what you are attempting to establish at the conclusion of this sub-proof. When you are within such a sub-proof, do not let your eyes wander up to previous annotations, because then you may become confused as to what you are attempting to show.

The rest of the inference rules can be best understood by seeing them at work in derivations. You should examine some examples and work through some yourself. Some examples and exercises are provided here, and more can be found in any elementary text on logic, in slightly varying notations.

Show (P=>Q)<=>(~Q=>~P) (contrapositive)

0.  [assume P=>Q             Show ~Q=>~P
1.     [assume ~Q            Show ~P
2.        [assume P          Show false
3.           Q               =>-use, 0, 2
4.           false ]        false-proof, 3, 1
5.        P=>false          =>-proof, 2..4
6.        ~P ]               ~-proof, 5
7.     ~Q=>~P ]              =>-proof, 1..6
8.  (P=>Q)=>(~Q=>~P)         =>-proof, 0..7
9.  [assume ~Q=>~P           Show P=>Q
10.    [assume P             Show Q
11.       [assume ~Q         Show false
12.          ~P              =>-use, 9, 11
13.          false]         false-proof, 10, 12
14.       ~Q=>false         =>-proof, 11..13
15.       ~~Q                ~-proof, 14
16.       Q ]                ~-use, 15
17.    P=>Q ]                =>-proof, 10..16
18. (~Q=>~P)=>(P=>Q)         =>-proof, 9..17
19. (P=>Q)<=>(~Q=>~P)       <=>-proof, 8, 18

Show ~(P∨Q)<=>~P∧~Q (DeMorgan's ∨-Law)

0.  [assume ~(P∨Q)            Show ~P∧~Q
1.     [assume P              Show false
2.        P∨Q                 ∨-proof, 1
3.        false ]             false-proof, 2, 0
4.     P=>false               =>-proof, 1..3
5.     ~P                     ~-proof, 4
6.     [assume Q              Show false
7.        P∨Q                 ∨-proof, 6
8.        false ]             false-proof, 7, 0
9.     Q=>false               =>-proof, 6..8
10.    ~Q                     ~-proof, 9
11.    ~P∧~Q ]                ∧-proof, 5, 10
12. ~(P∨Q)=>~P∧~Q             =>-proof, 0..11
13. [assume ~P∧~Q             Show ~(P∨Q)
14.    ~P                     ∧-use, 13
15.    ~Q                     ∧-use, 13
16.    [assume P∨Q            Show false
17.       [assume P           Show false
18.          false ]          false-proof, 17, 14
19.       P=>false            =>-proof, 17..18
20.       [assume Q           Show false
21.          false ]          false-proof, 20, 15
22.       Q=>false            =>-proof, 20..21
23.       false ]             ∨-use, 16, 19, 22
24.    P∨Q=>false             =>-proof, 16..23
25.    ~(P∨Q) ]               ~-proof, 24
26. ~P∧~Q=>~(P∨Q)             =>-proof, 13..25
27. ~(P∨Q)<=>~P∧~Q            <=>-proof, 12, 26

EXERCISES

Prove the following theorems:

Program Correctness and the Predicate Calculus

Why do we need logic? Formal logic constricts one's thinking, fetters one's creativity. But formal logic also constricts one's thinking to provable statements and eliminates uncertain reasoning. The argument for formal logic is that of Mister Spock against Doctor McCoy: certainly we need logical, analytical thinking, as much as we need its counterpart. If we restrict our reasoning to formal mathematics but still allow our intuitions to jump outside of the confines of formal mathematics, then, as Wordsworth said, "the prison unto which we doom / Ourselves no prison is." Therefore one should not reject formal logic as dry and boring; it has its place.

My first computer had a forty-column, all-uppercase display and came with a whopping sixteen kilobytes of memory. Even by current standards, less than ten years after it was produced, this machine is antiquated. But it served my purposes. Like most students of programming, I learned haphazardly. I read the manuals that came with the machine, hacked in BASIC, eventually learned the assembly language for its microprocessor. I became somewhat renowned locally, and people began to use my programs. This went on for years, all through high school, but all the time I was ignorant. I had no idea of what was behind my programming.

When I wrote a program, I would feel it out by intuition, writing lines of code as I went along. When I was done I would test it on what I thought were some representative examples. Often, for any reasonably complex program, I would find bugs, and then I would go back into the code and fix them. I worked not so much by pure reasoning as by observation to make my programs correct. Usually, by the time I finished debugging, my program was as correct as it would have been had I reasoned it out logically instead of hacking it, though usually not as clear. But there were some exceptions. When I was in junior high school a national (but fairly insignificant) computer periodical printed a short program that I had written to renumber lines of programs written in BASIC. To be brief, it didn't work. I ended up sending the magazine a letter of retraction. Of course, they never again accepted anything from me. I had not reasoned it out, and I had not tested it thoroughly. The design of test cases is always haphazard; the only way to be sure that a set of test cases is truly representative of all possible inputs to a program is to analyze the program logically. And, if one finds it necessary to do that, why not just use logic to help one design the program correctly in the first place?!

There are many examples of situations in which the use of logic and the careful consideration of the assumptions upon which one's program depends would have prevented or corrected significant programming errors. Galaga, a popular arcade game, is structured as a progression of "levels". On each level, the player must defend against attackers that enter the screen in a particular pattern. Upon completion of a level, the level number is incremented and play proceeds to the level indicated by this number. So a game begins on level 1 and proceeds in order through level 2, level 3, and on up, until all the player's spaceships have been destroyed by attackers. However, the programmer of Galaga neglected to note that this program could store only level numbers from 0 to 255. If you know a little about computer architecture, you see why: this variable was being stored in a byte, which is eight binary (i.e., base 2) digits wide. Just as the odometer on a car can record only mileages from 0 to 99 999, a byte can store only numbers in binary from 0 to 11111111, which are 0 to 255 in decimal. The effect of this is that when a player completes level 255, the level number flips to 0. Since 0 is not a legal level number, the machine cannot find a pattern for it, and appears to become stuck in an endless loop trying to locate a nonexistent pattern. The only way out of this is to reinitialise the machine. This bug usually goes unnoticed, because most players are not proficient enough to reach level 255. I detected this bug in the summer of 1985. If the programmer had been keeping track of his assumptions, he would have realised that eight binary digits are not always sufficient and that therefore either the level number should use more binary digits or there should be a test in the program that checks whether the level number has reached 255, and, if so, performs some appropriate action, such as ending the current game and giving the player a free game with which to continue playing.
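
A few lines of Pascal are enough to illustrate the arithmetic of the oversight (the variable name is invented here for illustration; this is not Galaga's actual code, which would have been written in assembly language):

program LevelWrap;
var
  level: integer;
begin
  level := 255;
  { a byte can hold only the values 0 to 255, so incrementing 255 }
  { wraps around to 0, like an odometer rolling over }
  level := (level + 1) mod 256;
  writeln('after level 255 the counter holds ', level)
end.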

In the fall of 1988, Robert Morris, a first-year graduate student in the Department of Computer Science at Cornell University, wrote a worm[23] program that was meant to be benign and silent and only to infiltrate host machines. These machines were parts of the Internet, a conglomeration of computer networks. Had his program worked as intended, it would have been a valuable demonstration of the vulnerability of many networked computers to malicious attack. Unfortunately, Morris was not conscientious in his implementation. His code contained many bugs, the most serious of which was in a routine designed to prevent hosts from becoming clogged with multiple copies of the worm. This part of the program listened for other copies of the worm running on the same machine, and killed some of them if any were found. But in the process some copies of the worm were excluded from further killings and thus became immortal. Also, worms marked for killing did some intensive computation before terminating and thus loaded down the computer. All these properties were buried in convoluted and obtuse code, and their deleterious potential apparently went unnoticed by the author. The worm caused havoc on the Internet, causing many sites to disconnect themselves in panic and many people to spend their time attempting to prevent infestation. Morris left the university shortly afterward and has since been indicted under the Computer Fraud and Abuse Act. If he had invested time in examining the assumptions under which his program was operating and the assertions that it established, he would not have made these errors.

A program can be viewed as a sequence of operations or statements. These operations can be as simple as hardware-level machine instructions or as complex as whole subroutines. They alter the state of the machine, that is, they change the values of variables and alter the flow of control in the program. In between statements, assertions about the state of the machine can be made. If one knows the current state, and one is given a statement to be executed, one can predict the state of the machine after the execution of that statement. This is the debugging approach: look at the code, find out what's happening in the machine, and determine how what's happening differs from what you want to happen. Conversely, if one knows the current state and has specified the state that is to be established, one can construct a statement or sequence of statements that will take the machine from here to there, from the current state to the desired state. This approach is basic to what David Gries terms "the science of programming". If these logical assertions that express the state of the machine are constructed at the same time as the statements that they describe, then the program and the proof of correctness can be developed hand-in-hand. When the programmer completes this process, they will have a finished program and a proof of its correctness, and will have spent no time on debugging. The trade-off, though, is that it takes longer to be logically meticulous about program statements than it does to blurt out the program and worry later about correctness. In situations where time is limited and bugs would not have catastrophic effects, hacking still has its virtues. However, in abstract algorithms and in programs or modules of programs that have well-defined tasks, development of proof and program hand-in-hand saves time and effort.

A logical assertion that is assumed to be true just prior to the execution of a statement is called the precondition of that statement. If the statement is executed in a context in which its precondition is true, then it establishes its postcondition. For example:

precondition: x=42

y := x

postcondition: (x=42)∧(y=42)

If the postcondition of one statement matches or implies the precondition of another statement, then those two statements can be chained together into one compound statement that has the precondition of the first and the postcondition of the second:

{P} S0 {Q}    {Q} S1 {R}
------------------------
     {P} S0;S1 {R}

(Here {P} S {Q} is shorthand for "statement S, executed with precondition P, establishes postcondition Q".)
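
For example, the assignment shown above can be chained with a second assignment whose precondition its postcondition implies (the second statement and the variable z are introduced here only for illustration):

precondition: x=42

y := x

intermediate assertion: (x=42)∧(y=42)

z := y+1

postcondition: (x=42)∧(y=42)∧(z=43)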

There are formal systems employing formal logic to prove programs correct, but the discussion of the application of formal logic to program correctness will be more informal in this book.

Before treating the use of logical assertions within programs to prove programs correct, we must extend the propositional calculus. Our extension is called the predicate calculus, and this name is descriptive. The fundamental units with which we have been working in the propositional calculus are sentences, complete propositions. The predicate calculus allows us a finer resolution. Sentences can be constructed from separate predicates and subjects. In addition, the predicate calculus allows us to write proofs about sentences with the wordings "all" and "some". These are called universally quantified and existentially quantified sentences, respectively. Predicates are templates that can be attached to a fixed number of subjects. You can think of a predicate as having a specified number of sockets, each of which can hold a subject. In order for the predicate to be applied, all the sockets must be filled by subjects. A predicate having n sockets is called an n-ary predicate. For example, here is a definition of a unary predicate:

F(x): x is my friend

`x' is underlined in the sentence in the scheme of abbreviation to show that it is a placeholder, a marker, for a socket. When a subject is substituted in, all occurrences of the corresponding placeholder are replaced by the subject. For example, consider the following scheme of abbreviation, and some propositions constructed using it:

A : Parmenides F(x) : x says that all is One

B : Zeno G(x, y) : x defends y

H(x) : x is a deconstructionist

I(x) : x is Jacques Derrida

F(A) : Parmenides says that all is One.

G(B, A) : Zeno defends Parmenides.

H(B) : Zeno is a deconstructionist.

H(B)∧F(A)=>I(B) : If Zeno is a deconstructionist and Parmenides says that all is One, then Zeno is Jacques Derrida.

We now introduce two new symbols, `∀' and `∃', the universal and existential quantifiers. A proposition can be quantified by preceding it with one or more quantifiers. The universal quantifier `∀' says that any sentence that can be constructed from the following predicate by substituting the same subject for each occurrence of the quantified variable is true. This holds for all subjects in the universe. This is why the universal quantifier is an upside-down `A', meaning "for all". The existential quantifier `∃' says that such a sentence is true for at least one subject in the universe, but says nothing about which particular subject it might be. This is why the existential quantifier is a backwards `E', meaning "there exists". `∀' and `∃' have the same precedence as `~', so parentheses must be used to apply them to sentences that use `∧', `∨', `=>', and `<=>'. Using the scheme of abbreviation above, we can construct a few examples. (For simplicity in using these predicates, our universe of discourse here will be restricted to the set of all people.)

∀xF(x) : Everyone says that all is One.

∃xG(x, A) : Someone defends Parmenides.

∃xG(A, x) : Parmenides defends someone.

∀x(H(x)=>F(x)) : All deconstructionists say that all is One.

∃x(H(x)∧~F(x)) : There is some deconstructionist who does not say that all is One.

∃x∀yG(x, y) : Someone defends everyone. (That is, there is some one particular person who defends everybody.)

∃x∀yG(y, x) : Everyone defends someone. (That is, there is some one particular person who is defended by everyone. Sort of like Oliver North.)

∀x∃yG(x, y) : Everyone defends someone. (The person being defended can be a different person for each defender.)

The final three examples above demonstrate the potential for ambiguity in natural languages even when describing very simple quantifications. You should be wary of such ambiguities when translating between a natural language and logical notation, and when you are thinking in a natural language.

As in the fourth and fifth examples above, `=>' is often used with `∀' and `∧' with `∃' to restrict the scope of a quantification. In a proposition of the form ∀x(G(x)=>H(x)), the antecedent can be thought of as a gate or a guard. For all subjects x, if x satisfies the guard G, then it must satisfy the predicate H. However, if x does not satisfy G, then no assertion about it is made; it escapes the quantification. Similarly, in a proposition of the form ∃x(G(x)∧H(x)), where G is some identifying property such as "is a deconstructionist", the application of H is restricted to those subjects that satisfy G. It is a mistake to confuse the connectives involved in these two cases. Think about the meanings of propositions of the forms ∀x(G(x)∧H(x)) and ∃x(G(x)=>H(x)), and recognise why each says something quite different from what is intended.

A variable that occurs next to a quantifier is said to be bound within the quantified expression. Variables that are not bound within an expression are free. So, for example, in the proposition ∀yG(x, y), x is free and y is bound. (The possibility of unbound variables will be discussed shortly.) These definitions are important in what follows. To make derivations using quantifiers, we introduce two new derivation rules and two proof techniques.

∀-proof:    F(k), where k is arbitrary        ∃-proof:    F(A)
            --------------------------                    ------
                     ∀xF(x)                               ∃xF(x)

∀-use:      ∀xF(x)                            ∃-use:      ∃xF(x)
            -------                                       --------------------------
             F(A)                                         F(k), where k is arbitrary

The ∀-use rule states that if it is known that a predicate (that is, either an atomic predicate or a compound proposition in which one or more variables are bound) is true for all subjects, then it can be derived that that predicate is true for any particular subject. This is a weakening of the original assertion, since although ∀xF(x) => F(A), ~(F(A) => ∀xF(x)). The ∃-proof rule states that if it is known that a predicate is true for a particular subject, then it can be derived that that predicate is true for some (unspecified) subject. Again, this is a weakening of the assertion, since the derived proposition holds no information about which particular subject or subjects the predicate applies to. The other two rules are more complicated. The ∀-proof rule states that if a property can be proven for an arbitrary subject, then that property holds for all subjects. What we mean by "arbitrary subject" is a subject that has no special, distinguishing properties. For if a property can be proven for a subject that is so plain-vanilla that it could be any subject, then, intuitively, that property holds for all subjects. Formally, what we mean by "arbitrary" is that the variable shall not have occurred anywhere in the proof up until the point at which it is introduced as arbitrary. For if no proposition contains the variable, then we cannot possibly have any information about it, and therefore it is arbitrary, a tabula rasa. Note that this arbitrary variable is unbound. The ∃-use rule states that if it is known that a property F holds for some subject, then that unknown subject can be given a name, in the form of an arbitrary variable. Again, in order for this rule to be valid, the variable that is introduced by it must not have occurred before anywhere within the proof.

For the purposes of the present discussion, it is sufficient that the reader understand how to write expressions in the predicate calculus; it is not necessary to know how to construct proofs in the predicate calculus. For those who are interested, we briefly present one sample proof and a few exercises. More may be found in any text on elementary logic.

Show ~∀xF(x) <=> ∃x~F(x)

0.  [assume ~∀xF(x)           Show ∃x~F(x)
1.     [assume ~∃x~F(x)       Show false
2.        [assume ~F(k)       Show false
3.           ∃x~F(x)          ∃-proof, 2
4.           false ]          false-proof, 3, 1
5.        ~F(k)=>false        =>-proof, 2..4
6.        ~~F(k)              ~-proof, 5
7.        F(k)                ~-use, 6
8.        ∀xF(x)              ∀-proof, 7
9.        false ]             false-proof, 8, 0
10.    ~∃x~F(x)=>false        =>-proof, 1..9
11.    ~~∃x~F(x)              ~-proof, 10
12.    ∃x~F(x) ]              ~-use, 11
13. ~∀xF(x) => ∃x~F(x)        =>-proof, 0..12
14. [assume ∃x~F(x)           Show ~∀xF(x)
15.    ~F(y)                  ∃-use, 14
16.    [assume ∀xF(x)         Show false
17.       F(y)                ∀-use, 16
18.       false ]             false-proof, 17, 15
19.    ∀xF(x)=>false          =>-proof, 16..18
20.    ~∀xF(x) ]              ~-proof, 19
21. ∃x~F(x) => ~∀xF(x)        =>-proof, 14..20
22. ~∀xF(x) <=> ∃x~F(x)       <=>-proof, 13, 21

EXERCISES

0. Prove that ~∃xF(x) <=> ∀x~F(x).

In exercises 1 to 5, let the universe of discourse be the set of all people, let L(x, y) denote "x loves y", and let A, B, and C be people. Express the given propositions in the language of the predicate calculus.


6. What seems strange about the proposition in exercise 5?

Proving Loops Correct

The iteration statement or loop is a common programming language construct. Being so common, it is all too often taken for granted. A logical formalisation of loops can guide the programmer in the construction of loops, and make them clearer and more efficient. We have already seen how statements accompanied by preconditions and postconditions can be concatenated by matching up the conditions, and how, given a pair of conditions, a sequence of statements can be constructed to accomplish the transition from one condition to the other. Now we will examine correctness proofs for loops.

Like all statements, the loop has a precondition and a postcondition. There are also two other conditions associated with a loop. These are the invariant and the guard. The invariant is an assertion in terms of the loop variables that is always true at the beginning and at the end of every iteration; it is both a precondition and a postcondition of the sequence of statements that constitutes the loop body. The name is logical; the truth of the invariant does not vary from one iteration to the next. Execution of a loop begins at the beginning of the first iteration. Since the invariant must be true at the beginning and at the end of every iteration, and in particular at the beginning of the first iteration, it follows from this that the precondition of the loop must imply the invariant. The precondition is established by statements preceding the loop itself that initialise the loop's variables. The guard is so called because it decides whether or not the loop will continue on to another iteration; it guards entry to the loop body. The guard is evaluated before every iteration, and when it is false, the loop terminates. As well as these conditions, there is a decreasing bound function, a function of the loop's variables that yields an upper bound on the number of iterations yet to be executed. (An upper bound on a function f is simply a function b such that b(x)>=f(x) for all values of x, even if the precise value of f(x) cannot be determined.) The bound function can never become negative while the loop is running, so if it decreases on every iteration, it must eventually reach zero; and since the number of iterations yet to be executed never exceeds the bound function, this requirement ensures that the loop eventually terminates instead of executing infinitely. Invariants and bound functions may refer both to explicit variables and to implicit variables. By "implicit variables" are meant abstract variables whose values are unknown at the time that the algorithm is being defined, such as "the number of characters yet to be typed in by the user" or "the number of days between now and September 13th, 1999". This is permitted because the invariant and the bound function are abstract statements; a Pascal compiler, for example, does not need to know about them, and therefore they need not be phrased explicitly. Here is the general form of a loop, with the assertions labelled with letters and appearing in the locations in which they must hold:

...initialisation...

precondition: Q

do B ->

invariant and guard: P∧B

...loop body...

invariant: P

od

postcondition: R

Here is the same general form of a loop, expressed in the Pascal programming language:

...initialisation...

precondition: Q

while B do

begin

invariant and guard: P∧B

...loop body...

invariant: P

end

postcondition: R

Note that the assertions that do not constitute part of the language's syntax (i.e., all except the guard, in the Pascal language) must be expressed as comments. When annotating programs, each condition is written only once, in order to save space and to eliminate confusion. So a fully annotated loop in Pascal has the following form:

...initialisation...

precondition: Q

invariant: P

bound: b

while B do

begin

...loop body...

end

postcondition: R

Often, the precondition and the postcondition will be obvious from the invariant. In such situations, they may be omitted. Some loops are so trivial that they need not be annotated at all. A loop that initialises an array by reading in all its elements, for example, is trivial. There are two main considerations in deciding what amount of annotation should be used for any given loop: The loop should be made logically understandable, but, at the same time, the annotation should not clutter the scene so much that it becomes confusing. The more you program, the better your sense of balance between these two forces will become.

Note that the precondition of the loop body must be (implied by) the conjunction of the invariant and the guard, since these are the conditions that are true at the beginning of every iteration. Similarly, the invariant must be the postcondition of the loop body, because it is true at the end of every iteration. A loop is correct if and only if the following relationships hold:

* The precondition implies the invariant; that is, the initialisation establishes the invariant (Q => P).
* The loop body, begun with the invariant and the guard both true, reestablishes the invariant.
* The invariant and the negation of the guard together imply the postcondition (P∧~B => R).
* Every iteration of the loop body decreases the bound function.
* The bound function can never become negative while the guard is true.

The best approach is not to write the entire loop and only then to verify that these conditions hold, but to develop the conditions and the relationships among them as the loop is being conceived. As one begins to write the loop, the postcondition and some form of precondition are already fixed; they are dictated by the problem definition. The first step is to develop the invariant. This should be accomplished by weakening the postcondition. We know that we want the invariant to imply the postcondition when combined with the negation of the guard, so it is reasonable to form the invariant by weakening that postcondition. Thus the invariant can be viewed as a compromise between the precondition and the postcondition. Gries[24] describes four general methods for weakening a postcondition. These include replacing a constant in the postcondition by a loop variable or a function of the loop variables, enlarging the range of a variable from the range specified in the postcondition, deleting a conjunct from the postcondition, and combination methods. Since the precondition of the loop must imply the invariant, a loop initialisation should be constructed that sets up the loop variables so that the invariant is established by the time the loop is entered.

A loop solves a problem by taking small bites out of it until the whole problem has been digested. We can make this case literal with our first example. Suppose that one is faced with the problem of eating a hamburger that can be divided into NumBites separate bites, numbered from 0 to NumBites-1. We are given the operation TakeBite(k), which takes the kth bite of the burger. We want to write a loop whose precondition is that the whole burger is sitting on the plate and whose postcondition is that the whole burger has been eaten. An obvious compromise between these two conditions is that part of the burger has been eaten. More formally, we define a variable k such that at the beginning and at the end of every iteration, the first k bites of the burger have been eaten. We want a guard whose negation, when combined with this invariant, implies the postcondition. The obvious choice for this guard is that k does not equal NumBites, for when the negation of this is true (i.e., k does equal NumBites), then the invariant states that all NumBites bites in the burger have been eaten. A simple bound function for this loop is NumBites-k. Here is the complete loop. Note that the two statements that constitute the loop body are written to accomplish the operations of decreasing the bound function and maintaining the invariant.

k := 0;

precondition: The burger is uneaten (i.e., 0 bites have been eaten), and k = 0.

invariant: The first k bites of the burger have been eaten.

bound: NumBites-k

do k!=NumBites ->

TakeBite(k);

k := k+1

od

postcondition: The entire burger has been eaten (i.e., all NumBites bites have been eaten).

I have taught many students who feel uncomfortable with and even hostile to the notion of loop invariants simply because they do not properly understand the nature of loop invariants. I will try here to prevent the development of some common misconceptions. If you pay close attention to only one small portion of this section, let it be these two paragraphs. The invariant is a condition. It is not a command; its nature is declarative, not imperative. "x := 0" cannot be an invariant, but "0<=x<42" can. I have had some students who, when asked to write an invariant, wrote a narrative description of what the loop was doing, such as this attempt at an invariant which occurred in a lexical scanner, in a loop whose function was to read in a string of numeric characters and construct the integer represented by them: "Start with CurrentCharacter in ['0'..'9'] and NumberValue = 0. Add each digit in the input onto the end of NumberValue. Stop when CurrentCharacter is no longer a digit." This is an adequate description of what the loop is doing, but it is not an invariant. It is rather a muddled combination of precondition, invariant, guard, and postcondition. The invariant in this case is "The prefix of the digit string that has been scanned so far is the decimal representation of NumberValue". The corresponding loop looks like this:

NumberValue := 0;

do CurrentCharacter in ['0'..'9'] ->

NumberValue := NumberValue.10 + (ord(CurrentCharacter) - ord('0'));

CurrentCharacter := ReadNextCharacter()

od

The guard is that CurrentCharacter is a digit. Initially, the prefix is the null string (i.e., the string that comprises zero characters), because none of the digits have been scanned, and the corresponding accumulated value in NumberValue is 0. (We assume for simplicity that the null string is a valid representation for the number 0.) When the guard becomes false, all the digits have been scanned, so the current prefix is the entire digit string, and so the invariant says that NumberValue must be the value whose decimal representation is the entire digit string. This is the postcondition.
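
In Pascal, the same loop might be annotated and written as in the following sketch, which assumes that CurrentCharacter (a char variable) and the routine ReadNextCharacter are supplied by the surrounding scanner:

{ invariant: the prefix of the digit string that has been scanned so far
  is the decimal representation of NumberValue }
NumberValue := 0;
while CurrentCharacter in ['0'..'9'] do
begin
  NumberValue := NumberValue*10 + (ord(CurrentCharacter) - ord('0'));
  CurrentCharacter := ReadNextCharacter
end;
{ postcondition: NumberValue is the value whose decimal representation
  is the entire digit string }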

In order for the invariant to be relevant to the loop, it must be phrased in terms of the loop variables. "The sky is blue" is certainly invariant, in that it is always true at the beginning and at the end of every iteration of a loop. But such an invariant, one that is true for any loop, has nothing to say about how a particular loop should be constructed. Remember that you can use implicit loop variables as well as explicit ones in defining the invariant. Because the invariant makes a general statement about every iteration of a loop, you should avoid special cases and words such as "except" in defining invariants. If these occur, they are probably telling you that there is a simpler way of writing the loop than the one that you are thinking of.

For another example, suppose I want to write a loop to print the squares of the integers from 0 to 9. The precondition of the initialisation is simply "true", a dummy precondition that asserts nothing. The postcondition of the loop is "the squares of all the integers from 0 to 9 have been printed". A good way to weaken the postcondition in this case seems to be to replace the constant 9 by a loop variable, which we will call i. This yields the invariant "the squares of all the integers from 0 to i-1 have been printed". (Since it is more natural to let i start out at 0 instead of at -1, we choose this form of the invariant over the more naïve `the squares of all the integers from 0 to i have been printed', along with the appropriate initialisation i := 0 instead of i := -1.) The guard is dictated by the way in which the postcondition has been weakened; we know that its negation when combined with the invariant must imply the postcondition, so the guard must be i!=10. It remains to write the body of the loop. This should always be done last. The loop body has two goals: to decrease a bound function, and then to reestablish the invariant, which may have been violated when the bound function was decreased. (Remember, it is only at the beginning and at the end of each iteration that the invariant must be true. It can become false in the middle of the loop body, as long as it is reestablished by the end of the loop body.) A suitable bound function here is 10-i, along with an increment statement i := i+1 in the loop body to decrease it. This is the tightest possible bound; it gives exactly the number of iterations remaining. Bound functions are not always as precise as this. If i is increased and nothing else is changed, the invariant no longer holds; the (i-1)th square has not yet been printed. The invariant can be reestablished by a statement that prints that square. So the full loop, with initialisation, is this:

i := 0;
precondition: i=0, and no squares have been printed
invariant: ∀k(0<=k<i => k² has been printed)
bound: 10-i
do i != 10 ->
write(i²);
i := i+1
od
postcondition: ∀i(0<=i<10 => i² has been printed)
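
Rendered in Pascal, with the assertions written as comments that the compiler ignores, the loop might look like this sketch:

i := 0;
{ precondition: i=0, and no squares have been printed }
{ invariant: the squares of all the integers from 0 to i-1 have been printed }
{ bound: 10-i }
while i <> 10 do
begin
  writeln(i*i);
  i := i+1
end;
{ postcondition: the squares of all the integers from 0 to 9 have been printed }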

The "Russian peasant" multiplication algorithm is a simple way of implementing multiplication using the two simpler operations add and shift. It is used in some electronic calculators. A shift operation is a simple way of multiplying or dividing a binary number by two. A left shift moves all the digits left one place and concatenates a 0 on the right. A right shift moves all the digits right one place, dropping the rightmost digit. A left shift in decimal representation multiplies by ten, and a right shift in decimal representation divides by ten. (If the number is not a multiple of ten, then the remainder from the division is discarded.) 42 when shifted left becomes 420. 54 when shifted right becomes 5. Similarly, the binary number 11 (3 in decimal) when shifted left becomes 110 (6=3.2), and when shifted right becomes 1. The Russian peasant algorithm can be implemented as a loop, and to find its invariant we combine the precondition and the postcondition of the problem. The precondition is that we are given two numbers a, b whose product we must find. The postcondition is that we have found one number s that is the product. A natural compromise between these two situations is that we have three quantities x, y, s such that xy+s is the product. To establish this invariant at the beginning of the algorithm, we can make the initial values of x and y the two multiplicands, and set s to 0:

the multiplicands a, b have been provided by another part of the program

x := a;

y := b;

s := 0;

precondition: (xy = ab)∧(s = 0)

invariant: xy+s = ab

<...loop...>

postcondition: s = ab

So, although we have yet to do any work on the computation, the invariant holds at the beginning of the first iteration of the loop, because xy+0 is indeed the product. Given the invariant, the postcondition will be established when xy = 0, so the loop guard must be xy != 0. In fact, x, not y, is the number that our algorithm will cause to become 0, so we can simplify the guard to x != 0. (Since the problem is symmetric in x and y (i.e., xy is the same number as yx), we could just as well have constructed the algorithm so that y goes to 0 instead of x. The choice is arbitrary.) A bound function for the loop is the value of x. The body of the loop must decrease the bound function while maintaining the invariant. Here it is. (We assume the existence of functions ShiftLeft and ShiftRight, which perform the left and right shift operations on their arguments, and RightmostBit, which returns the rightmost binary digit of its argument.)

the multiplicands a, b have been provided by another part of the program

x := a;

y := b;

s := 0;

precondition: (xy = ab)∧(s = 0)

invariant: xy+s = ab

bound: x

do x != 0 ->

if RightmostBit(x) = 0 ->

x := ShiftRight(x);

y := ShiftLeft(y)

[] RightmostBit(x) = 1 ->

s := s+y;

x := x-1

fi

od

postcondition: s = ab

The loop body is an alternative statement containing two cases, each of which decreases the bound function while maintaining the invariant. (In Pascal, this statement would have the form "if RightmostBit(x) = 0 then ... else ...", with one case under the "then" and the other under the "else".) If the rightmost binary digit of x is 0, then x is shifted right (divided by two), y is shifted left (multiplied by two), and s is left unchanged. Dividing x by two and multiplying y by two does not alter the product xy, so the invariant is maintained. At the same time, the bound function x is decreased. If the rightmost binary digit of x is 1, then it is changed to 0, and y is added to s. Changing the rightmost digit from a 1 to a 0 decreases x by 1, so this takes care of the requirement that the bound function be decreased. The invariant is maintained, because (x-1)y + (s+y) has the same value as the original expression xy+s. Therefore, the loop is correct.
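
Here is the whole algorithm as a small, self-contained Pascal program, a sketch in which the shifts are simulated with div and multiplication by two and the rightmost binary digit is obtained with mod (a hardware implementation would use true shift instructions; the multiplicands chosen here are arbitrary):

program RussianPeasant;
var
  a, b, x, y, s: integer;
begin
  a := 42; b := 54;             { the multiplicands }
  x := a; y := b; s := 0;
  { invariant: x*y + s = a*b }
  { bound: x }
  while x <> 0 do
    if x mod 2 = 0 then         { rightmost binary digit is 0 }
    begin
      x := x div 2;             { shift x right }
      y := y * 2                { shift y left }
    end
    else                        { rightmost binary digit is 1 }
    begin
      s := s + y;
      x := x - 1                { change that digit to 0 }
    end;
  { postcondition: s = a*b }
  writeln(a, ' times ', b, ' is ', s)
end.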

A real-world example from my own golden age of hacking will, I hope, serve to convince the reader of the general applicability of this technique of developing a loop. Digital sampling is a technique that occurs in applications from user interfaces to process controls to digital music. The analog-to-digital converter is a basic device of digital sampling; it produces a number in a predefined, bounded range corresponding to the level of some analog input parameter. For reasons of hardware, it is cheaper and easier to build a hardware digital-to-analog converter (DAC) than it is to build a hardware analog-to-digital converter (ADC). To economise, it is possible to implement an ADC using a DAC and a comparator. A comparator compares two analog input voltages, V0 and V1. Its 1-bit digital output is a low voltage if V0<=V1, and a high voltage if V0>V1.

* Thinking of the DAC and the comparator as functional "black boxes", that is, as computational entities with only inputs and outputs whose internals are irrelevant, how would you connect their inputs and outputs to build a higher-order black box that compares an analog input with a digital input? (You may wish to draw a diagram.)

* Give an algorithm that uses your new black box to implement an ADC. You should work out your own solution and then examine the one given here. You may make the following assumptions:

* The digital value corresponding to the analog level is in the range [0..MAXLEVEL].

* The output of your black box can be read by your program as the value of the Boolean variable comparator, where the value true corresponds to a high voltage and false to a low voltage.

* The input of the DAC comes from the integer variable dac_input.

* The comparator and the DAC function instantaneously; there is no need to wait in between setting their inputs and reading their outputs.

* The level of the input is constant during the execution of your algorithm.

The solution here uses a general technique called binary search in which the range of uncertainty is halved on each iteration of the loop. In this context of determining a value given only "higher" or "lower" responses to one's guesses, the strategy of binary search is often referred to as successive approximation. If I am to guess a whole number based on responses of "higher" or "lower", and initially I know that this number N lies in a range 0<=N<2^E, there is a strategy that allows me to arrive at N within a maximum of E+1 guesses. The postcondition here is that we have correctly guessed N. This condition can be viewed as the state of having pinched down a range of uncertainty so that it comprises only one number. The invariant, then, is arrived at by the method of enlarging the range of a variable. The range is bounded as follows:

0 <= lower_bound <= voltage < upper_bound <= 2^E

This invariant can be established using the initialisation

lower_bound := 0; upper_bound := 2^E

Given the invariant, the range will have converged to a single value when the upper bound is just above the lower bound. The guard, then, is

upper_bound != lower_bound+1

A loose but adequate bound function is one less than the number of values still left in the range of uncertainty:

b = upper_bound-lower_bound-1

The body of the loop follows. To decrease the interval of uncertainty and thereby decrease the bound function, we guess the average of the two bounds and feed that guess through the DAC to the comparator. Using the output of the comparator we discard half the interval. (This example assumes that the output of the DAC is connected to the V0 input of the comparator, and that the analog voltage input is connected directly to the V1 input of the comparator.) In this way we converge logarithmically upon the digital value that corresponds to the input voltage level. Here is the algorithm in full:

lower_bound := 0; upper_bound := 2^E;

invariant: 0 <= lower_bound <= voltage < upper_bound <= 2^E

bound: upper_bound-lower_bound-1

do upper_bound != lower_bound+1 ->

dac_input := (lower_bound + upper_bound)/2;

if comparator -> upper_bound := dac_input

[] ~comparator -> lower_bound := dac_input

fi

od

postcondition: voltage = lower_bound
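
For concreteness, here is the algorithm as a Pascal sketch in which the DAC and the comparator are simulated in software: the hidden variable voltage stands for the analog level, and the expression dac_input > voltage plays the role of the comparator's output. The names and the value 137 are invented for the simulation; a real implementation would read the comparator from hardware.

program SuccessiveApproximation;
var
  voltage: integer;                       { the hidden analog level (simulated) }
  lower_bound, upper_bound, dac_input: integer;
  comparator: boolean;
begin
  voltage := 137;                         { any value in 0..255 will do }
  lower_bound := 0;
  upper_bound := 256;                     { 2^E with E = 8 }
  { invariant: 0 <= lower_bound <= voltage < upper_bound <= 2^E }
  { bound: upper_bound - lower_bound - 1 }
  while upper_bound <> lower_bound + 1 do
  begin
    dac_input := (lower_bound + upper_bound) div 2;
    comparator := dac_input > voltage;    { simulates V0 > V1 }
    if comparator then
      upper_bound := dac_input
    else
      lower_bound := dac_input
  end;
  { postcondition: voltage = lower_bound }
  writeln('the digital value is ', lower_bound)
end.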

As mathematicians are wont to do, in solving this problem we made a simplifying assumption that is not quite true in the real world. To phrase it more bluntly, we cheated. The input level is not necessarily constant while the program is running. In fact, if it is an audio or other oscillatory signal that is being digitised, the input is certainly changing during the execution of the algorithm. How does this fact affect the accuracy of the algorithm presented above? (Think about the process by which the interval is shrunken.) How does it affect the accuracy of your own algorithm? For both algorithms, think on the following questions: Does the algorithm produce a value that is the correct digital representation of the signal at the moment at which the algorithm terminates? If not, what can you assert about the result of the algorithm? Given a fast enough computer upon which to execute, will the algorithm still work in practice?

This concludes our brief discussion on developing correct programs. Excellent further reading on this subject is [Gries 1981].

Induction

A line of dominoes is set up so that when any domino falls, it causes its successor to fall. The first domino is knocked over.

If a climber can reach a rung of a ladder, then they can reach the rung after it. The climber can reach the first rung of the ladder.

If a unit of water flows out of a siphon, the next unit of water will be drawn up into its place and will also flow out of the siphon. The first unit of water is drawn out of the siphon. For any n, if predicate F is true for all natural numbers less than n, then F is true for n (i.e., ∀n(∀k(k < n => F(k)) => F(n))). F is true for subject 0 (i.e., F(0)).

The above four cases are similar. We infer on an intuitive level that all the dominoes fall, that the climber can reach all the rungs of the ladder, that all the water flows out of the siphon, and that F is true for all natural numbers (∀nF(n)). Each of these arguments rests upon the notion of induction. Induction, in general, is a process of going from specifics to universalities. Given the specifics that one falling domino causes the next domino to fall and that the first domino does fall, one can induce the universality that all the dominoes fall. Given the specifics that if a particular property holds for a natural number then it holds for the next natural number and that the property holds for a particular natural number k, one can induce the universality that it holds for all natural numbers greater than or equal to k. In general, given the specifics that if a particular property holds for a subject in some ordering then it holds for the next subject(s) in the ordering and that the property holds for some particular subject k in the ordering, one can induce the universality that it holds for all subjects in the ordering that are direct or indirect successors of k. (A subject can have more than one predecessor and/or more than one successor. Natural numbers have unique predecessors and successors because they constitute a special kind of ordering called a linear ordering. But arrangements of subjects need not be linear. They can contain branches and joins, just as a chain of dominoes can contain branches and joins.) An inductive proof consists of a base case and an inductive step. The base case is the first domino, the bottom rung of the ladder, the base upon which the chain of implication is founded. Usually most of the work in an inductive proof happens in the inductive step. This step is a universally quantified implication that moves the property from k to the successor(s) of k. It is proven either by ∀-proof or by another induction.

There are two forms of induction. In strong induction, the inductive case takes as its antecedent the fact that all subjects in the ordering that are less than the particular subject k possess the property in question. Sometimes this much information is necessary in order to prove that the property holds for k. In many interesting cases, however, the only information needed is that the property holds for the immediate predecessor(s) of k. This is called weak induction. This text will treat only weak induction. The rules for inductive proof may be summarised as follows:

weak induction:

F(n)
∀k(F(k) => F(k+1))
------------------
∀x(x>=n => F(x))

strong induction:

F(n)
∀k(∀m(n<=m<k => F(m)) => F(k))
------------------------------
∀x(x>=n => F(x))

Here is a sample inductive proof about summations. Algebraic details have been eliminated and some reasons have been omitted for brevity. In an inductive proof, the assumption for =>-proof is called the inductive hypothesis. Many presentations of inductive proofs omit the final step that uses the =>-proof rule and simply understand it to be present, but in this book it is included in order to make the technique more obvious for beginners.

Show ∀n(Σ[i=0,n] i = (n)(n+1)/2) by weak induction:

base case:

Σ[i=0,0] i = 0

= (0)(0+1)/2

inductive case:

[assume Σ[i=0,k] i = (k)(k+1)/2    Show Σ[i=0,k+1] i = (k+1)((k+1)+1)/2

Σ[i=0,k+1] i = Σ[i=0,k] i + (k+1)

= (k)(k+1)/2 + (k+1)    from the inductive hypothesis

= (k² + 3k + 2)/2

= (k+1)(k+2)/2

= (k+1)((k+1)+1)/2 ]

Σ[i=0,k] i = (k)(k+1)/2 => Σ[i=0,k+1] i = (k+1)((k+1)+1)/2    =>-proof
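
Though it is no substitute for the proof, the closed form can also be spot-checked mechanically for small values of n. The following Haskell fragment is a sketch of my own (the name checkSum is not part of the text's development); it compares the sum 0 + 1 + ... + n with n(n+1)/2.

    -- Spot-check of the identity 0 + 1 + ... + n = n(n+1)/2 for a given n.
    -- A sketch of my own, not part of the formal development above.
    checkSum :: Integer -> Bool
    checkSum n = sum [0 .. n] == n * (n + 1) `div` 2

    -- e.g. and (map checkSum [0 .. 100]) evaluates to True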

Finally, it should be noted that induction may be done on expressions that contain more than one variable. Under such circumstances, one of the variables must be chosen and induction must be done only on that variable. So, to prove ∀x∀y∀zP(x, y, z) by induction on z, for example, one proves ∀x∀yP(x, y, 0) as the base case and ∀x∀y(P(x, y, k)=>P(x, y, k+1)) as the inductive step. Since the x and the y in the inductive step are universally quantified, anything can be substituted for their occurrences in the inductive hypothesis. For example, the inductive hypothesis in the above-mentioned proof of ∀x∀y∀zP(x, y, z) would be ∀x∀yP(x, y, k), and this could be applied to produce statements such as P(x+1, y, k), P(x, k, k), P(a, b, k), &c., wherever they are needed.

Another peculiarity of multi-variable inductive proofs is the possibility of employing smaller inductive proofs, with induction done on a different variable, as sub-proofs. These sub-proofs can be termed lemmas. These lemmas can employ their own inductive hypotheses and also the inductive hypothesis of the main proof. Examples of multi-variable proofs and proofs involving lemmas will be presented in the next section.

It is interesting to use logic to describe the kinds of sets to which the method of induction applies. In this introduction we have treated dominoes in chains, rungs on a ladder, units of water flowing into a siphon, and the natural numbers. All these sets have a property in common that allows induction to be used to prove assertions about them. (If the discussion that follows is confusing to you, these paragraphs may be skipped.) The "less than" relation on the natural numbers, symbolised by `<', is familiar to you from algebra. There are similar relations on the dominoes, the ladder rungs, and the units of water flowing into the siphon. On the set of dominoes, the relation is defined by the order of the dominoes in the chain. If domino x would fall before domino y, then x<y. On the set of ladder rungs the relation is described by the expression "lower than". On the units of water flowing into a siphon, the relation is one of temporal precedence: the units of water flow into the siphon in some order in time. Thus with any of these sets, the relation `<' can be defined on the elements. This ordering relation is used in what follows. Recall that the strong form of induction states the following: if, whenever some property P holds for all the elements of the set under consideration that are less than an element y, P must hold for y as well, then P holds for all the elements of the set under consideration. This can be stated formally:[25]

∀xP(x) <=> ∀y(∀x(x<y => P(x)) => P(y))

Remember that any implication A=>B is equivalent to the statement ~A ∨ B. (In the exercises in the propositional calculus, this identity was termed "implication reduction".) Remember also that ~∀xF(x) <=> ∃x~F(x). These two theorems can be combined to yield the theorem (∀xF(x) => B) <=> (∃x~F(x) ∨ B). This theorem can be applied to the right-hand side of the statement above to yield

∀xP(x) <=> ∀y(∃x~(x<y => P(x)) ∨ P(y))

Applying implication reduction in the sub-expression ~(x<y => P(x)) yields ~(~(x<y) ∨ P(x)). DeMorgan's ∨-Law and ~-use transform this into x<y ∧ ~P(x). So the full expression becomes

∀xP(x) <=> ∀y(∃x(x<y ∧ ~P(x)) ∨ P(y))

The predicate P defines a set. In fact, it defines two sets, all the elements in the universe for which P is true and all the elements in the universe for which P is false. We can give these sets names. Let us call the set of all the elements of the universe for which P is false "NOT_P". Then for any element x, "P(x)" can be rewritten as "x ∉ NOT_P", and "~P(x)" can be rewritten as "x ∈ NOT_P". Under these translations, the expression that we are working with becomes

∀x(x ∉ NOT_P) <=> ∀y(∃x(x<y ∧ x ∈ NOT_P) ∨ y ∉ NOT_P)

Now the left-hand side of this equivalence becomes recognisable as a fancy way of stating that NOT_P is empty:

(NOT_P = ∅) <=> ∀y(∃x(x<y ∧ x ∈ NOT_P) ∨ y ∉ NOT_P)

If both sides of a true equivalence are negated, the resulting equivalence is true. So, again by applying quantifier negation, DeMorgan's laws, and implication reduction, the expression can be transformed through the following steps:

(NOT_P != ∅) <=> ~∀y(∃x(x<y ∧ x ∈ NOT_P) ∨ y ∉ NOT_P)    negation of both sides

(NOT_P != ∅) <=> ∃y~(∃x(x<y ∧ x ∈ NOT_P) ∨ y ∉ NOT_P)    quantifier negation

(NOT_P != ∅) <=> ∃y(~∃x(x<y ∧ x ∈ NOT_P) ∧ y ∈ NOT_P)    DeMorgan's ∨-Law (with ~-use)

(NOT_P != ∅) <=> ∃y(∀x~(x<y ∧ x ∈ NOT_P) ∧ y ∈ NOT_P)    quantifier negation

(NOT_P != ∅) <=> ∃y(∀x(~(x<y) ∨ x ∉ NOT_P) ∧ y ∈ NOT_P)    DeMorgan's ∧-Law

(NOT_P != ∅) <=> ∃y(∀x(x<y => x ∉ NOT_P) ∧ y ∈ NOT_P)    implication reduction

This final statement is termed "well-foundedness". In English, it says that if NOT_P is nonempty, then NOT_P must contain some element y such that all elements of U that are less than y must be excluded from NOT_P. Take a moment to verify this meaning in your own thoughts, and review how this form was derived starting from the principle of strong induction.

We are interested in the question of when the method of induction applies in general, for any property P. So, since P is an arbitrary property, the set NOT_P is an arbitrary subset of the universe. Therefore, in order for induction to apply to a universe U, the above statement must hold for all subsets S of the universe U:

(S != ∅) <=> ∃y(∀x(x<y => x ∉ S) ∧ y ∈ S)

A universe U with a "less than" relation < is well-founded if and only if every nonempty subset of U contains a minimal element with respect to the relation <. That is, no matter how one chops up U into subsets, in any subset S one can always find some particular element y such that there are no elements of S that are less than y. This makes sense, intuitively. Induction works by establishing chains of implication that grow upwards according to the structure that is imposed on the set by the < relation. Each of these chains must be "founded" upon some minimal element. One can imagine arranging an infinite set of dominoes so that no chain of the dominoes has a beginning or an end. The simplest way to do this is to place a domino next to each integer on a number line. This arrangement has no beginning and no end; there is no least integer and there is no greatest integer. Note that one can use induction on subsets of this set that are well-founded under the relation `<'. In particular, induction applies to the natural numbers, a subset of the integers. (In fact, if we scrap the conventional "less than" relation on the integers and arrange them in a different way, we can use induction on them. This illustrates the importance of the choice of `<' relation. To see some ways in which the elements of sets can be rearranged to one's advantage, see the appendix on infinity.) This can be viewed metaphorically as choosing a domino somewhere in the middle of the chain that has no beginning and no end, and knocking it over. All its successors topple. But the set on which induction is being used in this case is not the entire chain of dominoes, but the sub-chain that begins with the domino that was first knocked over. Thus this sub-chain is well-founded. The example of the integers is somewhat mundane; one can also imagine a chain of dominoes that in both directions (going toward the greater and going toward the lesser) branches out into infinitely many side-chains, each of which in turn branches out into infinitely many side chains, &c. Such a structure would have no beginning and no end.
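
For a finite subset, this characterisation can be tested directly: scan the subset and, whenever an element below the current candidate appears, make it the new candidate. The following Haskell fragment is a sketch of my own illustration, not part of the text's development; it represents a subset as a finite list, takes the `<' relation as a parameter, and assumes that relation is a strict partial order (irreflexive and transitive).

    -- Return a minimal element of a finite subset under the relation lt,
    -- assuming lt is a strict partial order.  A sketch of my own; subsets
    -- are represented as finite lists.
    minimalElement :: (a -> a -> Bool) -> [a] -> Maybe a
    minimalElement _  []       = Nothing              -- the empty subset has no minimal element
    minimalElement lt (x : xs) = Just (foldl step x xs)
      where step m y = if y `lt` m then y else m      -- replace the candidate whenever a smaller element appears

    -- e.g. minimalElement (<) [5, 2, 8] evaluates to Just 2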

EXERCISES

Induction on Lists

A list is a sequence of objects. The simplest possible list is the empty list ( ), which can also be denoted using the special word nil. The set of lists can be inductively defined, with nil as the base:

( ) ∈ LISTS

∀x ∀(y1 ... yn) ( (y1 ... yn) ∈ LISTS => (x y1 ... yn) ∈ LISTS ), where (y1 ... yn) is a list of 0 or more elements

All lists can be generated by a chain of applications of the above rules. Nothing that cannot be derived using the rules is a list.

Anything that is not a list is called an atom. Note that in the above definition, x (called the "head" of the list) may be anything in the universe, but (y1 ... yn) (called the "tail" of the list) must be a list. Thus the construction of lists is right-recursive; the right-hand part of a list must itself be a list. Since there are no constraints on the nature of the left-hand element, it may or may not be a list itself. The left-hand element of the list ((Heraclitus) Parmenides Zeno) is the list (Heraclitus). The left-hand element of the list (Democritus Dalton Fermi) is the atom Democritus. Lists may be viewed as syntax trees, just as formulæ from the propositional and the predicate calculi may be viewed as syntax trees.
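
To make these definitions concrete, here is a minimal Haskell sketch of my own (the type and function names are not from the text): an element of the universe is either an atom or itself a list, and a list is either nil or a head consed onto a tail that must again be a list.

    -- A sketch of the lists described above; all names are my own.
    data Elem = Atom String | Sub List   deriving (Show, Eq)
    data List = Nil | Cons Elem List     deriving (Show, Eq)

    -- The head of a non-empty list may be anything; the tail is always a list.
    headOf :: List -> Maybe Elem
    headOf Nil        = Nothing
    headOf (Cons x _) = Just x

    tailOf :: List -> Maybe List
    tailOf Nil        = Nothing
    tailOf (Cons _ y) = Just y

    -- ((Heraclitus) Parmenides Zeno), whose head is the list (Heraclitus):
    example :: List
    example = Cons (Sub (Cons (Atom "Heraclitus") Nil))
                   (Cons (Atom "Parmenides") (Cons (Atom "Zeno") Nil))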

Here are some more lists and their heads and tails:

LIST	HEAD	TAIL
(((1) 1) A)	((1) 1)	(A)
(1 1 (A (2 (B))))	1	(1 (A (2 (B))))
(1 (B (2) B) 3)	1	((B (2) B) 3)
((0 0 0) DESTRUCT (0))	(0 0 0)	(DESTRUCT (0))

Remember, the left-hand element may be an atom, but the right-hand element of any list is always another list.

We can define some interesting functions over the set of lists. In the following definitions, `x', `y', and `z' represent lists, and `n' represents an atom or a list.

append-base: ∀y(append(nil, y) = y)

append-step: ∀n∀y∀z(append((n y), z) = (n append(y, z)))

reverse-base: reverse(nil) = nil

reverse-step: ∀n∀y(reverse((n y)) = append(reverse(y), (n)))
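
These four equations translate directly into code. Continuing the Haskell sketch above (and assuming its Elem and List types), append and reverse can be written as follows; the primed names merely avoid clashing with the Prelude.

    -- append-base and append-step, transcribed from the definitions above.
    append' :: List -> List -> List
    append' Nil        z = z                       -- append(nil, z) = z
    append' (Cons n y) z = Cons n (append' y z)    -- append((n y), z) = (n append(y, z))

    -- reverse-base and reverse-step.
    reverse' :: List -> List
    reverse' Nil        = Nil
    reverse' (Cons n y) = append' (reverse' y) (Cons n Nil)

Applied to (Democritus Dalton Fermi), reverse' yields (Fermi Dalton Democritus); any sub-lists keep their internal order, as noted below.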

The names of the functions are mnemonic; append concatenates two lists, and reverse reverses the elements of a list (but not the elements of any sub-lists). Try out append and reverse on some sample lists of your own choosing, and see how they operate. Once you've done this, you'll be ready for some theorems:

Right identity for append: Show ∀x(append(x, nil) = x)

base case: x = nil

append(nil, nil) = nil append-base (left identity)

inductive step: x = k

[assume append(k, nil) = k Show append((n k), nil) = (n k)

append((n k), nil) = (n append(k, nil)) append-step

= (n k) ] inductive hypothesis

append(k, nil) = k => append((n k), nil) = (n k) =>-proof

Associativity of append: Show ∀x∀y∀z(append(x, append(y, z)) = append(append(x, y), z))

base case: x = nil; choose arbitrary y, z.

append(nil, append(y, z)) = append(y, z) append-base

= append(append(nil, y), z) append-base

Therefore ∀y∀z(append(nil, append(y, z)) = append(append(nil, y), z)), by ∀-proof.

inductive step: x = k

[assume ∀y∀z(append(k, append(y, z)) = append(append(k, y), z))    Show ∀y∀z(append((n k), append(y, z)) = append(append((n k), y), z))

choose arbitrary y, z:

append((n k), append(y, z)) = (n append(k, append(y, z))) append-step

= (n append(append(k, y), z)) inductive hypothesis

= append((n append(k, y)), z) append-step

= append(append((n k), y), z) append-step

Therefore ∀y∀z(append((n k), append(y, z)) = append(append((n k), y), z)), by ∀-proof. ]

∀y∀z(append(k, append(y, z)) = append(append(k, y), z)) => ∀y∀z(append((n k), append(y, z)) = append(append((n k), y), z))    =>-proof

Distributivity of reverse over append: Show ∀x∀y(reverse(append(x, y)) = append(reverse(y), reverse(x)))

base case: x = nil; choose arbitrary y.

reverse(append(nil, y)) = reverse(y) append-base

= append(reverse(y), nil) right identity for append

= append(reverse(y), reverse(nil)) reverse-base

Therefore, ∀y(reverse(append(nil, y)) = append(reverse(y), reverse(nil))), by ∀-proof.

inductive step: x = k

[assume ∀y(reverse(append(k, y)) = append(reverse(y), reverse(k)))    Show ∀y(reverse(append((n k), y)) = append(reverse(y), reverse((n k))))

choose arbitrary y:

reverse(append((n k), y)) = reverse((n append(k, y))) append-step

= append(reverse(append(k, y)), (n)) reverse-step

= append(append(reverse(y), reverse(k)), (n)) inductive hypothesis

= append(reverse(y), append(reverse(k), (n))) associativity of append

= append(reverse(y), reverse((n k))) reverse-step

Therefore, ∀y(reverse(append((n k), y)) = append(reverse(y), reverse((n k)))), by ∀-proof. ]

∀y(reverse(append(k, y)) = append(reverse(y), reverse(k))) => ∀y(reverse(append((n k), y)) = append(reverse(y), reverse((n k))))    =>-proof

Remember, you can do induction on only one variable at a time! If you have two variables x and y, and you are doing induction on x, then y is arbitrary; don't touch it. When proving a result involving more than one variable, you may start by doing induction on one variable and then find that you reach an impasse. You can either try starting with induction on the other variable or try to prove by induction on the other variable a lemma that will allow you to continue. Finally, remember to look for places to use the inductive hypothesis. Many students feel that they are stuck because they cannot find any definitions or theorems that would usefully transform an expression that they have derived, when in actuality the answer is staring them in the face in the form of the inductive hypothesis that they have written down at the beginning of the proof. Writing an inductive proof is like starting on one bank of a river and using a bridge to reach a point directly across the river, on the opposite bank: One must first walk along the river in order to get to the bridge. Although this seems to bring one farther from one's destination and to make the problem more complex, it is a necessary step, because the opposite bank can be reached only by crossing the bridge. Once one has crossed the bridge, one may walk along the river in the opposite direction in order to reach one's destination on the far side. The bridge is analogous to the inductive hypothesis. An inductive proof massages an antecedent expression into a form to which the inductive hypothesis applies, then uses the inductive hypothesis, and then massages the result into the form that is demanded as the consequent.

We can define a function from lists to natural numbers called length, as follows:

length-base: length(nil) = 0

length-step: length((n y)) = 1+length(y)

This function computes the length of a list.
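
Continuing the same Haskell sketch, and assuming its List type, the definition transcribes as follows (the name len avoids clashing with the Prelude's length):

    -- length-base and length-step from the text.
    len :: List -> Integer
    len Nil        = 0             -- length(nil) = 0
    len (Cons _ y) = 1 + len y     -- length((n y)) = 1 + length(y)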

EXERCISES

Peano Arithmetic

However strange it may seem to you at first, the natural numbers are a lot like lists. The natural numbers were given an inductive definition and symbolisation and a formal system of inference by the Italian mathematician Peano. This system rests upon the idea of the successor function. Taking the successor of a natural number is like incrementing it by 1. Here is Peano's inductive definition of the set N, the natural numbers:

0 ∈ N (0 is a natural number.)

∀x(x ∈ N => sx ∈ N) (The successor of any natural number is a natural number.)

∀x∀y(x ∈ N ∧ y ∈ N ∧ x!=y => sx != sy) (No two natural numbers have the same successor.)

~∃x(sx = 0) (0 is not the successor of any number.)

Any property that is possessed by 0, and also by the successor of any number that possesses it, is possessed by all the natural numbers.

Simpler definitions of the natural numbers have since been constructed, but Peano's will suffice for our purposes. Note that Peano's final property is simply the principle of weak induction. The unary operator `s' means "successor of". So the string of symbols "s0" in the Peano system refers to the idea that is expressed in English by the word "one". "ss0" has the same meaning as "two", "sss0" the same meaning as "three", and so on. The symbol `0' is the base element, corresponding to nil in the lists world. It is easy to see that any natural number n is expressed in Peano arithmetic by stacking n `s's in front of a `0'. This may seem circular, depending upon your point of view--I've just used numbers and counting to describe a way to implement numbers and counting. One can't think of n `s's without implicitly referring to numeration. But in a way the Peano axioms say nothing about counting: all that they describe is the successor function, a way of shunting around strings of `s's. Their implementing counting is a sort of side effect. We can introduce Arabic numerals as meta-symbols to make it easier to write out numbers in Peano arithmetic: "sⁿ0", where n is an Arabic numeral, can be taken as a shorthand for a string of n `s's in front of a `0'.

Parentheses can be included in Peano arithmetic expressions in the same way that they are employed in everyday arithmetic. Strictly speaking, the formalism should include rules to implement parenthesisation if there are going to be parentheses in formulæ, but I feel that explicit parenthesisation rules would only obstruct and confuse the student. You should, however, be aware that expressions that are not fully parenthesised are only shorthand notation for expressions that are. So the shorthand "s³0.s0+s²0" really means "((s(s(s(0))).s(0))+s(s(0)))". The appearance and disappearance of parentheses because of operator precedences often causes problems for students who try to substitute one expression for another. Suppose that one is working with the expression x.y, and it has been discovered that x = a+b. Then the result of the substitution must include parentheses in order to generate the correct order of operations: (a+b).y. This case arises in everyday algebra. A case more often overlooked is that of substituting a sum into a sum or a product into a product. Here parentheses must be employed to resolve the ambiguity in order of operations. (Most systems give binary arithmetic operators such as + and . left association, that is, an unparenthesised expression such as x+y+z defaults to left-to-right evaluation: (x+y)+z. For our purposes, however, it is best to be explicit.) When dealing with parenthesisation, think not of substituting one string of symbols for another but of grafting one syntax tree into the place of another. When operator precedences do not lead to the correct syntax tree, they must be overridden using parenthesisation. This view will lead to appropriate parenthesisation.

In the lists world there is an infinite number of successors for each list, although every list (except nil) has a unique predecessor. The successors of list x are all represented by (n x), with n standing for an arbitrary element of the universe. In the world of Peano arithmetic there is only one possible successor for each number. There are, however, many similarities. Identities, associativities, and distributivities exist in both the list world and the Peano arithmetic world. Here are the axioms for arithmetic in the Peano world:

plus-base: ∀x(x+0 = x)

plus-step: ∀x∀y(x+s(y) = s(x+y))

mult-base: ∀x(x.0 = 0)

mult-step: ∀x∀y(x.s(y) = x + x.y)
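
These axioms, too, can be transcribed into the running Haskell sketch. The following is my own rendering (the constructor and function names are not from the text): Z plays the role of 0 and S the role of the successor symbol `s'.

    -- Peano numerals: Z is 0, S is the successor function.
    data Nat = Z | S Nat   deriving (Show, Eq)

    -- plus-base and plus-step.
    plus :: Nat -> Nat -> Nat
    plus x Z     = x              -- x + 0 = x
    plus x (S y) = S (plus x y)   -- x + s(y) = s(x + y)

    -- mult-base and mult-step.
    mult :: Nat -> Nat -> Nat
    mult x Z     = Z                    -- x . 0 = 0
    mult x (S y) = plus x (mult x y)    -- x . s(y) = x + x . y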

The operations + and . are self-explanatory, coming as they do from simple arithmetic. The plus-base and mult-base rules are right additive identity and right multiplicative zero. The left identity and left zero properties can be proven as theorems. The plus-step rule is the equivalent of the familiar identity x+(y+1) = (x+y)+1. The mult-step rule is the equivalent of the distribution and commutation x.(y+1) = x + x.y. Here are some proofs.

Show s²0+s²0 = s⁴0:    i.e., show that 2+2=4

s²0+s²0 = s(s²0+s0)    plus-step

= s²(s²0+0)    plus-step

= s²s²0 = s⁴0    plus-base
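
The same computation can be replayed in the Haskell sketch given above (assuming its Nat type and plus function are in scope); each use of plus-step corresponds to one recursive call of plus, and plus-base to the final match on Z. For example, in a GHCi session:

    ghci> plus (S (S Z)) (S (S Z))
    S (S (S (S Z)))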

Show ∀x(x+s(0) = s(x)):

x+s(0) = s(x+0) plus-step

= s(x) plus-base

Therefore, ∀x(x+s(0) = s(x)) by ∀-proof.

Show ∀x(x.s(0) = x):    right multiplicative identity

x.s(0) = x + x.0 mult-step

= x+0 mult-base

= x plus-base

Therefore, ∀x(x.s(0) = x) by ∀-proof.

Show ∀x(0+x = x)    left additive identity

base case: x = 0

0+0 = 0 plus-base

inductive step: x = k

[assume 0+k = k Show 0+s(k) = s(k)

0+s(k) = s(0+k) plus-step

= s(k) ] inductive hypothesis

0+k = k => 0+s(k) = s(k) =>-proof

Lemma: ∀x∀y(s(x+y) = s(x)+y)    needed in proof of commutativity of +

base case: y = 0; choose arbitrary x.

s(x+0) = s(x) plus-base

= s(x)+0 plus-base

inductive step: y = k

[assume ∀x(s(x+k) = s(x)+k)    Show ∀x(s(x+s(k)) = s(x)+s(k))

choose arbitrary x:

s(x+s(k)) = s²(x+k)    plus-step

= s(s(x)+k) inductive hypothesis

= s(x)+s(k) plus-step

Therefore, ∀x(s(x+s(k)) = s(x)+s(k)), by ∀-proof. ]

∀x(s(x+k) = s(x)+k) => ∀x(s(x+s(k)) = s(x)+s(k))    =>-proof

Show ∀x∀y(x+y = y+x)    commutativity of +

base case: y=0; choose arbitrary x.

x+0 = x plus-base

= 0+x left additive identity

Therefore, ∀x(x+0 = 0+x), by ∀-proof.

inductive step: y=k

[assume ∀x(x+k = k+x)    Show ∀x(x+s(k) = s(k)+x)

choose arbitrary x:

x+s(k) = s(x+k) plus-step

= s(k+x) inductive hypothesis

= s(k)+x lemma

Therefore, ∀x(x+s(k) = s(k)+x), by ∀-proof. ]

∀x(x+k = k+x) => ∀x(x+s(k) = s(k)+x)    =>-proof

EXERCISES

Remember, you can employ any theorem that you have proven previously as a step in the proof of a new theorem. Also, remember to use your inductive hypothesis. Finally, you can check your work by noting that "s(x)" acts like "x+1"; if you begin with s(x)+s(y) and get s(x+y), for example, you know that you have goofed somewhere along the way, because (x+1)+(y+1) != (x+y)+1.
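
If the Peano sketch given earlier is at hand, this check can also be mechanised with a hypothetical helper (not part of Peano's system) that converts a numeral to an ordinary integer:

    -- Convert a Peano numeral to an ordinary integer, purely for checking work.
    toInt :: Nat -> Integer
    toInt Z     = 0
    toInt (S n) = 1 + toInt n

    -- e.g. toInt (plus (S (S Z)) (S (S Z))) evaluates to 4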


[11][Carroll 1936] p. 181.
[12]Stephen Kandel, "I, Mudd" episode of Star Trek. Paramount, 1967.
[13][Harper 1904] pp. 99-101.
[14][Harper 1904] p. 51.
[15][Quetelet 1848] p. 301.
[16][Quetelet 1835] pp. 98-99.
[17][Porter 1986]
[18][Galton 1907]
[19][Dubbey 1977], [Becher 1980]
[20][Dubbey 1977]
[21][Pycior 1981]
[22][Boole 1854]
[23]A worm is an independent program capable of initiating an infection on its own. A virus, in contrast, resides within another program and is potent only when that program is executed.
[24][Gries 1981]
[25]The proof that follows is from [Gries 1987].