Theory & Experiment

  


http://www.mdstafleu.nl 

© 2016 M.D. Stafleu

(revised 2019, 2021)


Contents

Preface

1. The logic of theories

1.1. The Copernican revolution

1.2. The artificial character of a theory

1.3. The logical character of a theory

1.4. Concepts

1.5. Statements and their context

1.6. The logic of theories and the significance of language

2. Explanation and prediction

2.1. Physics and astronomy before Copernicus

2.2. Copernicus’ return to Platonism

2.3. Prediction and explanation

2.4. Retrograde motion

2.5. The size of the planetary orbits

2.6. Kepler on explanation and prediction

3. Four irreducible principles of explanation

3.1. Number and space in the harmony of the spheres

3.2. Explanation of change in Aristotelian cosmology

3.3. Galileo on motion

3.4. Cartesian physics

3.5. Early concepts of force

3.6. Newton’s dynamics

3.7. Absolute and relative space, time, and motion

4. Experimental philosophy in electricity and magnetism

4.1. Early magnetism and electricity

4.2. The quantification of electricity

4.3. Mathematical fields

4.4. The discovery of the electric current          

4.5. Electric current and potential difference

4.6. The electromagnetic field

5. Solution of problems

5.1. The various functions of the statements in a theory

5.2. Normal science

5.3. The generation of problems

5.4. Crisis and revolution

6. Problem shifts in the history of optics

6.1. Medieval theories of vision

6.2. Geometrical optics

6.3. Newton’s Opticks and Huygens’ Traité de la lumière

6.4. The wave theory of light

6.5. Emission and absorption of light

7. The unification of physical science

7.1. The Newtonian synthesis

7.2. Connections between separated fields of science             

7.3. The law of conservation of energy                          

7.4. Thermodynamics

7.5. Atomism in experimental philosophy and in mechanism

8. Axioms, hypotheses, and laws

8.1. The axiomatic method

8.2. Osiander: ‘On the hypotheses in this work’

8.3. Galileo and the church

8.4. Descartes on hypotheses

8.5. Newton: ‘Hypotheses non fingo’

8.6. The idea of natural law in classical and modern physics

9. The heuristics of experimental philosophy

9.1. Induction and isolation as heuristic tools

9.2. The method of mathematization: the law of gravity

9.3. The method of successive approximation

9.4. The unifying method of analogy

9.5. The application of technology

10. Conclusion

10.1. Objective and subjective tests of a theory

10.2. Immanent, transcendent, and transcendental critique

10.3. The critical function of the scientific community

10.4. Critique of scientific activity

See also Cumulative index of cited literature and index of historical persons 


Preface

Theory and Experiment is a completely revised and updated combination of Theories at work, published in 1987 by the University Press of America, Lanham, and Experimentele filosofie, published in 1998 by Buijten & Schipperheijn, Amsterdam. As a historical and epistemological counterpart to Laws for dynamic development (2015), its aim is to study the nature of theoretical thought, in particular the structure and functioning of scientific theories, and the empirical character of scientific investigation. Besides a critical review of past and present philosophies of science, it presents a philosophical analysis of the history of physics until the end of the twentieth century. (Theories at work was mostly concerned with the sixteenth and seventeenth centuries, its sequel Experimentele filosofie with the eighteenth and nineteenth.)

The modern view of science, of theories and experiments, was born during the Copernican revolution. Its core is the revolutionary idea that the basic axioms of a theory need not be evident, but should be derived from an active investigation of nature. The dynamic motif of Copernicanism, the creed that the earth moves, is counter-intuitive, notwithstanding the fact that nowadays, by an overwhelming indoctrination, everyone is made to believe it. According to the Aristotelian philosophy, prevalent before and even during the Copernican revolution, any explanation should start from known and well-understood premises. Gradually it began to dawn that a more powerful use of theories is to start from the unknown, from the unobservable laws of nature. The new aim of science became the theoretical and experimental exploration of the lawful structure of reality.

Immanuel Kant’s Kritik der reinen Vernunft (Critique of pure reason, 1781, 1787), in the second edition of which he coined the term Copernican revolution, was intended to be a philosophical reflection on Isaac Newton’s physics and on its implications for the theory of knowledge. He introduced a new kind of mechanism (different from René Descartes’), based on a rationalistic, a priori interpretation of Newton’s mechanics. Nineteenth-century Kantianism was challenged by positivism, a revival of medieval instrumentalism, which in several varieties (in particular Carl Hempel’s logical empiricism) dominated the philosophy of science between circa 1920 and circa 1960. It stressed the logic of justification of theories, neglecting both the history of science and the relevance of experiments. About 1960 another revolution started, initiated by Karl Popper’s The logic of scientific discovery (1959), Thomas Kuhn’s The structure of scientific revolutions (1962), and Imre Lakatos’ Methodology of scientific research programmes (1970). It brought more attention to historical and social aspects of science. In these philosophical debates, the history of physics was a much discussed topic. Therefore, the history of physics will not only provide us with materials to test the analysis of natural science undertaken in this book, but also with an entry to confront it with some twentieth-century philosophies.

Philosophers of science are usually much more interested in the analysis of theories than in the relevance of experiments. As a result, Isaac Newton’s experimental philosophy is largely neglected. It is not even generally recognized that he rejected René Descartes’ mechanical philosophy throughout. In Immanuel Kant’s Newtonian disguise, mechanism made a strong come-back in the nineteenth century, when many physicists started to believe that classical physics could be founded on mechanical theories. I shall argue that Newton’s experimental philosophy was much more characteristic of the history of classical physics than mechanism or instrumentalism. Even the decline of classical physics at the end of the nineteenth century was caused by the failure of mechanist theories to cope with the successes of experimental physics, which implied randomness in nature.

Although instrumentalism, mechanism, and experimental philosophy dominated the philosophical battle about classical physics, they had several contenders, like Aristotelianism, neo-Platonism, the romantic German Naturphilosophie, and energeticism. The debate was not restricted to philosophers, for physicists took an active part in it. The separation between scientists and philosophers, which may have caused the philosophers’ fateful neglect of Newton’s mature views, did not occur before the nineteenth century. Since then, experimental philosophy has been known as experimental science.

In the course of this book I shall develop some views initiated by the twentieth-century Christian philosophers Herman Dooyeweerd and Dirk Vollenhoven. These concern the structure and functioning of theories, the relevance of empirical research, irreducible principles of explanation, the idea of natural law, and an increasing emphasis on relations of many different kinds.

Chapter 1 introduces both the Copernican revolution and the structure of a theory as a deductively connected set of statements. Chapter 2 discusses two basic functions of theories. The distinction between prediction and explanation constituted a hot topic in the debate between instrumentalist and realist astronomers.

Chapter 3 demonstrates the emergence of four mutually irreducible principles of explanation in the physical sciences, concerning quantitative, spatial, kinetic, and interactive relations. It led to a controversy between René Descartes’ mechanical philosophy, stressing motion as a fundamental principle, and experimental philosophy, propagated by Isaac Newton and others, who believed that physical investigation is only possible by interactive research. As a case study, chapter 4 describes how experimental philosophers studied magnetism and electricity during the eighteenth and nineteenth centuries, culminating in James Clerk Maxwell’s electromagnetic theory of light.

Chapter 5 is concerned with the problem-solving and problem-generating functions of theories. Chapter 6 discusses how in the history of optics many problem shifts occurred, mostly made possible by the parallel and often independent development of the arts and crafts. The success of experimental research into infrared and ultraviolet light interacting with matter led to the downfall of mechanism and to the end of classical physics, when randomness replaced determinism.

Chapter 7 on the unification of the physical sciences first discusses the Newtonian synthesis of the Copernican achievements, next the idea of the unity of physical science as promoted by nineteenth-century romantic philosophers. It led to the discovery of the law of conservation of energy, the development of thermodynamics, and – despite romantic, instrumentalist and even theological objections – to atomic theories.

Chapter 8 returns to the structure of theories, discussing the distinction between hypotheses and axioms, and the idea of natural law. Chapter 9 deals with the heuristics of experimental philosophers, how they searched for and found natural laws. The mathematical method and its complement, successive approximation, were both applied by Newton in his Principia. The method of analogy was most effectively used by Maxwell in finding the laws named after him. The application of technology is a condition for experimental science, to be illustrated by the investigation of gases at a very low pressure, culminating in the discovery of the electron and of X-rays.

Applying the philosophical distinction between immanent, transcendent, and transcendental critique, the concluding chapter 10 reflects on the former chapters.

The critical-realistic approach is elaborated in the companion volume, Nature and freedom (2019).


 Chapter 1

The logic of theories

1.1. The Copernican revolution

The introductory chapter 1 is a synopsis of what I consider to be a theory, to be studied in the context of the Copernican era, when a modern view of theories arose (1.1). It concerns the artificial character of a theory (1.2) and its logical character (1.3), concepts (1.4), statements and their theoretical context (1.5), as well as the significance of language for the logic of theories (1.6).[1]

For various reasons, the time-honoured expression Copernican revolution has a strong appeal.[2] It appears to point to the title of Copernicus’ epoch-making book, De revolutionibus orbium coelestium, libri VI (On the revolutions of the celestial spheres, six books, 1543).[3] Its publication marks the beginning of modern astronomy, but Nicolas Koppernigk did not intend to start anything like what is now understood by a revolution.

The term Copernican revolution was probably first used in 1787 by Immanuel Kant, who coined it to emphasize a radically new point of view in his own epistemology.[4] By implication, Kant recognized the revolutionary character of Copernicus’ heliocentric theory, more than Copernicus himself did. Like Thomas Kuhn in his book The Copernican revolution (1957), I shall apply the expression Copernican revolution to the historical period from 1543 to 1687. The latter year witnessed the publication of Isaac Newton’s Philosophiae naturalis principia mathematica (Mathematical principles of natural philosophy), which book was considered to deliver the final and decisive proof of the Copernican theory.

The division of history into well-defined eras is unavoidably arbitrary. More aesthetic than logical, it indicates various styles of investigation. The Copernican revolution overlaps the scientific revolution, starting with Galileo and Kepler. Besides physics and astronomy, it includes alchemy, geography, medicine, and the life sciences, if not mathematics.[5] Besides the rise of mechanical philosophy, the seventeenth-century revival of atomism is sometimes considered more characteristic of the scientific revolution than astronomy. This revival was caused by the downfall of Aristotelian philosophy, largely due to Copernicanism. During the Copernican era, atomism was a common yet speculative world view. Only in the nineteenth century did it become an experimentally well-founded theory in chemistry and physics. However, more than mechanism or atomism, for Kepler and Galileo the Copernican thesis of the moving earth was the driving force. Also Newton’s Principia, the pinnacle of classical science, deals with the motion of the heavens.

In science, the Platonic-Pythagorean Renaissance indicates the period between circa 1400 and 1600. The humanist Renaissance, starting circa 1350 with Francesco Petrarch, contributed to the rise of natural science by its renewed interest in classical texts (purged of translation errors), by its criticism of scholastic Aristotelian science, and its call to return to ancient, especially Platonic, views. It was succeeded by classical physics, about 1600-1900 dominated by mechanical and experimental philosophy. The scientific revolution owed a lot to Johannes Gutenberg’s invention of movable type printing (circa 1440). Besides Copernicus’ Revolutionibus several other scientific texts were printed in 1543, such as Andreas Vesalius’ De humani corporis fabrica libri septem (Seven books on the structure of the human body); Opera Archimedis (Archimedes’ works); and Euclid’s Elements. The latter two, translated into Italian by Nicolo Tartaglia, marked a neo-Platonic interest in mathematics, and would strongly influence classical physics. William of Moerbeke had already translated Archimedes’ work on floating bodies in the thirteenth century, but this had no influence on medieval thought.

Inspired by Plato, Nicholas Copernicus may be considered a Renaissance scholar, as well as Giambatista della Porta,[6] Tycho Brahe, Giovanni Benedetti, William Gilbert, Galileo Galilei until 1592 (when he started to work in Padua), and Johannes Kepler until 1600 (when he moved to Prague). Kepler and Galileo started their careers as Renaissance scientists, but their most mature work is classical in spirit. They crossed the watershed between ancient and early modern science.[7] Kepler’s and Galileo’s works form a turning point in the Copernican revolution, marking the end of the neo-Platonic Renaissance and the start of classical physics with motion and force as new principles of explanation. Indeed, the Copernican ideology of the moving earth was the motor of the transition from Renaissance to classical physics. Copernicus started it, Kepler, Galileo and Descartes were its chief advocates, and Newton brought it to completion. This period is called Copernican because almost all its heroes considered themselves Copernicans. Their common creed was that the earth moves, and their common aim was to explain this.

The Copernican revolution concerned astronomy and physics, mechanics, magnetism, and optics. Simultaneously it saw a battle between several philosophies. Christianized during the Middle Ages, Aristotelian philosophy dominated the universities, and was mostly defended by conservative professors. It included a realist view of physics and cosmology, in contrast to an instrumentalist view of observational astronomy.[8] It elicited the Platonic-Pythagorean reaction, the Renaissance philosophy, with its appeal to return to the Greek and biblical sources of civilization, unpolluted by medieval corruption. Up to Galileo, most Copernicans were under the spell of this philosophy.[9] Next, mechanical philosophy replaced Platonic views. It was a reaction to Aristotelian philosophy as well. Its main spokesman was René Descartes, but he was neither the first nor the last mechanist. Galileo Galilei, Isaac Beeckman, Marin Mersenne, Christiaan Huygens, and Gottfried Leibniz adhered to it in various degrees. It saw a revival during the nineteenth century. Finally, empiricism came to life, a new philosophy opposing the rationalistic trends of the preceding ones. Traces of it can be found in Kepler, Galileo, and Huygens. Francis Bacon was its prophet, and Blaise Pascal, Robert Boyle, and Isaac Newton propagated it under the flag of experimental philosophy. Newton’s Principia (1687) marks the end of the Copernican revolution, and his Opticks (1704) the beginning of the second phase of classical physics, more experimental than theoretical.

1.2. The artificial character of a theory

Would it be possible to distinguish theoretical thought from non-theoretical thought? Let me try to answer this question without discussing the far more difficult problem about the nature of thought itself. Natural, non-theoretical thought is spontaneous. It is characterized by an immediate relation between the thinking person, the subject, and the object of their thought. In theoretical thought this direct relation is interrupted, because people put theories between themselves and their object of thought. A theory is like a medium, mediating between subject and object: it is an instrument. I shall elaborate this, without committing myself to instrumentalism.

Natural and artificial seeing

In order to clarify the instrumental character of theories, let us compare these with instruments to improve human vision. Seeing is a natural activity of men, and of all animals having eyes. We see objects in our environment – a tree, a tower, a car. Occasionally, we also look at a picture of a tree. In an artificial manner, we see a tree, whereas in a natural manner, we see a picture. In a natural way, we cannot see our own face, or the phases of Venus. Using a mirror, we see naturally a picture, but artificially our own face. Using a telescope, we see naturally an image, but artificially the phases of Venus. Artificial seeing is not contrary to natural seeing, but depends on it.

Optical instruments are invented in order to see better than would be possible in a natural way. In Holland, one cannot see the Eiffel tower directly. But one can see a picture of it, a photo, a miniature, or a TV picture. This kind of artificial seeing seems to be an exclusively human activity. Animals do not see the Eiffel tower when they see its picture – they only see a piece of paper. Nor do animals invent instruments to improve on their seeing.

Contrary to natural seeing, artificial seeing has a history. Medieval painting, Renaissance and modern art differ widely from each other. Photography was invented in the nineteenth, television in the twentieth century. It is more than a coincidence that the telescope and the microscope were invented during the Copernican revolution. The new movement made these inventions possible, and needed them as well. Galileo was the first to use the telescope for astronomical observations – a historical event. He discovered new stars, mountains on the moon, and Jupiter’s satellites, besides Venus’ phases and the sunspots. He used these discoveries in his propaganda for the Copernican theory.

Natural and artificial thought

There is no need to define seeing – everyone knows what it is. Similarly, natural thinking is familiar. It is a natural activity of men and women, and perhaps of all animals having brains. Natural thought concerns trees, towers, or stars, good or evil deeds, families and churches, colours and paintings.

All thought is characterized by dissociation and association, by logic. Thinking beings are logical subjects, and they think about logical objects. The logical objects of natural thought are concrete, everyday things, events, and their relations. The thinker distinguishes and relates them.

A theory, too, is an object. It is certainly not a thinking subject, as it does not think. But it is not a logical object. Except for philosophers, nobody thinks about theories. A theory is a human-made artefact.[10] People make theories, invent them, improve them, and use them. Theories are used as instruments in human thought. Theoretical thinking is natural thinking, opened up by the use of instruments. People form concepts of concrete things, of events, and of relations, and they think theoretically about these.

Contrary to natural thought, theoretical thought has a history, the history of ideas. It is generally assumed that theoretical thought originated with the Greeks, about 600 BC. The theories of the Greeks differ from the medieval theories, from those of the seventeenth century, and from the present.

Theoretical thought is concerned with statements, and statements concern concrete things, events, and relations. This constitutes a problem, the problem of the relation between artificially conceived theories, statements, and concepts on the one hand, and concrete things, events, and relations on the other hand. This problem is related to the fact that by using an instrument people enlarge their power of seeing or thinking, but simultaneously diminish their field of vision or attention. Using a microscope one can see much better than without, but at the same time one sees much less. Moreover, the other senses are eliminated. Looking at a real dog, one does not only see him, but also smell him, and hear him. This continuity of sensory experience is interrupted when looking at a picture of a dog. Therefore, a dog can see another dog, but if shown a picture of a dog, he may not recognize it. In theoretical thought people make abstractions, restricting their conceptual activity, and switching off their other modes of experience – for instance, their feelings.

Often, theoretical thought is at variance with natural thought. An example is the Copernican leading idea of the moving earth, which is counter-intuitive. It is ‘… almost against common sense to imagine some motion of the earth’, as Copernicus admits.[11] Therefore it is required that a theory be proved. Whether this is possible and to what extent will be discussed later on.

The three worlds of Karl Popper

The distinction between thinking subjects, objects to be thought about, and theoretical artefacts, more or less corresponds with Popper’s distinction of three worlds:

‘… the first is the physical world or the world of physical states; the second is the mental world or the world of mental states; and the third is the world of intelligibles, or of ideas in the objective sense; it is the world of theories in themselves, and their logical relations; of arguments in themselves; and of problem situations in themselves.’[12]

The correspondence is not perfect, however. Popper seems to think that his classification is complete and exhaustive, that everything belongs to one of his worlds. My classification is merely logical: it serves only a logical function. There are more subjects than logical subjects: mathematical subjects like numbers and spatial figures, physical subjects like atoms and stars, biological subjects like plants and animals.[13] Even humans are more than merely thinking subjects. They have feelings, they act, they love, they believe.

Philosophically speaking, something is a subject if it is directly and actively subjected to a given law. An object is passively and indirectly (via a subject) subjected to a law. Therefore, whether something is a subject or an object depends on the nomic context. If one wants to discuss individual subjects or objects, one needs typical laws in order to distinguish them from each other. These laws also determine typical relations between subjects and between subjects and objects. However, individual things and events have non-typical relations to each other as well. These modal subject-subject relations and subject-object relations obey general or modal laws, constituting about sixteen mutually irreducible relation frames (Dooyeweerd’s modal aspects or law spheres).

Next, there are more objects than logical objects. The magnitude of a spatial figure, its length or volume, is a spatial object. A road is a kinetic object, it does not move, but is indispensable for traffic. Food is a biological object; it does not live, but is a condition for life. A painting is an object of art. There are also far more artefacts than theories and ideas – telescopes, houses, cars, clothing.

Hence, distinguishing logical subjects, logical objects, and theories as logical instruments concerns only one of our fundamental modes of experience. The logical aspect of human experience is only one of its segments. Other aspects, like the quantitative, the spatial, the kinetic, and the physical, have played equally important parts in the history of the Copernican revolution, as will be seen in due course (chapter 3).

1.3. The logical character of a theory

As a human mode of experience and relations, logic implies making distinctions and connections. One of the most important logical distinctions to be made is that between the truth and falsity of statements. Hence, the most general logical function of a theory is to prove a statement, to establish its truth content relative to other statements. Omitting statements which are more or less probable, I shall restrict myself to propositions which are held to be either true or false.

Logical distinctions are made by people. As a logical subject, a person is actively subjected to logical laws, which they apply in their arguments, and which they have to obey if wishing to argue correctly.

The most important logical law is the law of non-contradiction. Within a certain context a statement and its negation cannot both be true. The qualification ‘within a certain context’ is of crucial importance. It points out that theoretical reasoning has a relative value. A statement can be true in one context, false in another one. But a theory is inconsistent if it simultaneously contains a statement and its negation.

In addition to the law of non-contradiction several other logical rules or tools of proof are available, such as syllogisms, modus tollens, modus ponens, and reductio ad absurdum.[14] The method of complete induction, applicable wherever numbers are at stake, is not universally accepted: L.E.J. Brouwer and other intuitionists accepted a proof only if it is finite, and therefore rejected proof by complete induction. Brouwer also rejected reductio ad absurdum as a method of proof.
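
To illustrate the first two of these rules schematically (in modern notation, which none of the historical authors discussed in this book used):

\[
\frac{p \rightarrow q \qquad p}{q}\ (\text{modus ponens})
\qquad\qquad
\frac{p \rightarrow q \qquad \neg q}{\neg p}\ (\text{modus tollens})
\]

Reductio ad absurdum assumes a statement, derives a contradiction from it, and concludes to the negation of the assumption.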

The logical definition of a theory

What is a theory? The Greek word theoria means something like contemplation (our word theatre is related), but already the earliest Greek philosophers connected theoria with proof, or deductive reasoning. I shall take for granted that a theory invariably implies logical deduction. It is often assumed that a theory should start from well-known and accepted truths, in order to arrive at new statements or theorems. Other people maintain that scientific theories start from the unknown, from hypotheses which should explain the observable. In this case, theories are even identified with hypotheses,[15] but I shall consider hypotheses to be statements, not theories. Leaving room for both approaches, from the known to the unknown, or from the unknown to the known, I propose the following provisional definition of a theory, as far as its logical character is concerned. A theory is a deductively ordered collection of statements accepted or proved to be true. Hence, a theory is not just a set of statements, but a qualified collection.[16] This definition is purely logical. It only concerns the formal structure of a theory. Sometimes, a single statement of the type ‘If a then b’ is called a theory. However, this is only meaningful in combination with some other statement, for instance ‘It is the case that a’.

A theory is a deductively ordered set of statements, meaning that each statement is directly or indirectly connected with every other statement by way of a deductive argument, a deduction. In a technical sense, a theory is a partially ordered set, because it is never the case that each pair of statements is connected such that one is deduced from the other.[17] This leads to the following criterion: A statement belongs to a theory if and only if it takes part in the deductive process in the theory. Because of this definition, a theory is called closed with respect to deduction, but it is quite open in other respects, as we shall see presently.[18]
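
A minimal sketch may clarify why the ordering is partial. Suppose (a hypothetical example) a theory contains two independent axioms a1 and a2, and a theorem t deduced from both:

\[
a_1, a_2 \vdash t, \qquad \text{but neither } a_1 \vdash a_2 \text{ nor } a_2 \vdash a_1 .
\]

Both axioms are deductively connected with t, and thereby indirectly with each other, but neither is deduced from the other: not every pair of statements is comparable in the ordering.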

Later on, various kinds of statements in a theory will be discussed (1.5, 4.1). At present, I observe that in most if not all theories, data are indispensable for the deductive process. But data are exchangeable: a datum can be replaced by its negation, as long as this does not lead to contradictions. This means that a theory is an open system, even though the above criterion decides at any moment which statements belong to it.

Each theory consists of a number of independent axioms and data, and a number of theorems, which are derived from the axioms and data. Hence, in a theory two statements may be directly connected, if one is deduced from the other, or indirectly, either if both are deduced from the same set of axioms and data, or if both are used to deduce a third statement.

The deductive ordering in a theory ought to be non-circular: circular reasoning is a logical fallacy.

True statements

A theory is a set of statements taken to be true within the context of the theory. This is the most intriguing part of our definition. It is a necessary part, because a false statement allows of any conclusion. Put otherwise, a theory is required to be consistent, i.e., free of contradictions. From a logical point of view, a statement asserted to be false is a contradiction. From a pair of contradictory statements, any statement whatever can be validly inferred.[19] The statement ‘if p then q’ is equivalent to ‘either p is false, or q is true’. Hence, if one admits both p and its negation, ‘if p then q’ holds for any q (because p is denied), and q itself follows (because p is also asserted). Therefore, a statement asserted to be false cannot be used in a logical process.
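
The argument may be spelled out in a short derivation (a standard one in modern propositional logic, where q is any statement whatever):

\[
\begin{aligned}
1.&\ p && \text{premise}\\
2.&\ \neg p && \text{premise}\\
3.&\ p \lor q && \text{from 1, disjunction introduction}\\
4.&\ q && \text{from 2 and 3, disjunctive syllogism}
\end{aligned}
\]

Because q is arbitrary, a theory containing both p and its negation proves everything, and therefore distinguishes nothing.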

On the other hand, very often in a theory statements are used which are known to be false in a wider context. Theories of planetary motion state at one moment that the earth is a point, at another that it is a perfect sphere, although it is very well known that these statements contradict each other, and are both wrong. The subjunctive method of using counterfactuals is so common, and so fruitful, that it cannot be ignored. Clearly, saying that only true statements in a theory are admitted is not meant to adhere to absolute truth in any sense. It is not even demanded that the statements are believed to be true – nobody believes the earth to be a point.

Theories as logical instruments are used by people, by logical subjects, in general not by a single person, but by a group of people, who want to use the theory together. These people must decide which statements they want to consider true, for the sake of the discussion. ‘Let us assume that …’ somebody suggests, and the discussion can only proceed if all participants are willing to accept the proposal, if only for the time being.

In other words: statements or propositions are true within a certain context – the context of the theory, and the context of the discussion between the people who use the theory. Outside this context the same statements may be false, or uncertain. But to make the deductive process possible it has to be assumed that the starting points are true. Then, if no logical mistakes are made, all deduced statements are equally true.

This leads to a most important conclusion: A theory is never able to prove a statement conclusively. The truth of any proved statement completely depends on the truth of the axioms and data from which the theory starts. A theory determines the truth of a statement relative to the truth of other statements. To what extent the latter are true must be decided in a different way. A theory is an instrument to propagate truth, to transfer truth, but never to create truth.

Three logical relations

Clearly a theory functions in three logical relations:

1. In the logical subject-object relation, a theory is an instrument between subject and object. A theory is made and used by people (individually or in groups), and is concerned with logical objects, the things or events about which people want to theorize. Each statement has a non-logical content, besides a logical form.

2. A theory has a function in an argument, a discussion, a logical debate between people who want to convince each other of the truth or falsity of statements. The participants in the debate must agree on the initial assumptions and the applied methods of proof, because otherwise a discussion would be impossible. This intersubjective relation is a logical subject-subject relation.

3. In using a theory, people are bound to logical rules or logical laws. Hence a theory is indirectly subjected to logical laws; it functions in a logical subject-law relation. Strictly speaking, it is not the theory which has to obey these laws, but the people who use the theory. Only they can be responsible for any use or misuse of existing or new theories.

In all three relations, logical subjects are involved. Theories cannot be considered apart from the people who make them and who use them.

The functions of a theory

From the above definition it follows that a theory can never be intended to give a mere description of whatever state of affairs. A description can be given with the help of words and sentences, and usually by a set of sentences, like a narrative. But a mere description is not an ordered deduction, and does not constitute a proof. As a narrative, it has more a lingual than a theoretical structure (1.6). A theory has other functions than to provide a description of reality. These functions (to predict, to explain, to solve problems, to systematize our knowledge) will be discussed in the next chapters.

The distinction between a mere statement (or set of statements) and a theory as a deductive scheme may be illustrated by comparing Copernicus with his so-called precursor, Aristarchus of Samos, of whom little is known but his assumption that the earth moves around the sun.[20] This statement of heliocentricity should not be confused with Copernicus’ heliocentric theory, which transcends Aristarchus’ statement. Copernicus was aware of this difference, when he wrote:

‘… let no one suppose that I have gratuitously asserted, with the Pythagoreans, the motion of the earth; strong proof will be found in my exposition of the circles …’[21]

Before starting the discussion of the functions of a theory, the logical character of a theory will be investigated in relation to the characters of concepts and statements.

1.4. Concepts

Logical reasoning is based on meaningful distinctions and connections. Not only theories, but also concepts and statements have an instrumental, an intermediary function in reasoning. Distinctions and connections are made in our subjective thought, and they concern external, objective affairs. As intermediary artefacts, explicit expressions of human thought, concepts have an objective character, but they are not the primary objects of thought.

Hence, theories, statements, and concepts have a logical character (as instruments of thought), and a non-logical meaning, referring to non-logical states of affairs. In natural thought explicit concepts and statements are not much needed. In theoretical thought they are indispensable. Classification and conceptualization belong to the first phase of any field of theoretical thought.

Definitions

Because a concept is not a statement, it cannot be an element of a theory. But a definition is a statement, and can therefore be an element of a theory. A definition is employed in order to introduce explicitly a new concept into a theory. Usually, however, concepts are tacitly or implicitly introduced. Examples of explicit definitions could be: ‘This is Venus’, or ‘Venus is the planet having a period of 224 days’. Any theory contains well-defined concepts alongside ill-defined concepts. Each user of a theory can be challenged to clarify their concepts, to distinguish these clearly from other concepts. However, it is impossible to define all concepts, because concepts are needed to define others. Each theory has a number of primitive concepts, which cannot be defined within the theory.

It is sometimes said that a definition is free. This is not entirely true, if one wants to introduce a new concept into an existing theory. Definitions, too, are subject to the logical law of non-contradiction. The definition of a new concept should not contradict the definitions of already accepted concepts in the theory. Any definition should avoid a contradiction in terms.

Identity

Concepts may have an individual or a universal character. In the first case they establish an identity, in the second case a species or class.

By its identity each thing can be distinguished from every other thing, each event from every other event, or each individual relation from every other individual relation. Since ancient times, Mercury, Venus, Mars, Jupiter, and Saturn were identified as planets, as wandering stars. This means, for instance, that last night and tonight one recognizes the same planet to be Mars, even though it has moved on meanwhile. A significant result of Greek astronomy, ascribed to Pythagoras, was the identification of the morning star and the evening star as the same planet, Venus.

The idea of identity is subjected to the logical law of identity. Each thing is identical with itself. In the course of a logical argument it is not allowed to change the identity of the things about which one argues. A common fallacy of identity is equivocation, to identify what is not identical.

Often, but not always, identified things or events are tagged by a proper name (Venus) or a date (Copernicus’ death: 1543).

Classes

Classes and species may refer to things, like minerals, plants, animals; to events; to human acts, artefacts and associations; and much more. For instance, Aristotle distinguished four classes of change: variation of essence, of quality, of quantity, and of position. Change of position, also called local motion, was further divided into natural and violent motion. Natural motion was divided into motion towards the centre of the universe, away from the centre, or around the centre. In Aristotle’s theory, the first kind of natural motions refers to the class of heavy bodies, the second to the class of light bodies, and the third to the class of celestial bodies (3.1-3.2).

A system of related classes and subclasses is called a taxonomic system. It constitutes the barest kind of theory, for if one states that a certain individual belongs to a certain subclass, it can be deduced that it does not belong to other subclasses (if these do not overlap), and that it belongs to one or more superclasses.
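
In set-theoretic notation (a modern rendering, not the historical mode of expression), the deductive content of a taxonomy is minimal but real: if S and T are non-overlapping subclasses of a superclass C, then

\[
x \in S \;\Rightarrow\; x \in C \ \text{ and } \ x \notin T .
\]

Applied to Aristotle’s scheme: a celestial body partakes in natural motion (the superclass), and belongs neither to the class of heavy bodies nor to that of light bodies.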

A class concept points to things or events of the same kind, and is often indicated by a noun, like stars, planets, motions, dogs, lightning flashes, birthdays. Properties, on the other hand, point to quite different things or events, which have something in common. In our language they are often indicated by an adjective, such as red, heavy, light-reflecting. Properties connect and disconnect classes. They are used to define classes. The property light-reflecting connects planets with houses, and distinguishes planets from stars. Properties serve to mark distinctions and similarities.

Aristotle distinguished between essential and accidental properties of an individual thing. Its essential properties indicate its nature, its essence, the species to which it belongs. Accidental properties establish the individual’s identity. Uniform circular motion around the centre of the universe is an essential property of a planet. But it is accidental that Mars takes about two years, Jupiter about twelve years to complete one period.

Intension and extension of the concept of a planet

A class concept involves both an intension (meaning) and an extension (the collection of things or events belonging to the class). Extensional logic is restricted to the extension of concepts; predicate logic also concerns their intension.[22] Both extension and intension depend partly on the theory in which the concept is used. Consider, for example, the concept planet as conceived during the Copernican revolution.

Before Copernicus, a planet was defined as a wandering star, a celestial body moving with respect to the fixed stars. Besides Mercury, Venus, Mars, Jupiter, and Saturn, both the sun and the moon were recognized as planets. According to Aristotle’s cosmological theory, the seven planets move through the zodiac around the centre of the universe, the position of the earth.

This concept changed radically with the transition to Copernicus’ theory. In his heliocentric theory, a planet is a celestial body primarily moving around the sun. Hence, the earth became a planet, and the sun and the moon ceased to be so. Not only the intension of the concept planet changed accordingly, but also its extension. The number of the planets decreased from seven to six. Moreover, Tycho Brahe’s system, in which the sun moves around the earth, and the planets move around the sun, contains only five planets.

The Copernicans introduced the new concept of planetary system, namely a central body surrounded by one or more satellites. (The term satellite was coined by Kepler shortly after Galileo’s discovery of Jupiter’s moons.) Copernicus knew two planetary systems, the solar system and the earth-moon system. Galileo’s discovery of Jupiter’s moons was hailed as a significant reinforcement of Copernicanism, because

‘… now we have not just one planet rotating about another while both run through a great orbit around the sun; our own eyes show us four stars which wander around Jupiter as does the moon around the earth, while all together trace out a grand revolution about the sun in the space of twelve years.’[23]

It showed that the concept of a planetary system was not an arbitrary and improbable alternative for the geocentric systems of Aristotle and Ptolemy. Galileo’s discovery also showed that celestial bodies can partake in two motions simultaneously.

The extension of the concept of a planet is partially independent of the three theories mentioned above (Aristotle’s, Copernicus’, and Brahe’s). It invariably includes Mercury, Venus, Mars, Jupiter, and Saturn. This is also the case with the concept’s intension. Any theoretical definition has to take into account the character of planets as wandering stars. Only with respect to the sun, the earth, and the moon do the three theories differ. The difference with respect to intension implies a partial shift with respect to the extension.

The extension of a concept can be changed without changing its intension, for example by the discovery of a new planet, like Uranus (1781). Such a discovery may even be predicted with the help of a theory, as was the case with Neptune (1846). The recognition of Pluto as a planet (1930) was later undone, because it appeared not to fit the intension of the concept planet as redefined in 2006.

Relations

Aristotle’s distinction between essential and accidental properties played an important part in medieval discussions. It has concealed the fact that there is another kind of concept, namely relations. A relation is not a property of a single individual, but a property of at least a pair of individuals, or a pair of classes. Aristotelian philosophy had hardly any place for relations. Something is large or small, heavy or light, warm or cold, moist or dry, moving or resting.

Gradually, the Copernicans became aware that these binary contraries had better be replaced by relations, such as larger than, heavier than, warmer than.[24] Especially Galileo paid much attention to this matter. He rejected the contrary distinction between heavy and light bodies, by showing all bodies to be more or less heavy. He emphasized that rest is not contrary to motion, but is only a gradation of motion, with zero speed. A falling body, starting from rest, has a continuously increasing speed, varying from zero to the final value.

Aristotle distinguished between quantitative and qualitative properties, and he clearly valued the latter much higher than the former. The Copernican revolution changed this radically. The question of how large something is will sooner be raised in a climate in which the relation larger than is more important than the contrary distinction of large and small. More and more the Copernicans became interested in measurable quantities, developing measuring instruments and standards, for instance, a yardstick, a thermometer scale, a standard weight.

During the Copernican revolution, besides quantitative relations, also spatial, kinetic, and physical relations became increasingly important concepts for the physical sciences, as will be seen. 

Operational definitions

The shift from qualitative to quantitative concepts is one of the most striking features of the scientific revolution of the seventeenth century. This shift has been of consequence for definitions. In Aristotelian physics, conceptual definitions concern the essence, the nature of things. Gravity is the tendency of heavy bodies to move towards the centre of the universe. In their theories, Galileo and Newton attempted to describe gravity with the help of measurable properties like acceleration, mass, and weight. The distinction between heavy and light bodies, so important in Aristotelian physics, disappeared. Following Archimedes, both Giovanni Benedetti and Galileo Galilei stated that bodies only move upward spontaneously if their density is less than that of the surrounding water or air.

Definitions determining how a property can be measured are called operational. By an operational definition one does not define a magnitude, but its metric.[25] The metric of a measurable magnitude is a scale including a unit, useful for both measurements and calculations. Hence, a metric has both an experimental (measuring) and a theoretical aspect (calculating). For instance, the definition of the metric of force includes its method of measurement; a unit of force; the vector character of forces (that is, a force has a spatial direction besides a magnitude); the superposition principle (according to which forces of different natures but acting on the same object can be added in the way of vectors); and the relation of the net force on an object and its acceleration (Newton’s second law of motion). Because of the theoretical character of a metric, the same metric may be connected to various measurement methods. As soon as a metric is established, a measuring instrument can be gauged, such that it satisfies the metric agreed upon. If a metric is generally accepted, it serves as a standard. A coherent set of metrics forms a metrical system.
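
In modern notation (anachronistic for the historical period, but it summarizes the elements just listed), the metric of force combines the vector sum of the superposition principle with Newton’s second law and a unit:

\[
\vec F_{\mathrm{net}} = \sum_i \vec F_i = m\,\vec a, \qquad 1\ \mathrm{N} = 1\ \mathrm{kg\,m/s^2} .
\]

Measuring the acceleration of a known mass is accordingly one way to gauge a force meter; a calibrated spring balance is another, illustrating that the same metric may be connected to various measurement methods.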

In the twentieth century, Percy Bridgman introduced operationism, saying that only operational definitions are fit to determine the meaning of concepts. He thought that such a definition should unequivocally indicate how the property concerned should be measured.[26] Hence, if there is more than one means to measure a property, we have actually several properties. For instance, if several possibilities to measure the length of a thing are available, we should rather speak of different concepts of length, according to this somewhat extravagant view, which he later mitigated.

Newton’s operational definition of mass

Isaac Newton’s Principia defined mass or quantity of matter as the product of volume and density: ‘The quantity of matter is the measure of the same, arising from its density and bulk conjointly’.[27] This is clearly an operational definition, not a conceptual one. The concepts of volume and density cannot be multiplied with one another. Ernst Mach considered Newton’s definition of mass circular, because he believed that density can only be defined as the mass of a unit of volume.[28] However, Newton did not define density in this way. He did not define density at all, apparently assuming this concept to be sufficiently known, contrary to mass.
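
In formula (a modern shorthand for Newton’s wording), the definition and Mach’s objection read:

\[
m = \rho V \ (\text{Newton}), \qquad \rho = \frac{m}{V} \ (\text{Mach's presupposition}),
\]

which would indeed be circular if both were taken as definitions; Newton, however, took density and volume as the undefined concepts.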

Mass or quantity of matter was a completely new concept, introduced by Newton himself. It cannot be found in the works of Galileo, René Descartes, or Christiaan Huygens. For Descartes, the essence of matter was its extension (3.4). Being material meant being extended. Hence, the quantity of matter was its volume. For Newton, matter and space were completely different. Therefore, he needed a new definition of quantity of matter. Keeping silent about its essence, he defined how its quantity could be measured. In the context of his ensuing theory, he argued that mass is a real property of any body, independent of its position. It turned out that mass is not only a measure of the body’s quantity of matter, but also of its inertia and its gravity. This cannot be derived from Newton’s operational definition of mass, but follows from his laws.

Whereas mass was a new concept in Newton’s time, density was definitely not. Density could be measured independently of any operation related to the division of mass and volume. A century before Newton’s Principia, Giovanni Benedetti and Galileo Galilei defined the concept of density in theories and experiments about floating, suspending, and sinking bodies, a subject introduced by Archimedes. In 1585, Benedetti rejected Aristotle’s contrary distinction between levity and gravity, pointing out that levity is caused by the upward force of the medium. A year later, Galileo’s La bilancetta (The little balance) described a hydrostatic balance to determine the average density of bodies and fluids.[29] In 1611 Galileo was involved in a polemic with a number of conservative Aristotelian scholars about the properties of floating bodies, and in 1612 he wrote Discourse on bodies on or in water.[30] Extended by Evangelista Torricelli and Blaise Pascal, Benedetti’s and Galileo’s views founded hydrostatics and aerostatics (9.5). Hence, for Newton it was obvious to define the new concept of mass using the well-known concepts of density and volume. However, his definition may very well have been influenced by the assumption that in denser matter atoms are more densely packed.[31]

1.5. Statements and their context

Statements may have various functions in a theory (5.1), but this section will only discuss their logical character in the context of a theory. The simplest statements are those connecting concepts via the copula is or equivalents. Aristotelian predicate logic is practically confined to statements like ‘Socrates is a man’, ‘all cows are animals’, ‘some men are Greek’, and ‘no swans are black’. In modern formal logic, the most important operations are negation, conjunction, disjunction, equivalence, and material implication. Propositions or statements are distinguished from propositional functions, in which variables occur. For instance, ‘x² + y² = z²’ is a propositional function. It is not a statement of which the truth can be established. It can only be ascribed a truth value if the free variables x, y, and z are bound, for instance by a so-called quantifier. The existential quantifier is symbolized by ∃x: there is at least one x, such that …; the universal quantifier by (x): for every x …. Now, if x, y, and z are integral numbers, ∃x∃y∃z (x² + y² = z²) is a true statement, whereas (x)(y)(z) (x² + y² = z²) is false. The use of variables, first in mathematics, and soon afterwards in physics, is a fruit of the Copernican revolution.
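
A worked instance (not in the original sources) makes the difference concrete: the triple 3, 4, 5 is a witness for the existential statement, whereas x = y = z = 1 is a counterexample to the universal one:

\[
3^2 + 4^2 = 9 + 16 = 25 = 5^2, \qquad 1^2 + 1^2 = 2 \neq 1^2 .
\]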

The two quantifiers point to the important logical distinction between universal statements and existential ones. It reflects the ontological difference between the law side and the subject and object side of reality. Universal statements or law statements refer to lawful states of affairs. Existential statements refer to subjects or objects or their relations, or to data.

Only statements occur in a theory, and statements are human inventions. Newton’s law of gravity is a statement invented about 1680. But the law determining the motion of the planets around the sun was valid long before Newton, even long before human beings inhabited the earth and started theoretical thought. Similarly, the fact that Jupiter has moons was established by Galileo in 1610. Thus the existential statement ‘Jupiter has moons’ dates from that year. But Jupiter having moons presumably preceded this statement for ages.

Clearly not every universal statement is a law statement. If I say ‘all books in my study are catalogued’, I do not refer to a law, let alone a universal law of nature. In a logical sense this statement does not differ from a law statement like Newton’s law of gravity. Therefore, the logical distinction between universal and existential statements is not identical with the ontological distinction between laws and what is subjected to laws (8.6).

Theory dependence and autonomy

The truth of a statement is only partially determined by the theoretical context. It is possible to have the same statement in different theoretical contexts, with the same truth content. This is highly fortunate, for otherwise one would be unable to compare different theories.

This view expressing partial theory dependence and partial autonomy, both of concepts and statements, takes a middle course between two more extreme views: logical empiricism and historical relativism.[32] One of the fundamental assumptions of logical empiricism was the existence of statements and concepts independent of any theory. These were so-called observation statements or protocol statements, and observational or empirical concepts. The empiricists strongly believed in observation, in which they found the certainty and trustworthiness of human experience. However, one cannot perform any observation out of the context of one’s expectations. In particular, for the observations made in scientific observatories or laboratories, elaborate instruments are used, which are developed according to sometimes rather advanced theories.

Criticism of empiricism was levelled by adherents of so-called historical relativism.[33] Authors like Thomas Kuhn, Norwood Hanson, and Paul Feyerabend stressed the theory dependence of concepts and statements to such a degree that they tipped over to the view that the meaning of concepts and the truth of statements are completely determined by the theory in which they function.[34] This means that one cannot even compare two competing theories. It makes no sense to say that the theory of Copernicus is better than that of Ptolemy, because these are incompatible, talking about different worlds. The fact that sooner or later the Copernican theory was accepted is not the merit of the theory, but due to more or less accidental developments.

Kuhn, Hanson, and Feyerabend believed discussions between adherents of competing theories to be quite fruitless. It may be doubted whether Galileo for instance would have accepted this view. Galileo became famous because of his two great books, Dialogo (1632), and Discorsi (1638). These works of fiction describe discussions between three persons, Salviati, Sagredo, and Simplicio. Although the opinions of Salviati (the Copernican) and Simplicio (the Aristotelian) differed strongly, Galileo presented them as able to discuss all problems put forward.

Agreement with the view that observations cannot be made apart from any theoretical context should not blind us to the fact that observational results may be quite independent of some theories. For example, the occasional backward motion of the planets, which played an important part in Copernicus’ heliocentric theory, was never disputed by any astronomer, whether adhering to Copernicanism or not.

Arguments

Hence theories, statements, and concepts are intimately interwoven. This is also the case with their respective structures or characters.[35] Without any doubt, these characters are logically qualified. The main function of theories, statements, and concepts is to mediate between logical subjective thought and its objects, and between the participants in any logical discourse. It is impossible to study theories, statements, and concepts as instruments of thought out of the context of logical laws and logical objects, and in particular apart from the logical subjects and their reasoning.[36] The functioning of theories, to be discussed in the following chapters, can only be understood in this context.

Logic might be considered a fundamental principle of human experience, not unlike the quantitative, spatial, kinetic, and physical principles, to be discussed in chapter 3. Because theories are characterized by deduction, their character has a typical kinetic aspect, deduction being the logical movement from one statement to another. Similarly, statements have a typical spatial aspect, being dominated by the idea of logical connection. Statements invariably connect other statements or concepts. Finally, the character of a concept refers to the logical unity and multiplicity, hence to the quantitative relation frame or mode of human experience.

In its turn, a theory has a function in a discussion, an argument. In a dispute, one does not only prove theorems from axioms and data; the axioms and data are themselves discussed. The force of the arguments is tested, convictions collide, data are weighed. Hence, a discussion (in a logical sense) has a physical foundation, because it is based on a logical interaction between arguments.

1.6. The logic of theories and the significance of language

This final section of the introductory chapter is concerned with the argumentative use of language, the way people converse with each other when they use theories and other logical structures. This use is subject to the norm of clarity: how are theories communicated, how are they made clear to various audiences?

An important part of the philosophical discussions of the first half of the twentieth century concerned the possibility of reducing theoretical language to observation language, of translating all theoretical statements into observation statements, supposed to be independent of any theory.[37] Braithwaite rejects the possibility of defining theoretical terms in the observation language: ‘A scientific theory which, like all good scientific theories, is capable of growth, must be more than an alternative way of describing the generalizations upon which it is based, which is all it would be if its theoretical terms were limited by being explicitly defined.’[38]

Since the linguistic turn (circa 1970), analytical philosophy has considered lingual analysis the nucleus, if not the whole, of philosophy. It is a branch of logical empiricism, more influential in the humanities than in the natural sciences. It considers concepts to be symbols, such that there is little difference between a word and a concept, between a logical statement and a lingual sentence, between a theory and a narrative. The aim of science is assumed to be hermeneutic: to translate reality, to interpret it, to clarify it. Because ordinary languages like Latin, German, or English are not suited for this purpose, logical-empiricist philosophers proposed that science should develop its own formal language, unequivocal and interpretable in only one way, assuming the need of three vocabularies: a logical, a theoretical, and an empirical one.[39]

In contrast, I shall assume that the logical aspect and the sign aspect of human experience are mutually irreducible, such that logical relations and structures presuppose the lingual ones.[40] Language has an important hermeneutic function in science, but science itself is not linguistically qualified. ‘Science has a language, but it is not a language, it is a body of ideas and procedures expressed in a number of languages.’[41] Whereas language is ambiguous, inviting interpretation, a logical act requires arguments, inviting proof. In order to find out whether the truth of a statement can be proved, first its semantic meaning has to be established. When Louis XIV was called le Roy-Soleil (the Sun King), occupying the centre of the universe according to his Copernican admirers, this was clearly a metaphor. Metaphoric expressions are significant, but not logically true. They allow of nuances, of many ways to express the same thing. Though having semantic meaning, and providing insight, they cannot function in a proof.

The choice of metaphors is strongly determined by one’s world view. Thus, before and during the Middle Ages, the organistic world view required mechanical instruments like the lever to be made clear by comparing them to living systems. In a mechanistic world view, it is the other way round: the functioning of the human arm is explained by comparing it with a lever. Since the Copernican revolution, the clockwork metaphor became very popular in clarifying the new system of the world.[42]

The foundation of the logical mode of experience in the lingual one implies the possibility of expressing concepts in words, statements in verbal sentences, and theories in spoken or written texts. In a logical argument one wants to establish whether a statement is or is not true, but that is only possible if the corresponding sentence is grammatically correct and has semantic significance. The statement ‘tonight the sun will set at 20.05 hours’ is grammatically correct and has semantic meaning; it can be true or false. But the grammatically incorrect sentence ‘tonight the sun 20.05 hours sets at’ and the meaningless sentence ‘tonight the roof sets at 20.05 hours’ are neither true nor false. They cannot play a part in logical reasoning.

We have seen that a theory cannot allow of statements supposed to be false. Similarly, a norm for the meaningful use of language is that people speak the truth. When someone says ‘it rains’, this has only meaning if we assume that they intend to affirm that it rains.[43] Even lying is only possible in a context in which speaking the truth is the norm.[44] The scientific function of language is to achieve clarity about all this – the starting points, the debated issues, the data, the argument, the results, the tests.

In three respects argumentation needs a special kind of language. First, it requires its own logical indicators, words like therefore, thus, so, hence, consequently, because, since, for.[45] Second, because the aim of science is the investigation of laws, it needs universal quantifiers (‘for all …’), referring to law conformity, and existential quantifiers (‘there is a …’), referring to a particular instance of a law. Third, any science needs its own language. The logical requirement of unequivocal concepts is at variance with the lingual variety of homonyms and synonyms, which, on the other hand, serves the needs of didactics very well. For instance, while acceptable in an Aristotelian context, the homonymic use of the word earth, meaning soil, the heaviest element, or the globe, became for Galileo an equivocation to be avoided.[46] Copernicanism had to introduce new words, or rather give new meanings to existing words, like force, mass, quantity of motion. In this process, science started to withdraw from common language.
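
In the notation of modern formal logic (a much later formalization, used here only as an illustration), the two kinds of quantifiers may be written

\forall x\,(P(x)\rightarrow Q(x)) \qquad\qquad \exists x\,(P(x)\wedge R(x))

the first expressing a law statement (‘every planet obeys Kepler’s third law’), the second an existential statement (‘there is a planet that moves retrograde’), where P, Q, and R stand for arbitrary predicates.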

Mathematics as language of physics

It is often stated that mathematics is the language of physics. This assumption clearly contradicts the view that physics itself is a language. If one rejects that view, and accepts mathematics as a science on a par with physics, one should also reject the idea that mathematics is a language. Only if one considers mathematics to be fundamentally different from science could one accept the statement that it is a vehicle of science.

In Il saggiatore (1623) Galileo seems to adhere to this view:

‘Philosophy is written in this grand book, the universe, which stands continually open to our gaze. But the book cannot be understood unless one first learns to comprehend the language and read the letters in which it is composed. It is written in the language of mathematics, and its characters are triangles, circles and other geometric figures without which it is humanly impossible to understand a single word of it; without these, one wanders about in a dark labyrinth.’[47]

In this passage, Galileo reveals his neo-Platonic background, according to which nature can only be explained with the help of mathematical ideas. Galileo’s calling mathematics a language is merely a metaphor, a manner of speech. In fact, Galileo laid the basis for a far more intricate relation between physics and mathematics. The possibility of projecting kinetic and physical relations onto quantitative and spatial relations allowed him to apply mathematics to physical states of affairs (9.2).

Physics does make use of the symbolic language of mathematics, which developed during the seventeenth century in a way that proved very important for physics as well. This development, never foreseen by Galileo, was the introduction of the algebraic symbol system, the use of letters as symbols for numbers and variables. It is indeed an example of the introduction of a typical language into science, showing how important and fruitful a well-considered use of language can be. Another example is the introduction of decimal fractions by Simon Stevin, although a hundred years later Newton still applied ordinary fractions. Finally, formulas as expressions of physical laws and relations came into use only very slowly: the practice started in the seventeenth century, but became common only in the nineteenth.

Levels of communication

The main function of language is communication, the transfer of feelings, thoughts, and skills by means of symbols and metaphors. Another function is the preservation of scientific knowledge and its transfer to later generations, the historical formation of the collective memory of mankind, mostly in written texts. Both private and public libraries have contributed significantly to this function.

Since the seventeenth century, scientific communication[48] occurs at different levels: between specialists on the same subject; between specialist and non-specialist scientists; between scientists and non-scientists; and in the didactic communication between a teacher and their students. Each has specific requirements with respect to the use of language.

At the first level, communication took place in Latin, by means of books and letters. The distribution of books was much better organized than during the Middle Ages, thanks to the invention of movable type printing in the fifteenth century. But the number of copies of each printed book was relatively small, not more than a few hundred per edition. Copyright did not exist, and books were often reprinted without permission of the author or the first publisher. Both Copernicus’ Revolutionibus and Newton’s Principia were written for specialists. In his preface, Copernicus emphasized that ‘mathematics is written for mathematicians’.[49] At the beginning of Principia’s third book, Newton says that he had abandoned the plan to write it in a popular way, ‘… to prevent the disputes which might be raised upon such accounts,’[50] i.e., to avoid the criticism of incompetent scholars. Nevertheless, he indicates which parts of the book can be omitted at first reading. During Newton’s lifetime, Principia was not translated into English, though it was into French. Opticks (1704) is a far less technical work, initially written in English and later also published in Latin.

Another channel of communication was the exchange of letters – between Kepler and Galileo, Descartes and Mersenne, Newton and Bentley, Clarke and Leibniz. Often these letters were copied several times, and many of them have been preserved, forming an impressive source of knowledge about seventeenth-century science.

The second level is very important to make the results of one specialization available to the others. If this does not function satisfactorily, the network character of theories is endangered, and stagnation may follow. A novelty was the introduction of scientific journals, for instance the Philosophical Transactions of the Royal Society of London. Nowadays many of these journals have a specialist character and hence belong to the first category, but in the seventeenth century their aim was to inform non-specialists of the proceedings of science. Initially most papers were written in Latin, but gradually Latin had to give way to the common languages: Italian, French, English, Dutch, and German.

At the third level, especially Galileo did pioneering work, by writing his most influential book in colloquial Italian, in such a style that it was readable by non-scientists, that is, by an intelligent lay public. This contributed to Galileo’s popularity as well as to the dispersion of new views. In the Netherlands, Galileo was preceded by Simon Stevin, who pleaded for writing scientific works comprehensibly in the common language. According to Stevin,

‘… the Greeks were of the most intelligent that Nature produces, but they lacked a good tool, that is, the Dutch language, without which in the most profound matters one can accomplish as little as a skilled carpenter without good tempered tools can carry on his trade.’[51]

An example is his Weeghconst (The art of weighing, 1586), a book about mechanics. After Galileo, Descartes wrote his influential Discourse on method (1637) in French.

‘And if I write in French, which is the language of my country, rather than in Latin, which is that of my teachers, it is because I hope that those who rely purely on their natural intelligence will be better judges of my views than those who believe only what they find in the writings of antiquity.’[52]

Discourse is a popular exposition of Descartes’ philosophy. His scientific works are Meditationes de prima philosophia (1641) and Principia philosophiae (1644), both written in Latin, though the latter was shortly afterwards translated into French.

Except for Galileo, no leading scientist of the seventeenth century wrote really popular works, leaving that to their disciples, usually scholars of high quality but lesser creativity. Thus Rohault popularized Descartes. Clarke, ’s-Gravesande, and Voltaire did the same for Newton.

The popularization of scientific results strongly influenced the common world view. The Western world view has changed radically since the Copernican revolution, though it took a long time before the idea of a moving earth became commonly accepted. In the seventeenth century even the majority of the learned remained convinced of the geocentric world view, and discussions about the validity of the heliocentric system continued deep into the eighteenth century.

At the didactic level, pupils must be introduced to the science they want to learn. It is a didactic aim to clarify theories, to make students understand them. Because a theory is an instrument, for instance to solve problems, the use of a theory must be exercised, so that mastery of the language involved grows. On the one hand, students learn to apply their own language to new problems; on the other hand, they extend their language by learning and applying new words and expressions. In this way, a student gains clarity not only about a theory and its possible applications, but also about that part of reality with which the theory is concerned.

For a large part, medieval scholastic education consisted of the citation of authorities. In the discussion of a thesis, a long list of arguments taken from the literature was presented, both pro and contra the thesis, and in the end one’s own view was given. The best student was the one able to quote the most citations, preferably from memory. Whereas Kepler’s Mysterium cosmographicum (1597) and Astronomia nova (1609) are scientific works, his Epitome astronomiae Copernicanae (1617-1621) is a summary and a textbook. Consequently, Kepler adopted the then usual form of question and answer, giving a systematic pedagogical order to his expositions, and avoiding biographical details.[53]

The turn of the tide came during the Copernican revolution. The French scholar Petrus Ramus advocated the educational value of visiting artisans, which he valued more than scholastic training. This means that he favoured the study of skills more than the study of bookish knowledge. Galileo also attacked the scholastic way of education.[54] He preferred to give his students new problems which they should try to solve, if possible with new arguments derived from common sense, observation, and experiment. In his Dialogue, Galileo’s spokesman Salviati challenges his opponent Simplicio to forget about Aristotle, and to use his own wits. This means that Galileo preferred to teach his students how to use theories, instead of teaching them accepted knowledge.

Demonstration experiments

Galileo’s discussions with a number of scholars from the university at Pisa about floating bodies (1611-1612) became famous.[55] The Aristotelian scholars argued that a piece of ice, like any other object, floats or sinks depending on its shape, and they sustained their argument with quotations from Aristotle’s works. Based on experiments performed since his youth, Galileo showed the shape of a body to be irrelevant: only its density determines whether it floats or sinks in water. He too mentioned an authority, namely Archimedes, but he sustained his argument by means of the hydrostatic balance. At the Grand Duke’s request, Galileo published his views in Discourse on bodies on or in water (1612). He made sure his argument was so clear that anybody able to read could understand it. Therefore, he wrote this treatise in colloquial Italian.[56] The hydrostatic balance was easy and cheap to construct, and anybody could repeat Galileo’s experiments.

We see that language, in order to show and clarify an argument, is not restricted to words. Experiments may also function to sustain an argument, to clarify it. Demonstration experiments in school physics have such a didactic function.[57] A similar didactic function can be allotted to illustrations, graphs, etc., and in particular to giving examples. Examples can never prove a theorem, but they can clarify it, and make it acceptable.


[1] For biographical details about the physicists and philosophers discussed in this book, see Gillispie (ed.) 1970-1980; Millar et al. 1996; biographies and encyclopaedias. Years of birth and death are not given in the text, but in the index of historical persons.

[2] Kuhn 1957, 1-2; Brown 1977, 111; Dijksterhuis 1950, 317 (IV:1); Toulmin, Goodfield 1961, 179; Cohen 1980, 3; Cohen 1994, 21-24.

[3] Copernicus’ Revolutionibus was published in 1543 (Nürnberg), 1566 (Frankfurt), and 1617 (Amsterdam). In several respects, the printed work deviates from the manuscript, which is preserved.

[4] Kant 1787 xiv, xxii. Cohen 1994, 24-27. See however Cohen 1985, 237-253.

[5] Cohen 1994, 2010; Wootton 2015.

[6] Eamon 1994, chapter 6.

[7] Koestler 1959, part IV.

[8] On positivism, see Von Mises 1939; Kolakowski 1966; Suppe 1973.

[9] Clavelin 1968, 424-431; Drake 1978, xxi; and Finocchiaro 1980, 159 doubt Galileo’s Platonism, contrary to Burtt 1924, chapter 3, and Koyré 1939. However, Galileo applied Plato’s literary form of dialogues to his most important works (see Galileo 1632, 12, 22, 89-90, 145, 158, 191-192), mistrusted phenomena (Galileo 1632, 256; see Feyerabend 1970), and in principle equated human intellect with that of God (Galileo 1632, 103). Most dialogues by Plato were translated into Latin only in the 15th century, see Gaukroger 2006, 89.

[10] Ravetz 1971, chapter 3, 4; Polanyi 1958, chapter 1, 4.

[11] Copernicus 1543, 24 (Preface).

[12] Popper 1972, 154.

[13] Dooyeweerd 1953-1958, I, 3; Stafleu 1980, 2002, 2011, 2015; Clouser 1991; Strauss 2009.

[14] Suppes 1957; Cohen, Nagel 1934.

[15] Popper 1959, 59: ‘Scientific theories are universal statements.’ Popper 1983, 33 identifies a theory with a hypothesis, but on pages 113, 178, 292, with a deductive system.

[16] Braithwaite 1953, 12, 22; Bunge 1967a, 51-54; 1967b, I, 381.

[17] Finocchiaro 1980, 311-331.

[18] Bunge 1967b, I, 391.

[19] Popper 1963, 317-322.

[20] Heath 1913, 299-310; Dreyer 1906, 136-148.

[21] Copernicus 1512, 59.

[22] Bunge 1967b, I, 65-72.

[23] Galileo 1610, 57; 1632, 334, 339-340.

[24] Galileo 1632, 369; Descartes 1637, 20.

[25] Bunge 1967a, 162; Stafleu 1980, chapter 3; 2015, chapter 3.

[26] Bridgman 1927, 5. See Hempel 1952, 39-50; 1965, 123-133, 141-146; 1966, chapter 7; Bunge 1967c; Suppe 1973, 18-20.

[27] Newton 1687, 1; Jammer 1961, 64-7; Cohen 1973, 335.

[28] Mach 1883, 237, 300.

[29] Galileo 1586, 134-140.

[30] Drake (ed.) 1957, 79-81.

[31] Newton 1687, 414.

[32] Kuhn 1962, chapter 9-10; Feyerabend 1970; Hesse 1974, 61-66; Suppe 1973, 199-208; Stafleu 1980, 25-27.

[33] Feyerabend 1975; Hanson 1958, 5-9, 18-19; see Brown 1977, chapter 6.

[34] Suppe 1973; Brown 1977; Weimer 1979; Glymour 1980, chapters 2, 3.

[35] Stafleu 2019b, Encyclopaedia of relations and characters.

[36] Polanyi 1958; Cantore 1977.

[37] Suppe 1973, 102-109.

[38] Braithwaite 1953, 76.

[39] Suppe 1973, 16, 45ff, 66ff; Von Mises 1939; Bunge 1967c; Stegmüller 1979, 3-7.

[40] Seerveld 1964, 83; Stafleu 2011, chapter 1; 2015, chapter 14.

[41] Bunge 1967b, I, 47.

[42] Macey 1980, chapters 4, 5.

[43] Tarski 1944.

[44] MacIntyre 1967, 74, 92.

[45] Finocchiaro 1980, 311.

[46] Galileo 1632, 401-403.

[47] Galileo 1623, 237-238.

[48] Ziman 1968, chapter 6; 1976, chapter 5; 1978, chapter 2.

[49] Copernicus 1543, 27 (Preface).

[50] Newton 1687, 397.

[51] Drake 1970, 51.

[52] Descartes 1637, 77-78.

[53] Koyré 1961, 437-438.

[54] Galileo 1623, 270-273.

[55] Drake 1978, 169-205; 1970, chapter 8.

[56] Drake (ed.) 1957, 84.

[57] Heilbron 1979, 13.


 Chapter 2

Explanation and prediction

2.1. Physics and astronomy before Copernicus

Chapters 2 and 3 are concerned with two important functions of theories, to wit prediction and explanation. Prediction is the first and most obvious aim of any scientific theory. This is a consequence of the deductive character of any theory, of its kinetic foundation, deduction being the logical movement from one statement to another. Prediction may be characterized as the kinetic-logical function of a theory, to be distinguished from its physical-logical function, which would be to explain, for explanation is tied to a cause-effect relation of some kind, as will be seen.

The distinction between prediction and explanation and the corresponding struggle between instrumentalism and realism played an important part in the early history of astronomy until the seventeenth century. In the Athenian tradition, Aristotle’s Physics explained the structure of the cosmos in a realistic sense, whereas its Alexandrian counterpart, Claudius Ptolemy’s Almagest, interpreted in an instrumentalist way, only provided more or less reliable predictions. Nicholas Copernicus and his adherents rejected this division of labour, introducing a realist mathematical astronomy, able to predict and to explain. Contrary to Ptolemy, Copernicus succeeded in explaining the retrograde motion of the planets and in calculating their distances to the sun relative to that of the earth. Therefore, it is a mistake to consider Copernicus’ theory equivalent to Ptolemy’s because of the relativity of motion. The problem of stellar parallax caused Tycho Brahe to propose a geocentric alternative to Copernicus’ heliocentric system (2.5). This turned out to be acceptable for instrumentalists, but was rejected by realists. Johannes Kepler’s program was to combine astronomical prediction with physical explanation (2.6).

In chapter 2 we shall discuss the structural difference between prediction and explanation. It has played an important part in the history of astronomy, both in its pre-Copernican stage (2.1) and in Copernicus’ investigations (2.2). Next we discuss the logical structure of prediction and explanation (2.3) and its implications for Copernicus’ treatment of retrograde motion (2.4) and the size of planetary orbits (2.5). Finally we review Kepler’s views on explanation and prediction (2.6). In chapter 3 we shall explore some principles of explanation.

The physics of celestial motion

Ancient philosophy made a sharp distinction between explanatory physics and descriptive astronomy. In theoretical knowledge, Aristotle distinguished between metaphysics, also called theology or first philosophy, concerned with unchangeable and immaterial substances; physics or natural philosophy, about changeable things with independent existence; and mathematics, concerned with immovable spatial or quantitative properties abstracted from physical things.[1] In Greek Athens physics was interpreted in a realist sense. It was the study of the nature or essence of things, and was concerned with the rational form (Aristotle) or idea (Plato) lying at the foundation of each thing or living being. Astronomy, centred in Hellenistic Alexandria, was considered part of mathematics and was treated as an instrument for the calculation of observed planetary motions.

Following Plato, all Greek, Hellenistic and medieval philosophers adhered to the view that the perfect celestial bodies can only move uniformly in circular orbits concentric with the universe. On rational grounds Plato argued that celestial bodies move eternally without being disturbed.[2] This would only be possible in circular or rectilinear orbits. Clearly, the celestial bodies do not move in a straight line, which leaves circular motion. The perfect celestial bodies are fixed to transparent spheres, moving uniformly around the earth as their common centre. This Platonic doctrine was conceived to be purely theoretical, independent of observation, for the celestial phenomena ‘… can be apprehended only by reason and thought, not by sight …’[3]

In Athens, Plato’s disciples, Eudoxus, Calippus, and Aristotle, elaborated this view into a system of homocentric spheres.[4] Aristotle referred natural motion to the centre of the universe: towards the centre, away from it, or around it. The linear motions to and from the centre concern sublunary objects; the circular motion around the centre belongs to the heavenly bodies. The transparent spheres carrying the celestial bodies are homocentric, all having the same centre, which happens to coincide with the centre of the earth. This cosmology implied a separation of celestial from terrestrial physics, of perfect from imperfect motion, having far-reaching consequences (3.1).

Three inequalities

After the establishment of this clear and rational system, astronomers were left with the task of reconciling it with the irregularities in the actually observed planetary motions, aptly called saving the appearances (Simplicius ascribed this expression to Plato).[5] The theoretical system was never questioned. In Plato’s idealistic philosophy, the phenomena were considered to be deceptive, unreliable, and transitory copies of the true and eternal ideas, according to which the world was made. Aristotle’s realism was less extravagant, but he, too, gave his eternal forms precedence over transitory phenomena (3.2).

The overwhelming majority of heavenly bodies, about a thousand visible fixed stars, satisfy the doctrine of uniform circular motion perfectly well. In 24 hours they turn together around an imaginary axis through the celestial pole. The sun moves annually along the ecliptic, a circle at an angle of about 23.5° to the celestial equator. The other wandering stars never move far from the ecliptic. In a geocentric scheme, the moon is closest to the earth, followed by Mercury, Venus, Sol, Mars, Jupiter, and Saturn. These motions show some irregularities, called inequalities during the Middle Ages.

The first inequality concerns the seasons: the northern winter is several days shorter than the summer. If the sun moved uniformly along the ecliptic, winter and summer would be equally long.

The second inequality is the most spectacular, and played the most important part in Copernicus’ heliocentric theory. Apart from the daily motion, all planets travel roughly along the ecliptic, in the same direction as the sun. Except for the sun and the moon, the planets occasionally show backward or retrograde motion – Mercury every 116 days, Venus every 584, Mars every 780, Jupiter every 399, and Saturn every 378 days.[6] The inferior planets (below the sun), Mercury and Venus, are never seen far from the sun and show retrogradation during about half the time. The three superior planets move backward if and only if they are in opposition to the sun, when the earth is directly between the planet and the sun.
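
In heliocentric hindsight (anticipating 2.4), these intervals are synodic periods, which follow from the sidereal orbital periods of the earth and of each planet: 1/S = |1/T − 1/T(earth)|. A minimal sketch in Python, using modern rounded values that were of course unavailable to ancient astronomers:

EARTH = 365.25  # sidereal orbital period of the earth, in days
periods = {'Mercury': 87.97, 'Venus': 224.70, 'Mars': 686.98,
           'Jupiter': 4332.6, 'Saturn': 10759.2}
for name, T in periods.items():
    # synodic period: time between successive like alignments with the sun
    synodic = 1.0 / abs(1.0 / T - 1.0 / EARTH)
    print(f'{name}: retrogradation every {synodic:.0f} days')

The output reproduces the observed intervals quoted above: 116, 584, 780, 399, and 378 days.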

The third group of inequalities consists of all irregularities which remain after the other two inequalities have been accounted for. The largest occur in the motions of Mercury and the moon.

In order to account for these irregularities, Eudoxus and Calippus devised a system of homocentric spheres, three or four for each planet. Aristotle doubled this number in order to neutralize the motions of one planet before constructing the spheres for the next one.[7] Aristotle’s system counted 55 spheres, six more than was necessary.[8] The outermost sphere was the primum mobile, the first moved sphere. The medieval scholastics usually added another sphere called the empyrean heaven, residence of God and the elect. The transparent spheres carrying the celestial bodies are still homocentric, turning around the centre of the earth. This model supplied a qualitative description of the second type of inequality, but could not account for the other two. Nor could it explain the varying brightness of the planets or the changing apparent size of the moon.

Ptolemy’s Almagest

More practical than the Greek philosophers at Athens, the Hellenistic mathematicians centred in Alexandria studied many possibilities to describe all inequalities, using observations and calculations handed down by Babylonian astronomers since about 1000 BC. This work of ages started about 350 BC, and culminated in the Syntaxis mathematica (Mathematical composition). Its author was Claudius Ptolemy (Klaudios Ptolemaios), living about 150 AD in Alexandria.[9] It is the only surviving comprehensive ancient treatise on astronomy. Arab scholars called it respectfully al-majisti (the great book). By his Latin translation, circa 1175, Gerard of Cremona made it known in Europe as Almagest. It appears that Ptolemy falsified some of his observational results; astronomers like Brahe and Kepler were aware of inconsistencies in Ptolemy’s data.[10]

Besides the mathematical and astronomical Almagest, Ptolemy wrote Hypotheses planetarum, a physical and Aristotelian explanation of celestial motion, and Apotelesmatika (astrological influences, also known as Tetrabiblos), the oldest surviving systematic account of astrology, a craft developed some 400 years earlier. In Geographia Ptolemy described meridians and parallel circles to indicate the longitude and latitude of any place on earth. At the end of the fifteenth century, maps were drawn in Italy based on Ptolemy’s data. For reasons unknown, Ptolemy replaced Eratosthenes’ quite accurate measurement of the earth’s circumference by a much smaller value. Because he also drew Eurasia larger than it actually is, his estimate of the distance between Spain and the Indies across the Atlantic was much too small, leading to Columbus’ mistaken belief that in the Bahamas he had arrived in East Asia (1492).

In order to arrive at a mathematical description of planetary motion, including its inequalities, Ptolemy’s Almagest applied three methods.

First he introduced an eccenter, assuming that the centre of the ecliptic, the sun’s path around the earth, does not coincide with the centre of the earth. This allowed him to take account of the unequal duration of the summer and the winter. Eccenters were also used in the description of the other planets’ motions.

Next Ptolemy applied deferents and epicycles. On this device, a planet does not itself move uniformly around the earth, but travels on an auxiliary circle, called the epicycle, whose centre in turn moves on the deferent, the real orbit around the earth. With this device, probably invented by Apollonius about 200 BC and applied by Hipparchus circa 150 BC, the second inequality (retrograde motion) could be accounted for, as well as the variation of the apparent cross-section of the planets, caused by the varying distance of the planet from the earth.
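
In modern vector notation (not Ptolemy’s geometrical idiom), a deferent of radius R traversed with angular velocity ω_d, carrying an epicycle of radius r traversed with angular velocity ω_e, places the planet at

\mathbf{r}(t) = R\,(\cos\omega_d t,\ \sin\omega_d t) + r\,(\cos\omega_e t,\ \sin\omega_e t),

as seen from the earth at the origin. For suitable values of R, r, ω_d, and ω_e, the polar angle of \mathbf{r}(t) periodically decreases: the planet then appears to move backward, and is simultaneously closest to the earth.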

Ptolemy’s third device is the equant, always used in combination with an eccenter. The motion of the planet or the centre of the epicycle is taken to be uniform, not, however, as seen from the earth, but as seen from the equant. The eccenter lies exactly halfway between the earth and the equant. This became known as the bisection of the equant. Kepler could explain it as soon as he established that the planetary path is an ellipse.

These three devices allowed Ptolemy to provide a fairly accurate description of the motion of the planets for ages to come. Later astronomers, both Arabic and European, have corrected and improved Ptolemy’s calculations, using new observations as data. But Ptolemy’s methods remained virtually unchanged up till Copernicus’ time.

Physics and astronomy

Ptolemy’s model did not consist of homocentric spheres, but of heterocentric circles. By applying circles instead of spheres, Ptolemy did not need to neutralize the motions of one planet before starting with the next, as Aristotle had done: he calculated the motion of each planet separately. The mathematical model was not intended to explain the inequalities, but to calculate them in order to make predictions possible. Contrary to physics, mathematical astronomy was interpreted in an instrumentalist sense. The eccenters, deferents, epicycles, and equants of Ptolemy’s theory were not realistically interpreted;[11] they were never considered to represent the real state of affairs. A faithful disciple of Aristotle, the twelfth-century Arab philosopher Averroes commented: ‘The Ptolemaic astronomy is nothing so far as existence is concerned; but it is convenient for computing the non-existent.’[12]

Belonging to physics, the homocentric spherical system of Eudoxus and Aristotle was considered by many to be a true and sufficient explanation of the cosmos. Ptolemy’s system of heterocentric circles, being a part of mathematical astronomy, was merely a useful instrument to make calculations. The former was believed to state certain essential truths about the heavens, while the latter better fitted observations. Most of the time the two theories could peacefully coexist, but occasionally, conflicts between enthusiastic partisans of the two theories could not fail to occur.[13]

The double truth

About 1300, the division of labour between physics and astronomy received a new accent. In the twelfth and thirteenth centuries, translations into Latin of the works of Aristotle, Ptolemy, and others became available in Western Europe, together with Arab comments. These manuscripts were eagerly studied at the universities, but contained many views contradicting Christian doctrines, giving rise to conflicts with the church. For instance, Aristotle taught the cosmos to be eternal and unchangeable, which clearly contradicts the Christian idea of creation. For this reason, during the early Middle Ages Aristotle was less popular than Plato, who in his dialogue Timaeus introduced the Demiurge, a divine craftsman creating the visible world according to eternal ideas.[14] Copernicus refers to this craftsman: ‘… the mechanism of the universe which has been established for us by the best and most systematic craftsman of all …’[15]. It became a recurrent theme in seventeenth-century mechanism.

Only in the thirteenth century did Aristotle become the most important philosopher, in particular at the university of Paris, although his controversial views were condemned by bishop Stephen Tempier in 1277. However, the work of scholars like Thomas Aquinas led to a synthesis of official theology with Aristotelian philosophy, including its physics. Since then, philosophy and physics were taught in the theological faculty of the medieval universities, whereas astronomy as one of the seven artes liberales belonged to the lower faculty of arts. The liberal arts (not to be confused with the arts practiced by artisans) were divided into the trivium (grammar, rhetoric, dialectic) and the quadrivium (geometry, arithmetic, music, and astronomy). Successful students received a master’s degree. Masters in the liberal arts could seek admission to the faculty of theology, medicine, civil law, or church law, eventually leading to a doctor’s degree.

The students and masters of liberal arts were free to discuss their views on natural affairs, provided they did not pretend these to be true. Jean Buridan and Nicole Oresme in the fourteenth and Nicholas of Cusa in the fifteenth century discussed Aristotle’s On the heavens, contemplating the logical possibility of a daily motion of the earth. They never considered the annual motion of the earth around the sun.[16] Because this is the most important feature of Copernicus’ theory, it is not tenable to consider, for instance, Oresme a precursor of Copernicus.

However, as soon as the question arose whether the earth really moves, Buridan wrote:

‘… but I do not say this affirmatively, but I shall ask the lords theologians to teach me how they think that these things happen.’

Oresme doubted the distinction between celestial and terrestrial matter, and presented many arguments in favour of the moving earth. Nevertheless, in the end he rejected its reality:

‘And yet all people, myself included, believe that the heavens move, and the earth not: Thou hast fixed the earth immovable and firm.’[17]

In general, in their comments on Aristotle’s works, the medieval scholastics did not question Aristotle’s views, but they investigated his proofs. Thus, Oresme argued that Aristotle’s proof of the immobility of the earth is wanting, but he did not really doubt its conclusion. The earthly motion, being contrary to Aristotelian cosmology and biblical texts, was considered at most an astronomical possibility, never a physical reality.

The clerical practice of the double truth provided the medieval scholars with a certain margin, within which they were free to investigate and discuss anything, if only they ultimately submitted themselves to the authority of the church.[18] But at the close of the Middle Ages, the authority of the church waned. With the Renaissance and the Reformation people demanded the right for themselves to decide what is true or false. Hence the practice of double truth became discredited. Copernicus and Kepler rejected it. Galileo became involved in trouble with the church because he refused to adhere to it (8.3). Only Descartes still made use of it, to hide his true feelings about Copernicanism (3.4, 8.4).

2.2. Copernicus’ return to Platonism

During the Renaissance, Aristotle’s On the heavens was by far the most influential book on physical cosmology. Copernicus’ contemporary Girolamo Fracastoro defended a homocentric system consisting of no less than 79 spheres. On the other hand, Georg Peurbach and Johann Müller (Regiomontanus) were prominent adherents of Ptolemy’s mathematical astronomy, renewing his data and making predictions with a practical purpose. Besides astrological forecasts (in particular practiced in medicine), this concerned navigation and the calendar.

The needs of navigation became prominent with the voyages of discovery in the century before Copernicus started his work. The calendar introduced by Julius Caesar in 45 BC, outdated by the sixteenth century, provided the first motive for Copernicus to reform astronomy.[19] The annual fixation of Easter, determined by the first full moon in spring, and the adaptation of the length of the year with the help of leap days, required knowledge of the motion of the sun and the moon. The design of the Gregorian calendar (1582) applied Erasmus Reinhold’s Prussian tables, based on Copernicus’ new calculations, but it did not rely on the assumption of the moving earth.

The medieval tradition drew a sharp line between Platonism and Aristotelianism with respect to the status of mathematics (2.1).[20] In Aristotle’s physics mathematics played a subordinate role. In the sixteenth century, neo-Platonist scholars like Simon Stevin started to apply mathematics in their study of natural affairs.[21] Especially in Italy (where Copernicus was a student), they were inspired by Archimedes, another Platonist. His works, published by Tartaglia in 1543, made a deep impression because of his mathematical treatment of physical problems. (Another work by Archimedes, on method, was discovered together with the two published by Tartaglia, got lost, and was found again in the twentieth century.[22]) Like all Renaissance scholars, Copernicus looked for inspiration and motivation in antiquity. He did not intend to be a reformer, but a restorer of Platonic mathematics. He wanted to eliminate the distinction between physics and astronomy ascribed to Aristotle. Returning to Plato’s perfect spheres, he intended to interpret these realistically. In this he succeeded only with respect to retrograde motion.

Commentariolus and Narratio prima

Because Nicholas Copernicus insisted on Plato’s uniform circular motion, he criticized Ptolemy and his disciples for the use of equants, ‘… an inequality about their centres – a relation which nature abhors.’[23] They ‘… have in the process admitted much which seems to contravene the first principles of regularity of motion …’[24] Copernicus searched for ‘… the chief thing, that is, the form of the universe, and the clear symmetry of its parts.’ He proclaimed:

‘… if the motion of the wandering stars are referred to the circular motion of the earth, and calculated according to the revolution of each star, not only do the phenomena agree with the result, but also it links together the arrangement of all the stars and spheres, and their sizes, and the very heaven, so that nothing can be moved in any part of it without upsetting the other parts and the whole universe.’[25]

Revolutionibus (1543) was preceded by Copernicus’ Commentariolus (Short sketch, circa 1512), and by Narratio prima (First story, 1540), written by Copernicus’ only pupil, Georg Joachim Rheticus.[26] The Commentariolus (complete title, possibly due to Tycho Brahe: Nicolai Copernici de hypothesibus motuum caelestium a se constitutis Commentariolus, Copernicus’ sketch of his hypotheses for the heavenly motions) was never printed during the Copernican revolution. It circulated as a manuscript during the sixteenth century, got lost, and was rediscovered and published at the end of the nineteenth century. Much more influential was Rheticus’ Narratio prima, published in 1540, 1541, 1566 (together with the second printing of Copernicus’ Revolutionibus), 1597, and 1621 (the latter two together with Kepler’s Mysterium cosmographicum). It was intended to be followed by a Narratio secunda (Second story), which was never written.[27] Rheticus also wrote a theological defence of Copernicanism, long thought to be lost, but rediscovered and published by Reinder Hooykaas in 1984.

Commentariolus is a short outline of the new theory, without calculations. It announced seven hypotheses or axioms, first about the orbital centres:

1. There is no single centre of the celestial orbits or circles.

2. The centre of the earth is not the centre of the universe, but only of gravity and the lunar sphere.

3. All the spheres revolve around the sun as their mid-point, and therefore the sun is the centre of the universe.

The fourth axiom concerns the stellar parallax and the distance of the fixed stars (2.5). The fifth and sixth axioms proclaim the terrestrial motion:

5. Whatever motion appears in the firmament arises not from any motion of the firmament, but from the earth’s motion. The earth together with its circumjacent elements performs a complete rotation on its fixed poles in a daily motion, while the firmament and the highest heaven abide unchanged.

6. What appears to us as motions of the sun arise not from its motion but from the motion of the earth and our sphere, with which we revolve about the sun like any other planet. The earth has, then, more than one motion.

The final axiom concerns the retrograde motion of the planets (2.3).

Copernicus’ system is truly heterocentric, having two centres: the sun for the planets, and the earth for the moon. Whereas this was initially considered a flaw of the theory, Galileo’s discovery of Jupiter’s moons in 1610 turned it into a reinforcement of the Copernican viewpoint:

‘… now we have not just one planet rotating about another while both run through a great orbit around the sun; our own eyes show us four stars which wander around Jupiter as does the moon around the earth, while all together trace out a grand revolution about the sun in the space of twelve years.’[28]

It showed that the concept of a planetary system was not an arbitrary and improbable alternative to the geocentric systems of Aristotle and Ptolemy. It also showed that celestial bodies can partake in two motions simultaneously. Shortly afterwards, Kepler proposed to distinguish satellites from stars and planets.

In Aristotle’s cosmology, the sphere of the fixed stars was moved by the outermost sphere, the primum mobile, itself driven by the unmoved mover. Copernicus made the earth the primum mobile of apparent celestial motions.[29] In Copernicus’ theory, the earth causes the apparent daily motion of the fixed stars as well as the apparent retrograde motion of the planets. Neither Aristotle nor Copernicus considered the prime mover a physical cause. Rather, as a kinematic cause, it was intended to render motion intelligible. This is the most obvious novelty of Copernicus’ theory. In Copernicus’ two treatises, Commentariolus and Revolutionibus, the daily motion of the earth is not prominent. It is not his starting point, but rather a consequence of the annual motion of the earth. Still, the diurnal motion became the most controversial feature of Copernicanism.

The axioms allowed Copernicus to explain the second inequality, the retrograde motion of the planets (2.4). For the description of the other inequalities, Copernicus had to fall back on Ptolemy’s methods, though he avoided applying equants. Copernicus’ system is, strictly speaking, not heliocentric but heliostatic – the sun stands still, while the earth and the other planets move around an eccenter near the sun.

2.3. Prediction and explanation

Before Copernicus, the task of astronomy was to describe the motions of celestial bodies, in order to predict future positions of the planets, for the sake of the calendar, astrology, or navigation. The most obvious examples of theories with predictive power concern local motion. The law for the motion of a planet connects its position with the time parameter. Such a theory is Ptolemy’s system of deferents, epicycles, eccenters, and equants, developed in his Almagest. This computing machine did not pretend to explain celestial motion. It was only intended to calculate motions, in order to make predictions possible.

Another example is Galileo’s theory of ballistic motion.[30] Assuming that the motion of a cannon ball is a combination of two independent motions, Galileo proved its path in a void to be a parabola. The two independent motions are the horizontal one, which is uniform, and the vertical motion, which is itself composed of a uniform motion and a uniformly accelerated one. Galileo succeeded in finding a functional connection between position and time, and between the horizontal and vertical coordinates, such that it became possible to predict the path of the projectile. Again, this theory was not first of all intended to explain, but to predict ballistic motion. Galileo expressly refrained from giving an explanation of its cause.
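
In modern notation (Galileo himself argued geometrically), with horizontal launch speed v_x, initial vertical speed v_y, and gravitational acceleration g, the composition of the two motions in a void reads

x = v_x t, \qquad y = v_y t - \tfrac{1}{2} g t^2 ,

and eliminating the time parameter t gives

y = \frac{v_y}{v_x}\, x - \frac{g}{2 v_x^2}\, x^2 ,

a quadratic function of x, that is, a parabola connecting the horizontal and vertical coordinates.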

Laws of coincidence or correlations, enabling predictions, are called empirical generalizations, if they cannot be derived from any accepted theory.[31] They are found by generalization of observed coincidences. Nowadays statistical theories can be helpful in investigating coincidences, but such theories did not exist at the beginning of the seventeenth century. Kepler’s laws of planetary motion had the status of empirical generalizations before Newton incorporated these in his theory of gravity (2.6).

Explanation

An explanation is characterized by giving an intrinsic, irreversible, and effective relation. This is sometimes called causality. In an explanation a cause-effect relation is proposed. This logical relation is a projection of the originally physical relation of interaction. Yet it does not necessarily refer to an original physical relation. We shall discuss several non-physical cause-effect relations.

An intrinsic relation is something more than a pure coincidence. We speak of a coincidence if we cannot explain it, or if we are sure that there is no explanation. Many events can only be partly explained. For instance, in a traffic incident, there is usually no intrinsic reason why the colliding cars should be simultaneously at the same place. But if at a certain place many accidents occur, people may start looking for some intrinsic reason.

In the case of Kepler’s third law, the size of the orbit (R) and the period (T) turn out to be related: for satellites of the same central body, R³/T² invariably has the same value. This leads to the supposition of some intrinsic relation between R and T, a causal relation. Kepler could not find it, and therefore his law remained a correlation for over sixty years.
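
A minimal numerical check of this correlation, using modern rounded data rather than Kepler’s own: with R in astronomical units and T in years, R³/T² comes out the same for all six classical planets.

# semi-major axis R (astronomical units) and period T (years)
planets = {'Mercury': (0.387, 0.241), 'Venus': (0.723, 0.615),
           'Earth': (1.000, 1.000), 'Mars': (1.524, 1.881),
           'Jupiter': (5.203, 11.862), 'Saturn': (9.537, 29.457)}
for name, (R, T) in planets.items():
    print(f'{name}: R^3/T^2 = {R**3 / T**2:.3f}')
# Every line prints approximately 1.000: the bare correlation Kepler
# found, which by itself does not say why R and T are so related.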

Also, subsumption under a law or classification does not necessarily constitute an explanation. If we assume that all planets moving around the sun obey Kepler’s third law, and we discover a new planet, we can predict that it will also satisfy this law, but we have not explained why.

The irreversibility of an explanation means that cause and effect cannot be exchanged. In the same context, we cannot exchange the explanans, the starting point of the explanation, with the explanandum, what has to be explained. Making this exchange is a logical fallacy, called petitio principii or begging the question: the explanandum is used as an argument in its own proof. The simplest case is a→a, which is logically unobjectionable, but ineffective; it does not convince anybody of the truth of a.[32]

A common version of this fallacy is labelling or name-giving.[33] In this case one’s reasoning has the form a→b, but on closer inspection it turns out that a is just another name for b. An example is the exchange between Salviati and Simplicio in Galileo’s Dialogue: Salviati asks ‘… what is it that moves earthly things downward’, and Simplicio answers: ‘The cause of this effect is well-known; everybody is aware that it is gravity’, whereupon Salviati replies: ‘… you ought to say … that everyone knows that it is called “gravity”’.[34]

Material implication, symbolized by a→b, is a logical expression of the irreversibility of the cause-effect relation. However, not every ‘if … then …’ statement constitutes an explanation. An example of the latter is the law of inertia: if no unbalanced force acts on a body, then it moves uniformly along a straight path.
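
A standard truth table (a modern device, added here only as an illustration) makes the irreversibility explicit: the implication a→b and its converse b→a differ exactly in the two rows where a and b have different truth values, so the implication cannot be reversed without committing a fallacy.

\begin{array}{cc|cc}
a & b & a\rightarrow b & b\rightarrow a\\
\hline
T & T & T & T\\
T & F & F & T\\
F & T & T & F\\
F & F & T & T
\end{array}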

The most important requirement for an explanation is that it ought to be effective. It should not be able to explain everything. An explanatory theory which can explain both a certain state of affairs and its contrary is useless. An explanation has to distinguish between what has to occur, and what cannot occur. An explanation is imperative, peremptory.

In this respect an explanation differs strongly from a coincidence. Kepler’s third law makes R³/T² a constant for all planets, but without further explanation it could just as well have been R⁵/T², or R³/log T. An explanation (such as was given by Newton) is only effective if it shows that only R³/T² is a constant, excluding any alternative.
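
The effectiveness of Newton’s explanation can be made explicit in modern notation, for the simplified case of a circular orbit. Equating the gravitational force on a planet of mass m to the force required for its uniform circular motion around the sun (mass M) gives

\frac{GMm}{R^2} = m\,\omega^2 R = m\,\frac{4\pi^2}{T^2}\,R \quad\Longrightarrow\quad \frac{R^3}{T^2} = \frac{GM}{4\pi^2} ,

so that precisely R³/T², and no alternative such as R⁵/T², is the same constant for all bodies orbiting the same central mass.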

Karl Popper has stressed the requirement that a theory be effective: natural laws insist on the non-existence of certain things or states of affairs.[35] He criticized Marx’ theory of history, Freud’s psycho-analysis, and Adler’s individual psychology, because these theories (according to Popper) can explain everything that happens in their field of research.[36] It is a bit remarkable that Popper nevertheless had a great admiration for ancient atomic theory, which can also be considered an example of a theory able to explain everything.[37]

Although Popper’s examples can be questioned, his idea is correct. It is the basis of so-called falsificationism. Popper said that we should never attempt to verify a hypothesis, but rather attempt to falsify it: we should try to show that the explanation given is unsatisfactory. A theory must be vulnerable, as vulnerable as possible, hence as precise as possible. The chance that a theory is true should be made as small as possible. If it nevertheless turns out to be impossible to falsify such a theory, then it is not only probably true, but also powerful.[38]

The possibility of falsifying a statement makes it a scientific or empirical one, according to Popper. He called falsifiability a demarcation criterion, because it demarcates scientific, empirical statements from non-empirical ones.[39] It separates empirical science from pseudo-science (like astrology), non-empirical science (logic, mathematics), and metaphysics. However, non-empirical statements are not necessarily false or meaningless. In this respect, Popper differed from the logical empiricists, who distinguished empirical (i.e., for them, observationally verifiable), tautological (i.e., logical or mathematical), and meaningless (e.g., metaphysical) statements.[40]

Explanation of phenomena and of single events

Above we distinguished between existential statements and law statements (1.5). In so far as an explanatory theory is able to deduce both, it can explain laws as well as singular events. Scientists pay most attention to the explanation of phenomena, i.e., repeatable events and reproducible experiments. In the next section, for example, we shall discuss the phenomenon that retrograde motion of a planet always coincides with opposition of this planet, the earth, and the sun. Such a phenomenon has a lawful character. Hence, the explanation of phenomena means the explanation of laws.

The explanation of a single event may proceed by reference to a theory proving that if certain initial conditions are satisfied, the event necessarily occurs. Next it may be required to explain why these initial conditions occurred. It is certainly not sufficient to point out that the event to be explained is just an instance of a phenomenon. For example, if somebody asks why tonight’s retrogradation of Mars coincides with tonight’s opposition of Mars and the sun, he will probably not be satisfied by the mere statement that this is always the case. Actually, this answer could have been given by Ptolemy, because he knew the phenomenon. He was able to predict any single instance of the coincidence, but he could not explain it. Only Copernicus’ theory is able to explain the phenomenon, including every single occurrence of it, if it is clear that the initial conditions are satisfied.

2.4. Retrograde motion

Copernicus took the Platonic doctrine of circular uniform motion seriously,[41] and in his theory it plays a much more important part than in Ptolemy’s. Copernicus used the principle of circular motion to explain each planet’s movement around the sun, its rotation about its axis, and even gravity. Because the earth is a planet, each planet should be attributed terrestrial properties. Hence Copernicus assumed gravity to be directed to the centre of each planet, instead of to the centre of the universe, as Aristotle had taught.[42] The planet’s rotational motion about its own axis is, according to Copernicus, a consequence of its spherical shape.[43] Hence, Copernicus used geometrical arguments to explain physical states of affairs.

In the present section we shall discuss retrograde motion, the second inequality, with respect to the superior planets. For the inferior planets, the argument is essentially the same, but it differs in some details.

Retrograde motion and opposition

In Ptolemy’s geocentric system it is a coincidence that retrograde motion always occurs when the planet is in a specific alignment with the sun. For the superior planets, Mars, Jupiter, and Saturn, this is opposition: the earth stands between the sun and the planet, and the planet culminates at midnight. For the inferior planets, Mercury and Venus, retrograde motion occurs when the planet passes between the earth and the sun, and is hence more difficult to observe.

For the description of this phenomenon, Ptolemy needed a deferent and an epicycle for each planet. By a suitable choice of the relevant parameters (such as the periods of revolution and the relative size of the circles), it is possible to reconstruct the retrograde motion such that it occurs at the right time, i.e., when the planet is in opposition. Simultaneously the planet is closest to the earth, which agrees with the fact that its brightness is a maximum during retrogradation.[44] However, this is not an explanation, and it was never intended to be so. Ptolemy’s system would have worked just as well if the backward motion did not coincide with opposition and maximum brightness. Ptolemy did not use this coincidence to explain the variation in brightness, because such an explanation would have given a completely wrong value for the variation of the apparent magnitude of the moon and of Venus, as long as one does not realize that these are dark bodies reflecting light from the sun. Galileo was the first to apply this insight.

Copernicus, too, needed deferents, epicycles, and eccenters to give an accurate description of celestial motions, but his basic idea differed from Ptolemy’s. He assumed that the planets, including the earth, move around the sun with angular velocities decreasing with increasing distance to the sun. Without any further hypothesis, Copernicus could not only explain the phenomenon of retrograde motion, but also why it coincides with opposition and maximum brightness. ‘All these phenomena proceed from the same cause, which lies in the motion of the earth.’[45]

The apparent retrograde motion occurs when the earth, moving between the sun and the planet, overtakes the planet. At that moment, when the planet is in opposition, it is closest to the earth, and hence its apparent magnitude is a maximum.[46]

The vulnerability of Copernicus’ theory

Copernicus’ theory differs from Ptolemy’s because the coincidence of retrograde motion, opposition, and maximum brightness is proved by deduction from his premises. It is necessary; it could not have been otherwise. If ever a new planet were discovered displaying retrograde motion without simultaneous opposition, Copernicus’ theory would be falsified, which is not the case with Ptolemy’s theory. Copernicus’ theory is more vulnerable, because it explains something that Ptolemy’s does not.[47]

Whereas in Ptolemy’s theory opposition merely coincides with retrograde motion and vice versa, Copernicus’ explanation is irreversible. In his theory it makes sense to state that opposition is the cause of the observed retrograde motion, whereas it does not make sense to state that retrograde motion causes opposition. This causal relation has no physical character, however, but a kinetic one. No interaction between opposition and retrograde motion is involved. The real motion of the earth and the planets causes the apparent backward motion, observed from the earth.

This means that Copernicus had realized one of his Platonic ideals. The deceptive, apparent, observed retrograde motion was explained by the cooperation of real, ideal, uniform circular motions.

Prediction of phenomena

When philosophers speak of prediction, they sometimes mean something different from what we discussed above. It is said that a new theory must achieve the prediction of new phenomena, besides the explanation of well-known ones. For instance, Imre Lakatos states that Copernicus’ theory was able to predict several new phenomena, and had to be preferred for this reason.[48]

On closer inspection, this is questionable. Copernicus was able to explain a number of well-known phenomena, but he was less fortunate with respect to new ones. Like Brahe after him, he failed to predict the phases of Venus and Mercury (2.7). But he did predict the size of the planetary orbits (2.5).

Copernicus did predict the stellar parallax. ‘The apparent retrograde and direct motion of the planets arises not from their motion but from the earth’s. The motion of the earth alone, therefore, suffices to explain so many apparent inequalities in the heavens.’[49] The stellar parallax is the same kind of effect as the retrograde motion of the planets: an apparent motion, caused by the real motion of the earth. Copernicus knew that the stellar parallax was not observable, and as a Popperian-before-the-fact, Tycho Brahe rejected the idea of the moving earth for this reason (2.5). According to Popper, if a theory is falsified, one should not take recourse to ad hoc hypotheses to save it. He calls this a conventionalist stratagem (10.1).[50] Copernicus did just that, by assuming that the fixed stars are so far away that the parallax would be undetectably small. By this auxiliary hypothesis, he explained the non-occurrence of a phenomenon predicted by his theory.

This kind of prediction is more closely related to explanation, the establishment of a causal relation, than to prediction based on established coincidences. Carl Hempel and Rudolf Carnap supposed that explanation has the same structure as prediction.[51] This may be true for the kind of prediction discussed in the present section, but not for predictions merely based on coincidence. If Hempel were right, it would be inexplicable why the Copernicans preferred the Copernican theory to the Ptolemaic one. They did so because Copernicus could explain states of affairs which in the Ptolemaic system were merely coincidences.

2.5. The size of the planetary orbits

The transition from the geocentric world view of Aristotle and Ptolemy to Copernicus’ heliostatic system is sometimes assumed to be merely a shift of the origin of the coordinate system. This would be true if we were only concerned with the motions of the sun, the moon, and the earth. In that case it would not matter whether these motions were described with respect to the sun or to the earth, or even to the moon, for that matter. For this reason, heliocentrism did not influence the reform of the calendar, carried out in 1582.

The most important difference concerns the motion of the superior and inferior planets. In Ptolemy’s system, they move around the earth, and in Copernicus’ system around the sun. From an astronomical point of view this is more important than the question of whether the earth moves or not. Tycho Brahe was the first astronomer to realize this.

Tycho Brahe’s system

Even if they did not accept Copernicus’ heliocentric system, sixteenth-century astronomers recognized Copernicus’ ability as a mathematician. They admired his trepidation theory, explaining an effect (the oscillation of the precession of the equinoxes) that Tycho Brahe proved to be spurious, and they used his calculations to reform the calendar.[52] But apart from Maestlin and Kepler, before 1600 no professional astronomer became a Copernican, not even Tycho Brahe, who because of his excellent observations was the most important astronomer between Copernicus and Kepler.[53] Tycho took Copernicus’ explanation and application of retrograde motion seriously. He recognized Copernicus’ use of it to calculate the planetary distances from the sun as an example of the phenomenon of parallax, which astronomers apply to determine astronomical distances.

If one looks at an object, its position relative to a distant background depends on the position of the observer, as anyone experiences from a moving ship or carriage, the perspective of trees along the river bank or the roadside changing more or less rapidly with respect to the horizon. Astronomers speak of parallax if a difference in position of a celestial body is determined from the viewpoint of two observers, or of one observer at two positions. If one knows their mutual distance (called the basis), the measured parallax yields the distance of the observed object. The accuracy increases with the size of the basis. This method has been known since antiquity, and was used to estimate the distance of the moon, and even of the sun. The problem is the choice of the basis.
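In modern small-angle terms the computation is elementary: the basis subtends the parallax angle at the observed object, so the distance is approximately the basis divided by that angle in radians. A minimal sketch (the helper name and the figures are merely illustrative, not historical data):

    import math

    def distance_from_parallax(basis, parallax_deg):
        # The basis subtends the parallax angle at the object
        # (elementary small-angle geometry).
        p = math.radians(parallax_deg)
        return basis / (2 * math.tan(p / 2))

    # Two stations 6000 km apart see the moon shifted by about
    # 0.9 degrees against the fixed stars:
    print(distance_from_parallax(6000, 0.9))   # roughly 382,000 km

The result is close to the moon’s actual distance, which illustrates why the method stands or falls with an accurately known basis.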

The most obvious choice is to make simultaneous observations from different positions on earth. For this purpose one has to determine accurately the mutual distance, and one has to know the time at both places. However, until the eighteenth century, the distance between cities was only approximately known. Tycho Brahe was a pioneer in the application of triangulation, the method of determining geographical distances.[54] The difference in latitude, the angular distance along a terrestrial meridian, is not difficult to measure, but measurement of the difference in longitude between two places requires accurate clocks, which were not available before the end of the seventeenth century. The second requirement, simultaneity, poses a similar problem. The occurrence of a lunar eclipse, if visible, can be observed anywhere on earth at the same moment. Therefore it is a good occasion to measure parallax, provided the problem of the distance is solved. In the seventeenth century the transit of Venus or Mercury across the sun, observed from different positions on the earth, was recognized as a method to determine the distance from the earth to the sun, the so-called astronomical unit, but reliable values had to wait till the eighteenth and nineteenth centuries.

Daily parallax

The second method is to determine parallax from the same observatory, but at different times. If the observations are taken in one night, one speaks of daily parallax. In between, the earth is displaced because of its rotation (or, alternatively, the firmament has rotated), causing parallax. The result must be corrected for the motion of the celestial body concerned. Applying this method, Tycho Brahe measured the parallax of the moon, the sun, and Mars (during a retrograde motion, when Mars is closest to the earth). The parallax of the sun turned out to be so small that the conclusion should have been that it was not measurable. Unfortunately astronomers were not very critical at the time, so the astronomical unit, the distance between the earth and the sun, remained underestimated until the end of the seventeenth century. Tycho’s measurement of the distance of Mars was likewise unreliable. In his De stella nova (on the new star of 1572) he stated that its lack of observable parallax meant that it was beyond the sphere of Saturn, proving that the firmament, the sphere of the fixed stars, is not unchangeable, as Aristotle had stated.

Tycho observed the motion of a comet, and by determining its parallax he found that its distance surpassed that of the moon. He concluded that comets are not meteorological phenomena occurring in the sphere of fire, the outermost terrestrial sphere below the lunar one. He argued that comets move freely across Aristotle’s transparent spheres. Since then, philosophers tried to replace the crystalline spheres by liquid ones, until Descartes proposed his vortex theory of celestial motion (3.4). Newton did away with any sphere surrounding the sun or the planets by showing that the planets move in their orbits not because they are dragged, but because of gravity.

Annual parallax

Annual parallax is caused by the motion of the earth (the site of all terrestrial observatories) around the sun. Now the basis is the diameter of the earth’s orbit, twice the astronomical unit. Copernicus explained the retrograde motion of the superior planets as an effect of annual parallax (2.3). He determined the distances of the planets by measuring their positions relative to the fixed stars in the course of a year, i.e., their retrograde motion (2.4). This is only possible on the assumption that the planets move around the sun. Taking the basis as twice the astronomical unit, he avoided the problem that the distance to the sun was not accurately known. Tycho showed that it is not necessary to assume that the earth moves, as long as one accepts that the planets move around the sun.

The annual motion of the earth should cause a stellar parallax, an apparent motion of the fixed stars similar to the planets’ retrograde motion. Because Copernicus knew that the stellar parallax was not observable, he introduced an ad hoc hypothesis: the distance to the stars is much larger than the distance to Saturn. ‘The ratio of the earth’s distance from the sun to the height of the firmament is so much smaller than the ratio of the earth’s radius to its distance from the sun that the distance from the earth to the sun is imperceptible in comparison with the height of the firmament.’[55]
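How far the hypothesis pushes the stars away is easily estimated. A back-of-the-envelope sketch, assuming (hypothetically) that the best pre-telescopic observers could have detected an annual parallax of about one arcminute:

    import math

    # An undetected annual parallax below one arcminute (basis: one
    # astronomical unit) puts the fixed stars at least this far away:
    p = math.radians(1 / 60)   # one arcminute in radians
    print(1 / math.tan(p))     # about 3440 astronomical units

That is several hundred times the distance of Saturn, an emptiness that sixteenth-century astronomers found hard to accept.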

Tycho’s alternative system

Tycho easily recognized the advantages of Copernicus’ theory, but he rejected the earth’s motion, for various reasons. The most important were astronomical, in particular the lack of stellar parallax. Tycho rejected Copernicus’ explanation, based as it was on the ad hoc assumption that the fixed stars are very far away. From the apparent brightness of the fixed stars, Tycho calculated that according to Copernicus’ theory each star would have to be larger than the sun, but Galileo proved his reasoning wrong.[56] As a physical argument, Tycho mentioned that the motion of the earth should be detectable in the fall of bodies or in projectile motion. Copernicus discussed this argument, but his defence against it was weak.[57] Galileo used his theory of motion to refute Tycho’s objections.

Tycho also repeated the standard theological argument: the motion of the earth would contradict the Bible which, literally interpreted, teaches that the earth stands still.

With respect to the universe at large, Simon Stevin observed that it made little sense to distinguish between Copernicus’ heliocentric and Tycho’s geocentric system.[58] If the size of the terrestrial orbit is negligible, it can still be held that the earth is the centre of the universe. The relevant distinction is that the sun is the centre of the planetary system (not of the universe); whether the earth moves around it (as the true Copernicans held) or is at rest (as Tycho and the later Tychonians maintained) matters less. Neither Kepler nor Galileo agreed with this view.

During the Copernican revolution, the absence of observable stellar parallax, which made it necessary to enlarge the universe considerably, counted more against than in favour of Copernicus’ theory. Aristotle had already rejected the motion of the earth using the stellar parallax as an argument.[59] Galileo called it the most subtle argument against the Copernican position that can be found.[60] It required postulating an incomprehensibly immense empty space between the spheres of Saturn and of the fixed stars. All these arguments favoured Tycho Brahe’s system, accepted by many seventeenth-century instrumentalists. Only in 1838, nearly three centuries after Copernicus’ death, did Friedrich Wilhelm Bessel measure the stellar parallax of the nearby star 61 Cygni, applying much better instruments than those available in the sixteenth and seventeenth centuries. He found a parallax of about a third of a second of arc, placing the star at more than 600,000 times the distance from the earth to the sun, and confirming Copernicus’ conjecture.

In several respects the Copernican system is significantly more complicated than Tycho’s. By assuming that the earth moves, all results of observation, necessarily obtained from the earth, become more difficult to interpret. Kepler had to spend a great deal of his time determining the motion of Mars as seen from the sun (2.7).[61] The description of retrograde motion is simpler from Tycho’s viewpoint than from the Copernican one.[62] And the assumption that the earth moves with a double motion causes many conceptual difficulties for anybody not accustomed to this counter-intuitive idea.[63]

The power of a theory

Both theories, Copernicus’ and Tycho’s, made it possible to use the observed size of the retrograde motion (the angular distance between the stationary points, where the direction of the apparent motion is reversed) to determine the distances of the planets to the sun.[64] This possibility (not mentioned by Rheticus) is absent in Ptolemy’s theory.

Ptolemy was aware of his inability to determine the size of the planetary orbits. Of the three superior planets, he assumed Mars to be closest to the earth, followed by Jupiter and Saturn. The argument for this choice is the observed periods of revolution of the deferents: for the sun 1 year, for Mars 1.9 years, for Jupiter 12 years, and for Saturn 29 years. In ancient astronomy a planet was assumed to be closer to the fixed stars the less its motion differs from the daily motion of those stars, hence the longer the period of its deferent, the larger its orbit. This argument is useless with respect to the inferior planets, because the periods of their deferents are the same as that of the sun. Because Mercury needs 88 days for its epicyclic motion and Venus 224 days, Ptolemy assumed rather arbitrarily that Mercury’s position is between Venus and the moon, whose period of revolution is 27 days. Copernicus stressed the arbitrariness of Ptolemy’s and others’ ordering of the planetary orbits.[65]

This argument provided Ptolemy only with the order of the planets, not with the size of their orbits. Had Copernicus followed the same reasoning, he would have placed Mercury closest to the sun, followed by Venus, the earth, Mars, Jupiter, and Saturn. Actually, he found precisely this order, but on the basis of a much stronger argument, which supplied him not only with the order, but also with the relative size of the orbits. In this respect, the superiority of Copernican astronomy over the Ptolemaic one is obvious.[66]

Copernicus’ argument is that the apparent retrograde motion is a projection of the actual motion of the earth around the sun. (In Tycho’s system, it is a projection of the motion of the sun around the earth.) This means that the apparent size of the backward motion is inversely proportional to the distance from the planet to the sun. It allowed him to calculate for each planet the distance to the sun in proportion to the distance between the earth and the sun. His results were surprisingly accurate: for Venus, Mars and Jupiter within 0.5% of modern values, for Mercury and Saturn within 3-4%.[67]
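The underlying geometry can be reconstructed schematically, assuming circular, coplanar orbits. For an inferior planet, the line of sight at maximum elongation is tangent to its orbit; for a superior planet, the angle the earth gains on the planet between opposition and the next quadrature fixes the ratio of the orbital radii. A sketch with rounded modern figures (not Copernicus’ own data; the 106-day interval is illustrative):

    import math

    # Inferior planet: at maximum elongation e, the orbital radius is
    # sin(e) in astronomical units.
    for planet, e in [("Mercury", 23), ("Venus", 46)]:
        print(planet, round(math.sin(math.radians(e)), 2))   # 0.39, 0.72

    # Superior planet: between opposition and quadrature the earth gains
    # the angle (w_earth - w_planet) * t on the planet; at quadrature the
    # cosine of that angle equals r_earth / r_planet.
    t = 106                                  # days from opposition to quadrature for Mars
    gain = (360 / 365.25 - 360 / 687) * t    # degrees gained by the earth
    print("Mars", round(1 / math.cos(math.radians(gain)), 2))   # about 1.52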

From the relative distances of the planets to the sun, the size of the apparent retrograde path is explained.[68] This is an example of an explanation based on a spatial rather than a physical cause-effect relation, and it is irreversible. From the observed distance between the stationary points, Copernicus could calculate the size of the planetary orbits, but not explain it. Conversely, he explained the size of the observed retrograde motion from something he did not know beforehand, namely the relative size of the planetary orbits.

Copernicus also calculated the relative orbital speeds of the planets. Ptolemy had tacitly assumed this speed to be the same for all planets, but Copernicus’ theory shows it to decrease with increasing distance from the sun. This was beautifully confirmed by Kepler’s third, harmonic law.

At the same time, the theory became more vulnerable in this way. Suppose someone had the means to determine the distances from the planets to the sun (or their orbital speeds) independently. This could have consequences for Copernicus’ theory, whereas Ptolemy’s theory would be insensitive to such independent measurements.

2.6. Kepler on explanation and prediction

After Tycho Brahe, Johannes Kepler became the most important astronomer of Europe.[69] Even more than Tycho, he was deeply impressed by the power of Copernicus’ theory to establish the dimensions of the planetary system. Already as a student at Tübingen, influenced by his teacher Michael Maestlin, he became convinced of Copernicus’ views. In 1594, at the age of 23, Kepler was appointed professor of mathematics at Graz. There he started an ambitious program of combining astronomical prediction and physical explanation in a realistic way. He developed a theory about the architecture of the planetary system, which he published in his first book, Mysterium cosmographicum (World mystery, 1597).[70] The book carries the date of 1596, but was published in 1597 and again in 1621, with many additional notes. Kepler tried to explain two Copernican results: the number of planets, and the size of the planetary orbits.[71] Rheticus had exclaimed: ‘Who could have chosen a more suitable and more appropriate number than six?’[72], but Kepler wanted to find a better explanation.[73]

Kepler started his investigation by constructing an equilateral triangle’s inscribed and circumscribed circles, determining the proportion of their radii (which is exactly 1:2), and observing that it was close to the proportion of the orbital dimensions of Jupiter and Saturn.[74] Next he constructed within the smaller circle a square with its inscribed circle, then a pentagon, and so on. This did not lead to anything, because the proportions of the circles did not agree with those of the planetary orbits, and because it yielded an infinity of orbits.

Kepler had more success when he applied the same prescription to three-dimensional polyhedrons. Shortly before Plato, Greek mathematicians had discovered that exactly five regular polyhedrons exist, with, respectively, four, eight, or twenty triangular faces, six square faces, or twelve pentagonal faces (3.1).[75] Kepler constructed these five bodies such that the circumscribed sphere of one was the inscribed sphere of the next. This yielded six spheres, one for each planet, solving Kepler’s first problem: why the number of planets is six.[76]

Next, he calculated the relative sizes of the radii of these spheres. This yielded five independent ratios, three of which agreed fairly well with the values of the planetary orbits calculated by Copernicus. The ratio relating the smallest two spheres was adjusted by changing the model, and the disagreeing ratio of the largest two spheres Kepler waved aside.[77]
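The arithmetic is easy to check. Each regular polyhedron has a fixed ratio of inscribed to circumscribed sphere radius (dual solids share the same ratio), which in Kepler’s nesting predicts the ratio of successive orbital radii. A quick sketch, comparing with modern mean distances rather than with Copernicus’ values:

    import math

    # Inradius/circumradius of the five regular polyhedrons:
    ratio = {"cube": 1 / math.sqrt(3), "tetrahedron": 1 / 3,
             "dodecahedron": 0.7947, "icosahedron": 0.7947,
             "octahedron": 1 / math.sqrt(3)}

    # Kepler's nesting from Saturn inward, with mean orbital radii in AU:
    nesting = [("Saturn", 9.54, "cube"), ("Jupiter", 5.20, "tetrahedron"),
               ("Mars", 1.52, "dodecahedron"), ("Earth", 1.00, "icosahedron"),
               ("Venus", 0.72, "octahedron"), ("Mercury", 0.39, None)]
    for (outer, r1, solid), (inner, r2, _) in zip(nesting, nesting[1:]):
        print(f"{inner}/{outer} ({solid}): "
              f"predicted {ratio[solid]:.2f}, observed {r2 / r1:.2f}")

Run this way, some pairs agree fairly well while others are clearly off, which tallies with the adjustments described above.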

Kepler’s spatial explanation was never taken seriously except by himself. However, the publication of his book drew the attention of Tycho Brahe, who was in search of an able assistant to work on the observations made during the past decades. Tycho recognized Kepler’s mathematical genius, and invited him to Prague, where in 1600 the lifework of Kepler began, the attack on the motion of Mars, culminating in the book rightly called Astronomia nova.

The motion of Mars

From 1600 to 1612 Kepler worked at Prague, then the capital of the Holy Roman Empire, first as Tycho Brahe’s assistant, then as his successor as imperial astronomer at the court of Rudolph II.

For a period of about twenty years, Tycho had observed the positions of Mars and other celestial bodies with a much better accuracy than had ever been achieved before. Kepler was charged with the description of the motion of the planet Mars with the help of Tycho’s system, but he also applied Ptolemy’s and Copernicus’ theories.[78] In the process, he found the first two laws of planetary motion, demonstrating that the planets move at varying speed in elliptical orbits. As a consequence, Kepler broke away from uniform circular motion, one of the fundamental ideas of Platonism, defended by Copernicus, Galileo, and Descartes. Yet Kepler still considered the varying speed of the planets a deviation from uniform circular motion, which he continued to regard as natural.

Kepler called himself a Copernican, because he adhered to the idea of the moving earth. In one respect, Kepler was even more Copernican than Copernicus. To call Copernicus’ theory heliocentric is not entirely correct. The centre of Copernicus’ system is slightly displaced from the sun’s position (2.2). The earth and the other planets did not turn around the true sun, but about this centre, the mean sun. This is especially relevant for the motion of Mars, because its orbital plane makes an inclination of nearly two degrees with the earth’s orbital plane, and in Copernicus’ model this inclination appeared to vary. According to Copernicus, the line of intersection of these two planes passed through the mean sun. Kepler corrected this. Right from the start of his study of the motion of Mars, he placed the true sun in the centre, so that the line of intersection of the two orbital planes went through the true sun. This led to a simplification of Copernicus’ calculations, because now the inclination of Mars’ orbit turned out to be constant.[79] It confirmed that Mars moves around the sun. As a consequence, Kepler could treat the real sun in the centre both as his point of reference and as the physical cause of variations in the planetary motions.

In order to refer the observations to the sun, Kepler had to make a careful investigation of the motion of the earth, from which all observations are made. Nobody before him had realized that. Kepler used an extremely ingenious and original procedure, consisting of mentally transporting himself to the planet Mars, and observing the motion of the earth from that viewpoint. In order to eliminate the motion of Mars itself, Kepler selected about ten observations from those made by Tycho during circa twenty years, taking his data 687 days apart. This is the period of Mars in its orbit, so that he had at his disposal ten observations of the terrestrial motion from the same Martian point of view.[80] He confirmed that the earth’s motion is no more uniform than that of Mars. The speed at the two extreme points (perihelion and aphelion) appeared to be inversely proportional to the distance to the sun. He extrapolated this to every point of the orbit, finding his second law, the area law: in equal time intervals the line from the sun to a planet sweeps out equal areas (1602).[81] This is now considered an instance of the law of conservation of angular momentum, which was not formulated before the eighteenth century.[82]
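In modern notation the connection is immediate: with r the distance from the sun and θ the heliocentric angle, the area swept out per unit time is dA/dt = ½r²·(dθ/dt) = L/2m, which is constant because the angular momentum L = mr²·(dθ/dt) of a planet is conserved under a central force.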

It turned out to be much more difficult to find the first law. From 1600 to 1606, Kepler tried to describe the motion of Mars, applying theories of Ptolemy, Copernicus, and Brahe. Ultimately he had to admit his failure. Uniform circular motion being the common foundation of the three theories, Kepler had to conclude that planetary motion is neither circular nor uniform. The orbit of a planet is an ellipse, with the sun in one of its focal points (1605).[83] Although Kepler proved this law only for Mars, he did not hesitate to apply it to the other planets as well.

These two laws were published in Kepler’s Astronomia nova (1609). Ten years later he published the third or harmonic law: the cube of the relative distance (R) from a planet to the sun is proportional to the square of its period (T).[84] Later the same law turned out to be valid for the moons of Jupiter, discovered by Galileo (1610), and for those of Saturn, discovered by Huygens and Cassini (about 1660). Only the value of R³/T² turned out to be different in the three cases. It therefore depends on a property of the central body, which Newton in 1687 proved to be the central body’s mass (9.2).
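In Newton’s later formulation this property appears explicitly: in modern notation R³/T² = GM/4π², where M is the mass of the central body and G the gravitational constant, so each satellite system has its own constant.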

From Mysterium cosmographicum via Astronomia nova to Harmonice mundi

The full title, ‘New astronomy, based on causes, or celestial physics, expounded in commentaries on the motion of the planet Mars’, expresses the Copernican program to end the division of labour between explanatory physics and predictive astronomy. Copernicus, Kepler, Galileo, and Newton not only wanted to design predictive theories, but first of all they wished to have realistic theories with explanatory power. Kepler tried in vain to explain the motion of Mars by a magnetic force exerted by the sun. His laws remained empirical generalizations until Newton incorporated them in his theory of gravity. In contrast, the predictive power of Kepler’s laws became immediately clear to professional astronomers after Kepler’s publication of Tabulae Rudolphinae (Rudolphine tables, 1627), based on Tycho’s observations.

Kepler’s laws, though not constituting a coherent explanatory theory, have predictive power. With the help of the first two laws the positions of the planets can be predicted much more accurately than was possible with either Ptolemy’s or Copernicus’ system, and the third law, too, can be used to make predictions. For instance, if a new satellite is discovered near a planet already having at least one other, its period can be predicted from its relative distance, and vice versa.
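A minimal sketch of such a prediction, using rounded modern values for two of Jupiter’s moons (the helper name is hypothetical):

    # Kepler's third law: T² is proportional to R³ around the same central body.
    def predict_period(T1, R1, R2):
        return T1 * (R2 / R1) ** 1.5

    # Scale from Io (1.769 days, 421,700 km) to Ganymede (1,070,400 km):
    print(predict_period(1.769, 421700, 1070400))   # about 7.15 days, as observed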

Kepler’s first law is typically Copernican insofar as it constitutes a heliostatic system. The second law contradicts Copernicus’ Platonic idea of uniform circular motion even more than the first law does. Even Copernicus had contemplated the possibility that a planet’s orbit is elliptical. (This can only be found in the manuscript of De revolutionibus, where it is crossed out.)[85] But the third law is Kepler’s most Copernican achievement. Although it does not follow from Copernicus’ theory, it can only be found in a system in which the planets are supposed to move around the sun. Only then can the relative distances from the planets to the sun be determined (2.5).

In Mysterium cosmographicum (1597) Kepler thought he had revealed the secret harmony of the universe. Even after the discovery of the laws which made him famous, Kepler remained faithful to his juvenile work, and he felt greatly relieved to learn that the four new stars discovered by Galileo in 1610 were satellites of Jupiter, not of the sun.

Although Kepler knew that the original model did not tally with the observed facts, he remained faithful to the idea throughout his life. When he discovered his planetary laws, he changed his spheres into shells, thick enough to accommodate the elliptical orbits. He related the sizes of the orbits and their varying speeds to musical harmonies in Harmonice mundi (World harmony, 1619). This remarkable hodgepodge of good mathematics, sound physics, and metaphysical speculation consists of five books. The first two contain an extensive geometrical study of many kinds of polyhedrons. The third book deals with musical harmony, the fourth with metaphysics, psychology, and astrology.

The fifth book is mainly astronomical, but the elliptic law is not mentioned, the area law only once, and then not even correctly.[86] However, its most important discovery is Kepler’s third or harmonic law, relating the periods of the planets’ motion to the size of their orbits. In a poetic mood, Kepler wrote:

‘The thing which dawned on me twenty-five years ago before I had yet discovered the five regular bodies between the heavenly orbits …; which sixteen years ago I proclaimed as the ultimate aim of all research; which caused me to devote the best years of my life to astronomical studies, to join Tycho Brahe and to choose Prague as my residence – that I have, with the aid of God, who set my enthusiasm on fire and stirred in me an irrepressible desire, who kept my life and intelligence alert, and also provided me with the remaining necessities through the generosity of two Emperors and the Estates of my land, Upper Austria – that I have now, after discharging my astronomical duties ad satietatum, at last brought to light … Having perceived the first glimmer of dawn eighteen month ago, but only a few days ago the plain sun of a most wonderful vision – nothing shall now hold me back. Yes, I give myself up to holy raving: I have robbed the golden vessels of the Egyptians to make out of them a tabernacle for my God, far from the frontiers of Egypt. If you forgive me, I shall rejoice. If you are angry, I shall bear it. Behold, I have cast the dice, and I am writing a book either for my contemporaries, or for posterity. It is all the same to me. It may wait a hundred years for a reader, since God has also waited six thousand years for a witness …’[87]

If this reader was to be Newton, Kepler’s discovery of the harmonic law had to wait nearly seventy years. Descartes knew that the planetary orbits are not circular,[88] but like Huygens he rejected Kepler’s elliptical orbits. In 1666 Giovanni Borelli was (after Jeremiah Horrocks) the first astronomer to accept Kepler’s laws,[89] and only around 1670 did Hooke, Halley, and Wren take Kepler’s work seriously.

Like Kepler, Christiaan Huygens was convinced that the solar system could contain only six planets. After his discovery of Saturn’s satellite Titan, he thought the number of moons complete as well: Titan was the sixth, after the earth’s moon and the four moons of Jupiter discovered by Galileo. In his Cosmotheoros (1698) Huygens still speculated about a world harmony.[90]

Even before Kepler, Copernicus stressed the harmony of his system in the famous neo-Platonic confession:

‘In the middle of all is the seat of the Sun. For who in this most beautiful of temples would put this lamp in any other or better place than the one from which it can illuminate everything at the same time? Aptly indeed is he named by some the lantern of the universe, by others the mind, by others the ruler. Trismegistus called him the visible God, Sophocles’ Electra the watcher over all things. Thus indeed the Sun as if seated on a royal throne governs his household of Stars as they circle around him … We find, then, in this arrangement the marvellous symmetry of the universe, and a sure linking together in harmony of the motion and size of the spheres, such as could be perceived in no other way.’[91]

In the same spirit, Copernicus’ disciple Rheticus wrote:

‘But if anyone desires to look either to the principal end of astronomy and the order and harmony of the system of the spheres or to ease and elegance and a complete explanation of the causes of the phenomena, by the assumption of no other hypothesis will he demonstrate the apparent motions of the remaining planets more neatly and correctly. For all these phenomena appear to be linked most nobly together as by a golden chain; and each of the planets, by its position and order and every inequality of its motion bears witness that the earth moves …’[92]

However, in fairness we should compare the obvious beauty of the sun-centred universe with the no less obvious harmony of Aristotle’s geocentric cosmos. There is really no reason to reproach the Aristotelians for not being impressed by Copernicus’, Rheticus’, or Kepler’s lyricism. Copernicus replaced Aristotle’s harmony of homocentric spheres by a system that was truly heterocentric, having two centres: the earth, as the centre of the lunar orbit, and the sun. It was more so than Ptolemy’s system, which introduced heterocentricity only for the sake of calculations.

The advantage of Copernicus’ system was not its harmony, but its ability to explain retrograde motion, and to calculate the size of the planetary orbits. Nevertheless, aesthetically appealing theories cannot fail to be very convincing. ‘The true and the beautiful are one’, Galileo said,[93] and many scientists share this opinion, even though harmony can never be the only argument for accepting a theory.

The fundamental distinction between prediction and explanation

This concludes our discussion of the distinction between the predictive and explanatory functions of a theory. Prediction is based on a coincidence of pairs of properties of some kind of events, and is symmetrical in this respect. Explanation is based on a logical cause-effect relation, which is by its nature asymmetric, and should be effective.

A predictive theory like Ptolemy’s need not be explanatory, and an explanatory theory like Kepler’s model based on the five regular polyhedrons need not be predictive. Prediction and explanation are irreducible functions of a theory, because their logical structures are based on two mutually irreducible aspects of human experience, motion and interaction respectively. These we shall discuss in the next chapter.

Not everybody accepts that theories should do more than predict, that theories should have explanatory power. The view that the only function of a theory is to make predictions is called instrumentalism. Before Copernicus, astronomical theories were interpreted in an instrumental sense. Instrumentalism was attacked by Copernicus, and defended by theologians like Andreas Osiander and Robert Bellarmine (8.2-8.3). It was held by Mach and the logical empiricists, and criticized by realists.[94]

Theories and arguments differ in a logical sense, because theories have a kinetic foundation in deduction, whereas arguments have a physical foundation in their logical force and weight (1.5). In the present chapter I stated that prediction is a kinetic-logical function of theories, explanation a physical-logical one. This means that theories are more neutral with respect to predictions than with respect to explanations. Instrumentalists in particular stress that a theory can be used to make predictions even if its basic axioms are false. In this way, the Ptolemaic system was accepted by the Aristotelians. But this view can hardly be maintained with respect to explanation. Everybody feels that a theoretical explanation cannot be effective if the basic axioms of the theory concerned are taken to be false.

The words kinetic and physical are used here in an analogical sense. Deduction is a kind of logical motion, and arguments are concerned with logical force. In the next chapter we shall meet the original, non-logical meaning of motion and interaction.


[1] Grant 2001, 94-95.

[2] Plato, Timaeus, 1164-1165; see Dreyer 1906, chapter 3; Heath 1913, chapter 15.

[3] Plato, Republic VII, 762.

[4] Dreyer 1906, chapter 4-5; Heath 1913, chapter 16.

[5] Duhem 1908, 5; Moss 1993, 42.

[6] Kuhn, 1957, 48.

[7] Aristotle, Metaphysics, XII, 8.

[8] Dreyer 1906, chapter 4-5; Heath 1913, chapter 16.

[9] Neugebauer 1957, 1975; Dreyer 1906, chapter 9.

[10] Newton 1977; Swerdlow 2010.

[11] It was sometimes attempted to make room for the epicycles by constructing homocentric shells with an appropriate thickness.

[12] Koestler 1959, 209; Rosen 1984, chapter 3.

[13] Dijksterhuis 1950, 230-237 (II: 141-148); Duhem 1908, chapters 2-4.

[14] Plato, Timaeus, 1161 ff.

[15] Copernicus 1543, 25 (Preface).

[16] Dijksterhuis 1950, 237-241, 254-256 (II: 149-151, II: 12-13); Hooykaas 1971, 75-79; Kuhn 1957, 114-122; Toulmin, Goodfield 1961, 165-169; North 1975; Grant 2001, 200-201. 

[17] For both quotes in this paragraph, see Hooykaas 1971, 77-79. Oresme refers to Psalm 93:1.

[18] Dijksterhuis 1950, 185, 186; Giedymin 1976.

[19] Copernicus 1543, 27 (Preface).

[20] Koyré 1939, 3, 36, 201-202.

[21] Devreese, vanden Berghe 2003, chapter 6.

[22] Netz, Noel 2007.

[23] Rheticus 1540, 166, 135, 137; Copernicus 1512, 59; Galileo 1632, 341-342.

[24] Copernicus 1543, 25 (Preface).

[25] Copernicus 1543, 25 (Preface).

[26] Rheticus 1540; Rosen 1939; Kattenberg 1999.

[27] Rosen 1939, 162, 186.

[28] Galileo 1610, 57; 1632, 334, 339-340.

[29] Galileo 1632, 122.

[30] Galileo 1638, fourth day.

[31] Bunge 1967b, I, 354-355.

[32] Finocchiaro 1980, 275.

[33] Bunge 1967b, II, 9-10.

[34] Galileo 1632, 234.

[35] Popper 1959, 69.

[36] Popper 1963, 33-41; 1972, 23-29. For a critique, see Grünbaum 1976.

[37] Popper 1963, 81-83; Hooykaas 1971, 23, 223.

[38] Popper 1959, 34-42, 311-314; 1963, chapter 11.

[39] Popper 1963, chapter 11; 1983, 159-193.

[40] e.g., Hempel 1965, 3.

[41] Copernicus 1543, 38-40 (I, 4).

[42] Copernicus 1543, 46 (I, 9).

[43] Copernicus 1543, 37-38 (I, 3).

[44] Feyerabend 1975, 109-111.

[45] Copernicus 1543, 51 (I, 10); 238-242 (V, 3); 291-294 (V, 35); Kepler 1597, 30, 36; Galileo 1632, 342-345; Koyré 1961, 129; Glymour 1980, 178-203.

[46] Galileo 1632, 322.

[47] Copernicus 1543, 26 (Preface).

[48] Lakatos 1978, I, 189.

[49] Copernicus 1512, axiom 7.

[50] Popper 1959, 82-84.

[51] Hempel 1965, 249, 366-376; Carnap 1966, 16. For a critique, see Hanson 1973, 161. Hanson’s unfinished book was intended to demonstrate the distinction between explanation and prediction, in particular with respect to planetary motion, see Finocchiaro 1973, chapter 2; Radnitzky 1979; Toulmin 1961.

[52] Copernicus 1543, 51, 143ff; Dreyer 1906, 345-360; Hall 1963, 18-20.

[53] On Tycho Brahe, see Dreyer 1890; Koestler 1959, 286-316; Thoren 1990; Christianson 2000; Mosley 2007.

[54] Christianson 2000, 133, 136-137.

[55] Axiom 4 in Copernicus 1512, 58.

[56] Galileo 1632, 358-361.

[57] Koyré 1939, 141-143.

[58] Dijksterhuis 1950, 327 (IV: 12).

[59] Aristotle, On the heavens, II, 14.

[60] Galileo 1632, 138, 366-389.

[61] Koyré 1961, 43, 45.

[62] Galileo 1632, 342-344.

[63] Bunge 1967b, II, 347-349.

[64] Dijksterhuis 1950, 322-323.

[65] Copernicus 1543, 47-48 (I, 10).

[66] Koyré 1961, 51. Blumenberg 1975, 279 overlooks this.

[67] Copernicus 1543, 254, 260, 268, 276 (V, 9, 14, 21, 27); Koyré 1961, 108.

[68] Copernicus 1543, 51 (I, 10).

[69] Dijksterhuis 1950, 335-357 (IV: 25-59); Koestler 1959; Koyré 1961, 117-464; Beer and Beer (eds.) 1975; Banville 1981; Jardine 1984.

[70] Kepler 1597.

[71] Burtt 1924, 64.

[72] Rheticus 1540, 147.

[73] Kepler 1597, 21, 26-27 (Preface).

[74] Kepler 1597, 23-24 (Preface).

[75] A proof is given in Euclid’s Elements.

[76] Kepler 1597, 50 (chapter 2).

[77] Kepler 1597, 89, 92 (chapters 13-14). Gaukroger 2006, 178 exaggerates somewhat by stating that ‘… the model fitted observational data extremely well’, see Koyré 1961, 147.

[78] Voelkel 2001, part 2.

[79] Kepler 1609, 21 (Introduction).

[80] Kepler 1609, 186 (chapter 24). See M. Caspar’s introduction to Kepler 1609, p. 43*, and Koyré 1961, 181.

[81] Kepler 1609, 24 (Introduction), 247 (chapter 40); Koyré 1961, 234.

[82] Galileo too was aware of this law: Galileo 1632, 398-399, 410-411.

[83] Kepler 1609, 34 (Introduction), 267 (chapter 44), 345 (chapter 58); Koyré 1961, 225, 244, 264.

[84] Kepler 1619, 291 (V, 3).

[85] Koestler 1959, 204.

[86] Kepler 1619, 289 (V, 3).

[87] Kepler 1619, 279-280 (Preface to book V), translation quoted from Koestler 1959, 399. See Koyré 1961, 343, 457.

[88] Descartes 1647, 117.

[89] Koyré 1961, 465-527.

[90] Cohen 1980, 20-21.

[91] Copernicus 1543, 50 (I, 10).

[92] Rheticus 1540, 164-165.

[93] Galileo 1632, 133.

[94] Popper 1963, chapter 3; 1983, 111-149; Bunge 1967a; Feyerabend 1964; Giedymin 1976.


Chapter 3

Four irreducible principles of explanation

3.1. Number and space in the harmony of the spheres

This chapter discusses some fundamental and irreducible principles of explanation operative in the physical sciences. Before the Copernican era, quantity and space in particular were crucial principles of explanation, both in ancient and in medieval philosophy (3.1-3.2). The Copernican revolution transformed these into quantitative and spatial relations, and developed two new relational principles: pure kinetic motion and physical interaction.[1]

Together with matter and action by contact, local motion became the focus of Galileo’s and Descartes’ mechanical philosophy, propagating the explanation of motion by motion (3.3-3.4). Kepler and Newton explained change of motion by a force, introducing experimental philosophy as a correction to mechanism (3.5-3.6). Newton discussed the problem of absolute or relative time, space, and motion (3.7). In order to make clear the novelty of these principles, some earlier attempts at explanation will be reviewed first.

Pythagoras and the rational numbers

In the sixth century BC, Pythagoras and his school tried to reduce cosmic relations to pure numbers and their ratios. Rational means both reasonable and proportional. As proportions of natural numbers, rational numbers are intelligible, and with their help the universe may be understood.

For instance, the Pythagoreans discovered that musical tones sounding harmonious together correspond to lengths standing to each other as small integral numbers. These tones were not related to frequencies or wavelengths, as they are now, but to the linear dimensions of musical instruments. Two strings of unequal length produce tones differing by an octave if their lengths are as 1:2, by a fifth if the proportion is 2:3, and by a fourth if the ratio is 3:4. This inspired Copernicans like Stevin, Galileo, Kepler, Mersenne, and Huygens to study harmonics (2.6).[2] Galileo’s father, Vincenzio Galilei, came into conflict with Aristotelian philosophers about the theory of musical consonants, invented by the Pythagoreans. He argued that this theory was no longer in accord with new musical practices. Both Simon Stevin and Christiaan Huygens tried to found a new theory, leading up to Huygens’ division of the octave into 31 tones. Marin Mersenne discovered that any tone produced by a musical instrument is accompanied by a number of harmonics.
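How close Huygens’ 31-tone division comes to the Pythagorean consonances can be checked directly. A small sketch in modern frequency terms, using cents (1200 to the octave) as a logarithmic measure of intervals; historically the reasoning ran via string lengths and temperament:

    import math

    def cents(ratio):
        return 1200 * math.log2(ratio)

    for name, ratio in [("octave", 2), ("fifth", 3/2), ("fourth", 4/3)]:
        print(name, round(cents(ratio), 1))   # 1200.0, 702.0, 498.0

    # Eighteen of Huygens' 31 equal steps approximate the pure fifth:
    print(round(cents(2 ** (18 / 31)), 1))    # 696.8 cents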

The Pythagoreans attached much significance to numbers like 10, being the sum of 1, 2, 3, and 4. These four numbers were supposed to determine a point, a line, a plane, and a body, respectively. Accordingly, some Pythagoreans assumed ten celestial bodies or celestial spheres. A significant result of Greek astronomy, ascribed to Pythagoras, was the identification of the morning star and the evening star as the same planet, Venus (though the terms are also applied to other bright celestial bodies). In order to restore the number of ten planets, some Pythagoreans postulated a counter-earth, moving on the other side of the central fire, and therefore not observable. The central fire is never seen because the earth is only inhabitable at the side turned away from the fire. Initially the Pythagorean universe was geocentric. Sometimes the sphere of the fixed stars was included, sometimes the central fire, sometimes both. In the latter case, the counter-earth was superfluous.[3]

Perhaps influenced by Babylonian mythology, the emphasis shifted from the number 10 to the no less holy number 7. The motions of the now seven planets stand to each other as the tones of the octave, displaying the harmony of the spheres. In his Harmonice mundi, Kepler employed a musical notation to describe the elliptical motion of the planets, even though he recognized only six planets.[4]

The turn from arithmetic to geometry

Although influenced by the Pythagorean tradition, Plato made an important shift. Whereas the Pythagoreans stressed numerical proportions, Plato turned to spatial relations. He based a theory of matter on the Pythagorean discovery of the existence of exactly five regular polyhedrons (2.6). The four elements, earth, water, air, and fire, introduced by Empedocles, corresponded respectively with the cube, the icosahedron, the octahedron, and the tetrahedron. The dodecahedron with its twelve pentagonal faces characterizes the fifth element, quintessence or ether, from which the heavenly bodies are made. The argument for this correspondence is quite vague.[5]

The shift from the numerical to the spatial principle of explanation was caused by a crisis in the Pythagorean brotherhood, which led to its disbandment.[6] This crisis occurred shortly after the discovery of the famous Pythagorean theorem, which implies that in a square with side 1, the square of the diagonal equals 2. The Pythagoreans proved that the length of this diagonal cannot be expressed as a rational number. The length of the diagonal is not rational; it is irrational, unreasonable, unintelligible.
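The argument, essentially the classical even-odd proof, is short. Suppose the diagonal’s length √2 were a proportion p/q of integers without a common factor. Then p² = 2q², so p² is even, hence p is even, say p = 2k. But then q² = 2k², so q is even as well, contradicting the assumption that p and q have no common factor. Hence no rational number measures the diagonal.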

The starting point of the Pythagorean school, to explain everything with the help of rational proportions, was shipwrecked on a simple spatial problem. The Pythagoreans could not overcome the irreducibility of spatial relations to purely numerical ones. As a result, Alexandrian mathematicians inspired by Plato, such as Euclid, Ptolemy, and Archimedes, devoted themselves to the development of geometry as a principle of explanation besides the numerical one. Apollonius of Perga studied the sections of a double cone by a plane, discovering and naming the ellipse, the parabola, and the hyperbola. For the Copernicans, the Pythagorean-Platonic tradition became a source of inspiration to build a mathematical physics. First, however, they had to transform numerology into calculation, numbers into quantities.

3.2. Explanation of change in Aristotelian cosmology

In ancient and medieval philosophy motion was never a principle of explanation. It was always considered a result of an explanation. Like any kind of change, local motion had to be explained, and could not be used as explanans. In Copernicanism, motion became a new principle of explanation.

In order to understand the revolutionary character of this move, we first have to review Aristotle’s theory of change, to which it was opposed. The Aristotelian scheme of explanation was one of the answers given to a problem first formulated by Parmenides of Elea, circa 500 BC.[7] Parmenides identified being with intelligibility. On logical grounds, he proved change to be unintelligible, and thus non-existent.

‘What is cannot have come into being. If it did, it came either from what is or what is not. But it did not come from what is, since if it is existent it did not come to be but already is; nor from what is not, for the non-existent cannot generate anything.’[8]

By means of some well-known paradoxes, Parmenides’ disciple Zeno of Elea tried to prove that motion is an illusion.[9] He argued that Achilles, a legendary athlete, would never be able to overtake a tortoise: whenever he had covered the distance that at first separated him from the animal, it would have moved on. Concerning these paradoxes, several views are conceivable.
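The quantitative side of the paradox is easily handled in modern mathematics: if Achilles runs ten times as fast as the tortoise, which starts 10 metres ahead, he must successively cover 10 + 1 + 1/10 + 1/100 + … metres, a geometric series summing to the finite distance 10/(1 − 1/10) = 11 1/9 metres, covered in a correspondingly finite time. Whether this arithmetical fact explains motion is, however, precisely the issue at stake in the views discussed below.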

First, it may be admitted that the arguments are correct, and that motion is illusory. This view was shared by Parmenides, Zeno, and Plato. The observable world to which motion belongs is deceptive. It is merely an appearance, a shadow of the real, unchangeable world of ideas.[10]

Second, one may accept Zeno’s arguments, but maintain that motion is real. This means recognizing that motion cannot be explained starting from the categories Zeno applied. In his arguments we find only numbers and spatial distances as elements. Zeno succeeded in proving that these principles are not sufficient to explain motion. Shortly before Zeno, the Pythagoreans had proved that spatial relations cannot be explained by relations between integral numbers (3.1). Zeno’s paradoxes can be interpreted as demonstrating that motion cannot be explained by numerical and spatial relations. To consider Zeno’s paradoxes in this way means accepting motion as an unexplained principle of explanation. It became the route of the Copernicans (3.3).

According to modern mathematics, local motion implies the continuous succession of temporal instants. This appears to imply a paradox, too. Like the rational and the real numbers, the points on a continuous line are ordered, yet no point has a unique successor. Since between any two points A and B there are infinitely many other points (for instance, their midpoint), one cannot say that point A is directly succeeded by point B. Yet a moving thing, whether uniform or accelerated, passes the points of its path successively. Therefore, the continuous succession of temporal moments cannot be reduced to quantitative and/or spatial relations.

The third view is that Zeno’s arguments are incorrect, and that motion is not illusory. This approach led Aristotle to his realistic theory of change, to be discussed presently.

Four causes

For the Greek philosophers, the ideal state of the universe was a static equilibrium in which only natural motions would occur, to be explained by numerical and spatial relations. Aristotle was far more realistic than his predecessors. He valued observations higher than Plato did, and accepted change as a problem to be solved.

According to Aristotle’s Physics, the explanation of change must start from two unchangeable principles, eternal form and eternal matter, both universal. The forms are very similar to Plato’s ideas. The main difference concerns their relation to observable things, which Plato considered unreliable copies of the ideas. Plato’s ideas should primarily be understood in a mathematical sense: the ideal triangle, a subject matter of geometry, is only approximately realized in triangular bodies. Aristotle was more inspired by his biological studies, and his forms refer to species of animals or plants. Unlike mathematical ideals, these can only be studied by careful observation.

Any individual is a combination of form and matter, is formed matter, is substance. Forms and matter are eternal and unchangeable; only substances can change.[11] This view prevented Aristotle and his followers from appreciating that relations, and even motion itself, might be variable.[12]

Strictly unformed matter, materia prima (first matter), is characterized as absence of form and therefore does no more exist in any concrete sense than pure form.[13] But matter can also be conceived in a less absolute sense. When a sculptor creates a sculpture, its design is changed as little as the marble, the material used. Only the concrete piece of marble changes, because the sculptor adds a new form to its matter. This form existed beforehand in the imagination of the artist, and the perfect form does not change. Changing a substance is the process (kinesis) by which it attains a new form.

Potentiality and actuality

In order to explain such a process, it is not sufficient to have insight into the form and matter concerned. The matter involved must have the potential to attain the appropriate form.[14] (By the distinction between eternal form and matter on the one hand, and changeable substance with its potential and actual properties on the other, Aristotle avoided Parmenides’ problem mentioned above.) It is possible to make a sculpture out of a piece of marble, but not out of sand or water. Marble has the potential to become a statue, which water has not. This is even clearer in living nature. A chicken’s egg has the potential to become a chicken, and never becomes a duck or a horse.

Besides form, matter and potential, a fourth principle of explanation is needed, the efficient cause, the actualization of the potential. The chicken-egg must be hatched, the marble must be worked on. Only if these four causes are found is the explanation complete, according to Aristotle.

Instead of potential and actual, Aristotle also speaks of the final cause or destiny, and the efficient cause.  In this case, the process from potentiality to actuality is treated separately.[15] It is not always possible to distinguish the four causes. Sometimes the final cause or end appears identical with the form to be achieved.

Classical physics changed the four Aristotelian causes dramatically. The formal cause was replaced by natural law (8.6). The material cause was resolved into the schemes of matter and motion or matter and force. End, purpose, or destiny disappeared as a principle of explanation in physics, though it remained relevant in the study of animal behaviour and of human activity. Only the efficient cause survived, as the cause of motion, or of change of motion.

Four kinds of change

Aristotle distinguished four kinds of change in individuals, in decreasing order of importance:[16] change of essence or of nature, generation and corruption, coming into being and passing away, for instance the birth of a caterpillar, or the death of a butterfly; qualitative change of properties, like the change of a caterpillar into a butterfly; quantitative change of degree, increase or decrease, for instance the growth of a caterpillar; and finally change of position, for example the local motion of our caterpillar on a tree. Although the least far-reaching, no change is possible without local motion.[17] The first kind of change is treated in On generation and corruption, the other three in Physics.

Every change has two termini, a beginning (its matter) and an end (its form). For the first kind of change, generation and corruption, these are contradictories, x and non-x. The other three have contraries, like hot and cold, as termini. Therefore, Aristotle rejected the infinity of the cosmos: if the cosmos were infinite, the elements could move infinitely far away, without an end. Uniform circular motion is the only kind of motion having neither beginning nor end.[18] It lacks contraries; it lacks potential existence. Celestial bodies, moving uniformly in perfect circles, are completely actual, hence unalterable, eternal, incorruptible, ungenerable, unengendered, and impassive. These cosmic properties follow as a logical consequence of Aristotle’s theory of change. Even in this case, local motion is the basis of the actuality of the bodies concerned.

For a natural change no external cause is needed. We still distinguish between natural and unnatural death – only the latter needs explanation. The alteration of an acorn into an oak is a natural process, and does not require an external cause. But the change of an oak into a pile of boards is an unnatural process, in need of an external cause. The free fall of a heavy body can be prevented, if the body is supported. Similarly, natural processes like the growth of a plant can be impeded, for instance by lack of water. But as soon as such external impediments are removed, the process will occur according to its nature, without any external cause.

Four terrestrial elements

Besides the four causes and four types of change, Aristotle distinguished four terrestrial elements. Plato related the elements to the regular polyhedrons, but this could not serve Aristotle’s theory of change.

Generation and corruption always involve a mixing of elements. The celestial bodies are made of a single element, ether, because they cannot be generated or corrupted. Aristotle related the terrestrial elements to the termini of change, pairs of contrary properties like warm and cold, dry and moist, up and down. Earth is dry and cold, water moist and cold, air hot and moist, and fire hot and dry. Earth and water are heavy, and by their nature move downward. Fire and air are light, moving upward. The upward and downward motions are opposite, hence point to imperfection, and to the existence of at least two elements, heavy earth and light fire.[19] The contrary qualities of heavy and light were never related to density. Only neo-Platonic scholars like Benedetti and Galileo studied density as a quantitative property (1.4).

Empedocles’ four elements, if severed from the distinction between gravity and levity, are consistent with both Aristotelian and Copernican views. Until the nineteenth century they remained the basis of medical and psychological theories. Galileo connected the elements with the senses. In chemistry, the four elements were abandoned in the eighteenth century.

The theory of the elements enabled Aristotle to criticize the older atomic views. As a reply to Parmenides’ denial of variability, the atomists accepted only local motion as possible change. From the time of Galileo onward, most Copernicans were atomists in one way or another. They considered atomism not in the first place a theory of the structure of matter, but an ontological foundation of their world view, in which local motion plays not a subordinate but a leading part.

Aristotle on natural motion

Before Galileo’s Discorsi, the most important treatise on motion was Aristotle’s Physics. (The pseudo-Aristotelian Questions of mechanics, extensively annotated by Galileo, only became available in Europe after 1525.)[20] The Physics can only be understood in relation to Aristotle’s cosmology and his theory of the elements, expounded in On the heavens. Just like Plato, Aristotle put the earth in the centre of the universe.

If the cosmos were in equilibrium, it would consist of a set of perfect concentric spheres. At the centre is the sphere of the heavy element earth, surrounded by the less heavy element water, the light element air, and, lightest of all, fire. In or near the sphere of fire one observes lightning, comets, meteors, and the aurora borealis (northern lights). Contrary to what some Pythagoreans assumed, fire occupies the periphery of the sublunary spheres, not their centre. The sphere of fire is contained in the lunar sphere, the lower boundary of the heavenly space in which the celestial bodies move around.

According to Aristotle, the lunar sphere constitutes a sharp division between the celestial and terrestrial realms. The celestial space is ordered, the sublunary space is disordered. In the heavens, everything is perfect: the spherical shape of the celestial bodies; their unchangeability and incorruptibility; their circular uniform motion. By contrast, the sublunary sphere is imperfect. The separation of earth, water, air, and fire into concentric spheres is disturbed, and the four elements are mixed. Here one finds not only natural motion, but also unnatural, violent motions.

Natural motion in the sublunary spheres is vertical and rectilinear. It is directed toward the centre for heavy bodies, and toward the periphery for light bodies. Even this natural motion is caused by an unnatural, artificial initial state, deviating from the ideal state of equilibrium. Sublunary natural motion of a body means motion to its natural place, and can only occur if the body is not in its natural place to begin with. This shows that for Aristotle, too, spatial position had a high priority as a principle of explanation. Natural motion is explained by spatial arguments.

The natural motion of the heavens, too, is not without cause. It is caused by the divine unmoved mover, who, at the uppermost periphery of the cosmos, is nothing but pure thought, thinking about itself – thought returning to itself. The rotation of the heavens is caused by the love of this god, by striving to become god-like, perfect. It is a final cause, not an efficient one. Circular motion returns into itself. Ultimately, all motion is caused by the prime mover as an end, and the centre of the cosmos is unmoving.

Aristotle’s cosmos is an organized whole, in which everything has its natural place. It is, however, not a living organism. Aristotle did not adhere to astrology, which was not influential in the Athens of his time. However, when in later ages astrology and alchemy came to the forefront, their leading idea of the microcosmos-macrocosmos correspondence fitted easily into Aristotle’s cosmology.

Violent motion and the impetus theory

Aristotle distinguished natural motion from violent, artificial motion, motion influenced by a force.[21] Natural motion is motion according to the nature of a thing, whether it is heavy or light. A heavy body falls downward, because it is heavy. This is an internal cause. A natural motion needs no external cause, whereas any unnatural motion does. The relation between natural and violent motion is rest. Natural motion is the actualization of some potential, and naturally ends when the body has achieved its end, its natural place. Violent motion is contrary to the nature of a thing. Therefore, by force of its nature, everything resists violent motion. For Aristotle, rest is ontologically different from motion, both natural and violent, and is superior to both. Clearly, natural motion is explained by spatial circumstances, violent motion by a force. Motion is not a principle of explanation itself.

During the Middle Ages, Aristotle’s theory of local motion caused much discussion. In particular, it is by no means clear what kind of force causes the motion of an arrow. Apparently, it is the force of the bent bow. But this force ceases to act as soon as the arrow has left the bow, whereas the motion does not cease. Aristotle rejected any kind of action except action by contact.[22]

In order to solve this problem, fourteenth-century scholars at Paris, Jean Buridan and others, developed the impetus theory, assuming that motion can be caused either by an internal or by an external motor. The external motor is the force, the internal motor is called the impetus. The bow does not only supply an external force to the arrow, but also an internal impetus. During the motion, the impetus decreases until it is exhausted, and the motion ceases. In the theories of Aristotle and his disciples it was unimaginable that a body would partake in a natural and a violent motion simultaneously. An arrow shot obliquely would move rectilinearly until its impetus is exhausted. Only then would it begin to fall.[23] The observed curved path of the projectile contradicted the theory, and was therefore deemed deceptive. Galileo was the first scholar to recognize that the trajectory of a projectile is curved right from the start.[24]

Impetus was supposed to be proportional to the quantity of matter in the body and to its motion. In the seventeenth century this was transformed into mass and velocity, but these magnitudes were not yet defined in the fourteenth century. Nevertheless, the impetus can be recognized as a predecessor of the modern concept of linear momentum. However, impetus was considered the cause of the motion, whereas the later momentum is merely a measure of motion, quantity of motion. The transformation of impetus into momentum has been a laborious process, and is a fruit of the Copernican revolution.

Although the medieval impetus theorists assumed that a falling body increases its impetus, and although they also studied uniformly accelerated motion, the two were never related before the sixteenth century.[25] Buridan arrived at the important insight that the speed of a falling body at any instant depends on the path traversed since the start of the motion, rather than on the distance to the body’s natural place.[26] The latter was Aristotle’s opinion.[27]

During the Middle Ages, projectile motion was never considered a proof against the validity of Aristotle’s views. Even the impetus theorists tried to solve the problem within the context of Aristotle’s physics. Only in the seventeenth century did a new theory of motion become possible, after the distinction between celestial and terrestrial physics was destroyed, in particular by Galileo’s attack on Aristotle’s cosmology (3.3).

The beauty of Aristotle’s theory of change

Aristotle’s theory of change is one of the most beautiful intellectual achievements ever made. It is not only completely logical, coherent and consistent; it is also in harmony with common sense. Apart from being falsified by classical physics, it has only three flaws: the uncertain status of light, being both terrestrial and celestial; projectile motion; and, perhaps most important, the lack of insight that distinctions like hot and cold are not contraries, but allow of gradual transitions. The first could be tentatively solved by abolishing the distinction between celestial and terrestrial physics; the second required a new view on motion; and the third led to the introduction of measurement in physics.

In Aristotelian philosophy, local motion was only one kind of change, and change was a process from potentiality to actuality. The Copernican revolution implied the slow and gradual transition from local motion as a process to inertial motion as a state. The first steps were taken by Copernicus himself, who contended that the earth’s daily rotation is not in need of any explanation besides the spherical shape of the earth. The natural motion of a sphere is rotation.[28] This shows that Copernicus accepted spatial causes besides the kinetic one explaining retrograde motion. The final steps were made by Huygens and Leibniz, who rigorously stated the relativity of motion, and by Newton, who associated the idea of inertia with the idea of mass. Meanwhile, Galileo, Beeckman, Descartes, Mersenne, and Huygens founded mechanical philosophy.[29]

3.3. Galileo on motion

Any new theory of motion could only have a chance after Aristotle’s cosmology was abolished. Until then, Copernicus’ system could only be considered an interesting mathematical exercise. Kepler’s attempt to replace the Aristotelian system by a Platonic one (2.6) was abortive, because Plato shared Aristotle’s distinction between celestial and terrestrial physics. The absolute split between the perfect heavens and the imperfect earth constitutes the heart of Aristotelian cosmology – so much so that Galileo found it necessary to devote the entire first day of his Dialogue to the demolition of this distinction.

Unlike Brahe and Kepler, Galileo Galilei was not a professional astronomer, although as a professor of mathematics at Padua, he had to teach Ptolemaic astronomy. As an able instrument maker he earned more money than the university paid him. Learning about the recent invention of the spyglass, he built a telescope, and directed it to the heavens. He observed the moon, the sun, the planets, and the fixed stars, and discovered Jupiter’s four moons. In 1610 he published his discoveries in Sidereus nuncius (The message of the stars, or the starry messenger),[30] on which his attack on Aristotle’s cosmology was based. After this publication, which drew a lot of attention, Galileo discovered the phases of Venus and studied the motion of sunspots.[31] The publication of this book caused Galileo’s first encounter with clerical censors, who ‘… adamantly refused a layman the right to meddle with Scripture’.[32]

Galileo’s main argument was to show, on the basis of observations, that the celestial bodies are not perfect, not incorruptible, and not unalterable. He discussed the mountains on the moon, in order to show that the moon is not perfectly spherical.[33] He argued that the earth, like the moon and Venus, reflects the light of the sun, by pointing to the so-called secondary light of the moon. This occurs shortly before or after new moon, when alongside a small sickle the dark part of the moon is perceptible.[34] Galileo explained why this phenomenon is not observable at first or last quarters. (Probably, Galileo did not know that the same explanation had already been given by Leonardo da Vinci and by Maestlin.[35]) He showed that the sunspots are continuously generated and corrupted.[36] He pointed to Tycho’s observations showing that the new generable and corruptible stars (novae) are celestial objects, not sublunary ones.[37] (The new stars named after Brahe (1572) and Kepler (1604) actually were supernovae, Kepler’s being the last to occur in the Milky Way and be visible with the naked eye.)

On the other hand, Galileo argued that circular motion pertains to terrestrial as well as celestial objects. In his discussion of magnetism, Galileo let Sagredo state that a magnet performs both circular and rectilinear motions, and should therefore, according to Aristotle, be composed of celestial and terrestrial matter.[38]

Galileo interpreted Copernican cosmology in a realistic way. Colliding with Aristotle’s no less realistic cosmology, he came into conflict with the Catholic Church (8.3). Despite its prohibition in 1633, Galileo’s Dialogue terminated Aristotle’s cosmology. After him, Descartes, Huygens, Newton, and other Copernicans no longer bothered to refute it.

Galileo’s Dialogue put an end to the separation of terrestrial and celestial physics. In both realms the same kind of explanations could now be applied. The relevance of this achievement can hardly be overestimated.

Motion as a principle of explanation

Whereas the Pythagoreans were confronted with the irreducibility of the spatial principle of explanation, Zeno stumbled on the irreducibility of the kinetic mode (3.1). With hindsight, we could say that the Copernicans met Zeno’s challenge head-on, by making motion a new principle of explanation. (I am not aware of any Copernican who actually discussed Zeno’s paradoxes. However, Spinoza’s views resemble those of Parmenides, and his views on motion those of Zeno.) In ancient and medieval philosophy motion was never a principle of explanation. It was always the outcome of an explanation. Like any kind of change, local motion was an explanandum and could not be used as explanans.

In Copernicanism, the study of motion plays a central part. The characteristic trait of the Copernican system is not its heliocentrism, but the assumption that the earth is moving. Motion was introduced as a principle of explanation. Copernicus argued that the apparent retrograde motion of the planets is caused by the real motion of the earth (2.4). As we have seen, in a logical sense an explanation has a causal character, but this cause is not necessarily a physical one.

The principal objection against Copernicanism concerned the doctrine of the moving earth. In order to avoid it, Tycho Brahe proposed his compromise system (2.5). As an astronomer, he recognized the advantages of Copernicus’ theory. In his earth-centred system, with five planets circling around the sun, the sun has a much more prominent position than in Ptolemy’s. Tycho considered the physical and theological objections against the earth’s motion insurmountable.

In contrast, Galileo Galilei felt inspired by the difficulties engendered by the earth’s motion.[39] It is a matter of dispute whether Galileo’s study of motion was inspired by Copernicanism. Although he openly adhered to it only after 1609, it appears that he accepted the Copernican system after about 1590.[40] Between 1609, when he made his provocative astronomical discoveries, and 1633, when he was convicted by the papal Inquisition, Galileo was the main agitator in favour of Copernicanism (8.3). After this episode, he published his ideas on the theory of motion in Discorsi (1638), avoiding any mention of Copernican views. Most of the work on this theory had been done at Padua, where he was a university professor from 1592 to 1610.

Sunspots

Galileo’s claim to be the first to have discovered the sunspots is unjustified, but the quality of his observations and his reasoning were unsurpassed during his lifetime. His Letters on the sunspots (1613) consists of three letters to the Augsburg merchant Mark Welser, in which Galileo criticized Christoph Scheiner’s anonymous interpretation of the sunspots. He observed that some sunspots do not change their dimensions during several days, except that their apparent width decreases if the sunspots move from the centre towards the circumference of the solar disc. Based on careful measurements, he demonstrated that this phenomenon could only be explained if the sunspots are situated on the surface of the sun, which is spherical and rotates about its axis in about thirty days. Thus he explained the apparent change of the sunspots by the motion of the sun. In a similar way he explained why the apparent speed of the sunspots changes during the sun’s motion. According to Galileo, in the heliocentric system the sun has only one motion, rotation around its own axis in thirty days. Assuming the earth at rest, one has to ascribe two more motions to the sun: the daily and annual motion around the earth. The daily motion of the sun would imply that the axis of the sun’s own rotation changes continually, which is dynamically hard to believe.[41]
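The geometry of this argument can be sketched in modern notation (a reconstruction, not Galileo’s own formulation). A spot of true width w lying on the solar surface at angular distance φ from the centre of the disc is seen foreshortened, and its apparent motion across the disc slows down near the limb:

\[ w_{app} = w\cos\varphi, \qquad v_{app} = \omega R\cos\varphi, \]

where R is the solar radius and ω the angular speed of rotation. A spot floating well above the surface would show a different pattern of foreshortening and speed, so the observed cos φ behaviour locates the spots on or very near the rotating surface.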

Galileo’s mechanical philosophy

Galileo understood motion to be sui generis, not to be explained, but to be used as an explanation. Explanation of motion by motion is short for the adoption of one or two principles of motion in order to explain other kinds of motion. It is not necessary to explain the primary or natural motions themselves. The principle of inertia could be formulated as: a body on which no external force is acting, moves because it moves. The principle of relativity, first explored by Galileo, implies that inertial motion is a relation. Besides inertial (circular) motion, Galileo considered the uniformly accelerated motion of free fall as a principle of explanation.

In Il saggiatore (The assayer, 1623) Galileo presented the program of the rising mechanical philosophy, to reduce all physical phenomena to matter, quantity, shape, and motion:

‘… whenever I conceive any material or corporeal substance, I immediately feel the need to think of it as bounded, and as having this or that shape; as being large or small in relation to other things, and in some specific place at any given time; as being in motion or at rest; as touching or not touching some other body; and as being one in number, or few, or many. From these conditions I cannot separate such a substance by any stretch of my imagination.’[42]

This became the nucleus of mechanical philosophy. For instance, Galileo explained heat as motion of corpuscles:

‘I do not believe that in addition to shape, number, motion, penetration, and touch there is any other quality in fire corresponding to “heat”.’[43]

Following Benedetti, he explained sound as motion, caused by the periodic vibration of a string:

‘… the waves which are produced by the vibrations of a sonorous body, which spread through the air, bringing to the tympanum of the ear a stimulus which the mind translates into sound.’[44]

As a consequence, Galileo distinguished objective from subjective properties, or primary from secondary qualities, as he called them:

‘To excite in us tastes, odors, and sounds I believe that nothing is required in external bodies except shapes, numbers, and slow or rapid movements. I think that if ears, tongues and noses were removed, shapes and numbers and motions would remain, but not odors or tastes or sounds.’[45]

Taste, odour, sound, touch are connected, respectively, with water (fluids), fire, air, earth. Vision (‘the sense eminent above all others’) is related to light, implicitly referring to the ether.[46]

Natural motions

Galileo considered motion as a state of a system.[47] A body, moving or at rest, is physically completely unaffected by which of these two states it is in, and being in one or the other in no way changes it.[48] Therefore, rest and motion are not contraries, as was taught by Aristotle. Both are states of motion.[49] Although this was only gradually understood, it implies that the attribution of the state of rest or motion to a given body is only possible in relation to another one.

In 1612 Galileo formulated the first clear expression of the principle of inertia, arguing from a discussion of balls moving on a plane. If the plane is tilted, a downward motion accelerates, whereas an upward motion decelerates.

‘And therefore, all external impediments removed, a heavy body on a spherical surface concentric with the earth will be indifferent to rest and to movements toward any part of the horizon. And it will maintain itself in that state in which it has once been placed; that is, if placed in a state of rest, it will conserve that; and if placed in movement toward the west (for example), it will maintain itself in that movement.’[50]

(Drake 1990 insists that Galileo did not express a principle of circular inertia. Drake views inertia as a dynamic principle, first put forward by Newton, whereas Galileo restricted himself to kinetics.)

Discorsi recognized two fundamental or natural motions: uniform circular motion (at constant speed), and the uniformly accelerated motion of free fall (at constant acceleration). Both occur without external cause, and are idealized states. The third day of Discorsi, entitled ‘Change of position – De motu locali’, introduced these carefully by the axiomatic method, starting with the words:

‘My purpose is to set forth a very new science dealing with a very ancient subject. There is, in nature, perhaps nothing older than motion, concerning which the books written by philosophers are neither few nor small; nevertheless I have discovered by experiment some properties of it which are worth knowing and which have not hitherto been either observed or demonstrated.’[51]

First, Galileo recognized uniform circular motion as primary, not in need of explanation. It concerns planets turning around the sun, and the earth rotating about its axis, as well as a terrestrial body moving without friction on a horizontal plane, for instance, a ball on a smooth surface, without air resistance. Galileo described the horizontal motion sometimes as being approximately rectilinear, but in principle it was circular.[52]

Galileo’s second natural motion is free fall in a vacuum.[53] In this case acceleration, not velocity, is constant. Again, no external cause is needed. Gravity, the source of the motion, is an intrinsic property of the falling body.[54] But it is not its cause, for the motion of fall is independent of the weight of the body, as Galileo found shortly before 1610.

Galileo’s two kinds of natural motion are not contrary to each other. First, one kind can change into the other. A ball uniformly accelerated on an inclined plane may continue its motion at constant speed on a horizontal plane. Secondly, they have a common source, for gravity is the source of all motion. When discussing the inertial motion of a ball on a horizontal plane, Galileo lets the motion start from an inclined plane.[55] The uniform motion of the planets he explains with the help of a Platonic myth about the fall of the planets from a common point toward their present orbits.[56] Gravity is even considered a measure of motion.[57] Thirdly, two natural motions can be composed; a body is able to perform two motions simultaneously.[58] This applies to a ball rolling down an inclined plane,[59] or to the earth combining its daily and annual motions. A composite motion is as natural as a simple motion.[60] It also applies to the combination of a horizontal uniform motion and a vertical accelerated motion. This enabled Galileo to explain the path of a cannon ball, the time-honoured problem of projectile motion. He emphasized that his theory is able to explain why a cannon reaches farthest if the ball is fired at an angle of 45°. This fact had been known empirically for quite a long time (Tartaglia mentioned it in 1531), but never explained.[61]
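In modern notation (anachronistic to Galileo, who reasoned with proportions rather than formulas), the composition of a uniform horizontal motion and a uniformly accelerated vertical motion gives, for initial speed v₀ at elevation angle α,

\[ x(t) = v_0 t\cos\alpha, \qquad y(t) = v_0 t\sin\alpha - \tfrac{1}{2}gt^2, \]

a parabola. Setting y = 0 yields the range

\[ R = \frac{v_0^2 \sin 2\alpha}{g}, \]

which is maximal for sin 2α = 1, that is, for α = 45°, as Tartaglia had found empirically.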

Galileo on circular motions

Galileo connected both natural motions to circular motions in the following way.[62] On a horizontal plane he considered a number of objects starting simultaneously from the same point, moving at the same speed in various directions. At any instant, their positions constitute a circle of which the common starting point is the centre. Next he considered in one vertical plane a number of objects moving simultaneously on planes with various angles of inclination. If they start simultaneously from the same point, at any subsequent moment their positions lie on a circle having the horizontal through the common starting point for a tangent. Galileo concluded:

‘The two kinds of motion occurring in nature give rise therefore to two infinite series of circles, at once resembling and differing from each other … (this constitutes) … a mystery related to the creation of the universe.’[63]
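The second construction can be checked in modern notation (a reconstruction, assuming a constant acceleration of gravity g). A body starting from rest on a plane inclined at an angle θ below the horizontal covers a distance

\[ s(t) = \tfrac{1}{2}\,g\sin\theta\; t^2 \]

along the plane. Reading s and θ as polar coordinates around the common starting point, the positions at any instant t satisfy r = D sin θ with D = ½gt², which is precisely a circle of diameter D tangent to the horizontal at the starting point. The first construction is simpler still: at constant speed v, the positions at time t form a circle of radius vt around the starting point.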

Galileo’s universe was composed of circles. He even considered the motion of animals to be primarily circular.[64] (However, Drake argues against the view that Galileo was obsessed by uniform circular motion, that he adhered to circular inertial motion, and that he seriously believed the planets to move in uniform circular motion around the sun.[65]) The beauty of the Copernican system was expressed in uniform circular motion not confined to the heavens.[67]

In the fourteenth century, the impetus theory was applied to circular motion, and some scholars came close to the idea of inertia (3.2). Jean Buridan, for instance, observed that a heavy millstone, once set turning, has much impetus and is inclined to persist in its motion. For the same reason he suggested that the celestial spheres are not in need of a cause to continue their motion. Galileo agreed with this opinion. In an ordered universe only finite and terminate motions, that is, uniform circular motions, do not disorder the parts of the universe.[68] This view prevented Galileo from accepting Kepler’s discovery of non-uniform, elliptic motion. If comets were celestial bodies, they would move in oval trajectories, putting an end to the supremacy of circular motion. For this reason Galileo assumed comets to be atmospheric phenomena.[69]

Galileo’s concept of inertia concerned celestial bodies, moving uniformly in circles around the sun, or turning around their axis, as well as terrestrial objects moving without friction on a horizontal plane.[70] This horizontal motion, too, was circular, for the earth is spherical. In Galileo’s worldview, only circular motion could be uniform.[71]

Falling bodies

Before Galileo, Giovanni Battista Benedetti had argued that in a vacuum all bodies having the same density would fall with the same speed.[72] Initially Galileo adhered to Benedetti’s view that this speed is determined by the density of the falling body. He supposed the speed of fall to be proportional to the difference between the densities of the body and the medium in which it falls. By this law he tried to account for the upward force exerted by the medium, and to disprove the distinction between light and heavy bodies, essential in Aristotle’s physics.

Only gradually did Galileo realize that a falling body is subject to three forces: the force of friction, dependent on the medium; the upward force of buoyancy, dependent on the density of the falling body relative to that of the medium; and the downward force of gravity. Because he wanted to base the law of fall on experience, he could not start with fall in a vacuum, which is not empirically available.[73] Besides, its existence was denied by philosophers. Instead he estimated the contributions of friction and buoyancy, and discussed a situation in which both could be neglected. Galileo showed that buoyancy depends on the relative specific weight of the falling body and the medium. Therefore, in a hypothetical void (whose specific weight is zero), buoyancy is zero. The same applies to friction. This means that in a void, all bodies would fall with the same speed, independent of their density.[74] Acceleration, not velocity, would be constant.[75] According to Aristotle a heavier body falls faster than a lighter one. Galileo refuted this by a simple thought experiment. Consider two falling bodies of unequal weight, connected by a string. On the one hand, the lighter one should delay the heavier one; on the other hand, the combination, being heavier than either, should fall faster.[77] After the invention of the air-pump, Boyle’s experiments (1669) confirmed that in an exhausted receiver a feather ‘descended like a dead weight’.[78]
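The argument can be condensed in modern notation (a reconstruction; Galileo reasoned with specific weights, not with these symbols). For a body of density ρ_b falling in a medium of density ρ_m, weight minus buoyancy yields, neglecting friction, the acceleration

\[ a = g\left(1 - \frac{\rho_m}{\rho_b}\right). \]

In a void ρ_m = 0, so a = g for every body, whatever its density; in a dense medium the acceleration is smaller, and for ρ_m > ρ_b the body rises.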

The axiom, that free fall implies a constant acceleration equal for all bodies, enabled Galileo to explain a large variety of phenomena:

‘… in the principle (of accelerated motion) laid down in this treatise (Galileo) has established a new science dealing with a very old subject … he deduces from a single principle the proofs of so many theorems … the door is now opened, for the first time, to a new method fraught with numerous and wonderful results which in future years will command the attention of other minds …’[79]

His decision to exclude physical elements from his theory of motion was prompted by his wish to make the most of motion as a principle of explanation.

Pierre Duhem described Galileo’s conscious decision to abstain from a physical explanation of free fall as positivism or instrumentalism.[80] It is highly implausible to consider Galileo an instrumentalist, however, in view of his incessant struggle against an instrumentalist interpretation of Copernicanism. If he had been an instrumentalist, he would never have had a conflict with the Inquisition. Galileo emphasized that: ‘… the true constitution of the universe … exists; it is unique, true, real, and could not possibly be otherwise …’.[81]

Experimental proof

In Discorsi Galileo presented his law for the motion of falling bodies as an axiom, but he found it in an experiment with balls rolling down an inclined plane, which slows down the motion considerably. Galileo performed his experiments before 1606,[82] but he published them only in 1638.[83] Marin Mersenne in 1647 and others observed that Galileo could not possibly have found his law with the stated accuracy, if he measured time with a kind of water clock as described in Discorsi, or any other clock then available.[84] However, he could have determined the equality of successive time intervals with the help of musical beats, with the stated accuracy.[85] This did not allow Galileo to calculate the acceleration of the ball at various angles of inclination, or to extrapolate this to find the acceleration of free fall. This may have been fortunate, for he would have found a value significantly different from the value found later by Christiaan Huygens from his experiments on pendulums.[86] Huygens experimented with a conical pendulum and (like Riccioli in Almagestum Novum (1651) before him) determined the length of a pendulum having a period of one second. His measurement of the fall in one second of 15.096 Paris feet could be trusted to about ±0.01 feet.[87] Huygens’ value corresponds to 9.807 m/sec² for the acceleration of gravity at Paris, the modern value being 9.8087 m/sec². Only in the eighteenth century did physicists discover that a rolling ball divides its energy between the kinetic energy of its centre of mass and the rotational energy around this point, such that the acceleration of a rolling ball is less than that of a gliding body along the same plane. Still, Galileo concluded that the acceleration of free fall follows the same pattern as in the case of balls rolling down an inclined plane.
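These figures can be checked with modern formulas (a reconstruction, taking the Paris foot as about 0.32484 m). A body falling from rest covers s = ½gt² in time t, so a fall of 15.096 Paris feet in one second gives

\[ g = \frac{2s}{t^2} \approx 2 \times 15.096 \times 0.32484\ \text{m/sec}^2 \approx 9.807\ \text{m/sec}^2, \]

in agreement with the value quoted above. The rolling-ball correction follows from the moment of inertia I = (2/5)mr² of a homogeneous sphere:

\[ a = \frac{g\sin\theta}{1 + I/mr^2} = \tfrac{5}{7}\,g\sin\theta, \]

so an extrapolation from rolling balls would have underestimated the acceleration of free fall by a factor of 5/7.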

Galileo’s experiments led him to the insight that the velocity at any instant is determined by the time elapsed since the start of the motion, not by its distance to the end point. This refuted Aristotle’s view of fall as a motion toward the body’s natural place.[88] Initially, Galileo made a mistake by connecting the velocity with the path covered since the start of the movement.[89] The same mistake was made by Albert of Saxony,[90] and later by René Descartes. Isaac Beeckman, misunderstanding Descartes’ proof, arrived in 1618 at the correct law of fall, twenty years before Galileo published his discovery.

Shortly before 1610, Galileo found that the increase of velocity is proportional to the time passed since the start.[91] This became his definitive law of fall. For all bodies moving in a vacuum, whether vertically or along an inclined plane, the speed increases proportionally to the time passed, the proportionality constant depending on the angle of inclination.[92]

With this law, Galileo established that the distances, covered in equal times by a ball on an inclined plane, are in the same proportion as the odd numbers.[93] This means that the path covered since the start is proportional to the square of the time passed.
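In modern notation (a reconstruction of the proportions Galileo used), with v = at the path covered from rest is s(t) = ½at², so the distances traversed in successive equal time intervals are

\[ s(n) - s(n-1) = \tfrac{1}{2}a\,(2n-1) \;\propto\; 1 : 3 : 5 : 7 : \ldots, \]

and the total path grows as the square of the elapsed time, s ∝ t².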

As a result, after Galileo time became the most important parameter in mechanics. It symbolizes the shift from space (the covered path) to motion (the kinetic time) as the new principle of explanation.

Galileo discovered that pendulums of equal length are isochronous, their periods being independent of the amplitude or the mass of the bob.[94] He found the period to be proportional to the square root of the length of the pendulum. The isochrony of pendulums introduced a mechanical standard of time. After Galileo the problem of the measurement of time became urgent, for navigation as well as for mechanics and astronomy. It also created the problem of the nature of kinetic time itself (3.7).
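In modern terms (a later result, not Galileo’s own formula, and strictly valid for small amplitudes only), the period of a simple pendulum of length L is

\[ T = 2\pi\sqrt{L/g}, \]

which displays both findings: the period is proportional to √L, and it is independent of the mass of the bob and, approximately, of the amplitude.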

The earth’s motion and the tides

In his Dialogue, Galileo took pains to make the double motion of the earth acceptable. He did so in two ways. First, he refuted the arguments advanced against the moving earth.[95] Why do we not feel a constant eastern wind? Because the atmosphere turns around with the earth. Why does a falling stone arrive at the foot of a tower, and not slightly to the west? Because the stone partakes in the motion of the earth. Here Galileo applied the relativity of motion. (Galileo’s use of the principle of relativity to refute arguments against the earth’s motion differs from applying it in a mathematical theory, as Huygens did several decades later.) Thus the unobservability of the earth’s motion is explained by the fact that everything near the earth moves together with it. Galileo’s arguments refute the objections against the earth’s motion, but do not prove it.[96]

Therefore, Galileo next looked for positive evidence, for phenomena only explainable by the motion of the earth. He found them in the retrograde motion of the planets (Copernicus’ argument), the observed motion of the sunspots, and eventually in the not yet observed stellar parallax. Galileo thought that the tides, too, are such a phenomenon. He was so convinced of this that he wanted to call his book Dialogue on the tides. This was refused by the papal censor, who agreed to an impartial discussion of the structure of the cosmos, but would not allow the suggestion that the earthly motion could be proved. Galileo discussed his tidal theory in the fourth day of Dialogue.[97]

He rejected Kepler’s idea that the tides are caused by the moon.[98] According to Galileo the double motion of the earth, daily and annual, causes accelerations and decelerations, leading to a periodic motion of the water in its basin. He stressed that the details of the tidal motion depend on the shape of the basin, by which he tried to meet the objection that his theory implies a period of one day, whereas in fact high tide occurs about twice a day.

Galileo’s theory of tidal motion probably dates from about 1596.[99] Galileo developed this theory further in 1616. It circulated as a manuscript, Discorsi sopra il flusso e reflusso del mare. In 1619, Francis Bacon rejected Galileo’s theory, on the ground of his (Bacon’s) observations, published in 1616. Galileo rejected the observations insofar as they contradicted his theory.

In 1687, Newton proved Kepler to be right: the gravitational pull of the sun and the moon determines the tides.[100] Nevertheless, Galileo, too, was not far off the track, in particular with respect to the relevance of the shape of the basin. Within the context of what he knew of mechanics, his theory of the tides is a marvellous achievement. It shows an awareness of the relevance of acceleration, of inertia, and of resonance. In Newton’s explanation, too, it is essential that the earth moves around the common centre of mass of the earth-moon system, and this motion, being circular, is accelerated. Hence Galileo’s aim of proving the earth’s motion by the tidal theory is also achieved if his theory is replaced by Newton’s.

3.4. Cartesian physics

Between circa 1620 and 1650, René Descartes or Cartesius was a leading philosopher, regardless of whether this word is taken in the seventeenth-century meaning of scientist, or in its modern meaning.[101] In particular the twenty years he lived in the Dutch Republic (1629-1649) were very fruitful. His first published book, Discours de la méthode, dates from 1637, but its three appendices, La dioptrique, Les météores, and La géométrie, were written some time before. La géométrie contributed significantly to analytical geometry, which Descartes considered the paradigm of each science. The certainty provided by geometry is warranted by its method, and in order to arrive at the same level of reliability, each science should proceed by the same method. Principia philosophiae (1644) and its extended translation, Les principes de la philosophie (1647), present Descartes’ theories of motion.

Descartes advanced Galileo’s program of explaining motion by motion in several respects, such as the law of inertia, the law of conservation of motion, the problem of collision, the mechanical properties of matter, and the properties of light. Descartes was convinced that with his method he could solve all problems of his time. As a program to replace the Aristotelian scheme of explanation by a mechanistic one, Cartesian physics exerted a large influence. Though in many respects a cul-de-sac, Cartesian physics constitutes an essential part of the Copernican revolution.

After Galileo, Descartes became the main founder of mechanical philosophy,[102] attempting to reduce macroscopic phenomena to microscopic ones, to be explained by matter, quantity, shape, and motion. The transfer of motion only occurred by impact between mutually impenetrable material particles. Phenomena that could not be reduced in this mechanical way were excluded from physics.

Descartes divided reality into res extensa, the objective physical world, essentially extension identified with matter; and res cogitans, the subjective mental world, whose essence is thought, the human mind.[103] He assumed the two worlds to interact via the pineal gland (near the centre of the human brain, between the two hemispheres), the ‘principal seat of the soul’,[104] the source of ‘clear and distinct ideas’.

Descartes identified space with matter. All matter is space, and all space is material,[105] presumably with a varying density. A vacuum is unintelligible. Matter is infinitely divisible in a mathematical sense.[106] For this reason, Descartes is usually considered not to have been an atomist.[107] However, on physical grounds he assumed the existence of particles, with a minimum dimension. He even distinguished three kinds of corpuscles into which space-filling matter was differentiated.[108] Normal bodies are composed of coarse matter. Interplanetary space is filled with fine matter, and the pores in both are filled by the finest material, composing the sun and the stars, and responsible for the transmission of light. Particles could only differ because of their spatial shape and magnitude.

Aristotle defined a substance as any individual combination of form and matter (3.2). Essential and accidental properties determine a material body. Essential properties were contrary: a substance was dry or moist, cold or warm, heavy or light. Properties like colour and taste were accidental. Like Galileo, Descartes did not distinguish essential and accidental, but primary and secondary properties.[109] This distinction follows from the Platonist proposition that the real world is not necessarily the world as we perceive it. Primary qualities belong to objects as they really are. Secondary qualities such as heat or colour have no independent existence apart from the senses. The primary properties of matter are related to extension or motion – volume, quantity of motion, hardness, impenetrability, etc.[110] Other qualities are secondary, and should be reduced to primary, mechanical properties. An example is Descartes’ reduction of magnetism to the motion of cork-screw particles fitting holes in magnetic materials like iron (4.1).

If the Aristotelians talked about primary or manifest properties, they referred to sensory experience. Under Plato’s influence, Galileo, Descartes, and other mechanists considered sensory experience to be secondary, in need of explanation on mechanical principles.

Impact

Galileo connected the principle of kinetic inertia with uniform circular motion (3.3), but his disciples took linear inertial motion for granted.[111] Before Galileo published anything on motion, Isaac Beeckman (who discussed his views with Descartes) distinguished two kinds of inertial motion: uniform rectilinear motion, and (like Copernicus) uniform rotation of a heavy body around its axis.[112]

Descartes accepted the infinity of the universe, and therefore did not share Aristotle’s and Galileo’s caution against rectilinear inertial motion. He posited the law that a body on which no force is acting moves rectilinearly at constant speed as long as it does not collide with other bodies.[113] However, this situation is imaginary, because Descartes believed that space is filled with matter, and therefore actual motion can only occur in a vortex or whirlpool.

The introduction of the principle of inertia generated a problem unknown in Aristotelian physics – the problem of how motion can change. For Aristotle, celestial motion never changes, it being eternally circular. Sublunary natural motion simply ceases as soon as the body has arrived at its natural place. Violent motion ceases when the driving force no longer acts. Local motion does not change, but is a kind of change, change of position.

After the establishment of the principle of inertia the question arose how motion can be started, halted, or changed in direction. Clearly this can only be done by some external force, for if no external force is acting on a body, it continues its motion. Galileo never posed this problem in his published work,[114] but Descartes did. It is the main problem of his physics. According to the mechanist program of explaining motion by motion, any movement can be changed only by another moving body. The only conceivable possibility for this is a collision between the two bodies.

The problem of collision forms the heart of Cartesian physics. Descartes introduced quantity of motion as a measure of motion, operationally defined as the product of volume (quantity of matter) and speed. This definition differs from the later definition of linear momentum in two ways. First, Descartes took volume to be quantity of matter, because he identified matter with extension. Newton would amend this by taking mass, operationally defined as the product of volume and density, as quantity of matter (1.4). Second, Descartes considered velocity to be a scalar magnitude, like speed. Later, Huygens corrected this, by assuming velocity to be a vector, as we now call it, having direction as well as magnitude. In its corrected form, this is now called the law of conservation of momentum, the product of mass and velocity. During a collision (whether elastic or not) one body may transfer some quantity of motion to one or more other bodies, but only such that the vector sum of all momenta is conserved. For Descartes, collision was the only conceivable way to change the motion of a body.[115] The concept of force could only be applied as a derivative of action by contact. ‘The mechanical world view rested on a single, fundamental assumption: matter is passive. It possesses no active, internal forces.’[116]
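The difference between the two measures can be stated compactly in modern notation (a reconstruction). Descartes’ conserved quantity was the scalar Σᵢ mᵢ|vᵢ| (with volume standing for mᵢ), whereas the corrected law conserves the vector sum

\[ \sum_i m_i \vec{v}_i = \text{constant}. \]

A perfectly inelastic head-on collision of two equal bodies with opposite velocities shows the difference: the vector sum is zero before and after the collision, but Descartes’ scalar quantity drops from 2m|v| to zero.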

Descartes considered his law of conservation of motion to be clear and distinct, therefore evidently true, not subject to empirical scrutiny. He assumed quantity of motion to be indestructible, because it is natural.[117] At the creation, God supplied the cosmos with a quantity of motion, never to change afterwards.

Descartes elaborated these ideas into seven laws of impact.[118] With only one exception, these laws are contradicted by the results of experiments with colliding objects. Admitting this, Descartes observed that his laws concern circumstances which cannot be realized in concrete reality. ‘… Descartes’ rules of impact describe fundamental processes within nature as God sees them.’[119]

The laws concern collisions between bodies in a vacuum, and a vacuum is impossible.

‘The proofs of all this are so certain, that even if our experience would show us the contrary, we are obliged to give credence to our mind rather than to our senses.’[120]

In his theory of impact, Descartes treated rest and motion as contraries. Besides the concept of quantity of motion, he applied quantity of rest, inertia. In a collision, if the body at rest is larger than the moving one, the quantity of rest dominates the quantity of motion.[121] The effects of spatial extension and motion are, respectively, quantity of rest (inertia) and quantity of motion (momentum).

Descartes needed the distinction between rest and motion to explain the existence of bodies moving as a whole. The parts of the body move together with the whole, but are at rest with respect to each other. Had Descartes not introduced the idea of rest, the idea of universal motion would have excluded the existence of extended bodies.

Without admitting it in plain words, Descartes assumed some kind of absolute space, a space as seen by God. Elsewhere, Descartes contended that motion can only be relative.[122] This dilemma arose from his identification of space and matter. If matter is the same as space, local motion as change of position is strictly speaking impossible. The only possibility to create motion in a plenum arises when spatial parts exchange their positions. Hence, real motion occurs in vortices, circular motion returning into itself. Real vortex motion in a plenum is relative motion, and the non-existing idealized rectilinear motion in a void is absolute.

Descartes applied his view of vortex motion in a plenum to the celestial bodies. The sun turns around its axis, as was discovered by Galileo and others (3.3), and it drags along the surrounding matter, and hence the planets. According to Descartes, a rotating planet creates its own vortex, dragging around satellites like the moon.

Huygens’ mechanism

In Christiaan Huygens’ work, mechanical philosophy reached its acme.[123] He rejected Newton’s views on force, accepting only matter, quantity, shape, and motion as principles of explanation, but he also distanced himself from Descartes’ philosophy of clear and distinct ideas. He valued observations and experiments as sources of knowledge much more highly than Descartes did. Galileo, Huygens, and Newton denied that rest and motion are contraries. They treated rest as a state of motion, with zero velocity. In a beginning movement, the object starting from rest passes through all degrees of speed, until arriving at its final speed.[124] This is the foundation of the principle of relativity, whether Galilean or Einsteinian. If every movement has a relative character, there cannot be a fundamental distinction between rest and motion. When Huygens applied the principle of relativity to Descartes’ laws of impact, all but one turned out to be false. Huygens corrected these, and together with Wallis and Wren, he solved the problem of elastic and inelastic collision.

Descartes had assumed that the vortex motion of matter around the rotating earth caused a centripetal motion of all bodies having density less than the whirling matter. Huygens thought he was able to demonstrate this effect in an experiment (1668). He put pieces of sealing wax into a pail of water. If the pail was kept rotating, the pieces of wax moved to the periphery. But if the rotation were interrupted, all pieces moved to the centre, ‘… in one piece, which presented me the effect of gravity.’[125] One objection against Descartes’ theory was that the density of the imperceptible whirling matter would have to be larger than the density of all bodies falling to the earth. Moreover it is difficult to understand why gravity is directed to the centre, rather than to the axis of the earth’s rotation. In an ingenious way, Huygens sought to meet these objections. He published his theory, developed in 1667, only in 1690, three years after the much more successful theory of Newton, which Huygens admiringly but critically discussed.

Explaining gravity from mechanical motion, Descartes was the first to relate celestial motion to the motion of a falling body.[126] Galileo never connected gravity with the natural motion of the planets. Kepler compared gravity with magnetism, and magnetism with planetary motion, but never gravity with planetary motion. Kepler rejected the Aristotelian view of gravity as a natural tendency of heavy bodies toward the centre of the universe. He endorsed Copernicus’ view that gravity is a mutual corporeal affection between cognate bodies, tending to unite them,[127] but this did not inspire him to connect gravity with planetary motion.

3.5. Early concepts of force

Johannes Kepler always considered himself a Copernican, because he adhered to the idea of the moving earth. Yet he broke away from Copernicus’ fundamental idea of uniform, circular motion (2.6). Kepler’s first two laws, published in Astronomia nova (1609), indicate exactly the failure of the Platonic and Copernican models. The planetary orbits are not circular but elliptic, and the planetary motion is not uniform, but varies according to the area law. Hence it is not surprising that many Copernicans after Kepler rejected his results, and held on to uniform circular motion. For mechanists like Galileo, Descartes, and Huygens, Kepler’s views did not fit the program of explaining motion by motion. Moreover, Kepler’s calculations were very difficult to follow, and comprehensible only to professional astronomers. It took quite a long time before other astronomers confirmed Kepler’s discoveries.

Kepler realized that planetary motion deviating from uniform circular motion needs a non-kinematic explanation, which he sought in a kind of force. Aristotle knew forces only as causes of violent motion. Since Archimedes’ works were rediscovered, force became an important concept in the study of equilibrium situations. In both cases, forces were restricted to the sublunary realm.[128] Kepler was the first to apply the concept of force to planetary motion. Initially, in his Mysterium cosmographicum (1597), Kepler assumed an animistic view, supposing each planet’s motion to be conducted by a soul. But in a footnote added in 1621, he wrote that everywhere the word soul should be replaced by force, the moving soul of the planets, the cause of planetary motion.[129]

Kepler conjectured that the sun exerts an influence on the planets, pushing them around in their orbital motion. Like Aristotle, Kepler supposed the force keeping a body in violent motion to be proportional to its speed. Because a planet’s velocity is largest if it is closest to the sun, Kepler concluded this force to be inversely proportional to the distance from the sun. He supposed it to be tangential, directed along the planetary orbit. It was by no means attractive, i.e., directed towards the sun. Kepler suggested that the rotation of the sun causes the revolution of the planets.[130] Kepler estimated the period of the sun’s rotation to be about three days, and he was disappointed to learn from Galileo’s investigation of the sunspots that the actual period is thirty days.[131]
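Kepler’s inference can be reconstructed in modern notation (anachronistically). The area law implies that at the apsides the orbital speed varies inversely with the distance r to the sun, v ∝ 1/r; combined with the Aristotelian assumption that the driving force is proportional to the speed, F ∝ v, this yields

\[ F \propto \frac{1}{r}, \]

a tangential force decreasing with distance – quite unlike Newton’s later attractive force, which varies as 1/r².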

Like Galileo, Kepler admired William Gilbert’s book De magnete (On the magnet, 1600).[132] Gilbert was a halfway Copernican, accepting the diurnal rotation of the earth, but not committing himself to its annual motion. Contrary to Peregrinus, who in 1269 ascribed the properties of the compass needle to the rotation of the heavens,[133] Gilbert ascribed these to the rotating earth, which he considered a huge magnet (4.1). He believed magnetism to be the driving force for the diurnal rotation of the earth.

Kepler adopted this view, and he assumed that the force exerted by the sun on the planets is also magnetic,[134] as well as the influence of the moon on the tides. Galileo and Descartes rejected both ideas, because they wanted to explain motion by motion. Galileo executed this program with respect to the tides (3.3), whereas Descartes gave a mechanical explanation of magnetism (4.1).

Manifest and occult principles

The Copernicans rejected the Aristotelian distinction between celestial and terrestrial physics in two ways. The first was assuming that the same laws apply to both realms. This line was followed by Galileo, Descartes, and finally by Newton. The second way was to assume that the same force acts universally on the earth and in the heavens. This was the line of Kepler, who considered magnetism in this capacity, and again of Newton, who demonstrated gravity to be the force determining planetary motion as well as the motion of falling bodies. ‘The force which retains the moon in its orbit is that very force which we commonly call gravity.’[135]

The second line was first pursued by astrologers, starting from the assumption of a parallel development of celestial and terrestrial events. Kepler was the last Copernican to be sympathetic to astrology. After him, astrology became definitely occult (dark). The Aristotelians called qualities occult if these could not be reduced to the manifest qualities, directly observable with the senses, such as hot and cold, moist and dry, hard and soft (3.2).[136] Gravity and levity, too, were manifest properties. Their key example of an occult property was magnetism.

Mechanists considered properties occult if these could not be reduced to the clear and evident principles of mechanics, to matter, quantity, shape, and motion. Hence they were proud of Descartes’ achievement, the reduction of magnetism to the motion of cork-screw particles, and the explanation of gravity by vortex motion (3.4). They objected to Newton’s theory of gravity, with its inherent principles of attraction and action at a distance, which they considered occult. Not being a mechanist philosopher, Newton rejected this view. For him, gravity was a manifest property, universally shared by all bodies, no less than their extension, hardness, impenetrability, mobility, and inertia.[137] However, as a force, gravity is not a property of a body apart from other bodies by which it is attracted.

The novelty introduced by Kepler and Newton concerns force as a dynamic principle, as a cause of change of motion. Aristotle too connected force with violent motion, but as the cause of the motion itself, not of its change.

Influenced by Archimedes, who studied the problem of the lever (and perhaps by the medieval scholar Jordanus Nemorarius), Tartaglia, Benedetti, Stevin, Galileo, Torricelli, and Huygens developed the static principle of force.[138] The most important example of force was weight, the only kind of force considered by Archimedes, and problems concerning the centre of gravity were very prominent during the seventeenth century. The static concept of force and pressure was also applied in hydrostatics and aerostatics (9.5).

3.6. Newton’s dynamics

Isaac Newton’s ideas on mechanics, gravity, and planetary motion were shaped between circa 1665 and 1685, during his years at Cambridge, quite long after the lifetimes of Kepler, Galileo, and Descartes, whose works he used and criticized.[139] He arrived at the insight that besides matter and motion, force as a new principle of explanation was required, independent of motion. He became the most important representative of experimental philosophy.[140]

Newton took pains to demonstrate that he was a true disciple of Copernicus, though not in the mechanist line of Galileo, Descartes, and Huygens, but in Kepler’s line. Although he read little or nothing of Kepler’s works, via Borelli he inherited from Kepler the laws of planetary motion (which he fitted into his theory of gravitation), as well as Kepler’s respect for observations, and the idea of force as a dynamic principle. Later on, we shall discuss the theory of gravity (9.4). The present section deals with the dynamical concept of force in the context of mechanics.

In the introduction to Principia (merely 28 pages in the English edition), Isaac Newton presented a summary of the mechanics he was about to use. It contains operational definitions, axioms or laws of motion, several theorems, and a philosophical exposition of his ideas on space and time, on which he commented:

‘Hitherto I have laid down such principles as have been received by mathematicians, and are confirmed by abundance of experiments.’[141]

He introduced a new operational definition of mass (1.4), and discussed the metrics of time, space, and inertial motion (3.7). He wrought a synthesis of seventeenth-century physics (7.1), but he rejected Cartesian mechanical philosophy.

Innate force and impressed force

Mechanists like Descartes and Huygens could only conceive of force as an effect of motion and action by contact. Huygens considered a centrifugal acceleration to be caused by a uniform circular motion. Whereas everyone before them had considered uniform circular motion to be natural, not in need of a driving force, Hooke and Newton realized that a centripetal force is required to maintain it. Their innovation was an entirely new view of force as a dynamic principle, as a cause of change of motion. Newton’s concept of force was not as univocal as the present one.[142] In fact, Newton’s views on the activity of matter, strongly influenced by his alchemical research, changed continuously throughout his life.[143] He distinguished several kinds of force (Latin: vis).

Vis inertiae, the force of inertia (not to be confused with the inertial forces to be mentioned below), is also called vis insita or innate force. For any body it is proportional to its quantity of matter, now called its mass. In conformity with his predecessors, Newton initially considered vis inertiae the force keeping a body in its inertial state. In Principia it became the force that resists any change of an inertial state of rest or uniform motion. If besides the innate force no other force acts on the body, the latter moves uniformly and rectilinearly. We call this Newton’s first law, but he got it from Descartes (3.4), perhaps via Huygens’ Horologium oscillatorium (1673).[144]

What physicists now call a force is Newton’s vis impressa, an external force acting on a body. Its accelerating effect on the body’s motion is inversely proportional to the body’s mass, hence to the inertial force. Newton considered the case of an external force acting during a short time. The effect of an impulse (the product of an impressed force and its duration) is a change of the quantity of motion (linear momentum), which Newton operationally defined as the product of mass and velocity. This is Newton’s force law, the second law of motion. It is nowadays better known in the form of force as the product of mass and acceleration. This form refers to a continually acting force, and was not explicitly given by Newton, but he used it all the same.
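
In modern notation (anachronistic, since Newton argued geometrically), the two forms of the second law may be sketched as

\[
\vec{F}\,\Delta t = \Delta(m\vec{v}) \quad\text{(impulse form, closest to Newton's statement)}, \qquad
\vec{F} = \frac{d(m\vec{v})}{dt} = m\vec{a} \quad\text{(continuous form)}.
\]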

Like Huygens, Newton considered velocity to have direction as well as magnitude. The same applies to impressed force, for which he discussed the law of composition.[145] Impressed forces acting on the same body can balance each other. An unbalanced force acting on a body changes its speed, direction, or both.

Newton suggested that the force law had been accepted by all his contemporaries, even by Galileo.[146] Maybe he referred to the static concept of force (3.5). But Newton’s dynamic interpretation of force as the cause of changing motion was original, unless he took it from Robert Hooke, who made him aware that uniform circular motion requires a centripetal force. The principle of inertia connects the static and dynamic views of force. The static view is applied in equilibrium situations. In the dynamic view, if all forces acting on a body balance each other, the body is at rest or moves uniformly along a straight line.

One may wonder why Newton introduced the law of inertia as an axiom, because at first sight it can be derived from the force law: if the net force on a body is zero, its acceleration is zero; hence its velocity is constant.[147] However, according to both common sense and Aristotelian physics, violent motion ceases if the force ceases to act. Common sense assumes that each body experiences a frictional force, dependent on speed, in a direction opposite to the velocity. Accordingly, if the total force on a body were zero, the body would be at rest. A unique reference system would exist in which all bodies on which no forces act would be at rest. This would agree with Aristotle’s mechanics, but it contradicts both the classical principle of relativity and the modern one. Newton’s second law alone does not refute this view. Only in combination with the first law was the common-sense view refuted.

The force law states that a body moving under the influence of an external unbalanced force accelerates, but it does not specify with respect to what the acceleration is determined. The answer is that the acceleration is measured with respect to an inertial system. Apparently, this only shifts the problem, for now the question arises with respect to what one can speak of an inertial system. The answer is given by the first law: an inertial system is a body moving without the influence of an external force. Newton defined inertial motion with respect to absolute space, but he admitted that this absolute motion cannot be measured (3.7). This means that, as a matter of principle, the first law asserts the existence of inertial systems.

Centrifugal or centripetal force

The distinction between force as caused by motion and force as cause of motion is manifest with respect to circular motion. Only after the introduction of linear inertia, which excludes circular inertial motion, did it become clear that circular motion is accelerated motion.

The first to investigate this problem was Christiaan Huygens, who derived the correct formula for the acceleration. Faithful to the program of explaining motion by motion, he introduced the centrifugal force as a result of circular motion.[149] Hooke and Newton, on the other hand, supposing that acceleration needs a force, introduced the centripetal force as a cause of circular motion.[150] It can be identified as a real, physical force like a magnetic or gravitational force.
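
In modern notation, Huygens’ result for uniform motion with speed v along a circle of radius r reads

\[
a = \frac{v^{2}}{r}, \qquad F = \frac{mv^{2}}{r},
\]

a force directed away from the centre in Huygens’ centrifugal reading, towards it in the Hooke-Newton centripetal reading.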

Since then, in Newtonian mechanics, the centrifugal force is considered an apparent force, an inertial force (not to be confused with Newton’s vis inertiae, mentioned above). An inertial force like the centrifugal force or the Coriolis force only occurs in a non-inertial reference system, for example a rotating one. An inertial force does not satisfy Newton’s third law. It would be wrong to consider the centrifugal force the reaction to the centripetal force, because these forces act on the same body, whereas the law of action and reaction refers to the forces between two bodies. The centrifugal force balances the centripetal force only in the rotating reference system, in which the body is supposed to be at rest. In an inertial reference system, the centripetal force causes the body to accelerate, whereas the centrifugal force does not exist.

Action and reaction

Newton also ascribed the third law to others,[151] but that is no more than false modesty. The famous law of action and reaction was brand-new:

‘To every action there is always opposed an equal reaction: or, the mutual actions of two bodies upon each other are always equal, and directed to contrary parts.’[152]

Like Descartes, Newton assumed that the motion of a body can only be changed by another body. However, this change is not caused by the motion of the second body, but by the force acting between them, either by contact or at a distance. Because this force has a reciprocal nature, the motion of the second body is changed as well. The law of conservation of linear momentum was an axiom for Descartes, but it now became a theorem, a corollary to be derived from Newton’s laws.[153] For Newton, force takes precedence over motion.
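
A minimal sketch of this corollary in modern notation: for two interacting bodies, the third law states that the force exerted by the second body on the first is equal and opposite to the force exerted by the first on the second, so that

\[
\vec{F}_{12} = -\vec{F}_{21}, \qquad
\frac{d}{dt}\left(m_1\vec{v}_1 + m_2\vec{v}_2\right) = \vec{F}_{12} + \vec{F}_{21} = 0,
\]

and the total linear momentum remains constant.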

The third law may be considered the constitutional law of experimental philosophy. As we shall see, it allows of measuring impressed forces. It distinguishes Newtonian dynamics from Cartesian or Leibnizian mechanics. Unlike inertia (the innate force), the impressed force is not a property of bodies, but a relation between them.

The dualism of force and matter

The three axioms or laws of motion lie at the foundation of the Newtonian dualism of force and matter. Throughout his life, Newton maintained an ambivalent position with respect to this dualism, because he was under the neo-Platonic spell that matter could only be passive.[154] To accept that matter could be active would make it independent of God. For this reason Descartes had identified matter with extension, and assumed that God had created an invariant quantity of motion. By introducing force as a new principle of explanation, Newton made matter more active than the mechanists would allow. By vis insita, the force of inertia, each body resists change of motion. Circular motion requires a vis centripetalis as a cause, instead of Huygens’ vis centrifugalis as an effect. Matter became interactive as a source of vis impressa, subject to the law of action and reaction: first of all the mutual force of gravity, later also electricity and magnetism. Matter turned out to have specific properties, such as mass and chemical, electric, or magnetic properties, contrary to the mechanist view that matter can only have spatial extension and shape. In order to maintain God’s sovereignty over matter, Newton emphasized that any kind of force is subject to laws. Between 1700 and 1850, the matter-force dualism became the inspiration for the development of electricity (electric charge and Coulomb force), magnetism (magnetic force and pole-strength), and thermal physics (temperature difference as a force, and heat considered as matter).[155]

Newton’s third law made interaction a new principle of explanation. It interprets a force as a relation between two interacting bodies, on a par with the relations of quantity of matter, spatial distance, and relative motion discussed earlier. Though an actual force may partly depend on mass or spatial distance, as is the case with gravitational force, or on relative motion, as is the case with friction, a force is conceptually different from quantitative, spatial, or kinetic relations.

In contrast to impressed force as a relation between bodies, Descartes’ quantity of motion, Huygens’ linear momentum, and Leibniz’ vis viva were supposed to be variable properties of a moving body, transferable to other bodies. In the eighteenth century, disciples of Descartes and Leibniz quarrelled about the priority of momentum and vis viva. The Newtonian scholar Jean d’Alembert demonstrated these concepts to be equally useful, momentum being the time-integral of the Newtonian force acting on a body, and vis viva being its space-integral.[156] This means that force is the cause of change of momentum, and vis viva is the ability to perform work. But this compromise proposal was evidently unacceptable to both parties, because it would imply the recognition of the priority of the Newtonian force.
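
In modern notation (writing kinetic energy for half the vis viva mv²), d’Alembert’s point may be sketched as

\[
\int \vec{F}\,dt = \Delta(m\vec{v}), \qquad
\int \vec{F}\cdot d\vec{s} = \Delta\!\left(\tfrac{1}{2}mv^{2}\right):
\]

the same force changes the momentum when integrated over time, and the (half) vis viva when integrated over the path.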

Conservation laws

By the three laws of motion, force is attributed a higher status than momentum. In the second law, impressed force and momentum occur at the same level, and the first law mentions neither. But in the third law only impressed forces occur. Moreover, Newton could derive the law of conservation of linear momentum and Kepler’s second law from his third law.[157] Euler did the same for the law of conservation of angular momentum in general. In 1847, Helmholtz showed this to apply to the law of conservation of energy as well.

This derivation depends on the rather severe condition that all forces can be reduced to so-called central forces, acting between point masses. This condition was acceptable to most Newtonians, because it fitted into their atomistic views. But it was far less acceptable to the Cartesians, who rejected atomism in favour of the identification of matter and space, making bodies extended in principle.

The third law maintained its priority over the conservation laws until the end of the nineteenth century. Only in the field theories of Maxwell and Einstein, implying a return to Cartesian views of matter and space, did it turn out that the above-mentioned condition is too narrow. Since the twentieth century, physicists have preferred to reverse the situation: Newton’s third law is shown to be a consequence of the law of conservation of momentum, under certain conditions. Hence, in modern physics the conservation laws have a higher status than Newton’s laws of motion, and the concepts of energy and momentum are more important than the concept of force.

This does not mean that conservation laws were absent in Newtonian mechanics and physics. Related to the material side of the matter-force dualism, several conservation laws were developed after Newton’s death – the laws of conservation of matter in chemistry, of electric charge, of magnetic pole-strength, and of heat.

3.7. Absolute and relative space, time, and motion

It is often said that the shift from the geocentric to the heliocentric world view implies that mankind no longer held the centre of the universe, and had to be content with a more modest position.[158] This is typical hindsight. It was by no means the view of Copernicus, Kepler, and Galileo, who knew the background of ancient and medieval cosmology better than our present-day world viewers. In this cosmology, the central position of the earth was by no means considered important. The earth, including its inhabitants, was considered imperfect, occupying a very low position in the cosmological hierarchy. With the advent of the heliocentric world view, man was not ‘bereft from his central place’, but was ‘placed into the heavens’, the earth becoming a planet at the same level as the perfect celestial bodies.[159]

‘As for the earth, we seek rather to ennoble and perfect it when we strive to make it like the celestial bodies, and, as it were, place it in the heaven.’[160]

The Copernican view of the cosmos was of great influence on the concepts of space and time.[161] In Aristotelian physics space is finite, bounded by the starry sphere, but time is infinite. Aristotle recognized neither beginning nor end of the cosmos and this embarrassed his medieval disciples. The Christian world view requires a beginning, the Creation, as well as an end, the return of Jesus Christ to the earth.

In the Aristotelian view of the cosmos, determined by the form-matter motif, the earth stood still at the centre of the universe, which as a whole was not very much larger than the earth. Of course, Aristotle knew that the dimensions of the earth are much smaller than those of the sphere of the stars, but the latter was considered to be small enough to take the argument of stellar parallax seriously (2.4). When Copernicus introduced the annual motion of the earth, he had to enlarge the minimum dimension of the starry sphere such that the distance between the earth and the sun became negligible compared to it. This turned out to be a step towards the idea of an infinite universe. Copernicus, Kepler, and Galileo, however, still considered the cosmos to be spatially finite. Galileo observed that there is no proof that the universe is finite.[162] Aristotle’s assumption that the universe is finite and has a centre depends on his view that the starry sphere moves. Descartes, on the other hand, identified physical space with mathematical, Euclidean space, and therefore took it to be infinite.

In Aristotelian physics the place of an object is its immediate environment.[163] The natural place of the element earth is water, surrounding the earth. The natural place of the sphere of fire is above the sphere of air and below the lunar sphere. The place of Saturn is the sphere to which it is attached, above Jupiter’s sphere and below the starry sphere. Descartes agreed that the place of a body is its environment. On the other hand, he realized that the position of a body can be determined with respect to a coordinate system, and is not in need of material surroundings. He vacillated between the views that motion is relative and that it is absolute. (Galileo, too, was aware of the principle of a Cartesian coordinate system.[164])

Inspired by his view on inertia, Newton devoted one quarter of his summary of mechanics to a scholium on space, time, and motion.[165] He did not intend to give definitions of these concepts, ‘as being known to all’. His first aim was to make a distinction between absolute and relative time. In this context the term relative differs from the now usual one, which implies that the unit and the zero point of time are arbitrary. By relative time Newton meant time as actually measured by some clock.

‘Absolute, true and mathematical time, of itself, and from its own nature, flows equably without relation to anything external, and by another name is called duration: relative, apparent, and common time, is some sensible and external (whether accurate or unequable) measure of duration by the means of motion, which is commonly used instead of true time; such as an hour, a day, a month, a year.’[166]

Some clocks may be more accurate than others, but in principle no measuring instrument is absolutely accurate. By absolute time Newton meant a universal standard or metric of time, independent of measuring instruments. No one before Newton had posed the problem of distinguishing the standard of time from the way it is measured. It could only be raised in the context of experimental philosophy. After Newton, the establishment of a reliable metric for any measurable quantity became standard practice in the physical sciences (1.4). During the Middle Ages, the establishment of temporal moments (like noon or midnight, or the date of Easter) was more important than the measurement of temporal intervals, which was only relevant for astronomers. Mechanical clocks came into use from the thirteenth century onward, with gradually increasing accuracy.[167]

Aristotle defined time as the measure of change, but his physics was never developed into a quantitative theory of change, and this conceptual definition did not become operational. Galileo discovered the isochrony of the pendulum: its period of oscillation depends only on the length of the pendulum, and is independent of the amplitude (as long as it is small compared to the pendulum’s length) and of the mass of the bob. Experimentally, this can be checked by comparing several pendulums oscillating simultaneously. Pendulums provided the means to synchronize clocks.
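
In modern notation (derived only later; Galileo established the isochrony experimentally), the small-amplitude period of a pendulum of length L is

\[
T = 2\pi\sqrt{\frac{L}{g}},
\]

where g is the acceleration of free fall; neither the amplitude nor the mass of the bob occurs in the formula.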

In 1659 Huygens derived the pendulum law making use of the principle of inertia, but apparently he did not see the inherent problem of time. Like Aristotle and Galileo, he simply assumed the daily motion of the fixed stars (or the diurnal motion of the earth) to be uniform, and thus a natural measure of time. But Newton’s theory of universal gravitation, applied to the solar system, showed that the diurnal motion of the earth may very well be irregular. It is a relative measure of time in Newton’s sense.

The problem of absolute time, space, and motion is most pointedly expressed in Newton’s first law, the law of inertia:

‘Every body continues in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by forces impressed upon it.’[168]

Uniform motion means that equal distances are traversed in equal times. This means that the absolute standard of time is operationally defined by the law of inertia itself. The accuracy of any actual clock should be judged by the way it confirms this law. The law of inertia is a genuine axiom, because there is no experimental way to test it.

However, Newton did not follow this path. The only way he saw to solve the problem was to postulate an absolute metric for any clock, together with an absolute space. Newton admitted that the velocity of an inertially moving body can never be determined with respect to this absolute space, but he maintained that non-uniform motion with respect to absolute space can be determined experimentally.[169] He hung a pail of water on a rope, and made it turn. Initially, the water remained at rest and its surface horizontal. Next, the water began rotating, and its surface became concave. When the rotation of the pail was finally arrested abruptly, the water continued its rotation, maintaining a concave surface. Newton concluded that the shape of the surface was determined by the absolute rotation of the fluid, independent of the state of motion of its immediate surroundings. Observation of the shape of the surface allowed him to determine whether the fluid was rotating or not. In a similar way, Léon Foucault’s pendulum experiment (1851) demonstrated the earth’s rotation without reference to some extraterrestrial reference system, such as the fixed stars. Both Newton and Foucault supplied physical arguments to sustain their views on space as independent of matter. Descartes’ mechanical philosophy identified matter with space. In his mechanics and theory of gravity, Newton had to distinguish matter from space and time. In the eighteenth and nineteenth centuries Newton’s views on space and time became standard. ‘Newton’s absolute, infinite, three-dimensional, homogeneous, indivisible, immutable, void space, which offered no resistance to the bodies that moved and rested in it, became the accepted space of Newtonian physics and cosmology for some two centuries.’[170]

Gottfried Leibniz and Samuel Clarke (the latter acting on behalf of Newton) discussed these views in 1715-1716, each writing five letters.[171] ‘It was less a genuine dialogue than two monologues in tandem …’[172] Leibniz held that space, as the order of simultaneity or co-existence, and time, as the order of succession, only serve to determine relations between material particles. Denouncing absolute space and time, he said that only relative space and time are relevant. But it is clear that relative now means something different from what Newton intended. Apparently Leibniz did not understand the relevance of the principle of inertia for the problem of the metrics of space and time. ‘Abandoning Newtonian space and time in the manner Leibniz called for would entail abandoning the law of inertia as formulated in the seventeenth century, a law at the heart of Leibniz’s dynamics.’[173]

The debate focussed on theological questions. For Newton and virtually all his predecessors and contemporaries, considerations of space and time were related to God’s eternity and omnipresence.[174] This changed significantly after Newton’s death, when scientists distanced themselves from theology: ‘… scientists gradually lost interest in the theological implications of a space that already possessed properties derived from the deity. The properties remained with the space. Only God departed.’ … ‘It was better to conceive God as a being capable of operating wherever He wished by His will alone rather than by His literal and actual presence. Better that God be in some sense transcendent rather than omnipresent, and therefore better that He be removed from space altogether. With God’s departure, physical scientists finally had an infinite, three-dimensional, void frame within which they could study the motion of bodies without the need to do theology as well.’[175] This does not mean that later physicists were not faithful Christians. For instance, Michael Faraday was a pious and active member of the strongly religious Sandemanians, but he firmly separated his faith from his scientific work. Natural theology remained influential during the eighteenth and nineteenth centuries, but its focus shifted to biology and geology, and after Newton it had no significant influence on the contents of classical physics.

Further developments

Leibniz’ rejection of absolute space and time was repeated in the nineteenth century by Ernst Mach, who in turn influenced Albert Einstein, although Einstein later distanced himself from Mach’s opinions. Mach denied the conclusion drawn from Newton’s pail experiment.[176] He said that the same effect should be expected if it were possible to rotate the starry universe instead of the pail with water. The rotating mass of the stars would have the effect of making the surface of the fluid concave. This means that the inertia of any body would be caused by the total mass of the universe.[177] It has not been possible to find a mathematical theory (not even the general theory of relativity) or any experiment giving the effect predicted by Mach. ‘… to this day Mach’s principle has not brought physics decisively farther.’[178] Mach’s principle, stating that rotational motion is just as relative as linear uniform motion, is therefore unsubstantiated. Whereas inertial motion is sui generis, independent of physical causes, accelerated motion with respect to an inertial system always needs a physical explanation.

Newton treated the metric of time independently of the metric of space. Einstein showed these metrics to be related. Both Newtonian and relativistic mechanics use the law of uniform time to introduce inertial systems. An inertial system is a spatial and temporal reference system in which the law of inertia is valid. It can be used to measure accelerated motions as well. Starting with one inertial system, all others can be constructed by using either the Galileo group or the Lorentz group, both reflecting the relativity of motion and expressing the symmetry of space and uniform time.[179] In 1831 Évariste Galois introduced a group as a mathematical structure describing symmetries. In physics, groups were first applied in relativity theory, and since 1925 in atomic, molecular, and solid state physics. One of the first textbooks on quantum physics (Weyl 1928) dealt with the theory of groups. The spatio-temporal groups start from the axiom that kinetic time is uniform. In the classical Galileo group, the unit of time is the same in all reference systems. In the relativistic Lorentz group, the unit of speed (the speed of light) is a universal constant. Late nineteenth-century measurements decided in favour of the latter. In special relativity, the Lorentz group of all inertial systems serves as an absolute standard for temporal and spatial measurements.
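
For one spatial dimension, the difference may be sketched as follows (v being the relative speed of two inertial systems, c the speed of light):

\[
\text{Galileo:}\quad x' = x - vt, \quad t' = t; \qquad
\text{Lorentz:}\quad x' = \gamma\,(x - vt), \quad t' = \gamma\left(t - \frac{vx}{c^{2}}\right), \quad \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}.
\]

In the Galileo transformation the measure of time is the same in all reference systems; in the Lorentz transformation the speed of light is.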

Time as measured by a clock is called uniform if the clock correctly shows that a body on which no net force is acting moves uniformly.[180] This appears to be circular reasoning. On the one hand, the uniformity of motion means equal distances in equal times. On the other hand, the equality of temporal intervals is determined by a clock subject to the norm that it represents uniform motion correctly.[181] This circularity is unavoidable, meaning that the uniformity of kinetic time is an axiom that cannot be proved, an expression of a fundamental law. Uniformity is a law for kinetic time, not an intrinsic property of time. There is nothing like a stream of time, flowing independently of the rest of reality. Time only exists in relations between events, as Leibniz maintained, although he did not understand the metrical character of time. The uniformity of kinetic time expressed by the law of inertia asserts the existence of motions that are uniform with respect to each other. If applied by human beings constructing clocks, the law of inertia becomes a standard. A clock does not function properly if it represents a uniform motion as non-uniform. But that is not all.

Periodic time

Whereas the law of inertia allows of projecting kinetic time on a linear scale, time can also be projected on a circular scale, as displayed on a traditional clock, for instance. The possibility of establishing the equality of temporal intervals is actualized in uniform circular motion, in oscillations, waves, and other periodic processes, on an astronomical scale as in pulsars, or at a sub-atomic scale, as in nuclear magnetic resonance. Besides the kinetic aspect of uniformity, the time measured by clocks has a periodic character as well. Periodicity is not only a kinetic property, but a spatial one as well, as in crystals. In a periodic wave, the spatial periodicity is expressed in the wavelength, the temporal one in the period, both repeating themselves indefinitely.

Whereas inertial motion is purely kinetic, the explanation of any periodic phenomenon requires some physical cause besides the principle of inertia. Mechanical clocks depend on the regularity of a pendulum or a balance, based on the force of gravity or of a spring. Huygens and Newton proved that a system moving under a force directed to a centre and proportional to the distance from that centre is periodic. This is the case in a pendulum or a spring. Electronic clocks apply the periodicity of oscillations in a quartz crystal.
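
A minimal sketch of the Huygens-Newton result in modern notation: a force proportional to the distance x from a centre and directed towards it yields harmonic, hence periodic, motion,

\[
F = -kx \;\Rightarrow\; x(t) = A\cos\left(\sqrt{\tfrac{k}{m}}\,t\right), \qquad T = 2\pi\sqrt{\frac{m}{k}},
\]

with a period T independent of the amplitude A.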

Periodicity has always been used for the measurement of time. The days, months, and years refer to periodic motions of celestial bodies moving under the influence of gravity. The modern definition of the second depends on atomic oscillations. In the twentieth century, the second became defined as the duration of 9,192,631,770 periods of the radiation arising from the transition between two hyperfine levels of the caesium-133 atom. This number gives an impression of the accuracy achieved in measuring the frequency of electromagnetic microwaves. The periodic character of clocks allows of digitalizing kinetic time, each cycle being a unit, the cycles being countable. The uniformity of time as a universal law for kinetic relations and the periodicity of all kinds of periodic processes determined by physical interactions reinforce each other. Without the uniformity of inertial motion, periodicity cannot be understood, and vice versa.

At the end of the nineteenth century, Ernst Mach and Henri Poincaré suggested that the uniformity of time is merely a convention.

‘The question of whether a motion is uniform in itself has no meaning at all. No more can we speak of an “absolute time”, independent of any change.’ ‘One has no intuition of the equality of successive time intervals.’[182]

This philosophical idea would have the rather absurd consequence that the periodicity of oscillations, waves, and other natural rhythms would also be based on a convention. According to Reichenbach it is an ‘empirical fact’ that different definitions give rise to the same ‘measure of the flow of time’: natural, mechanical, electronic or atomic clocks, the laws of mechanics, and the fact that the speed of light is the same for all observers.[183] More relevant is the observation that physicists are able to explain many kinds of periodic motions and processes based on laws presupposing the uniformity of kinetic time as a fundamental axiom.

Motion and interaction as mutually independent principles of explanation

In Newton’s work impressed force is the most important concept besides matter. This may be called his strongest rupture with the mechanists, who wanted to explain motion by motion. For Galileo and Descartes, matter was characterized by quantity, extension, shape, and motion.[184] Motion could only be caused by motion.[185] Newton emphasized that perceptibility and tangibility were characteristic of matter as well. The ability of matter to act upon things cannot be grounded on extension alone. Newton introduced a new principle of explanation, now called interaction. Besides quantitative, spatial, and kinetic relations, interactions turn out to be indispensable for the explanation of natural phenomena.

Galileo and Descartes showed motion to be a principle of explanation independent of the quantitative and spatial principles. This led them to the law of kinetic inertia, now called Newton’s first law. Descartes assumed that all natural phenomena should be explained by motion as well as matter, conceived to be identical with space. Newton relativized this kinetic principle by demonstrating the need for another irreducible principle of explanation, the physical principle of interaction.[186] However, Newton only made a start. Being a Copernican inspired by the idea that the earth moves, his real interest was in the explanation of all kinds of motion, including accelerated motion. The full exploration of the physical principle of explanation did not occur during the Copernican era, but in the succeeding centuries.

Although this Copernican commitment partly justifies Dijksterhuis’ view that Newton fulfilled the ‘mechanization of the world picture’, the distinction between Cartesianism and Newtonianism is important enough to shed some doubt on this view. It is not improbable that Dijksterhuis considered the Copernican era too much from the viewpoint of late nineteenth-century mechanism, which included a revival of Cartesianism.


[1] Stafleu 2002, 2011, 2015, 2019b.

[2] Drake 1970, 53; Cohen 1980b; Levenson 1994.

[3] Dreyer 1906, chapter 2; Heath 1913, chapter 6, 12; Guthrie 1962-1981, I, 282-301.

[4] Kepler 1619; Koyré 1961, 326-343.

[5] Plato, Timaeus, 1179-1186.

[6] Popper 1963, chapter 2.

[7] Guthrie 1962-1981, II, 1-79.

[8] Guthrie 1962-1981, II, 28. This is not literally Parmenides’ text, but a paraphrase.

[9] Salmon (ed.) 1970, 5-16, 45-58. The main source of Zeno’s paradoxes is Aristotle, Physics, VI, 2, 9; VIII, 8. See Clavelin 1968, 34-48; Guthrie 1962-1981, II, 2, 91-96.

[10] See Plato’s parable of the cave, Plato, Republic VII, 747-750.

[11] Aristotle, Physics, I, 7.

[12] Aristotle, Physics, V, 2.

[13] Aristotle, Physics, I 8, 9.

[14] Aristotle, Physics, III, 1.

[15] Aristotle, Metaphysics, I, 3; V, 2; Physics, II, 3, 7.

[16] Aristotle, Metaphysics, XII, 2; Physics, III, 1.

[17] Aristotle, Physics, VIII, 7, 9.

[18] Aristotle, Physics, VIII, 8, 9.

[19] Aristotle, On the heavens, I, 2, 3.

[20] Clavelin 1968, chapter 1.

[21] Aristotle, Physics, IV, 8.

[22] Koyré 1939, 7.

[23] Aristotle on projectile motion: Physics, VIII, 10; see Koyré 1939, 51.

[24] Galileo 1632, 151.

[25] Drake 1970, 38.

[26] Clavelin 1968, 96-97.

[27] Aristotle, On the heavens, I, 8.

[28] Copernicus 1543, 38-40 (I, 4).

[29] Stafleu 2019a, chapters 2, 3.

[30] Galileo 1610. Drake (ed.) 1957, 19.

[31] Galileo 1613. On Galileo, see de Santillana 1955; Drake 1957, 1970, 1978, 1990; Finocchiaro 1980, 1989, 2005; Redondi 1983; Shea 1986; McMullin 2005; Gaukroger 2006; Heilbron 2010; Wootton 2010, 2015.

[32] Shea 1986, 118-119.

[33] Galileo 1632, 71-78.

[34] Galileo 1610, 42-45; 1632, 67-69, 91-99.

[35] Clavelin 1968, 199-203.

[36] Galileo 1613, 98; 1632, 54, 58.

[37] Galileo 1632, 51.

[38] Galileo 1632, 412.

[39] Drake 1978, 42.

[40] Drake 1975; Dijksterhuis 1950, 372 (IV: 85); Clavelin 1968, 177.

[41] Galileo 1613, 107-109; 1632, 345-356. See Drake’s note on page 486 to Galileo 1632, 354.

[42] Galileo 1623, 274. Galileo’s philosophy is also discussed in Stafleu 2019a, 2.1. 

[43] Galileo 1623, 277-278.

[44] Galileo 1638, 98-99. See Drake 1970, chapter 2.

[45] Galileo 1623, 276-277.

[46] Plato, Timaeus, 1186-1192.

[47] Descartes 1647, 77-78; Koyré 1939, 130-131; Dijksterhuis 1950, 193-194 (II: 108).

[48] Galileo 1632, 116: motion does not act.

[49] Galileo 1632, 21: rest is an infinite degree of slowness.

[50] Galileo 1613, 113-114; 1632, 145-148.

[51] Galileo 1638, 153.

[52] Galileo 1632, 19; 1638, 215; 1613, 113.

[53] Galileo 1638, 161; Koyré 1939, 181.

[54] Koyré 1939, 180.

[55] Galileo 1632, 145-148.

[56] Galileo 1632, 20-21; 1638, 261.

[57] Galileo 1638, 264-269.

[58] Galileo 1632, 175.

[59] Galileo 1632, 398.

[60] Galileo 1632, 235.

[61] Galileo 1638, 276; Drake 1970, 26.

[62] Galileo 1638, 192-193; Koyré 1961, 119.

[63] Galileo 1638, 193-194.

[64] Galileo 1632, 259.

[65] Drake 1970, chapter 12-13, and 1990.

[66] Galileo 1632, 19; 1638, 215; 1613, 113.

[67] Galileo 1632, 118-119.

[68] Galileo 1632, 31.

[69] Clavelin 1968, 215.

[70] Galileo 1632, 28; 1638, 215, 244, 251.

[71] Galileo 1613, 113-114; 1632, 31-32, 147; see Clavelin 1968, 372-374.

[72] Koyré 1939, 26-27; Galileo 1638, 62.

[73] Grant 1981, 60-66.

[74] Galileo 1638, 72-84.

[75] Galileo 1638, 161; Koyré 1939, 181.

[76] Koyré 1939, 180.

[77] Galileo 1638, 62-64.

[78] Sargent 1995, 61.

[79] Galileo 1638, 242-243.

[80] Duhem 1908, 110-114.

[81] Galileo 1613, 97; see Galileo 1615, 166; Kolakowski 1966, 28-29; Dijksterhuis 1950, 372-374 (IV: 84-88).

[82] Drake 1978, 84-104.

[83] Galileo 1638, 178-179.

[84] Dijksterhuis 1950, 400-402 (IV: 130-132).

[85] Settle 1961; Drake 1978, 88-90.

[86] Harper 2002, 197.

[87] Harper 2002, 182.

[88] Aristotle, On the heavens, I, 8.

[89] Galileo 1638, 167; Koyré 1939, 65 ff; Hanson 1958, 37 ff, 89; Finocchiaro 1973, 86 ff.

[90] Clavelin 1968, 99.

[91] Galileo 1638, 74; Drake 1970, 39-40.

[92] Galileo 1638, 174.

[93] Galileo 1632, 221-222; 1638, 153, 175.

[94] Galileo 1638, 96-97.

[95] Galileo 1632, 125-218. See Finocchiaro 1980, 208; Copernicus 1543, 42-46 (I, 7, 8).

[96] Galileo 1632, 274.

[97] Galileo 1632, 416-465; Finocchiaro 1973, 16-18.

[98] Kepler 1609, 26-27 (Preface); see Koyré 1961, 194; Galileo 1632, 462.

[99] Drake 1978, 36-44.

[100] Newton 1687, 435-440.

[101] Dijksterhuis 1950, 444-460 (IV: 194-220); Scott 1952; Gaukroger 1995; 2006, 289-322; Clarke 2006; Stafleu 2019a, 3.1.

[102] Westfall 1971.

[103] Descartes 1647, 48, 53-54.

[104] Descartes 1649, 351-355, 359-362.

[105] Descartes 1647, 53, 65-73; Kant 1781-1787, A 20-21, B 5-6, 11-12, 36.

[106] Descartes 1647, 74, 82.

[107] Van der Hoeven 1961, 109-120.

[108] Descartes 1647, 159; 1664, 24-25.

[109] Burtt 1924, 63-71, 106-111, 115-121.

[110] Descartes 1647, 53, 65-73.

[111] Koyré 1939, 237.

[112] Van Berkel 1983.

[113] Descartes 1647, 85; 1664, 38. See Scott 1952, on Descartes’ physics.

[114] Galileo 1638, 269-272 discusses impact, announcing a separate treatise, the so-called fifth or sixth day, published posthumously (1718).

[115] Descartes 1647, 86-88; Koyré 1965, 77-78; Van der Hoeven 1961, 120-139.

[116] Deason 1986, 168.

[117] Descartes 1637, 21.

[118] Descartes 1647, 89-94.

[119] Hübner 1976, 304.

[120] Descartes 1647, 93.

[121] Koyré 1965, 77; Harman 1982a, 12.

[122] Descartes 1647, 76-79.

[123] Dijksterhuis 1950, 503 (IV, 282); Stafleu 2019a, 3.3.

[124] Galileo 1632, 21-28; 1638, 162-166.

[125] Huygens 1690, 132-133; Dijksterhuis 1950, 507-509 (IV: 288-290).

[126] Descartes 1647, 210-214.

[127] Kepler 1609, 25-26 (Introduction); Koyré 1961, 194.

[128] Kepler 1609, 26 (Introduction); Koyré 1961, 194; Kepler 1609, chapter 32-39; Jammer 1957, chapter 5-7; Koyré 1961, 185-224; Cohen 1974.

[129] Kepler 1597, 129.

[130] Kepler 1609, 34 (Introduction), 228 (chapter 34); Galileo 1632, 345.

[131] Galileo 1613, 106; 1615, 212-213.

[132] Gilbert 1600; Kepler 1609, 229 (chapter 34), 329, 331 (chapter 57); Galileo 1632, 399-414.

[133] Perregrinus 1269.

[134] Kepler 1609, chapter 34, 57; Koyré 1961, 208.

[135] Newton 1687, 409.

[136] Heilbron 1979, 19-43.

[137] Newton 1687, 398-400.

[138] Drake 1970, chapter 1.

[139] On Newton, see Dijksterhuis 1950, 509-539; Alexander 1956; Cohen 1971; 1980; 1985; McMullin 1978; Westfall 1980; Hall 1992; Cohen, Smith (eds.) 2002.

[140] Stafleu 2019a, chapter 5.

[141] Newton 1687, 21.

[142] Cohen 1973, 322-327; Harman 1982a, 13-17; Dijksterhuis 1950, 512-515 (IV: 295-297).

[143] McMullin 1978; Elkana 1974, 16.

[144] For a discussion of Newton’s laws of motion, see Hanson 1965; Ellis 1965; Nagel 1961, 174-202; Cohen 1980, 171-193; Gaukroger 2010, chapter 2.

[145] Newton 1687, 14-17.

[146] Newton 1687, 21.

[147] Mach 1883; Hertz 1894; Cohen 2002, 68-70.

[148] Newton 1687, 14-17.

[149] Huygens’ De vi centrifuga was written in 1659 and published posthumously in 1703, see Van Helden 1980, 150. An excerpt was published as an appendix to his Horologium oscillatorium (1673), which Newton studied.

[150] Newton 1687, 2-6. In 1674 Hooke observed that circular motion requires an unbalanced force, see Westfall 1980, 382-383, 416.

[151] Newton 1687, 22.

[152] Newton 1687, 13.

[153] Newton 1687, 17.

[154] McMullin 1978, 2, 29-56.

[155] Jammer 1957, chapter 9.

[156] Iltis 1970; Szabo 1977, 47-85; Jammer 1957, 165-166; Papineau 1977.

[157] Newton 1687, 17.

[158] Kuhn 1957, 3; cf. Burtt 1924, 18-20.

[159] Koyré 1961, 114-115; Lovejoy 1936, 101-108.

[160] Galileo 1632, 37.

[161] Koyré 1957; 1965, 79-95; Jammer 1954; Burtt 1924, Ch. 4, 7.

[162] Galileo 1632, 319-320.

[163] Aristotle, Physics, IV, 2, 4.

[164] Galileo 1632, 12-14.

[165] Newton 1687, 6-12.

[166] Newton 1687, 6.

[167] Landes 1983.

[168] Newton 1687, 13.

[169] Newton 1687, 10-11.

[170] Grant 1981, 254-255.

[171] Alexander (ed.) 1956; Grant 1981, 247-255.

[172] Grant 1981, 250.

[173] Cohen, Smith (eds.) 2002, 5.

[174] Newton 1687, 545-546 (General scholium, 1713); Jammer 1954; Grant 1981, 240-247.

[175] Grant 1981, 255, 264.

[176] Mach 1883, 279-286; see Grünbaum 1963, chapter 14; Disalle 2002.

[177] Mach 1883, 286-290.

[178] Pais 1982, 288.

[179]

[180] Margenau 1950, 139.

[181] Maxwell 1877, 29; Cassirer 1921, 364.

[182] Mach 1883, 217; Poincaré 1905, chapter 2; Reichenbach 1956, 116-119; Grünbaum 1968, 19, 70; Carnap 1966, chapter 8.

[183] Reichenbach 1956, 117.

[184] Galileo 1623, 277-278; Koyré 1939, 179.

[185] Dijksterhuis 1950, 503.

[186] Dijksterhuis 1950, 515.


Chapter 4

Experimental philosophy

in electricity and magnetism

4.1. Early magnetism and electricity

Founded by Francis Bacon, Robert Boyle, and Isaac Newton, experimental philosophy included Newton’s dynamics (3.6), but was most of all a view on heuristics (chapter 9): how to find natural laws in an empirical way. In chapter 4 we shall see how it worked in the investigation of magnetism and electricity. In the eighteenth and early nineteenth centuries, it applied the four principles of explanation for the physical sciences as discussed in chapter 3. The isolation of a field of science as a program of experimental philosophy (9.1) became a success for gravity, electricity, magnetism, and initially also heat. It strongly depended both on inverse-square laws for the interactive forces, and on the analogical concept of a static fluid, which only in the electric case moved in a circuit. About 1850 this gave way to the kinetic concept of moving particles, after the isolation of fields of science was overturned by the search for the unity of force (chapter 7) and the unification of electricity, magnetism, and light (4.6).

From ancient times, both electricity and magnetism were known as occult, obscure, and hidden phenomena. Greek, Roman, and medieval sources ascribed medical power both to the magnet and to the attractive action of rubbed amber (elektron in Greek). This prehistory of electricity and magnetism was ended by William Gilbert in 1600, who completely separated the two fields (4.1). Gilbert’s innovative power was not in magnetism, but in static electricity, although he treated it only to show the differences with magnetism. In the eighteenth century, further investigation culminated in the law of conservation of electric charge and in Coulomb’s inverse-square law (4.2). In the first decades of the nineteenth century, Newtonians designed the analogical mathematical concept of a static field for all forces satisfying an inverse-square law (4.3): gravity, magnetism, and static electricity. Until circa 1800 research into magnetism stagnated, whereas electricity caught up. In 1820 electricity and magnetism became united into electromagnetism, after the discovery of the electric current (4.4). The problem of which force would drive this current was only solved with difficulty (4.5). After 1850, James Clerk Maxwell finished the unification of electricity, magnetism, and optics in the electromagnetic field (4.6).
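
The analogy mentioned here rests on the common mathematical form of these force laws. In modern notation (the constants G, k, and k′, the charges q, and the pole-strengths p being appropriate to each field):

\[
F_{\text{grav}} = G\,\frac{m_1 m_2}{r^{2}}, \qquad
F_{\text{el}} = k\,\frac{q_1 q_2}{r^{2}}, \qquad
F_{\text{mag}} = k'\,\frac{p_1 p_2}{r^{2}}.
\]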

Magnetism

In 1269 Petrus Peregrinus (Pierre de Maricourt) wrote a letter to a friend about magnetism, known as Epistola de magnete.[1] This letter contains a remarkably modern plea for experimenting. Peregrinus knew of the distinction of magnetic poles; the attraction of unequal poles and the repulsion of equal poles; the magnetizing of a piece of iron by touching it with a magnet; the breaking of a large magnet into smaller ones; and the north-south direction-seeking action of a magnet. He devised a perpetuum mobile constructed with the help of a magnet, and he described two kinds of compasses. Except for his letter, next to nothing is known about Pierre de Maricourt.

Peregrinus did not treat magnetism as an isolated phenomenon, but connected it with medicine and with astronomy. According to Peregrinus the compass needle directs itself to the celestial North Pole, and he related this property to the daily motion of the heavens. He assumed that a spherical magnet suspended parallel to the celestial axis will rotate about this axis in 24 hours, like the celestial sphere. He cautioned his friend not to ascribe a failure of this experiment to nature, but to his limited skills. Considering the number of surviving copies, Peregrinus’ letter must have had quite a large influence, but it did not lead to a further development of magnetism until the sixteenth century, when the subject became interesting because of the increasing importance of navigation on open seas.

The isolation of magnetism as a field of science was not due to Peregrinus’ letter, but to William Gilbert’s influential book De magnete (1600),[2] although it did not contain much news about magnetic phenomena. Describing many experiments, On the magnet provided a critical summary of what Peregrinus and Gilbert’s contemporaries had discovered. Gilbert’s own contribution to magnetism was the insight that the earth is a magnet, with the magnetic north pole near the geographic south pole. A compass needle does not direct itself to the celestial poles but nearly to the terrestrial ones, like one magnet to another. The complete title of his book is ‘New physics of the magnet and of magnetic bodies, and of the great magnet, the earth’. A large part is devoted to terrestrial magnetism, which he connected to the earth’s rotation. In this respect he was a Copernican. Like Peregrinus, Gilbert performed experiments with a spherical magnet, a model that he called a terrella, a little earth. He mocked Peregrinus’ idea that a magnet suspended from a thread would rotate in 24 hours about its axis, like the celestial sphere does.[3] Galileo outwitted him by observing that the terrella rotates in 24 hours together with the earth. Otherwise, Galileo criticized Gilbert only for his shortcomings as a mathematician.[4]

Gilbert’s De magnete made a deep impression on his Renaissance contemporaries Kepler, Stevin, and Galileo, though not on Francis Bacon.[5] Gilbert explained the tides by a magnetic influence of the moon. Kepler adopted this idea, assuming that the influence of the sun on planetary motion was magnetic (3.5). Stevin explained the fixed posture of the earth’s rotation axis with respect to the celestial sphere as a magnetic effect. Gilbert made no attempts to measure magnetic forces, although Giambattista della Porta (1589) had described the possibility of such measurements with a balance, and Gilbert mentioned his results.[6]

The seventeenth and eighteenth centuries witnessed several views on magnetism.[7] Gilbert believed that the earth, the mater communis (mother of all), is animated, and he explained terrestrial magnetism accordingly. A magnet attracts iron because it induces a sympathetic form in iron. Kepler too, though he was the first to apply the modern concept of force as cause of change of motion, initially assumed the celestial bodies to be animated. Galileo, Descartes, and other mechanical philosophers rejected animistic explanations.

The influence of Gilbert’s book is not due to his theoretical insights, but to his giving a convincing summary of the experimentally determined phenomena defining magnetism. Like Peregrinus he may be considered an early experimental philosopher. All later theories of magnetism are based on Gilbert’s experimental results.

Descartes’ mechanical explanation of magnetism

René Descartes assumed that all matter consists of moving particles, differing only in magnitude, density, and shape (3.4). He rejected the existence of a vacuum, but suggested that a magnet and other objects have pores, invisible to the naked eye.[8] Through these pores a continuous current of particles moves towards other bodies. The magnet expels particles fitting the pores of other magnets and of iron, but not fitting those of nonmagnetic materials. The stream of particles causes the motion of iron toward the magnet. Descartes explained the difference between north and south poles by assuming the particles and the pores to have a screw thread.[9]

Descartes considered his theory a possible, not a certain explanation of the phenomena. It was not subject to experimental confirmation of any kind. It made a deep impression because for the first time someone provided a clear and insightful mechanical explanation of magnetic action. To be acceptable in the mechanical philosophy, such an explanation had to rest only on the number, shape, and motion of unchangeable particles.

This contradicted the Aristotelian philosophy, which considered the contrary concepts of warm and cold, dry and moist, hard and soft, heavy and light to be clear and evident, not requiring further explanation (3.2). These contraries served as termini for the explanation of change and were considered manifest, obvious, and either rational or observable with the senses. Aristotelians accepted concepts like the philosopher’s mercury and sulphur as clear as far as these could be reduced to these termini. Phenomena that could not be reduced in this way they called obscure or occult, referring to magic. Their standard example of an occult property was magnetism, for it has no obvious connection with the properties warm, cold, dry, moist, heavy, or hard.

Therefore, Descartes’ explanation was hailed as a triumph of the new mechanical philosophy. It conferred much credit on Cartesian physics, although it did not explain anything; it did not predict new phenomena; it did not further measurability; and it did not generate new problems. For these reasons, Newton rejected it.

Early electricity

With William Gilbert the development of magnetism was finished for the time being, but the development of electricity started with the publication of his book.

‘… the De Magnete … establishes a new science, electricity. A new technique in the sciences was coming to fruition: the isolation of a field of study from broader issues, and Gilbert’s work in electricity was the major factor in the isolation and hence the establishment of that area of study … It is interesting that all this is a digression to Gilbert. His intense interest in electric phenomena is principally to eliminate them so that he may safely pursue his primary topic of study: magnetism.’[10]

‘Historians have recognized in Gilbert’s separation of the amber effect from magnetism the essential first step in the history of electricity, as well as an exemplar of proper scientific method.’[11]

Less than a tenth of Gilbert’s De magnete is devoted to electricity (24 of 358 pages in the English translation).[12] The main aim of Gilbert’s investigation of electrical phenomena was to demonstrate that these were not of a magnetic nature, though he was not the first to do so. Before Gilbert, Jerome Cardan (De subtilitate rerum, 1550) had established five differences between the magnet on the one hand and amber together with a number of related materials on the other. Both Cardan and Gilbert distinguished the attractive action of amber from the magnet’s directive ability.[13]

Gilbert called any substance behaving like amber an electric (electricum in Latin), any other a non-electric. He observed that all substances are attracted by any electric. In order to demonstrate this he used the first electrical observation instrument, invented by Girolamo Fracastoro (1550).[14] This versorium is a freely rotating piece of metal, like a compass needle, but not magnetized.

According to Gilbert, a rubbed electric emanates an effluvium, a vapour caused by the heat accompanying the rubbing. He observed that by heating alone a body could not be electrified. Gilbert established that the electric force, like the magnetic one, diminishes with increasing distance, but he did not mention a repulsive force, though he must have seen it once in a while. In his Philosophia magnetica (1629), Niccolo Cabeo mentioned that a light object, first attracted by an electrified object, was repelled after making contact.[15] He did not consider it an electric effect, but a mechanical one, to be explained in a theory of collisions.

Cabeo and other seventeenth-century investigators adhered to an old theory by Plutarch assuming that the electric force was caused by aerial motions. This hypothesis led to experiments in a vacuum, available in the seventeenth century after the invention of the air-pump. Supervised by Newton, Francis Hauksbee investigated light phenomena coupled to electricity.[16] Initially electricity was only known as a very weak force, caused by friction. This changed after Hauksbee’s invention of the electric generator (1703), a hand-driven machine generating electricity by friction.[17] During the eighteenth century, it would become a very popular instrument for demonstrations of electric effects. It allowed of charging objects much more strongly than before, and made new experiments possible.[18] Commissioned by Martinus van Marum, John Cuthbertson built a large machine producing and storing electricity for Teyler’s museum at Haarlem (1784), where it is still in working condition.[19]

Between charged objects sparks could be observed, leading Benjamin Franklin (who investigated electricity from 1743 to 1757[20]) to interpret thunderstorms as electric discharges, devising some quite dangerous experiments with flying kites and designing a lightning rod (1752).

Between Hauksbee and Franklin, Stephen Gray investigated the possibility (first discovered by Boyle) of transferring the electric virtue from a rubbed electric to another object, making a distinction between conductors and insulators. The latter turned out to be identical to Gilbert’s electrics. Gray observed that in this way metals, too, could become electric. He studied the phenomenon of electric induction, meaning that an object in the neighbourhood of a rubbed electric becomes temporarily electric, losing its electric property if removed from the surroundings of the electric. Hauksbee, too, observed induction, but he did not recognize it to be an electric phenomenon.[21]

With Hauksbee’s and Gray’s discoveries, electricity changed from a relatively simple, uninteresting curiosity into a confusing set of connected but unordered phenomena that started to draw the attention of an increasing number of people.

Du Fay’s summary

At about the same time as Gray, Charles Du Fay (or Dufay) wondered whether electrified bodies differ from each other apart from the intensity of electrification. Like Gray he investigated, for instance, the significance of the colour of an object for its electric properties. He found that all substances could be electrified, some most easily by friction, others by induction. He recognized the phenomenon of repulsion discovered by Cabeo to be an electric one. Du Fay’s most important discovery was the existence of two kinds of electricity.

This led Du Fay to a generalising summary of all electric phenomena known at the time,[22] as Gilbert had done for magnetism. Expressed in modern terms, this summary reads:

1. All bodies, with the exception of metals and soft substances, become electrified by friction.

2. Metals and moist objects are good conductors, hence poor insulators. The reverse is true for substances like amber, glass, and silk.

3. Unlike good insulators, good conductors are easily electrified by induction.

4. Each electrified object attracts each unelectrified object by induction. But they repel each other when the second body has been electrified by the first through contact or conduction.

5. There are two kinds of electricity, called glass-like and resin-like (after amber, i.e., fossilized resin). All electrified objects of the same kind repel each other, whereas objects differently electrified attract each other.

Du Fay observed that his summary idealized and simplified real states of affairs. The distinction between conductors and insulators was not absolute. Glass became glass-like electric if rubbed with silk, wool, or cat skin, but resin-like electric if rubbed with rabbit skin.

More interesting was the following. When one placed a strongly electrified glass-like object near a weakly electrified glass-like object, one would expect repulsion according to rule 5. However, attraction would often occur according to rule 4, when induction prevailed. Complications of this kind, together with, for instance, the influence of the humidity of the surrounding air, often made experiments on static electricity difficult to perform. Du Fay had to understand these complications before he could arrive at his idealized summary, which turned out to be very important for the further development of the field of electrical science.

Du Fay’s summary consists of a number of empirical generalisations, which do not logically follow from a theory. It is characteristic of such a summary of empirical generalisations, concluding the first phase of the development of a field of science, that it remains valid even if the theories are forgotten. In his summary of experimental discoveries, Du Fay did not even mention his own theory of electricity.

Du Fay’s summary defined what one should understand by electricity, not by pointing out its essence, but by summing up a few law conformities. In his time three methods were known to electrify a body: friction, conduction, and induction. When new sources of electricity were discovered later, one always tried to demonstrate, experimentally or theoretically, that this new electricity was identical to the natural electricity defined by Charles Du Fay. Benjamin Franklin demonstrated the electric character of thunderstorms in 1752. Franz Aepinus discovered in 1756 that tourmaline becomes electric after being heated (pyro-electricity). Animal electricity of the electric eel or ray was investigated by John Walsh (1773); galvanic electricity by Luigi Galvani (1786) and Alessandro Volta (1800); thermoelectricity by Thomas Seebeck (1821); and magnetically induced electricity by Michael Faraday (1831), who in 1833 definitively proved all these electrical phenomena to be identical to Du Fay’s.[23]

This means that Du Fay’s summary determined electricity as a field of science, as well as its boundaries. It also means that these boundaries are flexible and that the field can be extended. Most importantly, Du Fay showed that electricity is a universal property shared by all materials.

‘Dufay’s substantive discoveries … are but one aspect, and perhaps not the most significant, of his achievement. His insistence on the importance of his subject, on the universal character of electricity, on the necessity of organizing, digesting and regularizing known facts before grasping for new ones, all helped to introduce order and professional standards into the study of electricity at precisely the moment when the accumulation of data began to require them. He found the subject a record of often capricious, disconnected phenomena, the domain of polymaths, textbook writers, and professional lecturers, and left a body of knowledge that invited and rewarded prolonged scrutiny from serious physicists.’[24]

4.2. The quantification of electricity

Between 1734, when Du Fay defined electricity, and 1785, when Coulomb measured it, effluvium theories became distinguished from fluid theories.[25] Both effluvium and fluid were metaphysical concepts. Neither was observable or directly measurable, though both inspired experiments. One’s choice between them was determined by one’s world view, in which the distinction between action at a distance and action by contact appeared to be decisive.

The mechanist effluvium theory, favoured by Jean Antoine Nollet, was the older one and was later developed into an ether theory. An effluvium was a kind of vapour surrounding an electrified object. It was most successfully applied to the explanation of electrostatic phenomena like electric attraction and repulsion. The attractive or repulsive force between two charged bodies occurs by contact action through the effluvium, streaming into or out of the bodies, very similar to Descartes’ explanation of magnetism (3.4).

The assumption of an effluvium surrounding a charged body led to attempts to condense this vapour in a bottle. The oldest condenser, first invented in 1745 by Ewald von Kleist, is called the Leyden jar after the city where Andreas Cunaeus and Petrus van Musschenbroek made the same discovery a year later. It is a glass bottle, covered internally and externally with metal foil.[26] The Leyden jar was used for the storage of electric charge, increasing the efficiency of electric machines. These were now primarily used to charge a battery of Leyden jars, in order to do experiments with a formerly unknown intensity. In 1762 Johan Carl Wilcke invented, and in 1775 Alessandro Volta improved, the electrophore, a capacitive generator applying electrostatic induction, which became a popular instrument for charging a Leyden jar.[27]

An electric fluid, as proposed in the framework of experimental philosophy, was a liquid or gas within the object, and therefore implied action at a distance. It was inspired by the problems of flowing electricity. By rubbing an electric, a certain amount of this fluid is added to or withdrawn from it. In the first case the body becomes positively charged, in the second case negatively. Benjamin Franklin called glass rubbed with wool positively charged, and this convention is still upheld. In contrast to mass, charge could now be conceived as an algebraic magnitude, having both positive and negative values.

For this substance Franklin formulated a conservation law (1747),[28] although the principle of conservation of charge was already widely accepted before Franklin.[29] Within a closed system (electrically isolated from the environment) the total charge, defined as the algebraic sum of all charges, is constant. This law was never connected to the effluvium theories. It supplied a reasonable explanation of the possibility of electrifying a body by friction, and of conduction, in both cases transferring the electric substance from one body to another.

Franklin could explain electrical induction, the phenomenon that an uncharged body A is attracted by a charged one, B. The charge on B leads to a redistribution of the electric fluid on A, such that the near part of A is attracted more than its far part is repelled. The shape of the Leyden jar turned out to be irrelevant, as was demonstrated by the construction of a flat capacitor. Since then a condenser has been understood as a system consisting of two conductors separated by an insulator. The original meaning of the word condenser (in English to be replaced by capacitor) disappeared when the idea of a condensing effluvium was replaced by that of a flowing substance.

In the eighteenth century fluid theories were developed for electricity (Franklin), magnetism (Aepinus), and thermal physics (Lavoisier). A quite different, pseudo-scientific theory of a magnetic fluid in animal magnetism led to mesmerism, named after the German physician and astrologer Franz Mesmer.[30] For electricity and magnetism both one- and two-fluid theories were proposed. In a one-fluid theory, the conservation law expressed that the substance concerned could not be created or destroyed, whereas in a two-fluid theory one fluid could neutralize the effects of the other.

This profusion of fluid theories elicited Friedrich Schelling’s sarcastic comment:

‘If we imagine that the world is made up of such hypothetical elements, we get the following picture. In the pores of the coarser kinds of matter there is air; in the pores of the air there is phlogiston; in the pores of the latter, the electric fluid, which in turn contains ether. At the same time, these different fluids, contained one within another, do not disturb one another, and each manifests itself in its own way at the physicist’s pleasure, and never fails to go back to its own place, never getting lost. Clearly, this explanation, apart from having no scientific value, cannot even be visualized.’[31]

After Franklin formulated the principle of conservation of charge as his most important axiom, Joseph Black and Antoine Lavoisier arrived at the principle of conservation of heat. Lavoisier (with predecessors) applied conservation of mass in chemical reactions. Franz Aepinus introduced a principle of conservation of magnetic pole strength, which, however, never turned out to be fruitful. As axioms these laws could not be proved, either theoretically or experimentally, but they formed a fruitful heuristic means. Amidst all changes in physics since 1740, the law of conservation of charge has never been challenged, whereas the law of conservation of heat (or caloric, as Lavoisier called it) was transformed into that of energy (7.3).

The inverse-square law

For the success of Newton’s law of gravity it was decisive that he could prove that the gravity exerted by a sphere (the sun or a planet) equals that of a point-like body having the same mass as the sphere and localised at its centre.[32] Immanuel Kant related the inverse-square law for gravity (the 1/r²-law, indicating that the force of gravity decreases inversely proportionally to the square of the distance) to the three dimensions of space, just as the perceived intensity of a point-like source of light diminishes in proportion to the square of the distance.[33] Long before any experiment could be performed it was surmised that two charged objects attract or repel each other proportionally to the charge of each of the two bodies and inversely proportionally to the square of their distance.[34] Several physicists set out to measure the force between two charged objects. It took quite some time before these measurements were successful.
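Kant’s geometrical argument can be restated compactly: whatever spreads undiminished from a point source in three-dimensional space is distributed over concentric spherical surfaces, whose area grows as the square of the radius. A minimal sketch in modern notation (the symbols P for the source strength and I for the intensity are illustrative, not Kant’s):

```latex
% A point source of strength P spreads over a sphere of area 4\pi r^2,
% so the intensity at distance r falls off as the inverse square:
I(r) = \frac{P}{4\pi r^{2}} \propto \frac{1}{r^{2}}
```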

Isaac Newton’s Principia inspired an indirect proof of the electric inverse-square law.[35] In 1765 Benjamin Franklin made Joseph Priestley aware of the phenomenon that within a charged hollow conductor no electric influence could be observed from outside the conductor, not even from the charge on the conductor itself. This was confirmed by Faraday’s experiment with an ice bucket.[36] Priestley made a connection to a theorem proved by Newton: because of the inverse-square law, within a hollow homogeneous sphere the force of gravity is zero.[37] (A direct experimental confirmation of the inverse-square law for gravity was only provided by Henry Cavendish, in experiments performed in 1797-1798, more than ten years after those of Charles Coulomb for the electric force.) By analogy, Priestley suggested that Franklin’s discovery would imply a 1/r²-law for electrical forces.[38]

Probably Priestley understood that this is not a strict proof, for two reasons. First, Newton’s proof depends not only on the proportionality of gravity with 1/r², but also on the proportionality with the masses – and about charges in the electric case Priestley could not say anything. Second, Newton’s proof is only valid for a hollow sphere with a homogeneous distribution of mass, whereas in the electric case the force within a hollow charged conductor turned out to be always zero, independent of the shape, the thickness, or the homogeneity of the conductor.

Priestley’s derivation drew little attention, and Maxwell published the more precise measurements by Cavendish (1773) on a hollow sphere no less than a century later.[39]

Coulomb’s experiments

In retrospect, direct measurements of the electric force could only succeed if four conditions were fulfilled.

First, like Du Fay (4.1), one should carefully distinguish between the force between two charged objects and the force, caused by induction, between a charged object and an uncharged one. Next, the principle of conservation of charge implies that one should be careful to isolate the objects between which one wants to measure the force. Leaking of charge cannot be precluded entirely, and Coulomb had to make corrections for this effect.[40] If this succeeded, one could study the two proportionalities separately (with the charges and with 1/r²).

Third, the inverse-square law can only be found if the bodies concerned are sufficiently point-like: little spheres having a radius small compared to their distance. This restricts the quantity of charge that can be brought on the spheres. Therefore, the measurement method has to be extremely sensitive. By not satisfying this condition, Daniel Bernoulli (circa 1760) failed to reach the expected result. He worked with charged disks (instead of spheres, as Coulomb did), and did not find a 1/r²-law.[41] For the same reason it is almost impossible to find the inverse-square law for magnetism, because magnetism always occurs in dipoles. Attempts were made by Newton (who came close to a 1/r³-law), Michell (1750), and Coulomb (1787),[42] but a convincing experimental confirmation of the magnetic inverse-square law could only be given by Gauss in 1832, after Poisson in 1824 had derived the consequences of this law for a dipole.

Finally, one needs a suitable measure for the force. In principle, this should not pose problems. Because forces (considered as vectors) of different natures acting on the same object can be added, one may compensate the electric force by another, known force. In practice, this met with difficulties. In the middle of the eighteenth century weight was applied as the compensating force, to be measured with a balance. Bernoulli used the upward force on a floating body. Both methods turned out to be insufficiently sensitive.[43] Only after 1910 did Robert Millikan succeed in comparing the electric force on a charged oil drop directly with the force of gravity and the upward force in air.[44]

Charles Augustin de Coulomb, after whom both the electrical inverse-square law and the modern unit of charge are named, was a convinced experimental philosopher. On theoretical grounds, in 1777 he criticized a Cartesian vortex theory of magnetism in favour of action at a distance.[45] Coulomb was not concerned with the question of what electricity is. For him the central problem was how to find a reliable measure of electricity.[46] In 1784, for his research on material properties, he had developed a torsion balance, an apparatus to measure the torque required to twist a metal wire.[47] From 1785 he applied this instrument to the measurement of electric and magnetic forces. His measurement method satisfied the conditions mentioned above. However, although Coulomb’s instrument was much more sensitive than an ordinary balance, it was only marginally sensitive enough.

It could be used in two ways: statically, by measuring the angle through which the torsion wire had to be turned in order to compensate the electric force; and kinetically, by measuring the period of a torsion pendulum influenced by an electric force. Hence Coulomb found the inverse-square law in two independent ways.

The merit of Coulomb’s modus operandi is that this law could be established independently of the question whether the electric force is also proportional to the charges. In this way Coulomb’s direct method differed from Cavendish’s indirect one, which, however, is much more sensitive.[48] The low accuracy and the difficulty of reproducing Coulomb’s experiment meant that it took some time before his results were accepted, especially by mechanist scientists averse to action at a distance.

Coulomb’s law

Next Coulomb stated that the force between the charged bodies in his torsion balance was proportional to the product of the charges.[49] If a conductive charged sphere touches an equal but uncharged sphere, Coulomb assumed that the charge will be distributed such that both spheres get half of the original charge. Now he could operationally define the quantity of charge by the supposed law: at a certain distance r and a certain force F (both measurable), the magnitude of the charge q on each of two identical spheres follows from the relation F=q²/r².[50] In this way, Coulomb’s law established the metric of charge. The algebraic character of charge could be tested by measuring the attractive force of two spheres that neutralize each other after the measurement. One could consider three charged spheres and measure the force between each pair, predicting the result of the third measurement from the results of the first two. These and other experiments have shown that the magnitude whose operational definition is given by Coulomb’s law is indeed the sought-for additive magnitude satisfying the principle of charge conservation.
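The logic of this operational definition can be illustrated with a minimal numerical sketch (hypothetical values throughout; the proportionality constant is set to 1, so that Coulomb’s law reads F = q₁q₂/r², as in the later esu system):

```python
import math

R = 0.10  # fixed separation of the sphere centres (arbitrary units)

def force(q1, q2, r=R):
    """Coulomb's law with the proportionality constant set to 1."""
    return q1 * q2 / r**2

# Hypothetical 'unknown' charges on three spheres a, b, c.
q_a_true, q_b_true, q_c_true = 2.0, 3.0, 5.0

# Step 1: operational definition of q_a. Touching sphere a with an
# identical uncharged sphere leaves q_a/2 on each; the measured
# repulsion F0 = (q_a/2)**2 / R**2 then yields q_a itself.
F0 = force(q_a_true / 2, q_a_true / 2)
q_a = 2 * R * math.sqrt(F0)

# Step 2: infer q_b and q_c from the measured forces they exert on a.
q_b = force(q_a_true, q_b_true) * R**2 / q_a
q_c = force(q_a_true, q_c_true) * R**2 / q_a

# Step 3: predict the remaining pair measurement and compare.
F_bc_predicted = q_b * q_c / R**2
F_bc_measured = force(q_b_true, q_c_true)
print(F_bc_predicted, F_bc_measured)  # both 1500.0: the metric is consistent
```

The agreement between the predicted and the measured third force is what establishes that the operationally defined magnitude is additive and consistent.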

According to a twentieth-century measurement,[51] the exponent of r in Coulomb’s law differs from 2 by less than 3×10⁻¹⁶.

4.3. Mathematical fields

Adherents of Newton’s experimental philosophy were not only concerned with experiments and measurements, but also with the mathematical investigation of any field of science. The spatial representation of electricity and magnetism was developed by Siméon Poisson (1812, 1813, and 1824) in France; by George Green (1828) in England, who introduced the word potential; and by Carl Friedrich Gauss (1839) and Wilhelm Weber (1846) in Germany.[52] They started from the potential theory for gravity, introduced in 1777 by Joseph-Louis Lagrange, followed by Pierre-Simon Laplace.[53] In this theory the concepts of work, potential, field strength, and flux, derived from Newton’s mechanics, played a part. Gauss emphasized that the theory is valid for any force satisfying an inverse-square law. It was applicable to gravity, electricity, and magnetism. The theory was basically atomistic, although its proponents were not necessarily committed to the reality of atoms. Because the sources of the field were assumed to be point-like or composed of point-like sources, the theory could apply the spherical symmetry around any point. Given the position and the strength of the static sources of the field, the potential and the field strength at any point outside the sources could be calculated. Hence it was possible to determine the force exerted by the sources on an object positioned within the field. This allowed the introduction of a number of mechanical concepts into electricity, such as work, field strength, potential, potential difference, and potential energy.

(A force performs work on an object if it is displaced. Work is the product of the force and the object’s displacement, measured in the direction of the force. In a displacement perpendicular to the direction of the force no work is performed. The field strength equals the force acting on an object divided by its mass (in the case of gravity) or its charge (in the electric case). The potential difference between two spatial points is the work performed when an object is displaced between these two points, divided by its mass or charge, respectively. Hence the potential difference is the product of the field strength and the displacement measured in the direction of the field strength. The potential at a point equals the potential difference with an arbitrarily chosen zero point (in the electric case often called the earth). Finally, the potential energy of an object is defined as the product of its potential and its mass or charge, respectively.)
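In modern symbols these definitions read as follows (W for work, F for force, d for the displacement measured along the force or field, E for field strength, q for charge, V for potential; the gravitational case follows by replacing q with the mass m):

```latex
W = F\,d, \qquad
E = \frac{F}{q}, \qquad
\Delta V = \frac{W}{q} = E\,d, \qquad
U_{\mathrm{pot}} = q\,V
```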

Flux

About 1840 Gauss defined the flux through a surface as the product of its area and the perpendicular component of the field strength, multiplied by a constant dependent on the environment, usually called the medium. He distinguished a gravitational flux, an electric flux, and a magnetic flux. Starting from the inverse-square law, Gauss proved that in these three cases the total flux through a closed surface (like a box without any hole) equals the enclosed mass, charge, or pole strength respectively, irrespective of the position of these within the box. Magnetism occurs only in dipoles. Therefore the total magnetic flux through a closed surface is zero, according to Gauss.
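In the notation of a later time (the modern SI form, not Gauss’ own), his results for the electric and magnetic flux through a closed surface S read:

```latex
\oint_{S} \mathbf{E}\cdot d\mathbf{A} = \frac{Q_{\mathrm{enclosed}}}{\varepsilon_{0}},
\qquad
\oint_{S} \mathbf{B}\cdot d\mathbf{A} = 0
```

The first equation states that the electric flux measures the enclosed charge wherever it sits inside the surface; the second expresses that magnetism occurs only in dipoles.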

Gauss’ law enabled him to solve quite a few problems without taking into account the detailed position of the charges within the Gaussian surface. For instance, one can now easily prove that within a conductor the electric field is zero as long as no current flows. This means that the charge on a conductor is positioned at its outer surface, and that the field within a hollow conductor is determined only by charges present within this cavity. In their absence, the field is zero, irrespective of the presence of charge on or outside the conductor. An experimental test of this result, like that of Franklin or Priestley (4.2), or Faraday’s ice-pail experiment,[54] confirms Gauss’ thesis, hence Coulomb’s law.

The static field theory is more theoretically than practically relevant. When it was developed, nobody suspected that the potential difference would become a key concept in electrical engineering (4.5).

Electrostatic and electromagnetic units

By means of Coulomb’s law a metric for charge could be defined (4.2). With the help of the mechanical scales for mass, length, and time, it was now possible to define metrics for electric or magnetic force, work, potential, field strength, and flux. Such a coherent set of scales constitutes a metric system, initially different for gravity, electricity, and magnetism, but unified in the course of the nineteenth century.

Gauss introduced a metric system of electrostatic units (esu) in 1832, accepting electric charge, defined by Coulomb’s law, as a fundamental magnitude. In 1840 Weber defined electromagnetic units (emu), featuring the magnetic pole strength as a fundamental magnitude and the electric charge as a derived one. (Gauss and Weber cooperated in measuring the earth’s magnetic field in many regions of the world.) In 1856 Wilhelm Weber and Rudolf Kohlrausch charged a Leyden jar, measuring the electrostatic force associated with the potential, then discharged it, measuring the magnetic force caused by the electric current. They observed that the ratio of the two units of charge has the dimension of a speed and a magnitude (3.107×10⁸ m/s) close to the speed of light. In 1861-1862 Maxwell used this as an argument in favour of his electromagnetic theory of light (4.6).[55] Taking the ratio of the two units of charge (the speed of light) as a starting point, the electrostatic and the electromagnetic metric systems were united into the cgs and mks systems, based on the mechanical units (centi)metre, (kilo)gram, and second.
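That this ratio must have the dimension of a speed can be checked by dimensional bookkeeping. In the esu system Coulomb’s law F = q²/r² (with the constant set to 1) fixes the dimension of charge; in the emu system the force between two currents fixes the dimension of current, and charge is current multiplied by time. A sketch:

```latex
[q_{\mathrm{esu}}] = \mathrm{g}^{1/2}\,\mathrm{cm}^{3/2}\,\mathrm{s}^{-1},
\qquad
[q_{\mathrm{emu}}] = \mathrm{g}^{1/2}\,\mathrm{cm}^{1/2},
\qquad
\frac{[q_{\mathrm{esu}}]}{[q_{\mathrm{emu}}]} = \mathrm{cm}\,\mathrm{s}^{-1}
```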

In the twentieth century these unit systems were replaced by the present Système International, in which neither charge nor pole strength but electric current and its effects are taken as fundamental, because these are more accurately measurable. This illustrates that electricity and electronics became much more important in physics and technology than mechanics, long assumed to be the basis of physics.

Mathematical fields lack physical reality

The static field theory could be derived strictly mathematically from the inverse-square laws and therefore did not require experimental confirmation. The field was no more than a mathematical means, a geometrical representation of physical interaction. Only in a physical field theory could the field be accorded physical reality, if the field is supposed to act on the bodies in the field – but then one would have to abandon action at a distance in favour of action by contact. Michael Faraday, William Thomson, and James Clerk Maxwell made this step (4.6).

There is some reason to doubt the physical meaning of a field or a flux as described so far. It could not find a place in the matter-force dualism. Apparently, the field is not material. It should be distinguished from mass, charge, or pole strength, and from the corresponding material fluids. Nor is a field a kind of force. A force has physical meaning only if it acts on an object. Now the electric field E(r) indicates that on an object having charge q at the position r a force F=qE(r) is acting. But if a charged object is placed at r, it becomes a source itself, changing the field. For the calculation of the force, the field strength has to be applied as it was in the absence of the object, whereas this force only gets meaning in its presence.

Mathematically this problem is easily solved by considering the field as a limiting concept: E = lim F/q, the ratio of force to charge as both approach zero. But this would only emphasize the mathematical character of the concept of a field.

Moreover, there remains a physical problem that was never satisfactorily solved in classical physics. According to Newton’s third law of motion an object experiencing a force in a field will exert an equal force (in the opposite direction) on the source. The problem arises if one abstracts from the object: how should the field react upon the source? By according the field only mathematical meaning one could ignore this problem. But one had to face it after the introduction of the energy of a field, when Thomson and Maxwell succeeded in giving the concept of a field a physical and dynamic character. Static electric and magnetic fields in a mathematical theory could not be considered real, because they had no physical effects. This changed radically with the investigation of electric currents.

4.4. The discovery of the electric current

In the static potential theory an important difference between gravity and magnetism on the one side and electricity on the other remains obscure. Stephen Gray’s discovery of the existence of electrical conductors, having no equivalent in gravity or magnetism, made physicists aware that flowing electricity might be more important than electrostatics. Nevertheless, only after the discoveries of Galvani and Volta did this aspect draw the full attention of the investigators of electricity. Even then, nobody could have expected that moving electricity would change the world more than any political revolution. At the start of the nineteenth century, the future seemed to be with steam, not with electricity.

New forms of electricity

From 1780 Luigi Galvani investigated the influence of electricity on dissected frogs. He suggested that the observed contractions of muscles were caused by the transport of a fluid from the nerves to the muscles. He argued that this fluid was the same as the electric one, and he considered the phenomenon comparable to the discharge of a Leyden jar. Others believed that galvanism, as it became known, differed from ordinary electricity, but like Galvani, Alessandro Volta refused to accept that it was a physiological effect.[56] In 1792 he found that the same phenomena occur when two different metals are connected through a wet medium. The frogs in Galvani’s experiments provided this medium. The muscular contractions served only as an accidentally discovered detection instrument for weak currents.

In 1796 Giovanni Fabbroni observed that oxidation occurs on one of two metals in contact with each other and with water. He concluded that a chemical reaction accompanied galvanic electricity. In 1800 Volta invented the pile named after him, consisting of a number of galvanic elements connected in series.[57] Each element consisted of two sheets of different metals (for instance copper and zinc) separated by a wet medium (for instance wet paper). In this way Volta could multiply the rather weak effects of galvanism.

The investigation of electric currents received a considerable impulse from Hans Christian Oersted’s discovery (1820) of the action on a magnet of a wire connecting the poles of a voltaic pile.[58] When Oersted stretched the wire above a compass needle, the latter took a position perpendicular to the wire. At that time it was not generally accepted that a continuous current would occur in the wire connecting the poles of the voltaic pile. Because a Leyden jar could discharge almost instantaneously, Oersted assumed that the voltaic pile would produce a discontinuous succession of discharges. In 1830 Oersted wrote about himself:

‘He conjectured that if it were possible to produce any magnetic effect by electricity, this could not be in the direction of the (axis of the) current, since this had been so often tried in vain, but that it must be produced by a lateral action. This was strictly connected with his other ideas; for he did not consider the transmission of electricity through a conductor as a uniform stream, but as a succession of interruptions and re-establishments of equilibrium, in such a manner that the electric powers in the current were not in quiet equilibrium but in a state of continual conflict.’[59]

Oersted published his experiment in Latin, probably the last time an important physical paper was written in this language. In part for this reason it became known elsewhere only indirectly, and it was understood in a way quite different from Oersted’s electromagnetic view. Investigators of the French Academy of Sciences interpreted it as an electrodynamic action of a current, a flow of electric charge through the wire. In an ingenious experiment, Jean-Baptiste Biot and Félix Savart determined the magnetic force exerted by a straight current-carrying wire on a compass needle.[60] They found it to be inversely proportional to the distance to the wire – hence a 1/r-law, not a 1/r²-law. Pierre-Simon Laplace demonstrated that the 1/r-law found by Biot and Savart, valid for a long wire, can be explained by assuming a 1/r²-law for each small piece of that wire. This reduction invited the criticism that such an infinitesimal piece of a current-carrying wire can hardly function in an empirical generalisation.[61] However, the theory is also applicable to curved wires. A current in a circular wire causes a perpendicular magnetic field at its centre, reinforced in a coil with many windings. François Arago demonstrated that this solenoid could deflect a magnetic needle. The theoretical 1/r²-law for infinitesimal current elements turned out to be more general and more fruitful than the empirical 1/r-law, valid only for the specific case of a long straight current carrier.
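Laplace’s reduction can be reproduced in modern notation. Assuming the now-standard form of the law of Biot and Savart for a current element I dl, the field of an infinitely long straight wire at perpendicular distance a follows by integration:

```latex
dB = \frac{\mu_{0}}{4\pi}\,\frac{I \sin\theta}{r^{2}}\,dl
\quad\Longrightarrow\quad
B = \frac{\mu_{0} I}{4\pi} \int_{-\infty}^{\infty} \frac{a\,dl}{\left(a^{2}+l^{2}\right)^{3/2}}
  = \frac{\mu_{0} I}{2\pi a}
```

Each element contributes as 1/r², yet the whole wire yields a 1/a-law: exactly the relation Biot and Savart measured.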

It is striking that the separate, though analogous, mathematical development of electro- and magnetostatics occurred largely after it had already become clear that these two fields have more in common than was expected. On various occasions, experimental philosophers like Coulomb had argued that magnetism and electricity have no connection to each other. Hence Oersted’s discovery, inspired by the romantic search for the unity of natural forces, completely surprised Newtonian scientists. Now that an intrinsic connection turned out to exist between the two fields of science, they would try to reduce one field to the other. This annexation was executed by André-Marie Ampère between 1820 and 1826.

Ampère

Oersted’s discovery convinced Ampère that it was not justified to consider electricity and magnetism as separate fields of physical science. After it was established that a circulating current behaves as a magnet in all respects, Ampère assumed that all kinds of magnetism could be explained from the existence of molecular circular currents. Ampère’s hypothesis reduced the magnetic fluid to an effect of the motion of the electric fluid, explaining why magnetic monopoles do not exist.[62] At first, Ampère assumed these circular currents to have the size of the whole magnet, but after criticism levelled by Fresnel he accepted Coulomb’s suggestion that the molecules in a magnet are magnetic dipoles.[63] The hypothesis implied the brilliant idea that two currents interact magnetically. This inspired Ampère to start his research on the magnetic force between current-carrying wires, both experimentally and theoretically.

His experiments were exceedingly original and well considered. Within a week after Oersted’s discovery became known at Paris, Ampère presented his first experiment to the Académie des Sciences, in which he demonstrated the force between two parallel current-carrying conductors. He showed that the wires attract each other if the currents have the same direction, and repel each other if the currents run in opposite directions. Later he showed that the force is zero when the wires are perpendicular to each other. Generally, it is proportional to the cosine of the angle between the two wires when they are in an oblique position.
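In modern notation (not Ampère’s own), the force per unit length between two long parallel wires carrying currents I₁ and I₂ at mutual distance d is:

```latex
\frac{F}{L} = \frac{\mu_{0}\, I_{1} I_{2}}{2\pi d}
```

attractive for parallel and repulsive for antiparallel currents. This relation long served to define the SI unit of current named after Ampère (until the redefinition of the SI in 2019).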

Ampère made creative use of the null method, in which different forces are balanced, because methods to measure currents or potential differences did not yet exist. Critics observed that Ampère drew far-reaching conclusions from what he did not measure. Yet, after Ampère, the null method has become the key to many measuring instruments.

Maxwell admired Ampère’s method highly, but he was not convinced that his theory was derived from experiments:

‘The whole, theory and experiment, seems as if it had leaped, full grown and full armed, from the brain of the “Newton of electricity”. It is perfect in form, and unassailable in accuracy, and it is summed up in a formula from which all the phenomena may be deduced, and which must always remain the cardinal formula of electro-dynamics. The method of Ampère, however, though cast into an inductive form, does not allow us to trace the formation of the ideas which guided it. We can scarcely believe that Ampère really discovered the law of action by means of the experiments which he describes. We are led to suspect, what, indeed, he tells us himself, that he discovered the law by some process which he has not shewn us, and that when he afterwards built up a perfect demonstration he removed all traces of the scaffolding by which he had raised it.’[64]

When Weber repeated Ampère’s experiments a quarter of a century later, he established that in Ampère’s method the frictional forces were larger than the electrodynamic ones Ampère wanted to measure.[65] In his own experiments Weber succeeded in eliminating friction to a large extent, to his great relief finding the same results as Ampère had done.

In his approach to electrodynamics Ampère acted as an adherent of experimental philosophy, but at other times his views corresponded more with those of the Romantics and the Naturphilosophie.[66] Acting as an experimental philosopher, Ampère wanted to unite the parts played by Kepler, who proposed a number of experimentally found rules, and by Newton, who developed a mathematical theory explaining them. Ampère’s assumptions and methods did not convince everybody, but his results did. For a quarter of a century his Théorie mathématique des phénomènes électrodynamiques, uniquement déduite de l’expérience (1826) was considered a standard work.[67] The title, ‘Mathematical theory of electrodynamic phenomena, uniquely derived from experience’, betrayed its Newtonian character. Ampère admitted that he did not succeed completely, because at least one of his experiments turned out not to be technically realizable. However, he was convinced of the result of this quasi-experiment and used it in his theory.

As in Newton’s theory of gravity, the law of action and reaction, the inverse-square law, and action at a distance were essential parts of Ampère’s theory. Unlike Oersted, Biot, and Savart, Ampère applied Newton’s third law of motion and discovered the magnetic interaction between two electric currents.

Newton stated that he only wanted to describe how gravity acts, not what it essentially is. Likewise, Ampère did not affirm that a current really flows through the wire. He only demonstrated that the observed phenomena could be explained with the help of this assumption. Meanwhile Ampère and his disciples believed this hypothesis firmly, and they did not find it necessary to investigate it experimentally. For experimental philosophers it was sufficient to have a metric to measure the strength of the current. Only in 1876 did Henry Rowland perform an experiment to demonstrate that an electric current causes the same effects as moving electricity.[68] He applied an electrostatic charge to an ebonite disk covered with gold foil. This disk, set into fast rotation, turned out to cause a magnetic field, quite like an electric current in a circular wire does.

Yet Ampère’s theory is not entirely Newtonian. Indeed, the force between two current elements is proportional to the current strengths and inversely proportional to the square of their distance, analogous to Newton’s law for gravity and Coulomb’s laws for electro- and magnetostatic interaction. But this force also depends on the angle between the two elements (having the spatial shape of a short line), as well as on the angle of these elements with their connecting line. This property was new in experimental philosophy and difficult to reconcile with action at a distance.

Another difference arose after Faraday introduced field lines. In the static fields all field lines start or end at infinity or in a point-like source (a charge, a magnetic pole, or a mass point). These so-called conservative fields can be described by means of a potential (4.3). However, the magnetic field lines around a current carrier are closed, and cannot be described by such a potential.

These deviations from the Newtonian paradigm were reinforced by Neumann’s and Weber’s introduction of velocity- and acceleration-dependent terms in the electric interaction.[69] This was made necessary by another discovery of Faraday’s: the electromagnetic induction of an electric current by a changing magnetic field. In 1845 Franz Ernst Neumann extended Ampère’s theory to these currents, such that all electrical phenomena fell into place: static electricity, attraction and repulsion of currents, and electromagnetic induction.[70] This theory remained dominant on the European continent until the end of the nineteenth century. It started from action at a distance and Newton’s third law. Wilhelm Weber assumed that the electric interaction was entirely determined by Coulomb’s law and by the velocity- and acceleration-dependent components. In 1847 and 1871 Weber presented an atomistic theory of different forms of magnetism. Attraction and repulsion of constant currents were determined by the velocity of charged particles, and induction by their acceleration, for a changing current causes a magnetic field by induction. He succeeded in finding mathematical expressions for all these phenomena. Only well after the experiments of Hertz (1887) did Weber’s theory give way to Maxwell’s.

4.5. Electric current and potential difference

Classical physicists, whether adherents of mechanical or experimental philosophy, placed motion at the centre of their studies. They investigated linear uniform motion (inertia); linear accelerated motion (free fall); circular motion (with centrifugal or centripetal acceleration); elliptical motion (planets); vortices (Descartes on celestial motion); impact (Huygens, Wren, and Wallis); rotation (Newton’s pail experiment, Leonhard Euler); flowing fluids (Daniel Bernoulli); waves (in sound, and since the early nineteenth century in physical optics); and oscillations (pendulums and bodies suspended from a string). Except for Descartes, who reacted to William Harvey’s discovery of the circulation of blood in the human body (1628), they initially did not pay attention to motion in a circuit. Theoretical physicists were more interested in static fields. Electric discharges of Leyden jars were not recognized as currents. An electric friction machine is a reliable source of high electric tension, but even with the help of a battery of capacitors it cannot deliver a significant constant current.

The development of galvanic electricity, and to a lesser extent Thomas Seebeck’s discovery of thermoelectricity in 1821 (7.2), improved this situation considerably. After galvanic elements were developed for practical use, one had at one’s disposal a source of a relatively strong and constant current, albeit that the tension supplied was much lower than that of an electric machine. Therefore the development of powerful machines transforming rotational motion into electricity (or the reverse) became the focus of electrical engineering. Michael Faraday in England and Joseph Henry in the United States performed the foundational work, and the first working dynamos and electric motors date from 1832, but engineers had to solve many practical problems before these became useful machines. The dynamo, initially connected to a steam engine or a water wheel, later on especially to a steam turbine, became the most important source of electricity, though galvanic elements are still widely used.

The problem posed to the investigators of electricity at the start of the nineteenth century was how to measure – and even define – electric current and tension.

Measuring moving electricity

Before Charles Coulomb succeeded in finding a quantitative measure for the amount of electricity (4.2), electroscopes were used as measuring instruments.[71] An electroscope consists of two metal strips repelling each other when they are charged. The deflection is larger when the charge is stronger. Clearly, one does not measure a quantity of electricity in this way. A large sphere contains more charge than a small one if they have a conducting connection, but an electroscope shows the same reading when connected to either the large or the small sphere. In the eighteenth century the word tension was introduced for the magnitude measured by an electroscope.

Whereas charge is an extensive or additive magnitude, tension is an intensive equilibrium parameter. This distinction is analogous to that between heat and temperature, which was introduced at the same time. As an equilibrium parameter, an intensive magnitude expresses the transitivity of equilibrium: two objects are in equilibrium with respect to some intensive magnitude if they are both in equilibrium with a third object, for instance a measuring instrument. On this equivalence property rests the working of electroscopes, pressure gauges, and thermometers.

Occasionally it is easy to design a metric and to gauge a meter for such an intensive magnitude. Torricelli’s tube provided a useful barometer with a linear scale, acting as a standard for other types of manometers. In other cases it turned out to be quite laborious. A large part of thermal physics, for instance, consisted in designing a theoretical metric for temperature, the thermodynamic scale of temperature (7.4). In the electric case, Alessandro Volta succeeded in building a linear and reproducible electrometer (1778), but its scale had no relation to other metrics.

Until 1820 no reliable measurement method was available for the electric current. This situation changed after Oersted’s discovery of the influence of an electric current on a magnet. Ampère and Schweigger realised that the easily measurable torque exerted by a current on a magnetic needle provides a measure for the current.[72] Johann Schweigger multiplied this force by letting the current traverse a large number of windings of a coil, and William Sturgeon multiplied it even more by adding an iron core. The multiplicator was gradually developed into the galvanometer (for small currents) and the ammeter (for larger currents). After reliable standards for resistors were found, these could be combined with galvanometers to construct voltmeters. Until the introduction of electronic instruments in the second half of the twentieth century, these were among the most important measuring instruments for electrical research and engineering.

Ohm’s and Kirchhoff’s laws

In 1826 Georg Ohm related the current in a wire to the tension across it.[73] He understood that the current is driven by the tension, which he therefore called the electroscopic force. Ohm was inspired by Joseph Fourier’s theory of heat conduction (1822), which considered the temperature difference across a wire to be the driving force of a heat current through the wire.[74] In this theory one has to take into account the heat capacity of the heat conductor, but this is not necessary for an electric conductor as long as capacitors are not involved.

Ohm’s representation of his results was quite complicated. He did not formulate his law in the now most common form, V=iR: the tension and the current are proportional to each other. The ratio of tension (V) and current (i) is called an object’s resistance (R=V/i). Ohm’s own theory boils down to the formula for the electromotive force or emf (E) of his current source having an internal resistance (r) and an external load resistance (R): E=i(r+R). The emf is now defined as the energy supplied by a source in driving a unit charge around a circuit. Ohm’s law applies to metals and dilute electrolytes, if the temperature is kept constant. It does not apply to gases and semiconductors, as was found later.
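A minimal numerical sketch of Ohm’s formulation E=i(r+R), with hypothetical values, showing how the current and the terminal tension depend on the load:

```python
# Ohm's formulation E = i*(r + R): an emf E drives a current i through
# the internal resistance r of the source and the external load R.
E = 1.5   # emf of the source in volts (hypothetical values)
r = 0.5   # internal resistance in ohms

for R in (0.5, 4.5, 14.5):      # external load resistance in ohms
    i = E / (r + R)             # current through the whole circuit
    V = i * R                   # terminal tension across the load
    print(f"R = {R:4.1f} ohm: i = {i:.3f} A, V = {V:.3f} V")
# The larger the load resistance, the closer the terminal tension
# approaches the emf of the source.
```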

After the middle of the nineteenth century, Ohm’s law became well known and widely applied in electrical engineering. Yet it took more than twenty years before it became recognized in physics. Apart from his incomprehensible theory, several causes of this delay can be mentioned. Ohm could not relate his metric of tension to that of other magnitudes. Following Volta, Ohm considered the electroscopic force to be an electric charge density, which is quite different from the concept of potential difference, with which the electroscopic force was eventually identified.

Green introduced the potential in electrostatics in 1828, after Poisson had applied it implicitly since 1812. It was initially considered a purely mathematical concept, applied only in the absence of currents (4.3). In static equilibrium, when no current occurs in a conductor, the electric field must be zero, meaning that the potential is everywhere the same. This means that the potential is the electric equilibrium parameter, an intensive magnitude. It also means that a potential difference invariably accompanies a current in a conductor, as its driving force. It lasted until 1849 before Gustav Kirchhoff identified Ohm’s electroscopic force with potential difference, quite different from charge density. Only then could physicists incorporate Ohm’s law into the received electromagnetic theory.[75]

Kirchhoff arrived at the correct interpretation of Ohm’s law after he had formulated the two laws for electric circuits named after him. In the nineteenth century, an electric circuit came to be represented by a schematic figure as an electrically coherent spatial ordering of one or more current sources with resistors, capacitors, and coils. The local potential characterizing each point in the circuit can be variable, for instance in the case of alternating currents. At each moment the sum of the emf’s of all electric sources equals the sum of all potential differences across the other elements in the circuit. This is Kirchhoff’s second law for any electric circuit (1845-46),[76] derived from the law of conservation of energy.

Only a potential difference or tension can be measured, often with respect to a fixed point, the earth. Johann Poggendorff developed a method to compare potential differences with the help of a sliding resistor, still called a potentiometer. In a Wheatstone bridge (invented circa 1843, probably not by Charles Wheatstone) potential differences and resistances can be measured independently of the measurement of currents, with quite good precision. In both cases a null method is applied, such that one does not measure the current, but establishes that in a part of the circuit the current is close to zero.

The current strength in a circuit satisfies Kirchhoff’s first law, especially important for circuits consisting of several loops: at each junction in the circuit the net flow into the junction equals the net flow out of it, at any time. This is a consequence of the law of conservation of charge. With the help of Kirchhoff’s two laws circuits can be analysed, both for direct and for alternating currents.
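As an illustration, a minimal sketch of such an analysis for a hypothetical direct-current circuit: a source with emf E feeds a resistor R1, after which the current splits over R2 and R3 in parallel. Kirchhoff’s two laws yield a linear system for the three unknown currents:

```python
import numpy as np

# Hypothetical circuit: a source with emf E feeds R1, after which the
# current splits over R2 and R3 in parallel. Unknown currents: i1
# (through R1), i2 (through R2), i3 (through R3).
E = 12.0                       # emf in volts
R1, R2, R3 = 2.0, 6.0, 3.0     # resistances in ohms

# Kirchhoff's first law (junction):    i1 - i2 - i3 = 0
# Kirchhoff's second law (left loop):  R1*i1 + R2*i2 = E
# Kirchhoff's second law (right loop): R2*i2 - R3*i3 = 0
A = np.array([[1.0, -1.0, -1.0],
              [R1,   R2,   0.0],
              [0.0,  R2,  -R3]])
b = np.array([0.0, E, 0.0])

i1, i2, i3 = np.linalg.solve(A, b)
print(i1, i2, i3)   # 3.0 A, 1.0 A, 2.0 A for these values
```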

Instrumentalism, mechanism, and romanticism did not significantly contribute to the development of electricity and magnetism as described so far. It was a fruit of experimental philosophy, to which Faraday in particular adhered. But in the development of the electromagnetic field Newtonian action at a distance was replaced by Cartesian action by contact. William Thomson was an avowed mechanist: ‘I never satisfy myself until I can make a mechanical model of a thing. If I can make a mechanical model, I can understand it.’[77] Maxwell found his laws from a mechanical model, although for him it was merely an analogy.

4.6. The electromagnetic field

Although Francis Hauksbee applied the idea of a field to the state of the space around a magnet or an electrically charged body as early as about 1700, in the Newtonian tradition the field received a merely mathematical meaning (4.3). In the force-matter dualism it expressed a reduced force, called the field strength: the force on a unit amount of matter at a given position. No physical reality was accorded to the mathematical fields of reduced force.

After Oersted’s discovery of the magnetic action of an electric current, Michael Faraday called the space surrounding the wire an electrotonic state. William Thomson searched for a mechanical model of the ether that could explain the propagation of light and heat, as well as the electromagnetic phenomena, without action at a distance.[78] James Clerk Maxwell applied a mechanical analogy (9.4) as a heuristic to find the laws of electromagnetism and its unification with optics, giving the physical concept of a field an integrating function besides energy, force, and current (7.4).

This search was stimulated by the emerging energy principle (7.3). James Prescott Joule connected electric currents with energy. William Thomson proved that in certain cases energy can be stored in the magnetic field of a coil, and Maxwell predicted theoretically the possibility of transporting energy via the electromagnetic field. In 1887, Heinrich Hertz’s experiments confirmed the physical meaning of the concept of a field apart from matter.[79]

Faraday on fields

Before 1850 Michael Faraday’s ideas came closest to the later concept of a physical field. His world was full of lines of force (or rather tubes, as Maxwell observed), forming a continuum, a physical field. A material atom, an electric charge, or a magnetic pole was nothing but a nodal point of lines of force. Faraday rejected action at a distance, referring to a recently published letter in which Newton wrote:

‘That gravity should be innate, inherent, and essential to matter, so that one body may act upon another at a distance through a vacuum, without the mediation of anything else, by and through which their action and force may be conveyed from one to another, is to me so great an absurdity that I believe no man who has in philosophical matters a competent faculty of thinking can ever fall into it.’[80]

Faraday observed that the finite speed of light was not consistent with instantaneous action at a distance,[81] but he cared more about experiments on the magnetic and electric properties of extended matter than about a mathematical theory of fields. He demonstrated that a magnetic field could change the direction of polarization of light.

William Thomson replaced Faraday’s unspecified number of lines of force by the more easily manageable mathematical concept of flux (4.3).[82] He united the laws of Heinrich Lenz (1834) and Faraday (1850) about electromagnetic induction into one mathematical formula. (Lenz’s law is often stated as: the direction of the induced current is always such as to oppose the change in magnetic flux that produces it.)
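That united formula, in modern notation: the induced electromotive force equals the rate of change of the magnetic flux Φ, Lenz’s opposition appearing as the minus sign:

```latex
\mathcal{E} = -\,\frac{d\Phi}{dt}
```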

According to Faraday the induction of electricity is caused by a changing magnetic field. Induction occurs in coils with or without an iron core. These inductors became increasingly important in electrical technology. Thomson introduced the concepts of self-induction and of mutual induction. Self-induction, concerning a single inductor, had been investigated by Joseph Henry in 1832 and by Michael Faraday in 1834.[83] Mutual induction (the interaction of two or more coils) determines the energy transfer in a transformer.

William Thomson observed that in a circuit consisting of an inductor and a resistor, the inductor contains an amount of energy, built up when the current in the circuit increases. It manifests itself as a spark when the circuit is suddenly interrupted. The field energy represents a kind of inertia in the circuit, comparable to the energy stored in a Leyden jar. A combination of a solenoid and a capacitor in an electric circuit could sustain an oscillating current, having a specific resonance frequency.
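In modern terms, the inductor stores the energy ½Li², the counterpart of the energy ½CV² stored in a capacitor; their periodic exchange sustains the oscillation, with the resonance frequency that Thomson derived for the oscillatory discharge of a Leyden jar (1853):

```latex
U_{L} = \tfrac{1}{2}\, L i^{2}, \qquad
U_{C} = \tfrac{1}{2}\, C V^{2}, \qquad
f = \frac{1}{2\pi \sqrt{LC}}
```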

Maxwell’s mechanical models

Between 1855 and 1865 James Clerk Maxwell published his gradually developing ideas in a number of papers, culminating in his epoch-making book A treatise on electricity and magnetism (1873).[84] Its significance has been compared to Newton’s Principia (1687). Yet both books soon became outdated because (for the benefit of their readers) their authors did not make use of new mathematical techniques: in Newton’s case the differential calculus, in Maxwell’s vector notation.

In his first paper, ‘On Faraday’s lines of force’,[85] Maxwell proposed to study a mechanical model of the ether, a hypothetical continuous and incompressible fluid. A positive electric charge would correspond to a source of this fluid; a negative charge to a sink; and the magnitude of any charge to the amount of fluid pouring into or out of the container per second. Outside the sources and the sinks the law of continuity applied, expressing the conservation of the fluid. Faraday’s lines of force corresponded to the lines of flow in the fluid. The fluid’s speed indicated the potential, and the inverse-square law followed from the incompressibility of the fluid. With this model Maxwell argued that Faraday’s theories did not contradict mechanical philosophy, and that action by contact was applicable to electromagnetism no less than action at a distance as in Ampère’s theory.

Maxwell extended his mechanical model in his next paper, ‘On physical lines of force’.[86] The new model contained a complicated system of vortices, sometimes treated as wheels, influencing each other such that neighbouring wheels turn in the same direction. This would be impossible with cogwheels; therefore idle wheels without axes were added, for which Maxwell found an example in some contemporary steam engines. The idle wheels would correspond to electric currents in the electromagnetic theory, and the axes of the vortices to magnetic field lines.

The second model was rather complicated. It was not intended to serve as a mechanical explanation of electromagnetism, but as a heuristic model, enabling Maxwell to find the four laws known as the Maxwell equations. These laws generalized the two principles of Gauss for static electricity and for magnetism (4.3); Ampère’s law on the generation of a magnetic field by an electric current (4.4), to which Maxwell added a non-material displacement current; and Faraday’s law on the induction of an electric field by a changing magnetic field. The field equations are as fundamental for the electromagnetic field as Newton’s laws are for mechanics. Now Maxwell could summarize all phenomena in one theory: static electricity, magnetism, flowing electricity, and induction. The most striking fruit of Maxwell’s equations concerned light. He proposed that light has an electromagnetic character and is the visible part of a much larger spectrum (6.4). Only gravity did not fit.

The Maxwell equations are field laws. In the theories of Gauss and Weber electric and magnetic fields occur as well, but there they were completely determined by the strength and position of the sources: charges, magnetic poles, and currents. Charges and currents occur in Maxwell’s equations too, but the field strength at any point is also determined by the surrounding fields. This means that Maxwell’s equations contain spatial derivatives (or, in an alternative formulation, spatial integrals), indicating how the fields differ from place to place, besides temporal derivatives determining their change in time.
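In the vector notation later introduced by Heaviside, and with modern SI constants, the four equations read:

```latex
\nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_{0}}, \qquad
\nabla\cdot\mathbf{B} = 0, \qquad
\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad
\nabla\times\mathbf{B} = \mu_{0}\mathbf{J}
  + \mu_{0}\varepsilon_{0}\frac{\partial\mathbf{E}}{\partial t}
```

The last term of the fourth equation is Maxwell’s displacement current, discussed below; in empty space the two curl equations combine into a wave equation with speed c = 1/√(μ₀ε₀), the basis of Maxwell’s electromagnetic theory of light.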

Displacement current

In his mechanical model Maxwell compared an electrical conductor with a medium offering resistance to his hypothetical fluid, like air offers resistance to a moving object. He conceived of an insulator as a membrane, impermeable to the fluid, but deformable by the fluid’s pressure. The molecules in the insulator, subjected to an electric potential difference, are polarized such that in each molecule positive and negative charges are displaced. This gives rise to the displacement current, able to cause a magnetic field as long as the potential difference changes. Maxwell’s third equation, also called Ampère’s law, indicates that a magnetic field arises from a material current (the motion of charged particles) and/or from a changing electric field. The displacement current did not occur in Ampère’s original law.

This is a rather speculative model, in particular because Maxwell had no reliable knowledge about the electrical constitution of molecules. Maxwell did not improve matters when he introduced the same possibility of polarization for the ether, without any justification within the model. Even in the absence of a material medium a displacement current could occur.
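
In modern terms (a reconstruction, not Maxwell’s own notation), the displacement current density consists of a material and a vacuum part:

\[
\mathbf{J}_d=\frac{\partial\mathbf{D}}{\partial t}
=\varepsilon_0\frac{\partial\mathbf{E}}{\partial t}+\frac{\partial\mathbf{P}}{\partial t},
\]

where the polarization P stems from the displaced charges in the molecules, and the term ε₀∂E/∂t survives even in the absence of any material medium.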

This new idea of a displacement current caused great conceptual difficulties for Maxwell’s contemporaries, because Maxwell introduced it ad hoc in an almost incomprehensible model. Maxwell could not prove the existence of the displacement current by any experiment, but he stressed it as an indispensable part of his theory:

‘Whatever electricity may be and whatever we may understand by the movement of electricity, the phenomenon which we have called electric displacement is a movement of electricity in the same sense as the transference of a definite quantity of electricity through a wire is a movement of electricity.’[87]

A dynamical model

Was Maxwell led to the electromagnetic laws by his models, or by his great physical intuition?[88] The mechanical models were very artificial. The connection with the speed of light was dubious. The displacement current could not be derived from the model and had to be introduced ad hoc.[89] Therefore, in his third paper, ‘A dynamical theory of the electromagnetic field’,[90] Maxwell abandoned the mechanical models. The dynamical theory, returning in part IV of his Treatise, was based on Lagrange’s abstract mechanics. In this theory energy is the most important magnitude.

‘I have on a former occasion attempted to describe a particular kind of strain, so arranged as to account for the phenomena. In the present paper I avoid any hypothesis of this kind … In speaking of the Energy of the field, however, I wish to be understood literally …’[91]

Like Joseph-Louis Lagrange, Maxwell divided the energy of the field into potential energy (U), dependent only on the spatial coordinates in the field, and kinetic energy (T), depending on the position coordinates and their time derivatives. The system ought to satisfy two principles. The total energy (T+U) should be constant, and the time integral of the difference (T−U) should be a minimum for the actually occurring changes, compared to any conceivable change in the position coordinates and their derivatives with the same initial and final states. Two centuries before, a similar minimum principle had been applied by Pierre Fermat, who explained the refraction and reflection of light by assuming that light always follows the path requiring the least time (6.2).
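
In modern notation (a reconstruction; Lagrange and Maxwell expressed it differently), the two principles read

\[
T+U=\text{constant},\qquad
\delta\int_{t_1}^{t_2}(T-U)\,dt=0,
\]

where the variation δ compares the actual change with any conceivable change having the same initial and final states.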

Analogies

Oersted’s and Faraday’s discoveries gave rise to an asymmetric relation, reinforced by Ampère’s hypothesis about magnetism as an effect of the electric current: whereas any electric current induces a magnetic field, only a changing magnetic field induces an electric current. Oliver Heaviside, to whom physics owes the vector notation of Maxwell’s equations, observed that these equations made the electric and magnetic fields almost symmetrical.[92] An electric field can be caused by a static source, by a moving magnetic pole, or by a changing magnetic field. A magnetic field can be caused by a static pole, by a moving charge, or by a changing electric field. Hence the most important differences between the two fields disappear in Maxwell’s equations. If magnetic monopoles do not exist and magnetic currents do not occur, this is a consequence of the structure of matter, not of the laws of electromagnetism.

Both Thomson and Maxwell used mechanical models, for instance in their study of the electromagnetic field, but they differed in their appreciation of these models. Thomson considered a mechanical model an explanation, whereas Maxwell treated it as a heuristic method. The systematic manner of finding universal laws by connecting various fields of science Maxwell called the method of analogy.[93]

Maxwell was not the first to use analogies (9.4). During the first half of the nineteenth century, analogical theories were developed for gravity and for electro- and magnetostatics, based on the similarity of the inverse-square laws (4.3). The wave theories for water, sound, and light display many analogies. The similarity of electric and thermal conduction was exploited by Ohm, who applied Fourier’s theory of thermal conduction to the electric case (4.5). In 1842 William Thomson published a study on the analogy between electrostatics and the flow of heat in an unequally heated body, deriving an inverse-square law for thermal physics.[94] These are mathematical analogies – the similarity concerns the mathematical form of the theories, whereas the physical contents remain different. However, Maxwell wished to apply a physical analogy:

‘We must therefore discover some method of investigation which allows the mind at every step to lay hold of a clear physical conception, without being committed to any theory founded on the physical science from which that conception is borrowed, so that it is neither drawn aside from the subject in pursuit of analytical subtleties, nor carried beyond the truth by a favourite hypothesis. In order to obtain physical ideas without adopting a physical theory we must make ourselves familiar with the existence of physical analogies. By a physical analogy I mean that partial similarity between the laws of one science and those of another which makes each of them illustrate the other.’[95]

Maxwell wanted to demonstrate that an electromagnetic theory based on Faraday’s ideas was possible, because it was valid in an analogous case. His mechanical model was not primarily intended to provide an explanation of the electromagnetic field, or even a more or less correct description. That would have been an unjustifiable hypothesis, a relapse into the old-fashioned effluvium theory. In his first paper, Maxwell only wanted to show that the theory is consistent, not containing internal contradictions, as well as complete, or at least able to describe the same phenomena as the alternative theory of Ampère and Weber.

The heuristic function of his models came to the fore in Maxwell’s second paper. Now he applied his model to derive new relations. As such it was fruitful only as long as it suggested new ideas:

‘I propose now to examine magnetic phenomena from a mechanical point of view, and to determine what tension in, or motions of, a medium are capable of producing the mechanical phenomena observed. If, by the same hypothesis, we can connect the phenomena of magnetic attraction with electromagnetic phenomena and with those of induced currents, we shall have found a theory which, if not true, can only be proved to be erroneous by experiments which will greatly enhance our knowledge of this part of physics.’[96]

Therefore Maxwell dropped his mechanical model as soon as it had served its purpose. Only ‘in speaking of the Energy of the field, however, I wish to be understood literally …’[97] The nineteenth-century search for a single natural force explaining all phenomena turned out to be unsuccessful. However, the new abstract concept of energy allowed of comparing, measuring, and even transforming various interactions (7.3). Maxwell probably referred to this state of affairs when he said that in physical analogies energy should be taken literally.

Hertz’s experiments

Physicists now believe Maxwell’s theory to be as important and lasting as Newton’s dynamics, but Maxwell’s contemporaries were not immediately convinced of its relevance. Until the end of the nineteenth century universities continued to teach the theories of Ampère and Weber. Apart from his students, few physicists accepted Maxwell’s theory immediately, even in England. Among the continental physicists, besides Hendrik Antoon Lorentz and Albert Einstein only Hermann Helmholtz paid attention to Maxwell’s theory, comparing it to those of Wilhelm Weber and Franz Neumann. He inspired Heinrich Hertz to perform some crucial experiments, demonstrating the existence of long-wave electromagnetic radiation and the reality of the displacement current (1887).[98] Only then did an increasing number of physicists become convinced of the merits of Maxwell’s theory. Hertz’s experiments inspired Guglielmo Marconi to invent the wireless transmission of signals, leading to radio and television.

Mechanical models of the ether

Maxwell found his equations by an analogy with a mechanical system, but his definitive theory was not mechanical, unless someone should succeed in finding a mechanical model of the ether. Therefore many mechanist physicists sought to provide the electromagnetic field with a mechanical basis more satisfactory than Maxwell’s machinery. The mechanical ether as an elastic medium would have some remarkable properties. Fresnel’s assumption that light is a transverse wave (6.4) was confirmed by Maxwell. According to the theory of elasticity, transverse waves can only occur in a solid, not in a liquid or a gas. The solid ether should be very rigid in order to explain the high value of the speed of light, yet the planets should not experience any resistance from it. The various models proposed could not reconcile these properties.

Newton had arrived at the conclusion that a uniform linear motion with respect to absolute space would not be detectable, contrary to a rotation (3.7). The adherents of the ether theory in the nineteenth century believed that such a motion should be detectable with respect to the luminiferous ether. Stellar aberration, discovered in 1728 by James Bradley, appeared to confirm this. Because light needs time to travel from the objective to the ocular, a moving telescope must be pointed at a slightly different angle than a stationary telescope in order to see the same star. Bradley used the annual motion of the earth around the sun, looking at a suitable star at intervals of half a year. With his ether theory, Fresnel could explain this stellar aberration.
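
As an illustration (the numerical values are the standard ones, not taken from Bradley’s paper): with the earth’s orbital speed v ≈ 30 km/sec and the speed of light c ≈ 300,000 km/sec, the aberration angle is

\[
\alpha\approx\frac{v}{c}=10^{-4}\ \text{rad}\approx 20.5'',
\]

in agreement with the annual displacement that Bradley observed.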

On a suggestion by Hermann Helmholtz, Albert Abraham Michelson decided to determine the earth’s speed with the help of an interferometer, an apparatus in which the differences between two beams of light are determined by their interference.[99] Interference exploits the wave character of light: the beams reinforce each other if their phases are equal (oscillating at the same pace) and weaken each other if their phases differ. This is realizable only if the two beams have the same source.

Michelson split a beam of light into two beams, and by means of mirrors let one beam travel back and forth in one direction, the other in the perpendicular direction. If the earth moves with respect to the ether, the two beams need different times to cover their paths (of equal length), leading to a phase difference and interference phenomena. The magnitude of the effect is determined by the speed of the earth with respect to the ether, estimated to be about 30 km/sec, to be compared to the speed of light of 300,000 km/sec. To determine the effect of a speed ratio of the order of 0.01% requires great precision. Floating in a mercury bath, the whole apparatus could be rotated, to allow of measurements in various directions. Moreover the experiment was repeated several times in the course of a year. Michelson was an extremely able experimenter, but when he performed the experiment in 1881 he found no measurable result.
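
A minimal sketch of the expected effect, assuming two arms of equal length L: the round-trip times along and across the direction of motion would differ by

\[
\Delta t=\frac{2L/c}{1-v^2/c^2}-\frac{2L/c}{\sqrt{1-v^2/c^2}}
\approx\frac{L}{c}\,\frac{v^2}{c^2},
\]

so the anticipated shift of the interference fringes is governed by (v/c)² ≈ 10⁻⁸ rather than by v/c ≈ 10⁻⁴, which is why such extreme precision was required.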

Reading his paper, Lorentz spotted an error and asked Michelson to repeat the experiment. Together with Edward Morley, Michelson did so in 1887, again with the same negative result. The difference in speed was at least forty times smaller than would be expected on the basis of the ether theory. This time no doubt was possible, and soon Michelson’s result was accepted as an empirical generalisation: the observed speed of light in vacuum is independent of the speed of motion of the instrument with respect to the ether.

Could this be included in the ether theory? Yes: both Lorentz and George FitzGerald assumed (in 1892) that the motion through the ether, the carrier of electromagnetic interaction, would influence the electric and magnetic forces keeping molecules together. In the direction of motion all objects would shrink a little, and all physical processes would slow down. Lorentz and FitzGerald calculated this space contraction and time dilation, arriving at the so-called Lorentz transformation formulas.
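
In modern form, for a system moving with speed v along the x-axis, these transformation formulas read

\[
x'=\gamma(x-vt),\quad y'=y,\quad z'=z,\quad
t'=\gamma\left(t-\frac{vx}{c^2}\right),\qquad
\gamma=\frac{1}{\sqrt{1-v^2/c^2}},
\]

implying that moving objects contract by the factor 1/γ in the direction of motion, and that moving processes slow down by the factor γ.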

Many physicists were not happy with this ad hoc solution. First, it seemed improbable that the motion through the ether would influence all objects and all processes in the same way, independent of the structure of matter. Second, one was back to square one, with an ether that could not be observed in any way. Though the ether theory was intended to provide a mechanical basis for the electromagnetic theory of light, Lorentz’s theory was not mechanical. Newton’s third law did not apply to the interaction between the ether and material particles. In Lorentz’s theory the ether was immovable, but such an ether could not be stable, as Heaviside, Hertz, and Helmholtz proved.

The ether was introduced to explain action by contact. Maxwell had demonstrated that this could be described by means of fields. In fact there was no compelling reason to base these fields on a mechanical ether, of which one could not even imagine a rational representation. After the acceptance of Maxwell’s theory of electromagnetic fields and Einstein’s theory of relativity, most physicists abandoned the assumption of a mechanical ether. Maxwell’s electromagnetic theory had no mechanical foundation.

Electromagnetic foundation of mechanics

Therefore Hendrik Antoon Lorentz tried the reverse: to find an electromagnetic foundation for mechanics. Already in his doctoral thesis he paid much attention to Maxwell’s theory.[100] Yet, like most contemporaries, he only became convinced of its truth after Hertz’s experiments. From then on Lorentz attempted to reconcile Weber’s theory (matter consists of particles having charge and mass, acting at a distance) with Maxwell’s (the interaction between charged particles takes place by contact, via the electromagnetic field).[101] Maxwell had abstained from giving an opinion about the interaction between charged particles and the field. In order to account for this interaction, Lorentz added to Maxwell’s equations an expression for the force that a charged particle experiences from an electric and/or magnetic field (1892). Though Oliver Heaviside had derived this law already in 1889, it became known as the Lorentz force. Because its magnetic part is perpendicular to the motion of the charged particle on which it acts, that part does not perform work.
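
In modern vector notation, the Lorentz force on a particle with charge q and velocity v reads

\[
\mathbf{F}=q\,(\mathbf{E}+\mathbf{v}\times\mathbf{B}),
\]

in which the magnetic part qv×B is perpendicular to v; hence the power F·v equals qE·v, and only the electric field performs work on the particle.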

From 1892 onward Lorentz assumed the occurrence of two kinds of charged particles in metals: the relatively heavy ions, besides lighter and easily moving particles. After 1897 the latter could be identified with the electrons occurring in cathode rays (9.5). Between 1892 and 1904 Lorentz succeeded in explaining a large number of phenomena, including the normal Zeeman effect and the Faraday rotation of the plane of polarization of light in a magnetic field.

About 1900 the theory of Lorentz and others appeared to be very promising, reviving the idea of the unity of all natural forces. One hoped to be able to reduce all properties of matter (including inertia, i.e., the mass of atoms and electrons) to electromagnetic interaction.[102] This view was reinforced by Walter Kaufmann’s discovery (1902) that the mass of fast electrons depends on their speed. Some physicists attempted to reduce the electron’s mass to an electric interaction of the electron with itself. The dualism of force and matter, with particles characterized by their mass, was replaced by a new dualism of field and matter, now with particles characterized by their charge. It appeared that the mechanical world view could be replaced by an electromagnetic one, in a synthesis of all physical and chemical fields of science separated during the eighteenth century, apart from gravity.

Special relativity

However, the euphoria did not last long. The electromagnetic world view did not survive early twentieth-century developments. In 1905 Albert Einstein published two papers on relativity.[103] The first, called ‘On the electrodynamics of moving bodies’, was motivated by Maxwell’s theory. Einstein observed that according to the electromagnetic theories prevalent on the European continent, the motion of a magnet with respect to a conductor should be sharply distinguished from the reverse motion of a conductor with respect to a magnet, although the observable effect (an electric current in the conductor) is in both cases the same.

Next, without mentioning any specific experiment, Einstein pointed out that all attempts to determine the motion of the earth with respect to the ether had been in vain. He proposed the principle of relativity: the laws of electrodynamics and optics should be valid for (and have the same mathematical form in) all inertial systems. After analysing the method of synchronizing clocks at different positions, he postulated that the speed of light is the same in all inertial reference systems, making it the universal unit of velocity. As a consequence, between two inertial systems moving with respect to each other, spatial and temporal relations are transformed together, not separately. In a quite simple way, he derived the transformation formulas found earlier by Lorentz and others. He demonstrated that space contraction and time dilation in moving systems do not have an electromagnetic origin, but a kinetic one, making the molecular hypothesis of Lorentz and FitzGerald superfluous. This fate also befell the luminiferous ether.

Einstein showed that the mass of a moving body depends on its speed, in accordance with Kaufmann’s experiments. In his second and much shorter paper on relativity, he argued that the inertia of a body depends on its energy content, leading to the famous formula E=mc², expressing the equivalence of the mass (m) and the total energy (E) of a physical body.

This totally new result was soon confirmed by the measurement of the so-called mass defect of atomic nuclei. The mass of a nucleus is less than the sum of the masses of its constituent parts. The difference (the mass defect) corresponds to the binding energy of the nucleus. In atoms and molecules this defect is too small to be detected, but in nuclei it is relatively large.
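
As a worked illustration (with standard textbook values, not taken from the text): a helium-4 nucleus consists of two protons and two neutrons, whose combined mass exceeds the mass of the nucleus by about 0.030 atomic mass units, corresponding to a binding energy of

\[
E_b=\Delta m\,c^2\approx 0.030\ \mathrm{u}\times 931.5\ \mathrm{MeV/u}\approx 28\ \mathrm{MeV},
\]

nearly one percent of the nuclear rest energy, whereas chemical binding energies amount to no more than a few electron volts per atom.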

Now even the concept of mass was unified with that of energy, but most arguments in favour of the electromagnetic world view evaporated.

Neither Maxwell’s nor Einstein’s theory said anything about the interaction of radiation and matter (6.5).


[1] Peregrinus 1269.

[2] Gilbert 1600. See Burtt 1924, 163-166; Roller and Roller 1948, 547-555; Roller 1959; Dijksterhuis 1950, 430-436 (IV: 172-182); Butterfield 1949, 141-145; Hooykaas 1971, 110-112; Lindsay 1968, 131-142; Hesse 1961, 86-91; Heilbron 1979, 169-179.

[3] Galileo 1632, 406.

[4] Dijksterhuis 1950, 435 (IV: 180).

[5] Bacon 1620; Heilbron 1979, 175.

[6] Gilbert 1600, 167-168; Dijksterhuis 1950, 435 (IV: 180); Lindsay 1968, 135, 137.

[7] Duhem 1906, 10-14.

[8] Scott 1952, 188-193.

[9] Descartes 1647, 278-305.

[10] Roller 1959, 127.

[11] Heilbron 1979, 175.

[12] Gilbert 1600, 74-97; Heilbron 1979, 169-179.

[13] Roller and Roller 1948, 547; 1953, 354.

[14] Heilbron 1979, 275.

[15] Heilbron 1979, 182-183.

[16] Heilbron 1979, 229-249; Roller and Roller 1948.

[17] Hackman 1978.

[18] Dibner 1963; Hackman 1978.

[19] Dibner 1963.

[20] Franklin 1751-1754.

[21] Hauksbee published his work in 1708-1709, Gray in 1731-32 and 1735-36, all in the Philosophical Transactions of the Royal Society.

[22] Du Fay 1733-34. See Roller and Roller 1948, 581-590; Cohen 1956, 371-376; Whittaker 1910, 43-44; Feather 1968, 8-11; Hackman 1978, 61-63; Heilbron 1979, 250-260; Bycroft 2013.

[23] Faraday 1839-55, I, 76-109; Whittaker 1910, 175; Cohen 1956, 288-290; Shapere 1973, 519.

[24] Heilbron 1979, 260.

[25] Roller and Roller 1948, 591-608; Cohen 1956, 44-53, 57-59.

[26] Hackman 1978, 90-103; De Pater 1979; Heilbron 1979, 310-323.

[27] Heilbron 1979, 416-421.

[28] Franklin 1751-1754; Cohen 1941; 1956; 1990; Riskin 1998.

[29] Heilbron 1979, 330.

[30] Donovan 1993, chapter 9.

[31] Cited by Gower 1973, 321-322.

[32] Newton 1687, 193-213.

[33] Kant 1786, 56-59.

[34] Gillmor 1971, 192-194; Whittaker 1910, 59.

[35] Roller and Roller 1948, 541-639, 611-614; Whittaker 1910, 53-54; Feather 1968, 11-19; Heilbron 1993, 81.

[36] Faraday 1839-1855, II, 279-284.

[37] Newton 1687, 193.

[38] Priestley 1767-1775, II, 372-376.

[39] Maxwell 1873, I, 80-86.

[40] Gillmor 1971, 194-198.

[41] Roller and Roller 1948, 608-611; Heilbron 1979, 458-489.

[42] Newton 1687, 414; Tricker 1965, 3-4; Whittaker 1910, 55-57; Gillmor 1971, 140-150, 175-182, 188-190, 210-214; Palter 1972; Feather 1968, 57-68; Heilbron 1979, 87-97; 1993, 66-72.

[43] Roller and Roller 1948, 608-611; Heilbron 1979, 458-489.

[44] Millikan 1917; Franklin 1981.

[45] Gillmor 1971, 176-181, 193-194, 214-216.

[46] Gillmor 1971, 175-176.

[47] Gillmor 1971, chapters 5 and 6; Whittaker 1910, 57-60; Roller and Roller 1948, 614-621; Heilbron 1979, 465-477; 1993, 66-72, 75-81.

[48] Maxwell 1873, I, 80-83; Aykroyd 1935; Dorling 1974.

[49] Gillmor 1971, 191-192; Roller and Roller 1948, 621-62; King 1963.

[50] Bridgman 1927, 131-133.

[51] Fulcher, Telljohann 1976.

[52] Hoppe 1884, 330-339; Whittaker 1910, 60-66; Crosland, Smith 1978; Caneva 1978.

[53] Whittaker 1910, 61.

[54] Faraday 1839-55, II, 279-284.

[55] Maxwell 1861-1862, 499-500.

[56] Pancaldi 1991; Heilbron 1991.

[57] Sutton 1981.

[58] Caneva 2005, 176-190.

[59] Oersted, cited by Berkson 1974, 35.

[60] Hoppe 1884, 205-217, 233-237; Whittaker 1910, 82-88; Brown 1969; Tricker 1965; Frankel 1979.

[61] Hesse 1961, 216-218; Tricker 1962, 453-468; 1965, 98-110.

[62] Hoppe 1884, 229-233.

[63] Gillmor 1971, 218; Williams 1962.

[64] Maxwell 1873, II, 175-176.

[65] Jungnickel, McCormmach 1986, I, 141.

[66] See Williams 1962; 1973, 3-22; Duhem 1906, 195-200; Tricker 1965, 32-36, 155-161; Feather 1968, 91-99.

[67] Ampère 1826.

[68] Whittaker 1910, 305-306; Schonland 1968, 37; Jungnickel, McCormmach 1986, II, 28.

[69] Hoppe 1884, 426-507; Whittaker 1910, 198-207; Jungnickel, McCormmach 1986, I, 140-146.

[70] Jungnickel, McCormmach 1986, I, 148-152.

[71] Walker 1936.

[72] Hoppe 1884, 203-204; King 1963; Brown 1969.

[73] Ohm 1827; 1892; Hoppe 1884, 250-259; Whittaker 1910, 90-93; Shagrin 1963; Heidelberger 1980; Jungnickel, McCormmach 1986, I, 51-58.

[74] Fourier 1822; Herivel 1975, 149-208; Friedman 1977; Wise 1979, 52-58; Truesdell 1980, chapter 4.

[75] Hoppe 1884, 345; Whittaker 1910, 224-226; Shagrin 1963, 545-546.

[76] Hoppe 1884, 339-345.

[77] Thomson 1884, cited by Brush 1976, 580.  

[78] Whittaker 1910, 94-169; Schaffner 1972; Doran 1975.

[79] Hertz 1892; D’Agostino 1975; Jungnickel, McCormmach 1986, II, 86-92; Buchwald 1994.

[80] Newton, Letter to Bentley, cited by Thayer (ed.) 1953, 54; see Faraday 1855, 532, 570-571.

[81] Berkson 1974, 108-112.

[82] Whittaker 1910, 219-224; Duhem 1906, 69-75, 80-85; Feather 1968, 121-145; Berkson 1974, 136-139; Doran 1975; Moyer 1978.

[83] Faraday 1839, 322-343; Maxwell 1873, II, 195-198; Tricker 1966, 33.

[84] Maxwell 1873.

[85] Maxwell 1855-1856.

[86] Maxwell 1861-1862.

[87] Maxwell 1873, I, 69. See O’Rahilly 1938, 89-90.

[88] Chalmers 1973a.

[89] O’Rahilly 1938, 77-101; Bromberg 1967; 1968; Siegel 1975; 1986.

[90] Maxwell 1864-1865.

[91] Maxwell 1864-1865, 563-564.

[92] Bork 1963; Berkson 1974, 197-199.

[93] Duhem 1906, 93-99; Turner 1955; 1956; Hesse 1963; 1974; Kargon 1969; Chalmers 1973a; Achinstein 1991, 207-232.

[94] Wise 1979, 58-60.

[95] Maxwell 1855-1856, 155-156; 215-229.

[96] Maxwell 1861-1862, 452.

[97] Maxwell 1864-1865, 563-564.

[98] Hertz 1892; D’Agostino 1975.

[99] Jaffe 1961; Livingston 1973.

[100] H.A. Lorentz, Over de theorie der terugkaatsing en breking van het licht (On the theory of reflection and refraction of light, 1875).

[101] Lorentz 1909; Whittaker 1910, 386-428; Goldberg 1969; Hirosige 1969; McCormmach 1970a; 1970b; Zahar 1973; Berkson 1974, 256-290; Doran 1975; Miller 1980, chapter 1, 243-245; Jungnickel, McCormmach 1986, II, 227-238; Darrigol 1994; Kragh 1999, 9-10, 108-111; Seth 2004.

[102] Jammer 1961, chapter 11.

[103] Einstein 1905a; 1905b. See Miller 1980; Pais 1982; Jungnickel, McCormmach 1986, II, 244-247; Darrigol 1996.


Chapter 5

The solution of problems

5.1. The various functions of the statements in a theory

The preceding chapters discussed the emergence of four mutually irreducible principles of explanation – the quantitative, the spatial, the kinetic, and the physical ones. The next would be life, as a biological principle of explanation. It has never been connected with classical physics (except for the application of microscopes), but this principle plays a part in the study of theories, their structure and functioning. Life includes generation, descent, kinship, growth, and metabolism. For theories, this means the growth of knowledge, the transformation of data, the generation and solution of problems. These are no more and no less than analogies, logical kinships. Life also includes the internal differentiation of the parts of a living organism, according to their internal functioning. Likewise, in a theory, defined as a deductively ordered set of true statements, propositions having various functions can be distinguished.

Chapters 1 and 2 discussed four stages in the use of theories, to wit, identification (a planet is a wandering star, the morning star is identical with the evening star); description (of the planetary motion through the zodiac, of backward motion); prediction (of solar or lunar eclipses); and explanation (of retrograde motion, for example). The present chapter deals with the fifth stage, the solution of problems.[1]

This chapter starts with a discussion of the various functions of the statements in a theory, needed for the solution of problems (5.1). Next it explores Thomas Kuhn’s notion of normal science, the problem solving period of any science (5.2). The function of theories is not only to solve problems, but also to generate new problems, contributing to the growth of knowledge (5.3). Kuhn’s views about crises and revolutions appear to be at variance with the history of science (5.4). The solution of problems generating new problems leads to the phenomenon of shifting problem situations in the history of a field of science like optics (chapter 6).

Various kinds of statements

Neither the definition of a theory given earlier (1.3), nor the ensuing criterion of whether a statement does or does not belong to a theory, differentiates between the functions of the statements which form the elements of a theory. But we already found occasion to distinguish between universal or law statements, and existential or factual statements (1.5). The present section will distinguish statements according to their function in the deductive process carried out in the theory. Whether a statement or proposition is a definition, an axiom, a presupposition, a lemma, a datum, a theorem, a problem, or a hypothesis is mostly determined by the theoretical context, and may be different in another context. As in a living organism, the whole determines the functioning of the parts.

A theory can only be an instrument for the solution of problems if it contains a flexible variety of statements. First, it should have a suitable set of concepts at its disposal. Because concepts and properties are not statements, they cannot be elements of a theory, which is a set of statements. Therefore the solution of a problem may start with a few definitions. By a definition a concept or property is introduced into a theory, often in order to specify a new problem. Usually, a definition does not fully determine the extension and intension of a concept, because these are to be developed in the theory itself. In this respect, definitions are theory-dependent, or theoretical. Definitions concern general or class concepts, like planets, as well as particular ones, like the moon. A peculiar kind of definition is the operational one (1.4).

Axioms are law statements, supposed to be true within the context of, but not provable by, the theory (8.1). In Newton’s theory of gravity, the inverse-square law is the fundamental axiom. In Newton’s theory of planetary motion, the central axiom is the assumption that the gravitational force is the only one acting between the bodies of the system (even though Newton was aware of other forces acting between the celestial bodies, like the magnetic one). No theory can function without one or more axioms, without law statements.[2] Every theory is characterized by its axioms.[3] Hence Newton’s theory of planetary motion differs from Descartes’, because each starts from different axioms, but Kant’s theory is only an extension of Newton’s. However, the same theory can be axiomatized in various ways, either because axioms and theorems can exchange roles within a theory, or because the same physical law allows of various mathematical formulations.[4] For the solution of a new problem, one usually accepts the axioms of the theory to be applied, choosing the most useful formulation.

Presuppositions are law statements, supposed to be true, which are borrowed from other theories, depending on the problems one encounters. In Newton’s theory of gravity, the presuppositions are the laws of mechanics (i.e., the three laws of motion and a number of theorems derived from these), and several branches of mathematics. Presuppositions are indispensable for a theory, but not characteristic of it. They are exchangeable. One set of presuppositions may be exchanged for another without changing the theory significantly. Newton’s theory of gravity does not change very much if Newton’s geometrical methods of proof are replaced by those of the calculus; or if the third law of motion is replaced by the laws of conservation of energy and linear momentum; or if Newton’s version of Kepler’s second law (the area law) is replaced by the law of conservation of angular momentum. A theory cannot function without these presuppositions, but is not severely tied to them.

Data are factual statements, often derived in other theories, from instrumental observations, or from experiments. Data are taken to be true within the context of the theory. The truth of a datum strongly depends on this context. In one theory a statement may be a problem, or the solution of a problem, in another theory it may be a datum, accepted truth. Outside the context of a theory, a datum may be considered false, or only approximately true. Even in the same theory, one often uses successively, but never simultaneously, data contradicting each other. For instance, in Newton’s theory of planetary motion, planets are first considered to be points, next to be perfect spheres, although both statements are false and contradictory (1.3). In theories of motion, data often consist of the positions and velocities of the moving bodies at a given instant. For this reason, data are often called initial or boundary conditions. A particular set of initial conditions may constitute a model.

Even more than presuppositions, data are exchangeable in a theory.[5] One may even exchange a datum for its logical negation. But data are indispensable for the solution of most if not all problems. The choice of the data depends on the problem at hand.

Theorems or propositions are factual or law statements which are derived (deduced) within the theory, sometimes in order to solve a particular problem, if the theorem is not already the solution.

Hypotheses have the same function as axioms or data. Nowadays philosophers do not always care to distinguish between, for instance, axioms and hypotheses, but during the Copernican revolution, hypotheses had a lower status than they have now (chapter 8). The truth of a hypothesis was not taken for granted.

A hypothesis may be a tentative solution of a problem, or a statement from which a solution can be derived. It is introduced to test its fertility, its problem-solving capacity. In a logical sense, every axiom, theorem, or datum is a hypothesis,[6] but this radical view, though very common, is not very useful if one wants to distinguish between the various functions of the elements of a theory. Within the context of a theory it is possible and fruitful to distinguish between statements whose truth is not doubted by the users of the theory, and statements whose relative truth value is in question. Hence, we take hypotheses to be tentative, temporary, and often speculative elements of a theory.

Problems are not statements

Problem solving may not be exclusively human, but only people invent and apply theories in this activity.[7] Problems have a logical structure different from concepts, statements, or theories. Apparently, they have analogical biological characteristics besides logical ones. Problems are born, they flourish and bear fruit, they generate new problems, and they perish. Only the solution of a problem is a statement, a theorem, or a set of statements, in the context of an existing or a new theory. The simplest form of a problem is ‘to prove the following statement’, which is the form of an instruction. However, in many cases a problem can only be formulated in this way if its solution is known, or if a tentative solution is stated. Usually, unsolved problems will have a more open form, like that of a question.

As long as a problem is not solved, it cannot be decided whether its solution satisfies the criterion according to which a statement does or does not belong to the theory. Hence it is not always obvious whether a theory should be able to solve a problem, whether the problem belongs to the theory. For instance, before 1577, comets were considered meteorological phenomena, occurring at the periphery of the sublunary spheres, and problems concerning comets belonged to meteorology. Only when Tycho Brahe demonstrated their distance to be larger than the moon’s did comets become celestial phenomena, and their motion a problem for astronomers.

A problem may be quite meaningless outside the context of a certain theory. For instance, the problem of determining the relative distances of the planets to the sun from the observed retrograde motions, which is solvable in Copernicus’ theory, is meaningless in Ptolemy’s theory, in which the relation of the retrograde motion to the sun’s motion is merely accidental.

Input-output scheme

The above classification of statements is probably not exhaustive, but it suffices for our purpose, to proceed with our analysis of the structure and functions of a theory. Implicitly, the core of a theory was distinguished from its periphery. A theory has a relatively small nucleus of statements characteristic of the theory: its axioms and a small number of theorems. This nucleus is surrounded by a vast cloud of other statements: an unspecified number of presuppositions, a potentially infinite amount of data, and an unknown number of problems.

Like a living system, this scheme has an input-output character. The input consists of a problem together with some data considered to be necessary to solve the problem. The output (a fruit of the theory) consists of the solution of the problem. In the above example, the input is the problem to determine the relative dimensions of the planetary orbits, and the data are the observed sizes of the retrogradations. The output, the solution of the problem, consists of data, which can be used as input values for another theory, for instance, Kepler’s model of the solar system. Hence, if a theory functions well, problem solving contributes to the growth of scientific knowledge.

Data, presuppositions, and even problems may have their origin in other theories, and the solutions of problems may be useful in still other theories. Hence, each theory forms part of a network of theories. Stripped from data, presuppositions and problems, a theory is no more than a deductive scheme.

5.2. Normal science

Both Karl Popper and Thomas Kuhn have stressed that theories should solve problems, and thus contribute to the growth of knowledge.[8] Popper points to the biological character of problem solving, and Kuhn has become famous because of his philosophy of paradigms and normal science.

According to Popper, the main if not only method of solving problems is trial and error, an idea drawn from Darwin’s theory of evolution:[9]

‘Human thought tends to try out every conceivable solution for any problem, with which it is faced … by the method of trial and error. This, fundamentally, is also the method used by living organisms in the process of adaptation.’[10] ‘If the outcome of a test shows that the theory is erroneous, then it is eliminated; the method of trial and error is essentially a method of elimination. Its success depends mainly on three conditions, namely, that sufficiently numerous (and ingenious) theories should be offered, that the theories offered should be sufficiently varied, and that sufficiently severe tests should be made. In this way we may, if we are lucky, secure the survival of the fittest theory by the elimination of those which are less fit.’[11]

(When speaking of theories, Popper probably means hypotheses.) Popper’s view is doubtless attractive because of its radical simplicity. But it is hardly probable that any professor would accept a student who literally worked on a scientific problem according to Popper’s recipe. Trial and error is the everyday way of solving problems rather than a scientific, systematic one.

Thomas Kuhn identifies normal science with the problem-solving stage in the history of any field of science, which follows the acceptance of a paradigm.[12] As examples, Kuhn points to Aristotle’s Physics, Ptolemy’s Almagest, Copernicus’ Revolutionibus, and Newton’s Principia and Opticks.[13] Kuhn’s concept of a paradigm was at first quite ambiguous. Masterman identified more than twenty different meanings of the concept of paradigm in Kuhn’s work, and his views changed considerably in the course of time.[14] However, in his Postscript – 1969, he stated that the term paradigm is used in two different senses:

‘On the one hand, it stands for the entire constellation of beliefs, values, techniques, and so on shared by the members of a given community. On the other, it denotes one sort of element of that constellation, the concrete puzzle-solutions which, employed as models or examples, can replace explicit rules as a basis for the solution of the remaining puzzles of normal science.’[15]

I have some doubt with respect to Copernicus’ Revolutionibus. Nobody tried to solve problems according to Copernicus’ methods, except Kepler, and he failed. In Kuhn’s first sense, too, Revolutionibus hardly counts as a paradigm. It was the belief that the earth moves which made someone a Copernican, but most Copernicans did not bother to read Copernicus’ book beyond the first chapter. However, the other four examples mentioned above certainly satisfy Kuhn’s definition of a paradigm, at least in the second sense. For many centuries, Aristotle’s Physics determined the character of an explanation. Ptolemy’s Almagest showed how to calculate planetary motions, Newton’s Principia how to conduct theoretical science, and his Opticks how to do experimental work in physics and chemistry.

Yet these examples cast doubt on Kuhn’s suggestion that a paradigm has an exclusive character, such that anybody who does not accept it finds themselves outside the scientific community. During the Middle Ages, Aristotle’s philosophy was continually challenged by Platonists, and Ptolemy’s heterocentric system by the Aristotelian homocentric system. Copernicans warred simultaneously with adherents of Ptolemy and of Brahe. Newton’s experimental philosophy competed with Cartesian mechanism.

Even a single scientist may work simultaneously according to several paradigms. For example, between 1600 and 1606, Kepler tried to solve the problem of the motion of Mars according to Ptolemy’s, Copernicus’, and Brahe’s methods. He failed, but his own solution (the laws of planetary motion) did not become a paradigm until Newton proved them.

Textbook science

Partly, normal science is the practice of science in an educational context.[16] After the acceptance of a paradigm, textbooks no longer differ with respect to their contents. At most they differ in the choice of examples, their didactics, or their layout. Textbooks present the accepted views of the scientific community. Alternatives are not mentioned at all, or at best as examples to be avoided. The structure of a textbook is authoritative. The theory, the only true doctrine, is presented as incontestable truth. This corresponds to Kuhn’s interpretation of a paradigm as defining a scientific community.

At the end of each chapter, problems are given in order to train the student to solve problems according to the method suggested by the book. Kuhn stresses that this is the only way for a student to understand the meaning, for instance, of Newton’s force law. Only if a student has learned enough may he try to solve problems which have never been studied before. Many students never arrive at this stage. By solving problems, a student proves his ability to become a scientist. According to Kuhn, normal science is largely concerned with puzzle-solving.[17]

However, normal science is more than science in an educational context. It also means the development of a theory, without doubting its fundamental axioms, its nucleus, as defined above. This is done by solving problems. Usually, problems cannot be solved by a given set of axioms, theorems, presuppositions, and data. New theorems have to be derived, presuppositions must be developed, data criticized, and new data must be found. Collecting data is only meaningful in the context of a developing theory, with the aim to solve a problem, and each problem needs its own set of data.

Hence, normal science is by no means as uncritical as is suggested by the textbook metaphor, even if the nucleus of the theory remains unquestioned. Therefore, the criticism levelled by Popper and others concerning the supposed uncritical character of normal science seems to be unwarranted.[18]

The theory-dependence of problems

Whereas a theory is characterized by its axioms, a field of scientific research is characterized by its problems. According to Kuhn a social group of scientists is determined by their sharing a paradigm.[19]

It can hardly be denied, however, that astronomy is a field of science, practiced by astronomers who during important historical periods were guided by widely different paradigms – for instance, the Copernican versus the Aristotelian, or the Cartesian versus the Newtonian. Galileo’s Dialogue (1632) shows conclusively that adherents of competing paradigms were very well able to communicate their problems and to discuss each other’s solutions.

Because Kuhn assumes that a paradigm not only determines how problems should be solved, but even determines the problems themselves, he thinks that adherents of competing paradigms cannot even understand each other’s problems.[20] However, problems are not completely independent of a theory. The same problem may appear differently in various theories. In Ptolemy’s theory it is a problem to describe retrograde motion with the help of fictitious deferents and epicycles. Copernicus explained retrograde motion as a logical consequence of his theory. The non-observability of stellar parallax is a problem in Copernicus’ theory, but not in Ptolemy’s; Ptolemy, however, would have had no trouble understanding it.

Anomalies

Problems having an easy solution in one theory may be insoluble in an alternative theory. For example, Aristotle’s theory of crystalline spheres surrounding the earth explains why the moon always turns the same face to the earth. No Copernican theory could explain this. Descartes’ theory of vortical motion (3.4) easily explains why all planets turn around the sun in the same direction, and more or less in the same plane. Moreover, they rotate about their axis in the same direction, and this also applies to the sun and the moon.[21] This problem has never been solved in Newton’s theory.

Thus, within a theory one encounters unsolved problems besides solved problems and anomalies. An unsolved problem is called an anomaly if it can be solved by a competing theory.[22] (For Kuhn, an anomaly is a problem which defies resolution in the context of an accepted paradigm.) Hence, anomalies are theory-dependent. Einstein’s general theory of relativity (1916) is able to explain the perihelion motion of Mercury. Before, it was an unsolved problem in Newton’s theory; since 1916, it has been an anomaly in that theory. For a discussion of the merits of rival theories, anomalies are relevant. A theory containing anomalies is suspect, and the removal of an anomaly counts as a triumph over competing theories.

Christiaan Huygens

Christiaan Huygens contributed little to the revolutionary character of seventeenth-century science.[23] Yet he was considered the foremost mathematician of his generation, and an important physicist and astronomer. He may be considered a prototype of a normal scientist. Like Kepler, Galileo, and Descartes before him, he solved many problems, but he does not belong to the architects of Copernicanism and mechanism, to which views he adhered.

As a mathematician, Huygens followed Descartes in his analytical geometry, and Torricelli in calculations of centres of gravity. Through Fermat and Pascal he became interested in problems of chance.[24] In all three fields he excelled, but he contributed nothing to the development of the calculus, which he left to his younger contemporaries, Newton and Leibniz.

As an astronomer, he improved the telescope by the invention of a new ocular and the micrometer. In 1655 he discovered the first moon of Saturn. He explained the unusual appearance of Saturn, first observed by Galileo in 1610, as being caused by a ring around this planet. His Systema Saturnium (1659) was the most important work on telescopic astronomy since Galileo’s.[25]

As a physicist, Huygens discovered the laws of pendulum motion and of centrifugal acceleration in uniform circular motion, and he explained the double refraction of light in Iceland crystal. In order to achieve this, he developed a theory of light from an existing theory of Descartes (6.2). He also improved Descartes’ laws of collision, realizing that the law of conservation of quantity of motion can only be valid if this quantity has direction as well as magnitude (3.4). Moreover, he applied the principle of relativity to the problem of impact.

As a designer Huygens became famous for the pendulum clock (1657) and the spring balance clock (1674-1675), not because he discovered any new principle, but because he was the first to bring various principles together and to build a working instrument. Initially the pendulum clock was intended as an instrument for the determination of longitude at sea, for which it is obviously not well suited. This drawback stimulated Huygens to devise the spring balance, which lies at the foundation of the ship’s chronometer.[26] He published his results in Horologium oscillatorium (1673).

Huygens combined experimental with theoretical work, for instance in his investigation of Descartes’ theory of gravity (3.4). His great merits as a versatile and fertile problem solver were recognized by Jean-Baptiste Colbert, the minister of Louis XIV, who wanted Paris to become the cultural and political capital of the world. He persuaded Huygens to become the leader of the Académie Royale des Sciences, founded in 1666. With some interruptions, Huygens worked there till 1681 on his own research, stimulating others.

Although he may have had few revolutionary ideas, Huygens was by no means a slavish imitator of others. He adopted ideas from Descartes, but criticized these as well, developing them in his own way. Newton’s work he admired, but he did not share his ideas. It may be surmised that Huygens was too critical to be original. Yet this is typical of normal science, as I understand it. From Kuhn’s description, normal science can too easily be judged uncritical, bound up as it is with a paradigm. But considering Huygens a prototype of a normal scientist implies that slavish adherence to a paradigm is not necessarily prominent in normal science. Mature normal science should be distinguished from textbook science, described above. Typical of normal science is the central position of problem solving, as well as a critical view of the ideas and theories suggesting solutions. Normal scientists worthy of the name should feel free to make a choice between the various paradigms available, even if they lack the creativity to invent a new paradigm, a new method of solving problems.

5.3. The generation of problems

The fertility of a theory can be judged by its capacity to solve problems, as well as by its capacity to generate new problems.[27] Copernicus’ theory was accepted by many people not because it solved existing problems, but because it challenged creative people to solve new problems, connected with the idea of the moving earth.

In general, it is not the theory itself, but the solution of a problem with the help of the theory which generates a new problem. Of course, it is neither a theory nor a solved problem, but some scientist who invents a new problem, suggested by the theory and its consequences. The merit of great scientists like Kepler, Galileo, Descartes, and Newton is not primarily the capability to solve problems, but to pose the right questions at the right time: the recognition of relevant problems.[28]

The question of which came first, a problem or a theory, is like the question of the chicken and the egg. One of the oldest astronomical exercises, to explain retrograde motion, only became a problem after the acceptance of Plato’s theory, which assumed all celestial bodies to move uniformly in circular orbits. Copernicus’ solution of this problem generated the problem of stellar parallax (2.5). According to his theory, not only the planets but all stars should display an apparent motion caused by the annual motion of the earth. Only in 1838 did Friedrich Bessel succeed in actually observing this. Until then, Copernicus and his adherents had to explain why parallax was unobservable, by pointing to the limited accuracy of observation results, and by the ad hoc assumption that the stars are very far away, compared to the distance from the earth to the sun and even Saturn.

Copernicus’ theory generated far more problems than it solved, especially because the earth’s motion cannot be experienced. Would not the air and the clouds fall behind the earth? Would not a cannon ball fired to the west fly much farther than a ball fired to the east? Would not an object falling from the top of a tower show a deflection to the west, instead of falling vertically down? Would not people at the equator be hurled off the earth?

Many of these problems were faced by Copernicus (and even by Oresme, in the fourteenth century). Copernicus stressed that water and air rotate together with the earth.[29] He refuted Ptolemy’s argument that a revolving earth would break apart, by stating that the earth’s rotation is natural, a consequence of its spherical shape. Hence it cannot have any violent effects, for a natural motion destroying the nature of a moving body would be a contradiction in terms.[30] Later on, Bruno, and especially Galileo, took pains to refute the arguments against the motion of the earth.[31]

The Copernican theory generated the problem of proving that the earth is not fundamentally different from the other planets. Galileo used his discovery of the phases of Venus to show that the earth, the moon and Venus share the property of reflecting the light of the sun. For the moon, this property was generally acknowledged. The observed phases of Venus are similar to the phases of the moon, and can only be explained by assuming Venus to be a light reflecting body instead of a primary source of light like the sun or the fixed stars.

Next, Galileo argued that the earth also reflects the light of the sun. He did so by pointing to the so-called secondary light of the moon, occurring shortly before or after new moon, when alongside a small crescent the dark part of the moon is perceptible.[32] Galileo also explained why this phenomenon is not observable at first or last quarter.

Descartes’ identification of space and matter implied the existence of only one possible mode of interaction, action by contact. Combined with his view on the indestructibility of motion, this led him to an investigation of the problem of impact (3.4).

Shortly after its foundation (1662), the Royal Society of London organized a contest about the problems of impact, and the prize was won by John Wallis and Christopher Wren. Huygens too submitted a solution. In fact, each of them solved a different problem. Huygens proceeded along the road of Descartes, applying the law of conservation of motion and using a coordinate transformation. This technique, invented by Huygens, depends on the principle of relativity of motion, and later became a powerful instrument for the solution of many related problems.

Wallis and Wren applied the law of conservation of vis viva. Wallis studied inelastic impact, Wren and Huygens elastic collisions.

Newton

In 1685 Newton was challenged by Edmund Halley to solve the problem of whether an elliptical orbit would agree with an inverse-square law concerning the attraction of the planet by the sun.[33] Newton solved this problem, but the ensuing theory generated various new problems.

It generated the problem of the disturbance of the planetary motion by other planets. It generated the problem whether a planet moves around a resting sun, or whether both sun and planet move around their common centre of mass. The theory generated the problem whether the gravitational force exerted by a spherical body of given mass depends on the body’s radius.[34] It generated the problem of the influence of the sun and the moon on the tides,[35] and of the sun on lunar motion. It generated the question of whether the motion of a falling apple is of the same nature as the moon’s motion. It generated problems concerning the shape of the earth and its influence on the acceleration of falling bodies, an acceleration also influenced by the earth’s rotation. It led to the question whether gravity and magnetism are the same force.

Between 1685 and 1687 Newton solved most of these and many other problems, of which the first book of Principia (1687) mentions 48, and the third 22. The most obstinate problem, concerning the lunar motion, occupied Newton till shortly before 1700, when he gave it up. It was only approximately solved several decades after his death.

Between 1704 and 1717, Newton suggested an increasing number of problems in the queries appended to successive editions of Opticks (6.3).

Its problem-generating capacity makes a theory an instrument for the investigation of nature. Generating and solving problems constitutes a great deal of the history of science. History of science is the history of its problems.[36]

Because a theory generates new problems it may predict new phenomena, for example those occurring in experiments never tried before. If the result agrees with the solution of the problem, this reinforces a theory (though it does not prove it), especially if the experimental result could not be expected without the use of the theory.[37] This somewhat dramatic kind of reinforcement of a theory does not occur as frequently as one might expect. The most obvious example during the Copernican era is Pascal’s prediction that the barometric pressure depends on height.

5.4. Crisis and revolution

According to Thomas Kuhn, a period of normal science ends in a crisis, induced by a persistent anomaly or an increasing number of anomalies, problems which cannot be solved according to the accepted paradigm.[38] Eventually, a new paradigm replaces the old one, and this constitutes a scientific revolution. This view easily leads to a proliferation of revolutions and a devaluation of the meaning of this word. In this section I offer some arguments against Kuhn’s theory.

The acceptance of Copernicanism and the emancipation of science

For more than one reason Copernicus’ theory was incredible, at least during the sixteenth century. One of the reasons was that it contradicted the generally accepted Aristotelian philosophy, including its physics, cosmology, and astronomy. For most philosophers, the slight advantage of the Copernican system in solving one or two astronomical problems did not offset the loss of a coherent system.[39] Perhaps this motivated the opposition against the heliostatic model by Martin Luther, Philipp Melanchthon and other German Protestant theologians, even more than the supposed conflict with the Bible, which only became significant when Giordano Bruno used the Copernican theory as propaganda for his heretical ideas.

Bruno was imprisoned in 1592 by the papal Inquisition. Shortly after the Jesuit cardinal Robert Bellarmine (who later played a part in the so-called Galileo affair) became Inquisitor, Bruno was convicted and burned at the stake in 1600, though not because of his Copernican views.[40] Nevertheless, after this event Copernicanism became suspect in the eyes of conservative Roman Catholic theologians.[41] Copernicus’ book was placed on the Index of prohibited books between 1616 and 1621, and from 1633 to 1822. Both periods started with events involving Galileo (8.3).

Sixteenth-century astronomers recognized Copernicus’ ability as a mathematician, but apart from Maestlin and Kepler, before 1600 no professional astronomer became a Copernican (2.5). Initially, the adherents of Copernicanism included only people who had reasons independent of astronomy to doubt Aristotelian science, like Ramus, Stevin, Gilbert, Benedetti, Galileo, Beeckman, and Descartes.[42] They considered Copernicus a welcome ally in the struggle against the reigning Aristotelian philosophy. They became Copernicans not on the strength of astronomical arguments, but because Copernicus’ theory supplied them with an argument against Aristotelian cosmology. But it is not sufficient to show that a system like Aristotle’s has faults. It is only possible to combat such a system if there is an alternative, such as was supplied by mechanical philosophy (3.3, 3.4).[43]

Copernicus and Kepler did not have a new alternative at their disposal. They had to have recourse to an older alternative, Platonism, Pythagoreanism, or both. This move was not very effective, because in the course of centuries the superiority of Aristotle over Plato had been firmly established. Plato’s views do not constitute as firm an edifice as the Aristotelian one. Still, the renewed interest in Plato during the Renaissance must be viewed in the light of the growing resistance against Aristotelianism.

The historical significance of Descartes (and of Galileo before him) is that instead of appealing to an alternative ancient philosophy, he designed a new one, the philosophy of mechanism, although he was probably neither the first mechanist nor the most radical one (which may have been Spinoza). Descartes is generally considered to be the founder of modern philosophy. In the long run, the contents of his philosophy have not been very influential in physics. The typical Cartesian views on matter, quantity, space, and motion retarded rather than stimulated the development of physical theories. But the bare existence of an alternative to the previously superior Aristotelian philosophy worked as a liberating force.

If alternative, competing philosophies are available, each claiming exclusivity and universality, a normal scientist can afford to ignore both. During the seventeenth century the separation between science and philosophy began. It was completed only after circa 1800, after the work of Immanuel Kant. The emancipation of science from philosophy was strongly furthered both by Descartes and by Kant, contrary to their intentions.

But ultimately, it was not Cartesianism that became the philosophy of Copernicanism, but experimental philosophy. Not Descartes, but Newton wrought the synthesis of all anti-Aristotelian currents (7.1). This synthesis has philosophical aspects and backgrounds, yet its character is not philosophical but scientific. The Newtonian synthesis confirmed the emancipation of science from philosophy. Since Newton, the credibility of a scientific theory is no longer determined by philosophical arguments, but by its agreement with other scientific results, in particular those due to instrumental observations and skilful experiments.

The Newtonian synthesis constituted the moderate Enlightenment, represented by John Locke, Jean Le Rond d’Alembert, Voltaire, David Hume, and Immanuel Kant.[44] Besides Newtonianism, physicists in the eighteenth and nineteenth centuries developed several world views guiding their investigations. But they all accepted Newton’s results.

Crisis

We now arrive at an important point of difference with Kuhn’s views. According to Kuhn a crisis arises in a field of science if scientists working within the context of a certain paradigm are confronted with an increasing number of anomalies, of problems that cannot be solved by the methods of the paradigm.[45]

For example, Kuhn points to the crisis preceding the publication of Copernicus’ Revolutionibus (1543),[46] but this example does not tally with the historical facts.[47] Before Copernicus all experts considered Ptolemy’s theory quite satisfactory.[48] Copernicus himself was the first to signal a situation of crisis, but he was hardly unbiased. He had an obvious interest in putting the old theory in an unfavourable light. In the introduction to his Tabulae prutenicae (Prussian tables, 1551), based on Copernicus’ calculations, Erasmus Reinhold stated: ‘The science of celestial motions was nearly in ruins; the studies and works of (Copernicus) have restored it.’ Nevertheless he was not a Copernican.[49] His updated Prussian tables were better than the outdated Alfonsine tables (thirteenth century), but this was hardly due to the introduction of a heliostatic model, and Tycho Brahe found both unsatisfactory.

Nor was the publication of Newton’s Principia preceded by a crisis. Except for the conservatives, who held fast to Aristotelian physics, most educated people considered Cartesian physics satisfactory, promising, and acceptable. The criticism of Cartesian physics was primarily levelled by Newton himself, who again had an interest in putting his competitor in an unfavourable light.

Contrary to Kuhn, I argue that a crisis is often an effect of the introduction of a new fundamental theory, rather than its cause, because a new theory in general contradicts the accepted presuppositions.[50] A new theory makes it necessary to adapt the presuppositions. This evokes resistance, and as long as the presuppositions are not adjusted, their adherents are in conflict with those of the new theory.

This is obviously the case with Copernicus’ theory, which contradicted the most important presuppositions of his time. Hence, the initial response to Copernicus’ theory was negative. The first to accept Copernicanism already doubted the Aristotelian presuppositions, but the crisis became a fact only when Galileo’s astronomical discoveries (1609-1610) made the new theory a serious threat to Aristotelian philosophy. The great debate concerning the merits of the Ptolemaic system did not take place before 1543, as Kuhn would have us believe, but between 1610 and 1640.

Similarly, Kepler caused a crisis within Copernicanism by dropping the idea of uniform circular motion in favour of non-uniform elliptical motion. His results too were initially ignored, but when accepted, they occasioned a fundamental change in the presuppositions of planetary theory, for example, the introduction of the concept of a force as a cause of change of motion (3.5).

Newton’s theory of gravitation was not the effect of a crisis, but its cause. This crisis did not occur in the theory of gravitation, but in its presuppositions, mechanics and mathematics. In mechanics, the principle of action by contact in a plenum had to be replaced by action at a distance in a void. Newton’s theory led to the introduction of the integral and differential calculus, causing a crisis in mathematics. In order to avoid this crisis, Newton presented the proofs in his Principia in a geometric way. Mathematicians struggled with the foundations of the calculus until well into the nineteenth century.

Even the crisis leading to the disbandment of the Pythagorean brotherhood was caused by the theory leading to the Pythagorean theorem (3.1).

The uniqueness of the Copernican revolution

Without exaggeration, it may be said that Copernicanism caused so many crises that once the battle was over, nearly every presupposition was drastically changed. It is called a revolution because it resulted in the abandonment of virtually all preceding presuppositions.[51] It was to overthrow the entire scientific and philosophical establishment. Hardly anything of Aristotelian physics survived the Copernican revolution, and about the only reason to discuss Aristotelian physics (except for its own sake) may be to understand the rise of early modern science.

In the Middle Ages, scientific activity had an overwhelmingly logical character. Independent research was rarely undertaken, and scholars restricted themselves to a logical analysis of the ancient arguments. The prevalent impression was that everything worth knowing about nature could be found in the tradition. It is an essential part of the Renaissance spirit to become self-reliant, to start explorations independent of the tradition. The adventurous voyages of discovery in the fifteenth and sixteenth centuries did much to undermine the trust in ancient views. Many unheard-of things were discovered, and many time-honoured insights turned out to be wrong. Hence, people became more and more critical of medieval philosophy and science.

From their writings it becomes clear that the Renaissance scientists were well aware of the novelty of their works. The word new (nova) became commonplace – see Tycho Brahe’s De stella nova (1573); Gilbert’s De magnete … nova physiologia (1600); Kepler’s Astronomia nova (1609); Bacon’s Novum organum (1620); Galileo’s Discourses on two new sciences (1638); and many more. Descartes wanted to make a fresh start in science and philosophy. Newton set out to develop the results of his predecessors, standing on their shoulders. All this shows a new awareness, the awareness that science is a historical endeavour, a force in history.

Since the Copernican revolution, science is no longer static, but dynamic. It is directed to a continuous and progressive opening up of the lawfulness of reality. In physics and astronomy, neither before nor afterwards did a comparable break with the past occur. Even relativity theory and quantum theory are based on the results of the past, and are in part its consequences. The architects of twentieth-century physics always stressed the continuity with nineteenth-century classical physics. In this sense, the Copernican revolution is quite unique in physics.


[1] Ravetz 1971, 72; Laudan 1977, 121ff.

[2] Nagel 1961, 32.

[3] Bunge 1967b, II, 402.

[4] Bunge 1967a, 1967c.

[5] Bunge 1967b, I, 402.

[6] Bunge 1967b, I, 226.

[7] Bunge 1967b, I, 165.

[8] Popper 1963, chapter 10; Kuhn 1962.

[9] Popper 1959, 19, 70, 241-244, 288-289.

[10] Popper 1963, 312.

[11] Popper 1963, 313.

[12] Kuhn 1962, chapters 2-4; Kuhn 1970.

[13] Kuhn 1962, 10, 23.

[14] Masterman 1970; Toulmin 1972, 98-130; Suppe 1973, 135-151, 643-649.

[15] Kuhn 1962, 175.

[16] Kuhn 1962, 20, 187-191.

[17] Kuhn 1962, chapter 4.

[18] Popper 1970; Watkins 1970; see Feyerabend 1975, chapter 3; Lakatos 1978, I, 68-69.

[19] Kuhn 1962, 10.

[20] Kuhn 1962, 37.

[21] Galileo 1632, 261, 396.

[22] Laudan 1977, 17, 18, 26-30.

[23] On Huygens, see Dijksterhuis 1950, 405-419, 503-509; Bos et al. (eds.) 1980; Andriesse 1993; Gaukroger 2006, 420-430.

[24] Hacking 1976.

[25] Van Helden 1980, 150.

[26] A reliable chronometer suited for ships had to wait till John Harrison, circa 1760, constructed one.

[27] Popper 1963, 222. Laudan 1977, 108-109.

[28] Koestler 1959, 401.

[29] Copernicus 1512, 58, 63; 1543, 42-46 (I, 7-9); Koyré 1939, 132-135.

[30] Copernicus 1543, 43-46 (I, 8).

[31] Galileo 1632, first day. Koyré 1939, 133.

[32] Galileo 1610, 42-45. Galileo 1632, 67-69, 91-99.

[33] Cohen 1971, chapters 3-5; Westfall 1980, 402-404.

[34] Newton 1687, 415-416.

[35] Newton 1687, 478-484.

[36] Popper 1972, 170.

[37] Popper 1959, 269; 1963, 36; Lakatos 1978, I, 38-39; Grünbaum 1976.

[38] Kuhn 1962.

[39] Burtt 1924, 36-38; Feyerabend 1964; 1970; Clavelin 1968, 58-60; Galileo 1632, 56-57.

[40] Gaukroger 2006, 113-116.

[41] Koyré 1939, 136.

[42] Dreyer 1906, 345-360; Hall 1963, 18-20.

[43] Gaukroger 2006, 289.

[44] Gillispie 1960, chapter 5; Israel 2001; 2006; 2011; Gaukroger 2010, chapter 4.

[45] Kuhn 1962, chapters 6-8.

[46] Kuhn 1962, 68-69.

[47] Rosen 1984, 131-132; see the discussion in Beer and Strand (eds.) 1975, session 3, in particular Gingerich.

[48] Dijksterhuis 1950, 325.

[49] Koyré 1961, 94; Duhem 1908, 70-74.

[50] Laudan 1977, 14ff, 45ff, 88.

[51] Cohen 1985.


Chapter 6

Problem shifts in the history of optics

6.1. Medieval theories of vision

Chapter 5 argued that theories are instruments for solving problems and for generating new problems, contributing to the growth of knowledge. In the course of history, this means that new problems replace older ones. The attention of scientists shifts from one problem situation to another. This phenomenon will be investigated in the field of optics,[1] and it will be argued that craftsmen and experimental physicists influenced the problem shifts in optics even more decisively than the philosophies discussed so far.

Optics played an important part in the Copernican drama, but it also crossed the temporal boundaries of that drama, both toward its past and toward its future. Section 6.1 deals with ancient and medieval theories of vision, culminating in Kepler’s explanation of the working of the human eye. Section 6.2 describes the attempts of seventeenth-century mechanists to explain the empirically known laws of reflection and refraction. Section 6.3 discusses Newton’s Opticks (1704) as a manifesto of experimental philosophy. He suggested that light is a stream of particles. In the nineteenth century, wave optics became prominent (6.4). At the end of that century, the experimental study of the interaction of matter and light made clear that light has a dual character, wave-like and particle-like (6.5). This marks the transition from classical to modern physics. Meanwhile optics became more and more interlaced with other parts of physics.

Preceding modern twentieth-century physics and succeeding medieval and Renaissance science, classical physics spans the seventeenth to the nineteenth centuries. It has its roots in Greek, Arab, and medieval European philosophy. The question of why and how early modern science arose in Western Europe and nowhere else has been the subject of many discussions.[2] An important factor, amplified in the present chapter, was the rise of European medieval and Renaissance crafts and arts, providing a fruitful environment for a critical and innovative approach to science, and leading to the merger of theory and the crafts into experimental physics. The transition of scientific knowledge from the Hellenic through the Arab to the Latin culture will be illustrated by the history of the problem of natural and artificial seeing, leading to classical and ultimately modern optics. The development of classical physics was accompanied by an incessant struggle between instrumentalism, Descartes’ mechanism, Newton’s experimental philosophy, and occasionally other world views. Paradoxically, in the end classical physics did not fail in theoretical mechanics but in experimental optics, yet the crisis did not shatter experimental philosophy but mechanism. The history of optics appears to provide a short version of the history of physics.

Intromission and extramission theories of vision

About AD 800, Muslim scholars became interested in Greek and Roman science, in particular in optics. They translated ancient manuscripts into Arabic, using earlier transcripts from Greek into Syriac or Persian, made by Nestorian Christians.[3] Scholars in Baghdad, Cairo, Morocco, and Spain studied the Greek-Hellenic heritage profoundly, writing many commentaries. About 1150, in Spain and in Southern Italy, Europeans came into contact with Arabic culture. This led to a second wave of translations, from Arabic, sometimes via Hebrew or Catalan, into Latin.[4] At Toledo, Gerard of Cremona translated at least ninety books, including Ptolemy’s Almagest (2.1).

The translated manuscripts contained two major theories of seeing. The intromission theory assumed that the action of light starts at the observed object. Each visible object causes a change in its environment. The image formed at the object moves to the eye. This theory, more physical than mathematical, requires a medium for seeing, explaining why some substances are transparent and others are not, and why one has to ignite a lamp in order to see in the dark. For the superlunary spheres, this medium was called the ether.

In the extramission theory the action of light starts in the eye, according to the metaphor of a blind person who with his stick exerts an action and experiences a reaction. This theory, advanced by Euclid, Ptolemy, and other Platonic scholars working in Alexandria, explained geometrical aspects of light: its rectilinear propagation, the formation of shadows, reflection, and perspective. Ptolemy’s law of refraction lasted until the seventeenth century, when it was replaced by the sine law.

About 850 in Baghdad, Hunayn ben Ishâq continued Claudius Galen’s medical work on the structure and functioning of the eye, whereas Al-Kindi accepted the extramission theory, and commented mostly on the books of Euclid and Ptolemy. Al-Kindi criticised the intromission theory for its neglect of geometrical analysis. It assumed that each object forms an image, which moves to and is absorbed by the eye. Al-Kindi observed that this theory fails to make clear how an image of a large object (for instance, a mountain) shrinks sufficiently before it is able to enter the eye through its quite small pupil. Moreover, he stated that the intromission theory cannot explain perspective. Observing a circular object like a coin from the side, one sees a line or an ellipse. According to the intromission theory one ought to observe a circle, because the image of a circle is a circle, said Al-Kindi.

Whatever their value, these arguments were taken seriously by later scholars, among them Abu Ali al-Hasan ibn al-Hasan ibn al-Haytham. He was a Persian living in Cairo, a medical doctor, mathematician, and astronomer, who in Europe became known as Alhacen or Alhazen.

Alhazen’s theory of vision

Between 1011 and 1021 Alhazen wrote Kitab al-Manazir (Book of optics), in which he changed the intromission theory radically and arrived at a synthesis of all available theories of seeing. Alhazen’s predecessors had assumed that in the intromission theory each observed object sends an image as a coherent whole to the eye. Instead, Alhazen conjectured that each point on the surface of a visible object emits light, in all directions, independently of the other points on the surface of the same object. The emission or reflection of light is therefore incoherent, the light from different points of the object having no mutual coherence. This theory could explain the typical geometrical aspects of seeing as satisfactorily as the extramission theory of Euclid, Ptolemy, and Al-Kindi, and it avoided Al-Kindi’s objections to the old intromission theory.

Alhazen supposed that the imaging of the object to the eye would occur point for point. Each point on the object is geometrically connected to one point in the image on the retina, by a single light ray. This would be correct if the pupil, the opening of the eye, were very small with respect to the rest of the eye, as is the case in a pinhole camera, with which Alhazen experimented. Because in fact the pupil is not very small, the rays propagating from one point on the object and arriving in the eye would form an extended blur on the retina instead of a point.
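
The geometry of Alhazen’s difficulty is easily made quantitative. By similar triangles, a point source seen through an open aperture of diameter D illuminates a spot of diameter D(u+v)/u on a screen, where u is the source-aperture distance and v the aperture-screen distance. A sketch with hypothetical numbers:

    # Blur spot of a point source behind an open aperture (no lens),
    # by similar triangles; all distances in metres, purely illustrative.
    def blur_spot(D, u, v):
        return D * (u + v) / u

    print(blur_spot(D=0.0005, u=10.0, v=0.02))  # pinhole: spot ~0.0005 m
    print(blur_spot(D=0.005, u=10.0, v=0.02))   # pupil-sized: ~0.005 m

A tiny pinhole thus yields a nearly point-for-point image, while an opening the size of the pupil smears each object point into the extended blot that worried Alhazen.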

For this problem Alhazen suggested the following solution. Of all light rays (coming from one point) only one will arrive at the eyeball perpendicularly, proceeding to the retina without refraction. All other light rays are refracted and thereby weakened so much that they do not reach the retina. Alhazen supposed that the perpendicularly incoming light ray creates a point-like image. This light is transferred to the brain through the optic nerve. However, Alhazen could not make clear why the obliquely incoming rays are ineffective. Nevertheless his theory of seeing was a brilliant achievement, unsurpassed for six centuries. The insight that each object emits or reflects light at each point in all directions, and that the formation of its image takes place in the eye, required an exceptional level of abstraction.

By blending the two major ancient theories of vision, Alhazen geometrized the intromission theory. This is interesting, because the mathematization of science is commonly considered a hallmark of Renaissance and classical physics. In optics Alhazen achieved it in the eleventh century, and soon it became part of the standard scholastic curriculum of Islamic and Western schools.

European optics

About 1200 Alhazen’s book was translated into Latin as Thesaurus opticae. One of the first Western scholars occupied with optics, Roger Bacon reviewed it in Opus maius (1268). He was followed by Erazmus Witelo (Vitellius) in Perspectiva (about 1275), an influential optical textbook in the Middle Ages. It was read by Johannes Kepler, who in 1600 experimented with a camera obscura, a pinhole camera as described by Alhazen, Bacon, and Witelo, but recently improved by Giovanni della Porta, who applied a lens instead of a very small opening. It inspired Kepler to find the theory of image formation in the eye that is still accepted as correct. (Della Porta may have found the same solution, independent of Kepler. He also claimed the invention of the telescope.)

Kepler observed that the foremost part of the eye, immediately behind the pupil, acts as a lens. The diverging light rays coming from a point on the object and falling on the pupil are converged by the eye lens to a single point on the retina. In this way the point-for-point imaging from the object to the retina is achieved. This provided Kepler with the definitive solution of Alhazen’s problem. Moreover he explained the corrective function of eye glasses.
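
In modern terms, which Kepler did not yet possess, his solution is captured by the thin-lens relation between the object distance u, the image distance v, and the focal length f of the lens:

    \[ \frac{1}{u} + \frac{1}{v} = \frac{1}{f} \]

All rays diverging from one object point at distance u are reunited in one image point at distance v; for a distant object (u tending to infinity) the image falls in the focal plane, which is where the retina must lie, and spectacles correct an eye whose focal length does not match the position of its retina.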

Kepler did not believe that his discovery contributed much to the theory. He modestly called his work Additions to the optics of Witelo.[5] It is indeed remarkable that the elements of Kepler’s solution had been available for ages in the manuscripts of Alhazen, Bacon, and Witelo. This may be illustrated by two pictures in Bacon’s book, appearing in two different chapters, and not connected. The first picture is recognizable as Alhazen’s presentation of the motion of the light rays in the eye. The second picture does not concern the eye, but describes a glass bottle or a sphere filled with water, acting as a burning glass.[6] It shows how the light rays coming from a source move through the burning glass, such that the initially diverging light rays converge to a focus. Like Robert Grosseteste before him, Roger Bacon used this experiment to investigate the emergence of the rainbow. (Robert Grosseteste, first in Oxford, where he may have been Bacon’s tutor, later bishop of Lincoln, was mainly interested in the mystical aspects of light.) The drawing shows a clear insight into refraction toward the normal (the perpendicular to the glass’s surface) when the light enters the burning glass, and away from the normal when it leaves the glass. Kepler did nothing but position this burning glass in the front part of the eye.

In Harmonice mundi (1619) Kepler reflected on his theory of 1604, and he admitted that his explanation did not reach further than the retina. The question remains, he wrote, how the inverted image formed on the retina is processed via opaque parts of the body to the inside of the soul, and he felt quite embarrassed for not having an answer. Kepler distinguished between the physical problem of image formation (which he solved), and the physiological problem of seeing (which he left aside, not being a physiologist). He separated physics from physiology, which Alhazen had united.

The failure of European medieval optics

Why had Kepler’s fairly obvious step not been made by Bacon, Witelo, or any other scholar occupied with optics between 1050 and 1600? An explanation of this remarkable historical phenomenon may start by pointing to the relatively low level of European culture before circa 1100. The ancient heritage, translated into Latin together with Arabic commentaries, overran the West. The predominant impression was that all possible knowledge was available in these books, and that nothing could be added. The light from the east (ex oriente lux) shining in Witelo’s transcription of Alhazen’s work blinded its European readers, such that they did not see the obvious solutions of old and new problems if the authorities did not mention them. The fact that Alhazen did not suggest Kepler’s solution of his problem prevented the university students of his work from finding it independently. After Alhazen, the science of optics made hardly any progress until the sixteenth century. Exceptions were Al-Shirazi in Persia and Dietrich of Freiberg in Germany, who about 1300 independently explained the rainbow qualitatively from the combined reflection and refraction of light in drops of water.

A new impulse had to come from outside the universities, from the crafts. Once again this was hampered by the blinding effect of the eastern light. With the ancient culture a Greek view of society entered Western Europe, in particular an aversion to the crafts. Usually, craftsmen were slaves, and even if they were free, their social status was low. Philosophy was the domain of wealthy people who could afford to do nothing but discuss theories. Although this view is at variance with Christian doctrines, it became common among scholars during the Middle Ages. Theologians, philosophers, and other clerics believed they deserved a much higher standing than artisans. Scholastic science was entirely theoretical, consisting of studying, discussing, and commenting on books. Experiments like those of Grosseteste and Bacon were rare and even suspect. Although Roger Bacon was aware of the shortcomings of a one-sided stress on theoretical considerations, he could not escape the common practice.

Influence of the crafts

Shortly after Roger Bacon wrote Opus maius, reading glasses came into use in Europe. Invented by an unknown artisan, they received no scholarly attention before circa 1550, though Grosseteste and Bacon had discussed magnifying glasses. The connection between the eye glass and the lens in the eye seems all the more obvious when one realizes that several scholars must have used glasses in order to read their manuscripts on optics. The gap between scholars and craftsmen was so large that scholars, Bacon’s burning glass notwithstanding, overlooked the eye glass as a means to explain the formation of images in the eye. During the Middle Ages and the Renaissance, many books were written about the arts and crafts, but these escaped the attention of the university scholars.[7] Only after circa 1550 did Renaissance scholars like Della Porta and Kepler, becoming interested in the arts, start to pay attention to lenses.

A positive eye glass has a convergent effect, and a negative one a divergent effect, but the rather rough medieval lenses were hardly fit to form a reliable image. Eye glasses were probably accidental by-products of the windows used in churches and later in houses. By trial and error, some pieces of glass were found to be good enough to correct near- or far-sightedness. After discovering that different people needed different glasses, artisans learned how to grind high-quality lenses on demand. Circa 1550 Giovanni della Porta could use a positive lens to improve the pinhole camera. Shortly after 1600 lenses were good enough for craftsmen at Middelburg to combine two lenses into a spyglass. Hans Lippershey from Middelburg is now considered the inventor of the telescope. In 1608 the Dutch Staten-Generaal refused to grant him a patent, however, after Jacob Metius and Sacharias Jansen also claimed the invention. In 1609 Galileo used the spyglass as a telescope for astronomical observations.[8] He discovered new stars, mountains on the moon, and Jupiter’s satellites, besides Venus’ phases and the sunspots. After publishing Sidereus nuncius (1610), Galileo used these discoveries in his propaganda for the Copernican theory.

In these circumstances, Kepler could do what Bacon and Witelo had neglected. Even if lenses had been better, and the gap between scholars and craftsmen smaller, it would have been difficult for medieval scholars to connect the functioning of eyes with eye glasses. The Greek natural philosophers sharply distinguished between nature and artefacts. Physics was concerned with the nature (physis) of things, their essence, and their natural form and functioning. It could only be understood by contemplation, by observation and thought. Mechanical craft was imitation of nature, artificial. The study of the eye belonged to physics, but magnifying glasses and eye glasses as artificial products of the crafts belonged to mechanics. Before the rise of mechanical philosophy in the seventeenth century, nobody would have considered explaining a natural system like the eye, or a natural activity like seeing, with the help of an artificial thing like a lens or a camera obscura.

Kepler’s kind of optics differed from the medieval one, not because of better data, new experiments or geometrical theories, but because it was stimulated by a new world view. In ancient and medieval philosophy, the cosmos was considered a whole, and people were part of the cosmos. In the Renaissance a new view of the world and of mankind emerged, in which free and creative men were opposed to nature, which they desired and deserved to rule in thought and in action. Before Kepler, optics was the theory of natural seeing and of the functioning of the eye. Kepler devised a theory of instrumental seeing, of the functioning of lenses, telescopes, and microscopes. The process of image formation belongs to objective nature, not to subjective humanity. Starting with Kepler, classical physics changed optics more than scholarly science at the universities had done since Alhazen.

Optics and astronomy

Optical instruments were invented in order to see better than would be possible otherwise. With the naked eye, one cannot see the phases of Venus. Turning the recently invented spyglass to the heavens, Galileo naturally saw an image, but artificially he saw the phases of Venus. Artificial seeing extends natural seeing, for instance by making Jupiter’s moons visible. Artificial seeing is not contrary to natural seeing, but depends on it. Yet artificial seeing may be controversial, as Galileo experienced when his colleagues refused to accept his discoveries. Artificial seeing must be understood and learned, and requires an explanation. After finding the solution of the problem of eyesight in 1604, Kepler in 1611 explained Galileo’s telescope, which has a positive and a negative lens. He also invented an alternative with two positive lenses, now known as Kepler’s telescope.
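
In modern notation, later than both designs, the angular magnification of either telescope is the ratio of the focal lengths of objective and eyepiece:

    \[ M = \frac{f_{\mathrm{objective}}}{f_{\mathrm{eyepiece}}} \]

Galileo’s diverging eyepiece, placed inside the objective’s focus, yields an upright image; Kepler’s converging eyepiece inverts the image but gives a wider field of view.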

It is more than a coincidence that the telescope and the microscope were invented during the Copernican revolution. The new world view made these discoveries possible, and needed them just as much. The idea that instruments may be able to improve our experience was opposed by conservative Aristotelians, who heavily relied on common sense. Progressive Copernicans associated it with the new insight that theories can be instrumental in the exploration of reality.

Classical physics started with astronomy and optics. Like many astronomers before them, nearly all leading Copernicans were concerned with optics, studying refraction and reflection; mirrors, lenses, and prisms; telescopes and microscopes. Kepler, Galileo, Descartes, Fermat, Huygens, Hooke, and Newton, all contributed to the development of this field of science. They did not care very much about the distinction between science and the arts. In fact, many Copernicans were able craftsmen. Galileo built and sold scientific instruments. As a technical designer cooperating with several artisans, Huygens became famous because of the pendulum clock and the spring balance clock, as well as his improvements of the telescope (5.2). The relation of optics with astronomy is, of course, not accidental. Kepler treated optics as a presupposition of astronomy. Before Astronomia nova (1609), Kepler published Astronomia pars optica (1604). His Dioptrice (1611) reflected on Galileo’s astonishing astronomical discoveries with his telescope. Promoting the idea of the moving earth, the Copernicans argued against the view that the relative motion of the earth and the sun is merely a matter of optical perspective.

They rejected the Aristotelian distinction between celestial and terrestrial phenomena. According to Aristotle, the four elements are strictly confined to the sublunary spheres, and the celestial spheres consist of some other material, ether (Greek aither, shining) or quintessence, the fifth element. But if this distinction is so radical, how can a celestial body be seen? Is light a kind of matter, existing both in superlunary and sublunary spheres? Or is the luminiferous ether, the bearer of light, present in both regions? The problem of light has always been an embarrassment in Aristotelian physics,[9] even after its geometrization by Alhazen. Finding a mechanical solution to the question of the nature of light, the Copernicans thought to strike the Achilles heel of the Aristotelian philosophy they opposed.

6.2. Geometrical optics

After the invention of the telescope, Galileo’s astronomical discoveries, and Kepler’s Dioptrice, many Copernicans became concerned with the construction and application of lenses. This induced an impressive problem shift in optics. Dioptrics deals with the refraction of light, katoptrics or catoptrics with reflection. Together with diffraction and dispersion, discovered in the seventeenth century, this became geometrical optics. (Later, wave optics became known as physical optics.)

In Descartes’ philosophy, the propagation of light played a crucial part. He assumed that the sun and the stars were composed of the finest matter generating light. The same fine matter also filled the interstices of coarser kinds of matter and took care of the propagation of light. Unlike Galileo,[10] Beeckman, Fermat, and Huygens, he believed that light propagates instantly. Both Descartes’ physics and cosmology started from this clear and distinct idea, which he considered certainly true, beyond any doubt, a crucial element of his philosophy:

‘To my mind, it is so certain that if, by some impossibility, it were found guilty of being erroneous, I should be ready to acknowledge to you immediately that I know nothing in philosophy … if this lapse of time could be observed, then my whole philosophy would be completely upset.’[11]

Descartes contemplated the possibility of light consisting of rotating corpuscles in fine matter, relating the colours to different periods of rotation.

During the seventeenth century at least four theories of the refraction of light were proposed, all except one based on the mechanist program of explaining motion by motion (3.3-3.4).[12] Long before, the problem of reflection by a plane mirror had been quite satisfactorily solved. The angle of reflection (as measured with respect to the perpendicular on the surface) equals the angle of incidence. Starting from this, one could determine the focal points of concave spherical and parabolic mirrors. Ptolemy investigated refraction empirically, presenting tables for the angle of refraction as depending on the angle of incidence. Independently, Thomas Harriot, Willebrord Snel, and René Descartes discovered the sine law for refraction, finding that different colours correspond to different refraction indices. The sine of the refraction angle divided by the sine of the incident angle is a constant for any pair of media separated by a flat surface, as some able experimenters soon confirmed beyond reasonable doubt. Harriot and Snel did not publish their results. Snel’s manuscript was read by Huygens and probably by Beeckman, Descartes and others, but has been lost since the end of the seventeenth century.

Descartes’ sine law

Descartes made his discovery circa 1620 and published it in La dioptrique (1637), together with Alhazen’s and Kepler’s theories of seeing, though without mentioning their names. In Les météores (1637) he presented a quantitative explanation of the rainbow, using the sine law for his calculations. Descartes not only contributed to the physics of seeing (like Kepler before him), but also to its physiology and psychology. He paid much attention to light because of his view that seeing is the most important way of perceiving.[13] The relation between object and observing subject is mediated by light.

For the time being he set aside his firm conviction that light moves instantaneously. Light as a pressure is a tendency to motion, and therefore obeys the same laws of inertial motion as a tennis ball does. He assumed that the speed of light in various media is different, but that – in refraction as in reflection – its component in the direction of the surface does not change. From this he proved the sine law, the constant index of refraction being the ratio of the speeds in the two media. As a consequence, he found to his surprise that in a dense medium like water, the speed of light would be higher than in a rare medium like air. He also explained the phenomenon of total reflection, which occurs when a light ray proceeding from a dense to a rare medium exceeds a critical angle of incidence. In this case the index of refraction n > 1, and because sin r = n·sin i cannot exceed 1, the smallest incident angle i at which total reflection occurs is given by sin i = 1/n (here i is the angle of incidence, r the angle of refraction).
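
A minimal numerical sketch of the sine law in the convention used above (a ray passing from a dense to a rare medium, with n > 1; the value of n is merely a typical one):

    import math

    def refraction_angle(i_deg, n):
        # sine law for a ray leaving a dense medium: sin r = n sin i;
        # beyond the critical angle the ray is totally reflected.
        s = n * math.sin(math.radians(i_deg))
        if s > 1.0:
            return None                          # total reflection
        return math.degrees(math.asin(s))

    n = 1.33                                     # water to air, roughly
    print(refraction_angle(30, n))               # ~41.7 degrees
    print(math.degrees(math.asin(1 / n)))        # critical angle, ~48.8
    print(refraction_angle(60, n))               # None: totally reflected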

Fermat’s principle of least time

Pierre de Fermat was very critical of Descartes’ method of proof. He believed that Heron’s method of least resistance, by which Heron had proved the law of reflection, was a better one. Heron assumed that light follows the shortest path between two given points, which leads to the correct result only when the path of motion lies entirely within one medium. In the case of refraction one has to make an assumption about the magnitude of the resistance light experiences in a medium. Fermat believed that this would be inversely proportional to the speed of light, meaning that light travels between two given points in the shortest time. Like Galileo (3.3) he replaced distance covered by time elapsed. The application of this principle turned out to be quite difficult, but to his surprise he found the same sine law as Descartes. However, in Fermat’s theory the speed of light in a dense medium is lower than that in a rare one, as one would expect.
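
Fermat’s principle can be checked numerically: minimizing the travel time over all crossing points of a flat interface reproduces the sine law, with the ratio of speeds the right way around. A sketch with arbitrary geometry and speeds:

    import math

    v1, v2 = 3.0, 2.0    # speeds in the upper (rare) and lower (dense) medium
    ax, ay = 0.0, 1.0    # source A above the interface y = 0
    bx, by = 1.0, -1.0   # destination B below it

    def travel_time(x):  # total time via a crossing point (x, 0)
        return (math.hypot(x - ax, ay) / v1 +
                math.hypot(bx - x, by) / v2)

    # crude one-dimensional grid search for the fastest crossing point
    x = min((k / 100000 for k in range(100001)), key=travel_time)
    sin_i = (x - ax) / math.hypot(x - ax, ay)
    sin_r = (bx - x) / math.hypot(bx - x, by)
    print(sin_i / sin_r, v1 / v2)   # both ~1.5: Snel's sine law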

Newton and the problem of colour

Artisans experimenting with lenses discovered that a lens having one or two spherical surfaces does not produce a perfect image. This became even worse when they combined two or more lenses in a telescope or microscope. Several attempts were made to solve this problem. Galileo restricted the aperture of the objective of his telescope in order to minimize spherical aberrations. Antoni van Leeuwenhoek preferred microscopes having one small lens, achieving a magnification of 270 with small aberrations. Descartes designed lenses with different non-spherical surfaces which would avoid spherical aberration, but these turned out to be extremely difficult to grind. Even then these lenses would still have had colour defects.

The question of whether lenses could be made without chromatic aberration led Isaac Newton to his experiments with prisms, from which he concluded that the problem is unsolvable, because the index of refraction depends on colour. Therefore he designed a reflecting telescope, lacking the chromatic defects due to refraction. A model of this invention sent to the Royal Society in 1672 made him famous at one stroke, although only in 1789 did William Herschel construct a reflecting telescope that could compete with refracting ones. Meanwhile, by combining several lenses with different refraction indices, opticians approximately solved the problem of chromatic aberration.

In his first published contribution to natural science, shortly after his invention of the reflecting telescope, Newton reported his experiments on colours, without bothering about the nature or essence of light.[14] By studying the effect of one or two prisms on a ray of white light, he succeeded in proving to his own satisfaction that the latter is a mixture composed of coloured rays.

In his experiments Newton analysed light as a bundle of rays.[15] Only later did he suggest that each ray is an emission of particles.[16] At refraction each ray gains or loses momentum by an impulse perpendicular to the surface. This is a force acting during a short time, maybe a constant force acting only in a very thin layer between the two media. It increases or decreases the component of the velocity perpendicular to the surface, but does not influence the parallel component. The change of momentum would be different for different colours, explaining the spectrum caused by a prism. By introducing force as a new principle of explanation, Newton improved Descartes’ theory, confirming his results. This became the standard geometric theory of light until the beginning of the nineteenth century. Newton tried to explain the fact that a ray of incident light is partly reflected and partly refracted from his theory of periodic fits of easy reflexion and fits of easy transmission, derived from his experiments on light in thin layers.[17] However, his theory of light as exposed in Opticks was not mechanical.
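
In a modern reconstruction (neither Newton’s nor Huygens’ own notation), the two explanations of refraction differ precisely in the ratio of speeds they require. Conservation of the parallel velocity component of a corpuscle gives v₁ sin i = v₂ sin r, whereas Huygens’ wave fronts give the inverse ratio:

    \[ \text{corpuscles: } \frac{\sin i}{\sin r} = \frac{v_2}{v_1}, \qquad \text{waves: } \frac{\sin i}{\sin r} = \frac{v_1}{v_2}. \]

Since light entering glass bends toward the normal (sin i > sin r), the corpuscular account requires light to move faster in the denser medium, the wave account slower; this is the contrast that Foucault’s experiment of 1850, discussed below, would decide.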

6.3. Newton’s Opticks and Huygens’ Traité de la lumière

Although preceded by Francis Bacon and Robert Boyle, who stressed the heuristic function of experiments to discover natural laws,[18] in experimental philosophy Newton’s views, his theories and his emphasis on experiments were dominant. ‘Newton was among the most skillful experimental scientists in history. This is less widely recognized not merely because we tend to celebrate theoreticians, and not experimenters, but also because such a large fraction of Newton’s experimental effort is not well known.’[19] Whereas Principia marks both the end of the Copernican era and the beginning of rational mechanics, the much more widely read Opticks is more characteristic of experimental philosophy. It is a description, if not a prescription, of experimental interactive research. From beginning to end it emphasizes:

‘My Design in this Book is not to explain the Properties of Light by Hypotheses, but to propose and prove them by Reason and Experiments.’[20] ‘For Hypotheses are not to be regarded in experimental Philosophy’.[21]

This is a manifesto against Descartes’ mechanical philosophy, which considered optics to be part of geometry and mechanics, and was not shy of hypotheses. Nevertheless, Opticks contains a treasury of hypotheses, guiding many scientists to new experiments and measurements according to Newtonian standards. Besides rational mechanics, it became the program of classical physics for more than a century. After the acceptance of the wave theory of light, Opticks became discredited, because Newton had propagated a corpuscular theory, even though Thomas Young testified that his path-breaking views on wave optics were indebted to Newton’s work.[22]

Opticks consists of three books, containing several problem shifts.[23] The first book concerns reflection, refraction, and the refractive properties of colours. The second treats the effects of light in thin transparent bodies and the colours of bodies.

The third book has two parts. First, Newton investigated and discussed the diffraction of light, discovered circa 1650 by Francesco Grimaldi, showing that light is indeed inflected by a pinhole or a sharp edge, but apparently in the opposite direction to what the wave theory expected.[24] Yet diffraction became the main argument for the nineteenth-century wave theory of light. Newton was critical of any such theory, because light does not bend around a screen. Newton also studied reflection and refraction in thin layers,[25] in particular the rings, discovered by Robert Hooke but named after Newton. In order to explain these he assumed that the corpuscles of light differ in having periodic fits of easy transmission or reflection.

The second part of book III starts with some observations on the diffraction of light, but then Newton writes:

‘When I made the foregoing Observations, I design’d to repeat most of them with more care and exactness, and to make new ones for determining the manner how the Rays of Light are bent in their passage by Bodies, for making the Fringes of Colours with the dark lines between them. But I was then interrupted, and cannot now think of taking these things into farther Consideration. And since I have not finish’d this part of my Design, I shall conclude with proposing only some Queries, in order to a farther search to be made by others.’[26]

The first English edition (1704) contained sixteen short queries. The Latin translation (1706) added seven more (now numbered 25-31), and the second English edition (1717-1718) inserted another eight, making thirty-one queries in all.[27] In query 28 Newton rejected any wave theory because it could not explain the formation of sharp shadows. Waves like sound or those on the surface of water are able to bend around a corner; light is not. Though mostly dealing with light, the queries also concern other subjects. The final and longest query reports on Newton’s chemical experiments.[28] During the eighteenth century, English and Dutch Newtonians in particular were strongly inspired by these queries.

Together with Newton’s unpublished papers and letters, many of which only became known during the twentieth century, both Principia and the queries in Opticks testify to his interest in almost all fields of scientific investigation of his day. But the first two books of Opticks exclusively deal with theories and experiments in the field of optics. In his experiments Newton isolated a part of reality, in order to control states, events, and processes, such that these remain constant or change in a determined way. Each experiment manipulates and idealises reality.[29] When experimenting with prisms in a darkened room, Newton isolated a ray of light propagating through one or two prisms in order to investigate the refrangibility of various colours. External influences were excluded as much as possible, or ignored as being irrelevant. The chemical composition and the shape of the prism were not natural but carefully produced. Even the analysis of each experiment and its report was schematic and idealised, though Newton took care to make his description transparent such that other investigators were able to repeat his experiments (in which they did not always succeed). He concluded that white light is a heterogeneous mixture of rays corresponding to different primary colours, each having its own index of refraction. He maintained that this followed from his experiments, not from any theory. He did not want to say anything more about the nature of colours, for want of experimentally testable arguments.

Classical physicists usually avoided calling light itself coloured. They assumed that colour is a perceived property of light, a secondary property, but they also distinguished between primary colours as produced by a prism, and secondary colours, being mixtures of primary colours. Thomas Young discovered that the human eye has three different receptors for light, sensitive to different parts of the visible spectrum. Newton distinguished the rays of light manufactured by a prism by their refrangibility; Young, Fresnel, and all nineteenth-century scientists by their wave length; and twentieth-century physicists by their frequency.

Mechanist critique

The mechanists Robert Hooke and Christiaan Huygens praised Newton’s experimental skill, but did not accept his interpretation, because it was counter-intuitive and not mechanical. They reproached Newton for not providing a mechanical explanation of colour. They did not believe that his experiments proved that white light is a heterogeneous mixture of coloured rays. Like Hooke, Huygens supposed that white light could be composed from blue and red light only, and that sunlight is homogeneous. Although Hooke, who was the Royal Society’s curator of experiments, considered himself the outstanding expert on the science of colours, and although Newton admitted having learned a few things from Hooke’s Micrographia (1665), Newton considered Hooke’s objections to his views wide of the mark and incompetent.

In his Traité de la lumière (1690), which Newton extensively discussed in Opticks, Huygens presented a mechanical theory of light propagation. He considered light, like sound, to consist of a succession of pressures or pulses (not periodic waves) in a continuous medium.[30] Contrary to Descartes, he supposed the subtle matter bearing light to be elastic. He stated that the propagation of light could be explained by the combined motion of secondary pulses. He demonstrated that the reflection and refraction of light can be explained by assuming that light pulses have different speeds in different media. Like Fermat but contrary to Descartes and Newton, he found that the speed of light in a dense medium like glass or water is lower than in a rare medium like air.

From a careful measurement of the periods of Jupiter’s moons, Ole Rømer (1676) discovered that the measured period of revolution of these moons depends on Jupiter’s motion with respect to the earth, a kind of Doppler-effect. Rømer concluded that the speed of light is not infinite, as Descartes had stated. From this result, which Newton used in Principia, Huygens estimated the value of the speed of light for the first time in history, at about three-quarters of the present value.
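
An illustrative back-of-the-envelope check, using Rømer’s rough figure of about 22 minutes for light to cross the diameter of the earth’s orbit and the modern value of that diameter (which Huygens did not have):

    # Huygens' estimate of the speed of light, reconstructed with
    # modern numbers; Roemer's data gave roughly 22 minutes for light
    # to cross the diameter of the earth's orbit.
    diameter_km = 2 * 149.6e6          # modern orbital diameter, km
    crossing_s = 22 * 60
    print(diameter_km / crossing_s)    # ~227,000 km/s, ~3/4 of 299,792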

In 1850, when the wave theory of light was generally accepted, Léon Foucault experimentally compared the speed of light in water and in air. The speed of light in water turned out to be smaller than in air, as predicted both by Huygens’ pulse theory and by Fermat’s principle of least time, and contradicting Descartes’ and Newton’s explanations of refraction. Like the sine law, the determination of the speed of light in different media was not as crucial as is often believed.

6.4. The wave theory of light

The next major problem shift in optics occurred during the first half of the nineteenth century. A century after Opticks was published, Thomas Young studied it carefully, developing and publishing in a few years a periodic wave theory of light, supported by some clever experiments.[31] His paper Experiments and calculations relative to physical optics was based on Huygens’ principle.[32] When a moving wave front reaches some point in space, that point becomes a secondary source of a spherical wave. Huygens had rejected the periodic character of his waves.[33] Young conjectured that the various secondary waves, having the same frequency and amplitude but different phases, interfere with each other, sometimes constructively (if they have the same phase), sometimes destructively (if they are in opposite phase). This interference produces a new wave front, in which the secondary sources are coherent. In order to demonstrate this effect, Young needed two coherent sources of light, having exactly the same intensity, wave length, and phase. He produced these by shining a beam of light on a screen fitted with two small holes or narrow slits. On a second screen behind the first one, an interference pattern appeared, alternately light and dark. This pattern was more distinct when monochromatic light was used. The geometrical proportions of the experimental set-up allowed Young to estimate the wave length of the light used. Next he returned to Newton’s rings and was able to measure the wave lengths of the rays used by Newton. His results testified both to the correctness of his theory and to the accuracy of Newton’s experiments.
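
The estimate rests on a simple relation: for slit separation d and screen distance L, neighbouring bright fringes are separated by Δy = λL/d, so λ = d·Δy/L. A sketch with merely plausible numbers, not Young’s own data:

    # Wave length from Young's double-slit geometry; the numbers are
    # illustrative, not Young's measurements.
    d = 0.5e-3       # slit separation, m
    L = 1.0          # distance from slits to screen, m
    dy = 1.1e-3      # spacing of neighbouring bright fringes, m
    print(d * dy / L)   # ~5.5e-7 m, i.e. visible (green-yellow) light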

Thomas Young was a gifted and imaginative experimenter, but he was a poor mathematician. He operated in the spirit of Newton’s experimental philosophy. Perhaps for this reason, Young’s views were largely ignored until Augustin Fresnel in 1818, more inspired by Huygens’ mechanism in Traité de la lumière than by Newton’s experimental Opticks, presented a mathematical wave theory. He showed that the bending of waves by an opaque object depends on their wave length. Because the wave length of light is much smaller than that of sound, light casts sharper shadows. French mathematical physicists like d’Alembert, Lagrange, Laplace, and Poisson had elaborated Newton’s theory of point masses moving under the influence of central forces. Accepting Newton’s corpuscular theory of light, they were not impressed by any attempt to describe the motion of light in terms of a wave in a continuum. In a contest organized by the Paris Académie des Sciences in 1818, Fresnel was criticised by Siméon Poisson, who pointed out that his theory would have the absurd consequence that a circular opaque body placed in a beam of light would produce a light spot at the centre of its shadow. Almost immediately, François Arago performed an experiment showing that this was indeed the case, and Fresnel received the prize in 1819.

Being first of all an engineer, Fresnel invented the composite lens named after him, making it possible to produce the intense beams of light used in lighthouses. He devised diffraction gratings, allowing the wave length of monochromatic light to be measured more precisely than is possible with Newton’s rings. Neither Young nor Fresnel could make clear why a light wave would not move backwards. After 1850, Gustav Kirchhoff showed with rather difficult mathematics that the backward secondary waves always cancel each other.

Polarization

Both Huygens and Newton investigated the double refraction in Iceland spar, which splits a ray of incoming light into an ordinary and an extraordinary ray. (This was discovered by Erasmus Bartholin and published in 1669.) Huygens suggested that the ordinary ray (satisfying Snel’s law) corresponds to a spherical wave, and the extraordinary ray to an elliptical one.[34] He assumed two different media in the same space, but could not explain experiments with two crystals placed behind each other. Newton discussed Huygens’ experiments in queries 25 and 26, explaining them by assuming that the corpuscles of light have two sides and different fits of transmission and reflection.

After 1800 it became clear that these polarization properties, as they were now called, occur in many crystals, and also in reflection at any surface separating two media like air and glass. In the years between Young’s and Fresnel’s publications, David Brewster used asymmetric crystals to investigate reflection and refraction. He found that when the reflected light beam is perpendicular to the refracted beam, the reflected beam is completely polarized, and the refracted beam partly polarized in the opposite sense. The tangent of the corresponding angle of incidence, now called the Brewster angle, equals the index of refraction. These experiments showed that polarization is not a property of asymmetric crystals, as Huygens had believed, but of light, as Newton had assumed.
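
A quick check of Brewster’s relation tan θ_B = n, with a typical air-to-glass index (an assumed value):

    import math

    n = 1.5                                    # air to glass, roughly
    theta_B = math.degrees(math.atan(n))
    print(theta_B)                             # ~56.3 degrees
    r = math.degrees(math.asin(math.sin(math.radians(theta_B)) / n))
    print(theta_B + r)                         # ~90.0: the reflected and
                                               # refracted beams are perpendicular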

Initially, Young supposed that light waves would be longitudinal (the oscillations occurring in the direction of propagation of the wave), like sound in air, in fact like any mechanical wave in a fluid medium. Because of polarization, he also considered the possibility that light has a transversal component besides the prevailing longitudinal one. (Transversal mechanical waves occur in solids.) Fresnel proposed that light waves are not longitudinal but entirely transversal, accepting as an unexplained consequence that light is not a mechanical wave. A ray of light could be described as having two independent, mutually perpendicular components, both perpendicular to the direction of propagation, which cannot interfere with each other. This was confirmed by experiments in which two crystals were used. The mutually perpendicular components can be separated, producing linearly polarized light, or combined such that the result is circularly polarized. Natural light contains all kinds of combinations.

The speed of light

The history of physical optics, as wave optics is often called, as an isolated field of science was concluded by some celebrated experiments by Hippolyte Fizeau and Léon Foucault in Paris. Fizeau in 1849 and Foucault in 1862 determined the speed of light in laboratory experiments, improving on astronomical measurements based on Rømer’s discovery of the varying period of Jupiter’s moons. (Fizeau found 313,000 km/sec, Foucault 298,000 ± 500 km/sec. In 1877 Albert Michelson measured 300,140 km/sec. The modern value is 299,792.458 km/sec.)

Around 1850 they showed that the speed of light in water is less than in air, thereby disproving Descartes’ and Newton’s theories of light. However, this did not prove that any corpuscular theory would be wrong, as became clear at the start of the twentieth century. In an ingenious experiment, François Arago showed that prismatic refraction of rays from the fixed stars is not influenced by the motion of the earth. Fresnel investigated whether the speed of light in different media would depend on the motion of the ether, assuming that the ether in a moving object would move together with the object. He did not solve this problem, but after his death it inspired Fizeau in 1851 to do an experiment to find this ether drag, by conducting a split beam of light through two oppositely moving streams of water. He found neither that the ether fully accompanies bodies in their motion, nor that bodies move freely through the ether: something in between happened. This is the second anomaly in the theory of light waves moving in the ether. The first had been raised earlier by Young, who objected to the complete transversality of light: combined with the large speed of light, it would imply that the luminiferous ether is a very rigid solid, evidently incompatible with the fact that it offers no observable friction to the planets.
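
In modern notation, the intermediate result agrees with the partial drag coefficient Fresnel had proposed: in a medium of refractive index n moving with speed v, light travels at

\[ w = \frac{c}{n} \pm v\left(1 - \frac{1}{n^2}\right), \]

between the value c/n for a fully stationary ether and c/n ± v for an ether fully dragged along with the water.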

The acceptance of the wave theory of light was felt as a blow to Newtonian physics. It was seen as a confirmation of Huygens’ mechanical philosophy. However, further progress in optics, reached by relating it to other fields of science, showed that light could in no way be considered a mechanical wave. Even if they contradicted Newton’s optical theory, Young, Brewster, and Fresnel were much more in line with Newton’s experimental philosophy than was initially appreciated.

A new view of light

The next problem shift occurred when James Clerk Maxwell published his electromagnetic field theory, implying that light is an electromagnetic wave (4.6). A varying magnetic field results in an electric field, and a varying electric field causes a magnetic field. If one field changes periodically, the other will do the same. Under suitable circumstances, this may lead to a self-continuing process, a wave motion, in which the magnetic field is always perpendicular to the electric one, and both are perpendicular to the wave’s direction of propagation. Maxwell derived a wave equation, and the theoretical speed of the wave in empty space turned out to be close to the speed of light (c). Maxwell’s calculation depended on the relation between electrostatic and magnetostatic units (4.3). At the time, both this relation and the speed of light were subject to rigorous measurements, initially diverging by several percent. This controversy is one reason why many physicists hesitated to accept Maxwell’s theory of electromagnetism. Only after 1880 was a consensus reached between English, German, French, and American investigators.[35] Maxwell concluded that light is an electromagnetic wave phenomenon, consistent with Fresnel’s assumption that light propagates as a transversal polarized wave. Simultaneously he demonstrated that electromagnetic energy could be transported apart from matter, as illustrated by the sun’s radiation. This energy (E) turned out to be accompanied by a linear momentum (p=E/c), transferable to a body absorbing radiation. Hence he proved that energy and momentum may exist apart from matter and that the electromagnetic field is a physical reality. As a consequence, Newton’s third law of motion was no longer valid for objects exchanging electromagnetic energy, whereas the laws of conservation of energy and linear momentum were upheld.
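
In modern (SI) notation, not Maxwell’s own, the wave speed follows from the two field constants that also fix the ratio of electrostatic to magnetostatic units:

\[ c = \frac{1}{\sqrt{\varepsilon_0 \mu_0}} \approx 3.0 \times 10^8 \ \mathrm{m/s}, \qquad p = \frac{E}{c}. \]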

The possibility of transferring momentum was illustrated by the radiation mill, invented in 1873 by William Crookes. The mill consists of four metal vanes, each reflecting on one side and black on the other. If the mill, suspended in a vacuum tube, is illuminated from one side, it turns around. Reflecting light produces twice as much linear momentum transfer as absorbing light, giving rise to a net transfer of momentum. (Commercial radiation mills having a poor vacuum usually rotate in the opposite direction. Now the vanes are heated more at the black side than at the reflecting one, meaning that the heated air exerts a net pressure on the vanes.) The radiometer was widely discussed during the nineteenth century.[36]
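
The factor of two follows from simple momentum bookkeeping (a modern restatement): absorbed radiation of energy E delivers its momentum once, reflected radiation reverses it:

\[ \Delta p_{\mathrm{abs}} = \frac{E}{c}, \qquad \Delta p_{\mathrm{refl}} = \frac{2E}{c}. \]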

According to Maxwell’s theory the frequency of electromagnetic waves can take on any value, not merely those of visible light. Hence Maxwell explained the occurrence of ultraviolet and infrared light, discovered at the start of the nineteenth century, but he did not predict the existence of electromagnetic radiation of longer or shorter wavelengths, and he did not attempt to investigate these experimentally, let alone to apply them.[37] Only later was the whole electromagnetic spectrum (from gamma radiation to long wave radio) investigated and applied in communication technology, in medicine, in solid state physics, and in chemistry.

Maxwell’s theory indicated the end of the separation of electromagnetism and optics, but at a price. By the unification with optics, electromagnetism was confronted with a large number of problems: how light emerges; how it is reflected or refracted; how it propagates in transparent media; how it is absorbed; in short, how radiation interacts with matter. The experiments involved in this problem shift in optics (the final one to be discussed in this chapter) led to the end of classical theoretical physics.

6.5.  Emission and absorption of light

James Clerk Maxwell proved light to be an electromagnetic wave, but he did not specify its wave length or frequency. In the nineteenth-century discoveries on the interaction of electromagnetic radiation with matter this aspect became crucial. The emission spectrum of a light source may be discrete or continuous. A discrete spectrum displays the wavelengths of the emitted or absorbed lines of a gaseous, chemically pure substance and their relative strengths. A continuous spectrum is a graph of the intensity (energy per second emitted by the source) as a function of the wavelength or the frequency of the radiation. It turned out to be strongly dependent on the temperature of the source, usually a glowing solid. The investigation of both kinds of spectra caused new problem shifts in optics.

The visible part of the sun’s spectrum was investigated with the help of a prism or a diffraction grating, selecting waves according to their frequency or wavelength. (A diffraction grating is a piece of glass or metal on which a lattice of parallel scratches is engraved, at small but regular distances. The grating transmits or reflects light in various directions dependent on the frequency or wavelength. Henry Rowland was known for the quality of the gratings he produced.)[38]
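
In modern notation, a grating with spacing d between the scratches sends light of wave length λ into sharp maxima at angles θ satisfying

\[ d \sin\theta = m\lambda, \qquad m = 0, \pm 1, \pm 2, \ldots, \]

so that a measurement of the angles, together with the known spacing, yields the wave length.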

Since the start of the nineteenth century, the investigated spectrum has included ultraviolet and infrared radiation.

Spectral lines

After William Wollaston’s discovery (1802), Joseph Fraunhofer investigated the sun’s spectrum in 1814-1817, observing the discrete spectrum of dark lines named after him, each indicating the absence of light of a certain wave length in the spectrum. He assigned these lines letters, such as the D-line in the yellow part of the spectrum.[39] (Later it turned out to be a double line.) Besides these absorption lines, emission lines were discovered in the spectrum of gaseous light sources, which apparently only emit light at one or more sharply determined wave lengths. In Heidelberg, Robert Bunsen improved a gas burner (named after him but invented by Faraday) that emitted hardly any light of its own.[40] It allowed chemists and physicists to study the light emitted by heated elements and compounds. It turned out that heated compounds sharing a chemical element often emit the same spectral lines, presumably because the molecules are dissociated into atoms. This provided a new tool in analytical chemistry.

Bunsen’s colleague Gustav Kirchhoff established that the dark D-lines in the sun’s spectrum have the same wave length as the light emitted by hot sodium. He assumed that the former are caused by absorption of light originating from hot parts of the sun in its much cooler atmosphere, which he confirmed in combined astronomical and laboratory experiments. This discovery allowed astronomers to investigate the chemical composition of the sun and other celestial objects. They even found a new element in the sun’s atmosphere, aptly called helium (1868).

Reasoning from the law of conservation of energy, Kirchhoff derived in 1859 a simple but very important relation between the emission and absorption lines. If a substance can emit light of a certain wavelength, it can also absorb light of the same wavelength. Otherwise equilibrium between matter and radiation would not occur.[41]

In this way one could identify the dark lines in the solar spectrum with emission lines in the laboratory, for instance the D-lines with sodium from kitchen salt dissociating in a hot flame. It turned out that more or less simple line spectra are characteristic for elements, and the usually much more complicated line spectra for compounds. Therefore the former could be ascribed to chemical atoms, the latter to molecules. Spectroscopy became an important tool both in analytical chemistry and in astronomy.[42] In spectroscopes, spectrometers, or spectrographs, radiation is respectively detected, measured, or registered, with the help of the eye, a photographic emulsion, or thermoelectric or photoelectric elements. Because in spectrography the wave length is measured, it became customary to indicate spectral lines by this property. The origin of the spectral lines remained a mystery until 1913, when Niels Bohr ascribed them to transitions between electronic energy levels in atoms, ions, or molecules.[43] Then it also became clear that the frequency (inversely proportional to wave length and proportional to energy, as Einstein would find) is more fundamental than wave length. In refraction the wave length of light changes, whereas its frequency remains the same.
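
Stated compactly in modern notation: in vacuum f = c/λ and, as Einstein would find, E = hf; in a medium of refractive index n the speed and the wave length change while the frequency does not:

\[ v = \frac{c}{n}, \qquad \lambda_{\mathrm{medium}} = \frac{\lambda_{\mathrm{vacuum}}}{n}, \qquad f \ \text{unchanged}. \]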

Continuous spectrum

The spectrum of a glowing body like the filament wire in a bulb is called black body radiation.[44] Physicists call a surface black if it absorbs all incoming radiation, not only its visible part. According to Kirchhoff’s law, it can also emit radiation of every wave length. Kirchhoff observed that the best experimental approximation of a black surface is a hole in a hollow box, a furnace with a variable temperature, whose blackened internal walls can be heated homogeneously. Kirchhoff argued that the spectrum of black radiation should be a universal function of the wavelength, independent of the nature of the emitting source, having different values only at different temperatures. He challenged both theoretical and experimental physicists to determine this function. That turned out to be far from simple.

Experimentally it was found that the spectrum has a maximum, and that the corresponding values for wavelength and intensity as well as the shape of the curve depend only on temperature, as required. Gustav Kirchhoff and Wilhelm Wien demonstrated the shape of the curve to be the same for all black-body sources at the same temperature. Wien established how the maximum of the curve is displaced when the temperature of the source changes. The product of the absolute temperature with the wavelength corresponding to the maximum in the graph is constant (Wien’s displacement law, 1893). The total intensity of radiation appeared to be proportional to the fourth power of the absolute temperature, according to experiments by Joseph Stefan (1879). Five years later Ludwig Boltzmann derived this law theoretically from thermodynamics and Maxwell’s electromagnetic theory, but the shape of the curve resisted any easy solution.
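
In modern notation the two experimental laws read

\[ \lambda_{\max} T = b \approx 2.9 \times 10^{-3} \ \mathrm{m\,K} \quad \text{(Wien)}, \qquad I = \sigma T^4 \quad \text{(Stefan-Boltzmann)}, \]

with σ approximately 5.67×10⁻⁸ W m⁻² K⁻⁴.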

In 1896 Wien proposed an exponential relation, agreeing with the experimental results in the near infrared part of the spectrum. A classical theory by Lord Rayleigh and James Jeans (1900) calculated the intensity to be proportional to the absolute temperature and inversely proportional to the fourth power of the wavelength. This would mean that the intensity, instead of showing a maximum, would grow to infinity at decreasing wavelength. Later this would be called the ultraviolet catastrophe.

About the same time a breakthrough occurred in experimental techniques, allowing of accurate measurements of the intensity of the emitted light on both sides of the maximum, in the near infrared (Friedrich Paschen) and the far infrared part of the spectrum (Otto Lummer and Ernst Pringsheim; Heinrich Rubens and Ferdinand Kurlbaum). These physicists worked in Berlin, where Max Planck was professor of theoretical physics.[45] After learning of their results (which did not agree with Wien’s exponential formula), Planck derived within several hours an alternative curve-fitting formula agreeing with the experiments.

Max Planck

Planck had been involved for many years in the problem of the spectrum of black radiation. Hence it was not very difficult for him to provide his formula with a theoretical foundation. Planck published his new theory on December 14, 1900, making use of statistical methods developed by Boltzmann. He assumed that oscillating atoms or molecules form the source of the radiation. The emission or absorption of radiation with frequency f would occur in energy amounts of E=hf. The proportionality constant h, later to be called Planck’s constant, is universal, independent of the specific character of the atoms and molecules concerned. Planck ascribed the discrete character of the interaction between the electromagnetic field and matter to the material source of the radiation. The field itself remained continuous.
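
In modern notation, Planck’s formula for the spectral energy density reads

\[ u(f,T) = \frac{8\pi h f^3}{c^3} \, \frac{1}{e^{hf/kT} - 1}, \]

which reduces to Wien’s exponential relation for hf ≫ kT and to the Rayleigh-Jeans result (proportional to T and, expressed in wavelength, to 1/λ⁴) for hf ≪ kT.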

To the energy quanta Planck applied a statistical formula that he could not justify, even in principle, by any classical theory. The only justification of his ‘act of despair’, as he called it, was that it led to a result that agreed with recent measurements. It is now commonly accepted that it meant the end of classical physics and the start of a new era, but that was not the view of his contemporaries. The study of black body radiation was a specialism, and initially Planck’s discovery had much less impact than Wilhelm Röntgen’s discovery of X-rays (1895).[46] Those who paid attention doubted Planck’s interpretation, and this state of affairs persisted for at least ten years. Planck himself later commented:

‘Usually, a new scientific truth will not become accepted because its opponents are convinced and declare themselves converted, but rather because the opponents die out, whereas a grown up generation is used to it from the start.’[47]

Yet, apart from its lack of classical justification, Planck’s theory was a great success, even if this was not immediately recognized. It unified physics in a remarkable way. Planck observed that Boltzmann’s constant equals the gas constant (R) divided by Avogadro’s number (N). Hence, Planck’s theory of electromagnetic radiation confirmed the existence of ions and electrons, and made it possible to calculate the number of molecules in a certain amount of gas, some years before Einstein and Perrin did so based on Brownian motion.

Planck was able to calculate the value of some other constants of nature. The accuracy of his data was due to the painstaking experiments of his colleagues Paschen, Lummer, Pringsheim, Rubens, and Kurlbaum. The quality of their results as used by Planck may be illustrated as follows.[48] Expressed in the units then used, Planck found for the value of what was later called Planck’s constant: h=6.55×10⁻²⁷ (modern value: 6.63×10⁻²⁷); for what Planck called Boltzmann’s constant: k=1.34×10⁻¹⁶ (1.38×10⁻¹⁶); and for the elementary electric charge: e=4.69×10⁻¹⁰ (4.80×10⁻¹⁰). The latter differed considerably from the then best determined value: e=6.5×10⁻¹⁰. Only in 1908, when Rutherford and Geiger measured the charge of alpha-particles (2e=9.3×10⁻¹⁰), was the accuracy recognized of the values derived from measurements in the far infrared part of the optical spectrum of black body radiation.

In 1916, Einstein found the most elegant derivation of Planck’s radiation law.[49] He assumed the reversibility of emission and absorption stimulated by other radiation, besides the irreversibility of spontaneous emission of radiation. Moreover, he used Wien’s thermodynamically based displacement law. Hence, in one stroke, Einstein showed the fruitfulness of the cooperative use of the two complementary theories, thermodynamics and statistical physics, and the necessity of introducing irreversibility independently of mechanical considerations.
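
A sketch of the argument in modern notation (ignoring degeneracies): for two atomic levels with populations N1 and N2 in equilibrium with radiation of density u, spontaneous emission (coefficient A) must balance stimulated emission and absorption (coefficient B):

\[ N_2 (A + B u) = N_1 B u, \qquad \frac{N_2}{N_1} = e^{-hf/kT} \;\Rightarrow\; u = \frac{A/B}{e^{hf/kT} - 1}, \]

which is Planck’s law once Wien’s displacement law fixes A/B proportional to f³.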

The photoelectric effect

The next problem shifted the physicists’ attention from the infrared to the ultraviolet region. It is a remarkable coincidence that in the same series of experiments in which Heinrich Hertz definitively demonstrated the correctness of James Clerk Maxwell’s electromagnetic theory of light, he also discovered the photoelectric effect that later on gave rise to a thorough revision of that theory. When ultraviolet light falls on a metal surface, this may lead to the emission of electrons. Later research established that no classical theory was able to explain the properties of the photoelectric effect. This concerned in particular Philipp Lenard’s discovery (1902) that the number of released electrons is determined both by the intensity of the absorbed light and by its frequency, and that the maximum energy of the emitted electrons depends only on the frequency of the light beam.[50]

This became an important argument in favour of Albert Einstein’s photon hypothesis, suggesting that light with a frequency f only occurs in energy packets E=hf.[51] Hence, in contrast to Max Planck, Einstein ascribed the discrete character of the interaction between matter and field to the field. The electromagnetic field became quantized, just as the electric fluid had been quantized by the discovery of the electron (1897). In 1916 Einstein added that these quanta have linear momentum as well (p=E/c, where c is the speed of light), inversely proportional to the wave length. In 1922 Arthur Compton confirmed this in a celebrated experiment.
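
In modern notation, Einstein’s 1905 relation for the photoelectric effect (with W the work function of the metal, a threshold energy) and Compton’s 1922 wavelength shift read

\[ E_{\max} = hf - W, \qquad \Delta\lambda = \frac{h}{m_e c}\,(1 - \cos\theta), \]

the latter following from conservation of energy and momentum in the collision of a photon with an electron of mass m_e.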

The relation between energy and frequency, applied by Niels Bohr in his atomic theory of 1913, was experimentally confirmed by Robert Millikan in 1915. Nevertheless Millikan did not accept the existence of light quanta:

‘I spent ten years of my life testing that 1905 equation of Einstein’s and contrary to all my expectations, I was compelled in 1915 to assert its unambiguous verification in spite of its unreasonableness, since it seemed to violate everything we knew about the interference of light.’[52]

Only in his autobiography (1950) did Millikan admit that his experiments could only be interpreted by Einstein’s hypothesis.

Until 1920, Planck and Einstein did not have many adherents to their views. Einstein’s light quantum hypothesis experienced at least as much opposition as Planck’s quantum theory, if not more. Initially even Planck did not adhere to it. In his Lectures on the theory of thermal radiation (1906), summarizing his earlier publications, Planck remained classical, and his theory was ‘incompatible with the quantization of resonator energy … it played no role in his thought’.[53]

From 1913 onward, however, quantization did play a central role in new editions of the Lectures.

When in that year Einstein was nominated to become a member of the Prussian Academy of Sciences in Berlin, Planck, Warburg, Nernst, and Rubens wrote in their recommendation:

‘Summing up, we may say that there is hardly one among the great problems, in which modern physics is so rich, to which Einstein has not made an important contribution. That he may sometimes have missed the target in his speculations, as, for example, in his hypothesis of light quanta, cannot really be held too much against him, for it is not possible to introduce fundamentally new ideas, even in the most exact sciences, without occasionally taking a risk.’[54]

In 1921 Einstein received the Nobel Prize for this discovery, not for his much better known theory of relativity, probably because the Nobel prize was preferably given for important contributions to experimental physics.[55] Einstein’s photon hypothesis was distrusted, but his prediction of the linear relation between the energy of the emitted electrons and the frequency of the absorbed light, confirmed by Millikan in 1915, made a deep impression.

As late as 1924, Niels Bohr, Hendrik Kramers, and John Slater published a theory of electromagnetic radiation, fighting the photon hypothesis at all cost.[56] They even abandoned the laws of conservation of energy and momentum at the atomic level, ignoring Arthur Compton’s publication (1922) describing the collision of an X-ray particle with an electron as conserving energy and momentum. Within a year, experiments by Walther Bothe and Hans Geiger proved the ‘BKS-theory’ to be wrong. Only then did Niels Bohr accept the wave-particle duality. In 1924 Satyendra Bose and Albert Einstein derived Planck’s law from the assumption that electromagnetic radiation in a cavity behaves like an ideal gas consisting of photons,[57] the name Gilbert Lewis introduced in 1926.

The twentieth century led to many new problems and insights into light, and to a new optical industry, improving and inventing, producing and selling many new electric sources of light and other optical devices, bringing the nineteenth-century era to a definite end.


[1] On the history of optics, see Whittaker 1910; Ronchi 1939; Scott  1952, chapter 4; Sabra 1967; Westfall 1971, 50-64; Lindberg 1976; Levenson 1994, chapter 3; Park 1997; Cohen 2010.

[2] Landes 1983; 1998; Cohen 1994; 2010; Gaukroger 2006.

[3] Lindberg 1992, chapter 8.

[4] Lindberg 1992, 203-206; Grant 2001, 85-87.

[5] Johannes Kepler, Ad Vitellionum paralipomena, quibus astronomiae pars optica traditur (1604), often abbreviated Astronomiae pars optica, discusses the relevance of optics for astronomy.

[6] This picture is reproduced in Dijksterhuis 1950, 162 (II: 69) and in Lindberg 1976.

[7] Eamon 1994.

[8] Galileo 1610.

[9] Aristotle, On the heavens, II, 7.

[10] Galileo 1638, 42.

[11] Descartes, Oeuvres, I, 307, letter to Beeckman, 1634; 1637, 43, 84; 1647, 136; 1664, 98; see Duhem 1906, 33-34.

[12] Sabra 1967.

[13]  Descartes 1637, 42-81.

[14] Thayer (ed.) 1953, 68-81; Cohen (ed.) 1958, chapter 2.

[15] Newton 1704, 1, Definition I.

[16] Newton 1704, 370, Query 29 (added in 1706).

[17] Newton 1687, 231-233; 1704, 281.

[18] Eamon 1994, 289.

[19] Cohen, Smith (eds.) 2002, 17.

[20] Newton 1704, 1.

[21] Newton 1704, 404.

[22] Cohen 1952, xli-xliii.

[23] See the extensive ‘Analytical table of contents’ in Newton 1704, lxxix-cxvi, prepared by D.H.D. Roller; Sabra 1967; Cohen 2010, 678-715.

[24] Newton 1704, 317-338 (Part I of the third book).

[25] Newton 1704, 193-244 (Parts I-II of the second book).

[26] Newton 1704, 339.

[27] The fourth edition appeared posthumously in 1730. This is reprinted in the Dover edition of 1952, which is usually referred to.

[28] Query 31 counts 31 pages, all others together 37.

[29] Galileo 1638, 251-257 discussed the disturbing factors in ballistic experiments.

[30] Huygens 1690; Shapiro 1980.

[31] Gillispie 1960, 406-435; Achinstein 1991, chapter 1.

[32] Young 1804.

[33] Shapiro 1980, 207.

[34] Huygens 1690, chapter V.

[35] See Maxwell 1873, II, 436 (note added after 1890); Schaffer 1995.

[36] Brush, Everitt 1969.

[37] Chalmers 1973b.

[38] Sweetman 1995.

[39] Jungnickel, McCormmach 1986, I, 268-270.

[40] Jungnickel, McCormmach 1986, I, 298.

[41] Jungnickel, McCormmach 1986, I, 297-301.

[42] McGucken 1969.

[43] Bohr 1913; Jammer 1966, chapter 2; Heilbron, Kuhn 1969; Pais 1991, chapter 8; Kragh 1999, 53-57.

[44] Whittaker 1953, 78-86; Dugas 1959, 239-259; Klein 1962; 1963a; 1970, 217-263; Jammer 1966, 1-56; Hermann 1969; Brush 1976, 640-649; Kuhn 1978; Schirrmacher 2003.

[45] Kuhn 1978; Heilbron 1986; Jungnickel, McCormmach 1986, II, 228-231, 256-268.

[46] Garber 1976; Kuhn 1978, 114; Kragh 1999, 63.

[47] Planck 1948, 16-17.

[48] Kuhn 1978, 110-111; Pais 1986, 74.

[49] Einstein 1917; Whittaker 1953, 197-198; Klein 1964, 16-21; 1980; Jammer 1966, 112-114.

[50] Wheaton 1978.

[51] Einstein 1905c; Klein 1963b; 1964, Pais 1982, part IV.

[52] Millikan, ci­ted by Pais 1982, 357.

[53] Kuhn 1978, 125, 126.

[54] Jammer 1966, 44; Kuhn 1978, 182.

[55] Pais 1982, chapter 30.

[56] Bohr, Kramers, Slater 1924; cp. Slater 1975, 11; Pais 1982, chapter 22; 1991, 232-239.

[57] Pais 1982, 425-428.


Chapter 7

The unification of physical science

7.1. The Newtonian synthesis

The previous chapter showed how optics became more and more connected with other fields of science. Dealing with the relations between various theories, chapter 7 will further investigate the unification of physics, starting with the Newtonian synthesis at the end of the seventeenth century. Theories are linked up via data and presuppositions. To find data for a theory, one usually relies on other theories, on instrumental observations, and on experiments. Most presuppositions of a theory are theories themselves. Problems for a theory are often provided by other theories, in need of the solutions as data or presuppositions. For Newton’s theory of gravity, for instance, mathematics and mechanics are presupposed theories, delivering lemmas when necessary, whereas optics is applied to find data from instrumental observations.

It seems obvious that theories can only be linked if they share one or more statements. This, however, is hardly ever the case, at least not in a literal sense. Any statement derives its meaning from its theoretical context. Therefore, a concept or a statement in one theory shows at best some similarity with a statement in another theory. In applied geometry, for instance, it is assumed that a light ray resembles a straight line. The laws according to which light is propagated are similar to the geometrical laws concerning straight lines. Hence, theories can be related only as far as such similarities are recognizable, perceivable. The network of theories is constituted by a logical kind of sensory activity.

In the context of the Newtonian synthesis[1], section 7.1 discusses how the axioms of one theory may be derived in another one, how the axioms of an old theory are perceived to have a similarity with theorems in a newer one. After the quite successful separation of various fields of science, it turned out that these have many, sometimes unexpected, connections (7.2). It led to the search for more unity in science, culminating in the discovery of the law of conservation of energy (7.3), the development of thermodynamics as a continuum theory (7.4), and of statistical physics based on atomism (7.5).

Excluding Cartesian physics, the Newtonian synthesis concerned in particular Kepler’s planetary laws and Galileo’s law of fall and theory of projectile motion. It also included the principle of inertia; the concepts of force and acceleration; the theories of impact by Huygens, Wallis, and Wren; the theory of pendulum motion by Galileo and Huygens; the principles of relativity, conservation, and composition of motion; Huygens’ theory of circular motion; Torricelli’s and Pascal’s views on the void; as well as Boyle’s theory of matter.

The synthesis is expressed in the theory of mechanics, treated by Newton in the introduction of Principia (3.6), and in the theory of gravity, constituting the main topic of that book. Besides, Newton wrought a synthesis of seventeenth-century optics by means of his influential Opticks (6.3).

Logic versus physics

Some philosophers[2] tend to doubt this synthesis, in particular the so-called reduction of Kepler’s and Galileo’s laws to Newton’s. They argue that Newton’s theory is incompatible with the former laws, pointing out that the theory of gravity contradicts Kepler’s and Galileo’s laws on several counts. Three examples are frequently given.

1. Kepler’s first law states that the planets move in elliptical orbits, with the sun at a focus. Philosophers observe that Newton’s theory proves this law to be false, because each planet’s orbit is disturbed by other planets. A physicist or astronomer would counter that this disturbance is small, and unobservable by Kepler’s means. Within the limits of accuracy achievable in the seventeenth century, Newton’s theory proved Kepler’s law to be true.

2. Kepler’s third law relates the dimension R of a planet’s orbit to its period T, by stating R³/T² to be the same for all planets. Philosophers observe that Newton’s theory proves that this relation for each planet is proportional to (M+m), where M is the mass of the sun and m is the mass of the planet. Because m has different values for the various planets, R³/T² is not the same for all planets. A physicist might rejoin that M is much larger than m, such that m, the planet’s mass, can be neglected with respect to M, the mass of the sun. Within the accuracy of the observations then available, Kepler’s law is proved to be true in Newton’s theory. The same applies to Jupiter’s and Saturn’s moons, albeit that M, now the mass of the central body, differs in the three cases. Hence, the constant value of R³/T² also differs, which was known before Newton found his theory. Kepler’s harmonic law thus remained a reasonable approximation in Newton’s theory.
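
The physicist’s point can be made explicit with the circular-orbit approximation (a modern sketch, not Newton’s own derivation). For a planet of mass m circling a sun of mass M at distance R, gravity supplies the centripetal force:

\[ \frac{GMm}{R^2} = m\,\frac{4\pi^2 R}{T^2} \;\Rightarrow\; \frac{R^3}{T^2} = \frac{GM}{4\pi^2}, \]

and the exact two-body treatment replaces M by M+m, a correction of at most about 0.1 percent even for Jupiter.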

3. Galileo’s law of fall states that in a vacuum all bodies fall with the same acceleration. Philosophers say that Newton’s theory shows the acceleration to depend on the height of the falling body above the earth’s surface, as well as on its position (the latitude) on the earth.[3] A physicist could reply that Galileo’s law concerned laboratory experiments of balls rolling down an inclined plane, for instance, or of pendulums with various bobs. According to Newton’s theory, the acceleration of free fall in a space confined to a laboratory was constant within the accuracy of measurements possible during the seventeenth century. Hence, Newton’s theory confirmed Galileo’s law. Moreover Newton described an independent experiment on pendulums, in order to confirm Galileo’s law that the acceleration of gravity is independent of the mass of the falling body.[4]

The difference between the philosopher’s and the physicist’s attitude appears to be this. Whereas the philosopher is only interested in the logical relations between the two theories, the physicist also takes into account how the laws are physically justified, finding a large amount of agreement. From a strictly logical point of view the philosopher may be right. But the theories discussed are not logical but physical. Since the Middle Ages, physicists have not been primarily interested in the logical aspects of theories, but in their physical relevance, including the accuracy of measurements.

Physics versus logic

Virtually all twentieth-century philosophers of science profess to be empiricists, emphasizing that scientific results are tentative, stressing that hypothetical law statements cannot do more than approximate reality. Yet they tend to forget these cautions as soon as they start a logical discussion. The arguments of our imaginary physicist derive their strength from the provisional and approximate character of any law statement. Newton’s law of gravity, too, has turned out to be no more than an approximation, and philosophers now call it false. Physicists put forward that Newton’s results are not contrary to those of Kepler and Galileo, but show sufficient similarity to explain, deepen, and relativize them.

1. Newton explained Kepler’s and Galileo’s laws by demonstrating these to be close approximations of special cases of his more general theory.

2. Newton’s theory relativized Kepler’s and Galileo’s, because it showed their limited applicability.

3. Newton’s theory also deepened Kepler’s and Galileo’s laws, first because it connected these, showing that the apparently widely differing motions of a projectile and of the moon are closely related. Next, it demonstrated how Kepler’s and Galileo’s laws can be extended. It showed, for instance, that the orbit of a planet or a comet may be (approximately) a circle, an ellipse, a parabola, or a hyperbola. Newton’s theory enables one to calculate the force of gravity at the surface of Jupiter or Saturn, and the variation of the acceleration of freely falling bodies as a function of latitude, or of height above the earth’s surface.

The deviations from Kepler’s and Galileo’s laws were largely unknown before Newton wrote his Principia. On the one hand, Newton was able to start from the assumed truth of Kepler’s and Galileo’s laws, on the other hand he was able to indicate how these laws should be corrected. Newton was well aware of the fact that these empirical generalizations are approximations.[5] Also Galileo stressed the approximate character of his theory of projectile motion.[6]

The principle of correspondence

One should prefer not to say that Kepler’s or Galileo’s theories are absorbed by, or can be derived from, Newton’s theory. It would be better to say that the axioms of the older theories are justified by the new one, and are valid within a certain margin of accuracy. This means that the old theories, being discredited according to the philosophers, in a physicist’s eyes gain in credibility, rather than losing it. It also means that the new theory explains why the old theory could be successful, even if its axioms have only limited validity. Hence, it is not necessary to reconsider all problems solved with the old theory.

The relation between a new and an old theory is not always of this kind. Newton’s theory of gravitation collided with Descartes’ vortex theory, because Descartes’ axioms (for instance, the identification of matter and space) deviated too much from Newton’s axioms and theorems. As a consequence, all problems solved by Descartes’ theory had to be solved again.

Accepting Newton’s theory allowed people to continue teaching and using Galileo’s and Kepler’s laws. But it would be impossible to teach Newtonian and Cartesian theories simultaneously without coming into conflict.

In the twentieth century, Niels Bohr introduced the principle of correspondence to describe the relation between successive, non-contradicting theories.[7] He applied it to the relation between quantum and classical physics. It has two aspects. First, it says that every new theory has to explain why the old theories were able to give more or less correct solutions to their problems. Second, in order to find new theories, one may take one’s lead from the old ones. In both respects, the principle of correspondence applies to the relation between Newton’s theory and those of Kepler and Galileo.

7.2. Connections between separated fields of science

Whereas the preceding chapters mostly discussed theories as independent and autonomous, this chapter argues that theories can only be fruitful if they are multiply connected. This shows another function of theories, besides prediction, explanation, and problem solving. It is the systematization of knowledge, i.e., to bring pieces of knowledge into contact with each other, in order to find a comprehensive view of reality. This function of theories, too, is governed by the principal law of logic, the law of non-contradiction. A systematic, coherent theoretical view of reality can only be achieved if statements contained in one theory do not contradict those of other theories. This aim cannot be achieved in one stroke, and possibly it will never be achieved to any degree of perfection. It is an ideal of science, to be approximated but never to be reached. As a matter of fact, the more problems scientists are able to solve, the more unsolved problems they seem to discover. Nevertheless, the pursuit of systematic, coherent knowledge is worth the effort.

Despite the Newtonian synthesis, eighteenth-century physicists adhering to experimental philosophy continued to isolate fields of science from each other and to develop these separately. For each field this implied the introduction of specific concepts, like a fluid with action at a distance, a specific force or potential allowing of mathematization, and experiments sustained by technical apparatus, in particular measuring instruments. This required formulating an operational definition and establishing a metric for any measurable quantity (1.4). In this way physicists separated and developed electricity and magnetism (chapter 4); optics (chapter 6); thermal physics (7.3); gravity (9.2); as well as hydro- and aerostatics (9.5).

Influenced by the mechanist philosopher Immanuel Kant and the romantic philosophers of nature (Naturphilosophen), after 1800 an increasing number of investigators wondered whether the various fields of science could be connected. The unity of physics should be accounted for in a unity of all natural forces. Newton’s concept of force implies the possibility that quite different forces acting on the same object balance each other, but these forces do not thereby become unified. The driving forces of electric and thermal currents did not even satisfy Newton’s laws.

Hence, at the start of the nineteenth century the problem was posed: Is it possible to transform the interaction in one field of science into that of another one? This question led first to the discovery of quite a few connections between the separate fields of physical science, such as in electrochemistry, and next to the law of conservation of energy (7.3) and the development of thermodynamics (7.4). Romantic energeticists considered this a unifying theory. Another candidate for unification became the electromagnetic field, relating first electricity with magnetism (4.4), then optics with electromagnetism. It was a physical field, because energy played a part in it (4.6). This led to the question of how electromagnetic radiation would interact with matter (6.5). Its experimental success led inexorably to the wave-particle duality and the end of classical physics.

Effective bridges

The fields of science did not remain isolated forever, but became connected by several effects, as these phenomena later were called.[8] Franz Aepinus observed the pyro-electric effect as early as 1756: some crystals become electrically polarized by heating. In 1821 Thomas Johann Seebeck discovered thermoelectricity.[9] When two different metals (for example, copper and iron) form a circuit whose two contacts are at different temperatures, an electric current in the circuit will accompany the thermal flow. If the circuit is interrupted, a potential difference can be measured. This is called the Seebeck effect. Georg Ohm used a thermoelectric current source in his experiments (4.5). Thermoelectric thermometers convert a temperature difference into a measurable electric tension. In 1834 Jean Peltier found the reverse effect, named after him. If such a circuit carries an electric current, a heat flow will occur simultaneously, resulting in a temperature difference between the contacts.[10] It is applied in some refrigerators. In 1880, Pierre and Jacques Curie explored the piezoelectric effect, soon applied in gramophones.[11] An electric tension will occur on a crystal on which a mechanical pressure is exerted, and conversely. In 1887 Heinrich Hertz discovered the photoelectric effect (6.5).
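
In modern notation the thermoelectric tension is, to first order, proportional to the temperature difference between the contacts:

\[ V \approx S\,\Delta T, \]

where S, the Seebeck coefficient, depends on the pair of metals used (typically of the order of microvolts per kelvin).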

In 1763 Ebenezer Kinnersley found that the discharge of a Leyden jar through a thin wire caused so much heat that two thin iron wires could be welded together.[12] In 1807 Thomas Young observed the same for a current delivered by a voltaic pile. From 1837 onward, James Prescott Joule measured the heat developed by an electric current,[13] leading to the formulation of Joule’s law. It says that the heat developed per second in a wire is proportional to the current and the potential difference. Because Joule also determined the equivalence between mechanical work and heat, electricity and mechanics became connected. The now common unit of energy, named after Joule, can be expressed both in electric and in mechanical units.
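
In modern notation, Joule’s law states for the heat developed per second

\[ P = VI = I^2 R, \]

the second form following from Ohm’s law V = IR; one joule per second equals one watt, whether computed from volts and amperes or from mechanical units.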

Faraday

After Hans Christian Oersted’s discovery of the action of a current carrying wire on a magnet, André-Marie Ampère and others reduced it to a force (4.4). Michael Faraday, however, assumed that Oersted’s experiment should be explained from the ‘electrotonic state’ of the current carrying wire and the space surrounding it. He did not like Ampère’s explanation of the force between two current carriers by means of action at a distance. According to Faraday, rotation was the most important aspect of Oersted’s experiment. In 1821 he succeeded in making a current carrier rotate around a magnetic bar, and conversely.[14] His demonstration inspired the development of electric motors.

Michael Faraday cannot be directly or indirectly related to any philosophy whatever except experimental philosophy.[15] He may have arrived at his views about the unity of all physical forces independently of any romantic influence.[16] Although Faraday made several important theoretical contributions, he is first of all known as an able experimenter. Besides, he was famous for his popular lectures on science, among others for children.

From 1821 to 1831 occasionally, and between 1831 and 1855 almost continuously, Faraday was involved with his Experimental researches on electricity, a series of papers whose sections were numbered successively, and which were published in three volumes.[17] After many unsuccessful experiments, in 1831 he discovered electromagnetic induction.[18] Whereas an electric current causes a magnetic action, Faraday observed that an electric current is caused by a changing magnetic action, such as a magnet moving in a coil. In 1832 he demonstrated that the induced current in a circuit is inversely proportional to the resistance in that circuit. Much later, in 1850, he found that the induced current is proportional to the change per unit of time of the number of lines of force surrounded by the circuit. Electromagnetic induction is applied in dynamos, the strongest and most effective sources of electricity known at present.
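
Faraday’s two quantitative findings combine into the modern flux rule (a restatement in later notation, with Φ for the magnetic flux, the ‘number of lines of force’ through the circuit):

\[ \mathcal{E} = -\frac{d\Phi}{dt}, \qquad I = \frac{\mathcal{E}}{R}, \]

so the induced current is proportional to the rate of change of the flux and inversely proportional to the resistance.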

In 1845 Faraday found that a magnetic field could change the direction of polarization of a beam of light, implying an experimental relation between light and electromagnetic interaction:

‘Thus it is established, I think for the first time, a true, direct relation and dependence between light and the magnetic and electric forces; and thus a great addition made to the facts and considerations which tend to prove that all natural forces are tied together and have one common origin.’[19]

For the same reason he investigated whether besides iron other substances can be magnetized. In 1778, Sebald Brugmans had observed the magnetic properties of cobalt, bismuth, and antimony. Faraday started a systematic research and concluded in 1845 that all substances have magnetic properties.[20] He proposed two explanations. The first made use of Ampère’s theories, the second, which he preferred, assumed that most substances are poor (and some good) conductors for magnetic lines of force. In 1847 and 1871 Wilhelm Weber would publish a theory of various kinds of magnetism according to his Newtonian theory.

Naturphilosophie

All these connections were discovered in a time when Kantian mechanism and romantic Naturphilosophie exerted an important influence, especially in Germany, where natural science was often considered to be part of philosophy. Yet, with the exception of Johann Ritter, Hans Christian Oersted, and Thomas Seebeck, it would be difficult to prove that the investigators concerned were influenced by romantic views.

Friedrich Schelling developed his Naturphilosophie before 1800, and about 1830 he had some reason to conclude with satisfaction that his views had been confirmed by the discoveries of Oersted, Seebeck, and Faraday. Yet even Oersted, who was Schelling’s friend, distanced himself from his speculations, because these had little or no relation to empirical reality. Georg Hegel’s influence on the development of natural science is negligible. Sooner or later, any physicist or chemist having a good reputation rejected the speculative character of Goethe’s, Schelling’s, and Hegel’s romantic Naturphilosophie.

Probably most physicists remained loyal to experimental philosophy, even after Immanuel Kant and his many adherents based their variant of the philosophy of mechanism on Newton’s mechanics.

7.3. The law of conservation of energy

The urge to find the unity of all natural forces, the discovered convertibility of various interactions, and the analysis of work-producing machines all contributed to the discovery of the law of conservation of energy, in which several scientists took part.[21]

Mathematical physicists developed rational mechanics (avoiding heat and friction) in which conservative forces performed work, converting work into vis viva. Julius von Mayer (1842, 1845) determined the equivalence of heat and work from the difference between the specific heats of a gas at constant volume or at constant pressure.[22] In 1816, Laplace showed the speed of sound to depend on the specific heat ratio (cp/cv), where cp is the specific heat of a gas at constant pressure and cv that at constant volume. This ratio could be measured by a method due to Nicholas Clément-Desormes (1811, 1819).
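
In modern notation, Laplace’s correction of Newton’s formula for the speed of sound in a gas of pressure p and density ρ reads

\[ v = \sqrt{\frac{\gamma p}{\rho}}, \qquad \gamma = \frac{c_p}{c_v} \approx 1.4 \ \text{for air}, \]

whereas Newton’s isothermal value √(p/ρ) falls short of the measured speed by roughly fifteen percent.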

James Prescott Joule was an experimentalist, despising all speculations due to Kantian mechanism or romantic Naturphilosophie. His family owned a brewery in Manchester, where he performed his many experiments, first on electricity, later on heat. As an industrialist he was more interested in the mutual convertibility of heat and work than in conservation laws. Starting in 1843, Joule determined with increasing precision how much work could be transformed into a certain amount of heat.[23]

The law of conservation of energy has three aspects: the convertibility of different kinds of interaction; conservation of a certain quantity of interaction; and the universality of the energy principle. Michael Faraday formulated the first and third aspect:

‘I have long held an opinion, almost amounting to conviction, in common I believe with many other lovers of natural knowledge, that the various forms under which the forces of matter are made manifest have one common origin; or, in other words, are so directly related and mutually dependent, that they are convertible, as it were, one into another, and possess equivalents of power in their action.’[24]

But Faraday overlooked that a conservation law can only function within a mathematically developed theory.[25]

Historians stressing the first aspect are inclined to consider Mayer and Joule as the independent discoverers of the law of conservation of energy. Those emphasizing the second aspect point to earlier applications of a restricted conservation law in mechanical problems. And historians who believe the universality of the energy principle to be most important ascribe the conservation law to Helmholtz, who moreover did not overlook the other two aspects.

Helmholtz

Hermann Helmholtz’ treatise Über die Erhaltung der Kraft (1847: On the conservation of force) shows a broader vision and a deeper insight than any other contribution.[26] Like Mayer he distanced himself from the Naturphilosophen. Inspired by Kant’s version of mechanism, based on Newton’s laws of motion, both aimed at reducing the physiological functions of plants and animals to mechanical force laws.[27] Helmholtz’s treatise consisted of an introduction followed by six parts. He opened his work by stating his intention to represent the foundations of his theory independently of any philosophical view, but in 1881 he admitted Kantian mechanicist influences.

Helmholtz started his analysis with a strictly mechanical system. He demonstrated that only vis viva (living force, the product of mass and the square of the speed of an object) could be a candidate for a conserved magnitude. However, he defined vis viva as (1/2)mv², not as mv², m being the mass of a body and v its speed. Now the increase of vis viva as a consequence of the action of some external force equals the work done by that force. In a system in which only central forces act, vis viva only depends on the relative position of the parts of the system. This he called the law of conservation of vis viva. He proved that this magnitude in a mechanical system is conserved if the system consists of point-masses interacting through central forces satisfying Newton’s third law. (A central force between two objects is directed along their connecting line, and depends only on their distance. The force discovered by Ampère between two electric current elements depends on the angle these elements make with their connecting line, and is therefore not a central force.) Hence, Helmholtz accepted both the Newtonian matter-force dualism and an atomistic view of matter, but he took no position with respect to action at a distance.[28]
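
In modern terms Helmholtz’s argument amounts to the work-energy theorem combined with the path independence of central forces (a sketch): the work W done by the forces equals the increase of vis viva, and for central forces W depends only on the relative positions, so that

\[ \Delta\left(\tfrac{1}{2}mv^2\right) = W, \qquad \tfrac{1}{2}mv^2 + U(r) = \text{constant}, \]

where U, depending only on the mutual distances r, is what Helmholtz would capture as Spannkraft (introduced below) and later generations as potential energy.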

In the second part Helmholtz introduced the concept of Spannkraft (tensile force). Now the conservation law states that the increase of vis viva equals the decrease of all tensile forces. In the third part he gave some examples. Helmholtz also treated wave motion, discussing some possible transformations in the absorption of light. In the fourth part he dealt with the force-equivalent of heat, starting with a brief remark on inelastic collisions. He argued that the loss of force due to friction and in inelastic collisions was compensated by the development of heat. He referred to Joule’s work (in 1847 he did not know of Mayer’s), and he discussed the development of heat by electric currents and in chemical processes. The introduction of the concept of energy implied that the caloric theory and its law of conservation of heat were abolished definitively (7.4). Solids, liquids, and vapours were no longer considered compounds with caloric as a component, but different states of aggregation with the same molecular composition.

In the final two parts Helmholtz treated electricity and magnetism. He derived Faraday’s electromagnetic induction from the new conservation law. In 1847 he was convinced that the law of conservation of energy would be valid only if all forces were central. Therefore he criticized the theories of Ampère and Weber, containing forces depending on velocity and acceleration (7.2). In 1881 Helmholtz admitted velocity-dependent forces if these cannot deliver work. This is the case with the force later discovered by Lorentz, which is always perpendicular to the direction of motion.

One of the most striking aspects of Helmholtz’ work is its title: On the conservation of force. Because it clearly differed from Newton’s vis impressa, in 1853 Rankine proposed to call the conserved magnitude energy, a word that Young circa 1800 used for vis viva. Tensile force was renamed potential energy and related to the concept of potential (4.5). Vis viva soon became kinetic energy.

Helmholtz did not object to these proposals, yet even in 1881 he saw no reason to change the title of his treatise. It shows that about 1850 the word force still had a much wider meaning than it has now.[29] It meant more or less what is now called interaction. As such it lacked mathematical precision, unless specified. However, Helmholtz was very clear about his mathematics. In his treatise there is no confusion at all about central force, vis viva, work, tensile force, or heat. He did not object to giving the conserved magnitude a new name, while considering the title of his paper quite to the point. By the end of the nineteenth century physicists restricted the concept of force to the vis impressa operating in Newton’s laws of motion and gravity, in Coulomb’s laws, and in Weber’s electrodynamic theory.

Helmholtz applied two ways of defining mathematical expressions of interaction. The first is by using a mathematical formula, leading to an operational definition, indicating how the magnitude concerned can be measured or calculated. An important means for the analysis of these mathematical expressions of interaction is so-called dimensional analysis, inspecting the units of these magnitudes. For instance, if work can be transformed into heat, both should be expressible in the same units. This method was developed circa 1830, and Helmholtz applied it to show the relation between work and vis viva, work and heat, etc.
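
A minimal example of such a dimension analysis, in modern SI units: work (force times distance) and vis viva share the same unit,

\[ [F s] = \mathrm{N\,m} = \mathrm{kg\,m^2\,s^{-2}} = \mathrm{J} = \left[\tfrac{1}{2}mv^2\right], \]

and the mechanical equivalent of heat allows heat, too, to be expressed in joules.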

The second way of defining a mathematical expression of interaction is by formulating laws. Now, besides the method of measuring it, the conceptual meaning of the magnitude concerned is made clear in a comprehensive theory. Because of Newton’s laws of motion, the most important expression for interaction was vis impressa, what is now called force (3.6). Because of the law of conservation of energy, the concept of energy started to surpass that of force. This came to the fore in the development of thermodynamics, with the law of conservation of energy as its first law.

7.4. Thermodynamics

Although the concept of degrees of heat stems from the Middle Ages, the development of thermometers started early in the seventeenth century, in the context of meteorology and botany (in hothouses).[30] Several scientists constructed a thermoscope, a temperature indicator without a scale, based on the thermal expansion of air. The discovery that substances like melting ice or boiling water always have the same temperature allowed of the design of a reproducible thermometer with a scale. About 1740 mercury, contained in a small glass bulb and expanding in a sealed capillary, was accepted as the most convenient and reliable design for the construction of a thermometer. This was based on the assumption that mercury expands uniformly on heating. The most common scales still in use are Fahrenheit’s (1717) and the centigrade or Celsius scale (1742). Both are linearly interpolated between and extrapolated from the fixed points of melting ice (0°C or 32°F) and of boiling water (100°C or 212°F). At the beginning of the nineteenth century gas thermometers became the standard,[31] though in practice mercury thermometers remained in use until the introduction of electronic devices.

Temperature is an intensive magnitude, a universal equilibrium parameter. This means that two bodies that may be different in kind or size have the same temperature if they have been in thermal contact until equilibrium has been reached. The measurement of tempera­ture rests on this property. For instance, the temperature of a glass of water is measured by putting a thermometer in it, reading the temperature of the thermometer after being convinced that it equals that of the water. It is also a transitive magnitude, an equivalence relation. If two bodies A and B have the same temperature, and B and C too, then A and C have the same temperature as well. This is important if B is a thermometer. If temperature were not transitive, the use of a thermometer would make no sense.

Phlogiston

Among Empedocles’ four elements the medieval alchemists recognized fire to be the most important for distillation and the purification of metals. They considered metals to be mixtures or compounds of ores (earth) with fire. Georg Stahl accepted this view, but like Robert Boyle he abandoned Aristotle’s four elements theory, assuming that the number of elements could be more than four. One of these would be phlogiston, the inflammatory part of fuel.[32]

In combustion, phlogiston is sensible as heat and visible as light. In calcification (rusting) of metals, phlogiston is liberated. Reversely, in order to make metals from ore, phlogiston has to be added by heating the ore, most effectively by means of charcoal. This very inflammable substance was assumed to consist almost entirely of phlogiston, because after combustion hardly any ash is left. Combustion in a closed space can only happen during a limited time. Stahl concluded that the air present was soon saturated with phlogiston, which is soluble in air only in a limited amount. Saturated or fixed air clouds limewater, and so does the air exhaled by humans and animals. The heat produced by humans and animals is related to the phlogiston produced in respiration. About 1750, the phlogiston theory was generally accepted, although it was impossible to isolate phlogiston as a pure substance.

Until circa 1450, when Jean Buridan recognized water vapour, the only gas known was atmospheric air. By catching air (or gas, as he called it) under an inverted glass bell, Johannes Baptista van Helmont was able to include gases in chemical analysis. Mercury vapour followed in the seventeenth century, and in the eighteenth century quite a large number of new gases were discovered, in particular by Joseph Priestley, who established atmospheric air to be a mixture of gases. Henry Cavendish found that a combustible gas is liberated when a metal is dissolved in an acid. He called it inflammable air, and wondered whether it could be identified as phlogiston.

Lavoisier

Antoine-Laurent Lavoisier favoured experimental philosophy and was a keen experimenter himself.[33] Like Newton he accepted mass as a measure of the amount of matter. Lavoisier believed that matter cannot be destroyed, hence in a chemical reaction mass should be unchanged. Whereas light and the electric fluid were usually considered imponderable,[34] Lavoisier assumed that phlogiston, like air, belonged to the substances having weight.[35] Therefore, in 1772 he was surprised to find that sulphur and phosphorus increase in weight by combustion, although the theory predicted that phlogiston would be liberated.

Chemists already knew that some substances become heavier by combustion or calcification, but for them weight was a physical property to which they paid little attention. They were even ready to allot phlogiston a negative weight. Indeed, the Aristotelian element fire was light in an absolute sense, and in 1783 the brothers Montgolfier flew a balloon over Paris, seemingly proving that air mixed with phlogiston is lighter than air. But for Lavoisier negative weight was unacceptable, and he arrived at the idea that in combustion fuel is not liberated, but air is bound. This implies that metals are elements and ores are compounds, a rather drastic change in the then common world view. It generated a number of problems, occupying Lavoisier for more than ten years: What is the nature of air, bound in combustion and calcification? What is fixed air? What is inflammable air? And what is heat?

Joseph Priestley, who became the main defender of the phlogiston theory, played an important part in the solution of the first three problems. It cannot be precisely traced who discovered oxygen, but Lavoisier recognized it as the answer to his first problem. Atmospheric air turned out to be a mixture of about 20% oxygen and 80% nitrogen, and fixed air a compound of carbon and oxygen. Cavendish’s inflammable air together with oxygen produced water. As a water generating substance it became known as hydrogen. Lavoisier believed that all acids contain oxygen, but Humphry Davy proved in 1808 that substances like hydrochloric acid do not contain it. Yet the name, meaning ‘acid producing element’, stuck.

In combustion oxygen replaced phlogiston, but this did not solve the problem of heat, on which Lavoisier cooperated with Pierre-Simon Laplace. Before them, Joseph Black and others had introduced the concepts of specific heat and latent heat.[36] Black developed his ideas on heat as a substance about 1760, and although these were published only posthumously in 1803, Lavoisier and Laplace possibly knew of them earlier. Lavoisier distinguished free heat, increasing the temperature of a body, from bound heat, leading to the change of a substance. Typical examples are the melting heat, the heat of vaporization, and the heat involved in a chemical reaction like combustion. He introduced caloric (calorique in French) as an imponderable chemical element capable of forming compounds with other substances. Lavoisier considered water to be a compound of ice with caloric, and water vapour a compound of water with caloric. He called heat the gas generating principle.

Joseph Black and Antoine Lavoisier independently introduced a new measure for heat.[37] The amount of heat is an extensive measure, to be distinguished from temperature, the corresponding intensive measure. If one heats different amounts of water over the same temperature interval, one needs amounts of heat proportional to the amount of water. The quantity of heat is not transitive, as temperature is, but additive: the amount of caloric needed to heat two bodies is the sum of the amounts required for each apart.

Because Black and Lavoisier considered caloric to be an indestructible substance, they introduced the principle of conservation of heat as the foundation of all caloric experiments. If an object is cooled, the amount of heat liberated is the same as is needed to heat it through the same temperature interval. When two objects of unequal temperature, isolated from the environment, come into thermal contact, one of them loses as much heat as the other gains. With this in mind, Black and Lavoisier developed the calorimeter into an increasingly accurate instrument. Their unit of heat became the calorie: the quantity of heat required to increase the temperature of 1 gram of water by 1 degree.

The three assumptions (heat proportional to mass, the principle of conservation of heat, and the linearity of the mercury thermometer) were tested together in the following experiment. Two unequal amounts (mass m1 and m2) of the same substance having different initial temperatures (t1 and t2) are thermally connected. The final temperature (tf) is determined by the initial ones and by the masses, according to a theoretical formula.[38] This was confirmed experimentally with reasonable accuracy, although it later turned out that the three assumptions, which laid the basis of calorimetric measurements, were at best approximately true.
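
In modern notation the formula referred to follows directly from the three assumptions; a minimal reconstruction (the historical statement differed in form):

\[ m_1(t_f - t_1) = m_2(t_2 - t_f), \qquad\text{hence}\qquad t_f = \frac{m_1 t_1 + m_2 t_2}{m_1 + m_2}. \]

For example, mixing 100 gram of water at 80° with 300 gram at 20° should yield (100·80 + 300·20)/400 = 35°, a prediction easily checked with a thermometer.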

Black and Lavoisier operationally defined the specific heat of a chemically homogeneous substance (the quantity of heat needed for one gram to increase its temperature by one degree) and the heat capacity of a body (the quantity of heat needed to increase its temperature by one degree). The specific heat played an important part in nineteenth-century physics. For a gas, this quantity depends on whether one keeps the volume or the pressure constant. (The difference cp−cv determines the work done by the gas when the volume changes at constant pressure. The ratio cp/cv depends on the molecular structure of the gas.)
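
In modern terms (a later result, known as Mayer’s relation, not available to Black and Lavoisier), the difference mentioned in parentheses takes a simple form for one mole of an ideal gas:

\[ c_p - c_v = R, \qquad \gamma = \frac{c_p}{c_v}, \]

where R is the universal gas constant.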

In 1783 Lavoisier and Laplace published their Mémoire sur la chaleur. The experimental part was their common work, but they presented different theories.[39] Initially, Laplace adhered to the mechanist view of heat, but later he accepted Lavoisier’s caloric theory.[40] In 1785 Lavoisier published his definitive attack on the phlogiston theory in his Réflexions sur le phlogistique, stressing that caloric, contrary to phlogiston, was measurable. Through Lavoisier’s works the caloric theory became known and widely accepted.[41] Only an increasing number of mechanists rejected it. Like Descartes and Leibniz (and even Newton) in the seventeenth century, Daniel Bernoulli, Benjamin Rumford, Humphry Davy, and initially Pierre-Simon Laplace considered heat to be a manifestation of the motion of material particles.[42] About 1800, in particular Rumford and Davy opposed the idea that heat could be an indestructible substance. By his experiments Rumford proved that heat could be produced by friction in indefinite amounts.

Being experimental philosophers, the adherents of the caloric theory were not impressed. They considered it more important that the principle of conservation of heat allowed of its measurement and of a mathematical theory. This turned out to be impossible from Rumford’s point of view. His experiments were considered an anomaly, like the phenomenon that water heated from 0 to 4°C does not expand but contracts. The adherents of the caloric theory were not prepared to reject their theory because of refuting evidence, as long as no better (e.g., mechanical) theory was available.[43]

Meanwhile Laplace made clear that the hypothesis that heat should be an indestructible substance is not necessary, and Lavoisier admitted that the measurability of heat and the principle of conservation of heat are sufficient for the theory. In 1822, Joseph Fourier interpreted a temperature difference as a generalized force driving a heat current, in accord with the matter-force dualism. Fourier emphasized that his theory of heat conduction was independent of the assumption that heat is a substance. For Laplace, Lavoisier, and Fourier the law of conservation of heat was more important than the nature (the essence) of heat. Lavoisier observed that the term caloric has the ‘advantage in that it can adapt itself to all sorts of opinions’, and that ‘we are not obliged to suppose that caloric is real material’.[44] In contrast, John Dalton treated caloric as a real substance in his atomic theory. According to Dalton, an atmosphere of caloric surrounds each atom. This would explain why most materials expand on heating. Dalton believed that the form and size of this atmosphere made like atoms repel each other, whereas unlike atoms do not influence each other, unless bonded into a molecule.

Thermodynamics

No field of physical science has exerted more influence on Western society than electricity and magnetism, but in the early nineteenth century this could not be foreseen. The electrification of society, starting circa 1880, was preceded by the industrial revolution that followed the introduction of steam engines, then considered the source and symbol of progress. The study of heat was believed to be of more practical relevance than the academic investigation of electricity and magnetism.

Caloric machines producing mechanical work by thermal processes were constructed in antiquity as curiosities, extensively studied during the seventeenth century, and strongly improved in the eighteenth and nineteenth centuries. Thomas Newcomen and James Watt did not invent the steam engine, but succeeded in improving older designs by the introduction of parts like the governor, allowing of stabilizing and controlling the steam engine. Together with the separate steam boiler and condenser this improved the machine’s efficiency considerably.[45]

The technical development of the steam engine since circa 1700 was of enormous significance for the industrialisation. In England steam engines replaced water mills, leading to the industrial revolution, in particular in the textile industry. In Manchester, one of its centres, both John Dalton and James Prescott Joule grew up. They approached science in a practical way.

Only after the technical development of the steam engine did its theoretical investigation start. In 1824 Sadi Carnot studied from a theoretical point of view the behaviour of caloric machines, producing work in a cyclic thermal process, a periodic process repeating itself indefinitely.[46] Carnot compared such a machine with a water wheel, in which water falls from a high to a lower potential, and whose efficiency was then a matter of much dispute. According to Carnot, a caloric machine transports heat from a high to a lower temperature level, producing work. As an adherent of the caloric theory he assumed heat to be an indestructible substance, like the water in a water wheel moving from a high to a low level. In this view, work is produced during the process, but neither heat nor water is converted into work.

Carnot defined the efficiency of a heat engine as the ratio of the net work produced to the heat supplied, both during one cycle. He argued that a caloric machine could only perform work in the presence of a temperature difference driving a flow of heat. The larger this temperature difference, the larger the machine’s efficiency. This is the theoretical foundation for the condenser introduced earlier, lowering the final temperature, and for the closed boiler, in which water boils under pressure at a temperature higher than 100°C.
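
Restated in modern symbols (not Carnot’s own notation), with Qh the heat supplied per cycle and W the net work produced, the efficiency is

\[ \eta = \frac{W}{Q_h}. \]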

Carnot assumed that the machine works most efficiently if the supply of heat takes place exclusively at the maximum temperature occurring during the process, and the discharge of heat only at its lowest temperature. As an ideal process Carnot considered a cycle consisting of two isotherms and two adiabatics. (The first diagram of the cycle was not drawn by Carnot but by Benoît Clapeyron, who in 1834 presented Carnot’s theory in a mathematical form.)

The two isotherms are processes during which heat is supplied at a constant high temperature, or discharged at a constant low temperature. These processes alternate with two adiabatics, in which the temperature is increased or decreased without exchange of heat with the surroundings. Such an idealised cycle, now called a Carnot cycle, can only approximately be realized. The reversed process acts as a heat pump with the same efficiency. Whereas the Carnot cycle is reversible, the spontaneous transfer of heat from a hot to a cold body, in which no work is produced, is irreversible. Carnot argued that no machine (reversible or irreversible) working between the same maximum and minimum temperatures could be more efficient than a Carnot engine, in which the transport of heat between the two reservoirs is maximally exploited for producing work. In actual machines this is never the case.

The laws of thermodynamics

At the end of his relatively short life, Carnot started to doubt the validity of the principle of conservation of caloric on which his theory depends.[47] About 1850 William Thomson returned to this matter, proving that Carnot’s insight into the efficiency of caloric machines remained valid if the recently proposed law of conservation of energy replaced the principle of conservation of heat. From Joule, Thomson learned about the equivalence of heat and work. Thomson interpreted Carnot’s machine as converting heat into work, such that the net sum of heat and work is conserved.

Thermodynamics’ second law, expressing the irreversibility of many processes, was introduced circa 1850 by William Thomson and Rudolf Clausius, making use of Carnot’s discoveries. Clausius stated as an axiom that no cyclical process could have the sole effect that heat is transferred from a low to a high temperature reservoir: a heat pump needs work to be done. Thomson’s axiom was that no cyclical process is possible in which heat is converted into work without transferring heat from a high to a low temperature reservoir: to produce work, heat must be degraded.[48] The two statements turned out to be equivalent, because each can be derived from the other.

Next Clausius introduced the concept of entropy, strongly related to currents of various kinds, because in any current entropy is created. Consider two large reservoirs, each at a constant temperature, connected such that a small amount ΔQ of heat flows from the hot reservoir at temperature T1 to the cool one at T2. An entropy difference is defined as ΔS=ΔQ/T. The entropy of the hot reservoir decreases by ΔS1=ΔQ/T1, and the entropy of the cool reservoir increases by ΔS2=ΔQ/T2. Because T1>T2, ΔS2−ΔS1>0: the total entropy increases, the heat flow creates entropy. Now the second law states that in an isolated system the entropy is constant at equilibrium and is otherwise increasing.
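
A worked instance, with numbers chosen only for illustration: let ΔQ = 120 J flow from a reservoir at T1 = 400 K to one at T2 = 300 K. Then

\[ \Delta S = \frac{\Delta Q}{T_2} - \frac{\Delta Q}{T_1} = \frac{120}{300} - \frac{120}{400} = 0.4 - 0.3 = 0.1\ \mathrm{J/K} > 0, \]

so the heat flow indeed creates entropy.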

The advantage of the concept of entropy over heat is that entropy turns out to be an extensive state parameter, with temperature as the corresponding intensive state parameter. In diagrams like Carnot’s, the entropy difference between two points is independent of the path of the process from one point to the other, whereas the exchanged heat depends on the path followed. (This is mathematically expressed as: temperature is the integrating factor for heat, because entropy (ΔS=ΔQ/T) can be integrated, whereas heat cannot.)

In order to constitute a complete theory, two more laws are needed. The zeroth law (so called because it is the oldest one) states that if two or more bodies are in thermal equilibrium with each other, their temperatures are the same. It introduces temperature as the thermal intensive equilibrium parameter mentioned above. The third and youngest law is known as Nernst’s theorem (1912). It states that in any physical or thermal process approaching absolute zero, the change in entropy approaches zero. As a consequence, the zero point of temperature cannot be reached by any series of cyclical processes. Thermodynamics only deals with entropy differences. Conveniently, for any system the entropy at 0 K may be defined as being zero.

The laws of thermodynamics are generally valid, independent of the specific character of a physical thing or aggregate. For a limited set of specific systems (like a gas consisting of similar molecules), statistical mechanics is able to derive the second law from mechanical interactions, starting from assumptions about their probability.[49] Whereas the thermodynamic law states that the entropy in a closed system is constant or increasing, the statistical law allows of fluctuations. The source of this difference is that thermodynamics supposes matter to be continuous, whereas statistical mechanics takes into account the molecular character of matter (7.5).

Physical time is irreversible

The law of inertia expresses the independence of uniform motion from physical interaction (3.7). It confirms the existence of uniform and rectilinear motions having no physical cause. This is an abstraction, for concrete things experiencing forces have a physical aspect as well. In reality a uniform rectilinear motion only occurs if the forces acting on the moving body balance each other.

Kinetic time is symmetric with respect to past and future. If in the description of a motion the time parameter (t) is replaced by its reverse (–t), a valid description of a possible motion is achieved. In the absence of friction or any other kind of energy dissipation, motion is reversible. Distinguishing past and future allows of discovering cause-effect relations, assuming that an effect never precedes its cause. According to relativity theory, the order of events having a causal relation is in all inertial systems the same, provided that time is not reversed.

The existence of irreversible processes cannot be denied. All motions with friction are irreversible. Apparently, the absorption of light by an atom or a molecule is the reverse of emission, but Albert Einstein demonstrated that the reverse of (stimulated) absorption is stimulated emission of light, making spontaneous emission a third process, having no reverse. This applies to radioactive processes as well. Only wave motion subject to Schrödinger’s equation is symmetric in time. Classical mechanics usually expresses interaction by a force between two subjects, this relation being symmetric according to Newton’s third law of motion. However, this law is only applicable to spatially separated subjects if the time needed to establish the interaction is negligible, i.e., if the action at a distance is (almost) instantaneous. Einstein made clear that interaction always needs time, hence even interaction at a distance is asymmetric in time. Similarly, the spontaneous transfer of heat from a hot to a cold body is an irreversible process besides the reversible processes described in a Carnot cycle.

Irreversibility does not imply that the reverse process is impossible. It may be less probable, or require quite different initial conditions. The transport of heat from a cold to a hotter body (as occurs in a refrigerator) demands circumstances different from the reverse process, which occurs spontaneously if the two bodies are not thermally isolated from each other.

In the common understanding of time, the discrimination of past and future is a matter of course.[50] Yet, irreversibility as a temporal order is philosophically controversial, for it does not fit into nineteenth-century mechanism.[51] This worldview assumes each process to be reducible to motions of pieces of matter that are in themselves unchangeable, interacting through Newtonian forces. Ludwig Boltzmann attempted to bridge reversible motion and irreversible processes by means of the concepts of probability and randomness. In order to achieve the intended results, he had to assume that the realization of chances is irreversible.

It is sometimes stated that all basic laws of physics are symmetrical in time, implying that only mechanical laws are basic. This seems to be true as far as kinetic time is concerned, and if any law that belies temporal symmetry (like the second law of thermodynamics, or the law for spontaneous decay) is not considered basic. Anyhow, all attempts to reduce irreversibility to the subject side of the physical aspect of reality have failed (7.5). ‘The one great law of irreversibility (the Second Law) cannot be explained from the reversible laws of elementary particle mechanics’.[52]

The thermodynamic temperature scale

The concept of a Carnot cycle forms the basis of the thermodynamic temperature scale. William Thomson, since 1892 Lord Kelvin, defined its metric by equating the ratio of the cycle’s high and low temperatures to the ratio of the corresponding exchanged amounts of heat. It was soon proved that this scale can be made to coincide with that of the ideal-gas thermometer, based on the law of Boyle and Gay-Lussac. Later on the unit of this scale was named after Kelvin. This theoretical temperature scale is independent of the specific properties of mercury in a mercury thermometer, of the actual gas in a gas thermometer, or of a metal or semiconductor in an electrical thermometer.
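
In modern symbols Thomson’s definition reads Qh/Qc = Th/Tc for the heats exchanged at the high and low temperatures of a Carnot cycle. Combined with the conservation of energy (W = Qh − Qc), it fixes the maximal efficiency of any engine working between these temperatures:

\[ \eta = 1 - \frac{Q_c}{Q_h} = 1 - \frac{T_c}{T_h}. \]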

The importance of Thomson’s definition can hardly be exaggerated. The Celsius and Fahrenheit scales depend on the arbitrary assumption that mercury expands linearly on heating. In experiments on gases, Guillaume Amontons (circa 1700), Jacques Charles, and Louis Gay-Lussac (circa 1800) had found that many gases expand nearly linearly (if compared with the mercury scale), all having the same thermal expansion coefficient.[53] This suggested the construction of an ideal-gas thermometer scale, an ideal gas being defined as satisfying Boyle’s law, with a temperature-independent thermal expansion coefficient. This was a conceptual improvement, for the scale was supposed to be determined by the properties of caloric alone, but this advantage disappeared after the decline of the caloric theory. Thomson derived his thermodynamic scale entirely independently of the properties of any substance. It has the same universal validity as the laws of thermodynamics from which it is derived. Therefore it was called absolute, like absolute time and space serving as a standard for practical instruments (3.7).

Thermodynamic forces and currents

A thermodynamic force is not related to mechanical acceleration, but it drives a current. A temperature difference (or gradient, in three dimensions) drives a thermal current; an electrical potential difference causes an electric current; and a concentration difference drives the flow of a chemical substance. This idea could be fruitfully applied in what came to be known as physical chemistry. In each current, entropy is created, making the current irreversible. (A current in a superconductor is a boundary case. In a closed superconducting circuit without a source, an electric current may persist indefinitely, whereas a normal current would die out very fast.)
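
These three examples share the same mathematical form. In the notation that became standard later (Fourier’s, Ohm’s, and Fick’s laws, given here as an illustration), each current density is proportional to the gradient driving it:

\[ J_Q = -\kappa\,\nabla T, \qquad J_e = -\sigma\,\nabla V, \qquad J_n = -D\,\nabla c. \]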

In a system in which currents occur, entropy increases. Only if a system as a whole is in equilibrium are there no net currents, and the entropy is constant.

The further development of thermodynamics (by Rudolf Clausius, William Thomson, Max Planck,[54] Josiah Gibbs,[55] and others) has been of great importance to physics and even more to physical chemistry.[56] It led to the discovery of several general relations independent of the specific structure of the substances concerned.[57] Just as mechanical forces are able to balance each other, so do thermodynamic forces and currents. This leads to mutual relations like thermoelectricity, the phenomenon that a heat current balances an electric current in the Seebeck effect and in its reverse, the Peltier effect (7.2). Relations between various types of currents are subject to a symmetry relation discovered by William Thomson and generalized by Lars Onsager (1931).

Thermodynamics was neither the fruit of mechanical nor of experimental philosophy, even if both Clausius and Thomson were mechanists and theoreticians. It was inspired by the investigation of steam machines, not by the romantic Naturphilosophie.

Energy, force and current as unifying concepts

The law of conservation of energy did not confirm the romantic idea of the unity of all natural forces. Like the Newtonian force, energy is an abstract concept, only fruitful because it can be specified. Yet energy, force, and also current are unifying concepts.

Gravitational, elastic, electric, and magnetic forces each have their own character, but they can be compared to each other, because forces of different kinds, acting on the same object, are able to balance each other. By accepting one force as a standard, the others can be measured. Forces are commensurable.

A force without further specification does not exist, and the same applies to energy. Many kinds of energy are known, such as kinetic, electric, thermal, gravitational, and chemical energy. These can be transformed into each other, meaning that, like forces, energies are commensurable. Accepting one form of energy as a standard, one can measure the others. For mechanical philosophers this standard could only be mechanical work; in their view, both force and energy should serve to integrate the various fields of physical science on the basis of mechanics. For experimental philosophy the measurability of force and energy was more important, and the choice of a standard was determined by practical considerations like accuracy and reproducibility.

Besides force and energy, the mutually related concepts of entropy and current, developed in thermodynamics, may be considered unifying, because they are commensurable. Entropy, thermodynamic forces, and currents are less easy to reduce to mechanics than energy and force, in particular because mechanics is supposed to be reversible, symmetric with respect to kinetic time.

The introduction of the universal concept of energy, its law of conservation, its convertibility, and its commensurability led to a new abstraction, giving rise to a new answer to the question of how physical systems can interact. But it did not give rise to a new view on the specific character of electricity, magnetism, heat, or chemical affinity. Thermodynamics, the most consequent elaboration of the new conservation law, is also the most abstract theory of the nineteenth century, just because it is independent of any theory about the detailed structure of matter. It did not provide a solution for the romantic attempt to replace the abstract Newtonian concept of a mathematical force by a concrete, general principle of interaction.

Energeticism

For various scientists, Helmholtz’s mechanist interpretation of the law of conservation of energy was not acceptable. Ernst Mach, Heinrich Hertz, and Wilhelm Ostwald in Germany; Pierre Duhem in France; and William Rankine, William Thomson, and Peter Tait in Scotland were of the opinion that the Newtonian force, determined by the laws of motion, is not the most important expression of physical interaction.[58] They proposed to consider the new law of conservation of energy the constitutional law of physics and chemistry. The energeticists, as they were called, stressed that thermodynamics is independent of mechanics and of the atomic hypothesis. They considered the principles of Carnot, Thomson, and Clausius (different expressions of the second law) as empirical generalisations, testable by experiments and independent of metaphysical suppositions, as the atomic hypothesis was then considered to be. But they overlooked that thermodynamics, because of its universality, was not fit to explain the specific properties of matter, the properties distinguishing one substance from another. For this the atomic hypothesis turned out to be indispensable.

Ernst Mach called it a prejudice to consider mechanics the foundation of physics. He argued that the law of conservation of energy is independent of any mechanical world view.[59] Mach laid the foundation of energeticism, but he never became one of its convinced adherents, because he valued his views on sensationalism more highly.[60] Mach considered science to be an economic ordering of sensorial impressions. Therefore he distrusted any concept that was not directly based on observations. For instance, he tried to prove that Newton’s force could be defined in terms of observable kinetic magnitudes like velocity and acceleration.[61] Heinrich Hertz (who was mainly interested in a logical analysis of mechanics) followed him by designing a theoretical mechanics exclusively based on the fundamental concepts of space, time, and mass.[62] They defined the Newtonian impressed force operationally as the product of mass and acceleration, but they failed to make clear how to distinguish electric, magnetic, gravitational, and other types of interaction based only on mechanical principles. Mach related the law of conservation of energy to the economy of thought, the attempt to define concepts such that a minimum number is sufficient.[63] He rejected atomism without reservation, because atoms lacked any contact with observable reality. Until his death in 1916, he considered the atomic hypothesis to be metaphysical and entirely superfluous.[64]

Wilhelm Ostwald was the most avowed energeticist. As a physical chemist he was more attracted to thermodynamics than to mechanics, and he rejected Boltzmann’s theories explaining the thermodynamic laws from interactions between atoms, though he publicly accepted the reality of atoms and molecules in 1909.

The decline of the matter-force dualism

Initially the law of conservation of energy appeared to confirm the idea of the unity of all natural forces. Because this view was mostly propagated by the speculative Naturphilosophen, the new law was met with suspicion. Mayer, Joule, and Helmholtz had trouble publishing their views, which were only accepted when it became clear that they were not at variance with Newtonian mechanics.

It would be equally mistaken to interpret the law of conservation of energy as a triumph of mechanism. Helmholtz’ derivation of the new law from atomic principles was highly programmatic. The law can be demonstrated from Newton’s laws of motion assuming that only central forces acting between atoms occur. But in 1850 it was entirely unknown whether this assumption was applicable to forces determining chemical or thermal processes, friction, inelastic collisions, electromagnetic interaction, or the emission and absorption of light. On the contrary, the force of friction, the magnetic field caused by an electric current, and the electric field induced by a changing magnetic field proved impossible to derive from central forces with uniquely definable potentials. Helmholtz met with difficulties when he tried to apply his views to electromagnetism.

For the Newtonians, conservation laws were either connected to some indestructible substance (for instance the law of conservation of charge) or to a kinetic property of material things (like the laws of conservation of linear and angular momentum). The new law deprived the Newtonians of the conservation law of caloric, undermining their fluid theories. Clearly, energy was not a substance. As kinetic or potential energy it could be interpreted as a spatial or a kinetic property of material bodies. But the later development of the physical concept of a field proved that energy could exist apart from matter.

As a consequence Newton’s third law of motion lost its universal validity. This law could only be applied to situations in which no energy is exchanged between interacting objects. In Helmholtz’ derivation of the law of conservation of energy Newton’s third law was the starting point. If it turned out to be impossible to derive this conservation law from Newton’s third law, one would rather reject the latter than the new law of conservation of energy. In this way the duality of matter and force became unsettled. Energy belongs neither to matter nor to force.

Gradually one arrived at a new dualism, the dualism of matter and field. Matter became specified as chemical atoms and molecules. The idea of a physical field went back to the Cartesian aversion to action at a distance and to empty space, but it departed from Descartes’ identification of matter and space. This new dualism ran into problems circa 1900, when physicists tried to understand the interaction between matter and field (6.5). The dualism could be maintained as long as matter and fields were studied apart.

7.5. Atomism in experimental philosophy and in mechanism

About 1850 the law of conservation of energy was accepted, and the caloric theory, considering heat to be an independent weightless chemical element, was abandoned. This made room for mechanical philosophers to pursue the idea of heat as caused by molecular motion. Earlier, several adherents of kinetic theories (Bernoulli, 1738; Herapath, 1820; Joule, 1843; Waterston, 1845) had not succeeded in convincing the Newtonian majority, and their views were ignored.[65] Initially, physicists also ignored the rising atomic chemistry. Since about 1800, John Dalton connected the concepts of elements and atoms, supposing that all atoms of an element are equal to each other and unchangeable, having the same mass and the same chemical properties. He proposed that all molecules of a chemical compound are composed of atoms in the same characteristic way. In chemical processes, molecules change, whereas the atoms remain the same. Jöns Jacob Berzelius accepted Dalton’s theory even before the latter’s book was published (1808). With amazing accuracy, he determined the relative atomic weights (relative to oxygen, because there are many compounds containing this element) of 45 out of the 49 elements then known. Berzelius found for lead 207.4 (modern value: 207.2), for chlorine 35.47 (35.46), and for nitrogen 14.18 (14.01).[66] By 1820 he had established the chemical composition of no less than 2000 compounds. Several of these turned out to be wrong, but his achievement laid the basis for later improvements. Among these was the law of Dulong and Petit (1819), relating the specific heat of solid elements to their atomic weight.[67] Some elements, in particular carbon as diamond, did not confirm this law. It was corrected by Einstein (1907) and Debye (1912).
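
In modern units the law of Dulong and Petit states that the molar heat capacity (specific heat times atomic weight) has roughly the same value for most solid elements:

\[ c \cdot A \approx 3R \approx 25\ \mathrm{J\,mol^{-1}\,K^{-1}}. \]

Measuring the specific heat of a solid element therefore gave an estimate of its atomic weight.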

The atomic theory lost adherence because of a conflict between the theories of John Dalton and of Louis Gay-Lussac. The first based his atomic theory on the supposition that equal masses of the chemical elements interact with each other, whereas Gay-Lussac believed he had discovered in 1806 that equal volumes of gases form the basis of chemical reactions. Both statements were much less secure than they are now, because of the inaccuracy of their measurements.

Accepting a hypothesis due to Amedeo Avogadro (1811, in 1814 independently formulated by Ampère) could have solved this contradiction. Avogadro suggested that equal volumes of gases at the same temperature contain the same number of molecules, irrespective of the nature of the gas. This hypothesis overtaxed the imagination of his contemporaries.[68] It would lead inevitably to the existence of diatomic molecules like H2 (hydrogen), O2 (oxygen), and N2 (nitrogen), which did not fit in the frame of thought of Dalton and Berzelius. Atoms had been introduced as the smallest parts of an element, and it was by no means clear why a gas like hydrogen should have two smallest parts, H and H2.

The existence of atoms became a fruitful assumption enabling scientists to explain and predict many kinds of phenomena. An atom was still believed to be indivisible and elastic, but it was no longer the smallest amount of a chemically pure substance: that role passed to the molecule. Between 1830 and 1860 many chemists and physicists doubted the reality of these atoms, although they all applied the atomic theory to analyse the composition of chemical compounds. The atomic hypothesis was applied by all chemists, and defended by almost none. Rocke states that it is a ‘… myth, as prevalent today as it was in the nineteenth century, that there existed a nonatomic chemistry which formed a viable alternative to the Daltonian system.’[69] As late as 1869, Alexander Williamson stated: ‘I think I am not overstating the case when I say that, on the one hand, all chemists use the atomic theory, and that, on the other hand, a considerable number of them view it with mistrust, some with positive dislike.’[70] This paradox deserves to be explained.

At the time, the hypothesis had two aspects. First, it was the ancient idea of indivisible, indestructible, infinitely hard yet completely elastic smallest parts of matter. Many scientists found this mechanical idea useless and superfluous. Like Mach, Ostwald (9.2), and other instrumentalists, they considered it a metaphysical assumption, which should find no place in experimental science.[71]

The second aspect was Dalton’s idea of atoms as carriers of quantitative properties: their mass and their propensity to form molecules in fixed proportions. Even the most convinced adversaries of the atomic hypothesis applied this second idea; after Dalton, no chemist could be taken seriously without it. Acting as experimental philosophers, they were mostly interested in measurable quantities and in technical methods to perform measurements, for instance of atomic weights. Nevertheless most chemists remained sceptical about the real existence of atoms.

Like Lavoisier, Dalton adhered to experimental philosophy. His theory did not start from the mechanist dualism of matter and motion, because his atoms did not move, but from the dualism of matter and force, for his atoms and molecules interacted with each other. Only after 1860, when Dalton’s static model was replaced by the dynamic models of Clausius and Maxwell, did scientists start to consider atoms and molecules realistically. Physicists developed a mechanical atomic theory, leading to experiments on gases and attempts to reduce the laws of thermodynamics to mechanics.

Clausius

The first physicist having success with a kinetic theory was Rudolf Clausius.[72] His program is expressed in the titles of two papers: ‘Über die bewegende Kraft der Wärme’ (1850: On the moving force of heat), and ‘Über die Art der Bewegung, welche wir Wärme nennen’ (1857: On the kind of motion we call heat).[73] According to Clausius heat was not a weightless substance, but an effect of the motion of material particles.

Clausius investigated the model of a perfect gas, consisting of point-like particles moving in a void that do not exert forces on each other except at very short distances, when they collide elastically. Applying the law of conservation of linear momentum, Clausius proved that this gas has the same properties as an ideal gas, meaning a gas that perfectly satisfies both Boyle’s law and Gay-Lussac’s law.

(Boyle’s law states that for a given amount of gas at constant temperature, its pressure is inversely proportional to its volume. Gay-Lussac’s or Charles’ law states that for a given amount and volume of a gas, the temperature plus a constant (depending on the chosen temperature scale) is proportional to its pressure. Taking this constant to be zero defines the gas temperature scale, later to be identified with the absolute or thermodynamic temperature scale. Together, these laws state that the product of pressure (p) and volume (V) is proportional to the absolute temperature (T). In modern terms: pV=nRT, where n is the amount of gas measured in moles and R is the universal gas constant. Whereas an ideal gas is an idealization of real gases, a perfect gas is a model.)
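
In modern notation (a reconstruction, not Clausius’ own symbolism), the pressure of a perfect gas of N molecules of mass m with mean square speed ⟨v²⟩ follows from the momentum the molecules transfer to the walls:

\[ pV = \tfrac{1}{3}\,N m \langle v^2 \rangle. \]

Comparison with pV=nRT identifies the mean kinetic energy per molecule, (1/2)m⟨v²⟩, with (3/2)kT.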

Gases like hydrogen, oxygen, and nitrogen satisfy these laws in good approximation at moderate temperatures and pressures. Water vapour does not. Because in the century of the steam engine water vapour was considered one of the most important gases, the investigation of its properties delayed the development of the kinetic theory of gases considerably. In many respects ice, water, and steam are exceptional substances, which was not explained until the twentieth century, by the assumption that a water molecule has an electric dipole moment.

The caloric theory assumed that the laws of Boyle and Gay-Lussac represented the properties of caloric, deviations being explained from the chemical properties of the atoms in the gas. By identifying a perfect gas with an ideal one, Clausius made the caloric hypothesis superfluous, but the deviations from the perfect gas required the same explanation as before. He could also explain Dalton’s law of partial pressures for a mixture of ideal gases. (According to this law, the pressure of a gaseous mixture is the sum of the pressures each of the components would exert alone.) However, he did meet with criticism.

The first critical remark was made by Christophorus Buys-Ballot. The mean speed of the molecules as derived from Clausius’ theory, several hundreds of metres per second at room temperature, is much too high to explain the relatively low speed of diffusion of one gas into another. To meet this objection, Clausius introduced the concept of the mean free path.[74] He assumed that between two collisions a molecule moves with a constant velocity over a rather small distance, and that after each collision the direction of motion is arbitrary. He supplied a satisfactory estimate of the order of magnitude of the mean free path.
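
In the form that later became standard, the mean free path of molecules of diameter d at number density n is

\[ \lambda = \frac{1}{\sqrt{2}\,\pi d^2 n}, \]

which for air at room conditions is of the order of 10⁻⁷ m: very many molecular diameters, yet minute on an everyday scale, reconciling high molecular speeds with slow diffusion.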

The second point concerns the ratio of the specific heat of a gas at constant pressure, cp, and at constant volume, cv (7.3). For this ratio Clausius calculated a value of 1.67, whereas each gas known at the time gave a value of 1.4 or less. This anomaly in Clausius’ theory was solved later by Ludwig Boltzmann, who explained the value of cp/cv=1.40 by assuming that the molecules concerned were diatomic. (Monatomic molecules can only have kinetic energy. Boltzmann assumed that the atoms in a diatomic molecule can also have rotational energy about two axes perpendicular to the line connecting the two atoms. Maxwell was not satisfied with Boltzmann’s explanation, for it neglected vibrational energy along the connection.[75] The theoretical explanation of the existence of diatomic molecules and the corresponding specific heat ratio had to wait for the development of quantum physics after 1925.)
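
In the modern equipartition account (anachronistic for Clausius, but it makes the numbers transparent), a molecule with f quadratic degrees of freedom gives

\[ \gamma = \frac{c_p}{c_v} = \frac{f+2}{f}, \]

so that f=3 (translation only) yields γ = 5/3 ≈ 1.67, and f=5 (translation plus two rotations) yields γ = 7/5 = 1.40.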

For gases with monatomic molecules (like mercury vapour or the noble gases) the experimental value of 1.67 was later found. Hence, Boltzmann both removed an objection against the kinetic theory of gases and supported the acceptance of Amedeo Avogadro’s hypothesis.

Van der Waals

The third objection was that Clausius’ model only applied to permanent gases, not to liquids or solids. It could not explain the phase transition of a gas into a liquid. In 1873 Johannes van der Waals modified the perfect gas model by assuming that each molecule has a finite volume and that molecules attract each other at a large distance, whereas at a short distance they collide, repelling each other. This model could explain why real gases deviate from the ideal gas. It described the phase transition between a vapour and a liquid, though not the transition between a liquid and a solid. It explained the specific properties near the so-called critical point, the highest temperature at which a liquid can be in equilibrium with its vapour. Only below this temperature can a gas be liquefied.

Van der Waals’ state equation contains two constants whose values differ from gas to gas and can only be determined experimentally. The first constant refers to the mutual attraction of the molecules, explaining the surface tension between a vapour and a liquid. The second constant refers to the volume of the molecules themselves and accounts for the fact that a liquid has a specific density and is virtually incompressible. After the two constants are experimentally determined, it is possible to predict at which temperature and pressure a gas can be liquefied.
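
In its now standard form, Van der Waals’ state equation for n moles reads

\[ \left(p + \frac{a n^2}{V^2}\right)(V - nb) = nRT, \]

the constant a measuring the mutual attraction of the molecules and b the volume they occupy.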

The fact that Van der Waals’ equation contains only two empirical constants different for different gases led to the law of corresponding states. The similarity of the behaviour of different gases allowed of calculating the value of the critical temperature from measurements at relatively high temperatures. This law became an important lead to successful attempts at liquefying gases like air (1877), nitrogen and oxygen separately (1883), hydrogen (1898), and finally helium (1908).

Maxwell’s distribution law

Clausius derived the average speed of molecules in a gas, but he could not calculate how many molecules would have a given speed. In 1860 James Clerk Maxwell found the distribution law named after him for molecular speeds in a gas in thermal equilibrium.[76] His proof, derived from probability calculus, dazzled his contemporaries because of its elegant application of the symmetry of the system. Maxwell observed that the distributions of the molecular velocities in a gas (if external forces are neglected) in the x, y, and z directions must be independent of each other, and that the distribution can only depend on the speed (the absolute value of the velocity). Only an exponential function satisfies these requirements, its exponent being the square of the speed times a constant. Maxwell derived a formula for the pressure that the gas should exert if it has a given volume. By relating this formula to the laws of Boyle and Gay-Lussac, he demonstrated that the constant is inversely proportional to the temperature. Moreover, Maxwell could prove Avogadro’s law. For technical reasons an experimental confirmation of Maxwell’s distribution law had to wait for more than sixty years.
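
In modern notation, Maxwell’s result for the number of molecules with speed between v and v+dv is

\[ f(v)\,dv = 4\pi N \left(\frac{m}{2\pi kT}\right)^{3/2} v^2\, e^{-mv^2/2kT}\,dv, \]

the exponent being the square of the speed times a constant inversely proportional to the temperature.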

Maxwell applied Clausius’ concept of the mean free path to determine the coefficients of internal friction (viscosity), diffusion, and heat conductivity, respectively transferring momentum, mass, and energy. As an unexpected and counterintuitive result he found the viscosity to be independent of the density of the gas, as he confirmed experimentally shortly afterwards. However, the calculated value for the viscosity differed considerably from the measured one.

Clausius’ model predicted that a gas would be perfect at all temperatures. He observed that the mean kinetic energy of the molecules is proportional to the absolute temperature. At T=0 the kinetic energy of the molecules would be zero, hence the molecules would no longer move, and a temperature below zero would be impossible. In Maxwell’s distribution formula, the kinetic energy of a molecule occurs divided by kT. Shortly afterwards, Ludwig Boltzmann (after whom the universal constant k is named) added the potential energy of the gravitational field, allowing of an expression for the atmospheric pressure as a function of height. This turned Maxwell’s law for the distribution of molecular speeds into the Maxwell-Boltzmann distribution law for molecular energies.
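
For an atmosphere at uniform temperature, Boltzmann’s extension gives the barometric formula

\[ p(h) = p(0)\,e^{-mgh/kT}, \]

the potential energy mgh simply joining the kinetic energy in the exponential.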

One more striking result of classical statistical physics was Albert Einstein’s explanation of Brownian motion.[77] In 1827 Robert Brown had discovered that microscopically small particles like pollen in a gas or liquid move spontaneously but irregularly. In 1905, Einstein explained this from random collisions with invisible molecules. Applying his theory, Jean Perrin experimentally determined Avogadro’s number (the number of molecules in a standard amount of gas).[78] This combined theoretical and experimental result convinced the majority of scientists of the reality of atoms. Between 1900 and 1910 Planck (6.5) and Einstein invented various methods of estimating Avogadro’s number, and the results agreed satisfactorily with each other. Giving the number of molecules in a given amount of gas, Avogadro’s number made it possible to calculate the individual mass of atoms and molecules. Applied to a liquid it allowed of an estimate of the molecules’ size.
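
The quantitative core of Einstein’s analysis, in modern notation: a spherical particle of radius r suspended in a liquid of viscosity η diffuses with a mean square displacement growing linearly in time,

\[ \langle x^2 \rangle = 2Dt, \qquad D = \frac{kT}{6\pi\eta r}. \]

Measuring ⟨x²⟩, as Perrin did, therefore yields Boltzmann’s constant k, and with it Avogadro’s number NA = R/k.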

Mechanical philosophy

The kinetic theory was regarded as a triumph of the mechanical philosophy, although it was not entirely successful.[79] Clausius’ kinetic theory was unmistakably Cartesian. Avoiding action at a distance, his only kind of interaction was elastic impact. As properties of the molecules he exclusively considered their extension (as applied in the mean free path concept), mass, and momentum. Newtonian forces were conspicuously absent. Van der Waals’ first correction remained within this Cartesian frame, but the second assumed action at a distance between the molecules.

In the kinetic theory of gases, heat is considered the unordered kinetic energy of molecules; work is some ordered kind of energy; and temperature is a measure of the mean kinetic energy of the molecules. In this way the mechanical foundation of the first law of thermodynamics was found, at least for the model of a perfect gas.

Between 1870 and 1900, Ludwig Boltzmann was the most important and most controversial investigator of molecular physics.[80] In 1872 he introduced an undetermined collision probability for any pair of molecules in a gas. He derived an equation describing in general terms the temporal development of a system consisting of many interacting molecules. Assuming that any system proceeds irreversibly from a low-probability state to a high-probability state, he demonstrated the existence of a quantity that will almost always decrease in time until the system reaches an equilibrium state. He connected this quantity (apart from its sign) with Clausius’ thermodynamic entropy. Whereas in thermodynamics the equilibrium state is the final unchanging stationary state in any process in a closed system, in statistical mechanics the equilibrium state is the most probable state.

The mechanists’ claim that thermodynamics could be reduced to mechanics was soon criticized.[81] William Thomson (1874) and Joseph Loschmidt (1876-1877) pointed out that any mechanical system is reversible, and any molecular state must be considered as probable as its time-reversed state, that is, the same state with all molecules moving in the opposite direction. If some state gives rise to increasing entropy, time-reversal means that there is an equally probable state with decreasing entropy, contrary to the second law. Henri Poincaré (1890) and Ernst Zermelo (1896) observed that in an isolated mechanical system any mechanical state would sooner or later recur. Moreover, Boltzmann’s derivation depended on some restrictive properties of molecules, and therefore lacked the universal validity of the second law of thermodynamics.

Boltzmann admitted that his theory was not completely mechanical, having an irreducible statistical character, as Maxwell had observed already in 1870. He had to introduce a kind of molecular disorder as an independent axiom, and his method of counting microscopic states contained some arbitrariness.

After the rise of quantum physics, this arbitrariness allowed of the existence of different statistics, dependent on the typical properties ascribed to the relevant particles. Besides the classical Maxwell-Boltzmann distribution, these are the quantum-physical Fermi-Dirac and Bose-Einstein distributions. This means that statistical mechanics cannot completely account for the thermodynamic laws. At most it can be shown that they do not contradict each other. Statistical theories always suppose some properties of the character of the objects dealt with. For instance, the molecules of the gas must all be alike. Thermodynamics’ independence of the typical properties of matter is both its strength and its weakness. Hence, the statistical approach is complementary to the thermodynamic one. Statistical mechanics is less universal than thermodynamics, but is better equipped to cope with the typical structure of matter. Meanwhile, acceptance of Boltzmann’s statistical approach implied the recognition that the laws of thermodynamics are only approximately true.

Clausius was a mechanist and determinist who never accepted Maxwell’s and Boltzmann’s statistical approach.[82] Determinist scientists and philosophers tried to save mechanism by assuming that statistics was only introduced because of the lack of knowledge of the detailed motion of the molecules. It became gradually clear, however, that statistical mechanics is intrinsically stochastic, a view reinforced by Einstein’s treatment of Brownian motion, by radioactivity, and ultimately by quantum physics. 

By the end of the nineteenth century, natural scientists accepted the existence of random events as a necessary complement of natural laws for the explanation of physical, chemical and biological phenomena.[83] 

