Học lô-gíc cơ bản với Tiến sĩ Gowers, giải thưởng Fields 1998

Lô-gíc học gắn liền với các suy luận liên quan đến

  • và,
  • hoặc;

từ hay cụm từ liên hệ như

  • nhưng,
  • bất cứ khi nào,
  • trong trường hợp,
  • vì thế,
  • kết quả là,
  • nhiều đến nỗi mà,
  • với mọi,
  • tồn tại.

Người học tất cả các ngành khoc học đều phải học qua môn lô-gíc học. Lô-gíc được sử dụng trong tất cả các ngành khoa khọc, kỹ thuật và cả trong giao tiếp hằng ngày. Một người nắm chắc các nguyên tắc lô-gíc thường nói dễ nghe và viết dễ đọc hơn người không nắm.

Việc nắm vững lô-gíc cơ bản sẽ thuận lợi trong quá trình học vànghiên cứu Toán học. Đặt biệt giảng viên phụ trách môn phương pháp giảng dạy Toán học, phân tích chương trình Toán phổ thông cần nắm chắc các nguyên lí suy luận lô-gíc.

Loạt bài viết sau của Gowers về vấn đề này rất thú vị:

Basic logic — connectives — AND and OR

By Gowers

In my introductory post I talked about fake difficulties. It will take some time and several more posts before I can say what I really mean by that notion, but this post will get me a bit closer. So far I have mentioned that if you can’t solve a problem because you haven’t been bothered to look up what the words mean, then your difficulties are not genuine. A more interesting category of fake difficulties is a failure to grasp a few basic logical principles. I call this a fake difficulty even though for some people it is genuinely difficult; the reason I do so is that when mathematicians consider a problem to be hard, it is not for basic logical reasons. To put that another way, with a little bit of practice one can make basic logical deductions completely mechanically, and it is absolutely essential to learn how to do so. It is a simple skill (which is not to say that no work is needed), and it underlies all mathematical reasoning. Trying to understand university-level mathematics without a secure grasp of basic logic is like trying to learn long multiplication without knowing your tables — only a lot harder.

What am I talking about when I use the phrase “basic logic”? I am talking about having a good understanding of the following.

Logical connectives. The main ones are “and”, “or”, “not” and “implies”.

Quantifiers. The phrases “for all” and “there exists” come up a lot in mathematics and you have to be capable of dealing with sentences like this: for every \epsilon there exists N such that for every n\geq N, |a_n-a|<\epsilon.

Relationships between statements. Given a statement, you should have no trouble forming its negation, its converse and its contrapositive. Of course, for that you need to know what those three things are.

Each of the items above is a potential source of confusion. In particular, quantifiers take a lot of getting used to. However, when you are used to basic logical principles you will find that in many situations you can apply them in a purely mechanical way, sometimes without even needing to understand the statements to which you are applying them. In the next few posts I will try to help you develop that ability if you don’t already have it.

These posts, all with titles of the form “Basic logic —,” are not intended to be typical of the posts in this series, since they are introductory and not tied to any particular course. But basic logic is something that I have to get out of the way before we move on to the more interesting stuff. So let me now discuss a couple of logical connectives in more detail. I’ll save “not”, “implies”, quantifiers, and sentence relationships for later posts.

If you think your basic logic is not quite 100% secure, then I recommend that you read through all the posts that I shall put up on the topic. If put together, they are quite long, and therefore you probably won’t want to reread them too often, so I intend to write a brief summary of the main points, for easy reference later. But that summary will just give you the basic facts, without the justifications and explanations that will be presented in the posts.

AND and OR.

It may seem at first as though there is not much to say about the words “and” and “or”. After all, are they not just common English words that mathematicians happen to use as well? In a sense they are, but their meanings are subtly different from their usual everyday meanings, and it is important to understand the differences.

For the purposes of this discussion I am going to do something that you may not have seen before. You will be very familiar with the idea that letters can be substituted for numbers. But at university, letters (or other symbols) get substituted for absolutely everything. Here I am going to use letters to stand for statements. For some reason the traditional letters to use seem to be p and q, or sometimes P and Q.


How might we explain what “and” means? It seems difficult to do so without being circular and saying something like this: “P and Q” is true if and only if P is true and Q is true. I appear to have defined “and” using the word “and”. If I substitute actual sentences for P and Q I get something like this: “London is in England and Paris is in France” is true if and only if “London is in England” is true and “Paris is in France” is true, which is a bit of an odd observation to make.

And yet somehow it does tell me something. It tells me that if I want to establish the truth of the compound sentence “London is in England and Paris is in France” then it is enough to establish that London is in England and also to establish (possibly by a completely different method) that Paris is in France.

However, that observation is not enough to stop the example feeling odd. In my opinion, one gets a much better idea of the role of logical connectives in mathematics by considering statements with parameters. I’m talking here about statements like, “n is prime”, where all you know about n is that it is a positive integer. The statement “London is in England” is either true or false (and happens to be true). The statement “n is prime” involves the parameter n, and is true for some values of n, such as 13 and 17, and false for others, such as 14, 15 and 16.

The great advantage of thinking about statements with parameters is that their truth value is not fixed, but depends on the parameters. That this is an advantage will become particularly clear when we come to think about the word “implies”. But for now let us return to the word “and” and consider the statement, “n is prime and n is even.” For this to be true, we need n to be both a prime number and an even number, which tells us that n=2. For all other values of n it is false. For example, if n=3, then n is prime but n is not even, and because n is not even we cannot say “n is prime and n is even.” Similarly, if n=4 then the statement is false because n is not prime.

Here is another example that is perhaps even more typical of how mathematics uses the word “and”. One of the topics in the Numbers and Sets course is, needless to say, sets, and one of the definitions associated with sets is that of the intersection of two sets (which you may have met at school and represented in Venn diagrams). If A and B are two sets, then their intersection A\cap B is defined to be the set of all objects x such that x belongs to A and x belongs to B. I am trying to avoid using symbols here, but for those who have seen them, the definition would normally be written like this:

A\cap B=\{x: x\in A and x\in B\}.

Another way of expressing the same idea would be this: x belongs to A\cap B if and only if x belongs to A and x belongs to B.

What does this tell us? Well, when we try to solve a problem, we usually have a statement that we are trying to prove, and some statements that for one reason or another we know to be true. Suppose that one of the statements we know to be true is that x belongs to the intersection A\cap B, or in symbols x\in A\cap B. Then the meaning of the word “and” tells us that we can replace this information by the two bits of information that x belongs to A and that x belongs to B. That is, if we were to keep a list of statements that we knew to be true, and on that list we had

x belongs to A\cap B.

we would know that we could replace that list item by the two items

x belongs to A.
x belongs to B.

This might be a very useful thing to do: for example, perhaps the information that x belongs to A is used in one place in the argument and the information that x belongs to B is used in another place.

I still haven’t really demonstrated an interesting difference between the mathematical use of the word “and” and the everyday use. But here’s one. In everyday usage the following sentence makes good sense: “Jack and Jill went up the hill.” But in mathematics, at least when it is written formally, the word “and” connects statements (which is why it is called a logical connective) rather than connecting objects. So a mathematician would feel an urge to rewrite the well-known nursery rhyme in the following rather less catchy way: “Jack went up the hill and Jill went up the hill.” That’s not to say that you never see a sentence like “x and y are positive”, but you should be very aware that that is a different use of the word and should think of it as a useful shorthand for “x is positive and y is positive”.


The first thing to say about the word “or” as it is used in mathematics is that it is always the inclusive or rather than the exclusive or. In abstract terms, the statement “P or Q” is true if and only if P is true or Q is true or both. As with “and” one can illustrate this in a rather unilluminating way with examples such as the following.

London is in England or Paris is in France.
London is in England or Paris is in Pakistan.
London is in Scotland or Paris is in France.
London is in Scotland or Paris is in Pakistan.

Of those four sentences, the first three are true (because in each case at least one of the two constituent parts is true) and the fourth is false (because neither constituent part is true).

But again the real use of the word “or” is much better illustrated by statements with parameters. For instance, the statement “n is prime or n is of the form 4m+1″ is true if at least one of “n is prime” and “n is of the form 4m+1″ is true. Let’s list the first few values of n for which this statement is true:


For all the other positive integers up to 60 it is false (unless I’ve made a slip). Had I been using the exclusive or (so called because the two statements “exclude” each other) then I would not have included 5, 13, 17, 29, 37, 41 or 53 because those number are both prime and of the form 4m+1. But I was following conventional mathematical practice and using the inclusive or, so I did include them.

Perhaps the single most important thing you need to know about “or” is how to respond to it if it comes up in proofs. For example, suppose you find that amongst the list of statements you know a statement of the form “P or Q”. In other words, you know that either P is true or Q is true (or possibly both). In most cases, P and Q will be statements with parameters: an example might be “n=2 or n is odd”, which would follow if you knew that n was prime.

If you know that “P and Q” is true, then you can add both P and Q to your list of things that you know and can use. But if all you know is that “P or Q” is true, then your task is more difficult. This time what you have to do is split your argument into cases. If you are given that “P or Q” is true, and you want to deduce R, then you must show that you can deduce R if you assume P (and whatever else you know) and that you can also deduce R if you assume Q (and whatever else you know). That way, since at least one of P and Q is guaranteed to be true, R must be true. So in a proof you might write something like this: “First let us assume that P. Then … blah blah … so R follows. Now let us assume that Q. Then … blah blah … and again R follows. Therefore the result is proved.”

What I mean when I say that you need to be on top of basic logic is that you should get to the stage where reacting to a statement of the form “P or Q” in the way I have just described is a pure reflex: you see that you know that either P or Q, so you just automatically go into “If P, then … and if Q, then …” mode.

What about if the statement “P or Q” is the statement you are trying to prove (as opposed to what I have just been discussing, when “P or Q” is one of the statements you are trying to use to prove something else)?

If P and Q are fixed statements, then you can prove “P or Q” by picking one of P and Q and proving that. For instance, if I want to prove the statement “There are infinitely many primes or every group of prime order is cyclic” then I could simply prove that there are infinitely many primes and have done with it. But that is not the situation that usually occurs. Usually when one wishes to prove “P or Q” it is a statement with parameters. For example, an important result in the Numbers and Sets course is this.

Theorem. Let p be a prime number and let a and b be positive integers. Suppose that p is a factor of ab. Then either p is a factor of a or p is a factor of b.

Before I say anything more about this, note that the result would be false if I were not using the inclusive or. For example, p could be 5, a could be 10 and b could be 15. Then p is a factor of ab (which equals 150) but it is a factor of both a and b. Part of the reason I’m mentioning this is that the word “either” suggests to many people that one is using the exclusive or, probably because in many ordinary-language contexts we use the word “either” to emphasize that the two alternatives we then go on to lay out are incompatible. An example: if I say, “I don’t know where I’m going to go on holiday next year. I think I’ll either go to the South of France or to Greece.” then it would be quite reasonable of you to interpret me as ruling out going to both places. But in mathematics the word either does not have this implication. If it sounds to you as though it does, then you need to get into the habit of ignoring the word whenever you see it.

Returning to the theorem, what is the statement we are trying to prove. It is “p is a factor of a or p is a factor of b.” Now because we don’t actually know anything about p, a or b, beyond the fact that p is a prime and a and b are positive integers, we can’t just say, “Right, I’m going to prove that p is a factor of a,” or “Right, I’m going to prove that p is a factor of b.” Instead what we have to do is more like this.

1. Assume that p is not a factor of a.

2. Draw some consequence of that assumption and use it to prove that p is a factor of b.

If we can do that, then the result is proved, since either p is a factor of a or the above argument can be used to show that p is a factor of b. The consequence one uses if p is not a factor of a is that the highest common factor of p and a is 1. I won’t say any more at this stage but in a future post I shall discuss this proof in detail.

Sometimes when one wishes to prove a statement of the form “P or Q” a slightly more complicated plan of attack works. It goes something like this.

1. Attempt to prove P, and try to identify what has to happen for your proof not to work.

2. Show that if that happens, then Q must be true.

Here is a simple example: the infinite pigeonhole principle. I will state the result and its proof.

Theorem. Suppose that the positive integers are each coloured either red or blue. Then either there are infinitely many red integers or there are infinitely many blue integers.

Proof. [First comes the attempt to prove one of the two statements.] Let us attempt to find infinitely many red integers. We’ll try to do it in the simplest way imaginable: we’ll just pick a red integer n_1, then a larger red integer n_2, then a larger red integer n_3, and so on. If we succeed, then we have produced an infinite sequence of distinct red integers and we have therefore obtained the first statement.

[Next comes the process of understanding what must be the case if the proof of the first statement doesn’t work.] The only way we can fail is if for some k we find that we have identified red integers <img title=”n_1n that is bigger than n_k. (If we could find such an n, then we could have used it for n_{k+1}, so we wouldn’t have been stuck at n_k.)

[And now, having drawn a consequence of our failure to prove the first statement, we go ahead and use it to prove the second.] But in that case all the integers from n_k+1 onwards are blue, so in particular there are infinitely many blue integers. \square

The associativity of AND and OR.

There is a rule that I ought to mention briefly, even though you will either be aware of anyway or will happily use it without even thinking about it. It is the rule

  • “P or (Q or R)” is the same as “(P or Q) or R”

and the companion rule

  • “P and (Q and R)” is the same as “(P and Q) and R”.

For “P or (Q or R)” to be true, we need at least one of P and “Q or R” to be true. And for “Q or R” to be true we need at least one of Q or R to be true. Putting all that together, we find that for “P or (Q or R)” to be true we need at least one of P, Q and R to be true. And clearly the same argument shows that the same condition needs to hold for the statement “(P or Q) or R” to be true. So we just drop the brackets and write “P or Q or R” and think of this as saying that at least one of the three statements is true.

Similarly, we don’t bother with brackets for repeated ANDs, writing “P and Q and R” for the statement that all three of P, Q and R are true. And similar considerations hold for larger collections of statements as well.


Basic logic — connectives — NOT

By Gowers

I realized after writing the title of this post that it might look as though I was saying, “I’m going to discuss connectives … not!” Well, that’s not what I meant, since “not” is a connective and I’m about to discuss it.


If you don’t know how to negate a mathematical statement, then you won’t be able to do serious mathematics. It’s as simple as that. So how does the mathematical meaning of the word “not” differ from the ordinary meaning? To get an idea, let’s consider the following sentences.

“We are not amused.”

“He is not a happy man.”

“That was not a very clever thing to do.”

n is not a perfect square.”

a is an element of the set A, but it is not the largest element of A.

A is not a subset of B.”

When Queen Victoria said, “We are not amused,” it is clear that what she meant was she was distinctly unamused, and not merely that she had failed to laugh. Similarly, if I say, “He is not a happy man,” I will usually mean that he is positively unhappy rather than neutral on the contentment scale. And if I say, “That was not a very clever thing to do,” I am saying, in a polite British way, that it was a stupid thing to do. (I could perhaps avoid that interpretation by stressing the word “very”. For example, if someone had made a good but reasonably standard move in chess and I knew enough about chess to tell that — which I don’t — then I might say, “That was not a very clever thing to do — but it was pretty good, so well done.”)

In all the examples above, the word “not” takes us from one end of some scale to the other: from amused to unamused, from happy to unhappy, from clever to stupid. In mathematics, the word “not” does not have this sense. If P is a mathematical statement, then “not P” is the mathematical statement that is true precisely when P is false. That is, if P is true, then “not P” is false, and if P is false, then “not P” is true. So if you want to understand a statement of the form “not P” then you should think to yourself, “What are the exact circumstances that need to hold for P to be false?”

In the case of the statement “n is not a perfect square” this is completely straightforward. We don’t have a notion of an utterly imperfect square, so there is no possibility of misinterpretation. We just mean that it is not the case that n is a perfect square. But take the statement “a is not the largest element of the set A.” We have been told that a is an element of A. If we want to show that a is not the largest element of A, what do we have to establish? Do we need to show that a is the smallest element of A? No. All we need to do is establish that it is not the case that a is the largest element of A. The usual way to set out that objective would be to formulate it as the following statement.

  • There is some element of A that is larger than a.

If that statement is true, then a is not the largest element of A. And if that statement is false, then there is no element of A that is larger than a, and since a is an element of A that tells us that a is the largest element of A.

The third mathematical statement above was “A is not a subset of B.” Let me give two mistaken interpretations of that statement.

  • First mistaken interpretation: B is a subset of A.

This is mistaken because it is perfectly possible for A to fail to be a subset of B without B being a subset of A. For example, A could be the set \{1,2,3\} and B could be the set \{2,3,4\}.

To work out what it means for A to fail to be a subset of B, let us write out more carefully what it means to say that A is a subset of B. The usual definition (written out in a slightly wordy way because I’m still trying to avoid symbols) is this.

  • A is a subset of B if every element of A is also an element of B.

By the way, I should mention here a convention that you need to know about. When mathematicians give definitions, they tend to use the word “if” where “if and only if” might seem more appropriate. For example, I might write this: “an integer n is even if there exists an integer m such that n=2m.” You might argue that this definition says nothing about what happens if no such integer m exists. Might 13 be even too? No, is the answer, because I am defining something and the convention is that you should simply understand that the words “and not otherwise” are implicit in what I’ve said, or equivalently that the “if” is really “if and only if”.

OK, if “A is a subset of B” can be translated into “Every element of A is also an element of B,” then how should we translate “A is not a subset of B”? This brings me to the second incorrect answer.

  • Second mistaken interpretation: No element of A is an element of B.

This makes the going-to-the-opposite-extreme mistake. Don’t do that. Faced with a statement of the form “not P”, think to yourself, “What needs to be true for P to be false?” Applied to this case, we ask, what needs to be true for “Every element of A is an element of B to be false?” The answer is that at least one element of A should fail to be an element of B. Or as mathematicians might normally write it,

  • There exists an element of A that is not an element of B.

Or more formally still,

  • There exists a such that a\in A and a\notin B.

As I hope you can guess if you didn’t know already, the symbol \notin means “is not an element of”. As I also hope you can guess, putting a line through a symbol usually has the force of a not. You are probably already familiar with the symbol \ne, which means “does not equal”.

There is a pair of logical principles known as de Morgan’s laws that you may well just take for granted, but that it is probably good to be consciously aware of. They concern what happens when you negate a statement that involves an “and” or an “or”. Let me illustrate them with some depressing scenes from my everyday life.

Once a year I have to renew the tax disc on my car. If I didn’t do that I would have to pay a fine, but doing it without a fuss involves being a bit more organized than I normally have it in me to be, so I find myself facing a last-minute panic. What is difficult about it? Well, I have to bring along my insurance and MOT certificates, as well as a form I am sent and a means of payment. (For non-UK readers, MOT stands for “Ministry of Transport” but it also means a test of roadworthiness that you have to have carried out once a year if your car is over three years old, which mine very definitely is.) I take all those along to the post office and can buy the new tax disc there.

Now suppose I were to arrive back from an expedition to the post office, the aim of which had been to get a new tax disc, and say to my wife, “I can’t believe it. I thought I had everything I needed, but I didn’t, and now I’m going to have to make another trip. Damn.” What could she deduce? Well, in order for me to have everything I needed, the following statement would have had to be true.

  • I had my insurance certificate and I had my MOT certificate and I had the form and I had the means to pay.

So from my failure to get a tax disc, she could deduce that the above statement was false. But what does it take for a statement like that to be false? It just takes one little slip-up. So she could deduce the following statement.

  • I didn’t have my insurance certificate or I didn’t have my MOT certificate or I didn’t have the form or I didn’t have the means to pay.

What happens here is that NOT turns AND into OR. Or to be a bit clearer about it, the statement “not (P and Q)” is the same as the statement “(not P) or (not Q)”. In the tax-disc example we had four statements linked by “and”, so the rule was a generalization of the basic de Morgan law, which told us that “not (P and Q and R and S)” was the same as “(not P) or (not Q) or (not R) or (not S)”.

As you might guess, the other de Morgan law is that NOT changes OR into AND. Suppose we vary the scenario above slightly. This time I am trying to open a bank account and I need some ID. I end up having to go back home and I say to my wife, “Damn, I didn’t manage to open the account, because they said that the only ID they would accept was a passport or a driving licence with a photo on it.” What could she deduce this time? Well, in order to open the account, I needed the following statement to be true.

  • Either I had my passport on me or I had a driving licence with photo on me.

What does it mean for that statement to be false? It means that I failed on both counts. In other words, this happened.

  • I did not have my passport on me and I did not have a driving licence with photo on me.

The more abstract rule here is that “not (P or Q)” is the same as “(not P) and (not Q)”.

A very general and possibly helpful rule of thumb applies to negation, including to several of the examples above. It’s this.

Negating something strong results in something weak, and negating something weak results in something strong.

For instance, suppose you are given two statements, which we’ll call P and Q. Then the statement “P and Q” is quite strong because it tells you that both the statements P and Q are true. By contrast, the statement “P or Q” is fairly weak because all it tells you is that one or other of P and Q is true and you don’t know which. Why do I use the words “strong” and “weak”? Well, it is easier for “P or Q” to be true than it is for “P and Q” to be true. That means that if “P and Q” is true, then I am getting a lot of information, whereas if “P or Q” is true I am getting less information. If you still don’t find that intuitively clear, then consider the statements

  • n is prime and n is even
  • n is prime or n is even

The first statement tells us that n=2, which is an extremely strong piece of information — we get to know exactly what number n is. The second statement merely tells us that n is one of the numbers 2,3,4,5,6,7,8,10,11,12,13,14,16,17,18,19,20,22,… which is giving us much less information. So “strong” basically means “tells us a lot”.

The rule of thumb is us that negating something strong — that is, pretty informative — gives us something weak — that is, not very informative — and vice versa. Therefore, de Morgan’s laws

  • “not (P and Q)” is the same as “(not P) or (not Q)”
  • “not (P or Q)” is the same as “(not P) and (not Q)”

are exactly what you would expect. The first law negates the strong statement “P and Q” and gets a weak statement “(not P) or (not Q)” and the second law negates a weak statement and gets a strong statement.

For another example, consider the statement


That, I hope you will agree, is strong information: it describes a most unusual state of affairs. So we would expect that negating it produces a very weak statement. And indeed, if I were to say,


you might well give me a funny look and ask, “Was there some reason that you expected it to be?” Note that the rule of thumb gives us a quick way of seeing that the negation of “This room is INCREDIBLY INCREDIBLY HOT” is not “This room is INCREDIBLY INCREDIBLY COLD.” After all, the second statement is also very strong, and we do not expect the negation of a strong statement to be strong.

Double negatives.

I haven’t mentioned all the basic logical laws that concern AND, OR and NOT. One important one is the rule that two NOTs cancel. If I say, “It is not the case that A is not a subset of B” then I mean that A is a subset of B. In general, “not (not P)” is the same as P.

We can actually use that to deduce the second de Morgan law from the first. Here they are again.

  • “not (P and Q)” is the same as “(not P) or (not Q)”
  • “not (P or Q)” is the same as “(not P) and (not Q)”

To deduce the second, let me begin by applying the first to “not P” and “not Q”. (What I’m doing here is just like what you are allowed to do with an identity: I am substituting in a value. In this case, the first statement holds for any statements P and Q, so I am substituting in “not P”, which is a perfectly good statement, for P and “not Q” for Q.) I get this.

  • “not(not P and not Q)” is the same as “not not P or not not Q”

I’ve decided to dispense with some brackets here. Again just as with equations, there are conventions about what to do first when you don’t see brackets. And the convention is “do all your NOTs first”. So here “not P and not Q” means “(not P) and (not Q)”. It does not mean “not (P and not Q)”.

Using the rule that two NOTs cancel, we can simplify the above law to this.

  • “not(not P and not Q)” is the same as “P or Q”

Now we can “apply NOT to both sides” (just as with equations, if two things are the same and you do the same to both then you end up with two things that are still the same). We get this.

  • “not not(not P and not Q)” is the same as “not(P or Q)”

And finally, using once again the fact that two NOTs cancel, we get to the second de Morgan law.

  • “not P and not Q” is the same as “not(P or Q)”

A philosophical digression.

It would annoy some people if I left the discussion here, because some mathematicians feel strongly, and to many other mathematicians puzzlingly, that two NOTs do not cancel. That is, they maintain that “not (not P)” is not the same statement as P. That is because these mathematicians do not believe in the law of the excluded middle. If you believe that every statement is either true or false, then you can define (as I did) “not P” to be the statement that is true precisely when P is false and false precisely when P is true. But what if a statement doesn’t have to be true or false?

I don’t recommend worrying about this, but let me try to explain why mathematicians who ask this kind of question are not mad (or not necessarily, at any rate). There are at least three reasons that one might decide, in certain contexts, that not every statement has to be true or false.

The first is when we are dealing with statements that are not completely precise. Let me illustrate this with a few ordinary English sentences.

  • Tony Blair is happy.
  • The weather is awful.
  • He was out LBW.
  • Abolishing the 50% tax rate would be unfair.
  • Democracy is better than dictatorship.

Do you want to say of each of those sentences that there must be a fact of the matter as to whether they are true or not (even if we might not know which)? Sometimes we can be pretty sure that Tony Blair is happy, but what about when he had just got up this morning and was cleaning his teeth? Was it definitely the case that one of the two statements, “Tony Blair is happy” or “Tony Blair is not happy” was true and the other false? (Here I’m interpreting the second statement as “It is not the case that Tony Blair is happy”.) A more reasonable attitude would surely be to say that being happy is a rather complex and not entirely precisely defined state of mind, so there is a bit of a grey area.

If you concede that there is this grey area, then how should you interpret the following sentence?

  • It is not the case that it is not the case that Tony Blair is happy.

Or to put it more concisely, “not not (Tony Blair is happy)”.

When P is a vague sentence like “Tony Blair is happy” then it seems to me that a reasonable interpretation of “not P” is that it is sufficiently clear that P is false for it to be possible to state that confidently. Under this interpretation, “It is not the case that Tony Blair is happy” means that he is sufficiently clearly not happy for it to be possible to say so with confidence. Then “It is not the case that it is not the case that Tony Blair is happy,” means something like “It is clear that we cannot be clear that Tony Blair is not happy,” which is not the same as saying “It is clear that Tony Blair is happy.” When we say “Tony Blair is happy” we are ruling out the grey area, but when we say “not not(Tony Blair is happy)” we are allowing it. (Why? Because if Tony Blair’s mood is not clear to us, then we clearly cannot say with confidence that he is not happy, and therefore not not(Tony Blair is happy).) Perhaps if you asked him whether he was happy he would give a small sigh and say, “Well, I’m not unhappy.”

Yes, you might say, but the great thing about mathematics is that it eliminates vagueness. So surely the above considerations are simply irrelevant to mathematics.

That is by and large true, so let us consider a second type of statement.

  • There are infinitely many 7s in the decimal expansion of \pi.
  • e+\pi is irrational.

These are both famous unsolved problems. So we don’t know whether they are true or false. Worse still, Gödel has shown us that it is at least conceivable that one of these statements (or another like it) cannot be proved or disproved. (That’s a bit of an oversimplification of what Gödel’s theorem says, for which I apologize to anyone it irritates.) So if we don’t have a proof, or even any certainty that there is a proof, what gives us such a huge confidence that these statements must have a determinate truth value? What does it mean? The problem now is not vagueness, but rather the lack of any accepted way of deciding whether or not the statement is true.

Suppose, for example, that we try to argue that even if we don’t know whether e+\pi is irrational, there must nevertheless be a fact of the matter one way or the other. We might say something like this: either there are two integers p and q sitting out there such that e+\pi=p/q or there aren’t. In principle we could just look through all the pairs of integers (p,q) and check whether they equal e+\pi. Either we would find a pair that worked, or we wouldn’t.

Hmm … what is this “in principle” doing here? We live in a finite universe, so we can’t just look through infinitely many pairs of integers. So what happens in the actual universe? We find that at any one time the best we can hope for is to have looked through just finitely many pairs (p,q). What can we conclude if none of the corresponding fractions p/q is equal to e+\pi? Precisely nothing about whether e+\pi is irrational.

Note, incidentally, that another “infinite algorithm” for solving the problem would simply be to work out the entire decimal expansion of e+\pi and then go back and see whether it is a recurring decimal or not. Again, we can’t do this algorithm in practice because we live in a finite universe.

As a result of considerations like these, some mathematicians do not agree that a statement like “e+\pi is irrational” must have a determinate truth value. So again we have a grey area, but this time the reason is not vagueness but rather the lack of a proof.

I should also make clear that most mathematicians (I think) do believe that there must be a fact of the matter one way or the other, regardless of what we can prove. I myself don’t, but I am in the minority there.

A third reason for abandoning the idea that every statement must be either true or false is to insist on stricter standards for what counts as true. If you would like some idea of what I mean by this, I refer you to the excellent comment by Andrej Bauer below, and also to the Wikipedia article on intuitionism.

[This section was rewritten in response to criticisms from Andrej Bauer and Michael Hudson-Doyle below. There is no reason to set it in stone at any point, so further criticisms are welcome if you have them.]

Generalizing de Morgan’s laws.

If you feel like doing a little exercise, you could try using de Morgan’s laws together with the associativity of addition to deduce that

  • “not(P and Q and R)” is the same as “(not P) or (not Q) or (not R).”

If you manage it, then that is a good sign: you are probably at, or well on the way to, the level of understanding of and fluency with “and”, “or” and “not” that you need to do undergraduate mathematics.


Hint: You should begin by putting the brackets back in and writing “P and (Q and R)” instead of “P and Q and R”.


Basic logic — connectives — IMPLIES

By Gowers

I have discussed how the mathematical meanings of the words “and”, “or” and “not” are not quite identical to their ordinary meanings. This is also true of the word “implies”, but rather more so. In fact, unravelling precisely what mathematicians mean by this word is a sufficiently complicated task that I have just decided to jettison an entire post on the subject and start all over again. (Roughly speaking what happened was that I wrote something, wasn’t happy with it for a number of reasons, made several fairly substantial changes, and ended up with something that simply wasn’t what I now feel like writing after having thought quite a bit more about what I want to say. The straw that broke the camel’s back was a comment by Daniel Hill in which he pointed out that “implies” wasn’t, strictly speaking, a connective at all.

I’ll mention a number of fairly subtle distinctions in this post, and you may find that you can’t hold them all in your head. If so, don’t worry about it too much, because you can afford to blur most of the distinctions. There’s just one that is particularly important, which I’ll draw attention to when we get to it.

“Implies” versus “therefore” versus “if … then”.

The three words “implies”, “therefore”, and “if … then” (OK, the third one isn’t a word exactly, but it’s not a phrase either, so I don’t know what to call it) are all connected with the idea that one thing being true makes another thing true. You may have thought of them as all pretty much interchangeable. But are they exactly the same thing?

Some indication that they aren’t quite identical comes from the grammar of the words. Consider the following three sentences.

  • If it’s 11 o’clock, then I’m supposed to be somewhere else.
  • It’s 11 o’clock implies I’m supposed to be somewhere else.
  • It’s 11 o’clock. Therefore, I’m supposed to be somewhere else.

The first one is the most natural of the three. The second doesn’t quite read like a proper English sentence (because it isn’t), and the third, though correct grammatically, somehow doesn’t quite mean the same as the first, which is partly reflected by the fact that it is two sentences rather than one. (I could have used a semicolon instead of the full stop, but a comma would not have been enough.)

Let’s deal with the difference between “Therefore” and “if … then” first. The third formulation starts with the sentence, “It’s 11 o’clock.” Therefore, it is telling us that it’s 11 o’clock. By contrast, the first formulation gives us no indication of whether or not it is 11 o’clock (except perhaps if there is a note of panic in the voice of the person saying the sentence). So we use “therefore” when we establish one fact and then want to say that another fact is a consequence of it, whereas we use “if … then” if we want to convey that the second fact is a consequence of the first without making any judgment about whether the first is true.

How about “implies”? Before I discuss that, let me talk about another distinction, between mathematics and metamathematics. The former consists of statements like “31 is a prime number” or “The angles of a triangle add up to 180″. The latter consists of statements about mathematics rather than of mathematical statements themselves. For example, if I say, “The theorem that the angles of a triangle add up to 180 was known to the Greeks,” then I’m not talking about triangles (except indirectly) but about theorems to do with triangles.

The sort of metamathematics that concerns mathematicians is the sort that discusses properties of mathematical statements (notably whether they are true) and relationships between them (such as whether one implies another). Here are a few metamathematical statements.

  • “There are infinitely many prime numbers” is true.
  • The continuum hypothesis cannot be proved using the standard axioms of set theory.
  • “There are infinitely many prime numbers” implies “There are infinitely many odd numbers”.
  • The least upper bound axiom implies that every Cauchy sequence converges.

In each of these four sentences I didn’t make mathematical statements. Rather, I referred to mathematical statements. The grammatical reason for this is that the word “implies”, in the English language, is supposed to link two noun phrases. You say that one thing implies another.

A noun phrase, by the way, is, roughly speaking, anything that could function as the subject of a sentence. For instance, “the man I was telling you about yesterday” is a noun phrase, since it functions as the subject of the sentence,

  • The man I was telling you about yesterday is just about to pass us on his bicycle for the third time.

Other noun phrases in that sentence are “his bicycle” and “the third time”.

Let me write something stupid:

  • The man I was telling you about yesterday implies his bicycle.

I wrote that because there is an important difference between two kinds of nonsense. The above sentence doesn’t make much sense, because you can’t imply a bicycle. However, it is at least grammatical in a way that

  • The man I was telling you about yesterday. Therefore, his bicycle.

is not.

All this means that when we use “implies” in ordinary English, we are not connecting statements (because statements are not noun phrases) but talking about statements (because we use noun phrases to refer to statements).

I can think of three ways of turning statements into noun phrases. The first is rather crude: you put inverted commas round it. For example, if I want to do something about the incorrect sentence

  • It is 11 o’clock implies I am supposed to be somewhere else.

then I could change it to

  • “It is 11 o’clock” implies “I am supposed to be somewhere else.”

The second method is to come up with some name for the statement. That doesn’t work well here, but let’s have a go.

  • The mid-morning hypothesis implies the inappropriate personal location scenario.

It works better for mathematical statements with established names such as the Bolzano-Weierstrass theorem.

The third method is to stick “that” or something like “the claim that” in front.

  • The fact that it is 11 o’clock implies that I am supposed to be somewhere else.

I mentioned above that “implies” is not, strictly speaking, a connective. Why is this? It’s because connectives are used to turn mathematical statements into mathematical statements. For example, we can use “and” to build the statement “n is prime and n\geq 100” out of the two statements “n is prime” and “n\geq 100“. When we do that, the new statement isn’t referring to the old statements, but rather it contains them.

Unfortunately, as so often with this kind of thing, common mathematical usage is more complicated than the above discussion would suggest. Most people read the “\implies” symbol as “implies”. And most people are quite happy to write something like

  • x\geq 10\implies x^2\geq 100

which, according to what I said above, is ungrammatical because “implies” is not linking noun phrases. What I suggest you do here is not worry about this too much: confusion between mathematics and metamathematics is unlikely to be a problem when you are learning about Numbers and Sets and about Groups. If you are inclined to worry, then you could resolve to read a sentence like the above as “If x\geq 10 then x^2\geq 100.” I would also say that the symbol “\implies” should in general be used fairly sparingly. In particular, don’t insert it into continuous prose. For instance, don’t write something like, “Therefore x\in A and A\subset B, \implies x\in B.” Instead, write, “Therefore x\in A and A\subset B, which implies that x\in B.” (Note that in that last sentence the word “which” functioned as the subject of “implies” and referred back to the statement “x\in A and A\subset B“.)

Quotation and quasi-quotation.

If you like subtle distinctions that will not matter in your undergraduate mathematical studies, then read on. If you don’t, then feel free to skip this short section.

The distinction I want to draw attention to is between two uses of quotation marks. Just for good measure, let’s look at three different ways of doing something with the sentence, “There are infinitely many primes.”

  1. There are infinitely many primes, but only one of them is even.
  2. “There are infinitely many primes” is a famous theorem of mathematics.
  3. “There are infinitely many primes” is an expression made up of five words.

The first of these sentences is about numbers. As such, it doesn’t use quotation marks. The third sentence is about a linguistic expression. As such, it very definitely requires quotation marks, just as they are needed in the sentence

  • “Dog” is a noun and “bark” is a verb.

As for the second sentence, it is somewhere in between. It isn’t about numbers, but it’s also not about a linguistic expression. It’s about a mathematical fact. This use of quotation marks is sometimes called quasi-quotation. I won’t say any more but will instead refer you to the relevant Wikipedia article if you are interested. [Thanks to Mohan Ganesalingam for drawing my attention to it.]

Yes, but what do “if … then” and “implies” mean?

I’ve just spent rather a long time discussing the grammar of “implies”, “therefore” and “if … then” and said almost nothing about what they actually mean. To avoid confusion, I’m mainly going to discuss “if … then” since there is no doubt that that really is a connective. But sometimes I’m going to want to do what I’ve done in previous posts and use the letters P and Q to stand for statements, and here, unfortunately, there is a danger of the confusion creeping back. In particular, if one is being careful about it then one needs to be clear what “standing for a statement” actually means.

Is it something like the relationship between “The Riemann hypothesis” and “Every non-trivial zero of the Riemann zeta function has real part 1/2″? That is, are P and Q names for some statements? Not exactly, because we want to be able to make sense of the expression P\wedge Q (recall that \wedge is a symbolic way of writing “and”) and the word “and” links statements rather than names. (You don’t, for example, say, “The Riemann hypothesis and Fermat’s Last Theorem” if you want to assert that the Riemann hypothesis and Fermat’s Last Theorem are both true.) So we should think of P and Q as statements themselves — it’s just that they are unknown statements.

But in that case we shouldn’t be allowed to write P\implies Q, or at least not if \implies means “implies”. But that’s just too bad. I’m going to write it, and if you’re worried about it then read “P\implies Q” as “if P then Q”. But actually what I recommend is not worrying about it and just knowing in your heart of hearts that it would be easy to replace what you are saying by something that is strictly correct if there was ever any danger of confusion.

So let us pause, take a deep breath, allow everything I’ve written so far to slip comfortably into the back of our minds, and turn to the question of what “if … then” and “implies” actually mean. And the answer is rather peculiar. In everyday English, when we use one of these words, we are trying to explain that there is a link between the two statements we are relating (either directly or by referring to them). For example, if I say, “If we continue to emit carbon dioxide into the atmosphere at the current rate then sea levels will rise by two metres by 2100,” I am suggesting a causal link between the two.

Let me now give the standard account of what mathematicians mean by “if … then”. Later I shall qualify it considerably — not because I think it is incorrect but because I think it doesn’t give the whole picture and can be unnecessarily off-putting. The standard thing to say is that P\implies Q is true unless P is true and Q is false. That is, if you want to establish that P\implies Q, then the only thing that can go wrong is P being true and Q being false.

A brief interruption: purists will note that I have been inconsistent. If P is a statement rather than something that refers to a statement, then I can’t say “P is true”. I have to say, “”P” is true.” Alternatively, I should have said, “P\implies Q unless P and \neg Q.” Can we agree that I’ll be slightly sloppy here? (If you don’t understand why it’s sloppy, I don’t think it matters.)

Let me illustrate this with a few examples.

  • If there were weapons of mass destruction in Iraq then pigs can fly.
  • The Riemann hypothesis implies Fermat’s Last Theorem.
  • If n is both even and odd, then n=17.
  • If n is a prime not equal to 2, then n is odd.

Of these four statements, the fourth one seems quite reasonable, while the other three are all a bit peculiar. For example, it’s quite obvious that (the recent Pink Floyd stunt notwithstanding) pigs cannot fly. Doesn’t that make the first sentence false? And how can one say that the Riemann hypothesis implies Fermat’s Last Theorem when nobody expects a proof of Fermat’s Last Theorem that uses the Riemann hypothesis? And surely if n is both even and odd, it could just as well be 19. Can it be correct to say that it has to be 17? As for the fourth sentence, it seems fine: if n is a prime not equal to 2, then it cannot have 2 as a factor (or it wouldn’t be prime), so it must indeed be odd.

Well, mathematicians would say that all four statements are true. That’s because the only way “If P then Q” can be false is if P is true and Q is false. You should understand this as a definition of “if … then”. Let’s check the four statements using this definition.

For the first one to be false, we would need there to have been weapons of mass destruction in Iraq and for pigs to be unable to fly. Well, we’ve got the earthbound pigs but there were no weapons of mass destruction in Iraq, so the first statement is true. (Again, this is not some metaphysical claim. It just follows from the way we have chosen to define “if … then”.)

For the second to be false, we would need the Riemann hypothesis to be true and Fermat’s Last Theorem to be false. Well, Andrew Wiles, with help from Richard Taylor, has proved Fermat’s Last Theorem, so it’s not false. So the second statement in the list is true.

As for the third, the only way for that to be false is if n is both even and odd but n is not equal to 17. But no number is both even and odd. Therefore, the third statement is true. The problem about n equalling 19 doesn’t arise because there are no even and odd integers in the first place.

Truth values and “causes”.

There’s something unsatisfactory about the truth-value definition of “if … then” and “implies”. It seems to leave out the idea that one thing can be true because another is true. It would be quite wrong to say, for instance, that Fermat’s Last Theorem is true because the Riemann hypothesis is true.

Fortunately, there is a very close link between the truth-value definition and what I’ll call the causal concept of “if … then”. I’m not going to attempt a precise definition of the causal concept — I’m just referring to the basic idea of one statement’s being a reason for another.

Let’s go back to the one statement that felt reasonable in the list above. It was this.

  • If n is a prime not equal to 2, then n is odd.

Now comes another somewhat subtle distinction, and this is the one I really care about. What does that statement above actually mean? I think a very natural way of interpreting it is this.

  • Whenever n is a prime not equal to 2, it is also odd.

In other words, although it looks like a statement about some fixed number n, the fact that we have been told nothing whatsoever about n makes us read it in a slightly different way. We say to ourselves, “Since we’ve been told nothing at all about n, this must be intended as a general statement about an arbitrary n. So what it’s really saying is that if a positive integer has one property — being a prime not equal to 2 — then it has another — being odd.” If we’re thinking about things that way, then it’s rather tempting to say that the property “is a prime not equal to 2″ implies the property “is odd”.

What I’ve just suggested is not standard mathematical practice, but in principle it could have been. However, it is incredibly important in mathematics to be completely sure at all times what kinds of objects one is dealing with. I said earlier that “if … then” connects statements and “implies” connects noun phrases that refer to statements. I did not say that either of them connects properties. So if I want to say that one property implies another, then I have to be absolutely clear that this is a different meaning of the word “implies” (even if it is related to the previous one).

OK, so let me be careful. First of all, what is a property? It’s what you get when you take a statement that concerns a variable and you remove that variable. For example, if I take the statement “n is a perfect square” and remove the variable n from it, I get the property “is a perfect square”. A property is a thing you say about something else. (It’s almost like an adjective, but not quite because of the extra “is”.) If you want to be more formal about it, if you are given a set like the set of all positive integers, a property associated with that set is a function from elements of the set to statements. For example, the property “is prime” takes the number n to the statement “n is prime”. (It is more conventional to say that all we actually care about is the truth values of these statements. So the property “is prime” takes the value TRUE at each prime number and FALSE at all other numbers. I’ll stick with my unconventional discussion here.)

Now suppose that we have two properties A and B associated with the positive integers. When do we say that A implies B (according to my unconventional definitions)? Well, for each positive integer n, we have a statement A(n) and a statement B(n). I’ll say that A implies B (in the property sense) if for every positive integer n, the statement A(n) implies the statement B(n) (in the truth-value sense). In other words, whenever A(n) is true, so is B(n), and otherwise anything can happen. In the example above, A is the property “is a prime not equal to 2″, B is the property “is odd”, and for each n, A(n) is the statement “n is a prime not equal to 2″ and B(n) is the statement “n is odd”. Every time A(n) is true, which it is when n=3,5,7,11,13,17,19,23,29,31,..., so is B(n). This gives us the feeling that the property A “causes” the property B.

Let me go back to the statement that seemed reasonable.

  • If n is a prime not equal to 2, then n is odd.

It’s important to be careful about what this means. Is it a statement about some specific n? If so, then we must interpret the “if … then” in the strict truth-value sense. Or is it really a way of saying, “Every prime not equal to 2 is odd”? In that case, it has more of a causal feel to it.

The best way to keep everything clear at all times is not to write the above sentence when you’re really talking about all n. Instead, you can write

  • For every positive integer n, if n is a prime not equal to 2, then n is odd.

Now, if you pick out just the part of this statement that says, “If n is a prime not equal to 2, then n is odd,” then you have something that must be interpreted in the truth-value sense. But when you apply those truth-value statements to all positive integers n simultaneously, what you end up with is the nice “causal” statement that the property “is a prime not equal to 2″ implies the property “is odd”.

A silly deduction and a sensible deduction.

Because there is a sort of causal notion of implication, and because it is in a way what we really care about when doing mathematics, I very much prefer to illustrate the meaning of “implies” or “if … then” with reference to examples that include variables. If I just take two fixed statements like “Margaret Thatcher used to be Prime Minister of the UK” and “there was recently a tsunami in Japan” and tell you that, despite the lack of any obvious relationship between them, the first statement implies the second statement because the second statement happens to be true, then it it is clear the notion of implication I am using has nothing to do with one thing being true because another thing is true: not even the most rabidly left-wing person is going to blame the Japanese tsunami on Thatcher’s premiership. But a statement like, “If x\in A then x\in A\cup B,” is completely reasonable. Moreover, because x is a general element of A, which might be an infinite set, we can’t establish a statement like this by running through all x and checking the truth values of the statements x\in A and x\in A\cup B. Rather, we have to give a proof — that is, an explanation of why x must belong to A\cup B if it belongs to A. Thus, once you start looking at statements with variables, the truth-value notion of implication forces you to look for “reasons” and “causes” so that you can establish lots of truth-value facts at once. (I’m leaving out the possibility here that a statement could in some sense “just happen to be true”. For example, many people take seriously the following possibility. Perhaps the property “is even and at least 4″ implies the property “is a sum of two primes” in the sense that no number is even and at least 4 without being a sum of two primes, but perhaps also there isn’t a reason for this — perhaps it just happens to be the case.)

Here’s another illustration of the difference between statements that involve parameters and statements that don’t. Consider the following claim.

  • If \sqrt{2} is rational then there is an integer that is both even and odd.

I’m going to prove it in two different ways.

Proof 1. \sqrt{2} is irrational, so the statement “\sqrt{2} is rational” is false, and therefore implies all other statements. In particular, it implies that there is an integer that is both even and odd.

Proof 2. If \sqrt{2} is rational, then we can find positive integers p and q such that \sqrt{2}=p/q, which implies that 2q^2=p^2. Let k be the largest integer such that p^2 is a multiple of 2^k. Since p^2 is a perfect square, k must be even. (To see this, just consider the prime factorization of p.) But p^2=2q^2, and the largest k for which 2q^2 is a multiple of 2^k is odd. (To see this, just consider the prime factorization of q.) Therefore, k is both even and odd, which proves the result.

Which of these two arguments is more interesting? Undoubtedly the second, since it actually gives us a proof of the irrationality of \sqrt{2}. So is the first argument valid at all? You might object to it on the grounds that it uses without proof the fact that \sqrt{2} is irrational. But we can make the question more interesting as follows. There is (it happens) a different proof of the irrationality of \sqrt{2} that does not involve the statement that some positive integer is both even and odd. What if we used that argument, concluded that “\sqrt{2} is rational” was false, and then went on to deduce “there exists an integer that is both even and odd” in the way that argument 1 does above. Would that be a valid deduction?

I think the answer has to be yes, but it is not an interestingly valid deduction. It is not showing that the irrationality of \sqrt{2} is in any way caused by a contradiction that involves parity, since we deduced that from another, and unrelated, false statement.

If we think of implication as primarily something we apply to statements with parameters, and therefore indirectly and in a different sense to properties, then our starting point is not the statement “\sqrt{2} is irrational” but rather the statement “\sqrt{2}=p/q“. And our conclusion, that there exists an integer that is both even and odd, is deduced from the more precise (and informative) statement, “the highest k such that p^2 is a multiple of 2^k is both even and odd”.

As a final remark about the above example, which allows me to emphasize a point I have already made, suppose that I start a proof of the irrationality of \sqrt{2} by writing,

  • \sqrt{2}=p/q\implies p^2=2q^2.

What I am really saying is that whatever p and q might be, if \sqrt{2}=p/q, then p^2=2q^2. In other words, although it looks as though I’m talking about a specific pair p and q, in fact I’m making a general deduction.

What’s good about the usual convention concerning “if … then” and “implies”?

I think I have partially answered this question by pointing out that when we consider statements with parameters then the truth-value meaning of “implies” feels a lot closer to the more intuitive “causal” meaning of “implies”. However, the agreement isn’t total. One of the “silly” examples from early in this post was this.

  • If n is both even and odd then n=17.

This looks odd, because although we know that n can’t be both even and odd, we also feel that if n were even or odd, there would be nothing about that fact that steered n towards the number 17 as opposed to any other number. I can’t deny the feeling of oddness. All I can say is that the hypothetical situation never arises because the hypothesis, that n is even and odd, is impossible.

What I can do, however, is explain why I don’t want to try to find a different convention that would make this statement false. I don’t want to do that because it would force me to give up some general principles that I like. One of those I have already mentioned:

  • Property P implies property Q if and only if the set of all n such that P(n) is a subset of the set of all n such that Q(n).

I hope you’ll agree that that looks highly reasonable, and we don’t want to start having ugly exceptions to it if we don’t have to.

Here’s another mathematical principle that I think you will also have to agree with.

  • The empty set is a subset of every other set.

Now let’s apply these two principles. I’m going to let P be the property “is both even and odd” and I’m going to let Q be the property “equals 17″. Then the set of n such that P(n) is the empty set (since no n is both even and odd). The set of n such that Q(n) is the set \{17\}. Since the empty set is a subset of the set \{17\}, the first principle tells us that P(n) implies Q(n).

To summarize this discussion, the formal mathematical notion of implication is a bit strange, but most of the strangeness disappears if you just look at statements with parameters, which tend to be the statements we care about. Each such statement corresponds to a property of those parameters, and implication of properties is closer to our intuitive notion of one thing “making” another true than implication of statements. Even then there are one or two oddnesses, but these are a small price to pay for the cleanness and precision of the definition and for the fact that it allows us to hold on to some cherished general principles.

An exercise — not to be taken too seriously.

(i) Prove that Borsuk’s conjecture implies the Riemann hypothesis.

(ii) Comment on your proof.

Hint: if you find part (i) difficult, then you are not applying one of the pieces of general study advice I gave in the first post of this series.


Basic logic — quantifiers

By Gowers

When I started writing about basic logic, I thought I was going to do the whole lot in one post. I’m quite taken aback by how long it has taken me just to deal with AND, OR, NOT and IMPLIES, because I thought that connectives were the easy part.

Anyway, I’ve finally got on to quantifiers, which are ubiquitous in advanced mathematics and which often cause difficulty to those beginning a university course. A linguist would say that there are many quantifiers, but in mathematics we normally make do with just two, namely “for all” and “there exists”, which are often written using the symbols \forall and \exists. (If it offends you that the A of “all” is reflected in a horizontal axis and the E of “exists” is reflected in a vertical axis, then help is at hand: they are both obtained by means of a half turn.)

Let me begin this discussion with a list of mathematical definitions that involve quantifiers. Some will be familiar to you, and others less so.

1. A positive integer n is composite if there exist positive integers a and b, both greater than 1, such that ab=n.

2. An n\times n matrix A is invertible if there exists an n\times n matrix B such that AB=BA=I_n. (Here I_n is the n\times n identity matrix.)

3. A binary operation \circ on a set A is commutative if for every a\in A and for every b\in A, a\circ b=b\circ a.

4. A function f from a set A to a set B is a surjection if for every y\in B there exists x\in A such that f(x)=y.

5. A set A of real numbers is dense if for every real number x and for every \epsilon>0 there exists a real number a such that a\in A and |a-x|<\epsilon.

I have put those in approximately ascending order of difficulty. To see how such a definition comes about, let us take the last of them. It is a familiar and useful property of the rational numbers (that is, numbers that can be written as fractions) that they “appear everywhere”. This property can be expressed in a number of ways. One is to say that whenever a and b are real numbers and a<b there must be at least one rational number r that lies between them. Another way of saying it is that every real number can be arbitrarily well approximated by rationals.

Let’s try to turn those two thoughts into precise definitions. But before we do so, I would like to draw attention to a number of words that should alert you to the possible presence of \forall, or a universal quantifier, as it is sometimes known. A few examples are “whenever”, “always”, “every”, and “each”. For each one of these words I’ll give an example of a sentence that contains it. Then I’ll translate those sentences into a more mathematical style using a universal quantifier.

  • Whenever it rains, the grass smells wonderful.
  • I always believe what my doctor says.
  • Every country in the EU is having economic difficulties at the moment.
  • Each picture in the Fitzwilliam museum is worth getting to know.

Now the translations.

  • For every time t, if it rains at time t then the grass smells wonderful at time t.
  • For every statement S, if my doctor says S then I believe S.
  • For every country C, if C belongs to the EU then C is having economic difficulties at the moment.
  • For every picture P, if P is in the Fitzwilliam museum then P is worth getting to know.

There are other words and phrases that suggest the lurking presence of \exists, or an existential quantifier. They are things like “there is”, “for some”, “some”, “at least one”, “you can find”. I’ll content myself with just one example this time.

  • Some cars run on electricity.

This could be translated as follows.

  • There exists a car C such that C runs on electricity.

You might want to argue that the word “cars” in the first sentence implies that more than one car runs on electricity. If that bothers you, here’s another example. Suppose I receive an email and react by saying, “Somebody likes me.” The meaning there (if you are rather literal-minded and take my words at face value) is

  • There exists a person P such that P likes me.

Creating mathematical statements that involve quantifiers.

Right, let’s see what we can do with this statement:

  • Whenever a and b are real numbers with a<b there is some rational number r that lies between a and b.

I’m going to use symbols this time. The word “whenever” alerts me to a universal quantifier. Indeed, the phrase “Whenever a and b are real numbers” can be translated into \forall a,b\in\mathbb{R} before we even look at the rest of the sentence. “There is some” now looks suspiciously like an existential quantifier, and it is: we translate “there is some rational number r” into the symbolic form \exists r\in\mathbb{Q}. Finally, “that” is referring back to the number r, so we are saying that r lies between a and b, which we can put more mathematically by saying a<r<b Putting that all together gives us this.

  • \forall a,b\in\mathbb{R}\ \exists r\in\mathbb{Q} s. t. a<r<b.

To read that sentence, read \forall as “for every” or “for each” (or if you like, “for all” but that sounds somehow less idiomatic), read \in as “in”, read \mathbb{R} as “the reals”, read \exists as “there exists”, read \mathbb{Q} as “the rationals”, read “s. t.” as “such that” and read < as “is less than”. So what you would actually say when reading those symbols is this.

  • For every a, b in the reals, there exists r in the rationals such that a is less than r is less than b.

As you will have deduced from that, \mathbb{R} is the conventional symbol for the set of real numbers and \mathbb{Q} is the symbol for the set of rational numbers. We also have \mathbb{N} for the set of natural numbers (or positive integers), \mathbb{Z} for the set of all integers, and \mathbb{C} for the set of complex numbers.

What about the definition of “dense” in terms of arbitrarily good approximation? The informal definition was this.

  • Every real number can be arbitrarily well approximated by rationals.

We can make a start on this by turning the “every” into a proper quantifier. That gives us this.

  • For every real number x, x can be arbitrarily well approximated by rationals.

So now our problem is reduced to finding a formal way of saying that x can be arbitrarily well approximated by rationals. What does that mean? It means this: however well you want me to approximate x by a rational number, I can do it. Now the word “however” contains “ever” within it. Could this be hinting at a universal quantifier? Yes it could. It is saying something like, “Give me any level of accuracy you want,” which contains the word “any”, a real giveaway. Having said that, the word “any” is a bit problematic because sometimes it replaces an existential quantifier, as it does for instance in the sentence, “If there is any reason to go, I’ll happily go.” The usual advice, with which I concur, is to keep “any”, “anything”, “anywhere”, etc. out of your mathematical writing.

Anyhow, we can avoid the word “any” by going further and saying, “For every level of accuracy that can be specified.” To clarify this, let us think what “level of accuracy” means. When I approximate a real number by a rational number, I am trying to pick a rational number that is close to the real number. And the accuracy of the approximation is naturally measured by the difference. So to specify a level of accuracy is to provide a small positive number and insist that the difference should be less than that number. For historical reasons, mathematicians like the Greek letters \epsilon and \delta for this purpose. So “for every level of accuracy” ends up as the rather more straightforward “for every \epsilon>0“.

Where have we got to now? We are here.

  • For every real number x and for every \epsilon>0“, x can be approximated to within \epsilon by a rational number.

Now the word “can” is another one that sometimes conceals an existential quantifier. For example, “It can be cold in Cambridge,” means that amongst the possibilities for the weather in Cambridge there exists at least one cold one. So we could take the hint from that and rewrite the above sentence as follows.

  • For every real number x and for every \epsilon>0“, there exists a rational number r such that r approximates x to within \epsilon.

And now, to finish off, we just have to remember what we meant by “approximates to within \epsilon“. We end up with statement 5 from earlier on.

  • For every real number x and for every \epsilon>0“, there exists a rational number r such that |x-r|\epsilon“.

In symbols, this would be as follows.

  • \forall x\in \mathbb{R}, \forall\epsilon>0,\exists r\in\mathbb{Q}:|x-r|<\epsilon.

By the way, a quick piece of stylistic advice. Some people, when they first come across the symbols \forall and \exists, get too keen on them and start using them in the middle of ordinary text, writing sentences like this: “And therefore \forall x\in A we know that f(x) is at most M.” That looks awful. You should either write something like, “And therefore for every x in the set A we know that f(x) is at most M,” or you should write something more like this.

“And therefore,

\forall x\in A\ f(x)\leq M.

In general, don’t overdo the symbols. And if you do use them (in order, say, to avoid an excessively wordy sentence), then don’t mix them up with words too much. A good rule of thumb there is to make sure that each symbol is part of a bunch of words that can stand reasonably well on their own. For example, the following mixture isn’t too bad:

  • Therefore, x\in B whenever x\in A.

But these are unspeakably awful.

  • Therefore, x\in the union of A and B.
  • Therefore, x\in B\ \ \forall elements x of A.
  • Therefore, x is an element of A, which \implies x is an element of B.
  • It follows that \exists x\in A that does not belong to B.

I leave it to you to come up with nicer formulations of the above four sentences. One final exhortation: please don’t ever use the symbol \forall to stand for the word “every”. If you’re now thinking “But isn’t that what it means?” then I’m glad I brought this up. It doesn’t mean “every”. It means “for every”. That kind of distinction really matters in mathematics.

Understanding mathematical statements that contain quantifiers.

I’ve discussed how you can take a slightly vague English statement and convert it into a precise formal mathematical one. It’s tempting to give many more examples, but I’d rather save that up for the actual definitions you will encounter. So if one example isn’t enough for you, be patient and there will be more.

But what about the reverse process? Suppose you are presented with a statement like this.

  • \forall\epsilon\exists N\in\mathbb{N}, \forall n\geq N, |a_n-a|<\epsilon.

If you haven’t seen that before, you will probably find it pretty opaque. In fact, some people find it pretty opaque even if they have seen it before. So what can one do to make sense of it?

Well, most people find that the more quantifiers they have to cope with, the harder it gets. So a good technique for understanding a statement such as the above is to build up gradually. Let me illustrate how this can be done. (What I’m about to show is meant to be something you can do for yourself with other definitions. The hope would be that once you’ve gone through the process with a few of them you will get used to them and not need to go through the process any more.)

To make things really easy, let’s start with no quantifiers at all. That is, let’s start with the quantifier-free “heart” of the statement, which is

  • |a_n-a|<\epsilon.

This isn’t hard to understand: it’s saying that the nth term of the sequence differs from a by less than \epsilon.

OK, now let’s add a quantifier. The one thing to remember is that we’ll add the quantifier furthest to the right. In other words, we start at the end of the entire statement (this we’ve just done) and work backwards.

  • \forall n\geq N, |a_n-a|<\epsilon.

That’s got one quantifier, but it’s still not too bad. It’s simply saying that every term of the sequence differs by at most \epsilon from a. Or rather, it would be if it weren’t for that little condition that n\geq N. So I lied. It isn’t quite saying that every term differs by at most \epsilon. It’s saying that every term differs by at most \epsilon as long as we’ve got to N or beyond.

Right, let’s add a second quantifier.

  • \exists N\in\mathbb{N}, \forall n\geq N,|a_n-a|<\epsilon.

What is the effect of that “\exists N” on the previous sentence? Well, the previous sentence said that a_n is within \epsilon of a as long as we’ve got past N. But it gave us no idea what N was. And in fact N isn’t really a fixed number. All we know is that there is some N that makes that statement true. That is, there is some N such that a_n stays within \epsilon of a once n gets to N or beyond.

Note that I hid the “for all” quantifier inside the word “stays” there. I used the word “stays” to mean “is for evermore” and the “ever” in “evermore” is a very clear hint of a universal quantifier. This informal language isn’t part of mathematics and should be kept out of proofs, but it is a useful aid to thought.

Actually, there is another piece of informal language that I find useful for this specific situation where we have \exists N\in\mathbb{N}\ \forall n\geq N. I think of the \exists N as saying “eventually” (which could also be “from some point onwards” if you want to make the \exists stick out a bit more clearly) and the \forall n\geq N as saying “always”. So this part of the statement is saying

  • eventually a_n is always within \epsilon of a.

I quite like the word “stays” too:

  • eventually a_n stays within \epsilon of a.

We’ve still got a quantifier to go. What is \epsilon? Again, it isn’t something fixed. Let’s have a look at the whole statement.

  • \forall \epsilon>0,\exists N\in\mathbb{N}, \forall n\geq N: |a_n-a|<\epsilon.

We now reach something that’s a bit less easy to put into informal language. Here’s an attempt.

  • However close you want a_n to be to a, eventually it will always be that close.

In general, if you ever see a statement that begins \forall \epsilon>0 and ends with something being less than \epsilon, the general idea is that however small you want that something to be … complicated stuff … you can get it to be that small. (Of course, the letter doesn’t have to be \epsilon. Another popular choice is \delta.)

It would be remiss of me not to mention that the definition we have just picked apart is the formal definition of the concept of convergence. You will find over the next few weeks that if you see the sentence

  • (a_n) converges to a.

then to work with it you need to translate it into the formal statement we’ve just looked at with its three quantifiers. That’s an oversimplification because it applies only when we are reasoning from first principles. Once you have met the definition of convergence, you will prove simple facts about it such as that if a_n converges to a and b_n converges to b, then a_n+b_n converges to a+b. These facts can then be used to prove facts that involve convergence without writing out the definition in full. However, when you’re just starting, and sometimes later on too, you do need to write out the definition. So one thing you have to do is learn these definitions — off by heart. If you don’t, you might just as well give up. But if you follow some of the tips above, you may find that you don’t have to learn the definitions as if they were a random jumble of symbols. Ideally, you will develop enough understanding to have a good intuitive picture of what the definition says, and the means to translate that intuitive picture into the formal definition with quantifiers. Another thing you can do is try writing out wrong versions of the definition and seeing why they are wrong. For example, suppose we interchange the first two quantifiers in the definition we’ve been discussing. Then we get the following statement.

  • \forall N\in \mathbb{N},\forall\epsilon,\forall n\geq N, |a_n-a|<\epsilon.

That is an unnecessarily complicated way of saying that from some point on all terms of the sequence are equal to a. (If you don’t immediately see that it saying that, then try carrying out the process I’ve just outlined above. At some point the meaning will jump out at you.)

What I’ve just recommended may sound like hard work; that is because it is. But it isn’t impossibly hard, and time invested at this stage will pay huge dividends later.

I could say plenty more about quantifiers, but I think I’ll hold my fire for now, and discuss them more when they come up in the courses.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: