LONDON
Introduction
Over the Christmas break, I had the pleasure of reading Elementary Number Theory by David Burton. I picked this book up from the office bookshelf just as I left home for the break. I had studied number theory for maths olympiads in high school. Back then, I had owned some other number theory textbook, but I never made it past the second chapter. I was a very busy high school student. I was trying to make it to the IMO, but I was also trying to do too many other things, so I never made it. This is one of the failures I have regretted the most in my life, and that has left me with a slight unease whenever I confronted maths olympiad topics like number theory, inequalities and Euclidean geometry. It turns out – all I had to do was give it some time and read good textbooks.
Number theory is the study of integers and their properties. I picked up Elementary Number Theory because number theory has surfaced as an important tool in my research. I was doing something in which the following question arose:
Let
denote the
-th prime number, that is
and so on.
The function, is defined as the sum of the first
primes. That is,
. How fast does
grow? Is it roughly linear in
? or quadratic? or something else?
(There are mathematical interruptions throughout this book review, but you don’t have to read them all)
The great computer scientist Donald Knuth once said, “Virtually every theorem in elementary number theory arises in a natural, motivated way in connection with the problem of making computers do high-speed numerical calculations”. I experienced this natural connection between number theory and computer science first hand in my research in quantum computing. This is especially remarkable because before the advent of computing, number theory was thought to be a discipline “unsullied by any application”.
Apart from my trauma from maths olympiad, another reason that I feared number theory is that contemporary number theorists work on some of the hardest problems in mathematics. In my first year at Oxford, I attended a lecture by Andrew Wiles, which I excitedly told everyone about as if I had attended a concert, but in truth I learnt nothing there. Andrew Wiles is a famous mathematician who solved a long-standing open problem called Fermat’s Last Theorem, but this achievement took an arduous 10-year effort on his part. So great is his achievement that the Oxford Maths Institute building is named after him.
The buzzwords Riemann Hypothesis, Twin Prime Conjecture, and Fermat’s Last Theorem were what I thought number theory was about: lots of very difficult open questions that require decades of study to crack open. As a mathematician with a more practical bent, I thought it best that I avoid such difficult problems.

Praise
All this fear and trauma is now gone. Elementary Number Theory is my new favourite book. There are three things about this book that make it excellent. The first is that it offers a complete and systematic treatment of elementary number theory. “Elementary” in the title refers to the fact that the book only uses proof techniques that are taught to high school students. Apart from comfort with mathematical writing, the only mathematical background needed for this book is high school algebra, and basic proof techniques like induction and proof by contradiction. Burton’s work is therefore a treatise on everything that can be understood clearly and distinctly in the realm of number theory using a basic toolkit. Moreover, the fact that the book organizes the topics so well and gives a thorough treatment of each topic makes the work a great reference book. Someone recently mentioned to me in a conversation that the best way to learn something is not to read a textbook but rather to organize the information in a way that your brain is ready to accept. The linear flow of a textbook is supposedly unwelcome to our brains. While this may be true of other textbooks, I think Burton’s Elementary Number Theory is structured so as to fit in your brain.
The second admirable quality of this book is its coverage of the history of number theory. Every chapter in the book begins with a few pages of prose describing the mathematician whose work is going to be featured in that chapter. I have learned many interesting things about Euclid, Pythagoras, Fermat, Euler, et cetera. The quality of the scholarship is of a top level, for the author can even tell us which textbooks these great men learned their number theory from and how that influenced their writing and work. Reading Burton has made me want to obtain copies of the small number of canonical number theoretic texts that have accompanied humanity for the last two and a half thousand years such as Euclid’s Elements and Diophantus’ Arithmetica. What’s more, many of the exercises in the book are problems that great mathematicians created and wrote down in their treatises or letters to each other.
Finally, the book’s confidence in the face of any problem, no matter how big or how small, was inspiring to me as a young researcher. After setting up fairly basic notions like division and the greatest common divisor, Chapter 3 of the textbook concerns Goldbach’s Conjecture! Goldbach’s Conjecture states that every even integer greater than two can be written as the sum of two prime numbers. This is exactly the kind of long-standing open mathematical problem that shied me away from number theory in the past. Why would a textbook titled Elementary Number Theory have Chapter 3 on Goldbach’s Conjecture? I realized that the chapter was about what can be said about Goldbach’s Conjecture using elementary mathematics. While the final resolution of such large open questions might only come from more advanced techniques, elementary techniques can still do something for the problem. Elementary techniques can, for example, clarify what key theorems or lemmas need to be tackled with more advanced techniques. They can also point out how slight modifications to the question make the question break open in a very easy way. For example, although it is hard to prove Fermat’s Last Theorem in general, to prove it for the case of the fourth power is just one page in this textbook. A short argument in the book also establishes that Fermat’s Last Theorem need only be proved for prime powers for the whole result to follow.
Fermat’s Last Theorem: For
, no integer solutions
exist for the equation
.
For example we definitely have things like
and even
, but if someone wrote down
, you can be sure this is wrong because of Fermat’s last theorem.
It took 350 years to prove this theorem for all
. For the case where
, there is an elementary proof covered in Elementary Number Theory.
A tour of Number Theory
I will now describe the contents of this book, and thus at the same time, give an overview of elementary number theory.
Induction and Well Ordering
Number theory is about integers. We articulate and prove a large number of properties of the integers.
Every whole number is either zero, or it is the successor of a whole number.
One set of tools to prove theorems about whole numbers (non-negative integers) comes from mathematical induction. Whole numbers can be seen as built from the number zero, and by the successor function. This axiomatic characterisation is of great utility in proofs. There are infinitely many whole numbers, to prove some property of all of them we have to tame this infinity. One way to do it, is to tackle these two primitives: zero and the successor function. The principle of mathematical induction says that one can prove something about the entire infinite set of numbers by simply proving it for zero and also proving that if the fact holds for some number, it will also hold for its successor. The principle of mathematical induction, like other techniques we will see, is thus a way of taming the infinite size of the set of all numbers.
Induction example: How can you prove that an every domino in an infinte chain of dominos will fall if the first one is pushed?
Answer: We already know the first domino falls because we know it is pushed. We also know that if the-th domino falls, then the one next to it (the
-th domino), will also fall. Thus by the Principle of Mathematical Induction, every domino will fall.
A related result is the well-ordering principle, which states that any set of positive integers has a minimum element. While obvious, this is a powerful statement that enables the proof technique called “minimal counterexample”. Sometimes, you can prove something just by arguing that if exceptions existed, there would be a smallest exception, and that the existence of a smallest exception would lead to a contradiction. Some sets can’t exist because they can’t support a smallest element.
Boring numbers: While this is not a real mathematical proof it serves to illustrate the proof by minimal counterexample.
Every number has some cool property. 1 acts as identity when you multiply by it, 2 is the only even prime, 3 is the smallest odd prime, 4 is both 2+2 and 2 x 2, etc.
Prove that there is no “boring” number.
Pseudo-proof (by minimal counterexample): Suppose (for the sake of proof by contradiction) that there exist boring numbers. Consider the set of all boring numbers, and take the smallest number in that set. This number would have the cool property that it is the smallest boring number. This is a contradiction, as this number is both boring (it’s in the set of boring numbers) and cool (it is the smallest boring number). Therefore there are no boring numbers.
Division
Number theory gets really interesting when the notions of division, prime numbers, and the greatest common divisor are introduced. Addition, subtraction, and multiplication are straightforward in the sense that you can always add, subtract, or multiply any two given integers. Division is different. Dividing two integers might not result in an integer. There’s a lot more structure in division to understand and explain. Number theory properly begins by formally defining division and with it, the notions of quotient and remainder.
Division: For any pair of integers
and
, we can write
, where
and
are non-negative integers, and
. We then call
the dividend,
the divisor,
the quotient, and
the remainder.
If, we say that
divides
, or that
is divisible by
.
Division gives us a different way of taming infinities. We can partition the infinite set of integers based on remainders for some fixed divisor. For example with the divisor 2, we can partition numbers into the set of even numbers (remainder = 0) and odd numbers (remainder = 1). To prove something for all numbers, we then just have to consider two cases: odd and even.
Primes and greatest common divisor
As we tame infinities, we have to reckon with the fact that some interesting properties don’t apply to all numbers. A prime number is one that has no divisors other than one and itself. This is important in number theory, as some important facts don’t apply to all numbers. Often, they apply only to prime numbers (or sometimes only to powers of prime numbers, or only to odd primes, etc.).
What’s special about prime numbers is that they are building blocks for all other numbers, under the lens of multiplication. We saw earlier how all numbers can be built from 0 and the successor function – suggesting that from the perspective of addition, these two tools are all you need to build all the numbers. Multiplication is very different. To be able to build all numbers by multiplying some simpler numbers, you need an infinitely large set of numbers: these are the prime numbers. This insight is captured by the Fundamental Theorem of Arithmetic, which states that every number can be written as the product of primes in a unique way.
The GCD (Greatest Common Divisor) of two numbers is the largest number that divides them both. If two numbers have a GCD of 1, they are said to be relatively prime. When proving facts in number theory about pairs of numbers, it is often the case that some theorem holds only for pairs of relatively prime numbers. For example 14 = 2 x 7 and 15 = 3 x 5 are relatively prime. There’s no number other than 1 that divides both of them. However 15 = 3 x 5 and 35 = 5 x 7 are not relatively prime, they have a GCD of 5.
The rest of the book builds stronger tools from the basic tools mentioned above. It also applies all these tools to natural questions.
Modular Arithmetic
Using the theory of division and remainders, we can develop one of the most widely used tools in number theory: the theory of congruences, also known as modular arithmetic (from the root word modulus, NOT module). Modular arithmetic is like the maths you do on a clock. If the time is 1700 hrs, then 10 hours later is not 2700 hrs, but rather 0300 hours. Time of day loops back and 2400 is “equivalent” to 0000.
With this same idea, we define modular arithmetic as shown:
Definition of Modular Arithmetic
Ifand
have the same remainder when divided by
, we write this in modular arithmetic as:
.
Examples:.
Modular arithmetic is used, as discussed above, to divide the set of all numbers into a small list of cases where we can be sure all numbers within a case behave the same. If we work modulo , then we have at most
cases to consider. The number of cases can be less if we know something else about the number. For example a square number can only be 0 or 1 mod 4. When proving facts about square numbers, we often use this to divide into two cases and prove them separately.
Application of modular arithmetic:
The last digit (mod 10) of a square number must be: 0,1,4,5,6,9
The last two digits (mod 100) of a square number must come from the following list:
01, 21, 41, 61, 81,
04, 24, 44, 64, 84,
09, 29, 49, 69, 89,
00, 25,
16, 36, 56, 76, 96We can use this to quickly check if a number is NOT a square. E.g. 128317566 is definitely not a square number. The test is inconclusive in the case of 132480176, as this might be a square number but isn’t definitely one.
Algebra in modular arithmetic
Modular arithmetic also sets up a nice world in which to do algebra. Notice how modular equivalence behaves like equality in many ways:
Modular equivalence is a lot like equality
Ifand
, then
.
Ifand
, then
.
.
Ifthen for any positive integer
:
We have to learn to walk again in the modular world. In school, we learn how to solve linear and quadratic equations over real or complex numbers in high school. So, very naturally, number theory has also considered linear and quadratic equations in modular arithmetic. In modular arithmetic, solving these equations requires a different set of techniques. This is the subject of Sections 2.4, 4.4, and Chapter 9.
Linear equations in modular arithmetic
, can be solved easily:
, as long as
.
Now consider
.
We can rewrite it as, but how do we divide by
?
This introduces the concept of modular inverses.
For some pairs
, there exists a number
such that
When this exists, we get the solution
.
But when does this exist?
The answer is: whenever.
This exemplifies what was mentioned above that concepts like prime numbers and GCD are important because some properties only hold for prime or relatively prime numbers.
Quadratic equations in modular arithmetic
Consider, where
is prime.
The way such equations are solved involves some algebra trick similar to completing the square to reparameterise and obtain the simpler equation:This equation doesn’t always have a solution.
So we just have to understand this very natural problem, namely: which numbers have square roots in modular arithmetic? Remember that we looked at this problem in the case of mod 10 and mod 100 above.
There is very clear set of rules to help us determine this, see the topics: Legendre symbol and Quadratic reciprocity law.
Then there’s the Chinese remainder theorem, which is about how we can combine different modulo facts together.
Chinese Remainder Theorem
Example: There are certain things whose number is unknown. If we count them by threes, we have two left over; by fives, we have three left over; and by sevens, two are left over. How many things are there? (Sunzi)
That is, findwhere
,
, and
.
Answer: We can combine these three equivalences to the following:
So the number of items is one of: 23, 128, 233, …
Chapter 5, 7 and 8 are about exponentiation in the modular world. A common pattern for math olympiad questions looks like: Find the remainder when is divided by 11. Two theorems known as Fermat’s Little Theorem, and Euler’s Totient Theorem capture these situations very well. They tell us some conditions under which
, which allows us to quickly simplify large exponentials.
We can also say more fine grained things about exponentiation. Consider:
…
as opposed to
…
By raising 3 to different powers modulo 7, we are able to get all the numbers from 1 to 6, however when raising 2 to different powers modulo 6, we only get 2 and 4. We say that 3 is a primitive root modulo 7.
There is a complete characterisations of which moduli have primitive roots, and how many. Primitive roots are really useful because they allow us to systematically solve equations of the form , which is a very natural algebra problem. They are also an important proof technique giving us nice theorems about orders (how long is the cycle when exponentiating modulo something) and indices (the “logarithm” in the modular world).
Miscellaneous topics
Chapter 6 defines multiplicative functions – a natural class of functions in which many important number theoretic functions fall. It also describes the the Divisor Convolution and the Möbius Inversion Formula, which allow us to reason about such functions. I was really impressed by these concepts, and their use in a beautiful proof. Understanding this proof was really the moment while reading this book that I realised how much fun I was having. Let denote the number of integers modulo
that have order
. It was shown in the book that
(that is, it’s equal to Euler’s totient function). This proof rested on properties of multiplicative functions in general, equipped with which we can conclude the functions are equal just from the fact that
and
are both multiplicative, and that they satisfy
.
Chapters 10, 11, and 12 concern interesting theorems and serve as ground on which to test the tools developed earlier in the book. One interesting result from these chapters is that every integer can be written as the sum of four square numbers, you never need more. Another thing to know is that we have a complete theory of all Pythagorean triples; we know the formula to generate all of them. I love that when the power is 3 and above we have no solutions (Fermat’s Last Theorem), but when its 2 we have infinitely many and we know all of them.
Pythagorean triples
All the positive integer solutions ofare of the form:
whereare any relatively prime numbers that are not both odd, and
is any positive integer.
Any Pythagorean triple can be generated from these formulas.
Finally, Chapter 13 sets up the interesting theory of continued fractions. Continued fractions turn out to be an alternative scheme for representing numbers. Instead of representing a number in decimal form, we can represent it as a continued fraction. I had learned about continued fractions in primary school maths olympiads – I always thought they were nothing more than a contrived way of testing young kids on their mathematical ability. However, there is a theory of continued fractions which is very natural and linked to Diophantine equations (equations where we are only interested in integer solutions). It is also important for finding rational representations and approximations of numbers.
Continued fraction representation of Euler’s number
Whilehas a very random looking decimal representation, it’s continued fraction representation has a pattern:
Finally, I should mention that throughout the book, a number of algorithms and heuristics are given to achieve common number theoretic computations. These include how to find the GCD (Greatest Common Divisor) of two numbers efficiently using Euclid’s algorithm, how to factorize using Fermat’s factorization method, how to check divisibility by some special numbers using techniques developed through modular arithmetic, and so on.
I started this journey because of a question about the distribution of primes that appeared in my research. Coincidentally this was the subject of an appendix! As a close friend advised me once on reading textbooks: make sure there are some topics at the end of the book that you really want to understand, it will motivate you to finish the book. Perfect!
Elementary Number Theory is a book that I’m glad I read cover to cover, and I will keep it at hand as a reference for all my number theory needs (whether research or math olympiad teaching). If I were designing a degree programme for computer scientists I would include a semester on number theory instructed through this book. This book has introduced me to the joy of reading textbooks cover to cover and I hope to keep this going. I hope reading this has given you a sense for what number theory is about, and how important a good textbook can be.
Appendices
Stack
Here is an excerpt from the book on history:
“During his stay in Berlin, Euler acquired the habit of writing memoir after memoir, placing each when finished at the top of a pile of manuscript. Whenever material was needed to fill the Academy’s journal, the printers would help themselves to a few papers from the top of the stack. As the height of the pile increased more rapidly than the demands made upon it, memoirs at the bottom tended to remain in place a long time. This explains how it happened that various papers of Euler were published, while extensions and improvements of the material contained in them had previously appeared in print under his name.”
Contents
Chapter 1. Some Preliminary Considerations
- 1.1 Mathematical Induction
- 1.2 The Binomial Theorem
- 1.3 Early Number Theory
Chapter 2. Divisibility Theory in the Integers
- 2.1 The Division Algorithm
- 2.2 The Greatest Common Divisor
- 2.3 The Euclidean Algorithm
- 2.4 The Diophantine Equation ax + by = c
Chapter 3. Primes and Their Distribution
- 3.1 The Fundamental Theorem of Arithmetic
- 3.2 The Sieve of Eratosthenes
- 3.3 The Goldbach Conjecture
Chapter 4. The Theory of Congruences
- 4.1 Karl Friedrich Gauss
- 4.2 Basic Properties of Congruence
- 4.3 Special Divisibility Tests
- 4.4 Linear Congruences
Chapter 5. Fermat’s Theorem
- 5.1 Pierre de Fermat
- 5.2 Fermat’s Factorization Method
- 5.3 The Little Theorem
- 5.4 Wilson’s Theorem
Chapter 6. Number-Theoretic Functions
- 6.1 The Functions and o
- 6.2 The Möbius Inversion Formula
- 6.3 The Greatest Integer Function
Chapter 7. Euler’s Generalization of Fermat’s Theorem
- 7.1 Leonhard Euler
- 7.2 Euler’s Phi-Function
- 7.3 Euler’s Theorem
- 7.4 Some Properties of the Phi-Function
- 7.5 An Application to Cryptography
Chapter 8. Primitive Roots and Indices
- 8.1 The Order of an Integer Modulo n
- 8.2 Primitive Roots for Primes
- 8.3 Composite Numbers Having Primitive Roots
- 8.4 The Theory of Indices
Chapter 9. The Quadratic Reciprocity Law
- 9.1 Euler’s Criterion
- 9.2 The Legendre Symbol and its Properties
- 9.3 Quadratic Reciprocity
- 9.4 Quadratic Congruences with Composite Moduli
Chapter 10. Perfect Numbers
- 10.1 The Search for Perfect Numbers
- 10.2 Mersenne Primes
- 10.3 Fermat Numbers
Chapter 11. The Fermat Conjecture
- 11.1 Pythagorean Triples
- 11.2 The Famous “Last Theorem”
Chapter 12. Representation of Integers as Sums of Squares
- 12.1 Joseph Louis Lagrange
- 12.2 Sums of Two Squares
- 12.3 Sums of More than Two Squares
Chapter 13. Fibonacci Numbers and Continued Fractions
- 13.1 The Fibonacci Sequence
- 13.2 Certain Identities Involving Fibonacci Numbers
- 13.3 Finite Continued Fractions
- 13.4 Infinite Continued Fractions
- 13.5 Pell’s Equation
Appendix: The Prime Number Theorem