Then some corollaries were given (products, absolute values). We discussed the Fundamental Theorem of Calculus.
An attempt was made to discuss curves. Somehow this got sidetracked as the instructor told students about space-filling curves. This is a continuous mapping of, say, the unit interval in R^{1} onto the unit square in R^{2}. Here is a Wikipedia article. Such curves were first constructed by Giuseppe Peano, and a nice example was invented by I.J. Schoenberg.
The following results tell you a bit more about what space-filling curves cannot do (!).
• If f=(f_{1},f_{2}) is a differentiable function (so the two components are differentiable) then f([0,1]) is "small" in the following sense: given ε>0, there is a finite collection of rectangles in R^{2} which contain f([0,1]) so that the sum of the areas of these rectangles is less than ε. (This is sometimes referred to by the phrase: "f([0,1]) has content 0".) The proof of this just uses the Mean Value Theorem.
Therefore the space-filling curves must not be differentiable (at least in large parts of the domain). The coordinate functions must resemble, for example, the Takagi function mentioned earlier in class: very "strange".
• If f is 1-1, then the interior (in R^{2}!) of f([0,1]) is ∅. Unlike the previous remark, which really does follow easily from the Mean Value Theorem, I don't know any "easy" way to prove this. This follows from a standard result of topology called invariance of domain, which is usually verified with tools from algebraic topology.
Today
Discussion of some of these results in the textbook:
6.11, 6.12, 6.13, 6.15, 6.16, 6.17.
Thursday
Finish this discussion, and discuss 6.19, 6.20, and 6.21 (and maybe
6.27).
Next Monday, the last day of class
Students would analyze the solutions of problems 2, 3, 5, 10 (a,b,c),
11, 15, 16, preparing for the final exam.
We did indeed discuss some parts of these results:
6.12, 6.15, 6.16, and a version of 6.17. I proved a version of 6.17
with a weaker hypothesis. Also I indicated why the derivative of the
Heaviside function (a jump) could be/should be the Dirac delta
function. The "generalized Riemann integral" I mentioned towards the
end of class is frequently called the
Henstock-Kurzweil integral.
Basic definitions
As in the text: f is a bounded function on [a,b]; α is an
increasing bounded function on that interval; P is a partition of
[a,b]; Δα_{j} which is how big α thinks the
subinterval is; U(P,f,α) and L(P,f,α) are the upper and
lower sums; ∫_{a}^{b}f dα and
∫_{a}^{b}f dα are the
upper and lower Riemann-Stieltjes integrals with respect to
α.
In order to keep from going crazy in html, I will refer to the upper
R-S integral as U∫_{a}^{b}f dα and
to the lower R-S integral as
L∫_{a}^{b}f dα. sigh...
A bounded function f on [a,b] is Riemann-Stieltjes
integrable with respect to α (briefly written as
f∈R(α)) if
U∫_{a}^{b}f dα=L∫_{a}^{b}f dα.
The common value if it exists is called the Riemann-Stieltjes integral
of f with respect to α.
x
We showed that if f is continuous on [a,b], then f∈R(x). R(x) is
also called just R in the text. These are the usual Riemann integrable
functions. The verification used uniform continuity. I remarked (and
this came up again later) that actually the verification was valid for
any eligible α, that is, any α which is increasing in
[a,b]. So any continuous f is in R(α). Necessary and sufficient
conditions for a bounded f to be in R were given by Lebesgue and were
mentioned in the last lecture and a link was given
there to a careful statement and proof.
The Heaviside function
We took α to be defined piecewise by α(x)=0 if x<0 and
α(x)=1 if x>0. We investigated what
Δα_{j} was, and which functions were in R(α)
for the interval [-1,1]. We discovered a much simpler necessary and
sufficient condition: f∈R(α) if and only if f is
left-continuous at 0. This means
lim_{x→0-}f(x) exists and equals f(0). The
proof, once the definitions have been waded through, is not that
difficult.
Adding a point to a partition
We returned to the prosaic general theory by verifying that adding a
point to a partition increases (not strictly increases, necessarily!)
the lower sums and decreases the upper sums. Then we learned that
every lower sum is less than or equal to every upper sum. And we
learned that the lower R-S integral is always less than or equal to
the upper R-S integral.
Criteria for R-S integrability
We considered the statement: given ε>0, there is a
partition P_{ε} so that
U(P_{ε},f,α)-L(P_{ε},f,α)<ε.
This turns out to be equivalent to f∈R(α). We also
considered Riemann sums for the Riemann-Stieltjes integral. The
sums we have looked at, defined with sups and infs in the subintervals
of partitions, are sometimes called Darboux sums. But it turns
out that Riemann sums, which rely on f(sample point) instead,
also work in more or less the same fashion. I discussed this, and sort
of proved it. A detailed proof is in the textbook.
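Here is a minimal numerical sketch of the Darboux sums and a Riemann sum side by side. The choices f(x)=x, α(x)=x^{2}, the interval [0,1], the uniform partition and the midpoint sample points are my own assumptions for illustration. Since this α is differentiable, the integral should be ∫_{0}^{1}x·2x dx=2/3, and all three sums should crowd around that number.

```python
def stieltjes_sums(f, alpha, a, b, n):
    """Return (lower, riemann, upper) Stieltjes sums on a uniform partition.

    Assumption: f is increasing on [a, b], so on each subinterval the inf
    of f sits at the left endpoint and the sup at the right endpoint.
    """
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    lower = riemann = upper = 0.0
    for left, right in zip(xs, xs[1:]):
        d_alpha = alpha(right) - alpha(left)         # how big alpha thinks the subinterval is
        lower += f(left) * d_alpha                   # Darboux lower sum term
        upper += f(right) * d_alpha                  # Darboux upper sum term
        riemann += f((left + right) / 2) * d_alpha   # Riemann sum, midpoint as sample point
    return lower, riemann, upper


# Illustrative choice: f(x) = x and alpha(x) = x^2 on [0, 1].
lower, riemann, upper = stieltjes_sums(lambda x: x, lambda x: x * x, 0.0, 1.0, 1000)
print(lower, riemann, upper)
```

The Riemann sum is trapped between the lower and upper sums, and here the gap U-L is (1/n)(α(1)-α(0)), so it shrinks as the partition gets finer.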
Test taking, etc.
The instructor offered some comments.
We saw the counterexample to the strongest form of the Mean Value Theorem (e^{ix} with x in [0,2π]). So a version of MVT with equality is false. You see, R^{1} is very nice. For topology folks, it is the only R^{n} where the order topology and the Euclidean metric topology are the same, so there are many coincidences. Also a version of L'Hop fails.
However, the MVT is really used in analysis as an inequality. In R^{1}, the result is |f(x)-f(y)|≤(Constant)|x-y| where f is differentiable between x and y, and "Constant" is some upper bound on |f´(ξ)| for ξ between x and y. We use this Lipschitz estimate, as we previously called it, in lots of ways. Now the text has a marvelous and efficient proof of Theorem 5.19 giving what turns out to be a best possible (although not so stated) such estimate for f:[a,b]→R^{n} when f is differentiable. Well, that proof looks magical but it comes from functional analysis and is an example of a big machine. Let me give a less efficient but non-magical proof.
Proof of a MVT estimate for differentiable
f:[a,b]→R^{n}
Here f=(f_{1},f_{2},...,f_{n}) is an n-tuple
of differentiable scalar (real-valued) functions. We want to estimate
|f(x)-f(y)|=|(f_{1}(x),f_{2}(x),...,f_{n}(x))-(f_{1}(y),f_{2}(y),...,f_{n}(y))|
and we will use a definitely low-tech trick that, for example, is
common in multivariable calculus: change one variable or coordinate at
a time. So define
g_{j}(x,y)=(f_{1}(y),...,f_{j}(y),f_{j+1}(x),...,f_{n}(x)).
Please realize that g_{0}(x,y)=f(x) and
g_{n}(x,y)=f(y). I may have not defined this precisely (hah!)
correctly in class -- I am sorry. Almost nothing is precise in class.
Then write |f(x)-f(y)|=|∑_{j=0}^{n-1}(g_{j+1}(x,y)-g_{j}(x,y))|. Notice that each of the differences of this telescoping sum has a change in one variable. Use the triangle inequality to overestimate this sum by the sum of the magnitudes of the differences. Now let's analyze each of the magnitudes of the differences.
So |g_{j+1}(x,y)-g_{j}(x,y)|=|f_{j+1}(x)-f_{j+1}(y)| because the Euclidean norm or metric takes the square root of the sum of the squares of the differences, and all of the other coordinates have difference equal to 0. But by the "ordinary" MVT, |f_{j+1}(x)-f_{j+1}(y)|≤C_{j+1}|x-y| where C_{j+1} is some overestimate of |f_{j+1}´(ξ)| when ξ is between x and y. Therefore |f(x)-f(y)|≤(∑_{j=1}^{n}C_{j})|x-y|. So indeed we have a Lipschitz estimate: |f(x)-f(y)|≤Constant|x-y|. The Constant here is some L^{1} overestimate of the size of f´ between x and y. The text shows that there is some ξ between x and y so that |f´(ξ)| can be taken as the Constant (the same ξ in each coordinate, and the L^{2} size is what's there). That is more efficient. Please take a look.
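A quick numerical sanity check of the two constants may help. This is only a sketch under my own assumptions: I take f(t)=(cos t, sin t) on [0,2π], so each coordinate derivative is bounded by 1 and the coordinate-by-coordinate argument gives the Constant 1+1=2, while |f´(ξ)|=1 for every ξ. The random sampling is just a way to look at many pairs x, y.

```python
import math
import random


def f(t):
    # Illustrative curve (an assumption, not from the lecture): the unit circle.
    return (math.cos(t), math.sin(t))


def euclidean_distance(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


# Bounds C_1, C_2 on |f_1'| = |sin| and |f_2'| = |cos|.
C = [1.0, 1.0]
l1_constant = sum(C)   # the "low-tech" Lipschitz constant from the telescoping sum

random.seed(0)
worst_ratio = 0.0
for _ in range(10_000):
    x = random.uniform(0.0, 2 * math.pi)
    y = random.uniform(0.0, 2 * math.pi)
    if x != y:
        ratio = euclidean_distance(f(x), f(y)) / abs(x - y)
        worst_ratio = max(worst_ratio, ratio)

print(worst_ratio, l1_constant)
```

The observed ratios stay below 1, which is the sharper constant, and so certainly below the cruder bound 2.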
Presentation by Ms. Hood
Ms. Hood discussed the solution of
several problems about differentiation. Her work was prepared with the
help of Mr. Baldwin. Here is a very nice exposition which she
prepared.
Magic, magic, magic ...
The result presented can be given a very general form, and is used in
many circumstances. The generalization is known by various names. It
can be called the Contraction Mapping Theorem or the Banach
Fixed Point Theorem. Stefan
Banach was one of the early originators and expositors of
functional analysis. So what is the result?
Contraction mapping theorem
Suppose X is a complete metric space, and f:X→X is a
contraction. That is, there is a constant K with 0≤K<1 so
that d(f(x),f(y))≤Kd(x,y) for all points x and y in X. Then f has a
unique fixed point, p: f(p)=p, and, if x is any point in X, the
sequence {x_{j}} defined by x_{1}=x and
x_{j+1}=f(x_{j}) for j≥1 always converges with
limit equal to p.
Rather than a proof here, I refer you to the proof in the Wikipedia article. The proof is extravagantly simple, and, in fact, the whole setup is so simple (X and f and K) that the theorem has enormous uses in virtually all areas of mathematics.
If we "relax" the hypotheses to d(f(x),f(y))<d(x,y) then there need not be a fixed point as Ms. Hood shows. However, if X is compact, then this slightly weaker hypothesis does imply the result. You could prove this! (Look at the continuous real-valued function x→d(x,f(x)) on the compact metric space. What is its [achieved!] minimum? Why is the minimizing point unique?)
Completeness is needed. For example, x→(1/2)x is a contraction on (0,1) but it has no fixed point. Frequently compact metric spaces are used in applications, and this is o.k. since compact metric spaces are always complete.
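The iteration in the theorem is easy to try on a machine. Here is a minimal sketch, assuming X=[0,1] and f=cos (my choice of example): cos maps [0,1] into itself, and the MVT gives |cos x - cos y|≤(sin 1)|x-y| there, so K=sin 1<1. The helper name iterate_to_fixed_point, the tolerance and the step cap are all hypothetical.

```python
import math


def iterate_to_fixed_point(f, x, tol=1e-12, max_steps=10_000):
    """Iterate x_{j+1} = f(x_j) until successive terms differ by less than tol."""
    for step in range(1, max_steps + 1):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence within max_steps")


# Two different starting points in [0, 1]; the theorem says both sequences
# converge to the same (unique) fixed point p with cos(p) = p.
p, steps = iterate_to_fixed_point(math.cos, 0.0)
q, _ = iterate_to_fixed_point(math.cos, 1.0)
print(p, q, steps)
```

Both starting points land on the same number, roughly 0.739, as the uniqueness part of the theorem predicts.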
Moving on
We finally began to move on to integration. I sketched where we were
going. I indicated (I'll be more formal later) the ideas of Riemann
which are now familiar to us from calculus and which can be restated
with only very slight modification in a totally rigorous way.
Integration
Suppose f is a bounded real-valued function on [a,b]. Then we defined
partition and upper and lower sums of f with respect to the
partition. And the sup of the lower sums over all choices of partition (which
must exist because of the word "bounded"!) is called the lower
integral, while the inf of the upper sums is called the upper
integral.
Now there's a trick which shows that any upper sum is greater than or equal to any lower sum (I'll go through this later, but it basically involves taking a "common refinement" of the two (possibly distinct) partitions involved and then observing that "adding" one point to a partition (weakly) increases lower sums and (weakly) decreases upper sums). Therefore the lower integral is always less than or equal to the upper integral. When these integrals are equal, we declare that f on [a,b] is Riemann integrable and the common value is called ∫_{a}^{b}f.
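The effect of adding one point to a partition can be seen in exact arithmetic. This is a small sketch under my own assumptions: f(x)=x^{2} on [0,1] (increasing, so the inf and sup on each subinterval are at the endpoints), the partition {0,1/2,1}, and the added point 1/4.

```python
from fractions import Fraction


def darboux_sums(f, partition):
    """Lower and upper sums of an increasing f over a sorted partition."""
    pieces = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pieces)
    upper = sum(f(right) * (right - left) for left, right in pieces)
    return lower, upper


def square(x):
    return x * x


P = [Fraction(0), Fraction(1, 2), Fraction(1)]
P_refined = sorted(P + [Fraction(1, 4)])   # add one point to the partition

L1, U1 = darboux_sums(square, P)           # 1/8 and 5/8
L2, U2 = darboux_sums(square, P_refined)   # 9/64 and 37/64
print(L1, L2, U2, U1)
```

So 1/8≤9/64≤1/3≤37/64≤5/8: the lower sum went up, the upper sum went down, and the integral 1/3 sits between them.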
Which functions are Riemann integrable?
Well, let's consider some examples.
When is a bounded function Riemann integrable?
Suppose f:[a,b]→R is bounded. Then f is Riemann integrable if
and only if the collection of discontinuities of f has measure 0.
The function f(x)=1/n if x is rational and m/n in "lowest terms" and 0 otherwise is Riemann integrable. Its set of discontinuities is the rationals. Given ε>0, you can, with some ingenuity, construct partitions with this function's lower sum=0 and its upper sum smaller than ε. The characteristic function of the rationals (1 on the rationals and 0 otherwise) is not Riemann integrable. Its set of discontinuities is all x and this is not a set of measure 0.
A set of real numbers has measure 0 if and only if, given any ε>0, the set is inside a countable union of intervals whose total length is less than ε. Professor Anton Schep of the University of South Carolina has a very nice complete proof of Lebesgue's result, with sufficient background to be understandable. It is not very long (with introductory material, only 3 pages). I was wrong in class when I stated that this result is not in the textbook. A version appears in Chapter 11, far, far away.
Of course Lebesgue published his theorem as part of his complete overhaul of the theory of integration. My online dictionary gives for overhaul
a. take to pieces in order to examine. b. examine the condition of (and repair if necessary).
Here is an extremely simple example of what does go wrong. Suppose f is the characteristic function of the rationals. It is 1 for rational numbers and 0 for irrationals. Then ∫_{0}^{1}f doesn't exist. But define the function f_{n} by f_{n}(x)=1 if x is the n^{th} rational (assume you have some enumeration in mind here, please) and f_{n}(x)=0 otherwise. Then since f_{n} has only one little discontinuity, it is Riemann integrable (with integral equal to 0). But f=∑_{n=1}^{∞}f_{n} clearly. So the Riemann integral doesn't work well with infinite sums. It needed fixing. In fact, there are many ways to fix it. There's some discussion of this in David Bressoud's books, mentioned here.
It should be true that the function which is piecewise defined to be 0 on (-∞,0) and x on (0,∞) should have derivative 0 on the left and 1 on the right. What happens at 0 shouldn't be very important. Now people declare that there is no classical solution whose derivative is 0 on the left and 1 on the right. There are notions of generalized solutions and distributional derivatives. This was all systematized in the 30's and 40's by Sergei Sobolev and Laurent Schwartz, more or less working independently. One quote in Schwartz's biography is interesting: To discover something in mathematics is to overcome an inhibition and a tradition. You cannot move forward if you are not subversive.
Let me try to explain the idea of a distributional derivative in the simplest case. Suppose f and g are functions on R^{1} and f is differentiable with derivative g, and g is a continuous function. Then we write f´=g, of course. We know from our discussion last time that there are many smooth (even C^{∞}) functions φ(x) whose support is compact. The support of a function is the closure of the set where the function is not 0. So, for example, the function which is 1 on (0,1) and 0 elsewhere has support equal to [0,1]. But we are interested in functions φ which are smooth and have compact support. Now look:
Step 1 Since f´(x)=g(x), of course
φ(x)f´(x)=φ(x)g(x).
Step 2 We integrate: ∫_{-∞}^{∞}
φ(x)f´(x)dx=∫_{-∞}^{∞}φ(x)g(x)dx.
Now it may seem that these are improper integrals, because of
the appearance of all the ∞'s. But remember that φ will be 0
outside of [-A,A] if A is a large positive number since φ has
compact support and compact sets are closed and bounded.
Step 3 Consider the left-hand side of the equation:
∫_{-∞}^{∞}φ(x)f´(x)dx. It is actually
∫_{-A}^{A}φ(x)f´(x)dx for some large A where φ(-A)=0 and
φ(A)=0. Now (clever, clever, clever!) integrate by parts:
∫_{-A}^{A}u dv=uv]_{-A}^{A}-∫_{-A}^{A}v du
This throws the derivative on the "other factor" with the penalty
being the boundary term (with the ]) and a minus sign. Here we will
take u=φ and dv=f´(x)dx, so that v=f and du=φ´(x)dx. Notice that the boundary term, when both A
and -A are "plugged in", must be 0 because of the assumption about the
support of φ. Therefore ∫_{-∞}^{∞}φ(x)f´(x)dx is exactly the same as
–∫_{-∞}^{∞}φ´(x)f(x)dx.
Step 4 We have shown that if f´=g as classical functions,
with the classical derivative, then
–∫_{-∞}^{∞}φ´(x)f(x)dx=∫_{-∞}^{∞}φ(x)g(x)dx.
So how to define the derivative if a function is not
differentiable?
Now as a result of the sequence of steps above, we have a consequence
of f´=g which doesn't have a derivative of f! More importantly,
it turns out that the consequence is reversible. That is, if f has a
continuous derivative, g is a continuous function, and if the equation is true for all smooth
φ's with compact support, then f´=g. So now let's just
suppose that f and g are continuous. We will say that if the equation is true for all smooth
φ's with compact support, then g is the weak or distributional
derivative of f. This turns out to serve wonderfully. There is a
tremendous amount of analytical (and algebraic!) intricacy in this
definition, so it needs to be considered in detail later.
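As a worked check, take the function from the discussion above: f(x)=0 on (-∞,0) and f(x)=x on (0,∞), with candidate derivative g(x)=0 on the left and g(x)=1 on the right. Be warned that this g is not continuous, so the computation goes slightly beyond the definition as just stated; I am assuming the same equation is still the right test. Pick A so large that φ is 0 outside [-A,A]. Then, integrating by parts on [0,A]:

```latex
\begin{align*}
-\int_{-\infty}^{\infty} \varphi'(x)\, f(x)\, dx
  &= -\int_{0}^{A} x\, \varphi'(x)\, dx \\
  &= -\Bigl[\, x\, \varphi(x) \Bigr]_{0}^{A} + \int_{0}^{A} \varphi(x)\, dx \\
  &= \int_{0}^{A} \varphi(x)\, dx
   = \int_{-\infty}^{\infty} \varphi(x)\, g(x)\, dx .
\end{align*}
```

The boundary term vanishes because the factor x is 0 at the lower limit and φ(A)=0 at the upper limit. So the equation holds for every smooth φ with compact support, and the value of g at the single point 0 never enters.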
Back to the text ...
Consequences
Differentiability implies continuity. Algebraic combinations of
differentiable functions are differentiable. x^{n} is
differentiable (when n is a positive integer).
The Chain Rule
Proved very slickly by stuffing one linear approximation inside
another and realizing that the consequence was, indeed, the Chain
Rule.
The beginnings of a gallery
Everyone, young and old, learned and naive, should have a gallery
of functions to be considered when theorems and definitions are
considered. The instructor spent a chunk of time identifying good
initial candidates for such a gallery, from "ugly" to "nice". Of
course, the beauty of the qualities mentioned depend quite a bit on
the observer!
Writing in progress!
Increasing functions
I discussed increasing functions and how often they can be
discontinuous. This is in the textbook. I wrote and gave a warm
advertisement for a book: Functional Analysis by F. Riesz
and Nagy (also known and pronounced as "rees-naje"). Reprinted by Dover
and on sale at Amazon for 14.93. A wonderful book!
Mr. Skalit
Discussed several problems in chapter 4 he had prepared with the help
of Mr. Kowalick. In particular, we saw
that metric spaces were normal.
Differentiability
The instructor defined and began a discussion of
differentiability and announced his preference for a definition
written in this form:
f:[a,b]→R is differentiable at c∈[a,b] with derivative Q if
there is a function E defined in a neighborhood of 0 with
lim_{v→0}E(v)=0 so that
f(x)=f(c)+(Q)(x-c)+E(x-c)(x-c). This definition doesn't have any
divisions.
Some simple consequences were mentioned but the Chain Rule will be verified next time.
An exam was announced, to be given in two weeks. Ms. Hood volunteered (??!) to present a problem about differentiation, with the help of Mr. Baldwin.
Finishing the proof
We used compactness to verify uniform continuity. I continued with the
attempted (or rather, interrupted) proof. So X is
compact, and we have sequences {p_{n}} and {q_{n}}
with d(p_{n},q_{n})<1/n and
d(f(p_{n}),f(q_{n}))≥ε. Now {p_{n}}
is a sequence in a compact metric space. The sequence itself may not
converge, but compactness always guarantees the existence of a
convergent subsequence, {p_{n_{j}}}. We know that
there is an x so that
lim_{j→∞}p_{n_{j}}=x. Notice that
the sequence {q_{n}} is sort of "dragged along" by the
corresponding sequence of p_{n}'s. That is, I claim the
subsequence {q_{n_{j}}} also converges, and its limit
is x. Why is that? Well, by the triangle inequality,
d(q_{n_{j}},x)≤d(q_{n_{j}},p_{n_{j}})+d(p_{n_{j}},x).
The first term on the right is <1/n_{j}. For j large, this
will be small and stay small. And the second term behaves
similarly, because the p-subsequence converges to x. Now f is
continuous at x. Therefore given ε>0, we can find a
δ>0 so that if d(x,y)<δ then
d(f(x),f(y))<ε. But choose J so that for j≥J, both
d(q_{n_{j}},p_{n_{j}}) and
d(p_{n_{j}},x) are less than δ/2. Then
d(q_{n_{j}},x)<δ so
d(f(q_{n_{j}}),f(x))<ε and
d(f(p_{n_{j}}),f(x))<ε. Again the triangle
inequality applies, so
d(f(p_{n_{j}}),f(q_{n_{j}}))<2ε
for j≥J. This is a contradiction to how we created the two
sequences initially (that is, it will be a contradiction if you allow
me to relabel 2ε as ε!). So we are done.
A consequence of uniform continuity?
I proved a rather remarkable result, and this will be especially
wonderful in many applications later. I mentioned that a "random" (?)
continuous real-valued function on R can be rather bizarre. One
explicit well-known example is xsin(1/x) on [0,1]. For another example
with even worse behavior, I investigated the Takagi function
graphically in the
9^{th} meeting of the Byrne Seminar on Experimental
Math. A picture drawn there is shown to the right. The Takagi
function (named for Teiji
Takagi (1875--1960)) happens to be an example of a continuous
nowhere differentiable function on [0,1], and it is also a function
which is neither increasing nor decreasing on any subinterval of [0,1] of positive length. So
continuous functions can be weird and wonderful. In view of
that, the result here is almost amazing. If you are willing to
tolerate only a little bit of error (as little as you would like) you
can approximate any continuous function by a piecewise constant step
function with finitely many values. David Tall, an English mathematics
educator, gives a detailed discussion of the properties of this function
(aimed at teachers) here.
To the right is a possible picture. The graph of a continuous function defined on a closed bounded interval is shown. The light blue band around the graph is ±ε deviation from the graph. The red horizontal lines are the graph of an approximating piecewise constant function with finitely many values. It "steps" because the values it assumes are taken on finitely many intervals of positive length. The result is remarkable to me, because no matter how bad (?) the function f can be, we can replace it by something with finite data (!) if you allow me to commit a very small error. In many applications, the step function is too coarse, and we may want piecewise linear or differentiable or ... whatever. But this result is the initial version. So:
Theorem Suppose f is a continuous real-valued function on
[a,b], and ε>0. Then there is a function g:[a,b]→R so
that the range of g is finite, g^{-1}(y) is either empty
or a finite set of intervals for all y∈R, and
|g(x)-f(x)|<ε for all x∈[a,b].
Proof [a,b] is compact and f is continuous, so f is
uniformly continuous on [a,b]. Therefore we can find
δ>0 so if |x_{1}-x_{2}|<δ, then
|f(x_{1})-f(x_{2})|<ε. Now find a positive
integer n so |b-a|/n<δ. For all integers j between 0 and n-1,
we know that |f(a+j(b-a)/n)-f(x)|<ε for x between
a+j(b-a)/n and a+(j+1)(b-a)/n. Define g by this "rule": if x is
between a+j(b-a)/n and a+(j+1)(b-a)/n, g(x)=f(a+j(b-a)/n). Then we're
done!
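The construction in the proof can be carried out directly. Here is a minimal sketch with choices that are mine, not the lecture's: f=sin on [0,3] and ε=0.01. Since |sin´|≤1, the MVT says δ=ε works in the uniform continuity statement, which saves us from hunting for δ. The helper name step_approximation is hypothetical.

```python
import math


def step_approximation(f, a, b, n):
    """Return g with g(x) = f(a + j(b-a)/n) when x is in the j-th subinterval."""
    width = (b - a) / n

    def g(x):
        j = min(int((x - a) / width), n - 1)   # the right endpoint b joins the last piece
        return f(a + j * width)

    return g


a, b, eps = 0.0, 3.0, 0.01
delta = eps                               # assumption: valid because |sin'| <= 1
n = math.ceil((b - a) / delta) + 1        # one extra piece, so (b - a)/n < delta
g = step_approximation(math.sin, a, b, n)

samples = [a + (b - a) * k / 10_000 for k in range(10_001)]
max_error = max(abs(g(x) - math.sin(x)) for x in samples)
print(n, max_error)
```

The step function g takes at most n values, and on the sample grid it never strays from f by as much as ε. Checking a grid is of course only evidence; the theorem is what covers every x.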
Advertisement This is magic, and you should realize it. We have
developed a considerable amount of technique, and now many results,
including what's in this lecture, are "easy" to verify. Once we have
the step function, we can (approximately) integrate f, find its
(approximate) mean and standard deviation, etc.
d) implies our first inverse function theorem
There are a collection of results which are referred to as inverse
function theorems, including, especially, a result about
differentiable mappings between open subsets of R^{k}. This
wonderful result will be proved in 412. One of its standard proofs
uses the following result, which is also very frequently invoked in
algebraic topology. So:
Theorem Suppose f:X→Y is a bijection (1-1 and onto -- basically,
X and Y are the same as sets!), f is continuous, and X is
compact. Then f^{–1} is continuous.
Proof We will use d). To show that f^{–1} is
continuous, we will investigate if (f^{–1})^{–1}=f
has the following property: if C is closed in X, then f(C) is closed
in Y. But X is compact, so C, a closed subset, is compact. f(C), a
continuous image of a compact subset, is compact in Y. But compact
subsets of metric spaces are closed, so f(C) is closed in Y. And we
are done!
Advertisement This is magic. There is hardly any effort
to verify a result which is used constantly. Be aware, if you know
what a topological space is, that the result is true whenever X is a
compact space and Y is a Hausdorff space.
[0,1)
As I mentioned, NJ State Law QD 324-17 (a math regulation)
requires that this example be presented immediately after the
statement and proof of the previous theorem.
Consider [0,1) with its usual topology. This is 1 point away (?) from being compact. The mapping f:[0,1)→R^{2} given by f(t)=(cos(2πt),sin(2πt)) is continuous. Considering the picture is more fun. The range of f is x^{2}+y^{2}=1, a closed and bounded subset of R^{2}, so the range is compact. But f^{–1} is not continuous. Why? The picture shown to the right gives some idea. Take the magenta colored dots in the range, which approach the image of 0 from the wrong way. These dots have a limit, the image of 0. But pulled back to [0,1), we get the green dots which have no limit in the domain of f. Thus f^{–1} takes a convergent sequence to a sequence which does not converge, and a continuous function can't do that.
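If you have no picture handy, numbers can stand in for the dots. This sketch assumes the particular sequence t_{n}=1-1/n in [0,1) (my choice for the "green dots"), and measures how far the images f(t_{n}) are from f(0).

```python
import math


def f(t):
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))


image_of_zero = f(0.0)                        # the point (1, 0) on the circle

# Assumed sequence in the domain [0, 1): it creeps up toward 1, never toward 0.
ts = [1 - 1 / n for n in range(2, 2000)]

# Distances in the range between f(t_n) and f(0).
gaps_in_range = [math.dist(f(t), image_of_zero) for t in ts]

print(gaps_in_range[0], gaps_in_range[-1], min(ts))
```

The gaps in the range shrink toward 0, so f(t_{n})→f(0), yet every t_{n} stays at distance at least 1/2 from 0. Applying f^{–1} to the convergent sequence f(t_{n}) gives back the t_{n}, which do not converge in [0,1).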
A major result used in calculus
If f:[a,b]→R is continuous, then f([a,b])=[c,d]. This result
includes the Intermediate and Extreme Value Theorems. The proof is
(now!) easy. A proof from just the definition of continuity is likely
to be tedious. So why is this result true? [a,b] is compact and
connected, so f([a,b]) is compact and connected. Connected subsets of
R are intervals. The only compact intervals are those which are closed
and bounded. So we're done.
Connectedness
As mentioned, the lecturer thinks that connectedness as defined is a
difficult concept. The definition has a "not" very prominently
mentioned, so it may be irritating to verify that something is
connected (or not connected). There is a negative (?) aspect about the
logic. So here is a variant.
Arcwise connected
A metric space X is arcwise connected or pathwise
connected (both phrases are used) if for all pairs of points p and
q in X, there is a continuous function c:[0,1]→X so that c(0)=p
and c(1)=q.
Theorem If X is a pathwise connected metric space, then X is
connected.
Proof If X is not connected, then X=A∪B with A and B open,
non-empty, and disjoint. Take p∈A and q∈B. Then pathwise
connected provides a continuous c as in the definition. Since
0∈c^{–1}(A) and 1∈c^{–1}(B) and c is
continuous, then (using c)!) these are open subsets of [0,1],
disjoint, non-empty. But [0,1] is connected, which is a
contradiction.
Star-shaped subsets of R^{n} are arcwise connected, because a star-shaped set has a center, v (or, at least one center!), so that given any p in the set, the line segment from p to v is in the set. So we can connect p to q by detouring (?) through v. Convex sets are star-shaped. Open balls are therefore star-shaped.
For open subsets of R^{n}...
Much, although not all, of analysis in 411-412 will be in open subsets
of R^{n}. For such subsets, the concepts of arcwise connected
and connected coincide.
Theorem Suppose U is an open subset of R^{n}. Then U is
connected if and only if U is arcwise connected.
Proof I did not offer a proof in class, but look: suppose U is
connected. I want to show that U is arcwise connected. Fix
p∈U. Take A to be the set of all points q which can be
"connected" to p with a continuous image of [0,1], as in the
definition of arcwise connected. Then A is a connected set, surely. I
claim that A is open. Well, if q∈A, then since U is open, there
is r>0 with N_{r}(q)⊂U. I can connect from p to any
point in N_{r}(q) with a detour through q. So A is open. Also,
A is closed, since a limit point of q's, call it v, has a ball
N_{s}(v)⊂U. At least one q is in N_{s}(v), so go
from p to that q and then, inside the convex ball, to v. Since A is
open and closed and non-empty, A must be all of the connected set
U. (Or else what is left out is open, also, and makes U disconnected!)
Not the other way!
For unfriendly sets, arcwise connected and connected may be
different. For example, the topologist's
sine curve is connected but not arcwise connected.
There is a purported proof in the linked page, without much details
and also without a picture! Here is
a Wikipedia reference, with a picture, but not even a candidate for a
proof.
I sketched a rather clumsy proof. My effort would involve repeated use of the Intermediate Value Theorem.
All except the last were previously shown as equivalent. The last is an easy consequence of set and function manipulation results, using the fact that a set is closed exactly when its complement is open. The great freedom we now have is to investigate and diagnose results about continuous functions (those which satisfy any/all of the previous properties) and use whichever characterization we like. Some are more convenient to use than others in certain situations.
Composition of continuous functions
We're given f:X→Y and g:Y→Z both continuous. Use c). Take U
open in Z, then g^{–1}(U) is open in Y, and
f^{–1}(g^{–1}(U))
is open. Now notice that (g∘f)^{–1}(U)
is the same as f^{–1}(g^{–1}(U)).
Distance as a continuous function
We saw that
d(x,y_{1})–d(x,y_{2})≤d(y_{1},y_{2})
because (Δ≤)
d(x,y_{1})≤d(y_{1},y_{2})+d(x,y_{2}).
Then, switching y_{1} and y_{2}, we know
|d(x,y_{1})–d(x,y_{2})|≤d(y_{1},y_{2}).
Therefore if f(y)=d(x,y), f satisfies a Lipschitz estimate, and so it
must be continuous. See what follows, please.
Lipschitz and locally Lipschitz
A function f:X→Y is Lipschitz
if there is a constant, A≥0 so that
d(f(x_{1}),f(x_{2}))≤A d(x_{1},x_{2})
for all x_{1}, x_{2} in X. Such a function must be
continuous. Use, for example, a). Given ε>0, take
δ=ε/A (hey, talk to me if A=0 and you can't figure out
what to do).
We'll say f:X→Y is locally Lipschitz if it is Lipschitz in
a neighborhood of every point. That is, given x∈X, there is
r>0 and A≥0 (A may depend on r and x) so that for all
x_{1}, x_{2} in N_{r}(x),
d(f(x_{1}),f(x_{2}))≤A d(x_{1},x_{2}).
Such functions are also continuous (same proof!).
Lipschitz conditions are fundamentally important in the standard statement of the existence and uniqueness theorem of ordinary differential equations.
MVT implies Lipschitz
This is jumping ahead a bit, but if f:R→R is differentiable, then
the Mean Value Theorem implies that
|f(b)–f(a)|≤|f´(c)| |b–a| for at least one c in the
interval between a and b. Therefore if we "happen" to know that
f´ is bounded on that interval, then f will be Lipschitz there,
and we know a Lipschitz constant (the sup of the absolute values of
the derivative). Of course, a differentiable function will
automatically be continuous, but this maybe gives some feeling for
what Lipschitz means: the mapping "stretches" distances by no more
than a factor of A, the Lipschitz constant.
Hölder and locally Hölder
Here is a similar idea. Suppose A≥0 and α>0. Then
f:X→Y satisfies a (uniform) Hölder
condition if d(f(x_{1}),f(x_{2}))≤A (d(x_{1},x_{2}))^{α} for all x_{1}, x_{2} in X. For example, the function
f:R→R defined by the formula f(x)=sqrt(|x|) satisfies such a
condition with α=1/2. It does not satisfy a Lipschitz
condition on all of R.
An analogous definition can be made for locally Hölder.
Functions satisfying Hölder conditions are also continuous. For example, take δ=(ε/A)^{1/α}. The "average" continuous function on [0,1] is not differentiable, but I think it does satisfy, in most places, a Hölder condition of order 1/2. So this is important in many fields (e.g., probability, math finance, physics [Brownian motion], etc.).
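The sqrt(|x|) example can be probed numerically. This is a sketch under my own assumptions: A=1 as the Hölder constant for α=1/2, random sample pairs from [-10,10], and difference quotients at 0 with h=10^{-1},...,10^{-7} to watch the Lipschitz condition fail.

```python
import math
import random


def f(x):
    return math.sqrt(abs(x))


random.seed(1)
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(10_000)]

# Hölder quotient with alpha = 1/2: |f(x) - f(y)| / |x - y|^(1/2).
holder_ratio = max(
    abs(f(x) - f(y)) / math.sqrt(abs(x - y)) for x, y in pairs if x != y
)

# Lipschitz quotient near 0: |f(h) - f(0)| / h for shrinking h.
lipschitz_ratios = [abs(f(h) - f(0.0)) / h for h in (10.0 ** -k for k in range(1, 8))]

print(holder_ratio)
print(lipschitz_ratios)
```

The Hölder quotients stay below 1, while the Lipschitz quotients at 0 behave like h^{-1/2} and grow without bound, so no single Lipschitz constant can work.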
Algebraic combinations
Sums, products, and quotients of complex-valued continuous functions are
continuous. Probably the simplest way to verify this is by appealing
to similar statements about convergent sequences and using the
criterion in b) above.
Vector-valued functions and their components
Suppose
F=(f_{1},f_{2},...,f_{k}):X→R^{k}.
Then each component, f_{j}, is a function from X to R. And F
is continuous if and only if each of the f_{j}'s is
continuous. The easiest way to see this is probably to contrast the
Euclidean metric, L^{2}, in R^{k} with the
L^{1} and L^{∞} metrics, and use the constants
we have previously obtained in problem #3 of Homework #2 to show the
needed implications. This is also in the text, of course.
Perturbing the definition of continuous
Suppose f:X→Y. A version of the official definition of
continuity follows:
∀p∈X ∀ε>0 ∃δ>0 if x∈X with d(x,p)<δ then d(f(x),f(p))<ε.
A wonderful thing you could do to yourself is change the order or the type (∀ to ∃ or ∃ to ∀) of the quantifiers and see what that does to the definition. For example:
∀p∈X ∃δ>0 ∀ε>0 if x∈X with d(x,p)<δ then d(f(x),f(p))<ε.
I think such functions still are continuous, but additionally there aren't "very many" of them -- they are functions which are locally constant. That is, any such function, at any point, has a neighborhood where it is constant. You might want to prove that such a function on a connected metric space must be actually constant.
What about this?
Here's another change to the definition.
∀ε>0∃δ>0 if x,p∈X with d(x,p)<δ then d(f(x),f(p))<ε.These functions are also continuous. But does every continuous function satisfy this statement? The statement is more mysterious, since, given ε>0, we're making a selection of δ>0 which will "work" for any pair of points in X.
An example
An instructive example is f:R→R defined by f(x)=x^{2}.
We can show that this function does not satisfy the previous
logical statement. We decided that we needed to produce an
ε>0 so that for any δ>0 there are points p, q in X
with d(p,q)<δ but d(f(p),f(q))≥ε. Then we quickly
declared that this would be proved if we choose ε=1, say, and
created, for each positive integer n, points p_{n} and
q_{n} so that |p_{n}–q_{n}|<1/n but
|(p_{n})^{2}–(q_{n})^{2}|≥1.
The marvelous intervention of Ms. Slusky allowed us to choose p_{n}=n and q_{n}=n+(1/[2n]). We computed |(p_{n})^{2}–(q_{n})^{2}| and saw that it was greater than 1. The instructor then drew a picture intending to show that the graph of this f tilted more and more as |x|→∞ so that getting control over |f(x)–f(y)| even if the size of |x–y| was restricted became more difficult. For more pictures, please see the link below.
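The computation with p_{n}=n and q_{n}=n+(1/[2n]) can be checked in exact rational arithmetic. The sketch below (the function name witness is mine) returns both |p_{n}–q_{n}| and |(p_{n})^{2}–(q_{n})^{2}|. Algebraically the second quantity is 1+1/(4n^{2}).

```python
from fractions import Fraction


def witness(n):
    """Return (|p_n - q_n|, |p_n^2 - q_n^2|) for p_n = n, q_n = n + 1/(2n)."""
    p = Fraction(n)
    q = n + Fraction(1, 2 * n)
    return abs(p - q), abs(p * p - q * q)


for n in (1, 10, 1000):
    gap, square_gap = witness(n)
    # The points get close together, but their squares stay at least 1 apart.
    print(n, gap < Fraction(1, n), square_gap >= 1)
```

So no single δ can serve ε=1 for every pair of points, which is exactly the failure of the "uniform" statement for x^{2} on R.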
A definition
f:X→Y is uniformly continuous if
∀ε>0∃δ>0 if x,p∈X with d(x,p)<δ then d(f(x),f(p))<ε.
Such functions are extremely important in applications, and even in such basic situations as integration. Please see a digression (or discussion) in Math 503 about uniform continuity in the basic complex variables graduate course.
Some other examples, perhaps elaborate
Example 1 x^{2} is not uniformly continuous on
R.
Comment about learning stuff Especially in math, when the
definitions get more and more complicated, a very good idea is to find
things which do not satisfy the definition yet are very closely
related to things which do satisfy the definition. The
differences may be useful in understanding the definition and its
consequences.
Example 2 MVT again: consider the function f:R→R defined by
f(x)=(3/[4+x^{2}])+17x+cos(4x). I claim that f is uniformly
continuous. I'll show this by checking that f satisfies a uniform
Lipschitz condition with some constant A. Then, given ε, the same δ=ε/A can be used for
every x. Well,
f´(x)=–(6x)/[4+x^{2}]^{2}+17–4sin(4x). I claim
that each of the "pieces" of this derivative is bounded in all of
R. That should be clear for 17 and –4sin(4x), so let's look at
(absolute value!) 6x/[4+x^{2}]^{2} for x>0. Uhh
... this is less than 1. Why?
6x≤[4+x^{2}]^{2}: true for x between 0 and 1,
certainly, since the largest that 6x gets there is 6, and the smallest
that the right-hand side gets is 16. And consider the derivatives:
6≤2[4+x^{2}](2x). The right-hand side is always at least
2(5)(2)=20 on [1,∞) so it is always larger than 6. Therefore the
original functions have the same inequality on [1,∞). So
|f´(x)|≤1+17+4=22 everywhere, and the Mean Value Theorem gives
|f(x)–f(y)|≤22|x–y|. This is
too much work! It should be easier.
Example 3 5sqrt(x)+17x on [0,∞). Well, the derivative is 5/(2sqrt(x))+17. Mr. Leven correctly objected here because the derivative is not bounded on (0,∞) (and does not exist at 0). O.k.: the derivative is bounded on [1,∞) so the function is uniformly continuous there. What about on [0,1]? Aha! Look ahead.
A major theorem
If f:X→Y is continuous and X is a compact metric space,
then f is uniformly continuous.
Attempted proof
There's a nice proof in the textbook, using essentially problem 5a)
from the first exam. Let me try to give a different proof. What if the
theorem were false? Then we would have some ε>0 and two
sequences of points {p_{n}} and {q_{n}} (remember the
verification of the x^{2} example above) so that
d(p_{n},q_{n})<1/n but
d(f(p_{n}),f(q_{n}))>ε. This alone is not
enough to guarantee a contradiction, because we could have such
sequences in R and not get into trouble. (In class I gave sequences in
R^{2}, p_{n}=(0,n) and q_{n}=(0,n+{1/n}) but
we already had the examples in the x^{2} verification.) What
will help us here?
Hint Compactness! This guarantees not that a sequence itself converges, but that some subsequence does. This will be enough to get a contradiction. to be continued ...
I proved this in detail, perhaps too much detail. The process used the convergence definition applied to the infinite series ∑_{j=1}^{∞}a_{j} and the infinite series ∑_{k=1}^{∞}b_{k}, and the Cauchy criterion applied to ∑_{j=1}^{∞}|a_{j}| and to ∑_{k=1}^{∞}|b_{k}|. The proof used the "technique" of problem 7 in the first exam, and was guided by a version of the rather bizarre picture to the right.
♥Love for absolute convergence♥
The rearrangement result and the product summation result declare that
commutativity and associativity are correct when applied in
(essentially) any way to absolutely convergent series and to their
algebraic manipulations.
Other results about products
I stated Mertens' Theorem and another result. Please see the text.
Limits and sequences
I stated the ε–δ definition of limit, and proved that it
was equivalent to the sequential statement. This is as in the text.
Mr. Baldwin kept me honest.
Continuity
I defined continuity of a function at a point, if the function mapped
one metric space to another. Then I defined continuity of a
function. I showed that this definition ("Inverse images of open sets
are open") was equivalent to requiring an ε–δ statement
at each domain/range point pair. This is as in the text. All of this is an
abstract version of the result on real-valued functions defined on the
real line which was stated several weeks ago.
Pictures
Ms. Pritsker suggested that I try to
show some pictures about what was happening, and I thank her for
this.
In this first example, the function shown (from R to R) has what's called a jump discontinuity. The inverse image of the green open interval is traced (backwards) and the result seems to be a half-open interval. So this example fails the "inverse image open" criterion for continuity.
The second example has a more complicated discontinuity. I tried to sketch, in class and here, the function f whose value, f(x), is 0 if x≤0 and is sin(1/x) if x>0. Then the inverse image of a small open interval centered around 0 includes (–∞,0] and also includes a countable collection of open intervals, getting smaller and smaller as they "pile up" at 0. Note, though, that although 0 is in the inverse image, there is no open neighborhood of 0 (consider the right "half") which is a subset of the inverse image.
The great theorems
Chapter 4 concentrates on precise statements and proofs of some of the
most basic theorems in analysis about continuity. These results seem
to have their historic source (sort of) in the work of Bolzano.
The calculus version of the theorems is the following almost
ludicrously simple statement:
Suppose f:R→R is a continuous function, and a≤b. Then the set of values f([a,b]) is [c,d] with c≤d.
This result of course includes the Intermediate Value Theorem and the Extreme Value Theorem. By the way, the converse of this result is not, I believe, correct (that is, continuous functions are not the only functions from R to R which obey the conclusion of this theorem).
Notice that closed bounded intervals in R can be characterized as proper non-empty subsets of R which are both connected and compact. We will prove the result above by showing that continuous images of compact sets are compact, and continuous images of connected sets are connected. Sigh. Both compact and connected are defined in terms of open coverings. So it turns out that the "inverse image open" version of the definition of continuity is exactly suited for quite simple proofs of the results needed. But, please, the apparent simplicity is only superficial. The whole theory is aimed at these results, and 150 years of work have gone into constructing a Definition/Theorem/Proof/Example succession which seems simple and nearly effortless. Historically the results and the ordering involved a great deal of work.
Homework
Please read chapter 4. The top vote-getters in the poll for "Problems
students want to do" are Chapter 3: 6, 10 and Chapter 4: 4, 6,
7, 11. I will add one more problem in a formal assignment on Monday.
An example
Here is one (delightful?) example of the sort of computation which made people very
uneasy historically. You may remember from the first year of calculus
study (geometric series, Taylor series, remainders, etc.) a fact about
the Alternating Harmonic Series.
ln(2)=1–(1/2)+(1/3)–(1/4)+(1/5)–(1/6)+(1/7)–(1/8)+...
That the series converges follows from a version of the Alternating Series Test which we will verify in a few minutes. The specific value of the sum relies on things like Taylor's Theorem.
Now replace each positive term by twice itself minus itself. I hope you can convince yourself this does not change convergence or the sum of the series. Now this is true:
ln(2)=(2–1)–(1/2)+(2/3)–(1/3)–(1/4)+(2/5)–(1/5)–(1/6)+(2/7)–(1/7)–(1/8)+...
and now divide this series (and its sum) by 2. I mean (emphasis!) divide everything by 2. Here is the result:
ln(2)/2=1–(1/2)–(1/4)+(1/3)–(1/6)–(1/8)+(1/5)–(1/10)–(1/12)+(1/7)–(1/14)–(1/16)+...
It is not difficult to check carefully that what is apparently correct actually is correct. The series on the right-hand side is a rearrangement of the original Alternating Harmonic Series. Wow! This rearrangement converges to half of the original value. We don't really need to know what that value is, but it is (relatively) easy to see that the original value was actually positive. So rearrangements of this series do change the sum!
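A numerical check of this rearrangement may be reassuring. The sketch below sums the rearranged series in blocks of three terms, +1/(2k–1) –1/(4k–2) –1/(4k) for k=1,2,..., and compares with ln(2)/2. The function names are mine, and floating-point sums are of course only evidence, not proof.

```python
import math


def rearranged_partial_sum(blocks):
    """Sum the first `blocks` groups 1/(2k-1) - 1/(4k-2) - 1/(4k)."""
    total = 0.0
    for k in range(1, blocks + 1):
        total += 1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
    return total


def alternating_partial_sum(terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... with `terms` terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))


print("original series   :", alternating_partial_sum(200_000))
print("ln(2)             :", math.log(2))
print("rearranged series :", rearranged_partial_sum(100_000))
print("ln(2)/2           :", math.log(2) / 2)
```

Each block equals (1/2)[1/(2k–1) – 1/(2k)], so the sum of the first K blocks is exactly half of the partial sum of the original series with 2K terms.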
Riemann Rearrangement Theorem
If a series of real numbers converges yet diverges absolutely
(conditional convergence) then, given any real number, there is a
rearrangement which converges to that number. I gave a proof of this. This is a weaker
version of the result in the textbook. Please look there.
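One standard way to build such a rearrangement is greedy: add unused positive terms while the running sum is at or below the target, and unused negative terms while it is above. The sketch below applies this to the Alternating Harmonic Series. It is an illustration of that idea under my own naming (rearrange_to), not necessarily the proof given in class.

```python
def rearrange_to(target, steps):
    """Greedily rearrange 1 - 1/2 + 1/3 - ... so partial sums approach target.

    Returns the partial sum after `steps` terms and the order used, recorded
    as signed denominators (+3 means the term +1/3, -4 means the term -1/4).
    """
    pos = 1      # next unused odd denominator (positive terms)
    neg = 2      # next unused even denominator (negative terms)
    total = 0.0
    order = []
    for _ in range(steps):
        if total <= target:
            total += 1.0 / pos
            order.append(pos)
            pos += 2
        else:
            total -= 1.0 / neg
            order.append(-neg)
            neg += 2
    return total, order


total, order = rearrange_to(1.5, 100_000)
print(total, order[:12])
```

The method works because the positive terms alone and the negative terms alone both diverge (so the sum can always be pushed past the target again), while the individual terms tend to 0 (so the overshoot shrinks).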
The better version in the text
Rearrangements can be devised so that the sequence of partial sums
has lim sup and lim inf specified arbitrarily (subject to lim inf ≤ lim sup).
There are versions of the theorem which apply to complex series. The
conclusions then are more complicated, though.
Summation by parts
We went through the discussion of summation by parts. Several students
clearly understood it better than the instructor. Sigh. This was
discussed because summation by parts can be applied to several topics
in Fourier series.
Some consequences (Alternating series test)
We stated and proved the standard alternating series test.
Products of series
Here we began a discussion of products of series, which is a more
complicated undertaking than is immediately apparent. We want to start
with two convergent infinite series, ∑_{j=1}^{∞}a_{j}
and ∑_{k=1}^{∞}b_{k} (the sums could equally well start at 0),
and then analyze the numbers
a_{j}b_{k}. We'd like to figure out some way to
"assemble" these numbers into an infinite series, whose sum, we might
hope, is AB, the product of the sums of the two series we began with.
Taylor series and Cauchy products
Thinking about Taylor series almost immediately inspires (?) the
definition of the Cauchy product. We "know" from calculus ("know"
means that some examples were shown then!) that if
f(x)=∑_{j=0}^{∞}a_{j}x^{j}
and
g(x)=∑_{k=0}^{∞}b_{k}x^{k}
then a way (?) to organize the product is to suppose it is
F(x)=f(x)g(x), a product of functions. And we would hope that
F(x)=∑_{t=0}^{∞}c_{t}x^{t}
where now
c_{t}=∑_{q=0}^{t}a_{q}b_{t–q}.
These formulas are gotten in several ways: first, we rewrite the
product by assuming that manipulations with infinite polynomials work
exactly like those with finite polynomials. Or we can hope that a
Taylor's Theorem is at work behind the scenes, and the summation
exactly reflects the product rule for t^{th}
derivatives. The sum is sometimes called a convolution of the
two sequences. It is unfortunate or, perhaps, interesting, that many
of these hopes are not always true. Oh
well. In particular, there is no clear relationship between convergence
of the factors and convergence of the Cauchy product. See the example
mentioned below.
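The formula c_{t}=∑_{q=0}^{t}a_{q}b_{t–q} is easy to compute for finitely many coefficients. Here is a minimal sketch (the function name cauchy_product is mine). It only produces coefficients and says nothing about convergence.

```python
def cauchy_product(a, b):
    """Return c_0, ..., c_{n-1} with c_t = sum_{q=0}^{t} a[q] * b[t - q].

    Only as many coefficients are returned as both input lists can support.
    """
    n = min(len(a), len(b))
    return [sum(a[q] * b[t - q] for q in range(t + 1)) for t in range(n)]


# The geometric series 1 + x + x^2 + ... multiplied by itself gives
# coefficients 1, 2, 3, ..., matching the series for 1/(1-x)^2.
print(cauchy_product([1, 1, 1, 1], [1, 1, 1, 1]))
```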
Example in the text
A neat example in the text shows that the Cauchy product of a
convergent series (with itself!) need not
converge. Interesting. I have little to add to the account in the
textbook.
Dirichlet series and Dirichlet product
Here I defined Dirichlet series, which are rather useful in number
theory, and some other areas which don't seem to be immediately
related. These are series of the form
f(s)=∑_{n=1}^{∞}(a_{n}/n^{s}).
Such series are natural in number theory and complex analysis. The
most famous example occurs when all of the a_{n}'s are 1. This
is ζ(s)=∑_{n=1}^{∞}(1/n^{s}),
the Riemann zeta
function. Learn more about this, and make a
million dollars! Dirichlet series have a whole theory of their
own. For example, just as power series have a radius of
convergence inside which they converge absolutely, Dirichlet
series similarly (and for much the same reasons!) have an abscissa
of absolute convergence, a line x=Constant in the complex plane (writing s=x+iy). The series
converges absolutely to the right of that line and does not converge absolutely to the
left of it. For ζ(s), that line is x=1. Or, as we say in the
complex analysis game, Re(s)=1.
If we have another Dirichlet series, say g(s)=∑_{m=1}^{∞}(b_{m}/m^{s}), we might consider F(s)=f(s)g(s) and think about ∑_{n=1}^{∞}(a_{n}/n^{s})·∑_{m=1}^{∞}(b_{m}/m^{s}), and maybe reassemble into another Dirichlet series, so: ∑_{N=1}^{∞}(c_{N}/N^{s}) and this should be the same as ∑_{n=1}^{∞}∑_{m=1}^{∞}(a_{n}/n^{s})(b_{m}/m^{s}). So we might want c_{N}/N^{s} to be, well, which terms of the product? To match up (1/n^{s})(1/m^{s}) with 1/N^{s} we need nm=N.
Therefore we could define the Dirichlet product of two series (dropping the s stuff) ∑_{n=1}^{∞}a_{n} and ∑_{m=1}^{∞}b_{m} to be the series ∑_{N=1}^{∞}c_{N} where c_{N}=∑_{n·m=N}a_{n}b_{m}. Here it is divisors which are important. You can see if factoring integers and primality is interesting, such products might be more revealing than Cauchy products. One can then ask if convergence of the factors always implies convergence of the Dirichlet product. The answer, with no further hypotheses, is no. There are examples which don't have the desired behavior.
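The Dirichlet product coefficients c_{N}=∑_{n·m=N}a_{n}b_{m} can be computed by looping over pairs (n,m) rather than by factoring N. This is a sketch with my own naming (dirichlet_product), storing a_{n} at list position n–1.

```python
def dirichlet_product(a, b):
    """Return c_1, ..., c_size with c_N = sum over n*m = N of a_n * b_m.

    a[n - 1] holds a_n and b[m - 1] holds b_m; size is the shorter length.
    """
    size = min(len(a), len(b))
    c = [0] * size
    for n in range(1, size + 1):
        # Only pairs with n * m <= size contribute to c_1, ..., c_size.
        for m in range(1, size // n + 1):
            c[n * m - 1] += a[n - 1] * b[m - 1]
    return c


# With all a_n = b_m = 1 (the zeta series times itself), c_N counts the
# divisors of N.
print(dirichlet_product([1] * 12, [1] * 12))
```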
Absolute convergence and products
We will briefly discuss summation methods for products of
series. Let me start with 1 in these series, and starting with 0 would
give a similar result. So let's have two series
∑_{j=1}^{∞}a_{j}
and
∑_{k=1}^{∞}b_{k}
which converge and which have sums A and B respectively. Consider the
numbers a_{j}b_{k} which I could think of as "sitting"
each at a lattice point of N×N in the plane. Now we will
consider
a sequence of finite subsets
{W_{t}}_{t=1}^{∞} of N×N, with
the following properties:
1. They are nested: W_{t}⊂W_{t+1}.
2. ∪_{t=1}^{∞}W_{t}=N×N: the
union is "everything".
Examples of such W_{t}'s come from both Cauchy and Dirichlet products. Adding up the terms which come from the W_{t}'s exactly will correspond to the partial sums of each of these products. The easy and natural result (I love absolute convergence!) is the following.
Theorem Suppose that the series
∑_{j=1}^{∞}a_{j}
and
∑_{k=1}^{∞}b_{k} both converge
absolutely and have sums A and B respectively. Then the
sequence of (possibly complex!) numbers
s_{t}=∑_{(j,k)∈W_{t}}a_{j}b_{k}
converges, and its limit is AB.
Actually, even slightly more is true and follows from the preceding
result. That is, the sequence whose t^{th} term is
∑_{(j,k)∈W_{t}}|a_{j}b_{k}|
also converges, and its value is ≤ the product of
∑_{j=1}^{∞}|a_{j}|
and
∑_{k=1}^{∞}|b_{k}|. So any
method of summation of the product series of absolutely convergent
series "works" and gives the correct answer.
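To see the theorem in action, here is a sketch with two absolutely convergent geometric series of my own choosing, a_{j}=(1/2)^{j} and b_{k}=(–1/3)^{k}, so that A=1 and B=–1/4. Two nested exhausting families are used: squares {j,k≤t} and the "Cauchy" triangles {j+k≤t+1}. All names below are hypothetical.

```python
def a(j):
    """Terms of the first series: sum over j >= 1 is A = 1."""
    return 0.5 ** j


def b(k):
    """Terms of the second series: sum over k >= 1 is B = -1/4."""
    return (-1.0 / 3.0) ** k


def square(t):
    """W_t = {(j, k) : 1 <= j <= t, 1 <= k <= t}."""
    return [(j, k) for j in range(1, t + 1) for k in range(1, t + 1)]


def triangle(t):
    """W_t = {(j, k) : j >= 1, k >= 1, j + k <= t + 1}."""
    return [(j, k) for j in range(1, t + 1) for k in range(1, t + 2 - j)]


def partial_sum(W):
    """s_t: add up a_j * b_k over the finite index set W."""
    return sum(a(j) * b(k) for (j, k) in W)


print(partial_sum(square(40)), partial_sum(triangle(40)))
```

Both families give partial sums close to AB=–1/4, as the theorem predicts for any nested exhausting family.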
Proof?
We began thinking about the proof and will complete it next time.
A volunteer and his coach
Mr. Skalit, helped by the valiant Mr. Kowalick, will present problems 20, 21,
and 22 of chapter 4 in class, probably a week from Thursday.
Problems to be considered
The students in the class will consider Chapter 3: 6, 7, 10; and
Chapter 4: 4, 5, 6, 7, 11, 13, 25. Votes for the best problems will be
counted, and the 6 or maybe 7 highest vote-getters will be assigned as
homework, due on Thursday, November 26.
The article I mentioned in class which taught me new things about the Ratio Test is this: The mth Ratio Test: New Convergence Tests for Series and is written by Sayel A. Ali. It appears in the June-July 2008 issue of the American Mathematical Monthly.
I started to discuss ♥Why I love absolutely convergent series!♥ My first reason was that rearrangements of such series also converge, and converge to the same sum. This is nice. I didn't quite finish the proof, and I will give an interesting example next time, and a result of Riemann's which explains what happens when the series does not converge absolutely.
The lecturer continued in a more pedestrian fashion. He defined infinite series. He talked about translating various sequence statements to infinite series: the Cauchy criterion, series with non-negative terms, absolute convergence, comparison, p-series via the Cauchy condensation theorem, and a brief anticipation of a version of the ratio test. More next week!
Inequalities and limits
The only Euclidean space R^{k} where an order topology
coincides with the usual Euclidean metric topology is R^{1},
that is, R, the real numbers. Some special results follow from using
the order in R. For example, we have this theorem:
Theorem Suppose {x_{n}} and {y_{n}} are two
real convergent sequences with limits L_{x} and L_{y},
respectively. If for all positive integers n we know that
x_{n}≤y_{n}, then L_{x}≤L_{y}.
A proof of this was shown, using contradiction. If
L_{x}>L_{y}, then set 2ε=L_{x}–L_{y}, which is positive. Take n large
enough so that |x_{n}–L_{x}|<ε and
|y_{n}–L_{y}|<ε. Then
–ε<x_{n}–L_{x}<ε and
–ε<y_{n}–L_{y}<ε. So we know
that –ε<L_{x}–x_{n}<ε (multiply
the first inequality by –1)
so that
x_{n}–ε<L_{x}<x_{n}+ε.
Similarly,
–y_{n}–ε<–L_{y}<–y_{n}+ε
so that, adding,
(x_{n}–y_{n})–2ε<L_{x}–L_{y}<(x_{n}–y_{n})+2ε.
This is already a problem: since x_{n}≤y_{n}, the right-hand
inequality gives L_{x}–L_{y}<2ε, but
L_{x}–L_{y}=2ε.
The instructor asked if a converse to this result was true, and was greeted with scorn. Indeed, no, since 0≤0 but the first 0 could be the limit of the sequence {1/n} and the second 0 could be the limit of the sequence {–1/n}, and 1/n>–1/n for every n. Huh. But how about this:
Theorem Suppose L_{x} is the limit of the sequence {x_{n}} and L_{y} is the limit of the sequence {y_{n}}. If we know that L_{x}<L_{y}, then there is a positive integer N so that if n>N, x_{n}<y_{n}.
The proof of this is to take 2ε=L_{y}–L_{x} and choose N so that |x_{n}–L_{x}|<ε and |y_{n}–L_{y}|<ε for all n>N. Unroll the absolute values as before, and you will see that x_{n}<L_{x}+ε=L_{y}–ε<y_{n}.
Comment about SUBTRACT INEQUALITIES?
Notice that 1<2 and 1<5 but 1–1<2–5 means 0<–3 which is
certainly false! Generally, you cannot subtract
inequalities and be sure that the result is correct.
More subtle results occur, but we need a few more definitions.
∞ and –∞
Suppose {x_{n}} is a real sequence which has the following
property: for all M∈R, there is a positive integer N_{M}
so that if n≥ N_{M}, then x_{n}>M. Then we will
say that lim_{n→∞}x_{n}=∞.
If we changed "x_{n}>M" in what's above to
"x_{n}<M" then we will write
lim_{n→∞}x_{n}=–∞.
lim sup and lim inf
Suppose {x_{n}} is a real sequence. Define S to be the set
of all subsequential limits of this sequence. This is a collection of
numbers in R^{*}, which I guess is
R∪{∞}∪{–∞}, and we will think of
R^{*} as an ordered set with the (more or less) obvious
ordering. I asked if we could find examples of sequences with
S=∅ and S=R^{*}. Well ...
NO!
How to think about this: a sort of bisection method. Consider
[–∞,∞] and split it into two parts,
[–∞,0]∪[0,∞]. Look at N: let's call an integer n
left if x_{n}∈[–∞,0] and call it
right if x_{n}∈[0,∞]. Hey, if it is both left and right call it both. In any
case, N is now the union of the set of left integers and the set of right integers, so one of
them has infinitely many elements. Let's suppose it is the right
one. Then consider [0,∞]=[0,1]∪[1,∞]. Hey: left/right
again. If left triumphs (has infinitely many!) then split up [0,1]
into [0,1/2]∪[1/2,1] ETC. If right wins, then
split [1,∞] into [1,2]∪[2,∞] ETC.
Eventually we will get a finite limit or a subsequence which is pushed
to infinity. In either case, S≠∅. Whew! So there must be at
least one subsequential limit.
YES!
There is a sequence with S=R^{*}. Take {x_{n}}
to be any "enumeration" of the rationals: so this refers to a function
f:N→R which is 1–1 and whose range is Q. Then S is
R^{*}. Why? Because in any real interval of positive
length there are infinitely many rationals, so given any such
interval and given any integer J, there is always j>J with
x_{j} inside the designated interval. The rationals are also
unbounded above and below, which takes care of ∞ and –∞. It is not difficult to
use these observations to verify that S is indeed R^{*}.
Question Is there a sequence which has S=R? You think about this, please.
lim sup and lim inf
Given {x_{n}} consider S, the set of all subsequential
limits. Then lim sup is the sup of S (considered as a subset of
R^{*}) and lim inf is the inf of S.
Some special sequences
The sentence "And now we begin." ends the novel, "Portnoy's
Complaint", by Philip Roth. This sentiment could be echoed by many
analysts at this point in the course. We now start looking at some
concrete examples of sequences and then series.
The examples
We saw as a consequence of our study of the Archimedean property, that
The lecturer falls, and no student picks him up!
We tried to consider the sequence {a^{1/n}} for a>0. This
sequence does converge, and its limit is 1. After some
confusion regarding the Bernoulli inequality, little progress was made
towards the verification of the convergence and limit claim. Sigh. Deferred
to Thursday's class, I suppose!
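Since the class argument was deferred, here is one standard route, offered as a sketch rather than as the proof intended. For a≥1 write a^{1/n}=1+h with h≥0. Bernoulli's inequality gives a=(1+h)^{n}≥1+nh, so 0≤a^{1/n}–1≤(a–1)/n. (For 0<a<1, apply this to 1/a.) The code checks that bound numerically. The function names are mine.

```python
def root_gap(a, n):
    """Return a**(1/n) - 1, the distance of the n-th root from 1 (for a >= 1)."""
    return a ** (1.0 / n) - 1.0


def bernoulli_bound(a, n):
    """Upper bound (a - 1)/n for root_gap(a, n), from (1 + h)**n >= 1 + n*h."""
    return (a - 1.0) / n


for n in (1, 10, 100, 1000):
    print(n, root_gap(5.0, n), bernoulli_bound(5.0, n))
```

The bound (a–1)/n tends to 0 by the Archimedean property, which squeezes a^{1/n} to 1.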
Writing in progress!
He continued with a clumsy proof that the set of subsequential limit
points of a sequence in a metric space is always a closed set. There
is a tiny bit of diagonalization needed in this proof. The lecturer
obfuscated this as much as he illuminated it.
Cauchy sequences
A major discussion was started with the definition of Cauchy sequence, a very important concept in
analysis and, indeed, in all of mathematics. Given {x_{n}} in
the metric space (X,d), we call this sequence Cauchy if given
any ε>0, there is a positive integer N_{ε}
so that if n and m are integers ≥N_{ε}, then
d(x_{n},x_{m})<ε.
Cauchy sequences are candidates for convergent sequences. They are
defined, though, only with "internal", self-referencing criteria, so
that knowing such a sequence must converge (as it must, under
certain circumstances!) is neat. In R and R^{n} Cauchy
sequences must converge. The convergence of many algorithms is
guaranteed using the Cauchy criterion (that is, by proving that the
sequences produced using the algorithm are Cauchy). We looked at some
examples in R and some conditions in R.
Example If {x_{n}} is a real sequence and we know that
|x_{n}–x_{n+1}|<1/n, then such a sequence need not
converge and such a sequence need not be Cauchy. Here the example
(reaching for our knowledge of calc 1) is
x_{n}=∑_{j=1}^{n}1/j, the sequence of
partial sums of the harmonic series. This sequence does not converge
(hey: x_{2^{n}}≥1+[(n–1)/2] with an easy argument,
so the sequence is not even bounded). Why is this sequence not
a Cauchy sequence? Well, we need to compare x_{n} and
x_{m} when n and m are both "free" and large. If we make n
large with m>n, the triangle inequality will only provide a bound
like this:
|x_{n}–x_{m}|≤|x_{n}–x_{n+1}|+...+|x_{m–1}–x_{m}|<∑_{j=n}^{m}1/j,
and that sum is not bounded.
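Both claims about the harmonic partial sums can be checked exactly with rational arithmetic. The sketch below verifies the standard grouping bound x_{2^{n}}≥1+n/2 (unboundedness) and also the block estimate x_{2n}–x_{n}≥1/2, which shows directly that the Cauchy condition fails for ε=1/2. The function name harmonic is mine.

```python
from fractions import Fraction


def harmonic(n):
    """Exact partial sum x_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, j) for j in range(1, n + 1))


for n in range(6):
    # Unboundedness: x_{2^n} is at least 1 + n/2.
    print(2 ** n, float(harmonic(2 ** n)), float(1 + Fraction(n, 2)))

for n in (1, 5, 50):
    # Failure of Cauchy: the n terms 1/(n+1), ..., 1/(2n) are each >= 1/(2n).
    print(n, float(harmonic(2 * n) - harmonic(n)))
```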
Example The metric space Q has Cauchy sequences which do not
converge. Any convergent sequence in a metric space is Cauchy (we'll
state that formally in a few microseconds). Now take a rational
sequence which converges to sqrt(2) in R. It is Cauchy in R, and hence
in Q (the metric is the same). But it can't converge in Q since there
is nothing for it to converge to. (If it converged to w in Q, the same
inequalities would make it converge to w in R. But then since the
limits are unique, sqrt(2) would be in Q!)
Example Look at (0,1), the open unit interval in R. The
sequence {1/n} is in (0,1). In R it converges to 0, and thus it must
be Cauchy, but it can't converge to anything in (0,1) since otherwise
it would converge to w>0. But the sequence converges to 0 in R, so
(just as before) this is impossible.
A convergent sequence is Cauchy
Use a triangle inequality argument.
If a subsequence of a Cauchy sequence converges, then the original
sequence converges.
Use a triangle inequality argument.
A Cauchy sequence is bounded
Indeed, take ε=1. Then the whole "infinite tail" of the
sequence after the N_{ε} term is within distance 1 of
x_{N_{ε}}. Now take a ball big enough to
enclose also the finitely many other points left out.
The Dirichlet problem, a historical digression
This is a problem in partial differential equations which originated
in classical mathematical physics. Attempts to solve this problem in
the late 19^{th} century led to recognition that "closed and
bounded" is not enough to guarantee convergence in many natural
"function spaces", specifically function spaces which were used to
analyze the Dirichlet problem. This failure or, rather, perhaps, perceived deficiency led
to recognition of the importance of compactness, and to the invention
of the notions and methods that we are currently discussing.
We went on ...
The lecturer really deliberately tried to investigate the connection
between Cauchy and compactness in a rather diffuse non-linear
way. This is often how new mathematics develops! The sequence
of ideas went like this:
Theorem Any Cauchy sequence in a compact metric space
converges. (A direct result of the finite intersection property.)
Theorem Any Cauchy sequence in R^{k}
converges. (Because any such sequence is bounded, and is therefore
inside a very big (maybe!) compact set, and therefore can be considered
to be in a compact set, and we are done.)
The textbook has a much more linearly ordered and careful presentation.
We then proved (as in the text) that sums of complex-valued convergent sequences converge to the sums of the respective limits. Also, if a sequence converges to a non-zero complex number, then "eventually" the sequence is non-zero, and the sequence obtained by taking reciprocals of the elements of that tail of the original sequence converges, and its limit is the reciprocal of the limit of the original sequence. These techniques are basic and must be part of all mathematicians' subconscious.
I looked at the product of two metric spaces. We put a metric on this
product, defined in a way very similar to the Euclidean metric in
R^{2} from d(x,y)=|x–y| in R^{1}. I observed, very
briefly, that metrics giving the same topology on the cartesian
product are the analogues of the L^{1} and L^{2} and
L^{∞} metrics mentioned in the last homework
assignment. Then we saw that a sequence in the product converges if
and only if its components in each factor (X and Y, respectively)
converge. The same is true in any finite product, but with infinitely
many factors things get much more complicated. (The last part of the
second problem of the Entrance Exam deals with similar obstacles.) The
metrics which we considered here (L^{1} and L^{2} and
L^{∞}) and which give, in finite dimensions, the same
topologies and the same notion of sequence convergence, all
turn out to be different in infinite dimensions. This is either
distressing and bewildering, or wonderfully enriching!
We will use the observations about products of metric spaces almost
always with R^{k} (where k is a positive integer).
We then moved on to subsequences. I tried to be very careful about defining a subsequence, and admitted my occasional confusion between a sequence (a mapping from the ordered set N to X) and the image of the mapping, a subset of X. Here is a weird example.
Example
Since Q, the rationals, are countable, there is a bijection
F:N→Q. Fix any such bijection for this discussion. Now fix
any x∈R. I claim there is a subsequence G:N→Q of F so
that G converges to x. Why? Well, we must write G as FοI where
I:N→N is strictly increasing. I will create I inductively as
follows.
So the "structure" of subsequences can be quite complicated. I don't think this should be too surprising since there are, after all, uncountably many subsequences.
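The details of the inductive construction are not recorded above, so here is one way it might go, as a sketch under stated assumptions. The generator rationals() below is one particular bijection F (my choice; the example allows any). The indices are chosen greedily: I(1) is the least n with |F(n)–x|<1, and I(k+1) is the least n>I(k) with |F(n)–x|<1/(k+1). Such n always exist because every interval of positive length contains infinitely many rationals.

```python
import math
from fractions import Fraction
from itertools import count
from math import gcd


def rationals():
    """Yield every rational exactly once: 0, then p/q and -p/q in lowest
    terms, ordered by p + q. This plays the role of the bijection F."""
    yield Fraction(0)
    for s in count(2):
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)


def convergent_subsequence(x, length):
    """Return the indices I(1) < I(2) < ... and the values F(I(k)) with
    |F(I(k)) - x| < 1/k, for k = 1, ..., length."""
    indices, values = [], []
    k = 1
    for n, r in enumerate(rationals(), start=1):
        if abs(r - x) < Fraction(1, k):
            indices.append(n)
            values.append(r)
            k += 1
            if len(values) == length:
                break
    return indices, values


idx, vals = convergent_subsequence(math.sqrt(2), 10)
print(idx)
print([str(v) for v in vals])
```

Because the loop moves forward through the enumeration, the indices are strictly increasing, which is exactly the requirement that I:N→N be strictly increasing.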
We stated a result something like this:
Theorem Suppose {x_{n}} is a sequence in a metric
space. The following are equivalent:
He began chapter 3, defining convergence of sequences in metric spaces. He showed that a convergent sequence was bounded, etc. He proved that the product of convergent sequences of complex numbers converged, and its limit was the product of the limits of the factor sequences.
The Heine-Borel Theorem
If E is a subset of R^{k}, then the following statements are
equivalent.
A counterexample (not really!)
This is a rather simple but still instructive example.
Suppose X is any set, and d is the discrete metric. That is,
d(x,y) is 1 when x≠y and is 0 when x=y. Every subset of X is an
open set always (and every subset is a closed set!). The compact
subsets of X are exactly the finite subsets. The diameter of every
subset of X is at most 1. So if X is infinite, there are certainly
many closed and bounded subsets which are not compact. So the H-B
theorem is not true in all metric spaces.
I note that in most "infinite-dimensional" situations which arise in analysis (and, indeed, even in certain areas of algebra), many naturally occurring subsets do not satisfy the H-B Theorem. Certainly compactness implies closed and bounded in a metric space, and several textbook homework problems show that the third condition implies compactness in any metric space. But generally the friendly hypothesis of "closed and bounded" will not imply compactness.
Proof of H-B
This was very much like what's in the textbook, with only minor
variations introduced by the instructor to ... uhhhh ... challenge the
students. Yeah, that's right: challenge the students.
Perfect sets are uncountable
Thanks to Ms. Pritsker and Ms. Hood. Here is her writeup.
The most wonderful perfect set, the Cantor set
I tried to analyze this set informally. It is an uncountable set. The
intervals removed from [0,1] to construct it have total length equal to
1, so maybe it should have total length 0. The points in the Cantor
set are those which have a ternary (base 3) expansion with no 1's. The
Cantor set and variations of it are used as the foundation for many
disturbing and unintuitive examples in analysis and topology.
It would be nice to tell you that the Cantor set and its relatives are unnatural, etc., except that certain dynamical systems which closely model physical and biological systems have certain types of behavior closely tied to the Cantor set!
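Two of the informal claims about the Cantor set can be checked with exact arithmetic. The sketch below assumes the usual middle-thirds construction (stage k removes 2^{k–1} open intervals of length 3^{–k}), and tests membership of a rational x by reading off ternary digits, allowing the digit choice that avoids 1 at endpoints such as 1/3=0.0222... in base 3. The function names are mine.

```python
from fractions import Fraction


def removed_length(stages):
    """Total length removed from [0,1] in the first `stages` stages."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, stages + 1))


def in_cantor_set(x):
    """Decide whether a rational x has a ternary expansion with no 1's.

    Digit 0 means x lies in [0, 1/3]; digit 2 means x lies in [2/3, 1].
    The remainders of a rational eventually repeat, so the loop stops.
    """
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x          # digit 0
        elif x >= Fraction(2, 3):
            x = 3 * x - 2      # digit 2
        else:
            return False       # x is in a removed open middle third
    return True


print(float(removed_length(30)))
print(in_cantor_set(Fraction(1, 4)), in_cantor_set(Fraction(1, 2)))
```

The removed length after n stages is 1–(2/3)^{n}, which tends to 1. The point 1/4 (ternary 0.020202...) belongs to the Cantor set even though it is not an endpoint of any removed interval.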
Homework due on Thursday, October 9
The third homework assignment is due a
week from Thursday.
Out of town
I will be away from Thursday, October 2, to Sunday, October 5.
Compact
Definition of open cover, and of compact using finite
subcovers.
Compactness turns out to be of fundamental importance in many numerical computations. This is not at all clear from the definition and discussion we're about to do. We are beginning at the tail end of a century of struggle with how to handle compactness and related notions, and the "perfection" shown is a bit difficult for a novice to grasp and certainly to understand in a meaningful way.
Examples
We verified from the definition that
[0,1] is compact.
Here's how: Suppose {G_{α}} is an open cover of
[0,1]. Let S={s∈[0,1] : [0,s] has a finite subcover by elements
of {G_{α}} }. Then:
We verified from the definition that
(0,1) is not compact.
And why this is true: As Mr. Baldwin suggested, cover
(0,1) by (1/n,1) for each positive integer n. Then the Archimedean
property implies this is an open cover (every positive number is
greater than some 1/n) and the union of any finite subcollection
is exactly its largest element (the one with the largest n), and none of the (1/n,1)'s is equal
to (0,1).
Balls are enough to check compactness
We need only use open balls instead of open sets in the covers we use
to test for compactness.
2.33
Suppose K⊂Y⊂X. Then K is compact relative to X if and only if
K is compact relative to Y.
I think we used balls here, which might be slightly nicer than the
proof of the text.
Suppose X=R^{2} and Y=R^{1} (considered as a subset of
R^{2} with y∈Y ↔ (y,0)∈X) and with both sets
having topologies determined by the usual metric. Notice that the
metric in R^{2} defined by sqrt((a_{1}–a_{2})^{2}+(b_{1}–b_{2})^{2}) is just |a_{1}–a_{2}| when we
consider points on the horizontal axis. K should be a subset of Y.
Now suppose we have a cover of K by open balls in Y and
we know that K is compact in X. The picture to the right is an effort
to illustrate the situation. I drew a collection of green intervals. I
had to lift them up a little bit ("up" means vertical motion) to show
them and not have them overlay and conceal K. If we use the same
centers and the same radii and create open balls in X, then, since
the green intervals cover K, the resulting magenta discs will also
cover K. This is because the intersection of each magenta disc with Y exactly
equals the green interval which specified it. If we know that K is
compact in X, then a finite union of the magenta discs will have K as
a subset. The related green intervals will have K as a subset, also,
since a point in K is in a magenta disc exactly when it is in the
related green interval. We have proved that if K is X compact then it is Y compact.
Now suppose we have a cover of K by open balls in X and
we know that K is compact in Y. So K is in a union of magenta
discs. Take one such disc. Its intersection with Y is not
necessarily an open ball in Y. It must be an open subset of Y,
because each point in the disc is an interior point (we proved that
open balls are open!). Therefore each point in the intersection of a
magenta disc with Y is contained in a green interval -- that is, an
open disc in Y with center in Y and positive radius. This may
be a big collection of green sets, but I don't care: the collection of
all of these green intervals, for all of the magenta discs, is an open
cover of K in Y. Since K is compact in Y, there's a finite subcover by
the green intervals. Each green interval comes from one of the magenta
discs. So for each green interval in the finite subcover, take the
associated magenta disc. The resulting collection of magenta discs is
a finite subcover in X of K. We have proved that if K is Y compact then it is X compact.
2.34
Compact subsets of metric spaces are closed.
Proof as in the text. This is a brief and clever proof and the ideas
are used elsewhere.
2.35
Closed subsets of compact sets are compact.
Proof as in the text.
Corollary
If F is closed and K is compact, then F∩K is compact.
2.36
If {K_{α}} is a collection of compact subsets of a
metric space X such that the intersection of every finite
subcollection of {K_{α}} is nonempty, then
∩K_{α} is nonempty.
What's written in the hypothesis is called the Finite Intersection
Property. One of my colleagues remarked that this result is no
more than DeMorgan's Laws. I have always found this confusing. So,
with the help of many charitable students, I tried to state the
compactness definition carefully and logically. I then wrote the
contrapositive. I then used DeMorgan's Laws to translate the
statements we obtained. If we consider everything as a subset of one
of the given K_{α}'s, we proved the theorem.
This is the
proof in the text, where it is given not so histrionically.
Bonus vocabulary word: histrionic Of, or relating to actors or
acting; Excessively dramatic or emotional.
Corollary
If {K_{n}} is a sequence of nonempty compact sets such that
K_{n}⊃K_{n+1} (n=1,2,3,...), then
∩_{1}^{∞}K_{n} is not empty.
Proof as in the text. People sometimes say "the K_{n}'s are
nested" instead of "K_{n}⊃K_{n+1}".
2.37
If E is an infinite subset of a compact set K, then E has a limit
point in K.
Proof as in the text.
2.38
If {I_{n}} is a sequence of intervals in R^{1} such
that I_{n}⊃I_{n+1} (n=1,2,3,...) then
∩_{1}^{∞}I_{n} is not empty.
Useful and relevant examples
slightly restated 2.38
Make the intervals both closed and bounded. Then
the result is true. This phrase has great "resonance" historically.
If I_{n}=[a_{n},b_{n}], then
a_{n}<b_{m} when n and m are positive integers. If
α is sup{a_{n} :n∈N} and if β is
inf{b_{m}: m∈N} then (not totally easy exercise)
α≤β, and then [α,β] is
∩_{1}^{∞}I_{n}, and certainly the
interval is not empty.
Proof as in the text.
2.39
Let k be a positive integer. If {I_{n}} is a sequence of
k-cells such that I_{n}⊃I_{n+1} (n=1,2,3,...) then
∩_{1}^{∞}I_{n} is not empty.
We briefly discussed what "k-cells" were.
Here are pictures of 1- and 2- and 3-cells. A k-cell is the Cartesian
product in R^{k} of k closed and bounded intervals, one in
each factor.
The proof as in the text.
This question is more suitably part of Math 441, but we briefly discussed certain desirable (?) properties possessed by metric spaces but not by "random" topological spaces. More generally in the history of topology, a great deal of effort went into discovering which topological spaces occurred as a result of a metric. This is called metrizability and the theorems are intricate.
Uniqueness of limits/Hausdorff
If a topology comes from a metric, then the topology has the following
property, which may initially seem rather strange.
Property H Take two distinct points x and y of the space
X. Then there are open sets U and V of X with x∈U and y∈V
with U∩V=∅.
To see why this is true in a metric space, let
q=d(x,y) which must be a positive number since x and y are
different. Then put r=(1/2)q, and consider N_{r}(x) and
N_{r}(y). If z is in both of those sets, then d(z,x)<r and
d(z,y)<r, so (triangle inequality and symmetry of d)
q=d(x,y)≤d(z,x)+d(z,y)<(1/2)q+(1/2)q=q which is a contradiction
since q is positive.
A silly example of a topology which does not have Property H is
the following: take X to be a two-element set, say
{♣,♦}. The topology on X (which must be a set of subsets of
X, don't sink into the sea of abstraction here) to consider is {∅,{♣,♦}}. This
sure is a topology (called the indiscrete topology, the
smallest topology -- the largest topology is called the discrete
topology) and it does not have "enough" open sets to separate the
two points, ♣ and ♦.
Comment The number of
topologies on a finite set is interesting and there's no known
exact pattern. Note, oh combinatorists, that most (!!) of them do not
have Property H above.
A topology which has Property H is called Hausdorff (the points can be "housed off" from each other [sorry]). This property is already evident in calculus, because we tell people there the following result: if {x_{n}} is a sequence and if lim_{n→∞}x_{n}=A and lim_{n→∞}x_{n}=B, then A=B. This "uniqueness of limits" property is exactly a consequence of Property H or Hausdorffness. There are important examples of non-Hausdorff topological spaces which seem less "artificial" than what I gave, but their descriptions are somewhat complicated.
Countable sequence of open sets
A topology which comes from a metric has the following property:
Sequence of neighborhoods
If x∈X, then there is a
sequence of open sets, {U_{n}} so that if V is open and
x∈V, there must be a positive integer N so that
x∈U_{N}⊂V. Actually, we can even ask that these sets
be "nested", so that U_{n+1}⊂U_{n} for all n.
Proof A proof is very brief. If d is the metric, take
U_{n} to be the ball of radius 1/n centered at x, that is,
U_{n}=N_{1/n}(x). If V is as described, there is some
r>0 with N_{r}(x)⊂V. Then the Archimedean property
guarantees the existence of N with 0<1/N<r, so that
U_{N}⊂V.
This idea will be used again and again in this course. It plays an important part in the "a implies c" proof and you should look at that now if you haven't already. We will use the idea to "create" suitable sequences. But let me show you a somewhat intricate example of a topological space which does not satisfy the preceding result.
R^{∞}
As a set, R^{∞} is "just" the collection of all real
sequences. So if x∈R^{∞}, then
x=(x_{1},x_{2},...,x_{n},...) with all of the
x_{n}'s elements of R. For the purposes of this paragraph,
I'll call such an x positive exactly when all of the
x_{n}'s are positive real numbers. I want to define the box
topology on R^{∞}. I will first begin by defining
box neighborhoods.
Suppose x=(x_{1},x_{2},...,x_{n},...) is a
point in R^{∞} and
ε=(ε_{1},ε_{2},...,ε_{n},...)
is a positive element of R^{∞}. Then I will tell you
what is in the subset M_{ε}(x) of
R^{∞}. A point
y=(y_{1},y_{2},...,y_{n},...) of
R^{∞} is in M_{ε}(x) if and only if, for
all positive integers n,
|x_{n}–y_{n}|<ε_{n}. The sets
M_{ε}(x) play a role similar to what the balls
N_{r}(x) do in the case of metric spaces, but there's no
metric visible (indeed, there is no metric possible, as we will
see!). Now I will tell you what an open set is. A subset W of
R^{∞} will be open in the box topology if:
for all x∈W, there is a positive ε in R^{∞}
so that M_{ε}(x)⊂W.
We should check that the rules for a topology are satisfied. If you
wish to do this, you should begin by verifying that
M_{ε}(x) is itself open (analogous to verifying that
an open ball is open). This is not too difficult, and, indeed, most of
the other verifications are easy. One thing needs to be said, I
think. If
M_{ε_{a}}(x) and
M_{ε_{b}}(x) are two of the defining
neighborhoods, and if we put
ε_{c}=(min(ε_{a,1},ε_{b,1}),min(ε_{a,2},ε_{b,2}),...,min(ε_{a,n},ε_{b,n}),...)
(take minimums in each coordinate) then
M_{ε_{c}}(x)⊂M_{ε_{a}}(x)∩M_{ε_{b}}(x). This
result is needed to prove that intersections of box open sets are box
open.
Now suppose that x∈R^{∞}. What if there were a qualifying sequence of box open sets {U_{n}} behaving nicely as in the Sequence of neighborhoods proposition above? Then there must be (definition of box open) a sequence of positive ε_{n}'s so that x∈M_{ε_{n}}(x)⊂U_{n}. I will create a specific M_{η}(x) which does not contain any of the M_{ε_{n}}(x)'s, and so does not contain any of the U_{n}'s.
Let me begin by looking at a specific example, because that will help us understand the general case. Take U_{n} to be M_{(1/n,1/n,...,1/n,...)}(0). So this is a neighborhood of 0 whose "polyradius" is 1/n in each coordinate or direction. Certainly, there is no point other than 0 in all of the U_{n}'s, so they seem to "shrink down" to 0. As a response to this choice of a sequence of U_{n}'s, I ask you to consider V=M_{(1,1/4,...,1/j^{2},...)}(0). I claim that none of the U_{n}'s can be included inside V. Why? Well, let's look at a point p_{n} in U_{n}. Define p_{n} to be (2/(3n),2/(3n),...,2/(3n),...) so it is 2/(3n) in each coordinate. This is certainly inside U_{n} since 2/(3n)<1/n. But it is not in V, because, with n fixed, 1/j^{2} is eventually (as the coordinate index j grows) less than 2/(3n).
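To make the last step concrete, here is a sketch that finds, for a given n, the first coordinate j at which p_{n} fails the test for V, that is, the first j with 2/(3n)≥1/j^{2}. The helper name first_bad_coordinate is made up for this illustration.

```python
from fractions import Fraction

def first_bad_coordinate(n):
    """Smallest j with 2/(3n) >= 1/j**2, so p_n fails the j-th test for V."""
    target = Fraction(2, 3 * n)
    j = 1
    # 1/j**2 tends to 0, so this loop must stop.
    while target < Fraction(1, j * j):
        j += 1
    return j

for n in range(1, 6):
    print(n, first_bad_coordinate(n))
```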
Now let's try something similar in general. You "challenge" me with a
list of M_{ε_{n}}(0)'s.
My M_{η}(0) will be constructed so that its sequence of
coordinates →0 faster (eventually!) than any of the
sequences of coordinates of the ε_{n}'s. Here is one
possible recipe:
Take
η_{j}=(2/3)min(1/j,ε_{1,j},ε_{2,j},...,ε_{j,j})
Then what do I know? I know that η_{j} is smaller than
the j^{th} coordinates of ε_{1},
ε_{2}, ..., ε_{j}. The element of
R^{∞} defined by two-thirds of ε_{n} is
in M_{ε_{n}}(0), surely (I just mean, multiply
the components of ε_{n} each by 2/3). But it is
not in V=M_{η}(0). Why? Because for j≥n, η_{j} is at most
(2/3)ε_{n,j}. This "tail" restriction
prevents the element of U_{n} from being in V.
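A sketch of the recipe in code, under a loudly stated assumption: the "challenge" eps(n, j) below is a hypothetical family of polyradii I invented for illustration; any positive values would do. The check is that two-thirds of ε_{n} already fails the test for M_{η}(0) at coordinate j=n.

```python
from fractions import Fraction

def eps(n, j):
    """Hypothetical challenge: j-th coordinate of the n-th polyradius."""
    return Fraction(1, n * j ** n)

def eta(j):
    """The recipe above: (2/3) * min(1/j, eps(1, j), ..., eps(j, j))."""
    candidates = [Fraction(1, j)] + [eps(n, j) for n in range(1, j + 1)]
    return Fraction(2, 3) * min(candidates)

def escapes(n):
    """True if the point (2/3)*eps_n fails the test for M_eta(0) at coordinate j = n."""
    p_n = Fraction(2, 3) * eps(n, n)      # n-th coordinate of the point
    return not (p_n < eta(n))

print([escapes(n) for n in range(1, 8)])
```

The min in eta is taken over the first j challenges only, which is what makes η_{j} well defined even though there are infinitely many ε_{n}'s.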
What's going on?
Given a sequence of sequences of positive numbers, we have
created a sequence whose limit is 0 and which approaches 0 eventually
faster than any of the sequences. The "eventually" is that it
is smaller than the given sequences at a variable starting place, of
course. This is again the diagonal process applied in a more
complicated way.
Please note that in class I used a much simpler prescription to create η which won't necessarily prove what is needed. I am sorry. I also owe emphatic thanks to Ms. Slusky whose persistent inquiries and messages made me think a bit more about this. It is more subtle than what I hurriedly did in class.
First countable
A topology which does satisfy the Sequence of neighborhoods
proposition is called first countable. I showed that a topology
obtained from a metric space is first countable, and gave an example
of a topological space which is not first countable. There are simpler
examples but the example given is also Hausdorff, and the underlying
set, R^{∞} is neither artificial nor unimportant, but
arises naturally in probabilistic reasoning (select a random sequence
of real numbers). The box topology is not used in probability because
it has other defects besides lack of first countability. In a first
countable space, sequential reasoning is adequate to determine what is
and is not an open set.
Metric spaces are Hausdorff and first countable, and I will likely never refer to these terms officially again in this course, but I will use the properties they define frequently.
More definitions
Now for the actual progress of the course. We return to metric spaces.
Interior point; interior of a set
We defined interior point and the interior of a set. We proved that the
interior of a set is equal to the union of all the open sets which are
a subset of the set. We investigated some examples.
Closed set
A closed set is the complement of an open set. Some sets are neither
open nor closed. We investigated some examples. The closure of a set
is the intersection of the closed sets containing the set. We
characterized this using the idea of limit points, and considered
(rapidly!) some examples. A closed set is one which contains its limit
points.
Perfect set
A perfect set is a set which equals the set of its own limit points (so it is closed and each of its points is a limit point). Ms. Pritsker,
with the help of Ms. Hood, volunteered to prove, some time, that a
perfect set in a metric space is uncountable. Thanks to them in
advance.
Dense
I did not define "dense" and will need to begin next time with that
definition. Sigh.
The real numbers are uncountable
Here is what I said. This is, more or less, Cantor's second (!)
proof of this assertion (more later). Suppose R were a countable
set. Then, since subsets of countable sets are countable, the interval
[0,1] would be countable. Now suppose [0,1]~N, the positive
integers. Then there would be an "enumeration", that is, a way of
listing the elements of [0,1] according to their image under a
hypothetical bijection with N. So let us enumerate the elements
of [0,1], listing each one according to its decimal expansion address.
First element <---> .a_{11}a_{12}a_{13}a_{14}...
Second element <---> .a_{21}a_{22}a_{23}a_{24}...
Third element <---> .a_{31}a_{32}a_{33}a_{34}...
Fourth element <---> .a_{41}a_{42}a_{43}a_{44}...
....
Now we define a number b in [0,1] by giving its decimal expansion address:
Criticism of this Cantorial proof
Well, the problem is that the decimal "address" is not unique. What do
I mean? By analogy, if we talked about a building at the intersection
of Fourth Avenue and Avenue A, it is logically possible that the
building would have two distinct addresses. It could be called,
say, 47 Avenue A and 222 Fourth Avenue. Certainly that occurs with
decimals. For example, I know (?) that .379999999...(9's repeating) is
a decimal address for a number which also has the decimal address
.38000000...(0's repeating). Well, how many numbers have that problem?
The problems are those numbers with decimal expansion "ending" in an
infinite string of 9's. Well, golly, how many such problematic numbers
are there? For any finite string of digits,
.c_{1}c_{2}c_{3}...c_{L} I could subtract 1 from c_{L} and then end
with 9's repeated. (This is not
exactly correct, since c_{L} could be 0 but I'm getting
tired.) So each real in this countable set has maybe two addresses. Throw
them out and apply the Cantor proof (the "Cantor diagonal
process"). The result will be an unenumerated real number. So the
proof works. Let's follow Professor Rudin's text, though. This will
mean using the diagonal process you have already seen at least two
more times, and maybe even more. This is not a bad thing, since the
idea is really inspired, first-class, etc.
Really, a set which is not countable
Let's consider the sequences whose values are 0 or 1. That is, such a
sequence is a function, f:N (the positive integers)→{0,1}. Or, if
you are a traditionalist, it is {a_{n}}_{n in N} where
the a_{n}'s are 0 or 1. What if this set (let us call it,
temporarily, S) is countable? We enumerate the sequences in S. That
is, S consists of f_{1}, f_{2}, f_{3},
f_{4}, ... and we "diagonalize" to create a sequence which is
not enumerated. So g(n) is 1-f_{n}(n). I wrote this equation
and knew this was sort of being exhibitionistic -- I just wanted to
show you I could. Sigh. What the heck is g? At the positive
integer n, it is 1-f_{n}(n). So (work it out, I have, darn
it!) this "switches" the values. That is, if f_{n}(n) is 1,
then g(n) is 0, and if f_{n}(n) is 0, g(n) is 1. So g
can't be any function already counted. The set S cannot be
enumerated, and it is not countable.
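The diagonal rule g(n)=1-f_{n}(n) is short enough to run. This is only a sketch on a finite piece of a pretend enumeration; the four listed sequences are hypothetical examples, not anything from class.

```python
def diagonal(listed):
    """Return g with g(n) = 1 - f_n(n), where listed[n-1] plays the role of f_n."""
    return lambda n: 1 - listed[n - 1](n)

# A hypothetical finite piece of an "enumeration" of 0-1 sequences.
listed = [
    lambda n: 0,                    # f_1: all zeros
    lambda n: 1,                    # f_2: all ones
    lambda n: n % 2,                # f_3: 1,0,1,0,...
    lambda n: 1 if n == 4 else 0,   # f_4: a single 1 in position 4
]

g = diagonal(listed)
print([g(n) for n in range(1, 5)])                           # [1, 0, 0, 0]
print(all(g(n) != listed[n - 1](n) for n in range(1, 5)))    # True
```

So g disagrees with f_{n} at position n, for every n on the list.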
Hash [huh?]
Let me now try to create an injective function I:S→R. I
will be wrong, but, I will try! First, as preface, I will assert the
following wonderful fact:
If n is a positive integer and r≠1, then
∑_{j=1}^{n}r^{j}=(r–r^{n+1})/(1–r).
Now you may know this as the partial sum of a geometric series, but I
know it in Math 411 as a fact which can be confirmed with mathematical
induction. Well, if I know this fact, then (with
0<r<1) any such sum is positive and certainly less than
r/(1–r). For example, if r=1/2, the sums will all be less than 1.
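If you'd like to see the formula agree with a brute-force sum before proving it by induction, here is a sketch using exact rational arithmetic (the function names are mine).

```python
from fractions import Fraction

def partial_sum(r, n):
    """r + r**2 + ... + r**n, added up term by term."""
    return sum(r ** j for j in range(1, n + 1))

def closed_form(r, n):
    """(r - r**(n+1)) / (1 - r), valid when r != 1."""
    return (r - r ** (n + 1)) / (1 - r)

r = Fraction(1, 2)
for n in (1, 5, 20):
    print(n, partial_sum(r, n), closed_form(r, n))
```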
Let's try the following to create an I. If A={a_{n}} is a sequence of 0's and 1's, define I(A)=sup{∑_{n=1}^{N}a_{n}/2^{n} : N is a positive integer}.
There's a bunch of things to check. First, does the set indicated
above have an upper bound? The key observation is that
∑_{n=1}^{N}a_{n}/2^{n}≤∑_{n=1}^{N}1/2^{n}≤1.
So the set involved in I(A)'s definition is bounded above, and
therefore it has a sup. Unfortunately, several students pointed out
that the Emperor's clothing is indeed lacking. I is not 1-1. It
is, after all, just the sequence {a_{n}} encoded as the binary
expansion of a number. And just as decimals have problems, so do these
expansions. For example, the sequence {1,0,0,0,...(all 0's)} gets
mapped to a set whose only element is 1/2. But the sequence
{0,1,1,1,...(all 1's)}: well, we need the sup of
∑_{n=2}^{N}1/2^{n}=(1/2)–(1/2^{N}).
Since we proved that 2^{N} is unbounded, these sums also have
sup equal to 1/2. The mapping I is not injective.
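The collision can be watched numerically. In this sketch (names A, B, and partial are just labels for the two sequences above and their partial sums), the gap between 1/2 and the partial sums for {0,1,1,1,...} is exactly 1/2^{N}.

```python
from fractions import Fraction

def partial(seq, N):
    """Sum of seq(n)/2**n for n = 1..N."""
    return sum(Fraction(seq(n), 2 ** n) for n in range(1, N + 1))

A = lambda n: 1 if n == 1 else 0     # 1,0,0,0,...
B = lambda n: 0 if n == 1 else 1     # 0,1,1,1,...

for N in (1, 5, 20):
    # Second column is always 1/2; third column is the gap 1/2**N.
    print(N, partial(A, N), Fraction(1, 2) - partial(B, N))
```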
How to fix it
Computer science sometimes calls functions like I by the name, hash
function (this is part of the special Math 411 Vocabulary Building
Project). The hash function classifies elements of a set. Here the set
to be "classified" is the collection of 0-1 sequences, and we will
classify them by real numbers in [0,1]. When the values of the hash
function are the same for different inputs, the result is called a
collision. Can we redefine I to avoid collisions?
One suggestion (from Mr. Kowalik?) was to replace 2 by 3 in the definition of I. This will work. In fact, we will use something similar next week in a slightly different context. I wanted to use a weirder idea, because it leads to a wonderful fact mentioned later. So my "suggestion" was to replace 2^{n} by 2^{n!}. This means that we are filing the sequences into bins (?) of [0,1] which have widths which shrink much more quickly. Let's now consider two different sequences, say A={a_{n}} and B={b_{n}}. Let's assume that N is the first time that they disagree. So a_{n}=b_{n} for n<N and, to be specific, a_{N}=0 and b_{N}=1. I claim that I, as modified with n! instead of n for the exponent, will have I(A)<I(B): they won't be equal. If I verify this, then I is injective, and, since S is uncountable, we will have verified that R is uncountable. Wow!
Let's make an I(A) as large as possible and an I(B) as small as
possible, and compare them. The initial segments appearing in the sum
are the same, and if I show what follows, then we're done:
∑_{n=N+1}^{∞}1/2^{n!}<1/2^{N!}.
Now consider the sum. Its first term is
1/2^{(N+1)!}=1/(2^{N!})^{N+1} (repeated
exponents are subtle!). What about the ratio between successive terms?
Well, the factorials make what's written above not a geometric
series. So let me overestimate what is above by a geometric series
with initial term 1/(2^{N!})^{N+1} and ratio between
successive terms equal to 1/2^{(N+2)!–(N+1)!} (this is just the
ratio between the first and second terms -- all the other ratios are
much less because the exponents are even larger). The exponent is at
least 4, so this ratio is at most 1/16 and certainly at most 1/4. So the series given above is less than a geometric series
whose first term is 1/2^{(N+1)!} and whose ratio is 1/4. The
sum of that series is [1/2^{(N+1)!}]/(1–(1/4)) which is
1/(3·2^{(N+1)!–2}). Now let me ask about this
inequality:
2^{N!}<3·(2^{(N+1)!–2})
When N=1, this is
2<3·2^{0} which is true.
When N=2, this is
2^{2}<3·2^{4} or 4<48 which is true.
When N=3, this is
2^{6}<3·2^{22} which is very true (heh, heh:
64<12,582,912: silly).
The desired inequality is always true, and can be "left to the
student" (factorials grow very fast!). So we are done, and we
know the reals are not countable. We will see other proofs of
this fact, but now let's use it.
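A sketch that checks the inequality for a few more N, and also compares a (truncated) tail sum directly against 1/2^{N!}. Truncating the tail to a few terms is my simplification; it only shows the first few terms are already tiny, it is not a proof.

```python
from fractions import Fraction
from math import factorial

def inequality_holds(N):
    """Check 2**(N!) < 3 * 2**((N+1)! - 2)."""
    return 2 ** factorial(N) < 3 * 2 ** (factorial(N + 1) - 2)

def tail(N, terms=3):
    """Truncated tail: sum of 1/2**(n!) for n = N+1, ..., N+terms."""
    return sum(Fraction(1, 2 ** factorial(n)) for n in range(N + 1, N + terms + 1))

print(3 * 2 ** 22)                                      # 12582912
print([inequality_holds(N) for N in range(1, 7)])
print([tail(N) < Fraction(1, 2 ** factorial(N)) for N in range(1, 5)])
```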
Transcendentals ...
Mr. Ratner discussed problems 2 and 3 in chapter 2. He defined
algebraic numbers and showed that they were countable, and therefore
concluded that the transcendentals, the "other" real numbers, were
uncountable. Here is his writeup.
Note For the algebraists among you, please realize that the
algebraic numbers are a field. This is not totally trivial, but it is
algebra, so I can't prove it, and I don't need to in Math 411.
Opinion and history
I said, as I believe, that this sequence of silly trivial observations
is truly remarkable. One consequence is that most real numbers,
if the word "most" is used in almost any sense, are
transcendentals. So a random real number should be transcendental. It
is rather difficult to make sense of the word random in an infinite
setting, which is one of the real difficulties of probability.
Cantor's proof of the existence of transcendental numbers as described above is the only one I knew until fairly recently, but apparently it was his second proof. Earlier in his life, he published a proof which can be described as constructive. A discussion of this proof is in the article Georg Cantor and Transcendental Numbers by Robert Gray, in The American Mathematical Monthly, Vol. 101, No. 9 (Nov., 1994), pp. 819-832. If you are at a Rutgers computer, you should be able to see a copy of this paper using this link.
Yeah, there are lots of them; now show me one!
I know because people have told me that the special numbers e and π
are transcendental. The original proofs of these facts are difficult
and long, and even now, a century later, I do not think there are any
really easy proofs. It would be nice, since most real numbers are
transcendental, to give at least one explicit example of such a
number. Here:
∑_{n=1}^{∞}1/2^{n!}.
The transcendentality of this number is a result of Liouville, whose
name is also attached to other results in math and physics. The
foundation of the rather brief proof is the Mean Value Theorem of one
variable calculus, and the strange statement that algebraic numbers
which are not rational can't be approximated very well by rationals in
a suitable sense. An exposition is here.
We must go on!
The central concerns of the course will frequently be stated in terms
using topology. So we should study a bit of topology. We have a whole
course, Math 441, devoted to this. I've taught it. Let me outline what
my first lecture in that course was about.
How to begin Math 441
My assumption was that not all students were knowledgeable about
advanced math and proofs, but that there would be some recall of
calculus. I "remembered" some definitions and introduced a definition
new to many students.
Definitions
Theorem Suppose f:R→R is a function. The following logical statements about this function are equivalent.
Proof of b implies c
I will try to verify c, and somewhere I will use b. So suppose U is an
open subset of R, and w is in U. Since U is open, I know there is
ρ>0 so that (w–ρ,w+ρ)⊂U. This interval translates
to the statement: all real y's which satisfy |y–w|<ρ are in U (interval
statement and inequality statement are logically equivalent). Suppose
w=f(a). Then the inequality becomes |y–f(a)|<ρ. Now use b with ε=ρ, which
declares "there exists" (magic!) δ>0 so that if
|x–a|<δ then |f(x)–f(a)|<ρ. But think about this:
it means in turn that the set f^{–1}(U) includes the numbers x
which satisfy |x–a|<δ. But that means (inequality/interval)
(a–δ,a+δ)⊂f^{–1}(U). Think this through: we
have produced an open interval containing a inside f^{–1}(U),
and we can do this for any a inside f^{–1}(U). Therefore
f^{–1}(U) is open.
There were no other requests until later that afternoon brave Mr. Leven and brave Mr. Kiria came to my office, risking GBH (in English mysteries, GBH is "grievous bodily harm" but I find to my horror that GBH is also a text-messaging abbreviation for "great big hug"). They worked through what I think is a more difficult proof involving sequences. Let me present their proof here.
Proof of a implies c
I need to convince you that the sequential statement implies the
open/inverse open statement. In fact, here we will proceed by proving
the contrapositive: if the open/inverse open statement is
false, then the sequential statement is false.
Let's suppose I know an open subset U of R so that f^{–1}(U) is not open. Well, this means there's at least one a in f^{–1}(U), with w=f(a) in U and some ρ>0 so that (w–ρ,w+ρ)⊂U, so that there is no τ>0 with (a–τ,a+τ) inside f^{–1}(U). Let's use this complicated statement to create a sequence. If n is a positive integer, take τ to be 1/n. The interval (a–1/n,a+1/n) is not a subset of f^{–1}(U), so there must be a number x_{n} in this interval with x_{n} not in f^{–1}(U). That is, |x_{n}–a|<1/n and f(x_{n}) is not in U. Since f(x_{n}) is not in U, it also can't be in the interval (w–ρ,w+ρ) because that interval is a subset of U. Therefore we know that |f(x_{n})–w|≥ρ.
Putting this all together, we have created a sequence {x_{n}} so that |x_{n}–a|<1/n. Therefore (Archimedean property) this sequence converges to a. But w=f(a), and |f(x_{n})–f(a)|≥ρ where ρ is some positive number. The sequence {f(x_{n})} can't converge to f(a)! And a must be false! We have verified if c is false, then a is false. Therefore a implies c.
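Here is the construction carried out on one concrete function, as a sketch. The step function f, the point a=0, and the choice x_{n}=–1/(2n) are my illustrative assumptions; the proof above only says such x_{n} exist.

```python
def f(x):
    """A step function, discontinuous at 0 (an illustrative choice)."""
    return 1.0 if x >= 0 else 0.0

# w = f(a), and U = (w - rho, w + rho) = (0.5, 1.5); f^{-1}(U) = [0, infinity) is not open.
a, w, rho = 0.0, 1.0, 0.5

def x_n(n):
    """A point within 1/n of a whose image is outside U."""
    return -1.0 / (2 * n)

for n in (1, 10, 1000):
    x = x_n(n)
    # First flag: x_n is within 1/n of a. Second flag: f(x_n) stays rho away from f(a).
    print(n, abs(x - a) < 1.0 / n, abs(f(x) - f(a)) >= rho)
```

The x_{n}'s converge to a while the f(x_{n})'s stay at distance 1 from f(a), so the sequential statement fails for this f.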
YOU TOO can try to prove one of the implications guaranteed by the theorem above. Tell me about it, or ask me about it!
So what people then did ...
Historically, people decided that statement c was a neat way of
studying ideas like continuity. It seemed to change inequalities and
complicated implications into "simple set theory" (well, the packaging
was good, but the difficulties are just hidden, not gone!). So a whole
industry was created to understand this sort of approach.
A metric
I defined metric as in the text, commenting that the most difficult
rule to verify was usually the triangle inequality. I gave two
examples. The most important example for Math 411 was the Euclidean
metric (square root of the sum of the squares of the differences in
the coordinates). The triangle inequality then is a consequence of the
Cauchy-Schwarz inequality. The silliest example of a metric is the
so-called discrete metric, where d(x,y) is 1 if x≠y and 0 if x=y.
Note that d(x,y) will play the role of |x–y| in arguments similar to
what we just went through.
More definitions
An open ball or open neighborhood was defined as in the
text: N_{r}(p) is the collection of x's in X with
d(x,p)<r. This plays the part of (p–r,p+r). The open balls in
R^{n} are "clear". In the discrete metric, an open ball of
radius≤1 is just 1 point. If the radius is >1, it is
everything.
An open set is one which contains an open ball centered at each
point in the set. The radius, r, of the ball will usually depend on
the point selected! The open sets for the discrete metric are all
subsets of the set!
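A brute-force sketch for the discrete metric on a three-element set (the set X and the helper ball are hypothetical, for illustration). It checks the triangle inequality over all triples and lists a few open balls; note that with the strict inequality d(x,p)<r, the ball of radius exactly 1 is still a single point.

```python
from itertools import product

def d(x, y):
    """The discrete metric."""
    return 0 if x == y else 1

def ball(X, p, r):
    """Open ball N_r(p) inside the finite set X."""
    return {x for x in X if d(x, p) < r}

X = {"a", "b", "c"}
triangle_ok = all(d(x, z) <= d(x, y) + d(y, z) for x, y, z in product(X, repeat=3))

print(triangle_ok)
print(ball(X, "a", 0.5), ball(X, "a", 1), ball(X, "a", 1.5))
```

Since every one-point set is an open ball, every subset is a union of open balls, which is another way to see that all subsets are open.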
So use these properties to define ...
A topology on a set X is a set of subsets of X, let's call it
Τ, with the following properties:
I'll give some more examples next time, and go through more of chapter 2.
I gave a good proof of the thing I messed up last time.
C
We discussed the complex numbers, C, as
R[i]/(i^{2}+1) and as R^{2} with vector addition and a
weird kind of multiplication and as 2 by 2 real matrices of a strange
kind (complex conjugation is realized as transpose of matrices in this
guise -- I was, as usual, incorrect!).
I verified a few of the elementary properties of complex numbers, as in the text. I then started to prove the Cauchy-Schwarz inequality. The method I used might have looked a bit different, but was actually the same as what was in the text. I considered the function f(t)=||A+tB||^{2} where A and B are vectors in C^{n}. We knew that f(t)≥0, so when I "expanded" this function using bilinearity, the discriminant of the resulting quadratic was non-positive. This is almost the Cauchy-Schwarz inequality, and I tried to suggest how it would verify the text's version of Cauchy-Schwarz by multiplying one of the vectors by the scalar e^{iθ} for suitable choice of θ (to make the appropriate dot product real and positive).
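The discriminant argument can be written out as follows. This is a sketch under assumptions that the notes do not state explicitly: t is a real variable, B≠0 (the case B=0 is trivial), and the inner product is ⟨A,B⟩=∑a_{j}·conj(b_{j}).

```latex
% Expand f for real t:
f(t) = \|A + tB\|^2 = \|A\|^2 + 2t\,\operatorname{Re}\langle A,B\rangle + t^2\|B\|^2 \;\ge\; 0 .
% A real quadratic that is never negative has non-positive discriminant:
4\bigl(\operatorname{Re}\langle A,B\rangle\bigr)^2 - 4\,\|A\|^2\|B\|^2 \;\le\; 0
\quad\Longrightarrow\quad
\bigl|\operatorname{Re}\langle A,B\rangle\bigr| \;\le\; \|A\|\,\|B\| .
% Rotate B: pick \theta with \langle A,B\rangle = e^{i\theta}\,|\langle A,B\rangle| . Then
\langle A, e^{i\theta}B\rangle = e^{-i\theta}\langle A,B\rangle = |\langle A,B\rangle| ,
\qquad \|e^{i\theta}B\| = \|B\| ,
% and applying the previous inequality to A and e^{i\theta}B gives
|\langle A,B\rangle| \;\le\; \|A\|\,\|B\| .
```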
R^{n}
I discussed enough of R^{n} to be sure that we could recognize
it as a metric space in a few days. In particular, I got versions of
Cauchy-Schwarz and the triangle inequality.
Sizes of sets (cardinality)
It will be useful to have this language as we discuss certain
examples. So I defined (with help!) the following terms:
So non-empty subsets of R which have upper bounds will always have least upper bounds, called sups. We verified a few standard simple consequences of completeness.
The Archimedean Property
If x>0 and y are real numbers, there is a positive integer N so
that Nx>y.
This was proved last time. To me there is a geometric content, that if
we lay out a "grating" of width x on the real line, we will always
"trap" y inside the grating. If you believe that the Archimedean
property is somehow intuitive then probably you have not met the long
line which locally looks just like R but is much, much longer. It
is irritatingly unintuitive.
The Archimedean property says that the real numbers can't contain infinities. Take reciprocals, and we see that infinitesimals can't exist either (a pity, because many analysts, me included, frequently think about proofs with them).
Corollary
If x>0, then there is a positive integer N so that 0<1/N<x.
I needed a simple inequality to get a multiplicative version of the
Archimedean property. In some textbooks this is called Bernoulli's
Inequality. Here it is:
If x>1 and n≥2, then x^{n}>1+n(x–1).
This can be proved directly using mathematical induction (verify for
n=2; assume for n, multiply both sides by x and juggle some algebra to
get the statement for n+1). Or if you know the Binomial Theorem, look
at the first two terms of the expansion of
x^{n}=(1+[x–1])^{n}.
A sort of multiplicative Archimedean property
If x>1 and y are real numbers, then there is a positive integer N
so that x^{N}>y.
This is true because we can select N≥2 so that 1+N(x–1)>y, and then
use the previous lemma.
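As a sketch of how the lemma produces N: pick N≥2 with 1+N(x–1)>y (ordinary Archimedean property applied to x–1>0), and Bernoulli's inequality then gives x^{N}>y. The function name power_exceeding is hypothetical, and exact fractions are used to avoid rounding questions.

```python
from fractions import Fraction
from math import floor

def power_exceeding(x, y):
    """Return N >= 2 with 1 + N*(x - 1) > y; Bernoulli then gives x**N > y."""
    # N > (y - 1)/(x - 1) is exactly the condition 1 + N*(x - 1) > y.
    return max(2, floor((y - 1) / (x - 1)) + 1)

x, y = Fraction(11, 10), Fraction(100)
N = power_exceeding(x, y)
print(N, 1 + N * (x - 1) > y, x ** N > y)
```

The N found this way is usually far larger than necessary; the point is only that some N exists.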
These observations will be used again and again.
Then I turned to more ambitious verifications. I said there were three statements which could be proved directly from completeness. Here is what they are, in order (to me!) of increasing difficulty and here is what we did (and will do). All of these statements could be quite easily proved once the machinery of one variable calculus has been developed, so these direct proofs, while they are all elementary, all involve some intricate contortions of logic.
Roots exist Given y>0 real and a positive integer n, there
is a unique x>0 so that x^{n}=y.
Proof I won't write out the details, since I basically copied
what was in the text except that I tried to show how a proof of this
result could be discovered. So we consider the set of real numbers,
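S={x: x^{n}<y}. The logical outline of the proof follows:
1. S is not empty.
2. S is bounded above, so s=sup S exists.
3. s^{n}<y is impossible.
4. s^{n}>y is impossible, so s^{n}=y.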
I proved 1 as in the text: take y/(1+y) which is a positive number less than 1 and less than y. Integer powers of numbers between 0 and 1 "shrink" so this provides an element of S.
I proved 2 as in the text: take 1+y, larger than both 1 and y. Integer powers increase, so 1+y is not in S, and, because x_{1}<x_{2} implies (x_{1})^{n}<(x_{2})^{n} we know that no number bigger than 1+y is in S, and therefore S is bounded above. Notice that the observation given also implies that n^{th} roots, if they exist, must be unique.
Then I tried 3. There is a complete proof, awesomely well-arranged, in the text, so I tried to understand how that proof might have been invented. Well, if s^{n}<y, we can try to kick "up" or increase s and get another number whose n^{th} power is still less than y. So let's look at s+h with h some "small" positive number to be determined. Well, notice that (s+h)^{n}≤s^{n}+STUFF. We'd like this to still be <y. So we need STUFF<y–s^{n}, and y–s^{n} is some random positive number. So how can we choose h? This is the origin of Rudin's specification. You could try to finish this yourself, please, with the book and notes closed. Here is the case n=3.
STUFF=h(3s^{2}+3sh+h^{2})≤ h[(3+3+1)s^{2}]=7s^{2}h if 0<h≤ s. So to get this less than y–s^{3}, take h positive with h≤ s which is also less than (y–s^{3})/[7s^{2}], and 3 is done for n=3.
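A minimal sketch of the n=3 "kick up" step with exact rational arithmetic. It assumes the bound 3s^{2}+3sh+h^{2}≤7s^{2}, valid for 0<h≤s; the function name kick_up and the factor 1/2 (which makes the inequality strict) are my own choices.

```python
from fractions import Fraction

def kick_up(s, y):
    """Given rational s > 0 with s**3 < y, return h > 0 with (s + h)**3 < y."""
    if not (s > 0 and s**3 < y):
        raise ValueError("need s > 0 and s**3 < y")
    # For 0 < h <= s: (s+h)^3 - s^3 = h(3s^2 + 3sh + h^2) <= 7 s^2 h.
    bound = (y - s**3) / (7 * s**2)
    # Halving keeps h strictly below the bound and at most s.
    return min(s, bound) / 2

s, y = Fraction(5, 4), Fraction(2)
h = kick_up(s, y)
assert h > 0
assert (s + h) ** 3 < y   # s + h is still in S, so s was not an upper bound
```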
4 is proved similarly.
I made a very bad mistake
What I did to prove 3 in class was wrong and I was too excited
and went too darn fast to notice. I am sorry. So what did I do and how
did my spurious argument go? I had s^{n}<y and wanted
h>0 so that (s+h)^{n}<y. Then somehow (stupidity!!!) I
used the true observation that
s^{n}<s^{n}+h^{n}<(s+h)^{n}
combined with selecting h so that s^{n}+h^{n}<y to
conclude that I had found a satisfactory s+h (one with
(s+h)^{n}<y). This is an inference which is NOT valid. So:
a<b<c and b<d does not necessarily imply that c<d (find
some numbers, if nothing else, and use that to convince yourself!).
I am sorry. The comments above for n=3 show how I should or could have correctly ended the proof.
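Some numbers that show the inference fails, chosen by me for illustration: s=1, h=1, n=2, y=3.

```python
s, h, n, y = 1, 1, 2, 3
a, b, c, d = s**n, s**n + h**n, (s + h) ** n, y

assert a < b < c      # s^n < s^n + h^n < (s+h)^n holds
assert b < d          # s^n + h^n < y holds
assert not (c < d)    # but (s+h)^n < y fails
```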
Problem 6
The existence of real powers of positive numbers was nicely discussed
by Ms. Slusky. Here is a
pdf she supplied for the class. I thank her for her work.
I did prove one easy consequence of completeness, the Archimedean property, and the further consequence that no positive real is smaller than 1/n for every positive integer n. The Archimedean proof would be an offense to constructivists, and the fact about 1/n would be false to folks who like infinitesimals (that probably includes most analysts!).
Maybe things will go better next time. It can't be worse. We will be treated to Ms. Slusky, coached by Ms. Sergel, giving a beautiful presentation of problem #6 on irrational powers. It's gotta be better than ... well, enough ...
My reasons for volunteering early
We investigated the statement
R is a complete ordered field
Field
A field is a set with two "operations", addition and
multiplication, each a function
from pairs of elements of the set to the set. Each operation is
commutative and associative, and each has an identity. Inverses also
exist (except that 0 doesn't have a multiplicative inverse otherwise
things would be very silly!). And there's a distributive law, relating
multiplication and addition.
With no further elaboration, there are many examples, highly varied, ranging from finite fields to "big" examples such as fields of rational functions. Field theory is rich enough to totally support such intellectual enterprises as linear algebra and investigating the relations between fields is an interesting and worthwhile part of algebra.
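As a small illustration of a finite field, here is a brute-force check for the integers mod m. This is my own example, not one worked in class. Commutativity, associativity and the distributive law are inherited from the integers, so the sketch only tests the one axiom that can fail: existence of multiplicative inverses for non-zero elements.

```python
def is_field_mod(m):
    """Check whether every nonzero class mod m has a multiplicative inverse."""
    elems = range(m)
    # m > 1 is required so that 0 and 1 are different elements.
    return m > 1 and all(
        any((a * b) % m == 1 for b in elems) for a in elems if a != 0
    )

assert is_field_mod(5)        # the integers mod 5 form a field
assert not is_field_mod(6)    # 2 has no inverse mod 6
```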
We rapidly worked through a few simple consequences of the field axioms (pp.6-7) such as –(–x)=x and 0x=0.
Ordered
An ordered set in the text is a set with a relation, <,
which is transitive (x<y and y<z implies x<z) and has
trichotomy (exactly one of these statements is true: x<y, x=y,
y<x). An ordered field is an ordered set which further obeys
these rules: if x<y then x+z<y+z, and x>0, y>0 implies
xy>0.
In other texts an ordered field is a field which contains a subset P (of "positive" elements) so that exactly one of these is true: x=0 or x is in P or –x is in P; and P is closed under addition and multiplication. We verified that our text's definition agreed with this one with x<y defined as y–x is in P. Also we checked a few simple consequences of the ordered field rules (p.8). It is true that an ordered field has characteristic 0. I didn't state this, but if you know what characteristic means, you can probably verify it.
An upper bound for a subset W of an ordered set is an element t so that for all w in W, either w<t or w=t (together written as w≤t). A least upper bound of a set W is an upper bound which is ≤ any other upper bound. Trichotomy implies that a least upper bound, if it exists, must be unique. Such an element will be called sup W. The dual notions are lower bound and inf, the greatest lower bound.
Non-empty finite subsets of an ordered set have sups and infs (they are actually maxes and mins). For a more relevant and interesting example we used Q and its subsets.
The rational numbers, Q
The text assumes known the ordered field of rational numbers, Q,
together with its properties and its representation as quotients of
integers. This information can be deduced from more primitive notions
(see some of the references) but we begin here in 411.
Q itself does not have an inf or a sup.
The important definition that I forgot (at first) in class is of the word complete. An ordered set is complete if every non-empty subset with an upper bound has a sup. The word "non-empty" is important, since the empty subset has many upper bounds but may not have a sup (check this for the empty subset of Q).
Q's major defect from the analysis point of view is that it is not
complete. The usual discussion involves "sqrt(2)". The reason for the
quotes is
There is no rational number whose square is 2.
You should know how to prove this. Does the proof work for 3? Does it
work for 4? Why or why not?
Notice that this statement does not necessarily imply that Q is not complete. A similar statement ("no rational number has square –1") is true for both Q and R, although R is complete. The distressing thing about the statement is that it shows the need for completeness if you want nice calculus. For example, look at the graph of y=x^{2}–2 in the "rational plane", Q^{2}. This graph violates the conclusion of the Intermediate Value Theorem!
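The failure of the Intermediate Value Theorem in Q can be watched with exact rational arithmetic. This is a sketch only: it checks 40 bisection steps (a number I picked), while the statement that x^{2}–2 is never 0 at a rational point is what needs the proof mentioned above.

```python
from fractions import Fraction

def f(x):
    return x * x - 2

lo, hi = Fraction(1), Fraction(2)
assert f(lo) < 0 < f(hi)          # sign change on the rational interval [1, 2]

for _ in range(40):
    mid = (lo + hi) / 2
    assert f(mid) != 0            # no rational root ever turns up
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid

# The sign change is squeezed into an interval of width 2**-40 with no root found.
assert f(lo) < 0 < f(hi)
assert hi - lo == Fraction(1, 2**40)
```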
Suppose C is the collection of rational numbers whose square is less than 2 or which are negative. This is a silly looking set, an infinite half-line. The reasons for considering this particular set will become clear later. Then:
• C is not empty (0 is in C).
• C is bounded above in Q (2 is an upper bound).
• C has no least upper bound in Q.
A major achievement, possible to do in a number of ways, is to show the following:
There is a complete ordered field which contains Q as a subfield.
This is a really nice fact. I will not go through the proof in detail, since we won't need any of these techniques later in this course. It is certainly a part of "mathematical literacy" to know something about these ideas, and similar ideas are used in other mathematical fields. I hope to show you enough so that you can see how clever it is, and also so that you can see the difficulties.
"Construction" of R from Q using Dedekind cuts
Our complete ordered field, which we will call the real numbers, R,
will be a collection of subsets of Q. This is a rather sophisticated
notion. The subsets are called Dedekind cuts or just cuts. So: a
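subset C of Q is a cut if
• C is not empty and C is not all of Q.
• If p is in C and q is a rational number with q<p, then q is in C.
• C has no largest element: if p is in C, then there is r in C with p<r.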
References
The text describes this whole setup, with proofs, in less than 5 pages
of the Appendix to Chapter 1. A more leisurely (25 pages!) discussion
is available in Spivak's Calculus book which you can borrow
from me.
I won't go through all details because that would probably take 3 or 4 lectures. I will try to do enough so that you can appreciate the cleverness and also understand the intricacy of certain verifications. I won't need the techniques of these proofs later in this course, but I will remark that knowing about the setup is certainly part of everyone's mathematical literacy, and also analogous constructions are used in other fields with great success.
Cuts and <
If C_{1} and C_{2} are cuts, then we define
C_{1}<C_{2} to mean: C_{1} is a proper
subset of C_{2}. (Cute!)
Completeness of the set of cuts
It turns out with all of these definitions, completeness is easy. Since this is what is usually quite difficult,
maybe the payoff is right here, upfront. So what do we do? Suppose
{C_{a}} for a in A is some non-empty set of cuts which is
bounded above by the cut D. This means that each C_{a} is a
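subset of D. Let C be the union of all the cuts C_{a}. I claim
that C is the sup of the set of C_{a}'s. We need to
verify:
• C is a cut.
• C is an upper bound of the C_{a}'s: each C_{a} is a subset of C.
• C is the least upper bound: if D is any upper bound, then C is a subset of D.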
This method makes verifying completeness easy. Some of the extensive details involving the algebra are difficult. Let me just discuss addition.
Addition of cuts
If C and C' are cuts, we will define a subset of Q to be called
C+C'. So z is in C+C' if z=x+y for some x in C and y in C'.
We now need to verify that C+C' is a cut. We also need to check
commutativity and associativity, identify and check an additive
identity, and look for additive inverses. I'll do some of this next
time.
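For the simplest cuts the definition of addition can be tested directly. Write C_{r} for the set of rationals less than a rational r (this notation is my assumption, not from class). Then C_{r}+C_{s} should be C_{r+s}: a sum x+y with x<r and y<s is less than r+s, and conversely any z<r+s splits as below. The helper name split is hypothetical.

```python
from fractions import Fraction

def split(z, r, s):
    """For rational z < r + s, return (x, y) with x < r, y < s and x + y == z."""
    if not z < r + s:
        raise ValueError("z is not in C_r + C_s")
    gap = (r + s - z) / 2     # positive; shave half the slack off each summand
    return r - gap, s - gap

r, s, z = Fraction(1, 2), Fraction(2, 3), Fraction(1)
x, y = split(z, r, s)
assert x < r and y < s and x + y == z
```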
Maintained by greenfie@math.rutgers.edu and last modified 9/5/2008.