# Lecture 3

\chapter{Affine Geometry}
\section{Oriented Lengths, Areas, Volumes}
A careful reader might have noted that we have cheated slightly in lecture 1: we formulated Ceva's theorem only for the case that three Cevians meet inside the triangle. Then we applied it in a situation when Cevians don't necessarily meet inside the triangle. For instance this is the case for three heights in an obtuse triangle.
\input{./chapter3/figObtuseTriangleHeights.tex}
Later, when we introduce the notion of quotient of oriented lengths of segments, we will refine the formulation of Ceva's theorem to include that case as well. However, so far we have not given a proof of Ceva's theorem that covers this case. (In fact the proof with centers of mass does work if we allow negative masses, but we did not emphasize that enough.) Let us see what the problem can be. In the proof of Ceva's theorem we used multiple times an identity of the following kind: if $C'$ is a point on the side $AB$ of triangle $ABC$, then $\area{ABC}=\area{AC'C}+\area{CC'B}$. It is correct if the point $C'$ is inside the segment $AB$, but it becomes wrong once the point $C'$ goes beyond the point $A$ or $B$ along the line $AB$. Indeed, in the picture below the following identity holds instead: $\area{ABC}=\area{AC'C}-\area{CC'B}$.
\input{./chapter3/figPointOutsideTriangle.tex}
Note that the area of triangle $CC'B$ must be taken with a minus sign. In this section we revisit the notions of length, area and volume to make them better behaved and easier to deal with.
We start by recalling what oriented length is on an oriented line. An oriented line is just a line with a choice of preferred direction on it (a one-way road is an example). On such a line we can measure the oriented distance from a point $A$ on the line to a point $B$ on the same line. It is the distance between $A$ and $B$ taken with the sign "+" if the direction from $A$ to $B$ agrees with the chosen direction along the line, and with the sign "-" otherwise (this makes perfect sense to a driver on a one-way road: negative distances have to be driven in reverse). One thing we used several times in proofs, but did not emphasize, is the following: while the equality of lengths $|AB|=|AC|+|CB|$ holds only for points $C$ inside the segment $AB$, the equality of signed lengths $AB=AC+CB$ is true for any choice of the point $C$ on the line.
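For instance, identify the oriented line with $R$ (an identification made only for this illustration) and take $A=0$, $B=3$, $C=5$, so that $C$ lies outside the segment $AB$. The oriented lengths are $AB=3$, $AC=5$ and $CB=3-5=-2$, and indeed $$AB=AC+CB=5+(-2)=3,$$ whereas for unoriented lengths $|AC|+|CB|=5+2=7\neq 3=|AB|$.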
Now we will introduce the notion of oriented area.
The notion of oriented area makes sense on a plane with a choice of one of the two possible orientations: clockwise or counterclockwise.
\input{./chapter3/figClock.tex}
Now we will say that oriented area of triangle $ABC$ is the area of triangle $ABC$ taken with sign "+" if in going from $A$ to $B$ to $C$ to $A$ one rotates in the direction that was chosen on the plane and sign "-" otherwise. The concept can be understood better by looking at a picture contrasting positive oriented area and negative one.
\input{./chapter3/figNegativeArea.tex}
Note that the sign of oriented area (as is also in the case of oriented length) depends on the order in which the vertices of the triangle are traversed: $\area{ABC}=\area{BCA}=\area{CAB}=-\area{ACB}=-\area{CBA}=-\area{BAC}$. For oriented areas it is always true that if $ABC$ is a triangle and $P$ is any point in the plane, then $\area{ABC}=\area{ABP}+\area{BCP}+\area{CAP}$.
Place for figure
For non-oriented areas this is the case only when the point $P$ is inside the triangle $ABC$. In other cases we should change the signs in the equation above.
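As an illustration, here is a small check of this identity in coordinates; the specific points are chosen only for the example. Let the orientation be counterclockwise and take $A=(0,0)$, $B=(1,0)$, $C=(0,1)$ and the point $P=(1,1)$, which lies outside the triangle $ABC$. The triangles $ABC$, $ABP$ and $CAP$ are traversed counterclockwise, while $BCP$ is traversed clockwise, and each of the four triangles has unoriented area $\frac12$. Hence $$\area{ABP}+\area{BCP}+\area{CAP}=\tfrac12-\tfrac12+\tfrac12=\tfrac12=\area{ABC},$$ while the sum of the three unoriented areas is $\tfrac32$.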
To define the oriented area of any closed polygon $A_1A_2\ldots A_n$ we can choose a point $P$ anywhere in the plane and define the area of $A_1A_2\ldots A_n$ to be the sum $\area{A_1A_2P}+\area{A_2A_3P}+...+\area{A_nA_1P}$. The reader is advised to check that this definition does not depend on the choice of the point $P$ in the plane.
In space we will consider one of the two orientations - right-handed and left-handed. Suppose we've chosen the right-handed orientation.
Space for figure.
Then to determine the sign of oriented volume of a tetrahedron $ABCD$ we can curve the index finger of our right hand along the direction of rotation from $A$ to $B$ to $C$ and then look at our thumb: if it points in the direction of the half-space containing the vertex $D$, then the volume is positive. Otherwise it will be negative.
Space for figure.
The identity $\mathrm{Vol}(ABCD)=\mathrm{Vol}(ABCP)+\mathrm{Vol}(PBCD)+\mathrm{Vol}(CDAP)+\mathrm{Vol}(PDAB)$ holds for any point $P$ in the space if the volumes are interpreted as oriented volumes.
Space for figure.
To discuss further properties of oriented volumes and their generalizations, we will have to recall what linear functions are. We recall the definitions in the next section. A reader who is comfortable with the material on linear functions may safely skip it.
\section{Vector spaces and linear functions}
Recall that if we choose an origin in a plane or space, we can identify points with the corresponding vectors from the origin. Once we do so, we get the possibility to add these vectors using the parallelogram rule and multiply them by real numbers (i.e. scale them). People have crystallized the essence of this situation into the definition of an abstract vector space.
According to the definition, a vector space over a field of scalars $K$ is a set with operations of addition (one can add any two vectors and get a vector) and multiplication by scalars (one can take any vector and multiply it by a scalar from the field $K$ to get another vector). These operations should satisfy some familiar properties that addition and scaling of two- or three-dimensional vectors satisfy (like $v+w=w+v, \lambda(v+w)=\lambda v+\lambda w$ etc).
In this lecture the field of scalars will be just the field of real numbers $R$, but in later lectures the fields $C$ (complex numbers) and $Q$ (rational numbers) will be also useful.
The main examples of vector spaces are the spaces $R^n$: for every natural number $n$ the space $R^n$ consists of $n$-tuples of real numbers $(a_1,\ldots,a_n)$ with operations of addition and multiplication by scalar defined coordinatewise ($(a_1,\ldots,a_n)+\lambda(b_1,\ldots,b_n)=(a_1+\lambda b_1,\ldots,a_n+\lambda b_n)$).
The space of vectors in the plane can be identified with $R^2$ once a basis is chosen, i.e. when we choose which two non-collinear vectors $e_1$ and $e_2$ will correspond to $(1,0)$ and $(0,1)$. Once these are chosen, any other vector can be represented as a linear combination $a_1 e_1+ a_2 e_2$ of these two ($a_1, a_2$ are some real numbers). Such a vector will be identified with the pair $(a_1,a_2)$. Similarly we can identify the space of spatial vectors with $R^3$ by choosing three non-coplanar vectors $e_1,e_2,e_3$ that will be identified with $(1,0,0),(0,1,0),(0,0,1)$.
Thus the vector space $R^n$ is the natural $n$-dimensional analogue of our notions of two- and three-dimensional spaces.
One notion that is important for the discussion of vector spaces is that of linear mapping: a mapping between vector spaces that preserves linear structure. More precisely, a function $f:V\to W$ between $K$-vector spaces $V,W$ is a linear mapping if $f(\lambda_1 v_1 + \lambda_2 v_2)=\lambda_1 f(v_1) + \lambda_2 f(v_2)$ for any $v_1,v_2 \in V$ and any $\lambda_1, \lambda_2$ in the field $K$.
The linear maps from the vector space $V$ to the field $K$ (considered as one-dimensional vector space) are called linear functionals.
Let's examine some examples of linear maps:
1. Rotations around the origin are linear maps. If we have a rectangular coordinate system in our space, then the rotations have the form $(x,y)\mapsto(\cos(\phi) x + \sin(\phi) y, -\sin(\phi) x + \cos(\phi) y)$.
Space for figure.
2. Reflections in an axis passing through the origin. For instance the reflection in the x-axis is given in coordinates as $(x,y)\mapsto(x,-y)$. Refer to exercise for representation in coordinates of a more general reflection.
3. Shearing in the direction of the x-axis, given in coordinates by $(x,y)\mapsto(x+\lambda y,y)$.
4. Scaling (aka homothety) with center at the origin. Scaling by factor $\lambda$ is given by the map $v\mapsto\lambda v$.
5. Stretching by factor $\lambda_1$ in the x-direction and factor $\lambda_2$ in the y-direction, given in coordinates by $(x,y)\mapsto(\lambda_1 x, \lambda_2 y)$.
6. Parallel projection along some direction onto an axis that passes through the origin. For instance the projection onto the line $x=y$ along the direction of the x-axis is given by $(x,y)\mapsto(y,y)$ (moving a point parallel to the x-axis does not change its $y$-coordinate).
Note that rotations preserve distances, angles and oriented areas; reflections preserve distances, but reverse the signs of angles and oriented areas; shearings preserve areas, but not distances or angles; homotheties preserve angles, but not distances or areas; and finally the last two kinds of mappings (stretchings with $\lambda_1\ne\lambda_2$ and projections) in general preserve none of the three.
One can represent linear maps between finite-dimensional spaces by means of matrices. Let $f:V\to W$ be any linear map and let $v_1,\ldots,v_n$ be a basis for the vector space $V$. Then any vector $v\in V$ can be uniquely represented as $v=\lambda_1 v_1 + \ldots + \lambda_n v_n$ for some scalars $\lambda_1,\ldots,\lambda_n \in K$. Hence for such $v$ we have $f(v)=\lambda_1 f(v_1) + \ldots + \lambda_n f(v_n)$. Thus the value of the function $f$ on any vector is determined by its values on the basis vectors $v_j$. Now if $w_1,\ldots,w_m$ is a basis for the vector space $W$, then each vector $f(v_j)$ can be expressed as a linear combination of the $w_i$: $f(v_j)=a_{1,j}w_1 + a_{2,j} w_2 + \ldots + a_{m,j}w_m$.
The scalars $a_{i,j}$ determine the linear function $f$ uniquely. Thus the linear mapping $f$ can be identified with the mapping $v\mapsto A\cdot v$, where $A$ is the matrix
$\begin{pmatrix} a_{1,1} & \ldots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \ldots & a_{m,n} \end{pmatrix}$
of the function $f$ relative to the bases $v_1,\ldots,v_n$ of $V$ and $w_1,\ldots,w_m$ of $W$.
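For example, here is how the matrix of the shearing $f(x,y)=(x+\lambda y,y)$ from the list of examples is obtained, taking $V=W=R^2$ with the standard basis $e_1=(1,0)$, $e_2=(0,1)$ in both copies (this choice of bases is an assumption of the example). We have $f(e_1)=(1,0)=1\cdot e_1+0\cdot e_2$ and $f(e_2)=(\lambda,1)=\lambda e_1+1\cdot e_2$, so $a_{1,1}=1$, $a_{2,1}=0$, $a_{1,2}=\lambda$, $a_{2,2}=1$ and $$A=\begin{pmatrix}1&\lambda\\0&1\end{pmatrix},\qquad A\cdot\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x+\lambda y\\y\end{pmatrix}.$$ Note that the $j$-th column of $A$ consists of the coordinates of $f(e_j)$.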
\section{Volume as multilinear function, determinants, Cramer's rule}
Recall that if we choose an orientation on the line $R$, we can measure the oriented length of any vector $OA$ from the origin to point $A$. This length is a linear function from $R$ to $R$. We want to explore what linearity properties the oriented areas and volumes have.
Let's start with oriented areas in $R^2$. For this suppose we've chosen an orientation of $R^2$, e.g. the counterclockwise one. Let $V(v_1,v_2)$ denote the oriented area of the parallelogram with vertices $0,v_1,v_1+v_2,v_2$. Let $v_1^\perp$ denote a unit length vector orthogonal to $v_1$ such that the basis $v_1,v_1^\perp$ is oriented positively. Then the area $V(v_1,v_2)$ is equal to the product of the (unoriented) length of the vector $v_1$ and the oriented length of the projection of $v_2$ to the line spanned by $v_1^\perp$ (the orientation on this line being given by demanding that the oriented length of $v_1^\perp$ is $+1$). This oriented length is equal to the scalar product $\langle v_1^\perp,v_2\rangle$ and as such it is a linear function of $v_2$.
What follows from this line of argument is that the function $V(v_1,v_2)$ is linear in the second parameter (if the first one is held fixed), i.e. $V(v_1,v_2+\lambda v_2')=V(v_1,v_2)+\lambda V(v_1,v_2')$. By a similar argument (or by using the identity $V(v_1,v_2)=-V(v_2,v_1)$) we can conclude that $V(v_1,v_2)$ is linear in the first parameter as well. Thus the oriented area is linear in each of its parameters. This seemingly simple observation allows us to express the area of a parallelogram explicitly in terms of the coordinates of its vertices. Namely, let $e_1,e_2$ be two basis vectors. Let $v_1=a_{11}e_1+a_{21}e_2$, $v_2=a_{12}e_1+a_{22}e_2$. Then $$V(v_1,v_2)=V(a_{11}e_1+a_{21}e_2,v_2)=a_{11}V(e_1,v_2)+a_{21}V(e_2,v_2)$$ $$=a_{11}V(e_1,a_{12}e_1+a_{22}e_2)+a_{21}V(e_2,a_{12}e_1+a_{22}e_2)$$ $$=a_{11}a_{12}V(e_1,e_1)+a_{11}a_{22}V(e_1,e_2)+a_{21}a_{12}V(e_2,e_1)+a_{21}a_{22}V(e_2,e_2)$$
Since $V(e_1,e_1)=0$, $V(e_2,e_2)=0$ (the parallelogram degenerates to a segment in these cases) and $V(e_2,e_1)=-V(e_1,e_2)$, we get that $V(a_{11}e_1+a_{21}e_2,a_{12}e_1+a_{22}e_2)=(a_{11}a_{22}-a_{12}a_{21})V(e_1,e_2)$.
The first factor in this expression is the determinant of the matrix $$\begin{pmatrix}a_{11} & a_{12} \\ a_{21} & a_{22}\end{pmatrix}$$ (one can think of this more invariantly as the determinant of the transformation sending the basis vectors $e_1,e_2$ to the vectors $v_1,v_2$).
The second factor is the oriented area of the parallelogram spanned by the vectors $e_1,e_2$. If we choose the vectors $e_1,e_2$ to be the vectors $(1,0)$ and $(0,1)$ in $R^2$, this area is just $+1$ (for the counterclockwise orientation).
So we have established a close relation between oriented areas of parallelograms and determinants of $2\times 2$ matrices.
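As a quick numerical illustration (the vectors are chosen arbitrarily), take the counterclockwise orientation, $e_1=(1,0)$, $e_2=(0,1)$, $v_1=(2,1)=2e_1+e_2$ and $v_2=(1,3)=e_1+3e_2$, so that $a_{11}=2$, $a_{21}=1$, $a_{12}=1$, $a_{22}=3$. Then $$V(v_1,v_2)=(a_{11}a_{22}-a_{12}a_{21})V(e_1,e_2)=(2\cdot 3-1\cdot 1)\cdot 1=5.$$ The sign is positive, in agreement with the fact that the rotation from $v_1$ to $v_2$ is counterclockwise; swapping the two vectors gives $V(v_2,v_1)=1\cdot 1-2\cdot 3=-5$.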
Let's see whether the same applies to volumes. Choose some orientation of the space. Let $V(v_1,v_2,v_3)$ denote the oriented volume of parallelepiped spanned by the vectors $v_1,v_2,v_3$.
Space for figure
Let $v_{12}^\perp$ denote a unit vector perpendicular to the plane of $v_1,v_2$ in the direction such that the basis $v_1,v_2,v_{12}^\perp$ is positively oriented. Then the volume $V(v_1,v_2,v_3)$ is equal to the product of the unoriented area of the parallelogram spanned by $v_1,v_2$ and the oriented length of the projection of $v_3$ to $v_{12}^\perp$. This length is equal to $\langle v_{12}^\perp,v_3\rangle$ and thus it is linear in the parameter $v_3$. Hence the volume itself is also linear in the parameter $v_3$. By using a similar argument (or using $V(v_1,v_2,v_3)=V(v_2,v_3,v_1)=V(v_3,v_1,v_2)$) we get that the volume $V(v_1,v_2,v_3)$ is linear in each of its parameters.
In the same manner as we did for areas, we can now write the volume of the parallelepiped spanned by the vectors $v_1,v_2,v_3$ in terms of the coordinates of these vectors with respect to a basis $e_1,e_2,e_3$. Indeed, if $v_j=a_{1,j}e_1+a_{2,j}e_2+a_{3,j}e_3$ for $j=1,2,3$, then $V(v_1,v_2,v_3)=V(\sum_{i}{a_{i,1}e_i},\sum_{j}{a_{j,2}e_j},\sum_{k}{a_{k,3}e_k})=\sum_{i,j,k}{a_{i,1}a_{j,2}a_{k,3}V(e_i,e_j,e_k)}$. Now if at least two indices among $i,j,k$ are equal, then $V(e_i,e_j,e_k)=0$ (since the parallelepiped degenerates to a parallelogram in this case). The remaining six volumes are related by $V(e_1,e_2,e_3)=V(e_2,e_3,e_1)=V(e_3,e_1,e_2)=-V(e_2,e_1,e_3)=-V(e_3,e_2,e_1)=-V(e_1,e_3,e_2)$. Thus $$V(v_1,v_2,v_3)=(a_{1,1}a_{2,2}a_{3,3}+a_{2,1}a_{3,2}a_{1,3}+a_{3,1}a_{1,2}a_{2,3}-a_{2,1}a_{1,2}a_{3,3}-a_{3,1}a_{2,2}a_{1,3}-a_{1,1}a_{3,2}a_{2,3})V(e_1,e_2,e_3),$$ or $V(v_1,v_2,v_3)=\det(A)V(e_1,e_2,e_3)$, where $A$ is the matrix $$\begin{pmatrix}a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3}\end{pmatrix}$$ (one can think instead of the transformation of $R^3$ to itself sending the basis vectors $e_1,e_2,e_3$ to the vectors $v_1,v_2,v_3$: the matrix of this transformation with respect to the basis $e_1,e_2,e_3$ is $A$).
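To see the formula at work on a case that can be checked by hand (the vectors below are chosen only for illustration), let $e_1,e_2,e_3$ be the standard basis of $R^3$, with the orientation chosen so that $V(e_1,e_2,e_3)=1$, and let $v_1=(1,0,0)$, $v_2=(1,2,0)$, $v_3=(1,1,3)$. Here $a_{i,j}$ is the $i$-th coordinate of $v_j$. The only non-zero product among the six is $a_{1,1}a_{2,2}a_{3,3}=1\cdot 2\cdot 3$, since each of the other five contains one of the zero entries $a_{2,1},a_{3,1},a_{3,2}$. Hence $$V(v_1,v_2,v_3)=6.$$ This agrees with the geometric description: the base parallelogram spanned by $v_1,v_2$ lies in the plane of $e_1,e_2$ and has area $1\cdot 2-1\cdot 0=2$, and the height of $v_3$ over this plane is $3$.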
The discussion above suggests two ways of extending the notion of volume to higher dimensional spaces. One is an inductive definition formulated in geometric terms: the volume of the $n$-dimensional parallelepiped spanned by vectors $v_1,\ldots,v_n$ is the $(n-1)$-dimensional volume of the "base" parallelepiped spanned by $v_1,\ldots,v_{n-1}$ multiplied by the "height", i.e. the length of the projection of $v_n$ onto the line orthogonal to the hyperplane spanned by $v_1,\ldots,v_{n-1}$, taken with the appropriate sign to account for orientations. The other way is to define the volume as the algebraic expression of the determinant of the matrix whose columns are the coordinates of the vectors $v_1,\ldots,v_n$ with respect to some fixed basis $e_1,\ldots,e_n$. As we've seen, in dimensions $2$ and $3$ the two definitions agree: determinants have the meaning of volumes and volumes can be computed as determinants.
\section{Cramer's rule}
Let $v$ be a vector in $R^3$ (we will write our computations in $R^3$, but the case of $R^n$ for any positive integer $n$ is no different). Let $e_1,e_2,e_3$ be a basis of $R^3$. Then we know that we can express $v$ as a linear combination of $e_1,e_2,e_3$: $v=\lambda_1 e_1+\lambda_2 e_2+\lambda_3 e_3$. How can we find these $\lambda$'s?
One way is by using the multilinearity property of volumes we've seen in previous section: we can expand the volume $V(v,e_2,e_3)$ as $V(\lambda_1 e_1+\lambda_2 e_2+\lambda_3 e_3,e_2,e_3)=\lambda_1 V(e_1,e_2,e_3)$. Then we get that $\lambda_1=\frac{V(v,e_2,e_3)}{V(e_1,e_2,e_3)}$. Similarly $\lambda_2=\frac{V(e_1,v,e_3)}{V(e_1,e_2,e_3)}$ and $\lambda_3=\frac{V(e_1,e_2,v)}{V(e_1,e_2,e_3)}$.
This is the geometrical meaning of the familiar Cramer's rule that expresses the solution of linear system of equations $A\cdot \mathbf{\lambda}=v$ using determinants. One just has to take the $e_i$'s from our previous discussion to be the columns of the matrix $A$.
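Here is the same computation carried out in $R^2$, where the role of the volume $V$ is played by the oriented area, given in standard coordinates by $V\big((a,c),(b,d)\big)=ad-bc$; the numbers are chosen only for illustration. Let $e_1=(2,1)$, $e_2=(1,3)$ and $v=(4,7)$. Then $V(e_1,e_2)=2\cdot 3-1\cdot 1=5$, $V(v,e_2)=4\cdot 3-1\cdot 7=5$ and $V(e_1,v)=2\cdot 7-4\cdot 1=10$, so $$\lambda_1=\frac{V(v,e_2)}{V(e_1,e_2)}=1,\qquad \lambda_2=\frac{V(e_1,v)}{V(e_1,e_2)}=2,$$ and indeed $1\cdot(2,1)+2\cdot(1,3)=(4,7)=v$.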
\section{Barycentric coordinates}
We will now combine the ideas of center of mass and linearity of volumes to get convenient coordinates in the plane naturally associated with a given triangle in it.
So let $ABC$ be a (non-degenerate) triangle. We've seen that the vector sum $\lambda_1 OA + \lambda_2 OB + \lambda_3 OC$ doesn't depend on the choice of origin $O$ provided that $\lambda_1+\lambda_2+\lambda_3=1$. We will write such a sum as $\lambda_1 A + \lambda_2 B+\lambda_3 C$ (a sort of linear combination of points $A$,$B$,$C$). Such a sum is always in the plane of triangle $ABC$ (because one can choose the origin $O$ to be in the plane $A,B,C$). We now ask the question: can an arbitrary point $P$ in the plane of $ABC$ be represented as a combination $\lambda_1 A + \lambda_2 B+\lambda_3 C$ for some real $\lambda_1,\lambda_2,\lambda_3$ with $\lambda_1+\lambda_2+\lambda_3=1$? If so, is the representation unique and what is the meaning of coefficients $\lambda_1,\lambda_2,\lambda_3$ in it?
So let $P$ denote a point in plane $ABC$. Choose the origin $O$ to lie outside the plane $ABC$, so that $OA, OB, OC$ are now three linearly independent vectors in three-dimensional space. Since three linearly independent vectors in $R^3$ form a basis, the vector $OP$ can be written uniquely as a linear combination of the vectors $OA, OB, OC$: $OP=\lambda_1 OA + \lambda_2 OB+\lambda_3 OC$. To see that in this representation $\lambda_1 + \lambda_2+\lambda_3=1$ we can project all the vectors onto a line passing through points $O,P$, the projection being in direction parallel to the plane $ABC$. Since $P,A,B,C$ lie in the plane along which we are projecting, they all map to the same point, say $K$. The vector $OK$ is non-zero, since $O$ doesn't lie in the plane $ABC$. Now we can apply our projection to the equality $OP=\lambda_1 OA + \lambda_2 OB+\lambda_3 OC$ and use that projection is a linear map to get that $OK=\lambda_1 OK + \lambda_2 OK+\lambda_3 OK$, or $1=\lambda_1 + \lambda_2 +\lambda_3$.
Moreover, we already know how to find the coefficients $\lambda_1,\lambda_2,\lambda_3$ using Cramer's rule: for instance $\lambda_1$ is equal to $\frac{V(OP,OB,OC)}{V(OA,OB,OC)}$. Now the volume of the parallelepiped spanned by the vectors $OP,OB,OC$ is equal to 6 times the volume of the tetrahedron $OPBC$ and similarly the volume $V(OA,OB,OC)$ is 6 times the volume of the tetrahedron $OABC$. These two tetrahedra share the same height to the face lying in the plane $ABC$, hence the ratio of their volumes is the same as the ratio of the areas of their bases lying in the plane $ABC$. Finally we get $\lambda_1=\frac{\area{PBC}}{\area{ABC}}$. Similarly $\lambda_2=\frac{\area{APC}}{\area{ABC}}$ and $\lambda_3=\frac{\area{ABP}}{\area{ABC}}$.
Exercise: put $O$ at $B$ and reprove the formulas.
Exercise: verify that the answer above satisfies $\lambda_1+\lambda_2+\lambda_3=1$.
Notice that when $\lambda_1+\lambda_2+\lambda_3=1$, the expression $\lambda_1 A+\lambda_2 B+\lambda_3 C$ is just the center of mass of point masses $\lambda_1,\lambda_2,\lambda_3$ put at points $A,B,C$ respectively.
To summarize, we've proved the following: for any point $P$ in the plane $ABC$ we can find a unique set of three masses $\lambda_1,\lambda_2,\lambda_3$ with total mass $\lambda_1+\lambda_2+\lambda_3$ equal to $1$, so that when we put them at the points $A,B,C$ the center of mass $\lambda_1 A+\lambda_2 B+\lambda_3 C$ coincides with $P$. These masses can be expressed as $\lambda_1=\frac{\area{PBC}}{\area{ABC}},\lambda_2=\frac{\area{APC}}{\area{ABC}},\lambda_3=\frac{\area{ABP}}{\area{ABC}}$, where the areas are oriented areas. These masses $\lambda_1,\lambda_2,\lambda_3$ are called the barycentric coordinates of the point $P$ (with respect to the triangle $ABC$).
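As a small worked example (the coordinates are chosen only for illustration), take the counterclockwise orientation and let $A=(0,0)$, $B=(1,0)$, $C=(0,1)$, $P=(1,1)$. The oriented areas are $\area{ABC}=\tfrac12$, $\area{PBC}=-\tfrac12$ (the triangle $PBC$ is traversed clockwise), $\area{APC}=\tfrac12$ and $\area{ABP}=\tfrac12$. Taking the ratios of oriented areas, with $P$ put in place of $A$, $B$, $C$ in turn, gives $$\lambda_1=\frac{\area{PBC}}{\area{ABC}}=-1,\qquad \lambda_2=\frac{\area{APC}}{\area{ABC}}=1,\qquad \lambda_3=\frac{\area{ABP}}{\area{ABC}}=1.$$ These sum to $1$, and indeed $-1\cdot A+1\cdot B+1\cdot C=(1,0)+(0,1)=(1,1)=P$. The negative mass at $A$ reflects the fact that $P$ and $A$ lie on opposite sides of the line $BC$.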
\section{Affine geometry}
We are now going to discuss a kind of geometry where we can start with bare minimum - knowing what points are collinear and what lines intersect. We shall see that even though notions of length and angles are not defined in this geometry, some remarkable theorems can still be proved.
The space that we will work with will be the usual space $R^n$. Consider the following group of transformations of $R^n$ to itself - the group of transformations $L_{A,b}$ given by $L_{A,b}(x)=A\cdot x + b$, where $A$ is some invertible matrix and $b$ is some vector. We will be interested in the question "what geometrical notions are invariant under all the transformations from this group?".
Let us see for example that the notion of a line is invariant. Every line can be parametrized as the set of points $\{P+\lambda v\mid\lambda \in R\}$, where $P$ is some point on the line and $v$ is a non-zero direction vector of this line. After applying the transformation $L_{A,b}$ to this set, we get the set $\{L_{A,b}(P+\lambda v)\mid\lambda \in R\}$, or, equivalently, $\{(A\cdot P + b)+\lambda(A\cdot v)\mid\lambda\in R \}$, which is of course the line passing through the point $A \cdot P + b$ with direction vector $A\cdot v$ (this vector is non-zero because $A$ is invertible).
Furthermore, the notion of parallel lines is preserved as well: all lines with direction vector $v$ go after application of transformations $L_{A,b}$ to lines in direction $A\cdot v$.
Another notion that is preserved is that of quotient of oriented lengths: indeed, suppose that $B_1,B_2,B_3$ are three collinear points on the line $\{P+\lambda v\}$. Let $B_1=P+\lambda_1 v$, $B_2=P+\lambda_2 v$, $B_3=P+\lambda_3 v$. Then the quotient $\frac{B_1B_2}{B_2B_3}$ is equal to $\frac{(P+\lambda_2 v)-(P+\lambda_1 v)}{(P+\lambda_3 v)-(P+\lambda_2 v)}=\frac{(\lambda_2-\lambda_1)v}{(\lambda_3-\lambda_2)v}=\frac{\lambda_2-\lambda_1}{\lambda_3-\lambda_2}$. If we apply $L_{A,b}$ to the points $B_1,B_2,B_3$ we get $L_{A,b}(B_i)=(A\cdot P+b)+\lambda_i A\cdot v$ and so the quotient of lengths $\frac{B_1B_2}{B_2B_3}$ gets mapped to $\frac{L_{A,b}(B_1)L_{A,b}(B_2)}{L_{A,b}(B_2)L_{A,b}(B_3)}=\frac{\lambda_2 A \cdot v-\lambda_1 A \cdot v}{\lambda_3 A \cdot v-\lambda_2 A \cdot v}=\frac{\lambda_2-\lambda_1}{\lambda_3-\lambda_2}$, i.e. to the same number.
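For a concrete check in $R^2$ (the matrix, vector and points are chosen only for illustration), take $$A=\begin{pmatrix}2&1\\0&1\end{pmatrix},\qquad b=\begin{pmatrix}1\\0\end{pmatrix},$$ and the collinear points $B_1=(0,0)$, $B_2=(1,1)$, $B_3=(3,3)$. Since $B_2-B_1=(1,1)$ and $B_3-B_2=(2,2)$, we have $\frac{B_1B_2}{B_2B_3}=\frac12$. The images are $L_{A,b}(B_1)=(1,0)$, $L_{A,b}(B_2)=(4,1)$, $L_{A,b}(B_3)=(10,3)$, and the differences $(3,1)$ and $(6,2)=2\cdot(3,1)$ again give the quotient $\frac12$. The lengths themselves do change: $|B_1B_2|=\sqrt2$, while the distance between the images is $\sqrt{10}$.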
Note however that many standard Euclidean notions, like lengths and angles, are not preserved by these transformations.
For instance we can map any non-degenerate planar triangle $ABC$ to any other non-degenerate triangle $A'B'C'$ by means of one of the maps $L_{M,b}$. Indeed, there is a unique matrix $M$ mapping the two linearly independent vectors $AB$ and $AC$ to the two vectors $A'B'$ and $A'C'$. If we choose now $b$ as the vector by which we have to translate $MA$ to get $A'$, the transformation $L_{M,b}$ will be the one we are looking for (exercise: verify that $L_{M,b}(B)=B'$).
Now it is time to discuss what geometry is in general.
\section{What is geometry?}
The question in the title is not as easy as it might sound and involves answering subquestions like: what spaces do we allow ourselves to study? What figures have geometric meaning? What is a geometric theorem? And so on. The answers should be general enough to include at least all the meaningful examples we have considered so far, but restrictive enough to exclude overly pathological examples. One way to make precise what we mean by "geometry" is to follow F. Klein's ideas from the Erlangen program: let $X$ be any set. This will be our space. Let $G$ be a group of maps from $X$ to itself*. This will be our group of transformations. Now we can define a geometric figure: consider all subsets of $X$ up to the following equivalence relation. Two subsets are called equivalent (sometimes simply referred to as "equal") if there is a transformation from the group $G$ mapping one of them onto the other.
* The term "group" has a precise abstract definition. Here, however, we only need the following definition: a group of transformations of the space $X$ is a set of invertible maps of $X$ to itself that is closed under composition and under taking inverses.
Equivalence classes of such subsets will be called "geometrical figures".
We, admittedly, made a definition too general to be useful - we allowed all kinds of "spaces" $X$ and all kinds of groups $G$. A meaningful definition would restrict these notions. We, instead, will consider some examples.
Example 1 - Euclidean geometry. The space $X$ from the definition of geometry will be $R^n$ and the group $G$ will be the group of distance-preserving maps of $R^n$ to itself. In this geometry all points are of course equivalent (one can use a translation to map any point in $R^n$ to any other), but two pairs of points are equivalent if and only if the distance between the two points of the first pair is the same as the distance between the two points of the other pair.
Example 2 - Affine geometry. The space $X$ is again $R^n$, but the group of transformations is now the group of affine transformations $\{x\mapsto M\cdot x + b\mid M$ is an invertible matrix, $b$ is a vector in $R^n\}$ (exercise: prove it is a group of transformations). In this geometry not only are all pairs of distinct points equivalent, but we have just seen that all non-degenerate triangles in the plane are equivalent as well. However, if one considers triples $A,B,C$ of collinear points, then the ratio of oriented lengths $\frac{AB}{BC}$ is invariant under affine transformations, so not all such triples are equivalent. So we are led to believe that in affine geometry the notion of ratio of lengths of collinear segments is a meaningful geometric object.
\section{A couple of theorems from affine geometry}
Some of the notions of Euclidean geometry, like "line", "collinearity", "concurrency" are also meaningful in affine geometry (i.e. are invariant under affine transformations). Many others, like "length", "volume", "angle", "circle" do not have any meaning in affine geometry. Instead one can deal with "quotient of oriented lengths", "quotient of oriented volumes", "ellipses" etc. These notions make sense in affine geometry. Later we will see another example of geometry - projective geometry - where even these do not make sense and instead we have to consider still different notions: "double ratios", "quadrics" etc.
Some theorems we have already seen are in fact of affine nature - their proper formulations use only affine notions and, moreover, affine proofs can be given. This is the case for Ceva's and Menelaus's theorems - they can be formulated in such a way that only the notions of collinearity/concurrency and quotients of oriented lengths of segments are used.
Note, however, that some of the corollaries of Ceva's and Menelaus's theorems we've seen in lecture 1 do not have affine analogues. Indeed, while the notion of medians in a triangle is affine, the notions of heights or angle bisectors simply do not exist in affine geometry.
Let's now prove another theorem of affine geometry. Let $E$ be an ellipse and consider a family of parallel lines that intersect $E$. For each line $l$ in the family consider the mid-point of the segment of $l$ inside the ellipse. Then all these midpoints lie on the same line.
Space for figure.
Proof: The notions of collinearity, midpoints of segments and parallel lines are all invariant under affine transformations. So instead of proving the original theorem, we can first apply an affine transformation and prove the theorem for the image we get under this transformation. We will now choose an affine transformation that maps the ellipse to a circle (such a transformation exists: we can for instance stretch the ellipse in the direction of one of its principal axes).
Now we have to prove the theorem in the case $E$ is a circle. But it is obvious now: all the midpoints lie on the diameter of the circle that is perpendicular to the family of parallel lines.
Space for figure.
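For a concrete instance of this argument (the ellipse and the direction are chosen only for illustration), let $E$ be the ellipse $\frac{x^2}{4}+y^2=1$ and consider the lines with direction vector $(2,1)$. The affine map $(x,y)\mapsto(u,v)=(x/2,y)$ sends $E$ to the unit circle $u^2+v^2=1$ and the direction $(2,1)$ to the direction $(1,1)$. The midpoints of the chords of the circle in direction $(1,1)$ lie on the perpendicular diameter $u+v=0$, and pulling this line back gives $\frac{x}{2}+y=0$. So the midpoints of the chords of $E$ with direction $(2,1)$ lie on the line $x+2y=0$. One can confirm this directly: substituting the point $(x_0+2t,y_0+t)$ into the equation of $E$ gives $$2t^2+(x_0+2y_0)t+\frac{x_0^2}{4}+y_0^2-1=0,$$ whose two roots sum to $-\frac{x_0+2y_0}{2}$. Hence the roots are symmetric about $t=0$, i.e. $(x_0,y_0)$ is the midpoint of the chord, exactly when $x_0+2y_0=0$.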
Exercise: Prove that if an ellipse is inscribed in a triangle, then the lines connecting the vertices to the points of tangency of the ellipse with the opposite sides of the triangle are concurrent.
\section{Why is affine geometry natural?}
In previous sections we've introduced the group of affine transformations of the form $x\mapsto Mx+b$ for some invertible matrix $M$ and vector $b$. While we've seen that considering such a group is rewarding in terms of theorems we can prove, the choice of this group seems to be quite artificial. In this section we will prove the following theorem, which explains that if one is interested in properties related only to collinearity, one is led to work with affine geometry.
Theorem: Let $f$ be an invertible function of $R^n$ to itself. Suppose that $f$ maps lines to lines (i.e. if $l$ is a line in $R^n$ then $f(l)$ is also a line) and that $n\ge 2$. Then $f$ is an affine transformation, i.e. there exists an invertible matrix $M$ and vector $b$ so that $f(x)=M x+b$ $\forall x\in R^n$.
Proof: We will prove this theorem for $n=2$ and refer the reader to exercise ... for the general case.
Let $O,A,B$ be three non-collinear points. Then the points $f(O),f(A),f(B)$ are also non-collinear (indeed, suppose the opposite is true. Then the lines $OA,OB$ must map to the line through $f(O),f(A),f(B)$. Then any two points $A',B'$ lying on the lines $OA,OB$ should be mapped to points on the line through $f(O),f(A),f(B)$. But then the line connecting them should also be mapped to the line through $f(O),f(A),f(B)$ and every point in the plane lies on such a line for some choice of points $A',B'$. This contradicts the assumption that $f$ is invertible).
If so, we can find an affine transformation $L$ mapping $f(O)$ to $O$, $f(A)$ to $A$ and $f(B)$ to $B$. Since affine transformations map lines to lines and are invertible, the composite mapping $\phi=L\circ f$ maps lines to lines, is invertible and fixes points $O,A,B$. We will prove below that any such map must be the identity map. Once we prove it, it will follow that $f=L^{-1}$, so $f$ is affine.
Suppose now that the function $\phi:R^2\to R^2$ is an invertible mapping that maps lines to lines and fixes three non-collinear points $O,A,B$. Since $\phi$ fixes $O$ and $A$, it maps the line through $O$ and $A$ to itself. We will identify this line with $R$ in such a way that $O$ is identified with $0\in R$ and $A$ with $1\in R$. Then when we restrict $\phi$ to this line, we get a map (we will call it $\phi$ as well) from $R$ to $R$ fixing $0$ and $1$. Let's prove a couple of properties of this mapping $\phi$:
Property 1: $\phi(X+Y)=\phi(X)+\phi(Y)$ for any $X,Y\in R$.
Proof: Parallel lines get mapped by $\phi$ to parallel lines (because in the Euclidean plane parallel lines can be defined simply as lines without any common points, so invertible functions that send lines to lines preserve this notion). Hence parallelograms get mapped to parallelograms. Now we can construct the point $X+Y$ from $X$ and $Y$ by the following procedure that involves only drawing parallelograms:
Draw the parallelogram $OXDB$ spanned by the vectors $OB$ and $OX$ (the point $D$ is just the name for the fourth vertex of this parallelogram). Now draw the parallelogram spanned by the vectors $BD$ and $BY$. Its fourth vertex is the point $X+Y$.
Now under $\phi$ the point $X+Y$ gets mapped to $\phi(X+Y)$ and the construction shows that it must get mapped to $\phi(X)+\phi(Y)$. Hence $\phi(X+Y)=\phi(X)+\phi(Y)$.
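To see in coordinates why this construction produces the sum, one can choose coordinates in which $O=(0,0)$, $A=(1,0)$ and $B=(0,1)$ (an assumption made for this check; such possibly oblique coordinates exist because $O,A,B$ are not collinear). Then $X=(x,0)$ and $Y=(y,0)$, the fourth vertex of the parallelogram $OXDB$ is $D=X+B=(x,1)$, and the fourth vertex of the parallelogram spanned by $BD$ and $BY$ is $$D+Y-B=(x,1)+(y,0)-(0,1)=(x+y,0),$$ which is the point of the line $OA$ corresponding to the number $x+y$.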
Property 2: $\phi(X\cdot Y)=\phi(X)\cdot \phi(Y)$ for any $X,Y\in R$.
Proof: We can construct the point $X\cdot Y$ by the following procedure involving only construction of parallel lines through points: construct line through $X$ that is parallel to the line $AB$. Let $D$ be the point of intersection of this line with line $OB$. Construct the line through $D$ that is parallel to $BY$. The point of intersection of this line with $OA$ is $X\cdot Y$.
Under the mapping $\phi$ the point $X\cdot Y$ gets mapped to $\phi(X\cdot Y)$ and, following the construction, to $\phi(X)\cdot\phi(Y)$. So $\phi(X\cdot Y)=\phi(X)\cdot \phi(Y)$.
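The construction of the product can also be checked in coordinates, again under the assumption that coordinates are chosen with $O=(0,0)$, $A=(1,0)$, $B=(0,1)$, so that $X=(x,0)$ and $Y=(y,0)$. The line $AB$ has direction $(-1,1)$, so the line through $X$ parallel to it consists of the points $(x-s,s)$ and meets the line $OB$ (where the first coordinate vanishes) at $D=(0,x)$. The line $BY$ has direction $Y-B=(y,-1)$, so the line through $D$ parallel to it consists of the points $$(0,x)+t(y,-1)=(ty,\,x-t),$$ and it meets the line $OA$ (where the second coordinate vanishes) at $t=x$, i.e. at the point $(xy,0)$, which corresponds to the number $x\cdot y$.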
Now properties $1$ and $2$ suffice to prove that $\phi:R\to R$ is the identity. Indeed, for any natural number $p$, $\phi(p)=\phi(1+1+\ldots+1)=p\phi(1)=p$. Hence for any pair of natural numbers $p,q$, $q\phi(p/q)=\phi(p/q)+\ldots+\phi(p/q)=\phi(q\cdot p/q)=\phi(p)=p$, i.e. $\phi(p/q)=p/q$. Also $\phi(-p/q)+\phi(p/q)=\phi(0)=0$, so $\phi(-p/q)=-p/q$. So $\phi$ fixes all rational numbers (note that so far we used only property 1). To prove that $\phi$ fixes all real numbers, we need some substitute for continuity; monotonicity will do. We first claim that if $x\in R$ is non-negative, then $\phi(x)$ is also non-negative. Indeed, if $x\ge 0$, then there is some $a\in R$ such that $x=a^2$. Then $\phi(x)=\phi(a\cdot a)=\phi(a)^2\ge 0$.
It follows that if $y\ge z$, then $\phi(y) \ge \phi(z)$: if $y\ge z$, then $y=z+x$ for some non-negative $x$, and then $\phi(y)=\phi(z)+\phi(x)\ge \phi(z)$.
Combining this property and the fact that $\phi$ fixes all rational numbers, we get that if $y$ lies between two rational numbers, then $\phi(y)$ lies between the same rational numbers. This is enough to guarantee that $\phi(y)=y$, since every real number can be approximated arbitrarily well from both sides by rational numbers.
By now we have shown that $\phi$ fixes every point on $OA$. Similarly it fixes every point on $OB$. Hence it fixes all points on lines through a point on $OA$ and a different point on $OB$, since such a line is mapped to the line through the same two points. But every point in the plane lies on such a line, so all points are fixed by $\phi$.