This document reviews the fundamentals of what we mean by “vector”. It is more of a review than an introduction or tutorial. It tries to avoid some of the misconceptions that often creep into unsophisticated discussions of vectors.
As can be seen in figure 1, we can classify vectors along two different axes: We can classify them as contravariant versus covariant, and we can classify them as arrays versus topological vectors.
In this section we focus on the distinction between arrays and topological vectors, i.e. the row-to-row distinction in figure 1. (We defer the main discussion of contravariant versus covariant until section 10.)
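A great number of difficulties stem from the fact that the word “vector” can refer to two different concepts: either an array of numbers, or a topological vector, i.e. an object whose significance does not depend on any choice of basis.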
It is important to keep track of which is which, and it is most unfortunate that the name “vector” has been applied to both.
Each of these concepts is tremendously useful. Ideally one should be able to use both concepts, leveraging one against the other.
It is very common terminology to say that the following equation expresses multiplying a vector by a scalar and then adding two vectors:
\[
2 \begin{bmatrix} 1 \\ -2 \end{bmatrix} + \begin{bmatrix} 5 \\ 3 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \end{bmatrix} \tag{1}
\]
To avoid confusion later, we will call this type of vector an array. This terminology is consistent with the usage in computer programming, where it is common to call a one-dimensional array a vector. Such an array is just a collection of elements, with no requirement that the collection have any particular geometrical, topological, or physical significance.
For present purposes, we restrict attention to arrays for which we have well-behaved methods for adding arrays (element by element) and multiplying arrays by scalars, satisfying the axioms that define what we mean by “vector space”, as set forth in (e.g.) reference 1.
That still doesn’t mean that there is any requirement for geometrical, topological, or physical significance. In particular, there is no requirement that such a “vector” exhibit invariance with respect to rotations or other transformations.
You can carry out the calculation in equation 1 without pretending it has any real-world significance. Just do the arithmetic: 2(1) + 5 = 7 and 2(−2) + 3 = −1.
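As a minimal sketch, the same arithmetic can be carried out on plain Python lists, treating an array as nothing more than an ordered collection of numbers. The helper names `scale` and `add` are illustrative choices, not taken from any library.

```python
def scale(c, u):
    """Multiply every element of the array u by the scalar c."""
    return [c * x for x in u]

def add(u, v):
    """Add two arrays element by element; they must have the same length."""
    if len(u) != len(v):
        raise ValueError("arrays must have the same length")
    return [x + y for x, y in zip(u, v)]

# Equation 1: multiply an array by a scalar, then add a second array.
result = add(scale(2, [1, -2]), [5, 3])
print(result)  # [7, -1]
```

Nothing in this calculation refers to a direction, a frame, or any physical meaning; it is pure element-by-element arithmetic.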
We can give an array a name such as
| B := | ⎡ ⎢ ⎣ |
| ⎤ ⎥ ⎦ | (2) |
which just defines B; it doesn’t tell us anything we didn’t already know.
Meanwhile, there is another kind of “vector”. To avoid ambiguity, I will call it a topological vector. It is tempting to call it a geometric vector, but that is not quite right, because as discussed in section 10 there are important physical situations where we have a topology but don’t have a geometry. A subset of topological vectors will be called physical vectors.
A topological vector is an object P that lives in some space, possibly ordinary three-dimensional position-space, or possibly some more abstract space. The topological vector has direct, inherent significance in that space. Topological vectors can be diagrammed as arrows or by contours, without reference to any basis, as shown in the lower row in figure 1.
Pointy vectors can be added using the tip-to-tail rule. A similarly direct, graphical rule can be used to add one-forms.
There is no unique way to expand a topological vector in terms of its components. That is, without some nontrivial additional information, we cannot assign any useful meaning to an expression of the form
\[
P = \begin{bmatrix} a \\ b \end{bmatrix} \qquad \text{(?allegedly?)} \tag{3}
\]
because we don’t know which observer’s basis is to be used. It would make incomparably more sense to write something like
\[
P = \begin{bmatrix} a \\ b \end{bmatrix}_{M} \tag{4}
\]
which means
\[
P = a \, e_x^{(M)} + b \, e_y^{(M)} \tag{5}
\]
where ex(M) and ey(M) are the basis vectors in Moe’s frame. The interesting thing is that we can equally well express P as:
\[
P = \begin{bmatrix} p \\ q \end{bmatrix}_{J} \tag{6}
\]
which means
\[
P = p \, e_x^{(J)} + q \, e_y^{(J)} \tag{7}
\]
where ex(J) and ey(J) are the basis vectors in Joe’s frame. Note that all four of these expressions (equation 4 through equation 7) are simultaneously valid descriptions of the same topological vector P, since a ex(M) + b ey(M) = p ex(J) + q ey(J).
Beware: You might be tempted to think that the symbol ex represents the x-component of the e vector. That’s what the notation seems to suggest, but it’s not necessary to think of it in those terms, especially in an introductory course. Instead, keep in mind that ex is an entire vector unto itself. The subscript x is part of the name of the vector, and indicates which member of the basis set we are talking about, namely the basis vector in the x-direction. The ex vector has components of its own; its x-component can be written as (ex)x.
It is possible to alleviate this problem by choosing different names for the basis vectors. One choice is to use {i, j, k} instead of {ex, ey, ez}. Some people like to decorate unit vectors by writing a hat over them, as in {î, ĵ, k̂}. Another version of this is {x̂, ŷ, ẑ}. To summarize:
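\[
\begin{array}{ccccccc}
e_x & \to & i & \text{or} & \hat{\imath} & \text{or} & \hat{x} \\
e_y & \to & j & \text{or} & \hat{\jmath} & \text{or} & \hat{y} \\
e_z & \to & k & \text{or} & \hat{k} & \text{or} & \hat{z}
\end{array} \tag{8}
\]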
However, I don’t recommend any of the substitutions in equation 8, for reasons to be discussed below. So let’s return to using {ex, ey, ez} to denote our chosen basis.
It is possible to think of e as some sort of higher-rank object, as a vector of vectors. This is not wrong, but it is not particularly helpful either. Especially in an introductory course, it is simpler to say that the subscript x is just part of the name of ex.
For reasons to be discussed below, in non-introductory courses, the coordinate names {x1, x2, x3} are found to be more practical than {ex, ey, ez}. That is,
| (9) |
The basis vectors are numbered accordingly. Also, in the Clifford-algebra literature, basis vectors are commonly denoted by γ rather than e. That is,
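\[
\begin{aligned}
e_x &\equiv e_1 \equiv \gamma_1 \\
e_y &\equiv e_2 \equiv \gamma_2 \\
e_z &\equiv e_3 \equiv \gamma_3
\end{aligned} \tag{10}
\]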
In equation 10, all three options for naming unit vectors have the same potential to cause confusion, because e1 looks like a 1-component just as much as ex looks like an x-component. Using γ instead of e isn’t a fundamental change, but reduces the conflict with other uses of the symbol e. For the rest of this document, we will use γ.
The practice of using subscripts to denote which basis vector is meant is well established in the math and physics community and is unlikely to change anytime soon. The obvious disadvantage is that it is a never-ending source of confusion for students. The practice does however have some significant advantages, such as allowing expressions such as equation 5 to be written more compactly:
| (11) |
where the last line uses the Einstein summation convention, i.e. implied summation over repeated indices.
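To make the compactness concrete, here is a sketch for a two-dimensional example. The names P^1 and P^2 for the array elements are an assumption made purely for illustration (the superscript is an index label, not an exponent), and the placement of the index is likewise an assumed convention:

```latex
\begin{align*}
P &= P^1 \gamma_1 + P^2 \gamma_2 \\
  &= \sum_{i} P^i \gamma_i \\
  &= P^i \gamma_i
\end{align*}
```

All three lines say the same thing. The summation sign in the second line is simply left implicit in the third, because the index i appears twice in the same term.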
Within the grand category of vectors that are just arrays of elements, we can distinguish basis vectors from other vectors. There is a more-or-less “natural” basis for arrays, namely the vectors
\[
\gamma_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \gamma_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{12}
\]
Within the grand category of topological vectors, the situation is much more complex. Any complete set of orthonormal vectors can be used as a basis. You can choose any basis set you like (but keep in mind that other folks may choose differently).
In figure 2, Joe’s reference frame is shown in blue, while Moe’s reference frame is shown in red. The topological vector P is shown by the heavy black arrow. It does not “belong” to either frame; it is a real topological object with its own independent existence. Joe looks at P from one viewpoint, while Moe looks at it from another viewpoint.
The crucial point is that P is not changed when we switch from one reference frame to another. The topological vector neither knows nor cares who – if anyone – is looking at it.
Even though numerically a is not equal to p, and b is not equal to q, equation 13 is in fact an equation, expressing the equality of two topological vectors:
\[
\begin{bmatrix} a \\ b \end{bmatrix}_{M} = \begin{bmatrix} p \\ q \end{bmatrix}_{J} \tag{13}
\]
This is like writing 2·12 = 3·8 ... it’s two expressions for the same thing.
In contrast, the corresponding arrays are not equal, since by definition arrays are equal if-and-only-if their corresponding components are equal:
\[
\begin{bmatrix} a \\ b \end{bmatrix} \neq \begin{bmatrix} p \\ q \end{bmatrix} \tag{14}
\]
as you can easily verify by using figure 2 to graphically evaluate p and q (the projections of the vector onto Joe’s axes) versus a and b (the projections of the vector onto Moe’s axes). You can also infer this by comparing equation 5 to equation 7.
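The same check can be sketched numerically. Everything specific here is an assumption for illustration: Joe's axes are taken to line up with the page, Moe's axes are taken to be rotated 30 degrees relative to Joe's, and the values of p and q are made up. The page coordinates serve only as scratch paper for doing the arithmetic.

```python
import math

def combine(c1, e1, c2, e2):
    """Return c1*e1 + c2*e2, with each basis vector given as an (x, y)
    pair in the coordinates of the page on which the figure is drawn."""
    return (c1 * e1[0] + c2 * e2[0], c1 * e1[1] + c2 * e2[1])

# Joe's basis vectors as drawn on the page (assumed aligned with the page).
ex_J, ey_J = (1.0, 0.0), (0.0, 1.0)

# Moe's basis vectors: assumed rotated 30 degrees relative to Joe's.
t = math.radians(30)
ex_M = (math.cos(t), math.sin(t))
ey_M = (-math.sin(t), math.cos(t))

# Hypothetical array elements of P in Joe's frame.
p, q = 2.0, 1.0

# Array elements of the same arrow in Moe's frame, found by projecting P
# onto Moe's axes. Because Joe's axes coincide with the page axes,
# (p, q) are also the page coordinates of P.
a = p * ex_M[0] + q * ex_M[1]
b = p * ey_M[0] + q * ey_M[1]

arrow_from_joe = combine(p, ex_J, q, ey_J)
arrow_from_moe = combine(a, ex_M, b, ey_M)

same_arrow = all(math.isclose(u, v)
                 for u, v in zip(arrow_from_joe, arrow_from_moe))
same_array = math.isclose(a, p) and math.isclose(b, q)
print(same_arrow, same_array)  # True False
```

The two expansions rebuild the same arrow, even though the two arrays of elements are different.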
All the fundamental laws of physics are independent of the choice of basis. That means it is possible (and indeed common) for the basis vectors to have no direct physical significance. Sometimes it is convenient to choose basis vectors that are related to the physically-relevant vectors, but this is by no means necessary.
If you see a law that seems to depend on the choice of basis, you should find a better way of expressing the law.
In some situations there may be a “conventional” basis, but the conventions are definitely context-dependent; they vary from situation to situation.
Note that all three of the foregoing examples are right-handed systems.
Note that when describing directions on paper, even when the paper is lying on a table, it is conventional to speak of the “vertical” direction as if the paper were hanging on a wall. This conflicts with almost all other uses of the word “vertical”, but that’s how it is.
People are very commonly confused about whether a drawing represents a basis vector or some other vector. This confusion afflicts students from high school to grad school inclusive.
Let’s consider the situation shown in figure 3. The basis vector γ1 points to the right, while the vector of interest F points to the left. The vector of interest is drawn to scale (including magnitude as well as direction) as an arrow pointing to the left. The vector of interest can be represented numerically as a negative number times the basis vector.
This does not mean that we have a negative amount of the leftward arrow! That would be double-counting the minus sign (and squaring the magnitude as well). The leftward arrow is complete unto itself.
Constructive suggestion: Always label your vectors so as to make it clear what is a basis vector and what is not. See section 9 for some related suggestions.
When I started graduate school, the very first homework assignment involved a vector pointing to the left. Every first-year student diagrammed it as an arrow pointing to the left, and quantified it as a negative number. The grader (a third-year grad student) marked every student wrong.
Some of us got out our torches and pitchforks and marched to the grader’s office to explain a few things, as shown in figure 4.
Although textbooks often leave it unsaid, current is a vector. The current in a wire is assumed to run along the wire, but that still leaves a question as to which of the two possible directions along the wire. It is a longstanding and necessary tradition that in circuit diagrams such as figure 5, we take I3 to be a scalar, and the corresponding physical current vector is I3 times a unit vector in the direction of the arrowhead shown on the diagram.
Introductory textbooks often disguise this issue because they can use 20/20 hindsight to arrange the arrowheads so that all (or nearly all) DC currents have positive components. In the real world of circuit analysis, even for DC circuits you often have to draw the diagram and define your variables before you know which way the current is going to be flowing … and for AC circuits talking about “the” direction of the current is pointless anyway.
To repeat: The convention does not say that the current flows in the direction of the arrowhead in the circuit diagram. If I3 is negative, the direction of current flow is opposite to the arrowhead. The arrowhead is just a basis vector. This is easy to remember because the arrowhead has no length associated with it; it’s just a disembodied arrowhead, not a complete arrow.
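A small sketch of this convention, with made-up numbers: the arrowhead is assumed to point in the +x direction of the page, and the value of I3 is a hypothetical result of solving the circuit.

```python
# Unit vector along the wire, in the direction of the arrowhead drawn on
# the diagram (assumed here to be the +x direction of the page).
arrowhead = (1.0, 0.0)

# Hypothetical solved value of the scalar I3, in amperes.
I3 = -0.25

# The physical current vector is the scalar times the arrowhead's unit vector.
current = (I3 * arrowhead[0], I3 * arrowhead[1])

# Dot the current with the arrowhead to see whether the current flows
# along the arrowhead or against it.
along = current[0] * arrowhead[0] + current[1] * arrowhead[1]
flows_against_arrowhead = along < 0
print(flows_against_arrowhead)  # True
```

The arrowhead plays the role of the basis vector, the scalar I3 carries the sign, and only their product is the current vector.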
In any situation where there is only one reference frame being used, the distinction between arrays and topological vectors is not very important.
However, in many situations – in physics as well as life in general – it is important to be able to see things from more than one point of view. The technique of switching from one reference frame to another has been standard procedure in physics since at least the time of Galileo.
The fundamental laws of physics can be written as topological vector equations, valid in any reference frame, and indeed valid even if you don’t have any reference frame at all. Examples include the second law of motion and the Maxwell equations, which are conventionally represented in terms of 3-dimensional topological vectors, so that the representation is manifestly invariant with respect to arbitrary rotations in 3-dimensional space.
Tangential remark: With a bit of extra work, these laws can be re-expressed in terms of 4-dimensional topological vectors, making manifest their invariance with respect to arbitrary rotations in spacetime – including boosts – as discussed in reference 2 ... but if you don’t know what this remark means, don’t worry about it.
The best use of topological vectors and arrays is to use them together, leveraging one against the other. Topological vectors are particularly useful at the conceptual and strategic level, to set up the outline of the calculation and carry out the major steps. Every so often along the way, expanding the topological vector as an array of components in a chosen basis is useful for evaluating this-or-that subexpression. An amusing example of this combined approach can be found in reference 3.
It is important to think clearly about both topological vectors and arrays. Alas, the fact that people heretofore have used the same term – “vector” – to refer to two different concepts makes it hard to think clearly about either one.
Students’ proficiency with topological vectors seems to be a regrettably non-monotonic function of their overall sophistication:
The problem is that all too commonly, their knowledge of components eclipses their understanding of topological vectors.
The thing we have been calling a topological vector is properly called a tensor; all our examples have been rank=1 tensors. Similarly, the thing we have been calling an array is properly called a matrix; all our examples have been rank=1 matrices, i.e. columnar matrices with N elements (as opposed to the more common square matrices with N×N elements). Therefore what we have been calling array elements could also be called matrix elements.
Tensors exist as geometric objects unto themselves, independent of their representation in terms of matrices in this-or-that basis. Tensors can be represented by matrices in much the same way as numbers can be represented by numerals.
We postponed calling topological vectors and arrays by their proper names (tensors and matrices) for pedagogical reasons. As the saying goes, learning proceeds from the known to the unknown. The important ideas in this paper do not require any prior knowledge of tensors or matrices.
Vectors in one dimension are a special case, because there is an isomorphism between the vectors and the scalars.
This causes problems in an introductory physics class, because the usual practice is to introduce vectors in connection with one-dimensional motion. It makes sense to keep things as simple as possible, hence D=1. On the other horn of the dilemma, it makes sense to develop good habits and avoid bad habits that will have to be unlearned later.
Suggestion: Avoid writing equations of the form
| (15) |
and especially avoid
| (16) |
because the LHS of these equations is a vector while the RHS is a scalar. Also, in equation 16, the “>” operator only exists for scalars. In D=1, such equations are arguably technically acceptable, but they are pedagogically unsound, because they blur the distinction between vectors and scalars.
Constructive suggestion: Better alternatives exist.
You can use this to keep track of the distinction between vectors and scalars. For example it allows you to write
| (17) |
where we have dropped the arrow from v in accordance with the no-decoration policy explained in section 9.
Possibly constructive suggestion: It is useful to distinguish:
Depending on context, the word “component” can refer to either an array element or to a projection, as follows:
For any topological vector V, this means:
Remember that the term “component” could mean either the scalar γx · V or the vector γx (γx · V). If you need to avoid ambiguity, you can avoid the term “component” entirely; if you mean “array element” say “array element” (or “matrix element”), and if you mean “projection” say “projection”.
Projections have the nice property that we can take projections in any direction, not just along some pre-ordained set of basis directions. In general, the projection operator in the q-direction is
\[
P_q(V) := \frac{q \, (q \cdot V)}{q \cdot q} \qquad \text{for any nonzero vector } q \tag{18}
\]
so that Pq(V) is the projection of V in the q-direction. This allows you to form projections in the direction of some q that is physically relevant to the problem. You don’t want to be restricted to a pre-ordained basis that typically has little or no physical significance.
As a rule, whenever you can formulate the problem topologically, using projections, you’re better off doing it that way, rather than introducing a basis and grinding out the array elements.
The denominator in equation 18 ensures that the projection operator works properly for any nonzero q, not just for unit vectors.
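A minimal sketch of such a projection, written as q (q · V) / (q · q). For the sake of having something to compute, the vectors are represented by arrays of elements in an assumed orthonormal basis; the function names are illustrative.

```python
def dot(u, v):
    """Dot product of two arrays of elements in an orthonormal basis."""
    return sum(x * y for x, y in zip(u, v))

def project(V, q):
    """Projection of V in the q-direction: q (q . V) / (q . q)."""
    qq = dot(q, q)
    if qq == 0:
        raise ValueError("q must be nonzero")
    s = dot(q, V) / qq
    return [s * x for x in q]

V = [3.0, 4.0]
print(project(V, [1.0, 0.0]))  # [3.0, 0.0]
print(project(V, [2.0, 0.0]))  # [3.0, 0.0]  (same direction, q not a unit vector)
print(project(V, [1.0, 1.0]))  # [3.5, 3.5]  (a direction that is not a basis direction)
```

The second call shows the effect of the denominator: rescaling q leaves the projection unchanged.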
Tangential remark: It is important to keep computers in their place. When computers work with vectors, they use matrix elements internally ... but so what? The physics is still in the arrows and contours, not in the matrix elements. By way of analogy, computers do arithmetic using binary internally, but that doesn’t mean we humans should switch to using binary for everyday purposes. I’ve never seen a 110111 MPH speed-limit sign. (I keep expecting some smart-aleck student to make one, but it hasn’t happened yet.)
Some textbooks are better about this than others. Some of them define vectors as arrows ... but then lapse into element language as soon as they start doing actual calculations. Fooey.
If you are in a hurry, you can judge a textbook according to how it defines the dot product.
A second quick way of checking a text involves looking to see if the term “projection operator” is in the index. Alas I don’t know of any general-physics text that passes this test. (If anybody knows of one, please tell us about it.)
A more thorough check of the text involves looking at the end-of-chapter problems to see if they involve geometric relationships between vectors, as opposed to grinding out matrix elements.
The cleanest way I know to define the dot product is to postulate the existence of three vectors in space (or four vectors in spacetime) for which we know the dot products. It is not necessary to assume these vectors are orthonormal, but without loss of generality we will do so, for convenience.
We postulate the existence of at least one set of basis vectors with the following properties:
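\[
\gamma_x \cdot \gamma_x = \gamma_y \cdot \gamma_y = \gamma_z \cdot \gamma_z = 1, \qquad
\gamma_x \cdot \gamma_y = \gamma_y \cdot \gamma_z = \gamma_z \cdot \gamma_x = 0 \tag{19}
\]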
which just says the basis is orthonormal. In spacetime, we generalize equation 19 as follows:
| (20) |
but if you aren’t doing relativity you can ignore equation 20; don’t worry about it.
We do not assume there is only one basis. Given any such basis, you can construct innumerable other bases by taking linear combinations.
In any case, we postulate that the dot product is bilinear ... which is the same as saying there is a distributive law, such that “dot” distributes over “plus” as follows:
\[
A \cdot (B + C) = A \cdot B + A \cdot C \tag{21}
\]
Given all that, you can calculate the dot product of any two vectors by expanding each vector in terms of the basis vectors in accordance with equation 5, redistributing the terms in accordance with equation 21, and then dotting the basis vectors using equation 19.
We can write any possible vector as a linear combination of the basis vectors. Then we can take the dot product of any two vectors by direct appeal to the axioms. In particular, as a corollary of this definition of dot product, suppose we have two spacelike vectors A and B that are known in terms of linear combinations of the basis vectors, namely
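\[
\begin{aligned}
A &= A_x \gamma_x + A_y \gamma_y + A_z \gamma_z \\
B &= B_x \gamma_x + B_y \gamma_y + B_z \gamma_z
\end{aligned} \tag{22}
\]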
then their dot product is
\[
A \cdot B = A_x B_x + A_y B_y + A_z B_z \qquad \text{(spacelike)} \tag{23}
\]
as you can verify by direct substitution and turning the crank. We emphasize that equation 23 is not the definition of dot product; it is merely a corollary, valid under certain conditions. (The corresponding expression for spacetime vectors has a minus sign in it, not all plus signs.)
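Turning the crank can be sketched as follows. The function expands both vectors in the basis, distributes, and looks up the postulated dot product of each pair of basis vectors in a table. The table name `gram`, the sample numbers, and the skewed basis are all hypothetical, chosen only to show that equation 23 is a corollary that holds when the basis is orthonormal.

```python
def dot_from_axioms(A, B, gram):
    """Dot product by bilinearity: sum over i, j of A[i] * B[j] * (e_i . e_j).

    A and B are arrays of elements in some basis; gram[i][j] holds the
    postulated dot product of basis vector i with basis vector j.
    """
    n = len(A)
    return sum(A[i] * B[j] * gram[i][j] for i in range(n) for j in range(n))

# Orthonormal basis: the table of basis dot products is the identity.
orthonormal = [[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1]]

A = [1, 2, 3]
B = [4, -5, 6]

general = dot_from_axioms(A, B, orthonormal)
shortcut = sum(a * b for a, b in zip(A, B))  # Ax*Bx + Ay*By + Az*Bz
print(general, shortcut)  # 12 12

# A hypothetical non-orthonormal basis: the axioms still apply, but the
# element-by-element shortcut no longer gives the dot product.
skewed = [[1, 1, 0],
          [1, 2, 0],
          [0, 0, 1]]
print(dot_from_axioms(A, B, skewed))  # 5
```

With the orthonormal table the general expansion collapses to the familiar sum of products. With the skewed table the same arrays of elements give a different dot product, because they describe different vectors.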
When we follow this approach, the dot product defines what we mean by angle. It also defines what we mean by length. This is important in abstract and/or unfamiliar spaces, where the notions of angle and length might not have been intuitively obvious. In particular, equation 19 has an elegant, simple, but very nontrivial extension to spacetime, as discussed in reference 2.
This approach (the axiomatic definition of dot product) reverses the idea in item (b), allowing us to define cos(θ) := A · B / |A| |B|.
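As a sketch of how length and angle follow once the dot product is in hand, the code below assumes the vectors are given as arrays of elements in an orthonormal basis. The function names are illustrative.

```python
import math

def dot(A, B):
    """Dot product of arrays of elements in an orthonormal basis."""
    return sum(a * b for a, b in zip(A, B))

def length(A):
    """Length defined via the dot product: |A| = sqrt(A . A)."""
    return math.sqrt(dot(A, A))

def angle(A, B):
    """Angle in radians, from cos(theta) = A . B / (|A| |B|)."""
    c = dot(A, B) / (length(A) * length(B))
    # Clamp to guard against roundoff pushing c slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, c)))

print(length([3.0, 4.0]))  # 5.0

# Perpendicular vectors: the angle is 90 degrees, up to roundoff.
theta = math.degrees(angle([1.0, 0.0], [0.0, 2.0]))
print(theta)
```

Neither function needs any notion of angle or length beyond what the dot product supplies.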
As discussed in reference 4, a name is not the same as an explanation. Do not expect the structure of a name or symbol to tell you everything you need to know. Most of what you need to know belongs in the legend. The name or symbol should allow you to look up the explanation in the legend.
The convention of using boldface to represent vectors fails both in handwritten notes and in ascii email. The convention of drawing an arrow atop the symbol fails in email.
The convention of using a decorated letter to represent a vector while the corresponding undecorated letter represents the magnitude of the vector is cute, but is not worth the trouble. If you want the magnitude of F, write |F| explicitly. The cost of writing |F| when you want the magnitude is infinitesimal compared to the cost of decorating F when you want the whole vector.
Perhaps most importantly: All schemes involving decorated vectors fail miserably in the context of Clifford algebra, aka geometric algebra, where some quantities have both a scalar piece and a vector piece. See reference 5. A rotor is an important example with a scalar piece and a bivector piece.
It is remarkable that the mathematical definition of “vector space” (as set forth in reference 1) does not include any mention of a dot product.
That means we can have vector spaces for which we have no notion of length and no notion of angle. An important physical example of this is thermodynamics. As discussed in reference 6, there is an abstract space – state space – where there are various “functions of state” including energy (E), entropy (S), enthalpy (H), pressure (P), temperature (T), volume (V), et cetera. The gradient vectors dE, dS, dH, dP, dT, dV, et cetera are well defined (usually if not always), but there is no way of knowing the angle between such vectors. (Occasionally somebody will assume that a certain pair of such vectors is orthogonal, but there is no advantage to making such an assumption. If you do the math right, any valid result that can be obtained with such an assumption can be obtained without it, in every case I’ve ever seen.)
Such vector spaces tend to come in pairs, pairing a space of pointy vectors with a space of one-forms. That’s a good thing, because either member of the pair by itself wouldn’t be very useful.
If you visualize a pointy vector as a little arrow with a “tip” and a “tail”, you absolutely should not visualize a 1-form the same way.
Suppose we want to visualize the gradient of some landscape. If you visualize the gradient as a pointy vector, it points uphill. In many cases, though, you are better off visualizing the gradient as a one-form, corresponding to contour lines that run across the slope.
You can judge the magnitude of the 1-form according to how closely packed the contour lines are. Closely-packed contours represent a large-magnitude 1-form. To say the same thing the other way, the spacing between contours is inversely related to the magnitude of the one-form.
Contour lines have the wonderful property that they behave properly under a change of coordinates: if you take a landscape such as the one in figure 1 and stretch it horizontally (keeping the altitudes the same) as shown in figure 7, the slopes become less. The contour lines on the corresponding topographic map spread out by the same stretch factor, as they should, to represent the lesser slope. In contrast, if you try to represent the gradient by pointy vectors, the representation is completely broken by a change in coordinates. As you stretch the map, the pointy vector doesn’t stretch; it has to get shorter to represent the lesser slope. If you want to represent a gradient, pointy vectors aren’t nearly so well-behaved as 1-forms; they aren’t attached to the real landscape the way contour lines are.
Of course, pointy vectors are needed also; they are appropriate for representing the location of one point relative to another in this landscape. These location vectors do stretch as they should when we stretch the map.
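The opposite behavior of the two kinds of object can be sketched with a single horizontal map coordinate. The stretch factor and the numerical values are hypothetical, chosen only for illustration.

```python
# One horizontal map coordinate; stretching the map multiplies it by s.
s = 2.0               # hypothetical stretch factor

# Before stretching (illustrative values).
displacement = 10.0   # location of one point relative to another (pointy vector)
slope = 0.25          # altitude change per unit map distance (one-form)

# After stretching the map horizontally, keeping the altitudes the same.
displacement_stretched = displacement * s   # gets bigger
slope_stretched = slope / s                 # gets smaller

# The altitude difference between the two points is the one-form applied
# to the pointy vector, and it is unchanged by the stretch.
rise_before = slope * displacement
rise_after = slope_stretched * displacement_stretched
print(rise_before, rise_after)  # 2.5 2.5
```

The displacement grows by the stretch factor and the slope shrinks by the same factor, so the altitude difference, which belongs to the real landscape, is unchanged.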
| | pointy vector | one-form |
| Example: | distance | slope |
| Represented by: | column vector | row vector |
| When we stretch the map: | gets bigger | gets smaller |
| Adjective: | contravariant | covariant |
| Dirac notation: | ket |⋯⟩ | bra ⟨⋯| |
See reference 7 for more about Dirac bra-ket notation.
It is useful to deal with vectors as objects unto themselves, i.e. vectors without components.
It is also useful to deal with arrays of components, i.e. matrix elements.
A valuable and easily-achievable goal is to be able to see things both ways. A skilled person should be able to switch from basis A to basis B to no basis at all and back again.
We should avoid the eclipse mentioned in item (b) in section 5. That is, we should teach people to use matrix elements while deepening – not lessening – their understanding of topological vectors as real, physical objects that have meaning independent of their elements, independent of any basis, and independent of any observers.