
Copyright © 2005–2010 jsd

*   The Laws of Thermodynamics

Thermodynamics is celebrated for its power, generality, and elegance. However, all too often, students are taught some sort of pseudo-thermodynamics that is infamously confusing, limited, and ugly. This document is an attempt to do better, i.e. to present the main ideas in a clean, simple, modern way.

The first law of thermodynamics is usually stated in a very unwise form.   We will see how to remedy this.

The second law is usually stated in a very unwise form.   We will see how to remedy this, too.

The so-called third law is a complete loser. It is beyond repair.   We will see that we can live without it just fine.

Many of the basic concepts and terminology (including heat, work, adiabatic, etc.) are usually given multiple mutually-inconsistent definitions.   We will see how to avoid the inconsistencies.

Many people remember the conventional “laws” of thermodynamics by reference to the following joke:¹

  0)   You have to play the game;
  1)   You can’t win;
  2)   You can’t break even, except on a very cold day; and
  3)   It doesn’t get that cold.

It is not optimal to formulate thermodynamics in terms of a short list of enumerated laws, but if you insist on having such a list, here it is, modernized and clarified as much as possible. Each law is stated first, and a comment follows it on the same line:

The zeroth law of thermodynamics tries to tell us that certain thermodynamical notions such as “temperature”, “equilibrium”, and “macroscopic state” make sense.   Sometimes these make sense, to a useful approximation … but not always. See section 3.

The first law of thermodynamics states that energy obeys a local conservation law.   This is true and important. See section 1.4.

The second law of thermodynamics states that entropy obeys a local law of paraconservation.   This is true and important. See section 2.

There is no third law of thermodynamics.   The conventional so-called third law alleges that the entropy of some things goes to zero as temperature goes to zero. This is never true, except perhaps in a few extraordinary, carefully-engineered situations. It is never important. See section 4.

To summarize the situation, we have two laws (#1 and #2) that are very powerful, reliable, and important (but often misstated and/or conflated with other notions) plus a grab-bag of many lesser laws that may or may not be important and indeed are not always true (although sometimes you can make them true by suitable engineering). What’s worse, there are many essential ideas that are not even hinted at in the aforementioned list, as discussed in section 5.

We will not confine our discussion to some small number of axiomatic “laws”. We will carefully formulate a first law and a second law, but will leave numerous other ideas un-numbered. The rationale for this is discussed in section 6.9.

The relationship of thermodynamics to other fields is indicated in figure 1. Mechanics and many other fields use the concept of energy without worrying very much about entropy. Meanwhile, information theory and many other fields use the concept of entropy without worrying very much about energy; for more on this see section 21. The hallmark of thermodynamics is that it uses both energy and entropy.

Figure 1: Thermodynamics, Based on Energy and Entropy

This document is also available in PDF format. You may find this advantageous if your browser has trouble displaying standard HTML math symbols.

0  Prerequisites, Goals, and Non-Goals

This section is meant to provide an overview. Therefore this section will not explain the ideas, but only mention the main ideas that will be explained later. If you want to go directly to the actual explanations, feel free to skip this section.

  1. There is an important distinction between fallacy and absurdity. An idea that makes wrong predictions every time is absurd, and is not dangerous, because nobody will pay any attention to it. The most dangerous ideas are the ones that are often correct or nearly correct, but then betray you at some critical moment.

    Most of the fallacies you see in thermo books are pernicious precisely because they are not absurd. They work OK some of the time, especially in simple “textbook” situations … but alas they do not work in general.

    The main goal here is to formulate the subject in a way that is less restricted and less deceptive. This makes it vastly more reliable in real-world situations, and forms a foundation for further learning.

    In some cases, key ideas can be reformulated so that they work just as well – and just as easily – in simple situations, while working vastly better in more-general situations. In other cases, we must be content with less-than-general results, but we will make them less deceptive by clarifying their limits of validity.

  2. We distinguish cramped thermodynamics from uncramped thermodynamics as shown in figure 2. See section 18 for more on this. There are some simple ideas such as specific heat capacity (or molar heat capacity) that can be developed within the limits of cramped thermodynamics, at the high-school level or even the pre-high-school level, and then extended to all of thermodynamics. The extension must be done carefully, as you can see from the fact that the energy capacity C_V is different from the enthalpy capacity C_P, yet both are widely (if not wisely) called the “heat” capacity.

    Figure 2: Cramped versus Uncramped Thermodynamics

    Alas there are some other ideas such as “heat content” that are attractive in the context of cramped thermodynamics but extremely deceptive if you try to extend them to uncramped situations.

  3. Uncramped thermodynamics has a certain irreducible amount of complexity. If you try to simplify it too much, you trivialize the whole subject, and you arrive at a result that wasn’t worth the trouble. When non-experts try to simplify the subject, they all-too-often throw the baby out with the bathwater.
  4. You can’t do thermodynamics without entropy. Entropy is defined in terms of statistics. As discussed in section 2, people who have some grasp of basic probability can understand entropy; those who don’t, can’t. This is part of the price of admission. If you need to brush up on probability, sooner is better than later.

    We do not define entropy in terms of energy, nor vice versa. We do not define either of them in terms of temperature. Entropy and energy are well defined even in situations where the temperature is unknown, undefinable, irrelevant, or zero.

  5. Uncramped thermodynamics is intrinsically multi-dimensional. Even the basic expression dE = − P dV + T dS involves five variables. To make sense of this requires partial derivatives. If you don’t understand how partial derivatives work, you’re not going to get very far.

    Furthermore, when using partial derivatives, we must not assume that “variables not mentioned are held constant”. That idea is a dirty trick that may work OK in some simple “textbook” situations, but causes chaos when applied to uncramped thermodynamics, even when applied to something as simple as the ideal gas law, as discussed in reference 1. The fundamental problem is that the various variables are not mutually orthogonal. Indeed, we cannot even define what “orthogonal” should mean, because there is no notion of length or angle in thermodynamic parameter-space. This is another reason why thermodynamics is intrinsically and irreducibly complicated.

    Uncramped thermodynamics is particularly intolerant of sloppiness, partly because it is so multi-dimensional, and partly because there is no notion of length or angle in thermodynamic parameter-space. Unfortunately, some thermo books are sloppy in the places where sloppiness is least tolerable.

    The usual math-textbook treatment of partial derivatives is dreadful. The standard notation for partial derivatives practically invites misinterpretation.

    Some fraction of this mess can be cleaned up just by being careful and not taking shortcuts. Also it may help to visualize partial derivatives using the methods presented in reference 2. Even more of the mess can be cleaned up using differential forms, i.e. exterior derivatives and such, as discussed in reference 3. This raises the price of admission somewhat, but not by much, and it’s worth it. Some expressions that seem mysterious in the usual textbook presentation become obviously correct, easy to interpret, and indeed easy to visualize when re-interpreted in terms of gradient vectors. On the other edge of the same sword, some other mysterious expressions are easily seen to be unreliable and highly deceptive.

  6. You must appreciate the fact that not every vector field is the gradient of some potential. Many things that non-experts wish were gradients are not gradients. You must get your head around this before proceeding. Study Escher’s “Waterfall” as discussed in reference 3 until you understand that the water there has no well-defined height. Even more to the point, study the RHS of figure 12 until you understand that there is no well-defined height function, i.e. no well-defined Q as a function of state. See also section 7.1.

    The term “inexact differential” is sometimes used in this connection, although the term is mildly misleading. We prefer the term ungrady. In any case, you must treat path-dependent integrals as path-dependent integrals, not as potentials, i.e. not as functions of state. See section 18 for more on this.

    To say the same thing another way, we will not express the first law as dE = dW + dQ or anything like that, even though it is traditional in some quarters to do so. For starters, although such an equation may be meaningful within the narrow context of cramped thermodynamics, it is provably not meaningful for uncramped thermodynamics, as discussed in section 7.1 and section 18. It is provably impossible for there to be any W and/or Q that satisfy such an equation when thermodynamic cycles are involved.

    Even in cramped situations where it might be possible to split E (and/or dE) into a thermal part and a non-thermal part, it is often unnecessary to do so. Often it works just as well (or better!) to use the unsplit energy, making a direct appeal to the conservation law, equation 3.

  7. Almost every newcomer to the field tries to apply ideas such as “thermal energy” or “heat content” to uncramped situations. It always almost works ... but it never really works. See section 18 for more on this.
  8. On the basis of history and etymology, you might think thermodynamics is all about heat, but it’s not. Not anymore. By way of analogy, there was a time when what we now call thermodynamics was all about phlogiston, but it’s not anymore. People wised up. They discovered that questions about phlogiston could be more conveniently and more precisely re-expressed in terms of oxygen and/or energy. More recently, it has been discovered that questions about heat can be more conveniently and more precisely re-expressed in terms of entropy and/or energy.

    Heat remains central to unsophisticated cramped thermodynamics, but the modern approach to uncramped thermodynamics focuses more on energy and entropy. Energy and entropy are always well defined, even in cases where heat is not.

    You can do thermodynamics without heat. You can even do quite a bit of thermodynamics without temperature. But you can’t do thermodynamics without energy and entropy.

    There are multiple mutually-inconsistent definitions of “heat” that are widely used – or you might say wildly used – as discussed in section 16.1. (This is markedly different from the situation with, say, entropy, where there is really only one idea, even if this one idea has multiple corollaries and applications.) There is no consensus as to “the” definition of heat, and no prospect of achieving consensus anytime soon. There is no need to achieve consensus about “heat”, because we already have consensus about entropy and energy, and that suffices quite nicely. Asking students to recite “the” definition of heat is worse than useless; it rewards rote regurgitation and punishes actual understanding of the subject.

  9. Our thermodynamics is not restricted to the study of ideal gases. Real thermodynamics has a vastly wider range of applicability, as discussed in section 21.
  10. Even in situations where the notion of “thermal energy” is well defined, we do not pretend that all thermal energy is kinetic; we recognize that random potential energy is important also. See section 8.4.3.
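To illustrate the warning in item 5 about “variables not mentioned are held constant”, here is a minimal numerical sketch. It assumes a monatomic ideal gas with n·R set to 1 (arbitrary units) and uses the standard result that T·V^(γ−1) is constant along a curve of constant entropy. The function names and the chosen state (T0, V0) are hypothetical, introduced only for this illustration. The point is that “the derivative of P with respect to T” has different values depending on what is held constant.

```python
# Assumption: monatomic ideal gas with n*R = 1, arbitrary units.
GAMMA = 5.0 / 3.0  # adiabatic exponent of a monatomic ideal gas


def pressure(T, V):
    """Ideal gas law with n*R = 1, i.e. P = T / V."""
    return T / V


def volume_on_isentrope(T, T0, V0):
    """Volume at temperature T on the constant-entropy curve through (T0, V0).

    Uses the fact that T * V**(GAMMA - 1) is constant along an isentrope.
    """
    return V0 * (T / T0) ** (-1.0 / (GAMMA - 1.0))


def dP_dT_constant_V(T0, V0, h=1e-3):
    """Central-difference estimate of dP/dT with V held constant."""
    return (pressure(T0 + h, V0) - pressure(T0 - h, V0)) / (2.0 * h)


def dP_dT_constant_S(T0, V0, h=1e-3):
    """Central-difference estimate of dP/dT with S held constant."""
    V_plus = volume_on_isentrope(T0 + h, T0, V0)
    V_minus = volume_on_isentrope(T0 - h, T0, V0)
    return (pressure(T0 + h, V_plus) - pressure(T0 - h, V_minus)) / (2.0 * h)


T0, V0 = 300.0, 2.0                 # hypothetical state
print(dP_dT_constant_V(T0, V0))     # close to 1/V0
print(dP_dT_constant_S(T0, V0))     # close to (GAMMA/(GAMMA-1))/V0, i.e. 2.5/V0
```

The two results differ by a factor of 2.5 at the same point in state space, so a bare “∂P/∂T” is ambiguous until we say what is being held constant.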


1  Energy

1.1  Preliminary Remarks

It is more important to understand energy than to define energy. We can and will define it (section 1.2), but the definition is not super-simple nor super-concise. The concept of energy is so fundamental that there is no point in looking for a concise definition in terms of anything more fundamental.

To say the same thing in slightly different words, we can achieve more understanding by focussing on what energy does, rather than worrying too much about what energy is.

Energy is as energy does.
     

The most important attributes of energy are its status as a conserved quantity and its connection to the laws of motion, as discussed in section 1.4.

1.2  Definition of Energy

Let’s start with some examples. Some well-understood examples of energy include the following:

gravitational energy:   E_g = m g h
kinetic energy:   E_K = ½ m v^2
Hookean spring energy:   E_sp = ½ k x^2
capacitive energy:   E_C = ½ C V^2
inductive energy:   E_L = ½ L I^2              (1)

In particular, if you need a starting-point for your understanding of energy, visualize a book on a high shelf. It has more energy than it would on a low shelf. Similarly a fast-moving book has more energy than it would at a lower speed.
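As a small numerical illustration of the book-on-a-shelf picture, the sketch below evaluates the gravitational and kinetic formulas from equation 1. The mass, shelf heights, and speeds are hypothetical values chosen only for this example, and g is taken as approximately 9.8 m/s^2.

```python
G = 9.8  # m/s^2, approximate gravitational acceleration near Earth's surface


def gravitational_energy(m, h, g=G):
    """E_g = m g h, with h measured from a chosen reference level."""
    return m * g * h


def kinetic_energy(m, v):
    """E_K = (1/2) m v^2, valid for speeds small compared to light."""
    return 0.5 * m * v ** 2


book_mass = 1.0  # kg (hypothetical)

# Same book, high shelf versus low shelf (heights in meters, hypothetical).
e_high = gravitational_energy(book_mass, 2.0)
e_low = gravitational_energy(book_mass, 0.5)

# Same book, fast versus slow (speeds in m/s, hypothetical).
e_fast = kinetic_energy(book_mass, 3.0)
e_slow = kinetic_energy(book_mass, 1.0)

print(e_high > e_low, e_fast > e_slow)
```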

The idea of conservation per se is well defined, as discussed in detail in reference 4. We use this as the second step in a recursive definition of energy. That is:

  1. Energy includes each of the examples itemized at the beginning of this section.
  2. Energy also includes anything that can be converted to or from previously-known types of energy in accordance with the law of conservation of energy.

This concludes our definition of energy.

1.3  More Remarks

The definition of energy (section 1.2) is recursive. That means we can pull our understanding of energy up by the bootstraps. We can identify new forms of energy as they come along, because they contribute to the conservation law in the same way as the already-known examples. This is the same basic idea as in reference 5.

Recursive is not the same as circular. A circular argument would be fallacious and useless ... but there are many examples of correct, well-accepted definitions that are recursive. Recursion is very commonly used in mathematics and computer science. For example, it is correct and convenient to define the factorial function so that

factorial(0) := 1      and 
factorial(N) := N factorial(N−1)   for all integers N>0
             (2)
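Equation 2 translates almost word for word into code. The following is a minimal sketch; the guard against negative arguments is an addition made here, since equation 2 only covers non-negative integers.

```python
def factorial(n: int) -> int:
    """Recursive factorial, following equation 2."""
    if n < 0:
        # Not covered by equation 2; rejected here as a safety check.
        raise ValueError("factorial is only defined here for integers n >= 0")
    if n == 0:
        return 1                      # base case: factorial(0) := 1
    return n * factorial(n - 1)       # recursive step for n > 0


print(factorial(5))
```

The recursion terminates because each call refers to a strictly smaller argument until it reaches the base case. That is what distinguishes a recursive definition from a circular one.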

As a more sophisticated example, have you ever wondered how mathematicians define the concept of integers? One very common approach is to define the positive integers via the Peano axioms. The details aren’t important, but the interesting point is that these axioms provide a recursive definition … not circular, just recursive. This is a precise, rigorous, formal definition.

This allows us to make another point: There are a lot of people who are able to count, even though they are not able to provide a concise definition of “integer” – and certainly not able to provide a non-recursive definition. By the same token, there are lots of people who have a rock-solid understanding of how energy behaves, even though they are not able to give a concise and/or non-recursive definition of “energy”.

Energy is somewhat abstract. There is no getting around that. You just have to get used to it – by accumulating experience, seeing how energy behaves in various situations. As abstractions go, energy is one of the easiest to understand, because it is so precise and well-behaved.

Tangential remark: The introductory examples of energy itemized in section 1.2 are only approximate, and are subject to various limitations. For example, the formula m g h is exceedingly accurate over laboratory lengthscales, but is not valid over cosmological lengthscales. Similarly the formula ½ m v^2 is exceedingly accurate when speeds are small compared to the speed of light, but not otherwise. These limitations do not interfere with our efforts to understand energy.

In non-relativistic physics, energy is a scalar. That means it is not associated with any direction in space. However, in special relativity, energy is not a Lorentz scalar; instead it is recognized as one component of the [energy, momentum] 4-vector, such that energy is associated with the timelike direction. For more on this, see reference 6. To say the same thing in other words, the energy is invariant with respect to spacelike rotations, but not invariant with respect to boosts.
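The statement about rotations and boosts can be checked numerically. The sketch below uses units where c = 1 and represents the [energy, momentum] 4-vector as a plain tuple (E, px, py, pz). The function names are hypothetical, and the boost is the standard Lorentz boost along the x axis.

```python
import math


def rotate_z(p4, angle):
    """Spacelike rotation about the z axis; the energy component is untouched."""
    E, px, py, pz = p4
    c, s = math.cos(angle), math.sin(angle)
    return (E, c * px - s * py, s * px + c * py, pz)


def boost_x(p4, v):
    """Lorentz boost along x with velocity v (units where c = 1, |v| < 1)."""
    E, px, py, pz = p4
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (E - v * px), gamma * (px - v * E), py, pz)


def mass_squared(p4):
    """The Lorentz-invariant combination E^2 - |p|^2."""
    E, px, py, pz = p4
    return E * E - px * px - py * py - pz * pz


p = (1.25, 0.75, 0.0, 0.0)           # hypothetical particle with unit mass
rotated = rotate_z(p, math.pi / 2)
boosted = boost_x(p, 0.6)

print(rotated[0], boosted[0])        # energy before/after a rotation vs. a boost
print(mass_squared(p), mass_squared(boosted))
```

The energy survives the rotation unchanged but not the boost, whereas E^2 − |p|^2 is the same in both frames. For this particular choice of p and v, the boost takes us to the rest frame of the particle.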

We will denote the energy by E. We will denote various sub-categories of energy by putting subscripts on the E, unless the context makes subscripts unnecessary. Sometimes it is convenient to use U instead of E to denote energy, especially in situations where we want to use E to denote the electric field. Some thermodynamics books state the first law in terms of U, but it means the same thing as E. We will use E throughout this document.

Beware of attaching qualifiers to the concept of energy. Note the following contrast:

The symbol E denotes “the” energy of the system we are considering. If you feel obliged to attach some sort of additional words, you can call E the “system” energy or the “plain old” energy. This doesn’t change the meaning.

Most other qualifiers change the meaning. There is an important conceptual point here: “The” energy is conserved, but the various sub-categories of energy are not separately conserved. For example: The “internal” energy is not necessarily conserved, as discussed in section 14.1. Similarly, the “available” energy is not necessarily conserved, as discussed in section 1.5.

Associated with the foregoing conceptual point there is a point of terminology: E does not denote “internal” energy. It does not denote “available” energy.

Note: If you want to calculate the total energy of the system by summing the various categories of energy, beware that the categories overlap, so you need to be super-careful not to double count any of the contributions. For example, if you have a macroscopic notion of “thermal energy” and also understand “thermal energy” in terms of microscopic kinetic and potential energy, you must count either the macroscopic or microscopic description, not both. Another example that illustrates the same point concerns the rest energy, E_0, which is related to mass via Einstein’s equation² E_0 = m c^2. You can describe the binding energy of a particle in terms of its internal kinetic energy and potential energy, or in terms of the mass deficit, but you must not add both descriptions together; that would be double-counting.

1.4  Conservation of Energy

The first law of thermodynamics states that energy obeys a local conservation law.

By this we mean something very specific:

Any decrease in the amount of energy in a given region of space must be exactly balanced by a simultaneous increase in the amount of energy in an adjacent region of space.

Note the adjectives “simultaneous” and “adjacent”. The laws of physics do not permit energy to disappear now and reappear later. Similarly the laws do not permit energy to disappear from here and reappear at some distant place. Energy is conserved right here, right now.

It is usually possible³ to observe and measure the physical processes whereby energy is transported from one region to the next. This allows us to express the energy-conservation law as an equation:

change(energy inside boundary) =  − flow(energy, outward across boundary)              (3)

The word “flow” in this expression has a very precise technical meaning, closely corresponding to one of the meanings it has in everyday life. See reference 4 for the details on this.

There is also a global law of conservation of energy: The total energy in the universe cannot change. The local law implies the global law but not conversely. The global law is interesting, but not nearly as useful as the local law, for the following reason: suppose I were to observe that some energy has vanished from my laboratory. It would do me no good to have a global law that asserts that a corresponding amount of energy has appeared “somewhere” else in the universe. There is no way of checking that assertion, so I would not know and not care whether energy was being globally conserved.⁴ Also it would be very hard to reconcile a non-local law with the requirements of special relativity.

As discussed in reference 4, there is an important distinction between the notion of conservation and the notion of constancy. Local conservation of energy says that the energy in a region is constant except insofar as energy flows across the boundary.
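Equation 3 can be made concrete with a toy model. The sketch below assumes a one-dimensional chain of cells in which energy can only move between adjacent cells. The cell energies and flows are hypothetical numbers, and the function names are introduced only for this illustration.

```python
def step(energy, flows):
    """Advance one time step.

    energy[i] is the energy in cell i.
    flows[i] is the energy moved from cell i to cell i+1 during the step
    (negative values mean energy moves from cell i+1 to cell i).
    """
    new = list(energy)
    for i, f in enumerate(flows):
        new[i] -= f        # energy leaves cell i ...
        new[i + 1] += f    # ... and simultaneously arrives in the adjacent cell
    return new


def outward_flow(flows, a, b):
    """Net energy flowing outward across the boundary of the region
    made of cells a, a+1, ..., b-1."""
    n_cells = len(flows) + 1
    right = flows[b - 1] if b < n_cells else 0.0   # out through the right edge
    left = -flows[a - 1] if a > 0 else 0.0         # out through the left edge
    return right + left


before = [5.0, 3.0, 1.0, 0.0]
flows = [1.0, 0.5, 0.25]
after = step(before, flows)

a, b = 1, 3  # region consisting of cells 1 and 2
change = sum(after[a:b]) - sum(before[a:b])
print(change, -outward_flow(flows, a, b))   # the two sides of equation 3
print(sum(before), sum(after))              # the global total is unchanged
```

The energy in the region is not constant, yet it is conserved: the change is exactly accounted for by the flow across the region's boundary.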

1.5  Energy versus “Capacity to do Work” or “Available Energy”

Non-experts sometimes try to define energy as “the capacity to do work”. This notion of “available energy” is useful for some purposes, as discussed in section 1.6, but it would be a terrible mistake to confuse “available energy” with the real physical energy. Alas, this mistake is very common. See section 13.5 for additional discussion of this point.

Any attempt to define energy in terms of “capacity to do work” would be inconsistent with thermodynamics, as we see from the following examples:

#1: Consider an isolated system containing a hot potato, a cold potato, a tiny heat engine, and nothing else, as illustrated in figure 3. This system has some energy and some ability to do work.

#2: Contrast that with a system that is just the same, except that it has two hot potatoes (and no cold potato).

The second system has more energy but less ability to do work.

Figure 3: Two Potatoes + Heat Engine
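The two-potato comparison can be put into numbers under some simplifying assumptions that are not part of the text above: each potato has the same constant heat capacity C, and the heat engine is ideal and reversible. In that case the total entropy of the two potatoes is unchanged while the engine runs, which forces both to end at the temperature sqrt(T1·T2). The work extracted is the energy the potatoes have lost. The numerical values below are hypothetical.

```python
import math


def max_work(C, T1, T2):
    """Maximum work from two bodies of equal, constant heat capacity C.

    Assumes an ideal reversible engine, so the bodies end at the common
    temperature sqrt(T1 * T2); the work equals the energy they give up.
    """
    T_final = math.sqrt(T1 * T2)
    return C * (T1 + T2 - 2.0 * T_final)


def energy_relative_to(C, T1, T2, T_ref):
    """Energy of the pair, measured relative to both bodies being at T_ref."""
    return C * (T1 - T_ref) + C * (T2 - T_ref)


C = 1000.0                    # J/K (hypothetical)
T_HOT, T_COLD = 400.0, 300.0  # K (hypothetical)

# System #1: one hot potato and one cold potato.
energy_1 = energy_relative_to(C, T_HOT, T_COLD, T_COLD)
work_1 = max_work(C, T_HOT, T_COLD)

# System #2: two hot potatoes.
energy_2 = energy_relative_to(C, T_HOT, T_HOT, T_COLD)
work_2 = max_work(C, T_HOT, T_HOT)

print(energy_1, work_1)
print(energy_2, work_2)
```

Under these assumptions the second system has more energy than the first, yet the engine can extract no work from it at all.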

This sheds an interesting side-light on the energy-conservation law. As with most laws of physics, this law, by itself, does not tell you what will happen; it only tells you what cannot happen: you cannot have any process that fails to conserve energy. To say the same thing another way: if something is prohibited by the energy-conservation law, the prohibition is absolute, whereas if something is permitted by the energy-conservation law, the permission is conditional, conditioned on compliance with all the other laws of physics. In particular, as discussed in section 8.2, you can freely convert all the “non-thermal” energy of two rapidly-spinning flywheels to microscopic “thermal” energy, but not the reverse. The reverse would be perfectly consistent with energy conservation, but is forbidden on other grounds (namely the second law of thermodynamics, as discussed in section 2).

Let’s be clear: work can be converted to any other form of energy, but the converse is not true; not every form of energy can be used to do work.

Equating energy with doable work is just not correct. (In contrast, it might be OK to connect energy with some previously-done work, as opposed to doable work. That is not always convenient or helpful, but at least it doesn’t contradict the second law of thermodynamics.)

Some people wonder whether the example given above (the two-potato engine) is invalid because it involves closed systems, not interacting with the surrounding environment. Well, the example is perfectly valid, but to clarify the point we can consider another example (due to Logan McCarty):

#1: Consider a system consisting of a room-temperature potato, a cold potato, and a tiny heat engine. This system has some energy and some ability to do work.

#2: Contrast that with a system that is just the same, except that it has two room-temperature potatoes.

The second system has more energy but less ability to do work in the ordinary room-temperature environment.

In some impractical theoretical sense, you might be able to define the energy of a system as the amount of work the system would be able to do if it were in contact with an unlimited heat-sink at low temperature (arbitrarily close to absolute zero). That’s quite impractical because no such heat-sink is available. If it were available, many of the basic ideas of thermodynamics would become irrelevant.

As yet another example, consider the system shown in figure 4. The boundary of the overall “system” is shown as a heavy black line. The system is thermally insulated from its surroundings. The system contains a battery (outlined with a red dashed line), a motor, and a switch. Internal to the battery is a small series resistance R1 and a large shunt resistance R2. The motor drives a thermally-insulated shaft, so that the system can do mechanical work on its surroundings.

By closing the switch, we can get the system to perform work on its surroundings by means of the shaft.

On the other hand, if we just wait a moderately long time, the leakage resistor R2 will discharge the battery. This does not change the system’s energy (i.e. the energy within the boundary of the system) … but it greatly decreases the capacity to do work.

This can be seen as analogous to the NMR τ_2 process. An analogous mechanical system is discussed in section 10.5.4. All these examples share a common feature, namely a change in entropy with no change in energy.

Figure 4: Capacity to do Work

To remove any vestige of ambiguity, imagine that the system was initially far below ambient temperature, so that the Joule heating in the resistor brings the system closer to ambient temperature. See reference 7 for Joule’s classic paper on electrical heating.

To repeat: In real-world situations, energy is not the same as “available energy” i.e. the capacity to do work.

What’s worse, any measure of “available” energy is not a function of state. Consider again the two-potato system shown in figure 3. Suppose you know the state of the left-side potato, including its energy E1, its temperature T1, its entropy S1, its mass m1, its volume V1, its free energy F1, and its free enthalpy G1. That all makes sense so far, because those are all functions of state, determined by the state of that potato by itself. Alas you don’t know what fraction of that potato’s energy should be considered thermodynamically “available” energy, and you can’t figure it out using only the properties of that potato. In order to figure it out, you would need to know the properties of the other potato as well.

Energy is a function of state.
“Available energy” is not.
     

Every beginner wishes for a state function that specifies the “available energy” content of a system. Alas, wishing does not make it so. No such state function can possibly exist.

Also keep in mind that the law of conservation of energy applies to the real energy, not to the “available” energy.

Energy obeys a strict local conservation law.
“Available energy” does not.
     

Beware that the misdefinition of energy in terms of “ability to do work” is extremely common. This misdefinition is all the more pernicious because it works OK in simple non-thermodynamical situations. Many people learn this misdefinition, and some of them have a hard time unlearning it.

1.6  Conflict with the Vernacular

There is only one scientific meaning for the term energy. For all practical purposes, there is complete agreement among physicists as to what energy is. (This stands in dramatic contrast to other terms – such as “heat” – that have a confusing multiplicity of technical meanings, on top of innumerable nontechnical meanings; see section 16.1 for more discussion of this point.)

The same goes for the term conservation. There is essentially only one technical meaning of conservation.

However, we run into trouble when we consider the vernacular meanings:

Therefore the simple phrase “energy conservation” is practically begging to be misunderstood. You can easily have two profound misconceptions in a simple two-word phrase.

For example, you may have seen a placard that says “Please Conserve Energy by turning off the lights when you leave” or something similar. Let’s be absolutely clear: the placard is using vernacular notions of “conservation” and “energy” that are grossly inconsistent with the technical notion of conservation of energy (as expressed by equation 3).

The vernacular notion of “energy” is only loosely defined. Often it seems to correspond, more-or-less, either to the Gibbs free enthalpy, G (as defined in section 13.4), or to some notion of “available energy” (as discussed in section 1.5 and section 13.5), or perhaps to some other notion of low-entropy energy.

The vernacular notion of “conservation” means saving, preserving, not wasting, not dissipating. It definitely is not equivalent to equation 3, because it is applied to G, and to wildlife, and to other things that are not, in the technical sense, conserved quantities.

Combining these two notions, we see that when the placard says “Please Conserve Energy” it is nontrivial to translate that into technical terms.

At some schools, the students have found it amusing to add appropriate “translations” or “corrections” to such placards. The possibilities include:

  1. “Please Do Not Dissipate the Gibbs Potential” or, equivalently, “Please Do Not Waste Free Enthalpy”.
  2. “Please Do Not Thermalize the Energy” or “Please Do Not Waste the Thermodynamically-Available Energy”.
  3. “Please Do Not Create Entropy Unnecessarily”.

The third version is far and away the most precise, and the most amenable to a quantitative interpretation. We see that the placard wasn’t really talking about energy at all, but about entropy instead.

1.7  Range of Validity

The law of conservation of energy has been tested and found 100% reliable for all practical purposes, and quite a broad range of impractical purposes besides.

Of course everything has limits. It is not necessary for you to have a very precise notion of the limits of validity of the law of conservation of energy; that is a topic of interest only to a small community of specialists. The purpose of this section is merely to indicate, in general terms, just how remote the limits are from everyday life.

If you aren’t interested in details, feel free to skip this section.

Here’s the situation:

2  Entropy

2.1  Paraconservation

The second law states that entropy obeys a local paraconservation law. That is, entropy is “nearly” conserved.

By that we mean something very specific:

change(entropy inside boundary) ≥ − flow(entropy, outward across boundary)              (4)

 

The structure and meaning of equation 4 is very similar to equation 3, except that it has an inequality instead of an equality. It tells us that the entropy in a given region can increase, but it cannot decrease except by flowing into adjacent regions.
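Here is a minimal numerical sketch of equation 4. It relies on assumptions that go beyond the text: two bodies with the same constant heat capacity C, each internally at a well-defined temperature, with entropy changes computed from the textbook expression ΔS = C ln(T_after/T_before). That expression is a consequence that holds in this special situation. It is not how entropy is defined in this document. The numbers are hypothetical.

```python
import math


def transfer(C, T_hot, T_cold, dE):
    """Move energy dE from the hot body to the cold body.

    Returns the new temperatures and each body's entropy change, assuming
    constant heat capacity C so that dS = C * ln(T_after / T_before).
    """
    T_hot_new = T_hot - dE / C
    T_cold_new = T_cold + dE / C
    dS_hot = C * math.log(T_hot_new / T_hot)
    dS_cold = C * math.log(T_cold_new / T_cold)
    return T_hot_new, T_cold_new, dS_hot, dS_cold


C = 1000.0  # J/K (hypothetical)
T_hot_new, T_cold_new, dS_hot, dS_cold = transfer(C, 400.0, 300.0, 10000.0)

print(dS_hot)            # negative: entropy has left the hot body
print(dS_cold)           # positive, and larger in magnitude
print(dS_hot + dS_cold)  # entropy change of the isolated pair
```

The entropy of the hot body decreases, which equation 4 permits only because entropy flows out of it into the adjacent body. For the pair taken as a whole, nothing crosses the boundary, and the total entropy goes up while the total energy stays the same.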

As usual, the local law implies a corresponding global law, but not conversely; see the discussion of the global law in section 1.4.

Entropy is absolutely essential to thermodynamics … just as essential as energy.

You can’t do thermodynamics without entropy.
     

Entropy is defined in terms of statistics, as we will discuss in a moment. In some situations, there are important connections between entropy, energy, and temperature … but these do not define entropy. The first law (energy) and the second law (entropy) are logically independent.

If the second law is to mean anything at all, entropy must be well-defined always. Otherwise we could create loopholes in the second law by passing through states where entropy was not defined.

We do not define entropy via dS = dQ/T or anything like that, first of all because (as discussed in section 7.1) there is no state-function Q such that dQ = TdS, and more importantly because we need entropy to be well defined even when the temperature is unknown, undefinable,5 irrelevant, or zero.

Entropy is related to information. Essentially it is the opposite of information, as we see from the following scenarios.

2.2  Scenario: Cup Game

As shown in figure 5, suppose we have three blocks and five cups on a table.

Figure 5: The Cup Game

To illustrate the idea of entropy, let’s play the following game: Phase 0 is the preliminary phase of the game. During phase 0, the dealer hides the blocks under the cups however he likes (randomly or otherwise) and optionally makes an announcement about what he has done. As suggested in the figure, the cups are transparent, so the dealer knows the exact microstate at all times. However, the whole array is behind a screen, so the rest of us don’t know anything except what we’re told.

Phase 1 is the main phase of the game. During phase 1, we are required to ascertain the position of each of the blocks. Since in this version of the game, there are five cups and three blocks, the answer can be written as a three-symbol string, such as 122, where the first symbol identifies the cup containing the red block, the second symbol identifies the cup containing the black block, and the third symbol identifies the cup containing the blue block. Each symbol is in the range zero through four inclusive, so we can think of such strings as base-5 numerals, three digits long. There are 5^3 = 125 such numerals. (More generally, in a version where there are N cups and B blocks, there are N^B possible microstates.)

We cannot see what’s inside the cups, but we are allowed to ask yes/no questions, whereupon the dealer will answer. Our score in the game is determined by the number of questions we ask; each question contributes one bit to our score. Our objective is to finish the game with the lowest possible score.

  1. Example: During phase 0, the dealer announces that all three blocks are under cup #4. Our score is zero; we don’t have to ask any questions.
  2. Example: During phase 0, the dealer places all the blocks randomly and doesn’t announce anything. If we are smart, our score S is at worst 7 bits (and usually exactly 7 bits). That’s because when S=7 we have 2^S = 2^7 = 128, which is slightly larger than the number of possible states. In the expression 2^S, the base is 2 because we are asking questions with 2 possible answers. Our minimax strategy is simple: we write down all the states in order, from 000 through 444 (base 5) inclusive, and ask questions of the following form: Is the actual state in the first half of the list? Is it in the first or third quarter? Is it in an odd-numbered eighth? After at most seven questions, we know exactly where the correct answer sits in the list.
  3. Example: During phase 0, the dealer hides the blocks at random, then makes an announcement that provides partial information, namely that cup #4 happens to be empty. Then (if we follow a sensible minimax strategy) our score will be six bits, since 2^6 = 64 = 4^3.
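
The three scores above can be checked with a few lines of Python. This is only a sketch: the helper names questions_needed and bisect_game are made up for illustration, and the halving strategy shown is one sensible minimax strategy among several, assuming all accessible states are equally probable.

```python
def questions_needed(num_states: int) -> int:
    """Smallest S such that 2**S >= num_states (hypothetical helper)."""
    return (num_states - 1).bit_length()


def bisect_game(secret: int, num_states: int):
    """Find `secret` in range(num_states) by repeatedly halving the list.

    Returns (state found, number of yes/no questions asked).
    """
    lo, hi = 0, num_states          # candidates are lo, lo+1, ..., hi-1
    asked = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        asked += 1                  # "Is the state in the first part of the list?"
        if secret < mid:
            hi = mid
        else:
            lo = mid
    return lo, asked


cups, blocks = 5, 3
print(questions_needed(1))                      # example 1: one accessible state
print(questions_needed(cups ** blocks))         # example 2: 125 accessible states
print(questions_needed((cups - 1) ** blocks))   # example 3: 64 accessible states

# Worst case over every possible secret state in example 2.
worst = max(bisect_game(s, cups ** blocks)[1] for s in range(cups ** blocks))
print(worst)
```

Integer arithmetic (bit_length) is used instead of a floating-point logarithm so that the rounding-up step stays exact even for very large state counts.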

Remark on terminology: Any microstates that have zero probability are classified as inaccessible, while those that have nonzero probability are classified as accessible.

These examples have certain restrictions in common:   More generally:

For starters, we have been asking yes/no questions.   Binary questions are not universally required; by way of contrast you can consider the three-way measurements in reference 9.

Also, so far we have only considered scenarios where all accessible microstates are equally probable.   If the accessible microstates are not equally probable, we need a more sophisticated notion of entropy, as discussed in section 2.6.

Subject to these restrictions, if we want to be sure of identifying the correct microstate, we should plan on asking a sufficient number of questions S such that 2^S is greater than or equal to the number of accessible microstates.  

To calculate what our score will be, we don’t need to know anything about energy; all we have to do is count states (specifically, the number of microstates consistent with what we know about the situation). States are states; they are not energy states.

If you wish to make this sound more thermodynamical, you can assume that the table is horizontal, and the blocks are non-interacting, so that all possible configurations have the same energy. But really, it is easier to just say that over a wide range of energies, energy has got nothing to do with this game.

The point of all this is that we define the entropy of a given situation according to the number of questions we have to ask to finish the game, starting from the given situation. Each yes/no question contributes one bit to the entropy.

The central, crucial idea of entropy is that it measures how much we don’t know about the situation. Entropy is not knowing.

2.3  Scenario: Card Game

Here is a card game that illustrates the same points as the cup game. The only important difference is the size of the state space: roughly eighty million million million million million million million million million million million states, rather than 125 states. That is, when we move from 5 cups to 52 cards, the state space gets bigger by a factor of 10^66 or so.

Consider a deck of 52 playing cards. By re-ordering the deck, it is possible to create a large number (52 factorial) of different configurations. (For present purposes we choose not to flip or rotate the cards, just re-order them. Also, unless otherwise stated, we assume the number of cards is fixed at 52 ... although the same principles apply to smaller or larger decks, and sometimes in an introductory situation it is easier to see what is going on if you work with only 8 or 10 cards.)

Phase 0 is the preliminary phase of the game. During phase 0, the dealer prepares the deck in a configuration of his choosing, using any combination of deterministic and/or random procedures. He then sets the deck on the table. Finally he makes zero or more announcements about the configuration of the deck.

Phase 1 is the main phase of the game. During phase 1, our task is to fully describe the configuration, i.e. to determine which card is on top, which card is second, et cetera. We cannot look at the cards, but we can ask yes/no questions of the dealer. Each such question contributes one bit to our score. Our objective is to ask as few questions as possible. As we shall see, our score is a measure of the entropy.

One configuration of the card deck corresponds to one microstate. The microstate does not change during phase 1.

The macrostate is the ensemble of microstates consistent with what we know about the situation.

  1. Example: The dealer puts the deck in some agreed-upon reference configuration, and announces that fact. Then we don’t need to do anything, and our score is zero. A perfect score.
  2. Example: The dealer puts the deck in the reverse of the reference configuration, and announces that fact. We can easily tell which card is where. We don’t need to ask any questions, so our score is again zero.
  3. Example: The dealer starts with the reference configuration, then “cuts” the deck; that is, he chooses at random one of the 52 possible full-length cyclic permutations, and applies that permutation to the cards. He announces what procedure he has followed, but nothing more.

    At this point we know that the deck is in some microstate, and the microstate is not changing … but we don’t know which microstate. It would be foolish to pretend we know something we don’t. If we’re going to bet on what happens next, we should calculate our odds based on the ensemble of possibilities, i.e. based on the macrostate.

    Our best strategy is as follows: By asking six well-chosen questions, we can find out which card is on top. We can then easily describe every detail of the configuration. Our score is six bits.

  4. Example: The dealer starts with the standard configuration, cuts it, and then cuts it again. The second cut changes the microstate, but does not change the macrostate. Cutting the deck is, so far as the macrostate is concerned, idempotent; that is, N cuts are the same as one. It still takes us six questions to figure out the full configuration.

    This illustrates that the entropy is a property of the ensemble, i.e. a property of the macrostate, not a property of the microstate. Cutting the deck the second time changed the microstate but did not change the macrostate.

  5. Example: Same as above, but in addition to announcing the procedure the dealer also announces what card is on top. Our score is zero.
  6. Example: The dealer shuffles the deck thoroughly. He announces that, and only that. The deck could be in any of the 52 factorial different configurations. If we follow a sensible (minimax) strategy, our score will be 226 bits, since the base-2 logarithm of 52 factorial is approximately 225.581. Since we can’t ask fractional questions, we round up to 226.
  7. Example: The dealer announces that it is equally likely that he has either shuffled the deck completely or left it in the reference configuration. Then our score on average is only 114 bits, if we use the following strategy: we start by asking whether the deck is already in the reference configuration. That costs us one question, but half of the time it’s the only question we’ll need. The other half of the time, we’ll need 226 more questions to unshuffle the shuffled deck. The average of 1 and 227 is 114.
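
The numbers quoted in examples 3, 6, and 7 can be reproduced as follows. This is a rough sketch using only the Python standard library; the variable names are mine, and the example-7 calculation simply follows the strategy described in the text.

```python
import math

deck = 52

# Example 6: a thoroughly shuffled deck has 52! equally likely configurations.
shuffled_bits = math.log2(math.factorial(deck))
shuffled_score = math.ceil(shuffled_bits)        # we cannot ask fractional questions

# Example 3: a cut deck has 52 equally likely configurations.
cut_score = math.ceil(math.log2(deck))

# Example 7: one question settles "reference or shuffled?". Half the time
# that is the only question; the other half we need shuffled_score more.
mixed_score = 0.5 * 1 + 0.5 * (1 + shuffled_score)

print(f"{shuffled_bits:.3f} bits, rounded up to {shuffled_score}")
print(cut_score, mixed_score)
```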

Note that we are not depending on any special properties of the “reference” state. For simplicity, we could agree that our reference state is the factory-standard state (cards ordered according to suit and number), but any other agreed-upon state would work just as well. If we know the deck is in Moe’s favorite state, we can easily rearrange it into Joe’s favorite state. Rearranging it from one known state to another known state does not involve any entropy.

2.4  Peeking

As a variation on the game described in section 2.3, consider what happens if, at the beginning of phase 1, we are allowed to peek at one of the cards.

In the case of the standard deck, example 1, this doesn’t tell us anything we didn’t already know, so the entropy remains unchanged.

In the case of the cut deck, example 3, this lowers our score by six bits, from six to zero.

In the case of the shuffled deck, example 6, this lowers our score by six bits, from 226 to 220.

The reason this is worth mentioning is because peeking can (and usually does) change the macrostate, but it cannot change the microstate. (This stands in contrast to cutting an already-cut deck or shuffling an already-shuffled deck, which changes the microstate but does not change the macrostate.)

To repeat: Obviously peeking does not change the microstate, but it can have a large effect on the macrostate. If you don’t think peeking changes the ensemble, I look forward to playing poker with you!
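
Here is a small sketch of the before-and-after scores. It assumes that we peek at a card in a known position (say, the top card), so that one peek pins down the cut completely and leaves 51! possibilities for the shuffled deck. The function name score is hypothetical.

```python
import math


def score(num_states: int) -> int:
    """Questions needed to single out one of num_states equally likely states."""
    return (num_states - 1).bit_length()


# Cut deck (example 3): 52 possible states before the peek, 1 afterward.
cut_before, cut_after = score(52), score(1)

# Shuffled deck (example 6): 52! states before the peek, 51! afterward.
shuffled_before = score(math.factorial(52))
shuffled_after = score(math.factorial(51))

print(cut_before, cut_after)
print(shuffled_before, shuffled_after)
```

In both cases only the count of states consistent with what we know has changed; the deck itself has not been touched.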

2.5  Discussion

2.5.1  States and Probabilities

If you want to understand entropy, you must first have at least a modest understanding of basic probability. It’s a prerequisite, and there’s no way of getting around it. Anyone who knows about probability can learn about entropy. Anyone who doesn’t, can’t.

Our notion of entropy is completely dependent on having a notion of microstate, and on having a procedure for assigning a probability to each microstate.

In some special cases, the procedure involves little more than counting the “allowed” microstates, as discussed in section 8.7. This type of counting corresponds to a particularly simple, flat probability distribution, which may be a satisfactory approximation in special cases, but is definitely not adequate for the general case.

For simplicity, the cup game and the card game were arranged to embody a clear notion of microstate. That is, the rules of the game specified what situations would be considered the “same” microstate and what would be considered “different” microstates. Such games are a model that is directly and precisely applicable to physical systems where the physics is naturally discrete, such as systems involving only the nonclassical spin of elementary particles (such as the demagnetization refrigerator discussed in section 10.10).

For systems involving continuous variables such as position and momentum, counting the states is somewhat trickier. The correct procedure is discussed in section 11.2.

2.5.2  Entropy is Not Knowing

The point of all this is that the “score” in these games is an example of entropy. Specifically: at each point in the game, there are two numbers worth keeping track of: the number of questions we have already asked, and the number of questions we must ask to finish the game. The latter is what we call the entropy of the situation at that point.

Entropy is not knowing.
Entropy measures how much is not known about the situation.
     

Remember that the macrostate is the ensemble of configurations consistent with what is known about the situation. The entropy is a property of the macrostate.

At each point during the game, the entropy is a property of the macrostate, not of the microstate. The system is in “some” microstate, but we don’t know which microstate, so all our decisions must be based on the macrostate.

The value we assign to the entropy depends on what we know about the situation, not what the dealer knows, or what anybody else knows. This makes the entropy somewhat context-dependent or even subjective. Some people find this irksome or even shocking, but it is real physics. For physical examples of context-dependent entropy, and a discussion, see section 11.7.

2.5.3  Entropy versus Energy

Note that entropy has been defined without reference to temperature and without reference to heat. Room temperature is equivalent to zero temperature for purposes of the cup game and the card game; theoretically there is “some” chance that thermal agitation will cause two of the cards to spontaneously hop up and exchange places during the game, but that is really, really negligible.

Non-experts often try to define entropy in terms of energy. This is a mistake. To calculate the entropy, I don’t need to know anything about energy; all I need to know is the probability of each relevant state. See section 2.6 for details on this.

States are states;
they are not energy states.
     

Entropy is not defined in terms of energy, nor vice versa.

In some cases, there is a simple mapping that allows us to identify the ith microstate by means of its energy Êi. It is often convenient to exploit this mapping when it exists, but it does not always exist.

2.5.4  Entropy versus Disorder

In pop culture, entropy is often associated with disorder. There are even some textbooks that try to explain entropy in terms of disorder. This is not a good idea. It is all the more disruptive because it is in some sense half true, which means it might pass superficial scrutiny. However, science is not based on half-truths.

Small disorder generally implies small entropy. However, the converse does not hold, not even approximately; a highly-disordered system might or might not have high entropy. The spin echo experiment (section 10.7) suffices as an example of a highly disordered macrostate with relatively low entropy.

Before we go any farther, we should emphasize that entropy is a property of the macrostate, not of the microstate. In contrast, to the extent that “disorder” can be measured at all, it can be measured on a microstate-by-microstate basis. Therefore, whatever the “disorder” is measuring, it isn’t entropy. (A similar microstate versus macrostate argument applies to the “energy dispersal” model of entropy, as discussed in section 8.9.) As a consequence, the usual textbook illustration – contrasting snapshots of orderly and disorderly scenes – cannot be directly interpreted in terms of entropy. To get any value out of such an illustration, the reader must make a sophisticated leap:

The disorderly snapshot must be interpreted as representative of an ensemble with a very great number of similarly-disorderly microstates. The ensemble of disorderly microstates has high entropy. This is a property of the ensemble, not of the depicted microstate or any other microstate.   The orderly snapshot must be interpreted as representative of a very small ensemble, namely the ensemble of similarly-orderly microstates. This small ensemble has a small entropy. Again, entropy is a property of the ensemble, not of any particular microstate (except in the extreme case where there is only one microstate in the ensemble, and therefore zero entropy).

To repeat: Entropy is defined as a weighted average over all microstates. Asking about the entropy of a particular microstate (disordered or otherwise) is asking the wrong question. As a matter of principle, the question is unanswerable. (See section 2.7 for a discussion of surprise value, which is a property of the microstate.)

The number of orderly microstates is very small compared to the number of disorderly microstates. That’s because when you say the system is “ordered” you are placing constraints on it. Therefore if you know that the system is in one of those orderly microstates, you know the entropy cannot be very large.

The converse does not hold. If you know that the system is in some disorderly microstate, you do not know that the entropy is large. Indeed, if you know that the system is in some particular disorderly microstate, the entropy is zero. (This is a corollary of the more general proposition that if you know what microstate the system is in, the entropy is zero. It doesn’t matter whether that state “looks” disorderly or not.)

Furthermore, there are additional reasons why the typical text-book illustration of a messy dorm room is not a good model of entropy. For starters, it provides no easy way to define and delimit the states. Even if we stipulate that the tidy state is unique, we still don’t know whether a shirt on the floor “here” is different from a shirt on the floor “there”. Since we don’t know how many different disorderly states there are, we can’t quantify the entropy. (In contrast the games in section 2.2 and section 2.3 included a clear rule for defining and delimiting the states.)

Examples of high disorder and low entropy include, in order of increasing complexity:

  1. Perhaps the simplest example is five coins in a closed shoebox. Randomize the coins by shaking. The entropy at this point is five bits. If you open the box and peek at the coins, the entropy goes to zero. This makes it clear that entropy is a property of the ensemble, not a property of the microstate. Peeking does not change the disorder. Peeking does not change the microstate. However, it can (and usually does) change the entropy. This example has the pedagogical advantage that it is small enough that the entire microstate-space can be explicitly displayed; there are only 2^5 = 32 microstates.
  2. Ordinarily, a well-shuffled deck of cards contains 225.581 bits of entropy, as discussed in section 2.3. On the other hand, if you have peeked at all the cards after they were shuffled, the entropy is now zero, as discussed in section 2.4. Again, this makes it clear that entropy is a property of the ensemble, not a property of the microstate. Peeking does not change the disorder. Peeking does not change the microstate. However, it can (and usually does) change the entropy.

    Many tricks of the card-sharp and the “magic show” illusionist depend on a deck of cards arranged to have much disorder but little entropy.

  3. In cryptography, suppose we have a brand-new one time pad containing a million random hex digits. From our adversary’s point of view, this embodies 4,000,000 bits of entropy. If, however, the adversary manages to make a copy of our one time pad, then the entropy of our pad, from his point of view, goes to zero. All of the complexity is still there, all of the disorder is still there, but the entropy is gone.
  4. The spin echo experiment involves a highly complicated state that has low entropy. See section 10.7. This is a powerful example, because it involves a macroscopic amount of entropy (on the order of 1 joule per kelvin, i.e. on the order of a mole of bits, not just a few bits or a few hundred bits).

2.5.5  False Dichotomy

There is a long-running holy war between those who try to define entropy in terms of energy, and those who try to define it in terms of disorder. This is based on a grotesquely false dichotomy: If entropy-as-energy is imperfect, then entropy-as-disorder “must” be perfect … or vice versa. I don’t know whether to laugh or cry when I see this. Actually, both versions are highly imperfect. You might get away with using one or the other in selected situations, but not in general.

The right way to define entropy is in terms of probability, as we now discuss. (The various other notions can then be understood as special cases and/or approximations to the true entropy.)

2.6  Quantifying Entropy

The idea of entropy set forth in the preceding examples can be quantified quite precisely. Entropy is defined in terms of statistics.6 For any classical probability distribution P, we can define its entropy as:

S[P] := \sum_i P_i \log(1/P_i)              (5)

where the sum runs over all possible outcomes and Pi is the probability of the ith outcome. Here we write S[P] to make it explicit that S is a functional that depends on P. For example, if P is a conditional probability then S will be a conditional entropy. Beware that people commonly write simply S, leaving unstated the crucial dependence on P.

Subject to mild restrictions, we can apply this to physics as follows: Suppose the system is in a given macrostate, and the macrostate is well described by a distribution P, where Pi is the probability that the system is in the ith microstate. Then we can say S is the entropy “of the system”.

Expressions of this form date back to Boltzmann (reference 10 and reference 11) and to Gibbs (reference 12). The range of applicability was greatly expanded by Shannon (reference 13).

Beware that uncritical reliance on “the” observed microstate-by-microstate probabilities does not always give a full description of the macrostate, because the Pi might be correlated with something else (section 10.7) or amongst themselves (section 25). In such cases the unconditional entropy will be larger than the conditional entropy, and you have to decide which is/are physically relevant.

Equation 5 is the faithful workhorse formula for calculating the entropy. It ranks slightly below Equation 142, which is a more general way of expressing the same idea. It ranks above various less-general formulas that may be useful under more-restrictive conditions (as in section 8.7 for example). See section 21 and section 25 for more discussion of the relevance and range of validity of this expression.

In the games discussed above, it was convenient to measure entropy in bits, because I was asking yes/no questions. Other units are possible, as discussed in section 8.6.
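
As a minimal sketch, equation 5 can be evaluated in bits by taking the logarithm to base 2. The function name entropy_bits is my own, and the convention of skipping zero-probability (inaccessible) states is an assumption made so that the logarithm never sees zero.

```python
import math


def entropy_bits(probs):
    """S[P] = sum_i P_i * log2(1/P_i), in bits.

    Zero-probability states are skipped; they contribute nothing to the sum.
    """
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must be normalized"
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(entropy_bits([1.0]))               # microstate known
print(entropy_bits([1 / 64] * 64))       # cup game with cup #4 known to be empty
print(entropy_bits([1 / 125] * 125))     # cup game with no announcement
print(entropy_bits([0.5, 0.25, 0.25]))   # states that are not equally probable
```

For the 125-state cup game this gives a bit less than 7, whereas the game score was 7; the entropy is allowed to be fractional even though we cannot ask a fractional question.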

Figure 6 shows the contribution to the entropy from one term in the sum in equation 5. Its maximum value is approximately 0.53 bits, attained when Pi=1/e.

Figure 6: - Pi log Pi – One Term in the Sum

Figure 7 shows the total entropy for a two-state system such as a coin. Here H represents the probability of the “heads” state, which gives us one term in the sum. The “tails” state necessarily has probability (1−H) and that gives us the other term in the sum. The total entropy in this case is a symmetric function of H. Its maximum value is 1 bit, attained when H=½.

Figure 7: Total Entropy – Two-State System
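
The two maxima quoted for figure 6 and figure 7 can be checked numerically. The following is a rough sketch that scans a grid of probabilities; the function names and the grid spacing are arbitrary choices of mine.

```python
import math


def term_bits(p):
    """One term of equation 5 in bits: p * log2(1/p), taken as 0 when p = 0."""
    return p * math.log2(1.0 / p) if p > 0 else 0.0


def two_state_bits(h):
    """Total entropy of a two-state system with probabilities h and 1 - h."""
    return term_bits(h) + term_bits(1.0 - h)


grid = [i / 1000 for i in range(1001)]

peak = term_bits(1 / math.e)              # height of the single-term curve at 1/e
best_p = max(grid, key=term_bits)         # grid point where one term is largest
best_h = max(grid, key=two_state_bits)    # grid point where the coin entropy is largest

print(round(peak, 4), best_p)
print(best_h, two_state_bits(best_h))
```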

As discussed in section 8.6 the base of the logarithm in equation 5 is chosen according to what units you wish to use for measuring entropy. If you choose units of joules per kelvin (J/K), we can pull out a factor of Boltzmann’s constant and rewrite the equation as:

S = -k \sum_i P_i \ln P_i              (6)

Entropy itself is conventionally represented by big S and is an extensive property, with rare peculiar exceptions as discussed in section 11.7. Molar entropy is conventionally represented by small s and is the corresponding intensive property.

Although it is often convenient to measure molar entropy in units of J/K/mol, other units are allowed, for the same reason that mileage is called mileage even when it is measured in metric units. In particular, sometimes additional insight is gained by measuring molar entropy in units of bits per particle. See section 8.6 for more discussion of units.
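
To make the change of units concrete: since log2(x) = ln(x)/ln(2), an entropy of one bit corresponds to k ln 2 in J/K. The sketch below uses the exact 2019 SI values of Boltzmann’s constant and the Avogadro constant; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann's constant in J/K (exact in the 2019 SI)
N_A = 6.02214076e23    # Avogadro constant in 1/mol (exact in the 2019 SI)


def bits_to_joules_per_kelvin(bits):
    """Convert an entropy in bits to J/K: each bit is worth k ln 2."""
    return bits * K_B * math.log(2)


def joules_per_kelvin_to_bits(entropy):
    """Convert an entropy in J/K to bits."""
    return entropy / (K_B * math.log(2))


mole_of_bits = bits_to_joules_per_kelvin(N_A)        # a mole of bits, in J/K
bits_per_unit = joules_per_kelvin_to_bits(1.0)       # 1 J/K, in bits

print(mole_of_bits, bits_per_unit)
```

A mole of bits comes out to a few J/K, which is why an entropy on the order of 1 J/K is on the order of a mole of bits.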

When discussing a chemical reaction using a formula such as

2 O3 → 3 O2 + Δs              (7)

it is common to speak of “the entropy of the reaction” but properly it is “the molar entropy of the reaction” and should be written Δs or ΔS/N (not ΔS). All the other terms in the formula are intensive, so the entropy-related term must be intensive also.

Of particular interest is the standard molar entropy, s0 or S0/N, measured at standard temperature and pressure. The entropy of a gas is strongly dependent on density, as mentioned in section 11.2.

2.7  Surprise Value

If we have a system characterized by a probability distribution P, the surprise value of the ith state is given by

$i := log(1/Pi)              (8)

By comparing this with equation 5, it is easy to see that the entropy is simply the appropriately-weighted average of the surprise value. In particular, it is the expected value of the surprise value. (See equation 143 for the fully quantum-mechanical generalization of this idea.)
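
A short sketch may help keep the two notions apart. The distribution below is made up for illustration, and the function names are mine; the surprise values are computed one state at a time, while the entropy is a single number for the whole distribution.

```python
import math


def surprise_bits(p):
    """Surprise value of a state with probability p, in bits: log2(1/p)."""
    return math.log2(1.0 / p)


def expected_surprise_bits(probs):
    """Probability-weighted average of the surprise value, i.e. the entropy."""
    return sum(p * surprise_bits(p) for p in probs if p > 0)


probs = [0.5, 0.25, 0.125, 0.125]               # hypothetical distribution
surprises = [surprise_bits(p) for p in probs]   # one number per state
entropy = expected_surprise_bits(probs)         # one number per distribution

print(surprises)
print(entropy)
```

The rare states are the surprising ones, but they are weighted by their small probabilities when the average is taken.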

Note the following contrast:

Surprise value is a property of the state i.   Entropy is not a property of the state i; it is a property of the distribution P.

This should make it obvious that entropy is not, by itself, the solution to all the world’s problems. Entropy measures a particular average property of the distribution. It is easy to find situations where other properties of the distribution are worth knowing.

2.8  Entropy of Independent Subsystems

Suppose we have subsystem 1 with a set of microstates {(i)} and subsystem 2 with a set of microstates {(j)}. Then in all generality, the microstates of the combined system are given by the Cartesian direct product of these two sets, namely

{(i)}×{(j)} = {(i,j)}              (9)

where (i,j) is an ordered pair, which should be a familiar idea and a familiar notation.

We now consider the less-than-general case where the two subsystems are statistically independent. That means that the probabilities are multiplicative:

R(i,j) = P(i) Q(j)              (10)

Let’s evaluate the entropy of the combined system:

S[R] = -\sum_{i,j} R(i,j) \log[R(i,j)]
     = -\sum_{i,j} P(i) Q(j) \log[P(i) Q(j)]
     = -\sum_{i,j} P(i) Q(j) \log[P(i)] - \sum_{i,j} P(i) Q(j) \log[Q(j)]
     = -\sum_j Q(j) \sum_i P(i) \log[P(i)] - \sum_i P(i) \sum_j Q(j) \log[Q(j)]
     = S[P] + S[Q]              (11)

where we have used the fact that the subsystem probabilities are normalized.

So we see that the entropy is additive whenever the probabilities are multiplicative, i.e. whenever the probabilities are independent.
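
The additivity result can be spot-checked numerically. In the sketch below the distributions P and Q are arbitrary made-up examples, and the correlated distribution at the end is my own addition, included only to show that additivity is not guaranteed when the subsystems are not independent.

```python
import math
from itertools import product


def entropy_bits(probs):
    """sum_i P_i * log2(1/P_i), skipping zero-probability states."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


P = [0.5, 0.3, 0.2]    # hypothetical subsystem 1
Q = [0.9, 0.1]         # hypothetical subsystem 2

# Independent subsystems: R(i,j) = P(i) Q(j) over the Cartesian product.
R = [p * q for p, q in product(P, Q)]
joint = entropy_bits(R)
separate = entropy_bits(P) + entropy_bits(Q)
print(joint, separate)

# Two perfectly correlated coins, ordered (0,0), (0,1), (1,0), (1,1).
# Each coin alone is 50/50, but the joint distribution is not a product.
R_corr = [0.5, 0.0, 0.0, 0.5]
joint_corr = entropy_bits(R_corr)
separate_corr = 2 * entropy_bits([0.5, 0.5])
print(joint_corr, separate_corr)
```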

3  Basic Concepts (Zeroth Law)

There are a bunch of basic notions that are often lumped together and called the zeroth law of thermodynamics. These notions are incomparably less fundamental than the notion of energy (the first law) and entropy (the second law), so despite its name, the zeroth law doesn’t deserve priority.

Here are some oft-cited rules, and some comments on each.

We can divide the world into some number of regions that are disjoint from each other.   If there are only two regions, some people like to call one of them “the” system and call the other “the” environment, but usually it is better to consider all regions on an equal footing. Regions are sometimes called systems and/or subsystems. Systems are sometimes called objects, especially when they are relatively simple.

There is such a thing as thermal equilibrium.   You must not assume that everything is in thermal equilibrium. Thermodynamics and indeed life itself depend on the fact that some regions are out of equilibrium with other regions.

There is such a thing as temperature.   There are innumerable important examples of systems that lack a well-defined temperature, such as the three-state laser discussed in section 10.4.

Whenever any two systems are in equilibrium with each other, they have the same temperature. See section 9.1.   This is true and important. (To be precise, we should say they have the same average temperature, since there will be fluctuations, which may be significant for very small systems.)

We can establish equilibrium within a system, and equilibrium between selected pairs of systems, without establishing equilibrium between all systems.   This is an entirely nontrivial statement. Sometimes it takes a good bit of engineering to keep some pairs near equilibrium and other pairs far from equilibrium. See section 10.11.

If/when we have established equilibrium within a system, a few variables suffice to entirely describe the thermodynamic state (i.e. macrostate) of the system.7 (See section 11.1 for a discussion of microstate versus macrostate.)   This is an entirely nontrivial statement, and to make it useful you have to be cagey about what variables you choose; for instance,

  • Knowing the temperature and pressure of a parcel of ice gives you more-or-less a complete description of the thermodynamic state of the ice.
  • Knowing the temperature and pressure of a parcel of liquid water gives you more-or-less a complete description of the thermodynamic state of the water.
  • Meanwhile, in contrast, knowing the temperature and pressure of an ice/water mixture does not fully determine the thermodynamic state, because you don’t know what fraction is ice and what fraction is water.

4  Low-Temperature Entropy (Alleged Third Law)

As mentioned in the introduction, one sometimes hears the assertion that the entropy of a system must go to zero as the temperature goes to zero.

There is no theoretical basis for this assertion, so far as I know – just unsubstantiated opinion.

As for experimental evidence, I know of only one case where (if I work hard enough) I can make this statement true, while there are innumerable cases where it is not true:

Note: It is hard to measure the low-temperature entropy by means of elementary thermal measurements, because typically such measurements are insensitive to “spectator entropy” as discussed in section 11.5. So for typical classical thermodynamic purposes, it doesn’t matter whether the entropy goes to zero or not.

5  The Rest of Physics, Chemistry, etc.

The previous sections have set forth the conventional laws of thermodynamics, cleaned up and modernized as much as possible.

At this point you may be asking, why do these laws call attention to conservation of energy, but not the other great conservation laws (momentum, electrical charge, lepton number, et cetera)? And for that matter, what about all the other physical laws, the ones that aren’t expressed as conservation laws? Well, you’re right, there are some quite silly inconsistencies here.

The fact of the matter is that in order to do thermo, you need to import a great deal of classical mechanics. You can think of this as the minus-oneth law of thermodynamics.

Sometimes the process of importing a classical idea into the world of thermodynamics is trivial, and sometimes not. For example:

The law of conservation of momentum would be automatically valid if we applied it by breaking a complex object into its elementary components, applying the law to each component separately, and summing the various contributions. That’s fine, but nobody wants to do it that way. In the spirit of thermodynamics, we would prefer a macroscopic law. That is, we would like to be able to measure the overall mass of the object (M), measure its average velocity (V), and from that compute a macroscopic momentum (MV) obeying the law of conservation of momentum. In fact this macroscopic approach works fine, and can fairly easily be proven to be consistent with the microscopic approach. No problem.   The notion of kinetic energy causes trouble when we try to import it. Sometimes you want a microscopic accounting of kinetic energy, and sometimes you want to include only the macroscopic kinetic energy. There is nontrivial ambiguity here, as discussed in section 17.4 and reference 14.

6  Functions of State

6.1  Functions of State : Basic Notions

Terminology: By definition, the term state function applies to any measurable quantity that is uniquely determined by the thermodynamic state, i.e. the macrostate.

Terminology: The term thermodynamic potential is synonymous with state function.

Example: In an ordinary chunk of metal at equilibrium, state functions include energy (E), entropy (S), temperature (T), molar volume (V/N), total mass, speed of sound, et cetera. Some additional important thermodynamic potentials are discussed in section 13.

In thermodynamics, we require the energy E to be a function of state. This doesn’t tell us anything about E, but it tells us something about our notion of thermodynamic state. That is, we choose our notion of “state” to ensure that E will be a function of state.

Similarly, we require the entropy S to be a function of state.

In the common situation where the volume V is important, we assume V is a function of state. If V is one component of the state vector, that’s not a problem. Calculating V as a function of V is not a problem. I wish all my problems were this easy.

Counterexample: The microstate is not a function of state (except in rare extreme cases). Knowing the macrostate is not sufficient to tell you the microstate (except in rare extreme cases).

Counterexample: Suppose we have a system containing a constant amount of H2O. Under “most” conditions, specifying the pressure and temperature suffices to specify the thermodynamic state. However, things get ugly if the temperature is equal to the freezing temperature. Then you don’t know how much of the sample is liquid and how much is solid. In such a situation, pressure and temperature do not suffice to specify the thermodynamic state. (In contrast, specifying the pressure and entropy would suffice.)

6.2  Path Independence

When we say that something is a function of state, we are saying that it does not depend on history; it does not depend on how we got into the given state.

We can apply this idea to changes in any function of state. For example, since E is a function of state, we can write

$$\Delta E = E_{\text{final}} - E_{\text{initial}} = \text{independent of path} \tag{12}$$

When we say that ΔE is independent of path, that means that ΔE is the same, no matter how many steps it takes to get from the initial state to the final state. The path can be simple and direct, or it can involve all sorts of loops and cycles.

As a corollary, if we get from state A to state D by two different paths, as shown in figure 8, and we add up the changes along each step of each path, we find that the sum of the changes is independent of path. That is,

Figure 8: Sum of Changes Along Different Paths

$$\Delta_{AD}(X) = \Delta_{AB}(X) + \Delta_{BC}(X) + \Delta_{CD}(X) \tag{13}$$

As usual Δ(X) refers to the change in X. Here X can be any thermodynamic potential.

The term sigma-delta is sometimes used to refer to a sum of changes. Equation 13 states that the sigma-delta is independent of path.

It must be emphasized that the principle of the path-independent sigma-delta has got nothing to do with any conservation law. It applies to non-conserved state-functions such as temperature and molar volume just as well as it applies to conserved state-functions such as energy. For example, if the volume V is a function of state, then:

$$\Delta V = V_{\text{final}} - V_{\text{initial}} = \text{independent of path} \tag{14}$$

which is true even though V is obviously not a conserved quantity.
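The path-independence of the sigma-delta can be checked numerically. The following is a minimal sketch, not taken from the original text: the state function `X`, the states, and the two paths are hypothetical, chosen only to illustrate that the rule holds for a non-conserved quantity and for a path that includes a loop.

```python
# Hypothetical illustration: the sum of changes (sigma-delta) of any
# function of state is the same along every path between two states.
# States are (V, S) pairs in arbitrary units.

def X(state):
    """An arbitrary, non-conserved state function chosen for illustration."""
    V, S = state
    return V**2 * S + 3.0 * S

def sigma_delta(path):
    """Sum the changes in X over each step of a path (a list of states)."""
    return sum(X(b) - X(a) for a, b in zip(path, path[1:]))

A, D = (1.0, 2.0), (4.0, 0.5)
direct = [A, D]
roundabout = [A, (2.0, 5.0), (7.0, 1.0), (2.0, 5.0), D]  # revisits a state

print(sigma_delta(direct), sigma_delta(roundabout))  # equal up to rounding
```

The intermediate terms cancel in pairs, so only the endpoints survive. Nothing in the argument uses any property of `X` other than its being a function of state.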

Equation 14 looks trivial and usually is trivial. That’s because usually you can easily determine the volume of a system, so it’s obvious that ΔV is independent of path.

The derivation of equation 12 is just as trivial as the derivation of equation 14, but the applications of equation 12 are not entirely trivial. That’s because you can’t always determine the energy of a system just by looking at it. It may be useful to calculate ΔE along one simple path, and then argue that it must be the same along any other path connecting the given initial and final states.

Remark: It is a fairly common mistake for people to say that ΔE is a function of state. It’s not a function of state; it’s a function of two states, namely the initial state and the final state, as you can see from the definition: ΔE = Efinal − Einitial. For more on this, see reference 3.

6.3  Hess’s Law, Or Not

Circa 1840, Germain Henri Hess empirically discovered a sum rule for the so-called heat of reaction. This is called Hess’s Law. Beware that it is not always true, because the heat of reaction is not a function of state.

A simple counterexample is presented in figure 9.

Figure 9: Disproof of Hess’s Law for Heat

We start in the upper left of the figure. We turn the crank on the generator, which charges the battery. That is, electrochemical reactions take place in the battery. We observe that very little heat is involved in this process. The charged-up battery is shown in blue.

If we stop cranking and wait a while, we notice that this battery has a terrible shelf life. Chemical reactions take place inside the battery that discharge it. This is represented conceptually by a “leakage resistor” internal to the battery. This is represented schematically by an explicit resistor in figure 9. In any event, we observe that the battery soon becomes discharged, and becomes warmer. If we wait a little longer, heat flows across the boundary of the system (as shown by the wavy red arrows). Eventually we reach the state shown in the lower right of the diagram, which is identical to the initial state.

There is of course a simpler path for reaching this final state, namely starting at the same initial state and doing nothing ... no cranking, and not even any waiting. This clearly violates Hess’s law because the heat of reaction of the discharge process is the dominant contribution along one path, and nothing similar is observed along the other path.

Hess’s law in its original form is invalid because heat content is not a state function, and heat of reaction is not the delta of any state function.

Tangential remark: in cramped thermodynamics, a cramped version of Hess’s Law is usually valid, because “heat content” is usually a function of state in cramped thermodynamics. This is a trap for the unwary. This is just one of the many things that are true in cramped thermodynamics but cannot be extended to uncramped thermodynamics.

We can extricate ourselves from this mess by talking about enthalpy instead of heat. There is a valid sum rule for the enthalpy of reaction, because enthalpy is a function of state. That is:

$$\Delta H = H_{\text{final}} - H_{\text{initial}} = \text{independent of path} \tag{15}$$

We emphasize that this does not express conservation of enthalpy. In fact, enthalpy is not always conserved, but equation 15 remains true whenever enthalpy is a function of state.

Equation 15 could be considered a modernized, “repaired” version of Hess’s law. It is not very important. It does not tell us anything about the enthalpy except that it is a function of state. It is a mistake to focus on applying the sigma-delta idea to enthalpy to the exclusion of the innumerable other state-functions to which the sigma-delta idea applies equally well.

I see no value in learning or teaching any version of Hess’s Law. It is better to simply remember that there is a sigma-delta law for any function of state.

The sigma-delta of any function of state is independent of path.

6.4  Partial Derivatives

Let’s build up a scenario, based on some universal facts plus some scenario-specific assumptions.

We know that the energy of the system is well defined. Similarly we know the entropy of the system is well defined. These aren’t assumptions. Every system has energy and entropy.

Next, as mentioned in section 6.1, we assume that the system has a well-defined thermodynamic state, i.e. macrostate. This macrostate can be represented as a point in some abstract state-space. At each point in macrostate-space, the macroscopic quantities we are interested in (energy, entropy, pressure, volume, temperature, etc.) take on well-defined values.

We further assume that this macrostate-space has dimensionality M, and that M is not very large. (This M may be larger or smaller than the dimensionality D of the position-space we live in, namely D=3.)

Assuming a well-behaved thermodynamic state is a highly nontrivial assumption.

We further assume that the quantities of interest vary smoothly from place to place in macrostate-space.

We must be careful how we formalize this “smoothness” idea. By way of analogy, consider a point moving along a great-circle path on a sphere. This path is nice and smooth, by which we mean differentiable. We can get into trouble if we try to describe this path in terms of latitude and longitude, because the coordinate system is singular at the poles. This is a problem with the coordinate system, not with the path itself. To repeat: a great-circle route that passes over the pole is differentiable, but its representation in spherical polar coordinates is not differentiable.

Applying this idea to thermodynamics, consider an ice/water mixture at constant pressure. The temperature is a smooth function of the energy content, whereas the energy-content is not a smooth function of temperature. I recommend thinking in terms of an abstract point moving in macrostate-space. Both T and E are well-behaved functions, with definite values at each point in macrostate-space. We get into trouble if we try to parameterize this point using T as one of the coordinates, but this is a problem with the coordinate representation, not with the abstract space itself.

We will now choose a particular set of variables as a basis for specifying points in macrostate-space. We will use this set for a while, but we are not wedded to it. As one of our variables, we choose S, the entropy. The remaining variables we will collectively call V, which is a vector with M−1 dimensions. In particular, we choose the macroscopic variable V in such a way that the microscopic energy Ê_i of the i-th microstate is determined by V. (For an ideal gas in a box, V is just the volume of the box.)

Given these assumptions, we can write:

$$dE = \left.\frac{\partial E}{\partial V}\right|_S dV + \left.\frac{\partial E}{\partial S}\right|_V dS \tag{16}$$

which is just the chain rule for differentiating a function of two variables. More elaborate versions of this will be discussed in section 17.1.

It is conventional to define the symbols

$$P := -\left.\frac{\partial E}{\partial V}\right|_S \tag{17}$$

and

$$T := \left.\frac{\partial E}{\partial S}\right|_V \tag{18}$$

You might say this is just terminology, just a definition of T … but we need to be careful because there are also other definitions of T floating around. More importantly, if we are going to connect this T to our notion of temperature, there are some basic qualitative properties that we want temperature to have, as discussed in section 10.1. Equation 18 is certainly not the most general definition of temperature, because of several assumptions that we made in the lead-up to equation 16. By way of counterexample, in NMR or ESR, a τ2 process changes the entropy without changing the energy. As an even simpler counterexample, internal leakage currents within a thermally-isolated storage battery increase the entropy of the system without changing the energy; see figure 4 and section 10.5.4.

Using the symbols we have just defined, we can rewrite equation 16 in the following widely-used form:

dE =   −P dV + T dS              (19)

 

(See equation 39 for a generalization of this equation.)
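As a concrete check of equations 17 and 18, here is a minimal numerical sketch. It is not part of the original text, and it rests on an assumed model: a monatomic ideal gas with Nk = 1, for which E(V, S) is proportional to V^(−2/3) exp(2S/3Nk). The function names and the constants `NK` and `C0` are hypothetical.

```python
import math

# Assumed model (not from the text): monatomic ideal gas with N*k = 1,
# E(V, S) = c * V**(-2/3) * exp(2*S/(3*N*k)), with an arbitrary constant c.
NK = 1.0
C0 = 1.0

def energy(V, S):
    return C0 * V ** (-2.0 / 3.0) * math.exp(2.0 * S / (3.0 * NK))

def partial(f, x, y, wrt, h=1e-6):
    """Central finite difference of f(x, y), holding the other variable fixed."""
    if wrt == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

V, S = 2.0, 1.5
P = -partial(energy, V, S, wrt=0)   # equation 17: P := -dE/dV at constant S
T = partial(energy, V, S, wrt=1)    # equation 18: T := dE/dS at constant V

print(P * V, NK * T)  # the two agree for this model, i.e. P V = N k T
```

For this model the T and P defined as partial derivatives of E reproduce the familiar ideal-gas relation, which is one of the qualitative properties we would want before calling them temperature and pressure.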

Similarly, if we choose to define

$$w := \left.\frac{\partial E}{\partial V}\right|_S dV = -P \, dV \tag{20}$$

and

$$q := \left.\frac{\partial E}{\partial S}\right|_V dS = T \, dS \tag{21}$$

That’s all fine; it’s just terminology. Note that w and q are one-forms, not scalars, as discussed in section 7.1. They are functions of state, i.e. uniquely determined by the thermodynamic state.9 Using these definitions of w and q we can write

dE = w + q                (22)

which is fine so long as we don’t misinterpret it. However you should keep in mind that equation 22 and its precursors are very commonly misinterpreted. In particular, it is tempting to interpret w as “work” and q as “heat”, which is either a good idea or a bad idea, depending on which of the various mutually-inconsistent definitions of “work” and “heat” you happen to use. See section 16.1 and section 17.1 for details.

You should also keep in mind that these equations (equation 16, equation 19 and/or equation 22) do not represent the most general case. An important generalization is mentioned in section 6.8.

Recall that we are not wedded to using (V,S) as our basis in macrostate space. As an easy but useful change of variable, consider the case where V = XYZ, in which case we can expand equation 16 as:

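$$
\begin{aligned}
dE &= \left.\frac{\partial E}{\partial X}\right|_{Y,Z,S} dX
    + \left.\frac{\partial E}{\partial Y}\right|_{Z,X,S} dY
    + \left.\frac{\partial E}{\partial Z}\right|_{X,Y,S} dZ
    + \left.\frac{\partial E}{\partial S}\right|_{X,Y,Z} dS \\
   &= -YZP \, dX - ZXP \, dY - XYP \, dZ + T \, dS \\
   &= -F_X \, dX - F_Y \, dY - F_Z \, dZ + T \, dS
\end{aligned}
\tag{23}
$$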

where we define the forces FX, FY, and FZ as directional derivatives of the energy: FX := −∂ E / ∂ X |Y,Z,S and similarly for the others.
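The step from the first line of equation 23 to the second line is the chain rule applied to V = XYZ, combined with the definition of P in equation 17. Spelled out for the X direction (the other two follow by symmetry):

```latex
\left.\frac{\partial E}{\partial X}\right|_{Y,Z,S}
  = \left.\frac{\partial E}{\partial V}\right|_{S}
    \left.\frac{\partial V}{\partial X}\right|_{Y,Z}
  = -P \, YZ ,
\qquad\text{so}\qquad
F_X = YZ \, P .
```

For a rectangular box this says the force is the pressure times the area of the face perpendicular to X.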

6.5  Heat Capacities, Energy Capacity, and Enthalpy Capacity

Here’s another change of variable that calls attention to some particularly interesting partial derivatives. Now that we have introduced the T variable, we can write

$$dE = \left.\frac{\partial E}{\partial V}\right|_T dV + \left.\frac{\partial E}{\partial T}\right|_V dT \tag{24}$$

assuming things are sufficiently differentiable.

The derivative in the second term on the RHS is conventionally called the heat capacity at constant volume. As we shall see in connection with equation 31, it is safer to think of this as the energy capacity. The definition is:

$$C_V := \left.\frac{\partial E}{\partial T}\right|_V \tag{25}$$

again assuming the RHS exists. (This is a nontrivial assumption. By way of counterexample, the RHS does not exist near a first-order phase transition such as the ice/water transition, because the energy is not differentiable with respect to temperature there. This corresponds roughly to an infinite energy capacity, but it takes some care and some sophistication to quantify what this means. See reference 15.)

Figure 10: Energy Capacity aka Heat Capacity at Constant Volume

The energy capacity in equation 25 is an extensive quantity. The corresponding intensive quantities are the specific energy capacity (energy capacity per unit mass) and the molar energy capacity (energy capacity per mole, i.e. per particle up to a factor of Avogadro’s number).

The other derivative on the RHS of equation 24 doesn’t have a name so far as I know. It is identically zero for a table-top sample of ideal gas (but not in general).

The term isochoric means “at constant volume”, so CV is the isochoric heat capacity ... but more commonly it is just called the “heat capacity at constant volume”.

Using the chain rule, we can find a useful expression for CV in terms of entropy:

$$C_V = \left.\frac{\partial E}{\partial S}\right|_V \left.\frac{\partial S}{\partial T}\right|_V = T \left.\frac{\partial S}{\partial T}\right|_V \tag{26}$$

This equation is particularly useful in reverse, as a means for measuring changes in entropy. That is, if you know CV as a function of temperature, you can divide it by T and integrate with respect to T along a contour of constant volume. The relevant formula is:

$$dS = \frac{1}{T} C_V \, dT \qquad \text{at constant } V \tag{27}$$

We could have obtained the same result more directly using the often-important fact, from equation 19,

$$dS = \frac{1}{T} \, dE \qquad \text{at constant } V \tag{28}$$

and combining it with the definition of CV from equation 24 and equation 25:

$$dE = C_V \, dT \qquad \text{at constant } V \tag{29}$$

Equation 29 is useful, but there are some pitfalls to beware of. For a given sample, you might think you could ascertain the absolute entropy S at a given temperature T by integrating from absolute zero up to T. Alas nobody has ever achieved absolute zero in practice, and using an approximation of zero K does not necessarily produce a good approximation of the total entropy. There might be a lot of entropy hiding in that last little interval of temperature. Even in theory this procedure is not to be trusted. There are some contributions to the entropy – such as the entropy of mixing – that may be hard to account for in terms of dS = dE/T. Certainly it would be disastrous to try to “define” entropy in terms of dS = dE/T or anything like that.
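Using equation 27 in reverse to obtain a change in entropy between two finite temperatures can be sketched as follows. This is an illustrative sketch only: the function `entropy_change`, the temperature-independent value of `C_V`, and the temperature range are all hypothetical. It deliberately computes a difference between two finite temperatures and makes no attempt to integrate from absolute zero.

```python
import math

def entropy_change(c_v, T1, T2, steps=10_000):
    """Integrate dS = (C_V / T) dT from T1 to T2 at constant volume (trapezoid rule)."""
    dT = (T2 - T1) / steps
    total = 0.0
    for i in range(steps):
        Ta = T1 + i * dT
        Tb = Ta + dT
        total += 0.5 * (c_v(Ta) / Ta + c_v(Tb) / Tb) * dT
    return total

# Hypothetical sample with a temperature-independent C_V (arbitrary units).
C_V = 12.5
delta_S = entropy_change(lambda T: C_V, 100.0, 300.0)

print(delta_S, C_V * math.log(300.0 / 100.0))  # numerical vs. closed form
```

With measured data, `c_v` would be an interpolation of the measured energy capacity rather than a constant.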

Remark: Equation 24 expands the energy in terms of one set of variables, while equation 16 expands it in terms of another set of variables. This should suffice to dispel the misconception that E (or any other thermodynamic potential) is “naturally” a function of one set of variables to the exclusion of other variables. See section 13.6 and reference 2 for more on this.

This concludes our discussion of the constant-volume situation. We now turn our attention to the constant-pressure situation.

Operationally, it is often easier to maintain constant ambient pressure than to maintain constant volume. For a gas or liquid, we can measure some sort of “heat capacity” using an apparatus along the lines shown in figure 11. That is, we measure the temperature of the sample as a function of the energy put in via the heater. However, this energy is emphatically not the total energy crossing the boundary, because we have not yet accounted for the PdV work done by the piston as it moves upward (as it must, to maintain constant pressure), doing work against gravity via the weight W. Therefore the energy of the heater does not measure the change of the real energy E of the system, but rather of the enthalpy H, as defined by equation 64.

Figure 11: Heat Capacity at Constant Pressure

This experiment can be modelled using the equation:

$$dH = \left.\frac{\partial H}{\partial P}\right|_T dP + \left.\frac{\partial H}{\partial T}\right|_P dT \tag{30}$$

This is analogous to equation 24 ... except that we emphasize that it involves the enthalpy instead of the energy. The second term on the right is conventionally called the heat capacity at constant pressure. It is however safer to call it the enthalpy capacity. The definition is:

$$C_P := \left.\frac{\partial H}{\partial T}\right|_P \tag{31}$$

Under favorable conditions, the apparatus for measuring CP for a chunk of solid substance is particularly simple, because we don’t need the container and piston shown in figure 11; the substance contains itself. We just need to supply thermal insulation. The analysis of the experiment remains the same; in particular we still need to account for the PdV work done when the sample expands, doing work against the ambient pressure.

The term isobaric means “at constant pressure”, so another name for CP is the isobaric heat capacity.

In analogy to equation 27 we can write

$$dS = \frac{1}{T} C_P \, dT \qquad \text{at constant } P \tag{32}$$

which we can obtain using the often-important fact, from equation 67,

$$dS = \frac{1}{T} \, dH \qquad \text{at constant } P \tag{33}$$

and combining it with the definition of CP from equation 30 and equation 31:

$$dH = C_P \, dT \qquad \text{at constant } P \tag{34}$$

Collecting results for comparison, we have

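$$
\begin{aligned}
dE &= C_V \, dT && \text{at constant } V \\
dH &= C_P \, dT && \text{at constant } P \\
dS &= \frac{1}{T} C_V \, dT && \text{at constant } V \\
dS &= \frac{1}{T} C_P \, dT && \text{at constant } P
\end{aligned}
\tag{35}
$$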

Remark: We see once again that the term “heat” is ambiguous in ways that entropy is not. In the first two rows, the LHS is different, yet both are called “heat”, which seems unwise. In the second two rows, the LHS is the same, and both are called entropy, which is just fine.

Starting with either of the last two lines of equation 35 and solving for the heat capacity, we see that we can define a generalized heat capacity as:

$$C_X = T \left.\frac{\partial S}{\partial T}\right|_X = \left.\frac{\partial S}{\partial \ln(T)}\right|_X \tag{36}$$

where X can be just about anything, including X = V or X = P.

Remark: Heat capacity has the same dimensions as entropy.

We see from equation 36 that the so-called heat capacity can be thought of as the entropy capacity ... especially if you use a logarithmic temperature scale.
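The “entropy capacity” reading of equation 36 can be illustrated numerically. The sketch below is not from the original text; it assumes a monatomic ideal gas with Nk = 1, for which S(T, V) = (3/2) ln(T) + ln(V) plus a constant, and the function names are hypothetical.

```python
import math

# Assumed model (not from the text): monatomic ideal gas with N*k = 1, for which
# S(T, V) = 1.5*ln(T) + ln(V) + constant. The constant drops out of derivatives.
def entropy(T, V):
    return 1.5 * math.log(T) + math.log(V)

def capacity_at_constant_V(T, V, h=1e-4):
    """Equation 36 with X = V: derivative of S with respect to ln(T), V held fixed."""
    lnT = math.log(T)
    return (entropy(math.exp(lnT + h), V) - entropy(math.exp(lnT - h), V)) / (2 * h)

print(capacity_at_constant_V(300.0, 2.0))  # close to 1.5, i.e. (3/2) N k
```

On a logarithmic temperature scale the capacity is simply the slope of the entropy, which for this model is the same at every temperature.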

Equation 36 is useful for many theoretical and analytical purposes, but it does not directly correspond to the way heat capacities are usually measured in practice. The usual procedure is to observe the temperature as a function of energy or enthalpy, and to apply equation 25 or equation 31.

This supports the point made in section 0 and section 16.1, namely that the concept of “heat” is a confusing chimera. It’s part energy and part entropy. It is neither necessary nor possible to have an unambiguous understanding of “heat”. If you understand energy and entropy, you don’t need to worry about heat.

6.6  Yet More Partial Derivatives

Equation 16 is certainly not the only possible way to express the exterior derivative of E. Here’s another widely-useful expression:

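$$dE = \left.\frac{\partial E}{\partial N}\right|_{V,S} dN + \left.\frac{\partial E}{\partial V}\right|_{N,S} dV + \left.\frac{\partial E}{\partial S}\right|_{N,V} dS \tag{37}$$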

where N represents the number of particles. We temporarily assume there is only one species of particles, not a mixture.

You can see that this is a more-general expression; equation 16 is a corollary valid in the special case where dN=0.

The conventional pet name for the first derivative on the RHS is chemical potential, denoted µ. That is:

$$\mu := \left.\frac{\partial E}{\partial N}\right|_{V,S} \tag{38}$$

where N is the number of particles in the system (or subsystem) of interest.

This means we can write:

dE = µ dN − P dV + T dS              (39)

which is a generalization of equation 19.

It is emphatically not mandatory to express E as a function of (V,S) or (N,V,S). Almost any variables that span the state-space will do, as mentioned in section 13.6 and reference 2.

You should not read too much into the name “chemical” potential. There is not any requirement nor even any connotation that there be any chemical reactions going on.

The defining property of the chemical potential (µ) is that it is conjugate to an increase in number (dN) … just as the pressure (P) is conjugate to a decrease in volume (−dV). Note the contrast: in the scenario described by equation 39:

Stepping across a contour of −dV increases the density (same number in a smaller volume). This can happen if a piston is used to change the volume.

Stepping across a contour of dN increases the density (bigger number in the same volume). This can happen if particles are carried across the boundary of the system, or if particles are produced within the interior of the system (by splitting dimers or whatever).

So we see that dN and dV are two different directions in parameter space. Conceptually and mathematically, we have no basis for declaring them to be “wildly” different directions or only “slightly” different directions; all that matters is that they be different i.e. linearly independent. At the end of the day, we need a sufficient number of linearly independent variables, sufficient to span the parameter space.

Equation 39 is a generalization of equation 19, but it is not the absolute most-general equation. In fact there is no such thing as the most-general equation; there’s always another generalization you can make. For example, equation 39 describes only one species of particle; if there is another species, you will have to define a new variable N2 to describe it, and add another term involving dN2 to the RHS of equation 39. Each species will have its own chemical potential. Similarly, if there are significant magnetic interactions, you need to define a variable describing the magnetic field, and add the appropriate term on the RHS of equation 39. If you understand the meaning of the equation, such generalizations are routine and straightforward. Again: At the end of the day, any expansion of dE needs a sufficient number of linearly independent variables, sufficient to span the relevant parameter space.
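As a sketch of the two-species generalization just described, the extra term could be written as follows. The subscripted symbols N_1, µ_1 and µ_2 are introduced here for illustration only; the text itself names only N2.

```latex
dE = \mu_1 \, dN_1 + \mu_2 \, dN_2 - P \, dV + T \, dS ,
\qquad
\mu_2 := \left.\frac{\partial E}{\partial N_2}\right|_{N_1, V, S}
```

Each new term pairs one additional independent variable with its own conjugate derivative, so the pattern extends to any number of species.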

For a more formal discussion of using the chain rule to expand differentials in terms of an arbitrary number of variables, see reference 2.

If you want to be really retentive about it, you could argue that equation 39 is not quite correct, and a more-correct expression would be

dE = µ dN − P dV + T dS + ⋯              (40)

where the ellipsis (⋯) represents all the terms that have been left out. However, in science it is traditional to leave out the ellipsis, recognizing that no equation is fully general, and equation 39 is merely a corollary of some unstated cosmic generality, valid under the proviso that the omitted terms are unimportant.

6.7  Integration

Let’s continue to assume that T and P are functions of state, and that S and V suffice to span the macrostate-space.

Then, in cases where equation 19 is valid, we can integrate both sides to find E. This gives us an expression for E as a function of V and S alone (plus a constant of integration that has no physical significance). Naturally, this expression is more than sufficient to guarantee that E is a function of state.

Things are much messier if we try to integrate only one of the terms on the RHS of equation 19. Without loss of generality, let’s consider the T dS term. We integrate T dS along some path Γ. Let the endpoints of the path be A and B.

It is crucial to keep in mind that the value of the integral depends on the chosen path — not simply on the endpoints. It is OK to write things like

$$s\Delta \, Q_\Gamma = \int_\Gamma T \, dS \tag{41}$$

whereas it would be quite unacceptable to replace the path with its endpoints:

$$(\text{anything}) = \int_A^B T \, dS \tag{42}$$

I recommend writing QΓ rather than Q, to keep the path-dependence completely explicit. This QΓ exists only along the low-dimensional subspace defined by the path Γ, and cannot be extended to cover the whole thermodynamic state-space. That’s because T dS is an ungrady one-form. See section 7.1 for more about this.
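The path-dependence of the integral of T dS can be seen in a small numerical sketch. This is not from the original text. It assumes a monatomic ideal gas with Nk = 1, so that E(V, S) = V^(−2/3) exp(2S/3) and T = (2/3) E, and the function and variable names are hypothetical.

```python
import math

# Assumed model (not from the text): monatomic ideal gas with N*k = 1, so
# E(V, S) = V**(-2/3) * exp(2*S/3) and T = dE/dS at constant V = (2/3) * E.
def temperature(V, S):
    return (2.0 / 3.0) * V ** (-2.0 / 3.0) * math.exp(2.0 * S / 3.0)

def integrate_T_dS(path, steps=2000):
    """Midpoint-rule integral of T dS along straight segments joining (V, S) points."""
    total = 0.0
    for (Va, Sa), (Vb, Sb) in zip(path, path[1:]):
        dS = (Sb - Sa) / steps
        for i in range(steps):
            t = (i + 0.5) / steps
            total += temperature(Va + t * (Vb - Va), Sa + t * (Sb - Sa)) * dS
    return total

A, B = (1.0, 0.0), (8.0, 3.0)
Q_S_first = integrate_T_dS([A, (1.0, 3.0), B])  # raise S at V=1, then change V
Q_V_first = integrate_T_dS([A, (8.0, 0.0), B])  # change V at S=0, then raise S

print(Q_S_first, Q_V_first)  # same endpoints, different values
```

Both paths start at A and end at B, yet the two integrals differ by a factor of four in this model. That is why the subscript Γ cannot be dropped.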

6.8  Advection

Equation 19 is predicated on the assumption that the energy is known as a function of V and S alone. However, this is not the most general case. As an important generalization, consider the energy budget of a typical automobile. The most-common way of increasing the energy within the system is to transfer fuel (and oxidizer) across the boundary of the system. This is an example of advection of energy. This contributes to dE, but is not included in PdV or TdS. So we should write something like:

dE =   −P dV + T dS + advection              (43)

 

It is possible to quantify the advection mathematically. Simple cases are easy. The general case would lead us into a discussion of fluid dynamics, which is beyond the scope of this document.

6.9  Deciding What’s True

Having derived results such as equation 19 and equation 43, we must figure out how to interpret the terms on the RHS. Please consider the following notions and decide which ones are true:

  1. Heat is defined to be TdS (subject to the usual restrictions, discussed in section 6.4).
  2. Heat is defined to be “energy that is transferred from one body to another as the result of a difference in temperature”.
  3. The laws of thermodynamics apply even when irreversible processes are occurring.

It turns out that these three notions are mutually contradictory. You have to get rid of one of them, for reasons detailed in section 16.1 and section 15.

As a rule, you are allowed to define your terms however you like. However, if you want a term to have a formal, well-defined meaning, you must pick one definition and use it consistently.

The problem is, many textbooks don’t play by the rules. On some pages they define heat to be TdS, on some pages they define it to be flow across a boundary, and on some pages they require thermodynamics to apply to irreversible processes.

This is an example of boundary/interior inconsistency, as discussed in section 15.

The result is a shell game, or a whack-a-mole game: There’s a serious problem, but nobody can pin down the location of the problem.

This results in endless confusion. Indeed, sometimes it results in holy war between the Little-Endians and the Big-Endians: Each side is 100% convinced that their definition is “right”, and therefore the other side must be “wrong”. (Reference 16.) I will not take sides in this holy war. Viable alternatives include:

  1. Pick one definition of heat. Explicitly say which definition you’ve chosen, and use it consistently. Recognize that others may choose differently.
  2. Go ahead and use the term informally, with multiple inconsistent meanings, as many experts do. Just don’t pretend you’re being consistent when you’re not. Use other terms and concepts (e.g. energy and entropy) when you need to convey a precise meaning.
  3. Avoid using the term “heat” any more than necessary. Focus attention on other terms and concepts (e.g. energy and entropy).

For more on this, see the discussion near the end of section 6.10.

6.10  Deciding What’s Fundamental

It is not necessarily wise to pick out certain laws and consider them “axioms” of physics. As Feynman has eloquently argued in reference 17, real life is not like high-school geometry, where you were given a handful of axioms and expected to deduce everything from that. In the real world, every fact is linked to many other facts in a grand tapestry. If a hole develops in the tapestry, you can re-weave it starting from the top of the hole, or the bottom, or either side. That is to say, if you forget one particular fact, you can re-derive it in many different ways.

In this spirit, some folks may wish to consider equation 3 and equation 22 as being equally axiomatic, or equally non-axiomatic. One can be used to re-derive the other, with the help of other facts, subject to certain limitations.

On the other hand, some facts are more useful than others. Some are absolutely central to our understanding of the world, while others are less so. Some laws are more worth discussing and remembering, while others are less so. Saying that something is true and useful does not make it fundamental; the expression 1+2+3+4=10 is true and sometimes useful, but it isn’t very fundamental, because it lacks generality.

Deciding which laws to emphasize is to some extent a matter of taste, but one ought to consider such factors as simplicity and generality, favoring laws with a large number of predictions and a small number of exceptions.

In my book, energy conservation (equation 3) is fundamental. From that, plus a couple of restrictions, we can derive equation 22 using calculus. Along the way, the derivation gives us important information about how w and q should be interpreted. It’s pretty clear what the appropriate restrictions are.

If you try to go the other direction, i.e. from w+q to conservation of energy, you must start by divining the correct interpretation of w and q. The usual “official” interpretations are questionable to say the least, as discussed in section 10.5 and section 15. Then you have to posit suitable restrictions and do a little calculus. Finally, if it all works out, you end up with an unnecessarily restrictive version of the local energy-conservation law.

Even in the best case I have to wonder why anyone would bother with the latter approach. I would consider such a derivation as being supporting evidence for the law of local conservation of energy, but not even the best evidence.

I cannot imagine why anyone would want to use equation 22 or equation 43 as “the” first law of thermodynamics. Instead, I recommend using the local law of conservation of energy … which is simpler, clearer, more fundamental, more powerful, and more general.

It’s not at all clear that thermodynamics should be formulated in quasi-axiomatic terms, but if you insist on having a “first law” it ought to be a simple, direct statement of local conservation of energy. If you insist on having a “second law” it ought to be a simple, direct statement of local paraconservation of entropy.

Another way to judge equation 22 is to ask to what extent it describes this-or-that practical device. Two devices of the utmost practical importance are the thermally-insulating pushrod and the ordinary nonmoving heat exchanger. The pushrod transfers energy and momentum (but no entropy) across the boundary, while the heat exchanger transfers energy and entropy (but no momentum) across the boundary.

It is traditional to describe these devices in terms of work and heat, but it is not necessary to do so, and I’m not convinced it’s wise. As you saw in the previous paragraph, it is perfectly possible to describe them in terms of energy, momentum, and entropy, which are the true coin of the realm, the truly primary and fundamental physical quantities. Heat and work are secondary at best (even after you have resolved the nasty inconsistencies discussed in section 6.9 and section 15).

Even if/when you can resolve dE into a −PdV term and a TdS term, that doesn’t mean you must do so. In many cases you are better off keeping track of E by itself, and keeping track of S by itself. Instead of saying no heat flows down the pushrod, it makes at least as much sense to say that no entropy flows down the pushrod. Keeping track of E and S is more fundamental, as you can see from the fact that energy and entropy can be exchanged between systems that don’t even have a temperature (section 10.4).

When in doubt, rely on the fundamental laws: conservation of energy, conservation of momentum, paraconservation of entropy, et cetera.

7  The W + Q Equation

7.1  Grady and Ungrady One-Forms

Sometimes people who are trying to write equation 16 or equation 22 instead write something like

dE = dW + dQ         (allegedly)              (44)

which is deplorable.

Using the language of differential forms, the situation can be understood as follows:

where in the last four items, we have to say “in general” because exceptions can occur in peculiar situations, mainly cramped situations where it is not possible to construct a heat engine. Such situations are very unlike the general case, and not worth much discussion beyond what was said in conjunction with equation 41. When we say something is a state-function we mean it is a function of the thermodynamic state. The last two items follow immediately from the definition of grady versus ungrady.

Figure 12 shows the difference between a grady one-form and an ungrady one-form.

As you can see on the left side of the figure, the quantity dS is grady. If you integrate clockwise around the loop as shown, the net number of upward steps is zero. This is related to the fact that we can assign an unambiguous height (S) to each point in (T,S) space.   In contrast, as you can see on the right side of the diagram, the quantity TdS is not grady. If you integrate clockwise around the loop as shown, there are considerably more upward steps than downward steps. There is no hope of assigning a height “Q” to points in (T,S) space.

Figure 12: dS is Grady, TdS is Not

For details on the properties of one-forms, see reference 3 and perhaps reference 18.
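The contrast drawn in figure 12 can be checked numerically. The following is a minimal sketch, not taken from the figure itself: it walks once around a closed loop in (T,S) space and adds up the steps of dS and of TdS. The elliptical loop, its center and radii, and the name `loop_integrals` are all assumptions chosen purely for illustration.

```python
import math

def loop_integrals(T0=300.0, S0=1.0, a=50.0, b=0.2, n=100_000):
    """Integrate dS and T dS once around an ellipse in (T, S) space.

    The loop is T = T0 + a*cos(theta), S = S0 + b*sin(theta),
    traversed in the direction of increasing theta.
    """
    sum_dS = 0.0
    sum_TdS = 0.0
    for k in range(n):
        th0 = 2 * math.pi * k / n
        th1 = 2 * math.pi * (k + 1) / n
        step_S = (S0 + b * math.sin(th1)) - (S0 + b * math.sin(th0))
        T_mid = T0 + a * math.cos(0.5 * (th0 + th1))  # T at the middle of the step
        sum_dS += step_S            # grady: the steps cancel around a loop
        sum_TdS += T_mid * step_S   # ungrady: the steps do not cancel
    return sum_dS, sum_TdS

dS_total, TdS_total = loop_integrals()
print("loop integral of dS  :", dS_total)
print("loop integral of T dS:", TdS_total)
print("enclosed area pi*a*b :", math.pi * 50.0 * 0.2)
```

The dS total vanishes to within roundoff, while the TdS total matches the area enclosed by the loop, with a sign that flips if you traverse the loop in the opposite direction. That nonzero result is what rules out assigning a height “Q” to each point.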

Be warned that in the mathematical literature, what we are calling ungrady one-forms are called “inexact” one-forms. The two terms are entirely synonymous. A one-form is called “exact” if and only if it is the gradient of something. We avoid the terms “exact” and “inexact” because they are too easily misunderstood. In particular, in this context,

The difference between grady and ungrady has important consequences for practical situations such as heat engines. Even if we restrict attention to reversible situations, we still cannot think of Q as a function of state, for the following reasons: You can define any number of functions Q1, Q2, ⋯ by integrating TdS along some paths Γ1, Γ2, ⋯ of your choosing. Each such Qi can be interpreted as the total heat that has flowed into the system along the specified path. As an example, let’s choose Γ6 to be the path that a heat engine follows as it goes around a complete cycle – a reversible cycle, perhaps a Carnot cycle or some such. Let Q6(N) be the value of Q6 at the end of the Nth cycle. We see that even after specifying the path, Q6 is still not a state function, because at the end of each cycle, all the state functions return to their initial values, whereas Q6(N) grows linearly with N. This proves that in any situation where you can build a heat engine, q is not equal to d(anything).
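The argument about Q6(N) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the cycle is taken to be a rectangle in (T,S) space (two isothermal legs and two isentropic legs), and the temperatures, entropies, and the name `heat_after_cycles` are hypothetical.

```python
def heat_after_cycles(n_cycles, T_hot=400.0, T_cold=300.0, S_lo=1.0, S_hi=2.0):
    """Follow a rectangular reversible cycle in (T, S) space.

    Returns a list of (state, Q) pairs, one per completed cycle, where
    state is the (T, S) point and Q is the running integral of T dS.
    """
    corners = [
        (T_hot, S_lo),    # starting point
        (T_hot, S_hi),    # isothermal leg at T_hot: entropy flows in
        (T_cold, S_hi),   # isentropic leg: dS = 0
        (T_cold, S_lo),   # isothermal leg at T_cold: entropy flows out
        (T_hot, S_lo),    # isentropic leg back to the starting point
    ]
    Q = 0.0
    history = []
    for _ in range(n_cycles):
        for (T0, S0), (T1, S1) in zip(corners, corners[1:]):
            # On a straight segment T varies linearly, so the integral
            # of T dS is (average T) * (change in S).
            Q += 0.5 * (T0 + T1) * (S1 - S0)
        history.append((corners[-1], Q))
    return history

for n, (state, Q) in enumerate(heat_after_cycles(3), start=1):
    print(f"cycle {n}: state (T, S) = {state}, Q = {Q}")
```

At the end of every cycle the state is back at the same (T,S) point, yet Q has grown by the same increment each time. No function of state can behave that way.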

7.2  Abuse of the Notation

Suppose there are two people, namely wayne and dwayne. There is no special relationship between them. In particular, we interpret dwayne as a simple six-letter name, not as d(wayne) i.e. not as the derivative of wayne.

Some people try to use the same approach to supposedly define dQ to be a “two-letter name” that represents T dS – supposedly without implying that dQ is the derivative of anything. That is emphatically not acceptable. That would be a terrible abuse of the notation.

In accordance with almost-universally accepted convention, d is an operator, and dQ denotes the operator d applied to the variable Q. If you give it any other interpretation, you are going to confuse yourself and everybody else.

The point remains that in thermodynamics, there does not exist any Q such that dQ = T dS (except perhaps in trivial cases). Wishing for such a Q does not make it so. See section 18 for more on this.

7.3  Why dW and dQ Are Tempting

It is remarkable that people are fond of writing things like dQ … even in cases where it does not exist. (The remarks in this section apply equally well to dW and similar monstrosities.)

Even people who know it is wrong do it anyway. They call dQ an “inexact differential” and sometimes put a slash through the d to call attention to this. The problem is, neither dQ nor ðQ is a differential at all. Yes, TdS is an ungrady one-form or (equivalently) an inexact one-form, but no, it is not properly called an inexact differential, since it is generally not a differential at all. It is not the derivative of anything.

One wonders how such a bizarre tangle of contradictions could arise, and how it could persist. I hypothesize part of the problem is a too-narrow interpretation of the traditional notation for integrals. Most mathematics books say that every integral should be written in the form

 


 
$$ \int (\text{integrand}) \; d(\text{something}) \tag{45} $$

where the d is alleged to be merely part of the notation – an obligatory and purely mechanical part of the notation – and the integrand is considered to be separate from the d(something).

However, it doesn’t have to be that way. If you think about a simple scalar integral from the Lebesgue point of view (as opposed to the Riemann point of view), you realize that what is indispensable is a weighting function. Specifically: d(something) is a perfectly fine, normal type of weighting function, but not the only possible type of weighting function.

In an ordinary one-dimensional integral, we are integrating along a path, which in the simplest case is just an interval on the number line. Each element of the path is a little pointy vector, and the weighting function needs to map that pointy vector to a number. Any one-form will do, grady or otherwise. The grady one-forms can be written as d(something), while the ungrady ones cannot.

For purposes of discussion, in the rest of this section we will put square brackets around the weighting function, to make it easy to recognize even if it takes a somewhat unfamiliar form. As a simple example, a typical integral can be written as:

 


$$ \int_{\Gamma} (\text{integrand}) \; [(\text{weight})] \tag{46} $$

where Γ is the domain to be integrated over, and the weight is typically something like dx.

As a more intricate example, in two dimensions the moment of inertia of an object Ω is:

$$ I := \int_{\Omega} r^2 \, [dm] \tag{47} $$

where the weight is dm. As usual, r denotes distance and m denotes mass. The integral runs over all elements of the object, and we can think of dm as an operator that tells us the mass of each such element. To my way of thinking, this is the definition of moment of inertia: a sum of r², summed over all elements of mass in the object.

The previous expression can be expanded as:

$$ I = \int_{\Omega} r^2 \, [\rho(x,y) \, dx \, dy] \tag{48} $$

where the weighting function is the same as before, just rewritten in terms of the density, ρ.

Things begin to get interesting if we rewrite that as:

$$ I = \int_{\Omega} r^2 \, \rho(x,y) \, [dx \, dy] \tag{49} $$

where ρ is no longer part of the weight but has become part of the integrand. We see that the distinction between the integrand and the weight is becoming a bit vague. Exploiting this vagueness in the other direction, we can write:

$$ I = \int_{\Omega} [r^2 \, dm] = \int_{\Omega} [r^2 \, \rho(x,y) \, dx \, dy] \tag{50} $$

which tells us that the distinction between integrand and weighting function is completely meaningless. Henceforth I will treat everything inside the integral on the same footing. The integrand and weight together will be called the argument10 of the integral.
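To make the point about integrand versus weight concrete, here is a minimal numerical sketch. It assumes a uniform square plate centered on the origin (a hypothetical object, chosen because its moment of inertia about the perpendicular axis through the center is known in closed form to be M·a²/6). The function names and the grid resolution are likewise illustrative assumptions.

```python
def plate_cells(side, n):
    """Chop a square plate centered on the origin into n*n cells.

    Yields (x, y, area) for each cell, evaluated at the cell midpoint.
    """
    d = side / n
    for i in range(n):
        for j in range(n):
            yield -side / 2 + (i + 0.5) * d, -side / 2 + (j + 0.5) * d, d * d

def inertia_weight_dm(side=2.0, mass=3.0, n=200):
    """Integrand r^2, weight [dm], in the style of equation 47."""
    rho = mass / side**2
    total = 0.0
    for x, y, area in plate_cells(side, n):
        dm = rho * area                   # the weight: mass of this element
        total += (x * x + y * y) * dm
    return total

def inertia_weight_dxdy(side=2.0, mass=3.0, n=200):
    """Integrand r^2 * rho, weight [dx dy], in the style of equation 49."""
    rho = mass / side**2
    total = 0.0
    for x, y, area in plate_cells(side, n):
        integrand = (x * x + y * y) * rho
        total += integrand * area         # the weight: area of this element
    return total

print(inertia_weight_dm(), inertia_weight_dxdy(), 3.0 * 2.0**2 / 6)
```

Both groupings give the same sum (up to roundoff), and both approach the closed-form value as the grid is refined. Where you draw the line between “integrand” and “weight” makes no difference to the result.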

Using an example from thermodynamics, we can write

$$ Q_{\Gamma} = \int_{\Gamma} T \, [dS] = \int_{\Gamma} [T \, dS] = \int_{\Gamma} [q] \tag{51} $$

where Γ is some path through thermodynamic state-space, and where q is an ungrady one-form, defined as q := TdS.

It must be emphasized that these integrals must not be written as ∫[dQ] nor as ∫[dq]. This is because the argument in equation 51 is an ungrady one-form, and therefore cannot be equal to d(anything).

There is no problem with using TdS as the weighting function in an integral. The only problem comes when you try to write TdS as d(something) or ð(something):

I realize an expression like ∫[q] will come as a shock to some people, but I think it expresses the correct ideas. It’s a whole lot more expressive and more correct than trying to write TdS as d(something) or ð(something).

Once you understand the ideas, the square brackets used in this section no longer serve any important purpose. Feel free to omit them if you wish.

There is a proverb that says if the only tool you have is a hammer, everything begins to look like a nail. The point is that even though a hammer is the ideal tool for pounding nails, it is suboptimal for many other purposes. Analogously, the traditional notation ∫ ⋯ dx is ideal for some purposes, but not for all. Specifically: sometimes it is OK to have no explicit d inside the integral.

There are only two things that are required: the integral must have a domain to be integrated over, and it must have some sort of argument. The argument must be an operator, which operates on an element of the domain to produce something (usually a number or a vector) that can be summed by the integral.

A one-form certainly suffices to serve as an argument (when elements of the domain are pointy vectors). Indeed, some math books introduce the notion of one-forms by defining them to be operators of the sort we need. That is, the space of one-forms is defined as an operator space, consisting of the operators that map column vectors to scalars. (So once again we see that one-forms correspond to row vectors, assuming pointy vectors correspond to column vectors). Using these operators does not require taking a dot product. (You don’t need a dot product unless you want to multiply two column vectors.) The operation of applying a row vector to a column vector to produce a scalar is called a contraction, not a dot product.

It is interesting to note that an ordinary summation of the form ∑i Fi corresponds exactly to a Lebesgue integral using a measure that assigns unit weight to each integer (i) in the domain. No explicit d is needed when doing this “integral”. The idea of “weighting function” is closely analogous to the idea of “measure” in Lebesgue integrals, but not exactly the same. We must resist the temptation to use the two terms interchangeably. In particular, a measure is by definition a scalar, but sometimes (such as when integrating along a curve) it is important to use a weighting function that is a vector.
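The “domain plus argument” view of an integral translates almost directly into code. The sketch below is an illustration only: the function `integrate`, the one-form `q`, and the straight-line path are hypothetical names and choices, and the path is represented as a list of (base point, small step) pairs.

```python
def integrate(domain, argument):
    """Sum the results of applying `argument` to each element of `domain`."""
    return sum(argument(element) for element in domain)

# (a) An ordinary summation: unit weight on each integer, no explicit d.
squares = integrate(range(1, 6), lambda i: i * i)

# (b) A path integral of the one-form q = T dS.
# Each path element is (base point, step); the step is a small
# pointy vector (dT, dS) in (T, S) space.
def q(element):
    (T, S), (dT, dS) = element
    row = (0.0, T)                      # components of T dS on the (dT, dS) basis
    return row[0] * dT + row[1] * dS    # contraction: row vector applied to column vector

def straight_path(start, end, n=1000):
    """Yield (midpoint, step) pairs along a straight line from start to end."""
    (T0, S0), (T1, S1) = start, end
    dT, dS = (T1 - T0) / n, (S1 - S0) / n
    for k in range(n):
        yield (T0 + (k + 0.5) * dT, S0 + (k + 0.5) * dS), (dT, dS)

heat = integrate(straight_path((300.0, 1.0), (400.0, 2.0)), q)
print(squares, heat)
```

In both cases the argument is simply an operator that maps an element of the domain to a number. Nothing in `integrate` cares whether that operator can be written as d(something).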

People heretofore have interpreted d in several ways: as a differential operator (with the power, among other things, to produce one-forms from scalars), as an infinitesimal step in some direction, and as the marker for the weighting function in an integral. The more I think about it, the more convinced I am that the differential operator interpretation is far and away the most advantageous. The other interpretations of d can be seen as mere approximations of the operator interpretation. The approximations work OK in elementary situations, but produce profound misconceptions and contradictions when applied to more general situations … such as thermodynamics.   In contrast, note that in section 16.1, I do not take such a hard line about the multiple incompatible definitions of heat. I don’t label any of them as right or wrong. Rather, I recognize that each of them in isolation has some merit, and it is only when you put them together that conflicts arise.

Bottom line: There are two really simple ideas here: (1) d always means exterior derivative. The exterior derivative of any scalar-valued function is a vector. It is a one-form, not a pointy vector. In particular it is always a grady one-form. (2) An integral needs to have a weighting function, which is not necessarily of the form d(something).

8  Connecting Entropy with Energy

8.1  The Boltzmann Distribution

We shall see that in equilibrium, energy is distributed among the microstates according to a very special probability distribution, namely the Boltzmann distribution. That is, the probability of finding the system in microstate i is given by:

$$ P_i = e^{-\hat{E}_i / kT} \quad \ldots \text{ for a thermal distribution} \tag{52} $$

where Êi is the energy of the ith microstate, and kT is the temperature measured in energy units. That is, plain T is the temperature measured in degrees, and k is Boltzmann’s constant, which is just the conversion factor from degrees to whatever units you are using to measure Êi.

Figure 13 shows this distribution graphically.

Figure 13: An Equilibrium Distribution

Evidence in favor of equation 52 is discussed in section 10.2.
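As a small worked illustration of equation 52, the sketch below tabulates the probabilities for a handful of microstates. Two things here are assumptions and not part of the original: the energy levels are made up, and the exponential factors are divided by their sum so that the probabilities add up to one (equation 52 as printed shows only the exponential factor).

```python
import math

def boltzmann_probabilities(energies, kT):
    """Return probabilities proportional to exp(-E_i / kT), normalized to sum to 1."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)   # normalization constant (assumed, see the text above)
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical levels, in the same units as kT
probs = boltzmann_probabilities(energies, kT=1.0)
for E, p in zip(energies, probs):
    print(f"E = {E:.1f}   P = {p:.4f}")
```

Each step up in energy by kT reduces the probability by a factor of e, which is the exponential falloff sketched in figure 13.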

8.2  Locrian and Non-Locrian

Alas, a theory of thermodynamics strictly limited to Boltzmann distributions would not be very useful. We must broaden our notion of what a “thermal” distribution is. Specifically, we must consider the case of a Boltzmann exponential distribution with exceptions. The flywheel considered in section 8.3 has ≈10²³ modes that follow the Boltzmann distribution, and one that does not. The exception involves a huge amount of energy, but involves essentially zero entropy. Also, most importantly, we can build a thermometer that couples to the thermal modes without coupling to the exceptional mode.

A particularly interesting case is shown in figure 14. In this case there are two exceptions. This situation has exactly the same entropy as the situation shown in figure 13. This can be seen directly from equation 5 since the Pi values are the same, differing only by a permutation of the dummy index i.

Figure 14: An Equilibrium Distribution with Exceptions

Meanwhile, the energy shown in figure 14 is significantly larger than the energy shown in figure 13.
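The claim that a permutation of the Pi leaves the entropy alone while changing the energy is easy to check. The following sketch assumes that equation 5 is the usual entropy formula, the sum over microstates of Pi log(1/Pi). The energy levels and the particular permutation are hypothetical; they are not read off from figure 14.

```python
import math

def entropy(probs):
    """Sum of P_i * log(1/P_i), in units of Boltzmann's constant."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def mean_energy(probs, energies):
    return sum(p * E for p, E in zip(probs, energies))

energies = [float(i) for i in range(10)]   # hypothetical levels, in units of kT
weights = [math.exp(-E) for E in energies]
Z = sum(weights)
thermal = [w / Z for w in weights]         # a plain thermal distribution

# Two exceptions: move the two largest probabilities up to the two
# highest-energy microstates. This is only a permutation of the P_i.
with_exceptions = thermal[:]
with_exceptions[0], with_exceptions[8] = thermal[8], thermal[0]
with_exceptions[1], with_exceptions[9] = thermal[9], thermal[1]

print("entropy:", entropy(thermal), entropy(with_exceptions))
print("energy :", mean_energy(thermal, energies), mean_energy(with_exceptions, energies))
```

The entropy depends only on the set of probability values, so it is unchanged. The energy depends on which microstate carries which probability, so it goes up substantially.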

Terminology: The Locrian microstates conform to the Boltzmann distribution, while the non-Locrian microstates are the exceptions.

This term originated as follows: I wanted a term I could define without conflict. It would be awkward and ambiguous to speak of non-exceptional and exceptional modes, since things can be exceptional (or non-exceptional) in all sorts of different ways, and I wanted to denote a particular category of exceptions. You could imagine speaking in terms of B-modes (B for Boltzmann) versus non-B-modes, but that is a bit awkward, too. In music, the Locrian mode “is” the B-mode, in the sense that on a harp or piano, with no sharps or flats, you can play the Locrian scale starting on B. Bottom line: Locrian means B-mode, i.e. conforming to the Boltzmann distribution.

Figure 15: Locrian and Non-Locrian

Figure 15 deepens our understanding of Locrian versus non-Locrian. On the LHS of the figure, we have a so-called air spring; that is, we are applying a force F to a piston, and this force is being resisted by the gas pressure inside the cylinder. Meanwhile, on the RHS of the figure, we are applying a force to a plain old metal spring, and the force is being resisted by the elastic force of the spring. We have arranged so that the forces are equal. We can even move each handle up and down a little bit, and if things are properly arranged, the two force-versus-distance profiles will be equal (to first order, and maybe even better than that).

Let’s temporarily agree to not look inside the “black boxes” indicated by black dashed lines in the figure. We don’t measure anything but the externally-observable force and the position of the handle. Suppose we move the handle downward, gently and smoothly. This puts some energy into the system. The peculiar thing is that on the LHS, this energy goes into Locrian modes, while on the RHS, it goes into a non-Locrian mode (almost entirely). That means that just measuring force and position of the handle does not suffice to distinguish Locrian from non-Locrian phenomena.

If we want to sort out Locrian from non-Locrian, we need to make some more elaborate measurements. One good experiment would be to heat up both systems; the force of the air-spring will be directly proportional to absolute temperature, while the force of the mechanical spring will be vastly less sensitive to temperature.

Note the following distinction:

Under mild restrictions, it is possible to split the energy-change dE into a thermal piece T dS and a mechanical piece P dV.   Under mild restrictions, it is possible to split the overall energy E into a Locrian piece and a non-Locrian piece.

We emphasize that these two splits are not the same! Not even close.

dE is a vector, a one-form.   E is an extensive scalar.

On the LHS of figure 15, we are doing reversible mechanical work on the gas, but the energy in question is being transferred to and from Locrian modes. On the RHS of the figure, we are doing the same amount of work, and doing it in the same way, but inside the black box the energy is being transferred to and from a non-Locrian mode.

A huge amount of harm is done when non-experts take their intuition about the Locrian versus non-Locrian split and conflate it with the T dS versus P dV split.

On the RHS of figure 15, there is a correct and important distinction of Locrian versus non-Locrian, and this can be expressed in terms of thermal versus nonthermal. It is naturally tempting to assume there “must” be a similar distinction on the LHS … but you must resist this temptation.

Locrian modes are fully equilibrated, which implies that they have the maximum entropy consistent with the given energy and other constraints. Non-Locrian modes have less entropy per unit energy (or, equivalently, more energy per unit entropy). This corresponds to the vernacular terminology of “high-grade energy” versus “low-grade energy”. This is a tremendously practical distinction. A cold, fully-charged battery is much more valuable than a warm, discharged battery, when both have the same overall energy.

In quite a few cases, the non-Locrian modes have some macroscopic structure, such as the spin of a flywheel. This arises because a non-Locrian mode is not very interesting unless it remains non-Locrian for a reasonable amount of time. One way this can arise is if the non-Locrian mode is protected by a conservation law (such as conservation of the angular momentum of the flywheel). Of course it is not mandatory for a non-Locrian mode to have macroscopic structure; the spin echo experiment (section 10.7) serves as a counterexample.

See section 18 for more about the Locrian/non-Locrian split.

8.3  An Illustration : Flywheels, Springs, and Batteries

Let box A contain a cold, rapidly-rotating flywheel.11 Actually, let it contain two counter-rotating flywheels, so we won’t have any net angular momentum to worry about. Also let it contain a cold, tightly-compressed spring and a cold, fully-charged battery.

Compare that with box B which is the same except that the flywheels have been stopped, the spring has been released, and the battery has been discharged … all by dissipative processes entirely internal to the box. The non-Locrian rotational energy, elastic energy, and chemical energy have all been converted to Locrian forms of energy. The flywheels, spring, and battery are now warmer than before. Assume losses into other modes (sound etc.) are negligible.

To summarize: box A and box B have the same energy but different temperature.
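Here is a rough numerical version of the box A versus box B comparison. Everything quantitative in it is an assumption made for illustration: the flywheel, spring, and battery parameters are invented, and the contents of the box are assumed to have a constant heat capacity over the temperature range involved.

```python
import math

def dissipate(T_A, heat_capacity, nonlocrian_energy):
    """Convert all non-Locrian energy to Locrian energy inside a closed box.

    Assumes a constant heat capacity. Returns the final temperature and
    the entropy increase; the total energy of the box is unchanged.
    """
    T_B = T_A + nonlocrian_energy / heat_capacity
    delta_S = heat_capacity * math.log(T_B / T_A)
    return T_B, delta_S

flywheels = 2 * 0.5 * 0.05 * 600.0**2   # two flywheels, (1/2) I omega^2 each, in joules
spring = 0.5 * 2.0e4 * 0.1**2           # (1/2) k x^2, in joules
battery = 5.0e3                         # stored chemical energy, in joules
E_nonlocrian = flywheels + spring + battery

T_B, delta_S = dissipate(T_A=280.0, heat_capacity=2000.0, nonlocrian_energy=E_nonlocrian)
print(f"non-Locrian energy in box A: {E_nonlocrian:.0f} J")
print(f"box B temperature: {T_B:.2f} K, entropy increase: {delta_S:.1f} J/K")
```

The energy bookkeeping is trivial by construction: nothing crosses the boundary of the box. What changes is the temperature and the entropy.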

The difference between box A and box B also has “something” to do with entropy. Be careful not to think that entropy is proportional to temperature or anything like that; in fact entropy is quantitatively related to a certain derivative of the energy with respect to temperature, as we can see in equation 18.

Let’s be clear: It’s true that in the low-temperature box we have energy in a low-entropy form, and in the high-temperature box we have energy in a higher-entropy form, but this is not (by itself) a defining property of entropy or temperature. Indeed, in spin systems it is easy to have a situation where as the energy goes up, the entropy goes down; see reference 19 for a discussion of this.

We can understand box A and box B in terms of macrostates and microstates as follows: Let ω be the speed of the flywheel, L be the extension of the spring, and Q be the charge on the battery. As always, let T be the temperature. Then the macrostate can be described in terms of these variables. Knowing the macrostate doesn’t suffice to tell us the system is in a particular microstate; rather, there is some range, some set of microstates consistent with a given macrostate. We can calculate the number of microstates consistent with the (TB, ωB, LB, QB) macrostate, and compare that with the number of microstates consistent with the (TA, ωA, LA, QA) macrostate. These numbers tell us the entropy, which can be related back to the temperature via equation 18.

In this less-than-general case it is tempting to speak of the “energy” being spread out over a large number of microstates, but remember this is not the defining property of entropy, for reasons discussed in section 2.5.3 and section 8.10. The defining property is that the probability gets spread out over a large number of microstates.

Similarly, in this case it is tempting to speak of box A as being more “ordered” than box B. That’s true and even somewhat relevant … but it ought not be overemphasized, and it must not be thought of as a characteristic property – let alone a defining property – of the low-entropy macrostate. Entropy is not synonymous with disorder, for reasons discussed in section 2.5.4.

8.4  Remarks

8.4.1  Predictable Energy is Freely Convertible; Random Energy is Not

The difference between random energy and predictable energy has many consequences. The most important consequence is that the predictable energy can be freely converted to and from other forms, such as gravitational potential energy, chemical energy, electrical energy, et cetera. In many cases, these conversions can be carried out with very high efficiency. In some other cases, though, the laws of thermodynamics place severe restrictions on the efficiency with which conversions can be carried out, depending on to what extent the energy distribution deviates from the Boltzmann distribution.

8.4.2  Thermodynamic Laws without Temperature

Ironically, the first law of thermodynamics (equation 3) does not depend on temperature. Energy is well-defined and is conserved, no matter what. It doesn’t matter whether the system is hot or cold or whether it even has a temperature at all.

Even more ironically, the second law of thermodynamics (equation 4) doesn’t depend on temperature, either. Entropy is well-defined and is paraconserved no matter what. It doesn’t matter whether the system is hot or cold or whether it even has a temperature at all.

(This state of affairs is ironic because thermodynamics is commonly defined to be the science of heat and temperature, as you might have expected from the name: thermodynamics. Yet in our modernized and rationalized thermodynamics, the two most central, fundamental ideas – energy and entropy – are defined without reference to heat or temperature.)

Of course there are many important situations that do involve temperature. Most of the common, every-day applications of thermodynamics involve temperature – but you should not think of temperature as the essence of thermodynamics. Rather, it is a secondary concept which is defined (if and when it even exists) in terms of energy and entropy.

8.4.3  Kinetic and Potential Microscopic Energy

You may have heard the term “kinetic theory”. In particular, the thermodynamics of ideal gases is commonly called the kinetic theory of gases. However, you should be careful, because “kinetic theory” is restricted to ideal gases (indeed to a subset of ideal gases) ... while thermodynamics applies to innumerable other things. Don’t fall into the trap of thinking that all thermal energy is necessarily kinetic energy. In almost all systems, including solids, liquids, non-ideal gases, and even some ideal gases, the thermal energy is a mixture of kinetic and potential energy. It is safer and in all ways better to say thermodynamics or statistical mechanics instead of “kinetic theory”.

In typical systems, potential energy and kinetic energy play parallel roles:

In fact, for an ordinary crystal such as quartz or sodium chloride, almost exactly half of the heat capacity is due to potential energy, and half to kinetic energy. It’s easy to see why that must be: The heat capacity is well described in terms of thermal phonons in the crystal. Each phonon mode is a harmonic12 oscillator. In each cycle of any harmonic oscillator, the energy changes from kinetic to potential and back again. The kinetic energy goes like sin²(phase) and the potential energy goes like cos²(phase), so on average each of those is half of the total energy.
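The half-and-half split is just the statement that sin² and cos² each average to 1/2 over a full cycle. A quick numerical check, with the sample count and the function name chosen arbitrarily:

```python
import math

def cycle_averages(n=10_000):
    """Average sin^2(phase) and cos^2(phase) over one full cycle."""
    ke = 0.0
    pe = 0.0
    for k in range(n):
        phase = 2 * math.pi * (k + 0.5) / n
        ke += math.sin(phase) ** 2   # kinetic share of the oscillator energy
        pe += math.cos(phase) ** 2   # potential share of the oscillator energy
    return ke / n, pe / n

print(cycle_averages())
```

At every instant the two shares add up to the whole, and over a cycle each share averages to one half.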

Not all kinetic energy is thermal.
Not all thermal energy is kinetic.
     

A table-top sample of ideal gas is a special case, where all the energy is kinetic energy. This is very atypical of thermodynamics in general. Table-top ideal gases are very commonly used as an illustration of thermodynamic ideas, which becomes a problem when the example is overused so heavily as to create the misimpression that thermodynamics deals only with kinetic energy.

You could argue that in many familiar systems, the temperature is closely related to random kinetic energy ... but temperature is not the same thing as heat or thermal energy. Furthermore, there are other systems, such as spin systems, where the temperature is not related to the random kinetic energy.

All in all, it seems quite unwise to define heat or even temperature in terms of kinetic energy.

This discussion continues in section 8.4.4.

8.4.4  Ideal Gas : Potential Energy as well as Kinetic Energy

We have seen that for an ideal gas, there is a one-to-one correspondence between the temperature and the kinetic energy of the gas particles. However, that does not mean that there is a one-to-one correspondence between kinetic energy and heat energy. (In this context, heat energy refers to whatever is measured by a heat capacity experiment.)

To illustrate this point, let’s consider a sample of pure monatomic nonrelativistic nondegenerate ideal gas in a cylinder of horizontal radius r and vertical height h at temperature T. The pressure measured at the bottom of the cylinder is P. Each particle in the gas has mass m. We wish to know the heat capacity per particle at constant volume, i.e. CV/N.

At this point you may already have in mind an answer, a simple answer, a well-known answer, independent of r, h, m, P, T, and N. But wait, there’s more to the story: The point of this exercise is that h is not small. In particular, mgh is not small compared to kT, where g is the acceleration of gravity. For simplicity, you are encouraged to start by considering the limit where h goes to infinity, in which case the exact value of h no longer matters. Gravity holds virtually all the gas near the bottom of the cylinder, whenever h ≫ kT/mg.

You will discover that a distinctly nontrivial contribution to the heat capacity comes from the potential energy of the ideal gas. When you heat it up, the gas column expands, lifting its center of mass, doing work against gravity. (Of course, as always, there will be a contribution from the kinetic energy.)

For particles the size of atoms, the length-scale kT/mg is on the order of several kilometers, so the cylinder we are considering is much too big to fit on a table top. I often use the restrictive term “table-top” as a shorthand way of asserting that mgh is small compared to kT.
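For readers who want numbers, here is one way to sketch the calculation. It rests on assumptions that go beyond what is stated above: the column is isothermal, gravity is uniform, and the density falls off as exp(−mgz/kT), so the average potential energy per particle is kT − mgh/(exp(mgh/kT) − 1). Differentiating with respect to T at fixed h gives the potential-energy contribution used in the code. Argon is chosen as a representative monatomic gas, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
G = 9.80665               # standard acceleration of gravity, m/s^2
AMU = 1.66053906660e-27   # atomic mass unit, kg

def scale_height(mass_kg, T):
    """The length scale kT/mg."""
    return K_B * T / (mass_kg * G)

def cv_per_particle_over_k(h, mass_kg, T):
    """C_V / (N k) for an isothermal monatomic ideal-gas column of height h."""
    x = h / scale_height(mass_kg, T)   # x = m g h / kT
    half = 0.5 * x
    if half == 0.0:
        potential = 0.0                # no height, no potential-energy term
    elif half > 700.0:
        potential = 1.0                # sinh would overflow; the correction is negligible
    else:
        potential = 1.0 - (half / math.sinh(half)) ** 2
    return 1.5 + potential             # kinetic part plus potential part

m_argon = 39.948 * AMU
print("kT/mg for argon at 300 K:", scale_height(m_argon, 300.0), "m")
for h in (1.0, 1.0e3, 1.0e4, 1.0e5, 1.0e7):
    print(f"h = {h:>10.0f} m   C_V/(N k) = {cv_per_particle_over_k(h, m_argon, 300.0):.6f}")
```

Under these assumptions the table-top value is 3/2, and the tall-column limit is 5/2. The extra unit of k per particle comes entirely from potential energy.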

So, this reinforces the points made in section 8.4.3. We conclude that in general, heat energy is not just kinetic energy.

Beware that this tall cylinder is not a good model for the earth’s atmosphere. For one thing, the atmosphere is not isothermal. For another thing, if you are going to take the limit as h goes to infinity, you can’t use a cylinder; you need something more like a cone, spreading out as it goes up, to account for the spherical geometry.

8.4.5  Relative Motion versus “Thermal” Energy

Over the years, lots of people have noticed that you can always split the kinetic energy of a complex object into the KE of the center-of-mass motion plus the KE of the relative motion (i.e. the motion of the components relative to the center of mass).
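The split mentioned here is an exact identity, which can be verified numerically. The sketch below uses randomly generated point masses; the particle count, the mass and velocity distributions, and the name `kinetic_split` are arbitrary choices for illustration.

```python
import random

def kinetic_split(masses, velocities):
    """Return (total KE, center-of-mass KE, relative KE) for point masses in 3D."""
    M = sum(masses)
    v_cm = [sum(m * v[a] for m, v in zip(masses, velocities)) / M for a in range(3)]
    total = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    ke_cm = 0.5 * M * sum(c * c for c in v_cm)
    relative = sum(
        0.5 * m * sum((v[a] - v_cm[a]) ** 2 for a in range(3))
        for m, v in zip(masses, velocities)
    )
    return total, ke_cm, relative

rng = random.Random(0)
masses = [rng.uniform(1.0, 2.0) for _ in range(100)]
velocities = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
total, ke_cm, relative = kinetic_split(masses, velocities)
print(total, ke_cm + relative)
```

Note what the identity does not say. It is pure mechanics, and it holds just as well for a spinning flywheel, where essentially all of the “relative” kinetic energy is organized rotation and none of it is thermal.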

Also a lot of people have tried (with mixed success) to split the energy of an object into a “thermal” piece and a “non-thermal” piece.

It is an all-too-common mistake to think that the overall/relative split is the same as the nonthermal/thermal split. Beware: they’re not the same. Definitely not. See section 14.1 for more on this.

First of all, the microscopic energy is not restricted to being kinetic energy, as discussed in section 8.4.3. So trying to understand the thermal/non-thermal split in terms of kinetic energy is guaranteed to fail. Using the work/KE theorem (reference 14) to connect work (via KE) to the thermal/nonthermal split is guaranteed to fail for the same reason.

Secondly, a standard counterexample uses flywheels, as discussed in section 17.4. You can impart macroscopic, non-Locrian KE to the flywheels without imparting center-of-mass KE or any kind of potential energy … and without imparting any kind of Locrian energy (either kinetic or potential).

The whole idea of “thermal energy” is problematic, and in many cases impossible to define, as discussed in section 18. If you find yourself worrying about the exact definition of “thermal energy”, it means you’re trying to solve the wrong problem. Find a way to reformulate the problem in terms of energy and entropy.

Center-of-mass motion is an example but not the only example of low-entropy energy. The motion of the flywheels is one perfectly good example of low-entropy energy. Several other examples are listed in section 10.3.

A macroscopic object has something like 10²³ modes. The center-of-mass motion is just one of these modes. The motion of counter-rotating flywheels is another mode. These are slightly special, but not very special. A mode to which we can apply a conservation law, such as conservation of momentum, or conservation of angular momentum, might require a little bit of special treatment, but usually not much … and there aren’t very many such modes.

Sometimes on account of conservation laws, and sometimes for other reasons as discussed in section 10.11 it may be possible for a few modes of the system to be strongly coupled to the outside (and weakly coupled to the rest of the system), while the remaining 10²³ modes are more strongly coupled to each other than they are to the outside. It is these issues of coupling-strength that determine which modes are in equilibrium and which (if any) are far from equilibrium. This is consistent with our definition of equilibrium (section 9.1).

Thermodynamics treats all the equilibrated modes on an equal footing. One manifestation of this can be seen in equation 52, where each state contributes one term to the sum … and addition is commutative.

There will never be an axiom that says such-and-such mode is always in equilibrium or always not; the answer is sensitive to how you engineer the couplings.

8.5  Entropy Without Constant Re-Shuffling

It is a common mistake to visualize entropy as a highly dynamic process, whereby the system is constantly flipping from one microstate to another. This may be a consequence of the fallacy discussed in section 8.4.5 (mistaking the thermal/nonthermal distinction for the kinetic/potential distinction) … or it may have other roots; I’m not sure.

In any case, the fact is that re-shuffling is not an essential part of the entropy picture.

An understanding of this point proceeds directly from fundamental notions of probability and statistics.

By way of illustration, consider one hand in a game of draw poker.

  A)   The deck is shuffled and hands are dealt in the usual way.
  B)   In preparation for the first round of betting, you look at your hand and discover that you’ve got the infamous “inside straight”. Other players raise the stakes, and when it’s your turn to bet you drop out, saying to yourself “if this had been an outside straight the probability would have been twice as favorable”.
  C)   The other players, curiously enough, stand pat, and after the hand is over you get a chance to flip through the deck and see the card you would have drawn.

Let’s more closely examine step (B). At this point you have to make a decision based on probability. The deck, as it sits there, is not constantly re-arranging itself, yet you are somehow able to think about the probability that the card you draw will complete your inside straight.

The deck, as it sits there during step (B), is not flipping from one microstate to another. It is in some microstate, and staying in that microstate. At this stage you don’t know what microstate that happens to be. Later, at step (C), long after the hand is over, you might get a chance to find out the exact microstate, but right now at step (B) you are forced to make a decision based only on the probability.
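To make the point concrete, here is a minimal sketch of the probability calculation you do at step (B). It assumes a standard 52-card deck, a five-card hand, and a one-card draw; those details are my assumptions for illustration, not part of the story above. Notice that nothing in the calculation refers to the deck re-arranging itself: the only input is the count of cards you have not seen.

```python
from fractions import Fraction

# Assumption: standard 52-card deck, you hold 5 cards, so 47 are unseen.
UNSEEN = 52 - 5

# Inside straight: exactly one rank completes it; that rank comes in 4 suits.
p_inside = Fraction(4, UNSEEN)

# Outside (open-ended) straight: either of two ranks completes it.
p_outside = Fraction(8, UNSEEN)

print(f"inside straight:  {p_inside} = {float(p_inside):.4f}")
print(f"outside straight: {p_outside} = {float(p_outside):.4f}")
print(f"ratio: {p_outside / p_inside}")
```

The ratio comes out to exactly 2, which is the “twice as favorable” remark in step (B).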

The same ideas apply to the entropy of a roomful of air, or any other thermodynamic system. At any given instant, the air is in some microstate with 100% probability; you just don’t know what microstate that happens to be. If you did know, the entropy would be zero … but you don’t know. You don’t need to take any sort of time-average to realize that you don’t know the microstate.

The bottom line is that the essence of entropy is the same as the essence of probability in general: The essential idea is that you don’t know the microstate. Constant re-arrangement is not essential.

This leaves us with the question of whether re-arrangement is ever important. Of course the deck needs to be shuffled at step (A). Not constantly re-shuffled, just shuffled the once.

Again, the same ideas apply to the entropy of a roomful of air. If you did somehow obtain knowledge of the microstate, you might be interested in the timescale over which the system re-arranges itself, making your erstwhile knowledge obsolete and thereby returning the system to a high-entropy condition.

The crucial point remains: the process whereby knowledge is lost and entropy is created is not part of the definition of entropy, and need not be considered when you evaluate the entropy. If you walk into a room for the first time, the re-arrangement rate is not your concern. You don’t know the microstate of this room, and that’s all there is to the story. You don’t care how quickly (if at all) one unknown microstate turns into another.

If you don’t like the poker analogy, we can use a cryptology analogy instead. Yes, physics, poker, and cryptology are all the same when it comes to this. Statistics is statistics.

If I’ve intercepted just one cryptotext from the opposition and I’m trying to crack it, on some level what matters is whether or not I know their session key. It doesn’t matter whether that session key is 10 microseconds old, or 10 minutes old, or 10 days old. If I don’t have any information about it, I don’t have any information about it, and that’s all that need be said.

On the other hand, if I’ve intercepted a stream of messages and extracted partial information from them (via a partial break of the cryptosystem), the opposition would be well advised to “re-shuffle the deck” i.e. choose new session keys on a timescale fast compared to my ability to extract information about them.

Applying these ideas to a roomful of air: Typical sorts of measurements give us only a pathetically small amount of partial information about the microstate. So it really doesn’t matter whether the air re-arranges itself super-frequently or super-infrequently. We don’t have any significant amount of information about the microstate, and that’s all there is to the story.

Reference 20 presents a simulation that demonstrates the points discussed in this subsection.

8.6  Units of Entropy

Before we go any farther, convince yourself that

\log_{10}(x) = \frac{\ln(x)}{\ln(10)} \approx 0.434294 \ln(x)              (53)

and in general, multiplying a logarithm by some positive number corresponds to changing the base of the logarithm.
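If you would rather convince yourself numerically, here is a short sketch. The helper name log_base is mine, introduced only for this illustration.

```python
import math

def log_base(x, base):
    """Logarithm of x in an arbitrary base, built from natural logs."""
    return math.log(x) / math.log(base)

# The conversion factor from natural logs to base-10 logs.
factor = 1.0 / math.log(10)
print(f"1/ln(10) = {factor:.6f}")

x = 1000.0
print(log_base(x, 10), math.log10(x), factor * math.log(x))
```

All three printed values agree to within floating-point roundoff.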

In the formula for entropy, equation 5, the base of the logarithm has intentionally been left unspecified. You get to choose a convenient base. This is the same thing as choosing what units will be used for measuring the entropy.

Some people prefer to express the units by choosing the base of the logarithm, while others prefer to stick with natural logarithms and express the units more directly, using an expression of the form:

S[P] := k \sum_i P_i \ln(1/P_i)              (54)

where we have introduced an explicit prefactor k and fixed the logarithm to be base-e. Whereas equation 5 was arbitrary as to the base of the logarithm, equation 54 is arbitrary as to the choice of k. Either way, the meaning is the same.

Unit of S   Prefactor (k)                     Name                    Concise form
J/K         kB = 1.3806504(24)×10^−23         Boltzmann’s constant    S[P] := kB ∑i Pi ln(1/Pi)
trit        1/ln(3)                                                   S[P] := ∑i Pi log3(1/Pi)
nat         1                                                         S[P] := ∑i Pi ln(1/Pi)
bit         1/ln(2)                                                   S[P] := ∑i Pi log2(1/Pi)

Table 1: Units of Entropy and Associated Prefactors

It must be emphasized that all these expressions for S are mathematically equivalent. In each case, the choice of prefactor and choice of base for the logarithm balances the choice of units, so that the meaning remains unchanged.

Note that when measuring entropy in bits, base-2 logarithms must be used in equation 54. Similarly, the conventional meaning of Boltzmann’s constant assumes that base-e logarithms will be used. Switching from base-2 to base-e introduces a factor of ln(2), which is dimensionless and easy to overlook.
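Here is a small sketch of the equivalence. The four-state distribution is a made-up example, chosen so the answer in bits is easy to check by hand; the value of kB is the one quoted in Table 1.

```python
import math

K_B = 1.3806504e-23  # J/K, as quoted in Table 1

def entropy_nats(probs):
    """Sum of P ln(1/P) over states; terms with P == 0 contribute nothing."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

# Hypothetical example distribution over four states.
P = [0.5, 0.25, 0.125, 0.125]

s_nat = entropy_nats(P)          # prefactor k = 1
s_bit = s_nat / math.log(2)      # prefactor k = 1/ln(2)
s_trit = s_nat / math.log(3)     # prefactor k = 1/ln(3)
s_JK = K_B * s_nat               # prefactor k = kB

# Same thing, choosing the base of the logarithm instead of the prefactor.
s_bit_direct = sum(p * math.log2(1.0 / p) for p in P if p > 0)

print(f"{s_nat:.6f} nat = {s_bit:.6f} bit = {s_trit:.6f} trit = {s_JK:.6e} J/K")
print(f"direct base-2 calculation: {s_bit_direct:.6f} bit")
```

The two routes to the answer in bits agree, which is the whole point: choosing the prefactor and choosing the base are the same choice.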

When dealing with smallish amounts of entropy, units of bits are conventional and often convenient. When dealing with large amounts of entropy, units of J/K are conventional and often convenient. These are related as follows:

1 J/K = 1.04×10^23 bits
1 bit = 9.57×10^−24 J/K
             (55)

A convenient unit for molar entropy is Joules per Kelvin per mole:

1 J/K/mol = 0.17 bit/particle
1 bit/particle = 5.76 J/K/mol = R ln(2)
             (56)

Values in this range (on the order of one bit per particle) are very commonly encountered.
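The conversion factors in equation 55 and equation 56 can be checked with a few lines of arithmetic. In this sketch the value of kB is the one quoted in Table 1, while the value of Avogadro’s number is my assumption (it does not appear in the text above).

```python
import math

K_B = 1.3806504e-23   # J/K, as quoted in Table 1
N_A = 6.02214e23      # 1/mol, Avogadro's number (assumed value)
R = K_B * N_A         # J/K/mol, the gas constant

# Equation 55: one bit corresponds to a prefactor of kB ln(2).
bit_in_JK = K_B * math.log(2)
JK_in_bits = 1.0 / bit_in_JK

# Equation 56: the same conversion on a per-mole basis.
bit_per_particle_in_JKmol = R * math.log(2)
JKmol_in_bit_per_particle = 1.0 / bit_per_particle_in_JKmol

print(f"1 bit = {bit_in_JK:.3e} J/K")
print(f"1 J/K = {JK_in_bits:.3e} bits")
print(f"1 bit/particle = {bit_per_particle_in_JKmol:.2f} J/K/mol")
print(f"1 J/K/mol = {JKmol_in_bit_per_particle:.2f} bit/particle")
```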

If you are wondering whether equation 56 is OK from a dimensional-analysis point of view, fear not. Temperature units are closely related to energy units. Specifically, energy is extensive and measured in J, while temperature is intensive and measured in K. Therefore combinations such as (J/K/mol) are dimensionless units. A glance at the dimensions of the ideal gas law should suffice to remind you of this if you ever forget.

See reference 21 for more about dimensionless units.

8.7  Probability versus Multiplicity

Let us spend a few paragraphs discussing a strict notion of multiplicity, and then move on to a more nuanced notion. (We also discuss the relationship between an equiprobable distribution and a microcanonical ensemble.)

8.7.1  Exactly Equiprobable

Suppose we have a system where a certain set of states13 (called the “accessible” states) are equiprobable, i.e. Pi = 1/W for some constant W. Furthermore, all remaining states are “inaccessible” which means they all have Pi = 0. The constant W is called the multiplicity.

Note: Terminology: The W denoting multiplicity in this section is unrelated to the W denoting work elsewhere in this document. Both usages of W are common in the literature. It is almost always obvious from context which meaning is intended, so there isn’t a serious problem. Some of the literature uses Ω to denote multiplicity.

The probability per state is necessarily the reciprocal of the number of accessible states, since (in accordance with the usual definition of “probability”) we want our probabilities to be normalized: ∑ Pi = 1.

In this less-than-general case, the entropy (as given by equation 5) reduces to

S = log W              (57)

As usual, you can choose the base of the logarithm according to what units you prefer for measuring entropy: bits, nats, trits, J/K, or whatever. Equivalently, you can fix the base of the logarithm and express the units by means of a factor of k out front, as discussed in section 8.6:

S = k ln W              (58)
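For completeness, here is the one-line reduction, written out under the assumption that the general formula has the form shown in Table 1, namely a sum of P_i log(1/P_i) over states. The inaccessible states have P_i = 0 and contribute nothing, so the sum runs over the W accessible states only:

```latex
S \;=\; \sum_{i=1}^{W} P_i \log\frac{1}{P_i}
  \;=\; \sum_{i=1}^{W} \frac{1}{W} \log W
  \;=\; W \cdot \frac{1}{W} \log W
  \;=\; \log W
```

Every step relies on P_i = 1/W for every accessible state, which is why the result is a special case and not a general law.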

There are various ways a system could wind up with equiprobable states:

Consider two blocks of copper that are identical except that one of them has more energy than the other. They are thermally isolated from each other and from everything else. The higher-energy block will have a greater number of accessible states, i.e. a higher multiplicity. In this way you can, if you wish, define a notion of multiplicity as a function of energy level.

Terminology: By definition, a level is a group of microstates. An energy level is a group of microstates all with the same energy (or nearly the same energy, relative to other energy-scales in the problem). By connotation, usually when people speak of a level they mean energy level.

8.7.2  Approximately Equiprobable

We now introduce a notion of “approximate” equiprobability and “approximate” multiplicity by reference to the example in the following table:

Level    # microstates   Probability      Probability   Entropy
         in level        of microstate    of level      (in bits)
1                  2     0.01             0.020         0.133
2                979     0.001            0.979         9.757
3          1,000,000     1E-09            0.001         0.030
Total:     1,000,981                      1.000         9.919

The system in this example has 1,000,981 microstates, which we have grouped into three levels. There are a million states in level 3, each of which occurs with probability one in a billion, so the probability of observing some state from this level is one in a thousand. There are only two microstates in level 1, each of which is observed with a vastly larger probability, namely one in a hundred. Level 2 is baby-bear just right. It has a moderate number of states, each with a moderate probability ... with the remarkable property that on a level-by-level basis, this level dominates the probability distribution. The probability of observing some microstate from level 2 is nearly 100%.

The bottom line is that the entropy of this distribution is 9.919 bits, which is 99.53% of the entropy you would have if all the probability were tied up in 1000 microstates with probability 0.001 each.

Beware of some overloaded terminology:

In the table, the column we have labelled “# microstates in level” is conventionally called the multiplicity of the level.   If we apply the S = log(W) formula in reverse, we find that our example distribution has a multiplicity of W = 2^S = 2^9.919 = 968; this is the effective multiplicity of the distribution as a whole.

So we see that the effective multiplicity of the distribution is dominated by the multiplicity of level 2. The other levels contribute very little to the entropy.
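The numbers in this example are easy to reproduce. The following sketch takes the level sizes and per-microstate probabilities from the table and recomputes everything else; the variable names are mine.

```python
import math

# (number of microstates in the level, probability of each microstate)
levels = [(2, 0.01), (979, 0.001), (1_000_000, 1e-9)]

total_prob = 0.0
total_bits = 0.0
for number, (n, p) in enumerate(levels, start=1):
    level_prob = n * p                       # probability of the level
    level_bits = n * p * math.log2(1.0 / p)  # the level's share of the entropy
    total_prob += level_prob
    total_bits += level_bits
    print(f"level {number}: P(level) = {level_prob:.3f}, entropy = {level_bits:.3f} bits")

# Apply S = log(W) in reverse to get the effective multiplicity.
effective_W = 2.0 ** total_bits

# Compare with 1000 equiprobable microstates at probability 0.001 each.
fraction = total_bits / math.log2(1000)

print(f"total: P = {total_prob:.3f}, entropy = {total_bits:.3f} bits")
print(f"effective multiplicity = {effective_W:.0f}")
print(f"fraction of log2(1000) = {fraction:.2%}")
```

Level 2 accounts for almost all of the probability and almost all of the entropy, even though its microstates are neither the most numerous nor the most probable.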

You have to be careful how you describe the microstates in level 2. Level 2 is the most probable level (on a level-by-level basis), but its microstates are not the most probable microstates (on a microstate-by-microstate basis).

In the strict notion of multiplicity, all the states that were not part of the dominant level were declared “inaccessible”, but alas this terminology becomes hopelessly tangled when we progress to the nuanced notion of multiplicity. In the table, the states in level 3 are high-energy states, and it might be OK to say that they are energetically inaccessible, or “almost” inaccessible. It might be superficially tempting to label level 1 as also inaccessible, but that would not be correct. The states in level 1 are perfectly accessible; their only problem is that they are few in number.

I don’t know how to handle “accessibility” except to avoid the term, and to speak instead of “dominant” levels and “negligible” levels.

A system that is thermally isolated so that all microstates have the same energy is called microcanonical.   In contrast, an object in contact with a constant-temperature heat bath is called canonical (not microcanonical). Furthermore, a system that can exchange particles with a reservoir, as described by a chemical potential, is called grand canonical (not microcanonical or canonical).

The strict definition of multiplicity applies directly to microcanonical ensembles and other strictly equiprobable distributions. Equation 57 applies exactly to such systems.   Equation 57 does not apply exactly to canonical or grand-canonical systems, and may not apply even approximately. The correct thermal probability distribution is shown in figure 13.

There exist intermediate cases, which are common and often important. In a canonical or grand-canonical thermal system, we can get into a situation where the notion of multiplicity is a good approximation – not exact, but good enough. This can happen if the energy distribution is so strongly peaked near the most-probable energy that the entropy is very nearly what you would get in the strictly-equiprobable case. This can be roughly understood in terms of the behavior of Gaussians. If we combine N small Gaussians to make one big Gaussian, the absolute width scales like √N and the relative width scales like √N/N. The latter is small when N is large.
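To get a feel for how small “small” is, here is a sketch of the scaling. It assumes N independent contributions, each with the same standard deviation and each with a mean of order one, so that the total scales like N; those are simplifying assumptions for illustration.

```python
import math

def widths(n, sigma=1.0):
    """Absolute and relative width of the sum of n independent contributions.

    Assumes each contribution has standard deviation sigma and a mean of
    order one, so the total scales like n.
    """
    absolute = sigma * math.sqrt(n)
    relative = absolute / n
    return absolute, relative

for n in (1e2, 1e6, 1e23):
    absolute, relative = widths(n)
    print(f"N = {n:.0e}: absolute width ~ {absolute:.2e}, relative width ~ {relative:.2e}")
```

For a macroscopic N, the relative width is so tiny that treating the distribution as if it were strictly equiprobable is usually a fine approximation for first-order quantities.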

One should not attach too much importance to the tradeoff in the table above, namely the tradeoff between multiplicity (increasing as we move down the table) and per-microstate probability (decreasing as we move down the table). It is tempting to assume all thermal systems must involve a similar tradeoff, but they do not. In particular, at negative temperatures (as discussed in reference 19), it is quite possible for the lower-energy microstates to outnumber the higher-energy microstates, so that both multiplicity and per-microstate probability are decreasing as we move down the table toward higher energy.

You may reasonably ask whether such a system might be unstable, i.e. whether the entire system might spontaneously move toward the high-energy high-probability high-multiplicity state. The answer is that such a move cannot happen because it would not conserve energy. In a thermally-isolated system, if half of the system moved to higher energy, you would have to “borrow” that energy from the other half, which would then move to lower energy, lower multiplicity, and lower probability per microstate. The overall probability of the system depends on the probability of the two halves taken jointly, and this joint probability would be unfavorable. If you want to get technical about it, stability does not depend on the increase or decrease of multiplicity as a function of energy, but rather on the convexity which measures what happens if you borrow energy from one subsystem and lend it to another.
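The borrowing argument can be written out in a couple of lines. This is a sketch under simplifying assumptions of my own: the two halves are identical, each has a smooth entropy function S(E), and a small energy δ is moved from one half to the other.

```latex
\Delta S_{\text{total}}
  \;=\; S(E+\delta) + S(E-\delta) - 2\,S(E)
  \;\approx\; \frac{\partial^2 S}{\partial E^2}\,\delta^2
```

The first-derivative terms cancel, so the sign of ∂S/∂E (i.e. whether multiplicity increases or decreases with energy) drops out. What remains is the second derivative: if it is negative, any such transfer lowers the joint entropy, and the transfer is unfavorable.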

8.8  Discussion

Some people are inordinately fond of equation 57 or equivalently equation 58. They are tempted to take it as the definition of entropy, and sometimes offer outrageously unscientific arguments in its support. But the fact remains that equation 5 is an incomparably more general, more reliable expression, while equation 58 is a special case, a less-than-general corollary, a sometimes-acceptable approximation.

Specific reasons why you should not consider equation 57 to be axiomatic include:

  1. Theory says that you cannot exactly reconcile a Boltzmann probability distribution with an equiprobable distribution.
  2. In practice, equation 57 is usually not an acceptable approximation for small systems. Thermodynamics applies to small systems, but equation 57 usually does not.
  3. For large systems, even though equation 57 commonly leads to valid approximations for first-order quantities (e.g. energy, entropy, temperature, and pressure) ... it does not lead to valid results for second-order quantities such as fluctuations (energy fluctuations, temperature fluctuations, et cetera).

For a thermal distribution, the probability of a microstate is given by equation 52. So, even within the restricted realm of thermal distributions, equation 58 does not cover all the bases; it applies if and only if all the accessible microstates have the same energy. It is possible to arrange for this to be true, by constraining all accessible microstates to have the same energy. That is, it is possible to create a microcanonical system by isolating or insulating and sealing the system so that no energy can enter or leave. This can be done, but it places drastic restrictions on the sort of systems we can analyze.

8.9  Misconceptions about Spreading

Non-experts sometimes get the idea that whenever something is more dispersed – more spread out in position – its entropy must be higher. This is a mistake. Yes, there are scenarios where a gas expands and does gain entropy (such as isothermal expansion, or diffusive mixing as discussed in section 10.6) … but there are also scenarios where a gas expands but does not gain entropy (reversible thermally-isolated expansion).

As another example, consider two counter-rotating flywheels, as mentioned in section 8.3. In particular, imagine that these flywheels are annular in shape, so that to a good approximation, all the mass is at the rim, and every bit of mass is moving at the same speed. Also imagine that they are stacked on the same axis. Now let the two wheels rub together, so that friction causes them to slow down and heat up. Entropy has been produced, but the energy has not become more spread-out in space. In fact, just the opposite has occurred. As the entropy increased, the energy dispersal decreased, i.e. the energy became less evenly distributed in space. Under the initial conditions, the nonthermal rotational mechanical energy was evenly distributed, and the thermal energy was evenly distributed on a macroscopic scale, plus or minus small local thermal fluctuations. Afterward, all the energy is in thermal form. It is still evenly distributed on a macroscopic scale, plus or minus thermal fluctuations, but the thermal fluctuations are now larger because the temperature is higher. Let’s be clear: If we ignore thermal fluctuations, the increase in entropy was accompanied by no change in the spatial distribution of energy, while if we include the fluctuations, the increase in entropy was accompanied by less even dispersal of the energy.

Here’s another reason why any attempt to define entropy in terms of “energy dispersal” or the like is Dead on Arrival: Entropy is defined in terms of probability, and applies to systems where the energy is zero, irrelevant, and/or undefinable.

As previously observed, states are states; they are not necessarily energy states.

Here’s a third reason: to the extent that it is possible to measure the degree of energy dispersal, it can be measured on a state-by-state basis. However, entropy is a property of the ensemble, not a property of any particular microstate. Therefore whatever “energy dispersal” is measuring, it’s not entropy. (A similar microstate versus macrostate argument applies to the “disorder” model of entropy, as discussed in section 2.5.4.)

8.10  Spreading in Probability Space

The spreading that we should pay attention to is the spreading of probabilities in probability-space.

Here’s a good example. This one can be analyzed in great detail. Figure 16 shows two blocks under three transparent cups. In the first scenario, the blocks are “concentrated” in the 00 state. In the probability histogram below the cups, there is unit probability (shown in magenta) in the 00 slot, and zero probability in the other slots, so p log(1/p) is zero everywhere. That means the entropy is zero.

In the next scenario, the blocks are spread out in position, but since we know exactly what state they are in, all the probability is in the 02 slot. That means p log(1/p) is zero everywhere, and the entropy is still zero.

In the third scenario, the system is in some randomly chosen state, namely the 21 state, which is as disordered and as random as any state can be, yet since we know what state it is, p log(1/p) is zero everywhere, and the entropy is zero.

The fourth scenario is derived from the third scenario, except that the cups are behind a screen. We can’t see the blocks right now, but we remember where they are. The entropy remains zero.

Finally, in the fifth scenario, we simply don’t know what state the blocks are in. The blocks are behind a screen, and have been shuffled since the last time we looked. We have some vague notion that on average, there is 2/3rds of a block under each cup, but that is only an average over many states. The probability histogram shows there is a 1-out-of-9 chance for the system to be in any of the 9 possible states, so ∑ p log(1/p) = log(9) .

Figure 16: Spreading vs. Randomness vs. Uncertainty

One point to be made here is that entropy is not defined in terms of particles that are spread out (“dispersed”) in position-space, but rather in terms of probability that is spread out in state-space. This is quite an important distinction. For more details on this, including an interactive simulation, see reference 20.
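The five scenarios are simple enough to compute directly. In this sketch, the two-digit state labels follow the convention used above (one digit per block, saying which cup it is under); the scenario descriptions are my paraphrases.

```python
import math
from itertools import product

# Two blocks, three cups: nine states, labelled "00", "01", ..., "22".
STATES = ["".join(s) for s in product("012", repeat=2)]

def entropy_bits(dist):
    """Sum of p log2(1/p) over the states; p == 0 terms contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def known(state):
    """All the probability in one slot: we know exactly what state it is."""
    return {s: (1.0 if s == state else 0.0) for s in STATES}

uniform = {s: 1.0 / len(STATES) for s in STATES}

scenarios = [
    ("1: concentrated, state known (00)", known("00")),
    ("2: spread out, state known (02)", known("02")),
    ("3: random-looking, state known (21)", known("21")),
    ("4: behind a screen, state remembered (21)", known("21")),
    ("5: behind a screen, shuffled since we looked", uniform),
]

for name, dist in scenarios:
    print(f"{name}: {entropy_bits(dist):.3f} bits")
```

Only the last scenario has nonzero entropy, namely log2(9), a little over 3 bits. How spread out the blocks are in position never enters the calculation.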

Entropy involves probability spread out in state-space,
• not necessarily anything spread out in position-space,
• not necessarily particles spread out in any space,
• not necessarily energy spread out in any space.
     

To use NMR language, entropy is produced on a timescale τ2, while energy-changes take place on a timescale τ1. There are systems where τ1 is huuugely longer than τ2. See also section 10.5.4 and figure 4. (If this paragraph doesn’t mean anything to you, don’t worry about it.)

As a way of reinforcing this point, consider a system of spins such as discussed in section 10.10. The spins change orientation, but they don’t change position at all. Their positions are locked to the crystal lattice. The notion of entropy doesn’t require any notion of position; as long as we have states, and a probability of occupying each state, then we have a well-defined notion of entropy. High entropy means the probability is spread out over many states in state-space.

State-space can sometimes be rather hard to visualize. As mentioned in section 2.3, a well-shuffled card deck has nearly 226 bits of entropy, corresponding to nearly 2^226 equally probable orderings … which is a stupendous number. If you consider the states of gas molecules in a liter of air, the number of states is even larger – far, far beyond what most people can visualize. If you try to histogram these states, you have an unmanageable number of slots (in contrast to the 9 slots in figure 16) with usually a very small probability in each slot.
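The card-deck figure is easy to check, assuming a standard 52-card deck in which every ordering is equally probable:

```python
import math

# Number of distinct orderings of a 52-card deck.
orderings = math.factorial(52)

# Equiprobable orderings, so the entropy is the log of the multiplicity.
bits = math.log2(orderings)

print(f"orderings = {orderings:.3e}")
print(f"entropy   = {bits:.2f} bits")
```

The entropy comes out a bit under 226 bits. A histogram of this distribution would need more than 10^67 slots, which gives some idea of why state-space is hard to visualize.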

Another point to be made in connection with figure 16 concerns the relationship between observing and stirring (aka mixing, aka shuffling). Here’s the rule:

                not looking               looking
not stirring    entropy constant          entropy decreasing (aa)
stirring        entropy increasing (aa)   contest

where (aa) means almost always; we have to say (aa) because entropy can’t be increased by stirring if it is already at its maximum possible value, and it can’t be decreased by looking if it is already zero. Note that if you’re not looking, lack of stirring does not cause an increase in entropy. By the same token, if you’re not stirring, lack of looking does not cause a decrease in entropy. If you are stirring and looking simultaneously, there is a contest between the two processes; the entropy might decrease or might increase, depending on which process is more effective.
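Here is a toy model of the rule, using the nine states of the cups-and-blocks example. It is only a sketch: “stirring” is modeled as mixing the probability distribution with the uniform distribution, and “looking” is modeled as a perfect observation that reveals the state. Both models, and the function names, are my assumptions for illustration.

```python
import math

N_STATES = 9

def entropy_bits(p):
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def stir(p, strength):
    """Partial shuffle (toy model): blend the distribution with the uniform one."""
    u = 1.0 / len(p)
    return [(1.0 - strength) * pi + strength * u for pi in p]

def look(p, observed):
    """Perfect observation (toy model): afterward we know the state."""
    return [1.0 if i == observed else 0.0 for i in range(len(p))]

p = look([1.0 / N_STATES] * N_STATES, observed=7)
h0 = entropy_bits(p)            # state known: zero entropy

p = stir(p, 0.3)
h1 = entropy_bits(p)            # stirring, not looking: entropy goes up

h1_later = entropy_bits(p)      # not stirring, not looking: no change

p = stir(p, 1.0)
h2 = entropy_bits(p)            # thoroughly shuffled: maximum entropy

p = stir(p, 0.5)
h3 = entropy_bits(p)            # already at maximum: stirring can't raise it

p = look(p, observed=4)
h4 = entropy_bits(p)            # looking, not stirring: entropy goes down

print(h0, h1, h1_later, h2, h3, h4)
```

The case where stirring has no effect (h3 equal to h2) is exactly why the table says “almost always”.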

The simulation in reference 20 serves to underline these points.

9  Additional Fundamental Notions

9.1  Equilibrium

Feynman defined equilibrium to be “when all the fast things have happened but the slow things have not” (reference 22). That statement pokes fun at the arbitrariness of the split between “fast” and “slow” – but at the same time it is 100% correct and insightful. There is an element of arbitrariness in our notion of equilibrium. Over an ultra-long timescale, a diamond will turn into graphite. And in the ultra-short timescale, you can have non-equilibrium distributions of phonons rattling around inside a diamond crystal, such that it doesn’t make sense to talk about the temperature thereof. But usually we are interested in the intermediate timescale, long after the phonons have become thermalized but long before the diamond turns into graphite. During this intermediate timescale it makes sense to talk about the temperature of the diamond.

One should neither assume that equilibrium exists, nor that it doesn’t.

Diamond has a vast, clear-cut separation between the slow timescale and the fast timescale. Most intro-level textbook thermodynamics deals only with systems that have a clean separation.   In the real world, one often encounters cases where the separation of timescales is not so clean, and an element of arbitrariness is involved. The laws of thermodynamics can still be applied, but more effort and more care is required. See section 10.3 for a discussion.

The word equilibrium is quite ancient. The word has the same stem as the name of the constellation “Libra” — the scale. The type of scale in question is the two-pan balance shown in figure 17, which has been in use for at least 7000 years.

Figure 17: Equilibrium — Forces in Balance

The notion of equilibrium originated in mechanics, long before thermodynamics came along. The compound word “equilibrium” translates literally as “equal balance” and means just that: everything in balance. In the context of mechanics, it means there are no unbalanced forces, as illustrated in the top half of figure 18.

Our definition of equilibrium applies to infinitely large systems, to microscopic systems, and to everything in between. This is important because in finite systems, there will be fluctuations even at equilibrium. See section 9.6 for a discussion of fluctuations and other finite-size effects.

9.2  Non-Equilibrium; Timescales

The idea of equilibrium is one of the foundation-stones of thermodynamics ... but any worthwhile theory of thermodynamics must also be able to deal with non-equilibrium situations.

Consider for example the familiar Carnot heat engine: It depends on having two heat reservoirs at two different temperatures. There is a well-known and easily-proved theorem that says at equilibrium, everything must be at the same temperature. Heat bath #1 may be internally in equilibrium with itself at temperature T1, and heat bath #2 may be internally in equilibrium with itself at temperature T2, but the two baths cannot be in equilibrium with each other.

So we must modify Feynman’s idea. We need to identify a timescale of interest such that all the fast things have happened and the slow things have not. This timescale must be long enough so that certain things we want to be in equilibrium have come into equilibrium, yet short enough so that things we want to be in non-equilibrium remain in non-equilibrium.

Here’s another everyday example where non-equilibrium is important: sound. As you know, in a sound wave there will be some points where the air is compressed and other points, a half-wavelength away, where the air is expanded. For ordinary audible sound, this expansion occurs isentropically not isothermally. If you analyze the physics of sound using the isothermal compressibility instead of the isentropic compressibility, you will get the wrong answer. Among other things, your prediction for the speed of sound will be incorrect. The first guy to analyze the physics of sound, Isaac Newton, made this mistake.
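To see how big the discrepancy is, here is a sketch that treats air as an ideal gas, for which the isothermal bulk modulus is P and the isentropic bulk modulus is γP. The numerical values for pressure, density, and γ are typical room-condition figures that I have assumed for illustration.

```python
import math

# Assumed values for dry air at roughly 20 °C and one atmosphere.
P = 101_325.0   # Pa
RHO = 1.204     # kg/m^3
GAMMA = 1.4     # ratio of specific heats for a diatomic ideal gas

# Speed of sound = sqrt(bulk modulus / density).
c_isothermal = math.sqrt(P / RHO)          # the mistaken isothermal analysis
c_isentropic = math.sqrt(GAMMA * P / RHO)  # the isentropic analysis

print(f"isothermal prediction: {c_isothermal:.0f} m/s")
print(f"isentropic prediction: {c_isentropic:.0f} m/s")
print(f"ratio: {c_isentropic / c_isothermal:.3f}")
```

The isothermal prediction is low by a factor of √γ, which is about 18% for air. That is far too large to blame on experimental error.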

Again we invoke the theorem that says in equilibrium, the whole system must be at the same temperature. Since the sound wave is not isothermal, and cannot even be satisfactorily approximated as isothermal, we conclude that any worthwhile theory of thermodynamics must include non-equilibrium thermodynamics.

For a propagating wave, the time (i.e. period) scales like the distance (i.e. wavelength). In contrast, for diffusion and thermal conductivity, the time scales like distance squared. That means that for ultrasound, at high frequencies, a major contribution to the attenuation of the sound wave is thermal conduction between the high-temperature regions (wave crests) and the low-temperature regions (wave troughs). If you go even farther down this road, toward high thermal conductivity and short wavelength, you can get into a regime where sound is well approximated as isothermal. Both the isothermal limit and the isentropic limit have relatively low attenuation; the intermediate case has relatively high attenuation.

9.3  Efficiency; Timescales

Questions of efficiency are central to thermodynamics, and have been since Day One (reference 23).

For example in figure 4, if we try to extract energy from the battery very quickly, using a very low impedance motor, there will be a huge amount of power dissipated inside the battery, due to the voltage drop across the internal series resistor R1. On the other hand, if we try to extract energy from the battery very slowly, most of the energy will be dissipated inside the battery via the shunt resistor R2 before we have a chance to extract it. So efficiency requires a timescale that is not too fast and not too slow.
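Here is a rough sketch of the tradeoff. Since figure 4 is not reproduced here, the circuit is my assumption: an energy-storage element with a shunt resistor R2 directly across it, feeding the load through a series resistor R1. The load is modeled as a plain resistance, and the component values are made up. In this model every dissipated power is proportional to the square of the same cell voltage, so the efficiency does not depend on the voltage.

```python
import math

def efficiency(r_load, r1, r2):
    """Fraction of the energy leaving the cell that reaches the load.

    Assumed circuit: shunt resistor r2 across the cell, series resistor r1
    between the cell and the load. Powers are per volt squared.
    """
    i_per_volt = 1.0 / (r1 + r_load)     # load current per volt of cell voltage
    p_load = r_load * i_per_volt ** 2    # delivered to the load
    p_series = r1 * i_per_volt ** 2      # dissipated in r1
    p_shunt = 1.0 / r2                   # dissipated in r2
    return p_load / (p_load + p_series + p_shunt)

R1, R2 = 0.1, 1000.0   # ohms; hypothetical values

# Setting the derivative of the efficiency to zero gives this optimum load.
best = math.sqrt(R1 * (R1 + R2))

for r_load in (best / 100, best / 10, best, best * 10, best * 100):
    print(f"R_load = {r_load:9.3f} ohm: efficiency = {efficiency(r_load, R1, R2):.3f}")
```

A very small load resistance (fast extraction) wastes energy in R1, a very large load resistance (slow extraction) wastes it in R2, and the best efficiency sits in between.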

Another example is the familiar internal combustion engine. It has a certain tach at which it works most efficiently. The engine is always nonideal because some of the heat of combustion leaks across the boundary into the cylinder block. Any energy that goes into heating up the cylinder block is unavailable for doing P dV work. This nonideality becomes more serious when the engine is turning over slowly. On the other edge of the same sword, when the engine is turning over quickly, there are all sorts of losses due to friction in the gas, friction between the mechanical parts, et cetera. These losses increase faster than linearly as the tach goes up.

If you have gas in a cylinder with a piston and compress it slowly, you can (probably) treat the process as reversible. On the other hand, if you move the piston suddenly, it will stir the gas. This can be understood macroscopically in terms of sound radiated into the gas, followed by frictional dissipation of the sound wave (section 10.5.1). It can also be understood microscopically in terms of time-dependent perturbation theory; a sudden movement of the piston causes microstate transitions that would otherwise not have occurred (section 10.5.2).

Timescales matter.
     

9.4  Spontaneity and Irreversibility

Another of the great achievements of thermodynamics is the ability to understand what processes occur spontaneously (and therefore irreversibly) and what processes are reversible (and therefore non-spontaneous).

Therefore any theory of thermodynamics that considers only reversible processes – or which formulates its basic laws and concepts in terms of reversible processes – is severely crippled.

If you want to derive the rules that govern spontaneity and irreversibility, as is done in reference 24, you need to consider perturbations away from equilibrium. If you assume that the perturbed states are in equilibrium, the derivation is guaranteed to give the wrong answer.

In any reversible process, entropy is a conserved quantity. In the real world, entropy is not a conserved quantity.

If you start with a reversible-only equilibrium-only (ROEO) theory of thermodynamics and try to extend it to cover real-world situations, it causes serious conceptual difficulties. For example, consider an irreversible process that creates entropy from scratch in the interior of a thermally-isolated region. Then imagine trying to model it using ROEO ideas. You could try to replace the created entropy by entropy that flowed in from some fake entropy reservoir, but that would just muddy up the already-muddy definition of heat. Does the entropy from the fake entropy reservoir count as “heat”? The question is unanswerable. The “yes” answer is unphysical since it violates the requirement that the system is thermally isolated. The “no” answer violates the basic conservation laws.

Additional examples of irreversible processes that deserve our attention are discussed in sections 9.3, 10.5.1, 10.5.3, 10.5.4, and 10.6.

Any theory of reversible-only equilibrium-only thermodynamics is dead on arrival.

ROEO = DoA
     

9.5  Stability

If a system is in equilibrium, we can ask whether it has positive stability, neutral stability, or negative stability. The three possibilities are illustrated in the bottom half of figure 18.

Figure 18: Equilibrium and Stability

We define stability as follows: Starting from equilibrium conditions, we slightly perturb the system and observe what happens next.

The term “unstable” certainly applies to systems with negative stability. Alas there is no certainty as to whether it can also be applied to systems with neutral stability. Sometimes you hear people say that a neutrally stable system is neither stable nor unstable, which I find confusing. I recommend sticking to the precise terms: positive, neutral, or negative stability.

Tangential remark: In chemistry class you may have heard of “Le Chatelier’s principle”. Ever since Le Chatelier’s day there have been two versions of the “principle” ... and neither of them can be taken seriously:

This “principle” needs to be thrown out and replaced by two well-defined concepts, namely equilibrium and stability. (This is analogous to the way that “heat” needs to be thrown out and replaced by two well-defined concepts, namely energy and entropy, as discussed in section 16.1.)

9.6  Finite Size Effects

As we shall discuss, finite size effects can be categorized as follows (although there is considerable overlap among the categories):

We shall see that:

  1. In microscopic systems, finite-size effects dominate.
  2. In moderately-large systems, finite-size effects lead to smallish correction terms.
  3. In infinite systems, finite-size effects are negligible.

Let’s start with an example: The usual elementary analysis of sound in air considers only adiabatic changes in pressure and density. Such an analysis leads to a wave equation that is non-dissipative. In reality, we know that there is some dissipation. Physically the dissipation is related to transport of energy from place to place by thermal conduction. The amount of transport depends on wavelength, and is negligible in the hydrodynamic limit, which in this case means the limit of very long wavelengths.

We can come to the same conclusion by looking at things another way. The usual elementary analysis treats the air in the continuum limit, imagining that the gas consists of an infinite number density of particles each having infinitesimal size and infinitesimal mean free path. That’s tantamount to having no particles at all; the air is approximated as a continuous fluid. In such a fluid, sound would travel without dissipation.

So we have a macroscopic view of the situation (in terms of nonzero conductivity) and a microscopic view of the situation (in terms of quantized atoms with a nonzero mean free path). These two views of the situation are equivalent, because thermal conductivity is proportional to mean free path (for any given heat capacity and given temperature).

In any case, we can quantify the situation by considering the ratio of the wavelength Λ to the mean free path λ. Indeed we can think in terms of a Taylor series in powers of λ/Λ.
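To get a feel for the size of the expansion parameter, here is a rough sketch. The numerical values for air (speed of sound and mean free path) are approximate round numbers assumed for illustration, and the function name is hypothetical.

```python
SOUND_SPEED = 343.0      # m/s, air near room temperature (approximate)
MEAN_FREE_PATH = 6.8e-8  # m, air at atmospheric pressure (approximate)

def expansion_parameter(frequency_hz):
    """Return lambda/Lambda: mean free path divided by acoustic wavelength."""
    wavelength = SOUND_SPEED / frequency_hz
    return MEAN_FREE_PATH / wavelength

for f in (1e3, 1e6, 1e9):
    print(f"{f:.0e} Hz -> lambda/Lambda = {expansion_parameter(f):.2e}")
```

For audible frequencies the ratio is tiny, so the leading term of the series should be an excellent approximation. At gigahertz frequencies the ratio is no longer small.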

Let us now discuss fluctuations.

As an example, in a system at equilibrium, the pressure as measured by a very large piston will be essentially constant. Meanwhile, the pressure as measured by a very small piston will fluctuate. These pressure fluctuations are closely related to the celebrated Brownian motion.

Fluctuations are the rule, whenever you look closely enough and/or look at a small enough subsystem. There will be temperature fluctuations, density fluctuations, entropy fluctuations, et cetera.

We remark in passing that the dissipation of sound waves is intimately connected to the fluctuations in pressure. They are connected by the fluctuation / dissipation theorem, which is a corollary of the second law of thermodynamics.

There is a magnificent discussion of fluctuations in Feynman volume I chapter 46 (“Ratchet and Pawl”). See reference 5.

As another example, consider shot noise. That is: in a small-sized electronic circuit, there will be fluctuations in the current, because the current is not carried by a continuous fluid but rather by electrons which have a quantized charge.

Let us now discuss boundary terms.

If you change the volume of a sample of compressible liquid, there is a well-known P dV contribution to the energy, where P is the pressure and V is the volume. There is also a τ dA contribution, where τ is the surface tension and A is the area.

A simple scaling argument proves that for very large systems, the P dV term dominates, whereas for very small systems the τ dA term dominates. For moderately large systems, we can start with the P dV term and then consider the τ dA term as a smallish correction term.
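The scaling argument can be sketched numerically. The sketch assumes a spherical sample, for which dA/dV = 2/r, so that the ratio of the two terms is 2τ/(P r). The values for τ (roughly water near room temperature) and P (one atmosphere) are assumptions chosen for illustration, and only magnitudes are compared.

```python
def surface_to_bulk_ratio(radius, surface_tension, pressure):
    """Magnitude of (tau dA) / (P dV) for a sphere whose radius changes slightly.

    For a sphere, A = 4*pi*r**2 and V = (4/3)*pi*r**3, so dA/dV = 2/r.
    """
    return 2.0 * surface_tension / (pressure * radius)

TAU = 0.072    # N/m, roughly water near room temperature (assumed)
P = 101325.0   # Pa, one standard atmosphere (assumed)

for radius in (1e-2, 1e-4, 1e-6, 1e-8):  # metres
    ratio = surface_to_bulk_ratio(radius, TAU, P)
    print(f"r = {radius:.0e} m   (tau dA)/(P dV) = {ratio:.3g}")

# Radius at which the two contributions have equal magnitude.
crossover_radius = 2.0 * TAU / P
```

Under these assumptions the crossover is at a radius on the order of a micron. Far above that, the surface term is a smallish correction; far below, it dominates.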

10  Experimental Basis

In science, questions are not decided by taking votes, or by seeing who argues the loudest or the longest. Scientific questions are decided by a careful combination of experiments and reasoning. So here are some epochal experiments that form the starting point for the reasoning presented here, and illustrate why certain other approaches are unsatisfactory.

10.1  Basic Notions of Temperature and Equilibrium

Make a bunch of thermometers. Calibrate them, to make sure they agree with one another. Use thermometers to measure each of the objects mentioned below.

10.2  Exponential Dependence on Energy

Here is a collection of observed phenomena that tend to support equation 52.

10.3  Metastable Systems with a Temperature

Consider an ordinary electrical battery. This is an example of a system where most of the modes are characterized by well-defined temperature, but there are also a few exceptional modes. Often such systems have an energy that is higher than you might have guessed based on the temperature and entropy, which makes them useful repositories of “available” energy.

Figure 19 shows two states of the battery, discharged (on the left) and charged (on the right). Rather than labeling the states by the subscript i as we have done in the past, we label them using a pair of subscripts i,j, where i takes on the values 0 and 1 meaning discharged and charged respectively, and j runs over the thermal phonon modes that we normally think of as embodying the heat capacity of an object.

Figure 19: Probability versus Energy for a Battery

Keep in mind that probabilities such as Pi,j are defined with respect to some ensemble. For the discharged battery at temperature T, all members of the ensemble are in contact with a heat bath at temperature T. That means the thermal phonon modes can exchange energy with the heat bath, and different members of the ensemble will have different amounts of energy, leading to the probabilistic distribution of energies shown on the left side of figure 19. The members of the ensemble are not able to exchange electrical charge with the heat bath (or with anything else), so that the eight microstates corresponding to the charged macrostate have zero probability.

Meanwhile, on the right side of the figure, the battery is in the charged state. The eight microstates corresponding to the discharged macrostate have zero probability, while the eight microstates corresponding to the charged macrostate have a probability distribution of the expected Boltzmann form.

Comparing the left side with the right side of figure 19, we see that the two batteries have the same temperature. That is, the slope of log(Pi,j) versus Ei,j – for the modes that are actually able to contribute to the heat capacity – is the same for the two batteries.

You may be wondering how we can reconcile the following four facts: (a) The two batteries have the same temperature T, (b) the accessible states of the two batteries have different energies, indeed every accessible state of the charged battery has a higher energy than any accessible state of the discharged battery, (c) corresponding accessible states of the two batteries have the same probabilities, and (d) both batteries obey the Boltzmann law, Pi,j proportional to exp(−Ei,j/kT). The answer is that there is a bit of a swindle regarding the meaning of “proportional”. The discharged battery has one proportionality constant, while the charged battery has another. For details on this, see section 22.1.
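Here is a minimal sketch of how the four facts fit together. The eight energy levels, the charging energy, and the value of kT are made-up numbers in arbitrary units. Each battery is normalized over its own accessible states only, which is where the two different proportionality constants come from.

```python
import math

K_T = 1.0                                     # kT, arbitrary energy units
THERMAL_LEVELS = [0.5 * j for j in range(8)]  # eight illustrative thermal levels
CHARGE_ENERGY = 20.0                          # illustrative energy of being charged

def boltzmann(energies, kt):
    """Boltzmann probabilities normalized over the accessible states only.

    Returns the probabilities and the normalization sum Z.
    """
    weights = [math.exp(-e / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

discharged_energies = THERMAL_LEVELS
charged_energies = [CHARGE_ENERGY + e for e in THERMAL_LEVELS]

p_discharged, z_discharged = boltzmann(discharged_energies, K_T)
p_charged, z_charged = boltzmann(charged_energies, K_T)

# Slope of ln(P) versus E between the two lowest levels of each battery.
slope_discharged = (math.log(p_discharged[1]) - math.log(p_discharged[0])) / 0.5
slope_charged = (math.log(p_charged[1]) - math.log(p_charged[0])) / 0.5
```

The two probability lists agree state by state, and both slopes equal −1/kT, even though every charged-state energy is higher. The normalization sums differ by a factor of exp(−CHARGE_ENERGY/kT).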

Here is a list of systems that display this sort of separation between thermal modes and nonthermal modes:

(Section 10.4 takes another look at metastable systems.)

There are good reasons why we might want to apply thermodynamics to systems such as these. For instance, the Clausius-Clapeyron equation can tell us interesting things about a voltaic cell.

Also, just analyzing such a system as a Gedankenexperiment helps us understand a thing or two about what we ought to mean by “equilibrium”, “temperature”, “heat”, and “work”.

In equilibrium, the “accessible” states are supposed to be occupied in accordance with the Boltzmann distribution law (equation 52).

An example is depicted in figure 19, which is a scatter plot of Pi,j versus Ei,j.

As mentioned in section 9.1, Feynman defined equilibrium to be “when all the fast things have happened but the slow things have not” (reference 22). The examples listed at the beginning of this section all share the property of having two timescales and therefore two notions of equilibrium. If you “charge up” such a system you create a Boltzmann distribution with exceptions. There are not just a few exceptions like we saw in figure 14, but huge classes of exceptions, i.e. huge classes of microstates that are (in the short run, at least) inaccessible. If you revisit the system on longer and longer timescales, eventually the energy may become dissipated into the previously-inaccessible states. For example, the battery may self-discharge via some parasitic internal conduction path.

The idea of temperature is valid even on the shorter timescale. In practice, I can measure the temperature of a battery or a flywheel without waiting for it to run down. I can measure the temperature of a bottle of H2O2 without waiting for it to decompose.

These are all examples of a Boltzmann exponential distribution with exceptions, as discussed in section 8.2.

This proves that in some cases of interest, we cannot write the system energy E as a function of the macroscopic thermodynamic variables V and S. Remember, V determines the spacing between energy levels (which is the same in both figures) and S tells us something about the occupation of those levels, but alas S does not tell us everything we need to know. An elementary example of this can be seen by comparing figure 13 with figure 14, where we have the same V, the same S, and different E. So we must not assume E = E(V,S). A more spectacular example of this can be seen by comparing the two halves of figure 19.

Occasionally somebody tries to argue that the laws of thermodynamics do not apply to figure 14 or figure 19, on the grounds that thermodynamics requires strict adherence to the Boltzmann exponential law. This is a bogus argument for several reasons. First of all, strict adherence to the Boltzmann exponential law would imply that everything in sight was at the same temperature. That means we can’t have a heat engine, which depends on having two heat reservoirs at different temperatures. A theory of pseudo-thermodynamics that cannot handle exceptions to the Boltzmann exponential law is useless.

So we must allow some exceptions to the Boltzmann exponential law … maybe not every imaginable exception, but some exceptions. A good criterion for deciding what sort of exceptions to allow is to ask whether it is operationally possible to measure the temperature. For example, in the case of a storage battery, it is operationally straightforward to design a thermometer that is electrically insulated from the exceptional mode, but thermally well connected to the thermal modes.

Perhaps the most important point is that equation 3 and equation 4 apply directly, without modification, to the situations listed at the beginning of this section. So from this point of view, these situations are not “exceptional” at all.

The examples listed at the beginning of this section raise some other basic questions. Suppose I stir a large tub of water. Have I done work on it (w) or have I heated it (q)? If the question is answerable at all, the answer must depend on timescales and other details. A big vortex can be considered a single mode with a huge amount of energy, i.e. a huge exception to the Boltzmann distribution. But if you wait long enough the vortex dies out and you’re left with just an equilibrium distribution. Whether you consider this sort of dissipation to be q and/or heat is yet another question. (See section 6.9 and especially section 16.1 for a discussion of what is meant by “heat”.)

In cases where the system’s internal “spin-down” time is short compared to all other timescales of interest, we get plain old dissipative systems. Additional examples include:

10.4  Metastable Systems without a Temperature

An interesting example is:

In this case, it’s not clear how to measure the temperature or even define the temperature of the spin system. Remember that in equilibrium, states are supposed to be occupied with probability proportional to the Boltzmann factor, Pi ∝ exp(−Êi/kT). However, the middle microstate is more highly occupied than the microstates on either side, as depicted in figure 20. This situation is clearly not describable by any exponential, since exponentials are monotone.

Figure 20: Three-State System without a Temperature

We cannot use the ideas discussed in section 10.3 to assign a temperature to such a system, because it has so few states that we can’t figure out which ones are the thermal “background” and which ones are the “exceptions”.

Such a system does have an entropy – even though it doesn’t have a temperature, even though it is metastable, and even though it is grossly out of equilibrium. It is absolutely crucial that the system have a well-defined entropy, for reasons suggested by figure 21. That is, suppose the system starts out in equilibrium, with a well-defined entropy S(1). It then passes through an intermediate state that is out of equilibrium, and ends up in an equilibrium state with entropy S(3). The law of paraconservation of entropy is meaningless unless we can somehow relate S(3) to S(1). The only reasonable way that can happen is if the intermediate state has a well-defined entropy. The intermediate state typically does not have a temperature, but it does have a well-defined entropy.

Figure 21: Non-Equilibrium: Well-Defined Entropy
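For a three-state system like the one in figure 20, a short sketch makes the contrast concrete. The energies and probabilities below are made-up illustrative numbers, and the entropy is evaluated with the familiar expression S/k = −Σ p ln p. The check for a temperature simply asks whether ln(p) versus E has a single slope.

```python
import math

energies = [0.0, 1.0, 2.0]          # illustrative, equally spaced levels
probabilities = [0.25, 0.50, 0.25]  # middle state most occupied (illustrative)

def entropy(probs):
    """Entropy in units of Boltzmann's constant: S/k = -sum of p ln p."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def log_slopes(levels, probs):
    """Slope of ln(p) versus E between each pair of neighbouring states."""
    pairs = list(zip(levels, probs))
    return [(math.log(p2) - math.log(p1)) / (e2 - e1)
            for (e1, p1), (e2, p2) in zip(pairs, pairs[1:])]

slopes = log_slopes(energies, probabilities)

# A Boltzmann distribution would give the same slope, -1/kT, everywhere.
has_single_temperature = all(math.isclose(s, slopes[0]) for s in slopes)

print("S/k =", entropy(probabilities))
print("slopes of ln(p) vs E:", slopes)
print("single temperature?", has_single_temperature)
```

The slopes have opposite signs, so no single exponential fits, yet the entropy is a perfectly definite number.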

10.5  Dissipative Systems

10.5.1  Sudden Piston : Sound

Consider the apparatus shown in figure 22. You can consider it a two-sided piston.

Equivalently you can consider it a loudspeaker in an unusual full enclosure. (Loudspeakers are normally only half-enclosed.) It is roughly like two unported speaker enclosures face to face, completely enclosing the speaker driver that sits near the top center, shown in red. The interior of the apparatus is divided into two regions, 1 and 2, with time-averaged properties (E1, S1, T1, P1, V1) and (E2, S2, T2, P2, V2) et cetera. When the driver (aka piston) moves to the right, it increases volume V1 and decreases volume V2. The box as a whole is thermally isolated / insulated / whatever. That is to say, no entropy crosses the boundary. No energy crosses the boundary except for the electricity feeding the speaker.

Figure 22: Two-Sided Piston

You could build a simplified rectangular version of this apparatus for a few dollars. It is considerably easier to build and operate than Rumford’s cannon-boring apparatus (section 10.5.3).

We will be primarily interested in a burst of oscillatory motion. That is, the piston is initially at rest, then oscillates for a while, and then returns to rest at the original position.

When the piston moves, it does F·dx work against the gas. There are two contributions. Firstly, the piston does work against the gas in each compartment. If P1 = P2 this contribution vanishes to first order in dV. Secondly, the piston does work against the pressure in the sound field.

The work done against the average pressure averages to zero over the course of one cycle of the oscillatory motion ... but the work against the radiation field does not average to zero. The dV is oscillatory but the field pressure is oscillatory too, and the product is positive on average.
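The averaging claim can be checked with a small numerical sketch. It assumes, for illustration, that the sound pressure at the piston face is exactly in phase with the piston velocity, as it would be for a plane wave radiated into the gas. All amplitudes are made-up numbers.

```python
import math

def cycle_average(f, steps=10000):
    """Average of f(phase) over one full cycle, phase running from 0 to 2*pi."""
    return sum(f(2.0 * math.pi * (k + 0.5) / steps) for k in range(steps)) / steps

P_AVG = 1.0e5    # Pa, average pressure (illustrative)
P_SOUND = 10.0   # Pa, amplitude of the sound pressure at the piston (illustrative)
V_DOT = 1.0e-3   # m^3/s, amplitude of the oscillating dV/dt (illustrative)

# Constant pressure times oscillating dV/dt: averages to zero.
power_vs_average = cycle_average(lambda ph: P_AVG * V_DOT * math.cos(ph))

# Oscillating pressure times oscillating dV/dt, in phase (assumed): positive.
power_vs_sound = cycle_average(
    lambda ph: (P_SOUND * math.cos(ph)) * (V_DOT * math.cos(ph)))

print("average power against average pressure:", power_vs_average)
print("average power against sound field:     ", power_vs_sound)
```

The second average is half the product of the two amplitudes, even though the sound pressure is four orders of magnitude smaller than the average pressure in this example.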

The acoustic energy radiated into the gas is in the short term not in thermal equilibrium with the gas. In the longer term, the sound waves are damped i.e. dissipated by internal friction and also by thermal conductivity, at a rate that depends on the frequency and wavelength.

What we put in is F·dx (call it “work” if you wish) and what we get out in the long run is an increase in the energy and entropy of the gas (call it “heat” if you wish).

It must be emphasized that whenever there is appreciable energy in the sound field, it is not possible to write E1 as a function of V1 and S1 alone, or indeed to write E1 as a function of any two variables whatsoever. In general, the sound creates a pressure P(r) that varies from place to place as a function of the position-vector r. That’s why we call it a sound field; it’s a scalar field, not a simple scalar.

As a consequence, when there is appreciable energy in the sound field, it is seriously incorrect to expand dE = T dS − P dV. The correct expansion necessarily has additional terms on the RHS. Sometimes you can analyze the sound field in terms of its normal modes, and in some simple cases most of the sound energy resides in only a few of the modes, in which case you need only a few additional variables. In general, though, the pressure can vary from place to place in an arbitrarily complicated way, and you would need an arbitrarily large number of additional variables. This takes us temporarily outside the scope of ordinary thermodynamics, which requires us to describe the macrostate as a function of some reasonably small number of macroscopic variables. The total energy, total entropy, and total volume are still perfectly well defined, but they do not suffice to give a complete description of what is going on. After we stop driving the piston, the sound waves will eventually dissipate, whereupon we will once again be able to describe the system in terms of a few macroscopic variables.

If the piston moves slowly, very little sound will be radiated and the process will be essentially isentropic and reversible. On the other hand, if the piston moves quickly, there will be lots of sound, lots of dissipation, and lots of newly created entropy. This supports the point made in section 9.2: timescales matter.

At no time is any entropy transferred across the boundary of the region. The increase in entropy of the region is due to new entropy, created from scratch in the interior of the region.

If you want to ensure the gas exerts zero average force on the piston, you can cut a small hole in the baffle near point b. Then the only work the piston can do on the gas is work against the sound pressure field. There is no longer any important distinction between region 1 and region 2.

You can even remove the baffle entirely, resulting in the “racetrack” apparatus shown in figure 23.

Figure 23: Racetrack with Piston

The kinetic energy of the piston is hardly worth worrying about. When we say it takes more work to move the piston rapidly than slowly, the interesting part is the work done on the gas, not the work done to accelerate the piston. Consider a very low-mass piston if that helps. Besides, whatever KE goes into the piston is recovered at the end of each cycle. Furthermore, it is trivial to calculate the F·dx of the piston excluding whatever force is necessary to accelerate the piston. Let’s assume the experimenter is clever enough to apply this trivial correction, so that we know, moment by moment, how much F·dx “work” is being done on the gas. This is entirely conventional; the conventional pressures P1 and P2 are associated with the forces F1 and F2 on the faces of the piston facing the gas, not the force Fd that is driving the piston. To relate Fd to F1 and F2 you would need to consider the mass of the piston, but if you formulate the problem in terms of F1·dx and F2·dx, as you should, questions of piston mass and piston KE should hardly even arise.

10.5.2  Sudden Piston : State Transitions

Let’s forget about all the complexities of the sound field discussed in section 10.5.1. Instead let’s take the quantum mechanical approach. Let’s simplify the gas down to a single particle, the familiar particle in a box, and see what happens.

As usual, we assume the box is rigid and thermally isolated / insulated / whatever. No entropy flows across the boundary of the box. Also, no energy flows across the boundary except for the work done by the piston.

Since we are interested in entropy, it will not suffice to talk about “the” quantum state of the particle. The entropy of any particular quantum state (microstate) is zero. We can however represent the thermodynamic state (macrostate) using a density matrix ρ. For some background on density matrices in the context of thermodynamics, see section 25.

The entropy is given by equation 142, which is the gold-standard most-general definition of entropy; in the classical limit it reduces to the familiar workhorse expression equation 5.

For simplicity we consider the case where the initial state is a pure state, i.e. a single microstate. That means the initial entropy is zero, as you can easily verify. Hint: equation 142 is particularly easy to evaluate in a basis where ρ is diagonal.

Next we perturb our particle-in-a-box by moving one wall of the box inward. We temporarily assume this is done in such a way that the particle ends up in the “same” microstate. That is, the final state is identical to the original quantum state except for the shorter wavelength as required to fit into the smaller box. It is a straightforward yet useful exercise to show that this does P dV “work” on the particle. The KE of the new state is higher than the KE of the old state.

Now the fun begins. We retract the previous assumption about the final state; instead we calculate the final macrostate using perturbation theory. In accordance with Fermi’s golden rule we calculate the overlap integral between the original quantum state (original wavelength) and each of the possible final quantum states (slightly shorter wavelength).

Each member of the original set of basis wavefunctions is orthogonal to the other members. The same goes for the final set of basis wavefunctions. However, each final basis wavefunction is only approximately orthogonal to the various original basis wavefunctions. So the previous assumption that the particle would wind up in the corresponding state is provably not quite true; when we do the overlap integrals there is always some probability of transition to nearby states.

It is straightforward to show that if the perturbation is slow and gradual, the corresponding state gets the lion’s share of the probability. Conversely, if the perturbation is large and sudden, there will be lots of state transitions. The final state will not be a pure quantum state. It will be a mixture. The entropy will be nonzero, i.e. greater than the initial entropy.
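The overlap integrals are easy to evaluate numerically. The following is a sketch under stated assumptions, not the calculation described above: for simplicity the wall is moved outward rather than inward, so that the initial wavefunction fits entirely inside the final box, and the move is treated as a single instantaneous step. A tiny step stands in for one increment of a gradual motion. The function name and the box widths are hypothetical.

```python
import math

def overlap_probability(m, width_before, width_after, n=1, steps=20000):
    """|<final m | initial n>|^2 for a box suddenly widened.

    Both sets of states are the usual particle-in-a-box sine waves.
    The integral runs over the original box, where the initial state lives,
    and is evaluated with the midpoint rule.
    """
    dx = width_before / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        psi_initial = (math.sqrt(2.0 / width_before)
                       * math.sin(n * math.pi * x / width_before))
        psi_final = (math.sqrt(2.0 / width_after)
                     * math.sin(m * math.pi * x / width_after))
        total += psi_initial * psi_final * dx
    return total ** 2

# Probability of remaining in the corresponding (lowest) state.
gentle = overlap_probability(1, 1.0, 1.01)  # width changes by 1 percent
sudden = overlap_probability(1, 1.0, 2.0)   # width suddenly doubles

print("small step:  P(stay in corresponding state) =", gentle)
print("large step:  P(stay in corresponding state) =", sudden)
```

For the small step the corresponding state keeps nearly all of the probability. For the large sudden step, most of the probability goes to other states.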

To summarize:

       slow and gradual   ⇒   isentropic, non-dissipative
       sudden             ⇒   dissipative

So we are on solid grounds when we say that in a thermally isolated cylinder, a gradual movement of the piston is isentropic, while a sudden movement of the piston is dissipative. Saying that the system is adiabatic in the sense of thermally insulated does not suffice to make it adiabatic in the sense of isentropic.

Note that in the quantum mechanics literature the slow and gradual case is conventionally called the “adiabatic” approximation in contrast to the “sudden” approximation. These terms are quite firmly established ... even though this usage conflicts with the also well-established convention in other branches of physics where “adiabatic” means thermally insulated; see next message.

There is a nice introduction to the idea of “radiation resistance” in reference 5 chapter 32.

10.5.3  Rumford’s Experiment

Benjamin Thompson (Count Rumford) did some experiments that were published in 1798. Before that time, people had more-or-less assumed that “heat” by itself was conserved. Rumford totally demolished this notion, by demonstrating that unlimited amounts of “heat” could be produced by nonthermal mechanical means. Note that in this context, the terms “thermal energy”, “heat content”, and “caloric” are all more-or-less synonymous ... and I write each of them in scare quotes.

From the pedagogical point of view Rumford’s paper is not an optimal starting point; the examples in section 10.5.1 and section 10.5.2 are probably better. For one thing, a microscopic understanding of sound and state-transitions in a gas is easier than a microscopic understanding of metal-on-metal friction.

Once you have a decent understanding of the modern ideas, you would do well to read Rumford’s original paper, reference 25. The paper is of great historical importance. It is easy to read, informative, and entertaining. On the other hand, beware that it contains at least one major error, plus one trap for the unwary:

The main point of the paper is that “heat” is not conserved. This point remains true and important. The fact that the paper has a couple of bugs does not detract from this point.

You should reflect on how something can provide valuable (indeed epochal) information and still not be 100% correct.

All too often, the history of science is presented as monotonic “progress” building one pure “success” upon another, but this is not how things really work. In fact there is a lot of back-tracking out of dead ends. Real science and real life are like football, in the sense that any play that advances the ball 50 or 60 yards is a major accomplishment, even if you get knocked out of bounds before reaching the ultimate goal. Winning is important, but you don’t need to win the entire game, single handedly, the first time you get your hands on the ball.

Rumford guessed that all the heat capacity was associated with “motion” – because he couldn’t imagine anything else. It was a process-of-elimination argument, and he blew it. This is understandable, given what he had to work with.

A hundred years later, guys like Einstein and Debye were able to cobble up a theory of heat capacity based on the atomic model. We know from this model that the heat capacity of solids is half kinetic and half potential. Rumford didn’t stand much of a chance of figuring this out.

It is possible to analyze Rumford’s experiment without introducing the notion of “heat content”. It suffices to keep track of the energy and the entropy. The energy can be quantified by using the first law of thermodynamics, i.e. the conservation of energy. We designate the cannon plus the water bath as the “system” of interest. We know how much energy was pushed into the system, pushed across the boundary of the system, in the form of macroscopic mechanical work. We can quantify the entropy by means of equation 33, i.e. dS=(1/T)dE at constant pressure. Energy and entropy are functions of state, even in situations where “heat content” is not.

Heat is a concept rooted in cramped thermodynamics, and causes serious trouble if you try to extend it to uncramped thermodynamics. Rumford got away with it, in this particular context, because he stayed within the bounds of cramped thermodynamics. Specifically, he did everything at constant pressure. He used the heat capacity of water at constant pressure as his operational definition of heat content.

To say the same thing the other way, if he had strayed off the contour of constant P, perhaps by making little cycles in the PV plane, using the water as the working fluid in a heat engine, any notion of “heat content” would have gone out the window. There would have been an unholy mixture of CP and CV, and the “heat content” would have not been a function of state, and everybody would have been sucked down the rabbit-hole into crazy-nonsense land.

We note in passing that it would be impossible to reconcile Rumford’s notion of “heat” with the various other notions listed in section 16.1 and section 17.1. For example: work is being done in terms of energy flowing across the boundary, but no work is being done in terms of the work/KE theorem, since the cannon is not accelerating.

For more about the difficulties in applying the work/KE theorem to thermodynamic questions, see reference 14.

We can begin to understand the microscopics of sliding friction using many of the same ideas as in section 10.5.1. Let’s model friction in terms of asperities on each metal surface. Each of the asperities sticks and lets go, sticks and lets go. When it lets go it wiggles and radiates ultrasound into the bulk of the metal. This produces in the short term a nonequilibrium state due to the sound waves, but before long the sound field dissipates, depositing energy and creating entropy in the metal.

Again, if you think in terms only of the (average force) dot (average dx) you will never understand friction or dissipation. You need to model many little contributions of the form (short term force) dot (short term dx) and then add up all the contributions. This is where you see the work being done against the radiation field.

At ordinary temperatures (not too hot and not too cold) most of the heat capacity in a solid is associated with the phonons. Other phenomena associated with friction, including deformation and abrasion of the materials, are only very indirectly connected to heating. Simply breaking a bunch of bonds, as in cleaving a crystal, does not produce much in the way of entropy or heat. At some point, if you want to understand heat, you need to couple to the phonons.

10.5.4  Flywheels with Oil Bearing

Here is a modified version of Rumford’s experiment, more suitable for quantitative analysis. Note that reference 26 carries out a similar analysis and reaches many of the same conclusions. Also note that this can be considered a macroscopic mechanical analog of the NMR τ2 process, where there is a change in entropy with no change in energy. See also figure 4.

Suppose we have an oil bearing as shown in figure 24. It consists of an upper plate and a lower plate, with a thin layer of oil between them. Each plate is a narrow annulus of radius R. The lower plate is held stationary. The upper plate rotates under the influence of a force F, applied via a handle as shown. The upper plate is kept coaxial with the lower plate by a force of constraint, not shown. The two forces combine to create a pure torque, τ = F R. The applied torque τ is balanced in the long run by a frictional torque τ′; specifically

⟨ τ ⟩ =   ⟨ τ′ ⟩              (60)

where ⟨ … ⟩ denotes a time-average. As another way of saying the same thing, in the long run the upper plate settles down to a more-or-less steady velocity.

Figure 24: Oil Bearing

We arrange that the system as a whole is thermally insulated from the environment, to a sufficient approximation. This includes arranging that the handle is thermally insulating. In practice this isn’t difficult.

We also arrange that the plates are somewhat thermally insulating, so that heat in the oil doesn’t immediately leak into the plates.

Viscous dissipation in the oil causes the oil to heat up. To a good approximation this is the only form of dissipation we must consider.

In an infinitesimal period of time, the handle moves through a distance dx or equivalently through an angle dθ = dx/R. We consider the driving force F to be a controlled variable. We consider θ to be an observable dependent variable. The relative motion of the plates sets up a steady shearing motion within the oil. We assume the oil forms a sufficiently thin layer and has sufficiently high viscosity that the flow is laminar (i.e. non-turbulent) everywhere. We say the fluid has a very low Reynolds number (but if you don’t know what that means, don’t worry about it). The point is that the velocity of the oil follows the simple pattern shown by the red arrows in figure 25.

Figure 25: Shear: Velocity Field in the Oil

The local work done on the handle by the driving force is w = Fdx or equivalently w = τdθ. This tells us how much energy is flowing across the boundary of the system. From now on we stop talking about work, and instead talk about energy, confident that energy is conserved.

We can keep track of the energy-content of the system by integrating the energy inputs. Similarly, given the initial entropy and the heat capacity of the materials, we can predict the entropy at all times14 by integrating equation 26. Also given the initial temperature and heat capacity, we can predict the temperature at all times by integrating equation 25. We can then measure the temperature and compare it with the prediction.
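As a rough numerical sketch of this bookkeeping, we can integrate the energy input τdθ step by step. The constant torque, the steady angular velocity, the constant heat capacity C, and all the numbers below are assumptions chosen for illustration; they are not properties of the apparatus in figure 24. We also assume the relations dT = dE/C and dS = dE/T, which is our reading of what the integration described above amounts to.

```python
import math

def integrate_bearing(torque, omega, heat_capacity, T0, duration, steps=1000):
    """Track energy, temperature, and entropy of an insulated oil bearing.

    Assumes constant torque, constant angular velocity, and a constant
    heat capacity (all illustrative assumptions, not from the text).
    """
    dt = duration / steps
    energy_in = 0.0
    entropy_gain = 0.0
    T = T0
    for _ in range(steps):
        dE = torque * omega * dt             # tau * dtheta crossing the boundary
        dT = dE / heat_capacity              # energy accumulates in microscopic form
        entropy_gain += dE / (T + 0.5 * dT)  # dS = dE/T at the midpoint temperature
        T += dT
        energy_in += dE
    return energy_in, T, entropy_gain

# Hypothetical numbers: 0.5 N m, 10 rad/s, 50 J/K, starting at 300 K, for 10 minutes.
E, T_final, dS = integrate_bearing(0.5, 10.0, 50.0, 300.0, 600.0)
print(f"energy in = {E:.0f} J, T = {T_final:.1f} K, entropy gain = {dS:.3f} J/K")
```

For a constant heat capacity the entropy gain should agree with the closed form C ln(T_final/T0), which gives a quick check on the integration. Note that the entropy rises even though nothing thermal crosses the boundary.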

We can understand the situation in terms of equation 3. Energy τdθ comes in via the handle. This energy cannot be stored as potential energy within the system. This energy also cannot be stored as macroscopic or mesoscopic kinetic energy within the system, since at each point the velocity is essentially constant. By a process of elimination we conclude that this energy accumulates inside the system in microscopic form.

This gives us a reasonably complete description of the thermodynamics of the oil bearing.

This example is simple, but helps make a very important point. If you base your thermodynamics on wrong foundations, it will get the wrong answer when applied to dissipative systems such as fluids, brakes, grindstones, et cetera. Some people try to duck this problem by narrowing their definition of “thermodynamics” so severely that it has nothing to say (right or wrong) about dissipative systems. Making no predictions is a big improvement over making wrong predictions … but still it is a terrible price to pay. Real thermodynamics has tremendous power and generality. Real thermodynamics applies just fine to dissipative systems. See section 20 for more on this.

10.5.5  Misconceptions : Heat

There are several correct ways of analyzing the oil-bearing system, one of which was presented in section 10.5.4. In addition, there are innumerably many incorrect ways of analyzing things. We cannot list all possible misconceptions, let alone discuss them all. However, it seems worthwhile to point out some of the most prevalent pitfalls.

You may have been taught to think of heating as thermal energy transfer across a boundary. That’s definition #3 in section 16.1. That’s fine provided you don’t confuse it with definition #2 (TdS).

The oil bearing serves as a clear illustration of the difference between heat-flow and heat-TdS. This is an instance of boundary/interior inconsistency, as discussed in section 15.

In the heat-flow sense: No heat is flowing into the oil. The oil is hotter than its surroundings, so if there is any heat-flow at all, it flows outward from the oil.   In the heat-TdS sense: The TdS/dt is strongly positive. The entropy of the oil is steadily increasing.

Another point that can be made using this example is that the laws of thermodynamics apply just fine to dissipative systems. Viscous damping has a number of pedagogical advantages relative to (say) the sliding friction in Rumford’s cannon-boring experiment. It’s clear where the dissipation is occurring, and it’s clear that the dissipation does not prevent us from assigning a well-behaved temperature to each part of the apparatus. Viscous dissipation is more-or-less ideal in the sense that it does not depend on submicroscopic nonidealities such as the asperities that are commonly used to explain solid-on-solid sliding friction.

10.5.6  Misconceptions : Work

We now discuss some common misconceptions about work.

Work is susceptible to boundary/interior inconsistencies for some of the same reasons that heat is.

You may have been taught to think of work as an energy transfer across a boundary. That’s one of the definitions of work discussed in section 17.1. It’s often useful, and is harmless provided you don’t confuse it with the other definition, namely PdV.

Work-flow is the “work” that shows up in the principle of virtual work (reference 27), e.g. when we want to calculate the force on the handle of the oil bearing.   Work-PdV is the “work” that shows up in the work/KE theorem.

10.5.7  Remarks

This discussion has shed some light on how equation 16 can and cannot be interpreted.

In all cases, the equation should not be considered the first law of thermodynamics, because it is inelegant and in every way inferior to a simple, direct statement of local conservation of energy.

10.6  The Gibbs Gedankenexperiment

As shown in figure 26, suppose we have two moderate-sized containers connected by a valve. Initially the valve is closed. We fill one container with an ideal gas, and fill the other container with a different ideal gas, at the same temperature and pressure. When we open the valve, the gases will begin to mix. The temperature and pressure will remain unchanged, but there will be an irreversible increase in entropy. After mixing is complete, the molar entropy will have increased by R ln(2).

Figure 26: The Gibbs Gedankenexperiment

As Gibbs observed,15 the R ln(2) result is independent of the choice of gases, “… except that the gases which are mixed must be of different kinds. If we should bring into contact two masses of the same kind of gas, they would also mix, but there would be no increase of entropy”.
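The R ln(2) figure can be checked with a few lines of arithmetic. The sketch below uses the standard ideal-mixing expression, −R Σ x ln(x) per mole of mixture, where x is the mole fraction of each species; the function name is ours, introduced only for illustration. Two masses of the same gas are represented as a single species with mole fraction 1.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def molar_mixing_entropy(mole_fractions):
    """Ideal entropy of mixing per mole of mixture, in J/(mol K)."""
    return -R * sum(x * math.log(x) for x in mole_fractions if x > 0.0)

different_gases = molar_mixing_entropy([0.5, 0.5])  # two distinct ideal gases
same_gas = molar_mixing_entropy([1.0])              # one species: nothing to mix

print(f"different gases: {different_gases:.3f} J/(mol K)")
print(f"same gas:        {same_gas:.3f} J/(mol K)")
```

The result depends only on the mole fractions, not on which gases are used, in agreement with the remark quoted from Gibbs.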

There is no way to explain this in terms of 19th-century physics. The explanation depends on quantum mechanics. It has to do with the fact that one helium atom is identical (absolutely totally identical) with another helium atom.

Also consider the following contrast:

In figure 26, the pressure on both sides of the valve is the same. There is no net driving force. The process proceeds by diffusion, not by macroscopic flow.   This contrasts with the scenario where we have gas on one side of the partition, but vacuum on the other side. This is dramatically different, because in this scenario there is a perfectly good 17th-century dynamic (not thermodynamic) explanation for why the gas expands: there is a pressure difference, which drives a flow of fluid.

In the mixing scenario, entropy drives the process. There is no hope of extracting energy from the diffusive mixing process.   In the free-expansion scenario, energy drives the process. We could extract some of this energy by replacing the valve with a turbine.

The timescale for free expansion is roughly L/c, where L is the size of the apparatus, and c is the speed of sound. The timescale for diffusion is slower by a huge factor, namely by a factor of L/λ, where λ is the mean free path in the gas.
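To get a feeling for the size of this factor, here is a small sketch that evaluates both timescales. The apparatus size, sound speed, and mean free path below are round illustrative values that we assume for a tabletop experiment with a gas near room conditions; they do not come from the text.

```python
def mixing_timescales(size, sound_speed, mean_free_path):
    """Return (free-expansion time, diffusion time) in seconds.

    Uses the rough estimates t_expand ~ L/c and t_diffuse ~ (L/c) * (L/lambda).
    """
    t_expand = size / sound_speed
    t_diffuse = t_expand * (size / mean_free_path)
    return t_expand, t_diffuse

# Assumed values: L = 0.1 m, c = 340 m/s, lambda = 70 nm.
L, c, mfp = 0.1, 340.0, 70e-9
t_expand, t_diffuse = mixing_timescales(L, c, mfp)
print(f"free expansion: {t_expand:.2e} s")
print(f"diffusion:      {t_diffuse:.2e} s")
print(f"ratio:          {t_diffuse / t_expand:.2e}")
```

With these assumed numbers the ratio L/λ is of order a million, which is why the diffusive experiment is so much less exciting to watch.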

Pedagogical note: The experiment in figure 26 is not very exciting to watch. Here’s an alternative: Put a drop or two of food coloring in a beaker of still water. The color will spread throughout the container, but only rather slowly. This allows students to visualize a process driven by entropy, not energy.

Actually, it is likely that most of the color-spreading that you see is due to convection, not diffusion. To minimize convection, try putting the water in a tall, narrow glass cylinder, and putting it under a bell jar to protect it from drafts. Then the spreading will take a very long time indeed.

Beware: Diffusion experiments of this sort are tremendously valuable if explained properly … but they are horribly vulnerable to misinterpretation if not explained properly, for reasons discussed in section 8.10.

For a discussion of the microscopic theory behind the Gibbs mixing experiments, see section 23.2.

10.7  Spin Echo Experiment

It is possible to set up an experimental situation where there are a bunch of nuclei whose spins appear to be oriented completely at random, like a well-shuffled set of cards. However, if I let you in on the secret of how the system was prepared, you can, by using a certain sequence of Nuclear Magnetic Resonance (NMR) pulses, get all the spins to line up – evidently a very low-entropy configuration.

The trick is that there is a lot of information in the lattice surrounding the nuclei, something like 10^23 bits of information. I don’t need to communicate all this information to you explicitly; I just need to let you in on the secret of how to use this information to untangle the spins.

The ramifications and implications of this are discussed in section 11.7.

10.8  Melting

Take a pot of ice water. Add energy to it via friction, à la Rumford, as described in section 10.5.3. The added energy will cause the ice to melt. The temperature of the ice water will not increase, not until all the ice is gone.

This illustrates the fact that temperature is not the same as thermal energy. It focuses our attention on the entropy. A gram of liquid water has more entropy than a gram of ice. So at any given temperature, a gram of water has more energy than a gram of ice.
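A quick estimate shows how much entropy is involved. The sketch below assumes the usual handbook values for water (latent heat of fusion about 334 J/g, melting point 273.15 K, molar mass about 18.015 g/mol) and uses ΔS = ΔE/T, which applies here because the temperature stays constant while the ice melts.

```python
import math

LATENT_HEAT = 334.0    # J/g, approximate latent heat of fusion of ice (assumed value)
T_MELT = 273.15        # K
MOLAR_MASS = 18.015    # g/mol
R = 8.314462618        # J/(mol K)

def melting_entropy(mass_g):
    """Entropy added when mass_g grams of ice melt at constant temperature, in J/K."""
    return mass_g * LATENT_HEAT / T_MELT

dS_per_gram = melting_entropy(1.0)
# Express the same thing in bits per molecule: divide the molar entropy by R ln(2).
bits_per_molecule = dS_per_gram * MOLAR_MASS / (R * math.log(2))

print(f"entropy of melting: {dS_per_gram:.3f} J/K per gram")
print(f"                    {bits_per_molecule:.2f} bits per molecule")
```

The temperature change is zero throughout, yet each molecule gains a few bits of entropy.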

The following experiment makes an interesting contrast.

10.9  Isentropic Expansion and Compression

Take an ideal gas acted upon by a piston. For simplicity, assume a nonrelativistic nondegenerate ideal gas, and assume the sample is small on the scale of kT/mg. Assume everything is thermally insulated, so that no energy enters or leaves the system via thermal conduction. Gently retract the piston, allowing the gas to expand. The gas cools as it expands. In the expanded state, the gas has essentially the same entropy (provided the expansion was done gently enough), a lower temperature, and less energy, by some amount ΔE.

Before the expansion, the energy in question (ΔE) was in microscopic Locrian form, within the gas.   After the expansion, this energy is in macroscopic non-Locrian form, within the mechanism that moves the piston.
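Here is a minimal numerical sketch of this scenario for a monatomic ideal gas. It assumes the standard isentropic relation T V^(2/3) = constant and the energy E = (3/2)NkT, and it checks the entropy per particle using the Sackur-Tetrode formula (equation 61). The choice of helium, the particle count, and the doubling of the volume are illustrative assumptions.

```python
import math

K_B = 1.380649e-23       # J/K
HBAR = 1.054571817e-34   # J s
M_HE = 6.6464731e-27     # kg, mass of a helium-4 atom

def thermal_length(mass, T):
    """Thermal de Broglie length, sqrt(2 pi hbar^2 / (m k T))."""
    return math.sqrt(2.0 * math.pi * HBAR**2 / (mass * K_B * T))

def entropy_per_particle(volume_per_particle, mass, T):
    """Sackur-Tetrode entropy per particle, in units of k."""
    return math.log(volume_per_particle / thermal_length(mass, T)**3) + 2.5

def isentropic_final_temperature(T1, V1, V2):
    """Monatomic ideal gas: T * V^(2/3) stays constant."""
    return T1 * (V1 / V2) ** (2.0 / 3.0)

N = 1.0e22                      # number of atoms (assumed)
T1, V1, V2 = 300.0, 1.0e-3, 2.0e-3
T2 = isentropic_final_temperature(T1, V1, V2)
delta_E = 1.5 * N * K_B * (T2 - T1)   # energy change of the gas, in J
s1 = entropy_per_particle(V1 / N, M_HE, T1)
s2 = entropy_per_particle(V2 / N, M_HE, T2)

print(f"T2 = {T2:.1f} K, energy change = {delta_E:.1f} J")
print(f"entropy per particle: before {s1:.6f} k, after {s2:.6f} k")
```

The temperature and energy both go down, while the entropy is unchanged. The energy that left the gas is now in the mechanism that moves the piston.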

This scenario illustrates some of the differences between temperature and entropy, and some of the differences between energy and entropy.

Remember, the second law of thermodynamics says that the entropy obeys a local law of paraconservation. Be careful not to misquote this law.

It doesn’t say that the temperature can’t decrease. It doesn’t say that the thermal energy can’t decrease.   It says the entropy can’t decrease in any given region of space, except by flowing into adjacent regions.

Energy is conserved. That is, it cannot increase or decrease except by flowing into adjacent regions. (You should not imagine that there is any law that says “thermal energy” by itself is conserved.)

If you gently push the piston back in, compressing the gas, the temperature will go back up.

Isentropic compression is an increase in temperature at constant entropy. Melting (section 10.8) is an increase in entropy at constant temperature. These are two radically different ways of increasing the energy.

10.10  Demagnetization Refrigerator

Attach a bar magnet to a wooden board so that it is free to pivot end-over-end. This is easy; get a metal bar magnet and drill a hole in the middle, then nail it loosely to the board. Observe that it is free to rotate. You can imagine that if it were smaller and more finely balanced, thermal agitation would cause it to rotate randomly back and forth forever.

Now hold another bar magnet close enough to ruin the free rotation, forcing the spinner to align with the imposed field.

This is a passable pedagogical model of part of a demagnetization refrigerator. There is current work aimed at using this effect to produce refrigeration in the neighborhood of zero centigrade; see e.g. reference 28. Heretofore, however, the main use of this principle has been to produce much lower temperatures, in the neighborhood of zero kelvin (i.e. microkelvins or less). Copper nuclei can be used as the spinners.

It is worthwhile to compare theory to experiment:

These s values have a firm theoretical basis. They require little more than counting. We count microstates and apply the definition of entropy. Then we obtain Δs by simple subtraction.   Meanwhile, Δs can also be obtained experimentally, by observing the classical macroscopic thermodynamic behavior of the refrigerator.

Both ways of obtaining Δs give the same answer. What a coincidence! This answers the question about how to connect microscopic state-counting to macroscopic thermal behavior. The Shannon entropy is not merely analogous to the thermodynamic entropy; it is the thermodynamic entropy.

Spin entropy is discussed further in section 11.3.

10.11  Thermal Insulation

As a practical technical matter, it is often possible to have a high degree of thermal insulation between some objects, while other objects are in vastly better thermal contact.

For example, if we push on an object using a thermally-insulating stick, we can transfer energy to the object, without transferring much entropy. In contrast, if we push on a hot object using a non-insulating stick, even though we impart energy to one or two of the object’s modes by pushing, the object could be losing energy overall, via thermal conduction through the stick.

Similarly, if you try to build a piece of thermodynamic apparatus, such as an automobile engine, it is essential that some parts reach thermal equilibrium reasonably quickly, and it is equally essential that other parts do not reach equilibrium on the same timescale.

11  More About Entropy

11.1  Microstate versus Macrostate

Beware: In the thermodynamics literature, the word “state” is used with two inconsistent meanings. It could mean either microstate or macrostate.

In a system such as a deck of cards, the microstate is specified by saying exactly which card is on top, exactly which card is in the second position, et cetera.   The macrostate is the ensemble of all card decks consistent with what we know about the situation.

In a system such as a cylinder of gas, a microstate is a single fully-specified quantum state of the gas.   For a gas, the macrostate is specified by macroscopic variables such as the temperature, density, and pressure.

In general, a macrostate is an equivalence class, i.e. a set containing some number of microstates (usually many, many microstates).

In the context of quantum mechanics, state always means microstate.   In the context of classical thermodynamics, state always means macrostate, for instance in the expression “function of state”.

The idea of microstate and the idea of macrostate are both quite useful. The problem arises when people use the word “state” as shorthand for one or both. You can get away with state=microstate in introductory quantum mechanics (no thermo), and you can get away with state=macrostate in introductory classical thermo (no quantum mechanics) … but there is a nasty collision as soon as you start doing statistical mechanics, which sits astride the interface between QM and thermo.

In this document, the rule is that state means microstate, unless the context requires otherwise.   When we mean macrostate, we explicitly say macrostate or thermodynamic state. The idiomatic expression “function of state” necessarily refers to macrostate.

See section 19 for a discussion of other inconsistent terminology.

11.2  Phase Space

As mentioned in section 2.5.1, our notion of entropy is completely dependent on having a notion of microstate, and on having a procedure for assigning probability to microstates.

For systems where the relevant variables are naturally discrete, this is no problem. See section 2.2 and section 2.3 for examples involving symbols, and section 10.10 for an example involving real thermal physics.

We now discuss the procedure for dealing with continuous variables. In particular, we focus attention on the position and momentum variables.

It turns out that we must account for position and momentum jointly, not separately. That makes a lot of sense, as you can see by considering a harmonic oscillator with period τ: If you know the oscillator’s position at time t, you know its momentum at time t+τ/4 and vice versa.
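The oscillator claim is easy to verify numerically. The sketch below assumes the textbook solution q(t) = A cos(ωt + φ) with p = m dq/dt; the mass, amplitude, period, and phase are arbitrary illustrative values. With this sign convention the momentum a quarter period later is −mω times the present position.

```python
import math

def oscillator_state(t, amplitude, mass, period, phase=0.0):
    """Position and momentum of an ideal harmonic oscillator at time t."""
    omega = 2.0 * math.pi / period
    q = amplitude * math.cos(omega * t + phase)
    p = -mass * omega * amplitude * math.sin(omega * t + phase)
    return q, p

mass, amplitude, period, phase = 2.0, 0.3, 1.7, 0.4   # arbitrary test values
omega = 2.0 * math.pi / period
t = 0.123

q_now, _ = oscillator_state(t, amplitude, mass, period, phase)
_, p_later = oscillator_state(t + period / 4.0, amplitude, mass, period, phase)

print(f"q(t)            = {q_now:.6f}")
print(f"p(t + tau/4)    = {p_later:.6f}")
print(f"-m omega q(t)   = {-mass * omega * q_now:.6f}")
```

Since knowing one variable now fixes the other variable later, it would make no sense to assign probabilities to position and momentum separately.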

Figure 27 shows how this works, in the semi-classical approximation. There is an abstract space called phase space. For each position variable q there is a momentum variable p. (In the language of classical mechanics, we say p and q are dynamically conjugate, but if you don’t know what that means, don’t worry about it.)

Figure 27: Phase Space

Area in phase space is called action. We divide phase space into cells of size h, where h is Planck’s constant, also known as the quantum of action. A system has zero entropy if it can be described as sitting in a single cell in phase space. If we don’t know exactly where the system sits, so that it must be described as a probability distribution in phase space, it will have some correspondingly greater entropy.

If you are wondering why each state has area h, as opposed to some other amount of area, see section 23.8.

If there are M independent position variables, there will be M momentum variables, and each microstate will be associated with a 2M-dimensional cell of size h^M.
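As a minimal sketch of the cell-counting idea, suppose the probability is spread uniformly over a rectangular region of phase space with extent Δq by Δp in each of the M conjugate pairs, and suppose the region is at least one cell in size. Both suppositions are simplifying assumptions made here for illustration. Then the entropy is just the logarithm of the number of cells.

```python
import math

H_PLANCK = 6.62607015e-34   # J s, the quantum of action

def phase_space_entropy_bits(dq, dp, M=1):
    """Entropy in bits for a uniform distribution over (dq * dp)^M of phase space."""
    cells = (dq * dp / H_PLANCK) ** M    # number of cells of size h^M
    return math.log2(cells)

# Illustrative values: 1 nm spread in position, 1e-23 kg m/s spread in momentum.
s_one = phase_space_entropy_bits(1e-9, 1e-23)
s_wide = phase_space_entropy_bits(2e-9, 1e-23)       # twice the spread in position
s_three = phase_space_entropy_bits(1e-9, 1e-23, M=3)  # three independent coordinates

print(f"one coordinate:    {s_one:.3f} bits")
print(f"doubled position:  {s_wide:.3f} bits")
print(f"three coordinates: {s_three:.3f} bits")
```

Doubling the spread in any one variable adds one bit, and independent coordinates contribute additively.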

Using the phase-space idea, we can already understand, qualitatively, the entropy of an ideal gas in simple situations:

For a non-classical variable such as spin angular momentum, we don’t need to worry about conjugate variables. The spin is already discrete i.e. quantized, so we know how to count states … and it already has the right dimensions, since angular momentum has the same dimensions as action.

In section 2, we introduced entropy by discussing systems with only discrete states, namely re-arrangements of a deck of cards. We now consider a continuous system, such as a collection of free particles. The same ideas apply.

For each continuous variable, you can divide the phase space into cells of size h and then see which cells are occupied. In classical thermodynamics, there is no way to know the value of h; it is just an arbitrary constant. Changing the value of h changes the amount of entropy by an additive constant. But really there is no such arbitrariness, because “classical thermodynamics” is a contradiction in terms. There is no fully self-consistent classical thermodynamics. In modern physics, we definitely know the value of h, Planck’s constant. Therefore we have an absolute scale for measuring entropy.

As derived in section 23.2, there exists an explicit, easy-to-remember formula for the molar entropy of a monatomic three-dimensional ideal gas, namely the Sackur-Tetrode formula:

\frac{S/N}{k} = \ln\left(\frac{V/N}{\Lambda^3}\right) + \frac{5}{2}              (61)

where S/N is the molar entropy, V/N is the molar volume, and Λ is the thermal de Broglie length, i.e.

\Lambda := \sqrt{\frac{2\pi\hbar^2}{mkT}}              (62)

and if you plug this Λ into the Sackur-Tetrode formula you find the previously-advertised dependence on h^3.

You can see directly from equation 113 that the more spread out the gas is, the greater its molar entropy. Divide space into cells of size Λ^3, count how many cells there are per particle, and then take the logarithm.
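As a sanity check on the Sackur-Tetrode formula, the sketch below evaluates equation 61 and equation 62 for helium-4 at 298.15 K and a pressure of 1 bar, using the ideal gas law to get the volume per particle. The choice of gas and conditions is ours; the result can be compared against the tabulated standard molar entropy of helium, which is roughly 126 J/(mol K).

```python
import math

K_B = 1.380649e-23        # J/K
HBAR = 1.054571817e-34    # J s
N_A = 6.02214076e23       # 1/mol
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg

def thermal_length(mass, T):
    """Thermal de Broglie length, equation 62."""
    return math.sqrt(2.0 * math.pi * HBAR**2 / (mass * K_B * T))

def sackur_tetrode_per_particle(volume_per_particle, mass, T):
    """Entropy per particle in units of k, equation 61."""
    cells_per_particle = volume_per_particle / thermal_length(mass, T)**3
    return math.log(cells_per_particle) + 2.5

T = 298.15                              # K
pressure = 1.0e5                        # Pa
mass_he = 4.002602 * ATOMIC_MASS_UNIT   # kg
volume_per_particle = K_B * T / pressure   # ideal gas law, V/N = kT/P

s_over_k = sackur_tetrode_per_particle(volume_per_particle, mass_he, T)
molar_entropy = s_over_k * K_B * N_A    # J/(mol K)

print(f"Lambda  = {thermal_length(mass_he, T):.3e} m")
print(f"S/(N k) = {s_over_k:.3f}")
print(f"S       = {molar_entropy:.2f} J/(mol K)")
```

There are hundreds of thousands of cells of size Λ^3 per atom under these conditions, which is why the logarithm is a number of order ten.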

The thermal de Broglie length Λ is very commonly called the thermal de Broglie wavelength, but this is something of a misnomer, because Λ shows up in a wide variety of fundamental expressions, usually having nothing to do with wavelength. This is discussed in more detail in reference 29.

11.3  Entropy in a Crystal; Phonons, Electrons, and Spins

Imagine a crystal of pure copper, containing only the ⁶³Cu isotope. Under ordinary desktop conditions, most of the microscopic energy in the crystal takes the form of random potential and kinetic energy associated with vibrations of the atoms relative to their nominal positions in the lattice. We can find “normal modes” for these vibrations. This is the same idea as finding the normal modes for two coupled oscillators, except that this time we’ve got something like 10^23 coupled oscillators. There will be three normal modes per atom in the crystal. Each mode will be occupied by some number of phonons.

At ordinary temperatures, almost all modes will be in their ground state. Some of the low-lying modes will have a fair number of phonons in them, but this contributes only modestly to the entropy. When you add it all up, the crystal has about 6 bits per atom of entropy in the thermal phonons at room temperature. This depends strongly on the temperature, so if you cool the system, you quickly get into the regime where the thermal phonon system contains much less than one bit of entropy per atom.

There is, however, more to the story. The copper crystal also contains conduction electrons. They are mostly in a low-entropy state, because of the exclusion principle, but still they manage to contribute a little bit to the entropy, about 1% as much as the thermal phonons at room temperature.

A third contribution comes from the fact that each ⁶³Cu nucleus can be in one of four different spin states: +3/2, +1/2, -1/2, or -3/2. Mathematically, it’s just like flipping two coins, or rolling a four-sided die. The spin system contains two bits of entropy per atom under ordinary conditions.
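The counting can be spelled out in a few lines. This sketch applies the usual Σ p log(1/p) formula to four equally likely spin states and then converts the result to a molar entropy; the function name is ours.

```python
import math

R = 8.314462618  # J/(mol K)

def entropy_bits(probabilities):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

spin_states = [0.25, 0.25, 0.25, 0.25]   # +3/2, +1/2, -1/2, -3/2, all equally likely
bits_per_nucleus = entropy_bits(spin_states)
molar_spin_entropy = bits_per_nucleus * R * math.log(2)   # same thing as R ln(4)

print(f"{bits_per_nucleus:.1f} bits per nucleus")
print(f"{molar_spin_entropy:.3f} J/(mol K)")
```

Converting bits to conventional units is just a matter of multiplying by k ln(2) per particle, or R ln(2) per mole.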

You can easily make a model system that has four states per particle. The most elegant way might be to carve some tetrahedral dice … but it’s easier and just as effective to use four-sided “bones”, that is, parallelepipeds that are roughly 1 cm by 1 cm by 3 or 4 cm long. Make them long enough and/or round off the ends so that they never settle on the ends. Color the four long sides four different colors. A collection of such bones is profoundly analogous to a collection of copper nuclei. The which-way-is-up variable contributes two bits of entropy per bone, while the nuclear spin contributes two bits of entropy per atom.

In everyday situations, you don’t care about this extra entropy in the spin system. It just goes along for the ride. This is an instance of spectator entropy, as discussed in section 11.5.

However, if you subject the crystal to a whopping big magnetic field (teslas) and get things really cold (millikelvins), you can get the nuclear spins to line up. Each nucleus is like a little bar magnet, so it tends to align itself with the applied field, and at low-enough temperature the thermal agitation can no longer overcome this tendency.

Let’s look at the cooling process, in a high magnetic field. We start at room temperature. The spins are completely random. If we cool things a little bit, the spins are still completely random. The spins have no effect on the observable properties such as heat capacity.

As the cooling continues, there will come a point where the spins start to line up. At this point the spin-entropy becomes important. It is no longer just going along for the ride. You will observe a contribution to the heat capacity whenever the crystal unloads some entropy.
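The following sketch gives a rough picture of where that point lies. It treats the nuclei as non-interacting spin-3/2 systems with four equally spaced levels in the applied field, populated according to the Boltzmann factor. The gyromagnetic ratio of about 11.3 MHz per tesla is an approximate value that we assume for ⁶³Cu, and the 8 tesla field is an arbitrary illustrative choice; interactions between spins are ignored.

```python
import math

K_B = 1.380649e-23        # J/K
H_PLANCK = 6.62607015e-34 # J s
GAMMA_CU63 = 11.3e6       # Hz/T, approximate gamma/(2 pi) for 63Cu (assumed value)

def spin_entropy_bits(T, B, spin_levels=(-1.5, -0.5, 0.5, 1.5)):
    """Entropy per nucleus, in bits, for independent spins in field B at temperature T."""
    x = H_PLANCK * GAMMA_CU63 * B / (K_B * T)   # level spacing divided by kT
    top = max(spin_levels)
    # Boltzmann factors, shifted so the largest one is 1 (avoids overflow).
    weights = [math.exp((m - top) * x) for m in spin_levels]
    Z = sum(weights)
    probs = [w / Z for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

B = 8.0   # tesla, illustrative
for T in (300.0, 1.0, 0.01, 0.001, 0.0001):
    print(f"T = {T:8.4f} K   spin entropy = {spin_entropy_bits(T, B):.4f} bits per nucleus")
```

Under these assumptions the entropy stays essentially at two bits all the way down to well below one kelvin, and only starts to drop in the millikelvin range.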

You can also use copper nuclei to make a refrigerator for reaching very cold temperatures, as discussed in section 10.10.

11.4  Entropy is Entropy

Some people who ought to know better try to argue that there is more than one kind of entropy.

Sometimes they try to make one or more of the following distinctions:

Shannon entropy.   Thermodynamic entropy.

Entropy of abstract symbols.   Entropy of physical systems.

Entropy as given by equation 5 or equation 142.   Entropy defined in terms of energy and temperature.

Small systems: 3 blocks with 5^3 states, or 52 cards with 52! states   Large systems: 10^25 copper nuclei with 4^(10^25) states.

It must be emphasized that none of these distinctions have any value.

For starters, having two types of entropy would require two different paraconservation laws, one for each type. Also, if there exist any cases where there is some possibility of converting one type of entropy to the other, we would be back to having one overall paraconservation law, and the two type-by-type laws would be seen as mere approximations.

Also note that there are plenty of systems where there are two ways of evaluating the entropy. The copper nuclei described in section 10.10 have a maximum molar entropy of R ln(4). This value can be obtained in the obvious way by counting states, just as we did for the small, symbol-based systems in section 2. This is the same value that is obtained by macroscopic measurements of energy and temperature. What a coincidence!

Let’s be clear: The demagnetization refrigerator counts both as a small, symbol-based system and as a large, thermal system. Additional examples are mentioned in section 21.

11.5  Spectator Entropy

Suppose we define a bogus pseudo-entropy S′ as

S′ := S + K              (63)

for some arbitrary constant K. It turns out that in some (but not all!) situations, you may not be sensitive to the difference between S′ and S.

For example, suppose you are measuring the heat capacity. That has the same units as entropy, and is in fact closely related to the entropy. But we can see from equation 26 that the heat capacity is not sensitive to the difference between S′ and S, because the derivative on the RHS annihilates additive constants.
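Here is a small numerical illustration. It assumes that the heat capacity is obtained from the entropy as C = T dS/dT, which is our reading of equation 26, and it uses a made-up toy entropy function S(T) = 3 ln(T) together with an arbitrary constant K. Neither of these is meant to describe any real material.

```python
import math

def heat_capacity(entropy, T, dT=1e-3):
    """C = T dS/dT, estimated with a central finite difference."""
    return T * (entropy(T + dT) - entropy(T - dT)) / (2.0 * dT)

def S(T):
    """Toy entropy function, in J/K (illustrative only)."""
    return 3.0 * math.log(T)

K = 42.0   # arbitrary additive constant

def S_prime(T):
    """Pseudo-entropy S' = S + K, as in equation 63."""
    return S(T) + K

C_true = heat_capacity(S, 300.0)
C_pseudo = heat_capacity(S_prime, 300.0)
print(f"C from S  = {C_true:.6f} J/K")
print(f"C from S' = {C_pseudo:.6f} J/K")
```

The two results agree to within rounding error, because the difference S(T + dT) − S(T − dT) cancels K exactly in principle.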

Similarly, suppose you want to know whether a certain chemical reaction will proceed spontaneously or not. That depends on the difference between the initial state and the final state, that is, differences in energy and differences in entropy. So once again, additive constants will drop out.

There are many standard reference books that purport to tabulate the entropy of various chemical compounds … but if you read the fine print you will discover that they are really tabulating the pseudo-entropy S′ not the true entropy S. In particular, the tabulated numbers typically do not include the contribution from the nuclear spin-entropy, nor the contribution from mixing the various isotopes that make up each element. They can more-or-less get away with this because under ordinary chem-lab conditions those contributions are just additive constants.

However, you must not let down your guard. Just because you can get away with using S′ instead of S in a few simple situations does not mean you can get away with it in general. There is a correct value for S and plenty of cases where the correct value is needed.

11.6  No Secret Entropy, No Hidden Variables

Suppose we want to find the value of the true entropy, S. We account for the thermal phonons, and the electrons, and the nuclear spins. We even account for isotopes, chemical impurities, and structural defects in the crystal. But … how do we know when to stop? How do we know if/when we’ve found all the entropy? In section 11.5 we saw how some of the entropy could silently go along for the ride, as a spectator, under certain conditions. Is there some additional entropy lurking here or there? Could there be hitherto-unimagined quantum numbers that couple to hitherto-unimagined fields?

The answer is no. According to all indications, there is no secret entropy. At any temperature below several thousand degrees, electrons, atomic nuclei, and all other subatomic particles can be described by their motion (position and momentum) and by their spin, but that’s it, that’s a complete description. Atoms, molecules, and all larger structures can be completely described by what their constituent particles are doing.

In classical mechanics, there could have been an arbitrary amount of secret entropy, but in the real world, governed by quantum mechanics, the answer is no.

We have a firm experimental basis for this conclusion. According to the laws of quantum mechanics, the scattering of indistinguishable particles is different from the scattering of distinguishable particles.

Therefore let’s consider a low-energy proton/proton scattering experiment. We arrange that the protons are not distinguishable on the basis of position, or on any basis other than spin. That is, the protons are indistinguishable if and only if they have the same spin.

Next we randomize the spins, so that for each proton, each of the two spin states is equally likely. Our ignorance of the spin state contributes exactly 1 bit per particle to the entropy.

Now, to make things interesting, suppose that in addition to the aforementioned 1 bit of spin-entropy, each proton had 17 bits of “secret entropy”, in whatever form you can imagine. That would mean that there would be 2^17 different distinguishable types of proton. If you pick protons at random, they would almost certainly be distinguishable, whether or not their spins were aligned, and you would almost never observe like-spin scattering to be different from unlike-spin scattering.
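The arithmetic behind "almost certainly" is simple. In the sketch below we assume that every combination of spin and hypothetical secret label is equally likely, so that two randomly chosen protons match with probability 2 to the power of minus the number of bits per particle. The function name is ours.

```python
def matching_fraction(bits_per_particle):
    """Probability that two randomly chosen particles are in the same state,
    assuming all 2**bits states are equally likely."""
    return 2.0 ** (-bits_per_particle)

spin_only = matching_fraction(1)          # 1 bit of spin entropy
with_secret = matching_fraction(1 + 17)   # plus 17 hypothetical secret bits

print(f"spin only:        {spin_only}")
print(f"with secret bits: {with_secret:.3e}")
print(f"like-spin pairs that would still be indistinguishable: 1 in {2**17}")
```

With spin as the only label, half of all randomly chosen pairs are indistinguishable, which is easy to see in the scattering data. With 17 secret bits, the identical-particle effect would be diluted by a factor of more than a hundred thousand.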

Such scattering experiments have been conducted with electrons, protons, various heavier nuclei, and sometimes entire atoms. There has never been any indication of any secret entropy.

The thermodynamics of chemical reactions tells us that larger structures can be described in terms of their constituents with no surprises.

The existence of superfluidity is further evidence that we can correctly account for entropy. All the atoms in the superfluid phase are described by a single quantum wavefunction. The entropy per atom is zero; otherwise it wouldn’t be a superfluid. Superfluid ⁴He depends on the fact that all ⁴He atoms are absolutely totally indistinguishable – not distinguishable on the basis of position, spin, or any other quantum numbers. This is what we expected, based on two-particle scattering experiments, but the existence of superfluidity reassures us that we haven’t overlooked anything when going from two particles to 10^23 particles.

Superfluidity occurs because certain identical-particle effects are cumulative and therefore have a spectacular effect on the entire fluid. Similar macroscopic identical-particle effects have been directly observed in ³He, spin-polarized monatomic hydrogen, sodium atomic gas, and other systems.

It might also be remarked that the existence of superconductors, semiconductors, metals, molecular bonds, and the periodic table of elements is strong evidence that electrons have no secret entropy. The existence of lasers is strong evidence that photons have no secret entropy.

I can’t prove that no hitherto-secret entropy will ever be discovered. We might discover a new atom tomorrow, called loonium, which is exactly the same as helium except that for some reason it always obeys the distinguishable-particle scattering law when scattering against helium. This wouldn’t be the end of the world; we would just postulate a new quantum number and use it to distinguish the two types of atom. All I can say is that loonium must be exceedingly rare; otherwise it would have been noticed.

Reminder: The foregoing discussion applies to “secret entropy” that might exist at room temperature or below, in analogy to spin entropy. In contrast we are not talking about the plethora of quantum numbers that are known to come into play at higher energies, but are all in their ground state under ordinary room-temperature conditions.

11.7  Entropy is Context Dependent

Consider 100 decks of cards. The first one is randomly shuffled. It has an entropy of just under 226 bits. All the rest are ordered the same way as the first. If you give me any one of the decks in isolation, it will take me 226 yes/no questions to figure out how to return the deck to standard order. But after I’ve seen any one of the decks, I know the exact microstate of every other deck without asking additional questions. The other 99 decks contain zero additional entropy.
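The figure of "just under 226 bits" can be checked directly. The following is a minimal sketch, assuming that all 52! orderings of a shuffled deck are equally likely, so that the entropy is log2(52!). The function and variable names are mine, chosen for illustration.

```python
import math

def deck_entropy_bits(n_cards=52):
    """Entropy, in bits, of a uniformly shuffled deck: log2(n_cards!)."""
    # lgamma(n + 1) equals ln(n!), which avoids forming the huge integer n!
    return math.lgamma(n_cards + 1) / math.log(2)

one_deck = deck_entropy_bits()

# 100 decks in the *same* order: seeing one deck fixes all the others,
# so the joint entropy is just the entropy of a single deck.
hundred_copies = one_deck

# 100 *independently* shuffled decks would instead add up.
hundred_independent = 100 * one_deck

print(f"one deck:            {one_deck:.2f} bits")
print(f"100 identical decks: {hundred_copies:.2f} bits")
print(f"100 independent:     {hundred_independent:.2f} bits")
```

The gap between the last two numbers is the point of the example: the entropy of the ensemble is not 100 times the entropy of one member.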

In a situation like this, it’s hard to consider entropy to be a state variable. In particular, the entropy density will not be an intensive property.

I know this sounds creepy, but it’s real physics. Creepy situations like this do not usually occur in physical systems, but sometimes they do. Examples include:

In an ordinary ideal gas, you can pretty much assume the entropy density is a well-behaved intensive property – but don’t completely let down your guard, or you’ll be badly fooled by the spin-echo setup.

A related issue concerns the dependence of entropy on the choice of observer. Entropy is not simply a property of a system, but rather a property of the system and the description thereof. This was mentioned in passing near the end of section 2.

Let’s be clear: As a matter of principle, two different observers will in general assign two different values to “the” entropy.

This is easy to express in mathematical terms. The trustworthy workhorse formula for entropy is equation 5. If P is a conditional probability, as it often is, then S is a conditional entropy.
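To make the observer dependence concrete, here is a small sketch. It assumes that the workhorse formula has the usual form S = Σ P log(1/P), and it uses a made-up system of two fair coins. Everything about the example (the coins, the two observers, the names) is hypothetical.

```python
import math

def entropy_bits(probs):
    """S = sum_i P_i * log2(1/P_i), skipping zero-probability states."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical system: two coins, hence four microstates.
microstates = ["HH", "HT", "TH", "TT"]

# Observer A knows only that both coins are fair.
p_a = [0.25, 0.25, 0.25, 0.25]

# Observer B has already seen that the first coin is heads,
# so B's probabilities are conditional on that knowledge.
p_b = [0.5, 0.5, 0.0, 0.0]

s_a = entropy_bits(p_a)   # 2 bits
s_b = entropy_bits(p_b)   # 1 bit

for state, pa, pb in zip(microstates, p_a, p_b):
    print(f"{state}: P_A = {pa}, P_B = {pb}")
print(f"S according to A: {s_a} bits; according to B: {s_b} bits")
```

Same coins, same formula, different P, different S.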

Human observers are so grossly dissipative and usually “know” so little that it is academic to worry about the thermodynamics of human “knowledge”. However, the issue takes on new life when we consider highly-optimized robot measuring devices – Maxwell demons and the like.

For microscopic systems, it is for sure possible for different observers to report different values of “the” entropy (depending on what each observer knows about the system). The discrepancy can be a large percentage of the total.

By way of analogy, you know that different observers report different values of “the” kinetic energy (depending on the velocity of the observer), and this hasn’t caused the world to end.

For macroscopic systems (10^23 particles or thereabouts) it is uncommon for one observer to know 10^23 things that the other observer doesn’t … but even this is possible. The spin echo experiment is a celebrated example, as discussed in section 10.7.

Regardless of the size of the system, it is often illuminating to consider a complete thermodynamic cycle, such that all participants are returned to the same state at the end of the cycle. This de-emphasizes what the observers “know” and instead focuses attention on how they “learn” … and how they forget. In more technical terms: this focusses attention on the observation/measurement process, which is crucial if you want a deep understanding of what entropy is and where it comes from. See reference 30 and reference 31.

In particular, at some point in each cycle the observer will have to forget previous information, to make room for the new information. This forgetting expels entropy, and at temperature T it dissipates energy TS.
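For a sense of scale, here is a sketch of the energy TS associated with forgetting. It assumes that one forgotten bit corresponds to an entropy of k ln 2 (the standard conversion from bits to conventional units), and it uses T = 300 K as an arbitrary example temperature.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def erasure_energy(n_bits, temperature_kelvin):
    """Energy T*S dissipated when n_bits are forgotten at temperature T."""
    entropy = n_bits * K_BOLTZMANN * math.log(2)   # S in J/K
    return temperature_kelvin * entropy             # T*S in joules

per_bit = erasure_energy(1, 300.0)
print(f"{per_bit:.3e} J per forgotten bit at 300 K")
```

The result is tiny on a human scale, which is why the issue only becomes interesting for highly optimized devices.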

To repeat: When evaluating “the” entropy, it is necessary to account for the information in the observer-system. In a closed cycle, this focusses attention on the observation and measurement process. If you don’t do this, you will get the wrong answer every time when analyzing spin echo systems, Maxwell demons, Szilard engines, reversible computers, et cetera.

12  Entropy versus “Irreversibility” in Chemistry

In chemistry, the word “irreversible” is commonly used in connection with multiple inconsistent ideas, including:

Those ideas are not completely unrelated … but they are not completely identical, and there is potential for serious confusion.

You cannot look at a chemical reaction (as written in standard form) and decide whether it is spontaneous, let alone whether it goes to completion. For example, if you flow steam over hot iron, you produce iron oxide plus hydrogen. It goes to completion in the sense that the iron is used up. Conversely, if you flow hydrogen over hot iron oxide, you produce iron and H2O. It goes to completion in the sense that the iron oxide is used up.

And none of that has much to do with whether the reaction was thermodynamically reversible or not.

Here is a pair of scenarios that may clarify a few things.

Scenario #1: Suppose a heavy brick slides off a high shelf and falls to the floor. Clearly this counts as a “spontaneous” process. It liberates energy and liberates free energy.

Further suppose that near the floor we catch the brick using some sort of braking mechanism. The brakes absorb the energy and get slightly warm. This braking process is grossly irreversible in the thermodynamic sense. That is, the process is very far from being isentropic.

Now we can use the heat in the brakes to run a heat engine. Let’s suppose that it is an ideal heat engine. The fact that the engine is thermodynamically reversible is interesting, but it does not mean that the overall process (brick + brake + heat engine) is reversible. There was a terrible irreversibility at an upstream point in the process, before the energy reached the heat engine. The thermodynamic efficiency of the overall process will be terrible, perhaps less than 1%.

Scenario #2: Again the brick slides off the shelf, but this time we attach it to a long lever (rather than letting it fall freely). As the brick descends to the floor, the lever does useful work (perhaps raising another weight, generating electrical power, or whatever). The overall thermodynamic efficiency of this process could be very high, easily in excess of 90%, perhaps even in excess of 99%. The process is still spontaneous and still goes to completion.

From these scenarios we see that being spontaneous and/or going to completion does not necessarily tell you anything about whether the process is irreversible in the thermodynamic sense.
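The "less than 1%" figure for scenario #1 is easy to reproduce. This is a rough sketch, assuming the brakes end up only a couple of kelvin warmer than the surroundings and that the ideal heat engine runs at the Carnot efficiency between the brake temperature and ambient temperature. The specific temperatures are invented for illustration.

```python
def carnot_efficiency(t_hot, t_cold):
    """Best possible fraction of heat converted to work: (Th - Tc) / Th."""
    return (t_hot - t_cold) / t_hot

t_ambient = 300.0             # K, assumed
t_brakes = t_ambient + 2.0    # K, "slightly warm" brakes (assumed)

scenario_1 = carnot_efficiency(t_brakes, t_ambient)
print(f"scenario #1 overall efficiency: {scenario_1:.2%}")
```

Even though the engine itself is ideal, nearly all of the brick's energy is unrecoverable once it has been dumped into lukewarm brakes. Scenario #2 never takes that detour, so no Carnot factor enters.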

In elementary chemistry classes, people tend to pick up wrong ideas about thermodynamics, because the vast preponderance of the reactions that they carry out are analogous to scenario #1 above. That is, the reactions are grossly irreversible in the thermodynamic sense. The reactions are nowhere near isentropic.

There are some examples of chemical reactions that are essentially reversible, in analogy to scenario #2. In everyday life, the commonest examples of this are electrochemical reactions, e.g. storage batteries and fuel cells. Another example is the CO2/carbonate reaction discussed below. Alas, there is a tendency for people to forget about these reversible reactions and to unwisely assume that all reactions are grossly irreversible, in analogy to scenario #1. This unwise assumption can be seen in the terminology itself: widely-used tables list the “standard heat of reaction” (rather than the standard energy of reaction), apparently under the unjustifiable assumption that the energy liberated by the reaction will always show up as heat. Similarly reactions are referred to as “exothermic” and “endothermic”, even though it would be much wiser to refer to them as exergonic and endergonic.

It is very difficult, perhaps impossible, to learn much about thermodynamics by studying bricks that fall freely and smash against the floor. Instead, thermodynamics is most understandable and most useful when applied to situations that have relatively little dissipation, i.e. that are nearly isentropic.

Lots of people get into the situation where they have studied tens or hundreds or thousands of reactions, all of which are nowhere near isentropic. That’s a trap for the unwary. It would be unwise to leap to the conclusion that all reactions are far from isentropic … and it would be even more unwise to leap to the conclusion that “all” natural processes are far from isentropic.

Chemists are often called upon to teach thermodynamics, perhaps under the guise of a “P-Chem” course (i.e. physical chemistry). This leads some people to ask for purely chemical examples to illustrate entropy and other thermodynamic ideas. I will answer the question in a moment, but first let me register my strong objections to the question. Thermodynamics derives its great power and elegance from its wide generality. Specialists who cannot cope with examples outside their own narrow specialty ought not be teaching thermodynamics.

Here’s a list of reasons why a proper understanding of entropy is directly or indirectly useful to chemistry students.

  1. Consider electrochemical reactions. Under suitable conditions, some electrochemical reactions can be made very nearly reversible in the thermodynamic sense. (See reference 32 for some notes on how such cells work.) In these cases, the heat of reaction is very much less than the energy of reaction, and the entropy is very much less than the energy divided by T.
  2. Consider the reaction that children commonly carry out, adding vinegar to baking soda, yielding sodium acetate and carbon dioxide gas. Let’s carry out this reaction in a more grown-up apparatus, namely a sealed cylinder with a piston. By pushing on the piston with weights and springs, we can raise the pressure of the CO2 gas. If we raise the pressure high enough, we push CO2 back into solution. This in turn raises the activity of the carbonic acid, and at some point it becomes a strong enough acid to attack the sodium acetate and partially reverse the reaction, liberating acetic acid. So this is clearly and inescapably a chemistry situation.

    Much of the significance of this story revolves around the fact that if we arrange the weights and springs just right, the whole process can be made thermodynamically reversible (nearly enough for practical purposes). Adding a tiny bit of weight will make the reaction go one way, just as removing a tiny bit of weight will make the reaction go the other way.

    Now some interesting questions arise: Could we use this phenomenon to build an engine, in analogy to a steam engine, but using CO2 instead of steam, using the carbonate ↔ CO2 chemical reaction instead of the purely physical process of evaporation? How does the CO2 pressure in this system vary with temperature? How much useful work would this CO2 engine generate? How much waste heat? What is the best efficiency it could possibly have? Can we run the engine backwards so that it works as a refrigerator?

    There are more questions of this kind, but you get the idea: once we have a reaction that is more-or-less thermodynamically reversible, we can bring to bear the entire machinery of thermodynamics.

  3. Consider the colligative effects of a solute on the freezing point, boiling point, and vapor pressure of a solvent. The fact that they’re colligative – i.e. insensitive to the chemical properties of the solute – is strong evidence that entropy is what’s driving these effects, not enthalpy, energy, or free energy.
  4. Similarly: consider the Gibbs Gedankenexperiment (section 10.6). Starting with a sample of 4He, we get an increase in entropy if we mix it with 3He, or Ne, or Xe … but we get no effect if we “mix” it with more of the same 4He.
  5. People who take chemistry classes often go on to careers in other fields. For example, you might need knowledge of chemistry, physics, and engineering in order to design a rocket engine, or a jet engine, or a plain old piston engine. Such things commonly involve a chemical reaction followed by a more-or-less isentropic expansion. Even though the chemical reaction is grossly irreversible, understanding the rest of the system requires understanding thermodynamics.

    To be really specific, suppose you are designing something with multiple heat engines in series. This case is considered as part of the standard “foundations of thermodynamics” argument, as illustrated in figure 28. Entropy is conserved as it flows down the totem-pole of heat engines. The crucial conserved quantity that is the same for all the engines is entropy … not energy, free energy, or enthalpy. No entropy is lost during the process, because entropy cannot be destroyed, and no entropy (just work) flows out through the horizontal arrows. No entropy is created, because we are assuming the heat engines are 100% reversible. For more on this, see reference 5.

    Figure 28: Heat Engines In Series
  6. Consider “Design of Experiment”, as discussed in reference 9. In this case the entropy of interest is not the entropy of the reaction, but still it is entropy, calculated in accordance with equation 5, and it is something a chemist ought to know. Research chemists and especially chemical engineers are often in the situation where experiments are very expensive, and someone who doesn’t understand Design of Experiment will be in big trouble.
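To illustrate item 5, here is a sketch of a stack of ideal heat engines in series. It assumes each engine is reversible and operates between adjacent temperatures in a list, so that the same entropy passes through every stage. The temperatures and the entropy value are made up.

```python
def totem_pole(temperatures, entropy_flow):
    """For reversible engines in series, return (heat_in, work_out, heat_out) per stage.

    temperatures: descending list of temperatures in kelvin.
    entropy_flow: entropy S handed down the stack (J/K), the same at every stage.
    """
    stages = []
    for t_hi, t_lo in zip(temperatures, temperatures[1:]):
        heat_in = t_hi * entropy_flow     # energy arriving along with the entropy
        heat_out = t_lo * entropy_flow    # energy leaving with the same entropy
        work_out = heat_in - heat_out     # carries energy but no entropy
        stages.append((heat_in, work_out, heat_out))
    return stages

temps = [600.0, 500.0, 400.0, 300.0]   # K, illustrative
stages = totem_pole(temps, entropy_flow=2.0)
total_work = sum(work for _, work, _ in stages)

for (q_in, w, q_out), t_hi in zip(stages, temps):
    print(f"stage at {t_hi} K: heat in {q_in} J, work out {w} J, heat out {q_out} J")
print(f"total work: {total_work} J")
```

The energy passed from stage to stage shrinks at every level, while the entropy passed along is the same throughout. That is the sense in which entropy, not energy, is the quantity common to all the engines.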

13  The “Big Four” Energy-Like State Functions

13.1  Energy

The energy is one of the “big four” thermodynamic potentials.

The concept of energy has already been introduced; see section 1.

13.2  Enthalpy

We hereby define the enthalpy as:

H := E + P V              (64)

where H is the near-universally conventional symbol for enthalpy, E is the energy, V is the volume of the system, and P is the pressure on the system. We will briefly explore some of the mathematical consequences of this definition, and then explain what enthalpy is good for.

We will need the fact that

d(P V) = PdV + VdP              (65)

which is just the rule for differentiating a product. This rule applies to any two variables (not just P and V), provided they were differentiable to begin with. Note that this rule is intimately related to the idea of integrating by parts, as you can see by writing it as

PdV = d(P V) − VdP              (66)

and integrating both sides.

Differentiating equation 64 and using equation 19 and equation 65, we find that

dH = −PdV + TdS + PdV + VdP
   = VdP + TdS              (67)

which runs nearly parallel to equation 19; on the RHS we have transformed −PdV into VdP, and of course the LHS is enthalpy instead of energy.

This trick of transforming xdy into −ydx (with a leftover d(xy) term) is called a Legendre transformation. Again we note the idea may be somewhat familiar in the guise of integrating by parts.

In the chemistry lab, it is common to carry out reactions under conditions of constant pressure. If the reaction causes the system to expand or contract – for instance if gas is evolved from a solid or liquid – it will do work against atmospheric pressure. This work will change the energy ... but it will not change the enthalpy, because the latter depends on VdP.

This means that under conditions of constant pressure, it is easier to keep track of the enthalpy than to keep track of the energy.
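As a rough numerical illustration of the bookkeeping difference, the sketch below estimates the PΔV term for a reaction that evolves gas at constant pressure. It assumes the evolved gas is ideal and neglects the volume of the solid or liquid it came from, so that PΔV = nRT. The amount of gas and the temperature are arbitrary example values.

```python
R_GAS = 8.314462618  # J/(mol K)

def expansion_work(n_gas_moles, temperature_kelvin):
    """P*dV integrated at constant P and T for n moles of ideal gas: n R T."""
    return n_gas_moles * R_GAS * temperature_kelvin

# One mole of gas evolved at room temperature (illustrative numbers).
p_dv = expansion_work(1.0, 298.15)
print(f"H and E differ by about {p_dv:.0f} J per mole of gas evolved")
```

At constant pressure this is the amount by which the change in H differs from the change in E, per the definition H := E + P V.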

It is also amusing to differentiate H with respect to P and S directly, using the chain rule. This gives us:

dH = (∂H/∂P)_S dP + (∂H/∂S)_P dS              (68)

which is interesting because we can compare it, term by term, with equation 67. When we do that, we find that the following identities must hold:

V = (∂H/∂P)_S              (69)

and

T = (∂H/∂S)_P              (70)

Equation 70 is not meant to redefine T; it is merely a corollary of our earlier definition of T (equation 18) and our definition of H (equation 64).
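The identities in equation 69 and equation 70 can be checked numerically for a specific system. The sketch below uses a monatomic ideal gas, in units where N k = 1 and the additive constant in the entropy is zero. Both choices, and the finite-difference helper, are simplifying assumptions made for illustration.

```python
import math

def temperature(s, p):
    """T(S, P) obtained by inverting S = (5/2) ln T - ln P."""
    return math.exp(0.4 * (s + math.log(p)))

def enthalpy(s, p):
    """H = E + P V = (3/2) T + T = (5/2) T for this gas."""
    return 2.5 * temperature(s, p)

def volume(s, p):
    """Ideal gas law with N k = 1: V = T / P."""
    return temperature(s, p) / p

def partial(f, s, p, wrt, h=1e-6):
    """Central finite difference of f(s, p), holding the other variable constant."""
    if wrt == "s":
        return (f(s + h, p) - f(s - h, p)) / (2 * h)
    return (f(s, p + h) - f(s, p - h)) / (2 * h)

s0, p0 = 1.3, 2.0   # arbitrary test point
dH_dP_at_const_S = partial(enthalpy, s0, p0, wrt="p")
dH_dS_at_const_P = partial(enthalpy, s0, p0, wrt="s")

print(f"(dH/dP)_S = {dH_dP_at_const_S:.6f}   V = {volume(s0, p0):.6f}")
print(f"(dH/dS)_P = {dH_dS_at_const_P:.6f}   T = {temperature(s0, p0):.6f}")
```

Note that the helper requires you to say which variable is held constant, in keeping with the advice to always specify the direction of a partial derivative.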

13.3  Free Energy

In many situations – for instance when dealing with heat engines – it is convenient to keep track of the free energy of a given parcel. This is also known as the Helmholtz potential, or the Helmholtz free energy. It is defined as:

F := E − T S              (71)

where F is the conventional symbol for free energy, E is (as always) the energy, S is the entropy, and T is the temperature of the parcel.

The free energy is extremely useful for analyzing the spontaneity and reversibility of transformations taking place at constant T and constant V. See reference 24 for details.

See section 13.5 for a discussion of what is (or isn’t) “free” about the free energy.

The energy and the free energy are related to the partition function, as discussed in section 22.

13.4  Free Enthalpy

Combining the ideas of section 13.2 and section 13.3, there are many situations where it is convenient to keep track of the free enthalpy. This is also known as the Gibbs potential or the Gibbs free enthalpy. It is defined as:

G = E + P V − T S
  = H − T S
             (72)

where G is the conventional symbol for free enthalpy. (Beware: G is all-too-commonly called the Gibbs free “energy” but that is a bit of a misnomer. Please call it the free enthalpy, to avoid confusion between F and G.)

The free enthalpy has many uses. For starters, it is extremely useful for analyzing the spontaneity and reversibility of transformations taking place at constant T and constant P, as discussed in reference 24. (You should not however imagine that G is restricted to constant-T and/or constant-P situations, for reasons discussed in section 13.6.)

13.5  Thermodynamically Available Energy

The notion of “free energy” is often misunderstood. Indeed the term “free energy” practically begs to be misunderstood.

It is superficially tempting to divide the energy E into two pieces, the “free” energy F and the “unfree” energy TS. This is formally possible, but not very helpful as far as I can tell. In particular, there is no connection to the ordinary meaning of “free”. You should not think that the free energy is the “thermodynamically available” part of the energy, or that TS is the “unavailable” part of the energy.

According to section 1.5, it is in general not possible to define the “available” energy as a function of state. Free energy is no exception to this rule, as we now discuss.

The free energy of a given parcel is a function of state, and in particular is a function of the thermodynamic state of that parcel. That is, for parcel #1 we have F1 = E1 − T1 S1 and for parcel #2 we have F2 = E2 − T2 S2.

Suppose we hook up a heat engine that uses parcel #1 as its heat source and parcel #2 as its heat sink. This is the situation shown in figure 3, except that now we make the cold-side reservoir very much larger and very much colder than the hot-side reservoir. Assume the heat engine is maximally efficient, so its efficiency is the Carnot efficiency, (T1 − T2)/T1. We see that the amount of “thermodynamically available” energy depends on T2, whereas the free energy of parcel #1 does not. In particular, if T2 is cold enough, the work done by the heat engine will exceed the free energy of parcel #1. Indeed, in the limit that parcel #2 is very large and very cold (approaching absolute zero), the work done by the heat engine will converge to the entire energy E1, not the free energy F1.
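Here is a sketch of how the extractable work depends on T2. It is a toy model: it assumes parcel #1 has a temperature-independent heat capacity C (which is unrealistic near absolute zero), that parcel #2 is so large its temperature never changes, and that the engine operates at the Carnot efficiency at every instant as parcel #1 cools. The numbers are illustrative.

```python
import math

def max_work(heat_capacity, t1, t2):
    """Work from an ideal engine draining a parcel at t1 into a huge sink at t2.

    Integrates dW = C (1 - t2/T) dT as the parcel cools from t1 down to t2.
    """
    return heat_capacity * ((t1 - t2) - t2 * math.log(t1 / t2))

C = 1.0      # J/K, illustrative
T1 = 400.0   # K, illustrative

for t2 in (300.0, 100.0, 1.0, 1e-6):
    print(f"T2 = {t2:>8} K  ->  W = {max_work(C, T1, t2):.4f} J")
```

The work keeps growing as T2 drops, approaching C·T1, which in this toy model is all the energy the parcel gives up in cooling to absolute zero. Nothing about parcel #1 changed between the rows, which is why no function of the state of parcel #1 alone can serve as “the” available energy.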

The notion of thermodynamically available energy is often not precisely definable, as you can see by considering the case where parcel #1 is in contact with two other parcels, with two different temperatures. On the other hand, if you do happen to have a well-defined, unique “ambient” temperature then you might be able to formulate a well-behaved notion of “thermal energy”.

In any case, if you find yourself trying to quantify the “thermal energy” content of something, it is likely that you are asking the wrong question. You will probably be much better off quantifying something else instead, perhaps the energy E and the entropy S. See section 18 for more on this.

Similar remarks apply to the free enthalpy, G.

In general, you should never assume you can figure out the nature of a thing merely by looking at the name of a thing. As discussed in reference 33, a titmouse is not a kind of mouse, milk of magnesia is not made of milk, and chocolate turtles are not made of turtles. As Voltaire remarked, the Holy Roman Empire was neither holy, nor Roman, nor an empire. By the same token, free energy is not the “free” part of the energy.

13.6  Relationships among E, F, G, and H

We have now encountered four quantities {E, F, G, H} all of which have dimensions of energy. The relationships among these quantities can be nicely summarized in two-dimensional charts, as in figure 29.

Figure 29: Energy, Enthalpy, Free Energy, and Free Enthalpy
Figure 30: Some Derivatives of E, F, G, and H

The four expressions in figure 30 constitute all of the expressions that can be generated by starting with equation 19 and applying Legendre transformations. They are emphatically not the only valid ways of differentiating E, F, G, and H. Equation 24 is a very practical example – namely heat capacity – that does not show up in figure 30. It involves expressing dE in terms of dV and dT (rather than dV and dS). As another example, equation 108 naturally expresses the energy as a function of temperature, not as a function of entropy.
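For reference, the four expressions generated from equation 19 by Legendre transformations can be written out as follows. This is a reconstruction from the definitions in equation 64, equation 71, and equation 72, not a copy of figure 30, although it presumably agrees with it.

```latex
\begin{aligned}
dE &= -P\,dV + T\,dS \\
dH &= \phantom{-}V\,dP + T\,dS \\
dF &= -P\,dV - S\,dT \\
dG &= \phantom{-}V\,dP - S\,dT
\end{aligned}
```

Each line follows from the one above it or beside it by trading one term x dy for −y dx, exactly as was done for the enthalpy in equation 67.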

Beware: There is a widespread misconception that E is “naturally” (or necessarily) expressed in terms of V and S, while H is “naturally” (or necessarily) expressed in terms of P and S, and so on for F(V,T) and G(P,T). To get an idea of how widespread this misconception is, see reference 34 and references therein. Alas, there are no good reasons for such restrictions on the choice of variables.

These restrictions may be a crude attempt to solve the problems caused by taking shortcuts with the notation for partial derivatives. However, the restrictions are neither necessary nor sufficient to solve the problems. One key requirement for staying out of trouble is to always specify the direction when writing a partial derivative. That is, do not leave off the “at constant X” when writing the partial derivative at constant X. See section 6.5 and reference 2 for more on this.

Subject to some significant restrictions, you can derive a notion of conservation of enthalpy. Specifically, this is restricted to conditions of constant pressure, plus some additional technical restrictions. See reference 24. (This stands in contrast to energy, which obeys a strict local conservation law without restrictions.) If the pressure is changing, the safest procedure is to keep track of the pressure and volume, apply the energy conservation law, and then calculate the enthalpy from the definition (equation 64) if desired.

13.7  Yet More Transformations

Starting from equation 39 there is another whole family of Legendre transformations involving µN.

14  Adiabatic Processes

The word adiabatic is another term that suffers from multiple inconsistent meanings. The situation is summarized in figure 31.

Figure 31: Multiple Definitions of Adiabatic
  1. Some thoughtful experts use “adiabatic” to denote a process where no entropy is transferred across the boundary of the region of interest. This was probably the original meaning, according to several lines of evidence, including the Greek etymology: α + δια + βατος = not passing across. As a corollary, we conclude the entropy of the region does not decrease.
  2. Other thoughtful experts refer to the adiabatic approximation (in contrast to the sudden approximation) to describe a perturbation carried out sufficiently gently that each initial state can be identified with a corresponding final state, and the occupation number of each state is preserved during the process. As a corollary, we conclude that the entropy of the region does not change.
  3. Dictionaries and textbooks commonly define “adiabatic” to mean no flow of entropy across the boundary and no creation of entropy.

In the dream-world where only reversible processes need be considered, definitions (1) and (2) are equivalent, but that’s not much help to us in the real world.

Also note that when discussing energy, the corresponding ambiguity cannot arise. Energy can never be created or destroyed, so if there is no transfer across the boundary, there is no change.

As an example where the first definition (no flow) applies, but the second definition (occupation numbers preserved) does not, see reference 35. It speaks of an irreversible adiabatic process, which makes sense in context, but is clearly inconsistent with the second meaning. This is represented by point (1) in the figure.

As an example where the second definition applies but the first definition does not, consider the refrigeration technique known as adiabatic demagnetization. The demagnetization is carried out gently, so that the notion of corresponding states applies to it. If the system were isolated, this would cause the temperature of the spin system to decrease. The interesting thing is that people still call it adiabatic demagnetization even when the spin system is not isolated. Specifically, consider the subcase where there is a steady flow of heat inward across the boundary of the system, balanced by a steady demagnetization, so as to maintain constant temperature. Lots of entropy is flowing across the boundary, violating the first definition, but it is still called adiabatic demagnetization in accordance with the second definition. This subcase is represented by point (2) in the diagram.

As an example where the second definition applies, and we choose not to violate the first definition, consider the NMR technique known as “adiabatic fast passage”. The word “adiabatic” tells us the process is slow enough that there will be corresponding states and occupation numbers will be preserved. Evidently in this context the notion of no entropy flow across the boundary is not implied by the word “adiabatic”, so the word “fast” is adjoined, telling us that the process is sufficiently fast that not much entropy does cross the boundary. To repeat: adiabatic fast passage involves both ideas: it must be both “fast enough” and “slow enough”. This is represented by point (3) in the diagram.

My recommendation is to avoid using the term adiabatic whenever possible. Some constructive suggestions include:

14.1  Internal Energy

The notion of “internal energy” arises in fluid dynamics when we have two different reference frames to keep track of.

Suppose we have a smallish parcel of fluid with total mass M. Its center of mass is located at position R in the lab frame, and is moving with velocity V relative to the lab frame.

We can express the energy of the parcel as:

E = Ein + ½ M V^2 + Φ(R)              (73)

where Φ is some potential, perhaps a gravitational potential such that Φ(R) = − M g·R.

In this expression, Ein denotes the internal energy. It accounts for the fact that the particles within the parcel of fluid are moving relative to each other, and interacting via interparticle potentials that depend on their positions relative to each other. It also includes terms such as each particle’s binding energy, rotational energy, and vibrational energy ... terms that are independent of the other particles.

The term Ein is conventionally called “internal” energy, but it could equally well be called “intrinsic” or “inherent” energy. It is important because it is independent of the choice of reference frame (to an excellent approximation, assuming the potential Φ is not too horribly nonlinear). Also, see section 8.4.5 and reference 14 for more about the center-of-mass kinetic energy – namely the ½ M V^2 term in equation 73 – and how it relates to other forms of energy.

It must be emphasized that the law of conservation of energy applies to “the” energy E, not to the internal energy Ein.
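The frame independence of the internal energy can be seen in a small numerical sketch. It covers only the kinetic part (the interparticle potentials depend on relative positions, so they are unaffected by a change of frame) and it ignores Φ. The three-particle "parcel", the one-dimensional velocities, and the function names are all made up for illustration.

```python
def kinetic_energy(masses, velocities):
    """Total kinetic energy, sum of (1/2) m v^2, for 1-D velocities."""
    return sum(0.5 * m * v * v for m, v in zip(masses, velocities))

def split_kinetic(masses, velocities):
    """Return (internal KE, center-of-mass KE) for a set of particles."""
    total_mass = sum(masses)
    v_cm = sum(m * v for m, v in zip(masses, velocities)) / total_mass
    internal = kinetic_energy(masses, [v - v_cm for v in velocities])
    bulk = 0.5 * total_mass * v_cm ** 2
    return internal, bulk

masses = [1.0, 2.0, 3.0]                 # kg
v_lab = [4.0, -1.0, 2.0]                 # m/s, lab frame
v_moving = [v - 10.0 for v in v_lab]     # same parcel, seen from a frame moving at 10 m/s

int_lab, bulk_lab = split_kinetic(masses, v_lab)
int_mov, bulk_mov = split_kinetic(masses, v_moving)

print(f"lab frame:    internal = {int_lab:.4f} J, bulk = {bulk_lab:.4f} J")
print(f"moving frame: internal = {int_mov:.4f} J, bulk = {bulk_mov:.4f} J")
```

The internal part is the same in both frames, while the ½ M V^2 part changes. The sum of the two is the total kinetic energy in whichever frame you are using.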

15  Boundary versus Interior

We now discuss two related notions:

When we consider a conserved quantity such as energy, momentum, or charge, these two notions stand in a one-to-one relationship. In general, though, these two notions are not equivalent.

In particular, consider equation 43, which is restated here:

dE =   −P dV + T dS + advection              (74)

 

Although officially dE represents the change in energy in the interior of the region, we are free to interpret it as the flow of energy across the boundary. This works because E is a conserved quantity.

The advection term is explicitly a boundary-flow term.

It is extremely tempting to interpret the two remaining terms as boundary-flow terms also … but this is not correct!

Officially PdV describes a property of the interior of the region. Ditto for TdS. Neither of these can be converted to a boundary-flow notion, because neither of them represents a conserved quantity. In particular, PdV energy can turn into TdS energy entirely within the interior of the region, without any boundary being involved.
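A standard illustration (my choice of example, not taken from the discussion above) is the free expansion of an ideal gas into vacuum inside an insulated container. No energy and no entropy cross the boundary of the gas, and E is unchanged, yet when P dV and T dS are integrated along a path of equilibrium states joining the initial and final states, both are nonzero. The sketch assumes an ideal gas, for which constant E implies constant T.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def free_expansion_terms(n_particles, temperature, v_initial, v_final):
    """Integrated P dV and T dS for an ideal gas expanding at constant E (hence constant T).

    Along the path of states, P = N k T / V and dS = N k dV / V.
    """
    nkt = n_particles * K_BOLTZMANN * temperature
    p_dv = nkt * math.log(v_final / v_initial)
    delta_s = n_particles * K_BOLTZMANN * math.log(v_final / v_initial)
    t_ds = temperature * delta_s
    return p_dv, t_ds

# Roughly one mole of gas doubling its volume at 300 K (illustrative).
p_dv, t_ds = free_expansion_terms(6.022e23, 300.0, 1.0, 2.0)
delta_e = -p_dv + t_ds   # equation 74 with no advection

print(f"integral of P dV = {p_dv:.1f} J")
print(f"integral of T dS = {t_ds:.1f} J")
print(f"change in E      = {delta_e:.1e} J")
```

The two terms cancel in dE, with no flow across the boundary to account for either of them.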

Let’s be clear: boundary-flow ideas are elegant, powerful, and widely useful. Please don’t think I am saying anything bad about boundary-flow ideas. I am just saying that the PdV and TdS terms do not represent flows across a boundary.

Misinterpreting TdS as a boundary term is a ghastly mistake. It is more-or-less tantamount to assuming that heat is a conserved quantity unto itself. It would set science back over 200 years, back to the “caloric” theory.

Once these mistakes have been pointed out, they seem obvious, easy to spot, and easy to avoid. But beware: mistakes of this type are extremely prevalent in introductory-level thermodynamics books.

16  Heat

16.1  Definitions

The term “heat” is a confusing chimera. It is partly energy, partly entropy, partly temperature, and partly who-knows-what.

There are at least five sensible and widely-used but mutually-inconsistent technical meanings (not to mention innumerable nontechnical and metaphorical meanings). It is not worth arguing about the relative merits of these meanings, except to say that each has some merit. I observe that a typical thoughtful expert will use each of these meanings, depending on context. It would be nice to have a single, universally-accepted meaning, but I doubt that will happen anytime soon.

You may be wondering how it is possible that a concept as important as “heat” could lack a clear, unique meaning? Well, the answer is that “heat” just isn’t that important. When we want to quantify things, it is better to forget about “heat” and just quantify energy and entropy, which are unambiguous and unproblematic.

Sensible technical definitions of “heat” include:

  1. Sometimes “heat” simply means hotness, i.e. relatively high temperature. Example: if we’re having a heat wave, it means a spell of hot weather. The corresponding verb, heating, simply means making something hotter. This type of heat is an intensive scalar quantity, and can be measured in degrees.
  2. Sometimes the word “heat” is used to refer to the T dS term in equation 19. This type of heat is a vector quantity, not a scalar. In particular it is an ungrady one-form. The corresponding verb, heating, happens if and only if there is a change in the entropy of the region.
  3. Sometimes “heat” is defined as “energy that is transferred from one body to another as the result of a difference in temperature”. This implies a transfer of entropy across the boundary of the region. This definition is quite prevalent in encyclopedias, dictionaries, and textbooks. Some people learn this by rote, and rely on it as if it were the 11th commandment, and fail to appreciate its limitations. It works OK within a modest range of “textbook” situations, but it can be hard to quantify and can lead to nasty inconsistencies when applied to other situations, notably when dissipation is occurring, as discussed in section 10.5.5.
  4. Sometimes people use the terms “heat energy” or “thermal energy” (in contrast to “mechanical energy”) to express the idea of Locrian modes (in contrast to non-Locrian modes) as discussed in section 8.2. With care, this idea can be applied to uncramped situations. The idea is sound, but the terminology risks confusion with all the other definitions of “heat”. This type of heat is an extensive scalar, and can be measured in joules.
  5. Within the narrow limits of a cramped thermodynamic situation there is a useful, self-consistent concept of heat content, aka thermal energy, aka caloric, aka Q. An example of this is discussed in section 10.5.3. This is an extensive scalar, and can be measured in joules. Beware that this notion cannot be extended to uncramped thermodynamics. It cannot even be extended from one cramped situation to another, as you can see from the fact that ΔQ = C_V ΔT is different from ΔQ = C_P ΔT – yet each is called “heat” within its own cramped subspace (constant V or constant P respectively).
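To put numbers on the last point in definition #5, here is a sketch comparing the two "heat content" changes for the same temperature rise. It assumes a monatomic ideal gas, for which the molar heat capacities are (3/2)R at constant V and (5/2)R at constant P. The amount of gas and the temperature step are arbitrary.

```python
R_GAS = 8.314462618  # J/(mol K)

def heat_content_change(n_moles, delta_t, constant="V"):
    """Delta-Q = C * Delta-T for a monatomic ideal gas, at constant V or constant P."""
    molar_c = 1.5 * R_GAS if constant == "V" else 2.5 * R_GAS
    return n_moles * molar_c * delta_t

dq_v = heat_content_change(1.0, 10.0, constant="V")
dq_p = heat_content_change(1.0, 10.0, constant="P")

print(f"constant V: {dq_v:.1f} J")
print(f"constant P: {dq_p:.1f} J")
```

Same gas, same initial state, same ΔT, two different values of ΔQ. Each is self-consistent within its own cramped subspace, but there is no single Q that serves both.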

In addition, one sometimes encounters some less-than-sensible definitions, including:

As an example where definition #1 and definition #2 apply, but definition #3 does not, consider the notion that a microwave oven heats the food. Clearly (1) the food gets hotter. Clearly (2) the entropy of the food changes. But (3) no entropy was transferred across the boundary of the food. Energy was transferred, but the entropy was created from scratch, within the food. According to any reasonable definition of temperature, the magnetron (the wave-generating device inside the oven) isn’t very hot, so you can’t say the energy was transferred “as the result of a difference in temperature”.

The distinction between (2) and (3) is an instance of the boundary/interior issue, as discussed in section 15.

As an example where definition #2 and definition #3 apply, but definition #1 does not, consider a glass of ice water sitting on the table. We say that heat leaks into the system and melts the ice. The temperature does not change during the process.

As an example where definition #1 applies but definition #2 and definition #3 do not, consider the reversible thermally-insulated compression of a parcel of gas. We say the gas heats up, and there is an increase in the amount of thermal energy within the region. On the other hand, clearly no heat or entropy was transferred across the boundary, and there was no change in the entropy within the region.
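To put numbers on the insulated-compression example, here is a minimal sketch. It assumes a monatomic ideal gas (γ = 5/3); the amount of gas and the volumes are hypothetical values chosen for illustration. It shows the temperature rising (definition #1) while the entropy change is zero to within rounding (so definition #2 does not apply).

```python
import math

R = 8.314462618  # J/(mol K), molar gas constant

def adiabatic_compression(n, T1, V1, V2, gamma=5.0 / 3.0):
    """Reversible, thermally insulated compression of an ideal gas.

    Returns (T2, delta_S). gamma = 5/3 assumes a monatomic ideal gas.
    """
    Cv = R / (gamma - 1.0)                   # molar heat capacity at constant V
    T2 = T1 * (V1 / V2) ** (gamma - 1.0)     # T V^(gamma-1) stays constant
    # Ideal-gas entropy change between (T1, V1) and (T2, V2).
    delta_S = n * (Cv * math.log(T2 / T1) + R * math.log(V2 / V1))
    return T2, delta_S

# Hypothetical numbers: one mole, compressed to half its volume.
T2, dS = adiabatic_compression(n=1.0, T1=300.0, V1=2.0e-3, V2=1.0e-3)
print(f"temperature rises from 300.0 K to {T2:.1f} K")
print(f"entropy change = {dS:.2e} J/K")
```

The gas gets hotter even though nothing resembling T dS has occurred, which is the point of the example.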

We now discuss the advantages and disadvantages of definition #3:

Definition #3 is the most prevalent, perhaps in part because it is easily expressed in non-mathematical words. Many students have been forced to learn this definition by rote.   Rote learning is a poor substitute for understanding.

Definition #3 makes sense in some situations, such as a simple non-moving heat exchanger in a non-dissipative system.   Such situations are not representative of the general case.

Definition #3 focusses attention on flow across a boundary. This is good, because we believe all the laws of physics should be stated in local form, and flows across a boundary are crucial for this.   It focusses on temperature and heat. It would be better to focus on energy and entropy. Certainly energy and entropy can flow between systems that don’t even have a well-defined temperature (let alone a difference in temperature). Also remember that heat is not a conserved quantity, and it is hard to know what “flow” means when applied to non-conserved quantities. Whenever you talk about heat flow, you run the risk that non-experts will visualize heat as some sort of conserved fluid.

Heat is non-conserved twice over. First of all, even in reversible processes, heat is non-conserved because non-Locrian energy can be converted to Locrian energy and (within limits) vice versa. As mentioned in section 10.5.6 energy is conserved, but heat (by itself) is not conserved. Secondly, in irreversible processes heat is not conserved because entropy is not conserved.

16.2  Resolving or Avoiding the Ambiguities

The word “heat” occurs in a great number of familiar expressions. Usually these are harmless, especially when used in a loose, qualitative sense … but they can cause trouble if you try to quantify them, and some of them should be avoided entirely, because they are just begging to be misunderstood.

Terminology: Keep in mind that the symbol H conventionally stands for enthalpy; it does not stand for heat. Alas, many texts don’t distinguish between heat and enthalpy. That’s a problem because sometimes the enthalpy of reaction (δH) shows up as heat, and sometimes as something else (such as electrical energy).

As discussed in section 12, whenever you see the phrase “heat of reaction” you should cross it out and replace it with “enthalpy of reaction” or something similar. Also beware that Hess’s law is often taught in such a way that it seems to express conservation of heat, as discussed in connection with figure 9. That’s terrible! Heat is not conserved!

Talking about energy flow is incomparably better than talking about heat flow, because energy is a conserved quantity.

If you mean hotness, as in definition #1 above, it is better to speak of temperature rather than heat. This avoids an ambiguous use of the term “heat”.

When experts talk about the T dS vector (definition #2) they commonly call it literally T dS (pronounced literally “tee dee ess”). This is nicely unambiguous. The term “heat vector” is a slightly more elegant way of talking about the same idea. The point is that saying “heat vector” rather than merely “heat” makes it clear we are talking about T dS, thereby removing a great deal of ambiguity. Remember that this vector is a one-form (as opposed to a pointy vector), and lives in abstract thermodynamic state-space (unlike everyday position vectors). The RHS of figure 12 shows you how to visualize the T dS vector. For an introduction to one-forms and how they apply to thermodynamics, see reference 3.

In almost all cases where the “transfer across a boundary” idea is used (definition #3), the T dS vector idea (definition #2) would be a more precise and more reliable way of describing what is going on. This removes the inconsistencies associated with the “transfer across a boundary” idea. Also, whether or not energy is being transferred across a boundary, visualizing T dS as a vector resolves a goodly number of conceptual problems.

Here is a helpful analogy:

The problematic concept of phlogiston was replaced by two precise concepts (namely oxygen and energy).   The problematic concept of heat has been replaced by two precise concepts (namely energy and entropy).

As another analogy, consider the comparison between “heat” and “blue”, another common four-letter word.

Nobody in his right mind would try to quantify what “blue” means. Instead of quantifying the blueness, you should quantify something else, perhaps power versus wavelength.   Instead of quantifying heat, you should quantify the energy and entropy.

Actually “heat” is far more problematic than “blue”, because there’s something even worse than imprecision, namely holy wars between the big-endians and the little-endians, each of whom think they know “the one true meaning” of the term.

17  Work

17.1  Definitions

The definition of work suffers from one major problem plus several minor nuisances.

The major problem is that there are two perfectly good but inconsistent notions:

  1. Mechanical transfer of energy across a boundary. Here mechanical means non-thermal and non-advective.
  2. Force times distance.

These two notions are closely related but certainly not identical. This is an instance of the boundary/interior issue, as discussed in section 15. This is a recipe for maximal confusion. (Wildly different ideas are easily distinguished, and identical ideas need not be distinguished.)

Within the force-times-distance family, there are the following nuisance factors, which will be discussed below:

We start by considering the case where the energy is a nice differentiable function of state, and is known as a function of two variables V and S alone. Then we can write

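$$
\begin{aligned}
dE &= \left.\frac{\partial E}{\partial V}\right|_{S} dV
    + \left.\frac{\partial E}{\partial S}\right|_{V} dS \\
   &= -P\,dV + T\,dS
\end{aligned}
\tag{75}
$$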

which is just a repeat of equation 16 and equation 19. This gives us the differential formulation of work, as follows:

The first term on the RHS, namely −P dV, is commonly called the work done on the system. Positive work done on the system increases the energy of the system.   The negative thereof, namely P dV, is the work done by the system. Positive work done by the system decreases the energy of the system.

As an elaboration, consider the common case where V itself is known as a differentiable function of some other variables (say) A, B, and C.

  Example #1:   Suppose the system is the parallelepiped spanned by the vectors A, B, and C. Then the volume is V = A ∧ B ∧ C.
  Example #2:   Suppose the system is a spring as shown in figure 33. It has one end attached to point A and the other end attached to point B, where both A and B are points on a long one-dimensional track. Then V is just the length of the spring, V = B − A.

Figure 32: Parallelepiped

Figure 33: Spring

We can differentiate V to obtain

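$$
dV = \left.\frac{\partial V}{\partial A}\right|_{B,C} dA
   + \left.\frac{\partial V}{\partial B}\right|_{C,A} dB
   + \left.\frac{\partial V}{\partial C}\right|_{A,B} dC
\tag{76}
$$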

and plug that into equation 75 to obtain

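$$
dE = \left.\frac{\partial E}{\partial V}\right|_{S} \left.\frac{\partial V}{\partial A}\right|_{B,C} dA
   + \left.\frac{\partial E}{\partial V}\right|_{S} \left.\frac{\partial V}{\partial B}\right|_{C,A} dB
   + \left.\frac{\partial E}{\partial V}\right|_{S} \left.\frac{\partial V}{\partial C}\right|_{A,B} dC
   + \left.\frac{\partial E}{\partial S}\right|_{V} dS
\tag{77}
$$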

We can write this more compactly as:

$$
dE = -F_{A|B,C}\,dA - F_{B|C,A}\,dB - F_{C|A,B}\,dC + T\,dS
\tag{78}
$$

where we have defined the notion of force in a given direction according to:

$$
F_{A|B,C} := -\left.\frac{\partial E}{\partial A}\right|_{B,C}
\tag{79}
$$

and similarly for the other directions.

It is conventional but very risky to write $F_A$ (meaning force “in the A direction”) as shorthand for $F_{A|B,C}$. This is risky because the notion of “the A direction” is not well defined. It is OK to speak of the direction of constant B and C, but not the direction of changing A. Specifically, in example #2, when we evaluate ∂E / ∂A, we get very different results depending on whether we evaluate it at constant B or at constant V.

There is no reliable, general way to disambiguate this by assuming that B and C are the directions “perpendicular” to A. As an aside, note that in the two examples above, if A and B are interpreted as position-vectors in real space, they are definitely not perpendicular. More to the point, when A and B are interpreted as part of the abstract thermodynamic state-space, we cannot even define a notion of perpendicular.

In the present context, $F_A$ is unambiguous because $F_{A|B,C}$ is by far the strongest candidate for what it might mean. But in another context, the symbol $F_A$ might be highly ambiguous.
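The ambiguity in example #2 can be made concrete with a short numerical sketch. It assumes an ideal Hooke's-law spring with hypothetical stiffness `k` and rest length `L0` (neither appears in the text), and estimates ∂E/∂A by central differences, once holding B fixed and once holding V fixed.

```python
def energy_AB(A, B, k=100.0, L0=0.5):
    """Energy of an ideal spring with ends at A and B (hypothetical k, L0)."""
    V = B - A                      # the "volume" of the spring is its length
    return 0.5 * k * (V - L0) ** 2

def energy_AV(A, V, k=100.0, L0=0.5):
    """The same energy, written as a function of (A, V) instead of (A, B)."""
    return 0.5 * k * (V - L0) ** 2

def partial(f, x, y, h=1e-6):
    """Central-difference derivative of f with respect to x, holding y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

A, B = 0.0, 0.8
V = B - A
dE_dA_const_B = partial(energy_AB, A, B)   # move end A, hold end B fixed
dE_dA_const_V = partial(energy_AV, A, V)   # move end A, hold the length fixed
print("dE/dA at constant B:", dE_dA_const_B)
print("dE/dA at constant V:", dE_dA_const_V)
```

Holding B fixed, moving A changes the length of the spring, so the derivative is nonzero. Holding V fixed, moving A merely slides the whole spring along the track, so the derivative vanishes. Same symbol ∂E/∂A, very different answers.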

*   Integral versus Differential

We can convert to the integral formulation of work by integrating the differential representation along some path Γ. The work done by the system is:

$$
\mathrm{workby}[\Gamma] = \int_{\Gamma} P \cdot dV
\tag{80}
$$

 

Consider the contrast:

The differential formulation of work (PdV) is a vector, specifically a one-form. A one-form can be considered as a mapping from pointy vectors to scalars.   The integral formulation of work (workby[⋯]) is a functional. It is a mapping from paths to scalars.

In particular, if Γ is a path from point X to point Y, you should not imagine that the work is a function of X and/or Y; rather it is a functional of the entire path. If PdV were a grady one-form, you could express the work as a function of the endpoints alone, but it isn’t, so you can’t.
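As an illustration of work being a functional of the path, here is a minimal sketch that integrates P dV numerically along two different paths joining the same two points. It assumes one mole of ideal gas; the temperature and volumes are hypothetical, and the function name `work_by` is just a stand-in for workby[Γ].

```python
import math

R = 8.314462618  # J/(mol K), molar gas constant

def work_by(path):
    """Trapezoid-rule estimate of the integral of P dV along a path.

    `path` is a sequence of (P, V) points; the result depends on the whole
    sequence, not just on its first and last entries.
    """
    total = 0.0
    for (P0, V0), (P1, V1) in zip(path, path[1:]):
        total += 0.5 * (P0 + P1) * (V1 - V0)
    return total

n, T = 1.0, 300.0                  # hypothetical amount and temperature
V_x, V_y = 1.0e-3, 2.0e-3          # volumes at endpoints X and Y, m^3
P_x, P_y = n * R * T / V_x, n * R * T / V_y
steps = 1000

# Path 1: isothermal expansion from X to Y.
volumes = [V_x + (V_y - V_x) * i / steps for i in range(steps + 1)]
isothermal = [(n * R * T / V, V) for V in volumes]

# Path 2: expand at constant pressure P_x, then cool at constant volume V_y.
two_leg = [(P_x, V_x), (P_x, V_y), (P_y, V_y)]

w1 = work_by(isothermal)
w2 = work_by(two_leg)
print(f"isothermal path: {w1:.1f} J")
print(f"two-leg path:    {w2:.1f} J")
```

Both paths start at X and end at Y, yet the two results differ, so no function of the endpoints alone could reproduce them.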

*   Coarse Graining

For each length scale λ, we get a different notion of work; these include microscopic work, mesoscopic work, and holoscopic work (aka macroscopic work, aka pseudowork). These are all similar in spirit, but the differences are hugely important. To illustrate this point, consider a flywheel in a box:

More generally, there are innumerable gray areas, depending on the length scale λ.

In thermodynamics, it is usually – but not necessarily – appropriate to assume that “work” refers to either mesoscopic or holoscopic work.

*   Local versus Overall

Sometimes it is useful to consider the force and displacement acting locally on part of the boundary, and sometimes it is useful to consider the overall force and overall displacement.

To say the same thing in mathematical terms, let’s multiply both sides of equation 76 by P to obtain:

$$
P\,dV = F_{A|B,C}\,dA + F_{B|C,A}\,dB + F_{C|A,B}\,dC
\tag{81}
$$

 

In some contexts, it would make sense to speak of just one of the terms on the RHS as “the” work.

17.2  Energy Flow versus Work

Let’s consider systems that have some internal structure.

Our first example is shown in figure 34, namely a spring with a massive bob at one end. The other end is anchored. The mass of the spring itself is negligible compared to the mass of the bob. Dissipation is negligible. I am pushing on the bob, making it move at a steady speed v ≡ dA/dt. This requires adjusting the applied force F so that it always just balances the force of the spring.

Figure 34: Spring with Bob

When we ask how much “work” is involved, we have a bit of a dilemma.

On the one hand, it certainly feels to me like I am doing work on the spring+bob system. Energy is flowing across the boundary from me into the bob.

On the other hand, the overall work on the spring+bob system is zero. The force of my push on one end is exactly balanced by the force of constraint on the other end. Zero total force implies zero macroscopic work (aka pseudowork). Having zero macroscopic work is consistent with the work/KE theorem, since the KE of the system is not changing.

This dilemma does not go away if we break the system into sub-systems. The applied force on the bob is just balanced by the force of the spring, so there is no net force (hence no overall work) on the bob considered as a subsystem. The same goes for each small subsection of the spring: No net force, no acceleration, no work, and no change in KE.

The “local work” at the moving end is F · dx.

The “local work” at the fixed end is zero, since it is F · 0.

It is OK to think of energy pouring into the spring as a whole at the rate dE/dt = F · v. It is OK to think of energy as being like an abstract fluid flowing across the boundary.

It seems highly problematic to treat work as if it were a fluid flowing across the boundary. In particular, a naive attempt to apply the work/KE theorem is a disaster, because the energy inside the spring is virtually all potential energy; the KE inside the spring is negligible. The alleged work-fluid is flowing into the spring from the bob, and not flowing out anywhere, yet no work or KE is accumulating inside the spring.
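The bookkeeping for the spring+bob example can be sketched numerically. The stiffness, speed, and time step below are hypothetical, and the spring is assumed to obey Hooke's law. The loop accumulates the energy flowing across the boundary (F · v dt) and, separately, the pseudowork (total force times center-of-mass displacement).

```python
k = 50.0        # spring constant, N/m (hypothetical)
v = 0.02        # steady speed of the bob, m/s (hypothetical)
dt = 1.0e-3     # time step, s
steps = 5000

x = 0.0                 # displacement of the bob from the relaxed position
energy_in = 0.0         # integral of F . v dt (energy crossing the boundary)
pseudowork = 0.0        # integral of F_total . dx_cm

for _ in range(steps):
    x_mid = x + 0.5 * v * dt          # midpoint rule, exact for a linear force
    F_push = k * x_mid                # applied force just balances the spring
    F_anchor = -F_push                # force of constraint at the anchored end
    energy_in += F_push * v * dt
    # The spring's mass is negligible, so the center of mass moves with the bob.
    pseudowork += (F_push + F_anchor) * v * dt
    x += v * dt

stored = 0.5 * k * x ** 2             # potential energy now in the spring
print(f"energy that flowed in: {energy_in:.6f} J")
print(f"energy stored:         {stored:.6f} J")
print(f"pseudowork:            {pseudowork:.6f} J")
```

The energy that flowed across the boundary is all accounted for as potential energy in the spring, while the pseudowork stays at zero throughout. Energy behaves like a fluid here; work does not.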

As a second example, consider the oil bearing in section 10.5.4. Again we have a boundary/interior issue. Again we have a dilemma, due to conflicting definitions of work:

On the one hand, I am doing work in the sense of force (at a given point) times distance (moved by that point). I am doing work in the sense of pouring net energy across the boundary of the system.

On the other hand, there is no overall force, no overall work, no acceleration, and no change in KE.

Part of the lesson here is that you need to think carefully about the conditions for validity of the work/KE theorem. A non-exhaustive list is:

There are some interesting parallels between the oil bearing and the spring:

If you want a third parallel system, consider a force applied to a free body, such as the bob in figure 34 without the spring and without the anchor. Energy and momentum flow into the system and accumulate. The accumulated energy takes the form of non-Locrian kinetic energy.

From this we see that the work/KE theorem is intimately connected to the accumulation of momentum within the system, not the accumulation of energy per se.

A related thought is that momentum is conserved and energy is conserved, while work (by itself) is not conserved. KE (by itself) is not conserved.

17.3  Remarks

Keep in mind that “work” is ambiguous. If you decide to speak in terms of work, you need to spell out exactly what you mean.

Also keep in mind that dissipative processes commonly convert mesoscopic KE into microscopic KE as well as non-kinetic forms of energy. Energy is conserved; mesoscopic KE is not (by itself) conserved.

17.4  Hidden Energy

You can’t hide momentum; if an object has momentum its center-of-mass will be moving, and this will be easy to notice. In contrast, you can easily hide energy in an object’s internal degrees of freedom, perhaps in the form of spinning flywheels, taut springs, random microscopic energy, or other things having nothing to do with center-of-mass motion.

Here is an example of hidden energy: Consider a cart with two flywheels on board. Initially everything is at rest. Apply a pair of forces (equal and opposite) to the front flywheel, causing it to spin up, clockwise. Apply a similar pair of forces to the back flywheel, causing it to spin up, counterclockwise. The net force on the cart is zero. The motion of the cart’s center of mass is zero. The net force dot the overall motion is zero squared. The cart’s overall angular momentum is also zero. Yet the cart has gained kinetic energy: internal, mesoscopic kinetic energy.

Examples like this are a dime a dozen. In some sense what we are seeing here is the difference between holoscopic and mesoscopic kinetic energy. If you don’t recognize the difference, and recklessly talk about “the” kinetic energy, you’re going to have trouble.
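The two-flywheel cart is easy to tabulate. In this minimal sketch the moment of inertia and the spin rate are hypothetical; clockwise rotation is taken as negative.

```python
I_wheel = 0.5       # moment of inertia of each flywheel, kg m^2 (hypothetical)
omega = 40.0        # final spin rate of each flywheel, rad/s (hypothetical)

# Front wheel spins clockwise (negative), back wheel counterclockwise.
wheels = [(I_wheel, -omega), (I_wheel, +omega)]

net_force = 0.0     # each wheel is driven by an equal-and-opposite force pair
v_cm = 0.0          # so the cart's center of mass never moves

angular_momentum = sum(I * w for I, w in wheels)
internal_ke = sum(0.5 * I * w ** 2 for I, w in wheels)
holoscopic_ke = 0.0  # (1/2) M v_cm^2, which vanishes because v_cm = 0

print("net force:          ", net_force)
print("angular momentum:   ", angular_momentum)
print("holoscopic KE:      ", holoscopic_ke)
print("internal (hidden) KE:", internal_ke)
```

Every overall quantity (force, center-of-mass velocity, angular momentum, holoscopic KE) is zero, yet the mesoscopic kinetic energy is not.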

17.5  Pseudowork

Sometimes in thermodynamics it is appropriate to focus attention on the large-λ limit of equation 80. In that case we have:

$$
d\left(\frac{P^2}{2M}\right) = F_{\mathrm{tot}} \cdot dx_{\mathrm{cm}}
\tag{82}
$$

where $P = \sum p_i$ is the total momentum of the system, $M := \sum m_i$ is the total mass, $F_{\mathrm{tot}} := \sum F_i$ is the total force applied to the system, and $x_{\mathrm{cm}}$ is the distance travelled by the center of mass. See reference 14 for a derivation and discussion.

The RHS of equation 82 is called the pseudowork. The LHS represents the change in something we can call the pseudokinetic energy. This is just a synonym for the holoscopic kinetic energy.
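Equation 82 can be checked on a toy system. The following sketch is an illustration only, not the derivation in reference 14: it takes two hypothetical masses joined by an internal spring, applies a constant external force to one of them, and integrates the motion in one dimension with the velocity-Verlet scheme. All parameter values are made up.

```python
m1, m2 = 1.0, 3.0        # kg (hypothetical)
k, L0 = 200.0, 0.5       # internal spring: stiffness (N/m) and rest length (m)
F_ext = 4.0              # constant external force applied to m1, N
dt, steps = 1.0e-4, 20000

x1, x2 = 0.0, 0.5        # start at rest with the spring relaxed
v1, v2 = 0.0, 0.0
M = m1 + m2

def accelerations(x1, x2):
    f_spring = k * ((x2 - x1) - L0)      # tension pulls the masses together
    return (F_ext + f_spring) / m1, -f_spring / m2

def x_cm(x1, x2):
    return (m1 * x1 + m2 * x2) / M

pseudowork = 0.0
a1, a2 = accelerations(x1, x2)
for _ in range(steps):
    old_cm = x_cm(x1, x2)
    # Velocity-Verlet step.
    x1 += v1 * dt + 0.5 * a1 * dt ** 2
    x2 += v2 * dt + 0.5 * a2 * dt ** 2
    new_a1, new_a2 = accelerations(x1, x2)
    v1 += 0.5 * (a1 + new_a1) * dt
    v2 += 0.5 * (a2 + new_a2) * dt
    a1, a2 = new_a1, new_a2
    pseudowork += F_ext * (x_cm(x1, x2) - old_cm)   # F_tot . dx_cm

P = m1 * v1 + m2 * v2
pseudo_ke = P ** 2 / (2.0 * M)                       # holoscopic KE
total_ke = 0.5 * m1 * v1 ** 2 + 0.5 * m2 * v2 ** 2   # includes internal motion
print(f"pseudowork         = {pseudowork:.6f} J")
print(f"P^2 / (2M)         = {pseudo_ke:.6f} J")
print(f"total KE of masses = {total_ke:.6f} J")
```

The pseudowork matches the change in P²/(2M), regardless of what the internal spring is doing. The total KE of the two masses is a different quantity, and the pseudowork tells you nothing about the energy stored in the spring.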

There is an easy-to-prove theorem that says that for any length scale λ, an object’s total KE[λ] measured in the lab frame is equal to the KE[λ] of the relative motion of the components of the object (i.e. the KE[λ] measured in a frame comoving with the CM of the object) … plus the holoscopic KE associated with the motion of the CM relative to the lab frame (as given by equation 82).
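Here is a quick numerical check of that decomposition in the simplest case, namely a cloud of point particles, so that the length scale is microscopic. The masses and velocities are randomly generated, hypothetical data.

```python
import random

random.seed(0)
# Hypothetical cloud: random masses, random velocities plus a common drift.
masses = [random.uniform(0.5, 2.0) for _ in range(50)]
velocities = [[random.gauss(0.0, 1.0) + 3.0 for _ in range(3)] for _ in masses]

M = sum(masses)
v_cm = [sum(m * v[i] for m, v in zip(masses, velocities)) / M for i in range(3)]

def kinetic_energy(ms, vs):
    """Sum of (1/2) m v^2 over all particles."""
    return sum(0.5 * m * sum(c * c for c in v) for m, v in zip(ms, vs))

ke_lab = kinetic_energy(masses, velocities)
relative = [[v[i] - v_cm[i] for i in range(3)] for v in velocities]
ke_relative = kinetic_energy(masses, relative)       # in the CM frame
ke_holoscopic = 0.5 * M * sum(c * c for c in v_cm)   # motion of the CM itself

print(f"KE in lab frame:          {ke_lab:.6f}")
print(f"KE relative + holoscopic: {ke_relative + ke_holoscopic:.6f}")
```

The cross term vanishes because the mass-weighted relative velocities sum to zero in the CM frame, which is why the two lines agree.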

Mesoscopic work and holoscopic work (aka pseudowork) are consistent with the spirit of thermodynamics, because they don’t require knowing the microscopic forces and motions.

However, the pseudowork is not equal to the “thermodynamic” w that appears in the oft-abused equation 16. Here’s a counterexample: Suppose you apply a combination of forces to a system and its center of mass doesn’t move. Then there are at least three possibilities:

According to the meaning of w usually associated with equation 16, w is zero in the first case, nonzero in the second case, and who-knows-what in the third case. It is a common mistake to confuse w with work or pseudowork. Don’t do it.

18  Cramped versus Uncramped Thermodynamics

We have a choice. We can have at most one of the following two options, not both:

  1. Option #1: We might want to divide the energy of the system into two pieces, the thermal piece and the nonthermal piece:

    $$
    \begin{aligned}
    E &= E_{\text{nonthermal}} + E_{\text{thermal}} \\
    \text{or}\quad E &= W + Q
    \end{aligned}
    \tag{83}
    $$

    where

    W := nonthermal energy content
    Q := thermal energy content (aka heat content)
                 (84)

    and where Q is well-defined, well-behaved, and a function of state, just as E is a function of state.

  2. Option #2: We might want a theory of thermodynamics complete enough to describe heat engines, refrigerators, and suchlike.

There are innumerable ways of demonstrating that it’s physically impossible to choose both options at the same time. For starters, the argument in section 7.1 makes the point rather directly: if the system is capable of going around a cycle, it is capable of converting heat to work (or vice versa) while leaving all state variables unchanged.

As an even simpler argument that leads to the same conclusion, consider the elementary example of “heat content” that might arise in connection with a measurement of the heat capacity of a cylinder of compressed gas. We have a problem already, because there are two heat capacities: the heat capacity at constant pressure, and the heat capacity at constant volume. So it is unclear whether the heat content should be $C_P T$ or $C_V T$. Now we get to play whack-a-mole: You can remove the ambiguity by rigorously restricting attention to either constant volume or constant pressure … but that restriction makes it impossible to analyze a Carnot-type heat engine.
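To see the size of the ambiguity, here is a minimal sketch assuming one mole of monatomic ideal gas, for which the constant-volume heat capacity is (3/2)R and the two capacities differ by R. The 10 K temperature step is hypothetical.

```python
R = 8.314462618           # J/(mol K), molar gas constant
n = 1.0                   # mol (hypothetical)
Cv = 1.5 * R              # monatomic ideal gas (assumption)
Cp = Cv + R               # ideal-gas relation Cp - Cv = R
delta_T = 10.0            # K (hypothetical)

q_const_volume = n * Cv * delta_T      # "heat content" change at constant V
q_const_pressure = n * Cp * delta_T    # "heat content" change at constant P

print(f"constant volume:   {q_const_volume:.1f} J")
print(f"constant pressure: {q_const_pressure:.1f} J")
print(f"discrepancy:       {q_const_pressure - q_const_volume:.1f} J")
```

The same gas, warmed by the same 10 K, acquires two different amounts of “heat content” depending on which cramped subspace you confine it to. The discrepancy is n R ΔT, which for an ideal gas is the P dV term at constant pressure.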

To repeat: It is tempting to think that the gas cylinder has a thermal energy related to T and S, plus a nonthermal energy related to P and V, but if you try to build a theory of thermodynamics on this basis you are guaranteed to fail. The sooner you give up, the happier you will be.

Option #1 is a legitimate option. This is what we have been calling cramped thermodynamics. It is only a small subset of thermodynamics, but it’s not crazy. Almost everyone learns about cramped thermodynamics before they learn about uncramped thermodynamics. Consider for example warming the milk in a baby-bottle. This is almost always carried out under conditions of constant pressure. You’re not trying to build a steam engine (or any other kind of engine) out of the thing. In this case, for this narrow purpose, there is a valid notion of the “heat content” of the system.   Since this document is mostly about uncramped thermodynamics, I have chosen option #2: you will find almost no mention of “heat content” or “thermal energy” (except in warnings and counterexamples).

Within limits, the choice is yours: If you want to do cramped thermodynamics, you can do cramped thermodynamics. Just please don’t imagine your results apply to thermodynamics in general. Cramped thermodynamics by definition is restricted to situations where the state-space is so low-dimensional that there is no hope of building a heat engine or a refrigerator or anything like that. There are no Carnot cycles, nor indeed any other kind of nontrivial cycles.

Trying to divide the energy along the lines suggested by equation 83 is allowable within cramped thermodynamics, but is completely incompatible with thermodynamics. The Q that appears in this equation could be called “heat content” or “thermal energy” or caloric. Long ago, there was a fairly elaborate theory of caloric. The elaborate parts are long dead, having been superseded by thermodynamics during the 19th century. The idea of caloric (aka heat content, aka thermal energy) remains valid only within very narrow limits.

By way of contrast, note that the Locrian versus non-Locrian distinction is compatible with thermodynamics, as discussed in section 8.2.

It is a bad idea, incompatible with thermodynamics, to overlook the path-dependence of $Q_\Gamma$ and pretend that Q is a state function. (See section 6.7.)   It is a fine idea to distinguish Locrian from non-Locrian. (See section 8.2.)

To repeat, it is OK to talk about “heat content” in the context of warming up a baby bottle. It is OK to talk about “caloric” in connection with a swimming pool as it warms up in the spring and cools down in the fall. It is OK to talk about “thermal energy” in connection with the heat capacity of a chunk of copper in a high-school lab experiment.

However, just because it works in cramped situations doesn’t mean it works in uncramped situations.

It is not OK to talk about “heat content” or “thermal versus nonthermal energy” or “caloric” in the context of uncramped thermodynamics, i.e. in any situation where a thermodynamic cycle is possible. Energy is energy. Energy doesn’t recognize the distinction between thermal and nonthermal, and thermodynamics allows us to convert between the two (subject to important restrictions).

The problem is that the Q that appears in equation 83 simply cannot exist in the context of uncramped thermodynamics.

The problem still is that Q exists only within cramped thermodynamics, not more generally, not in any situation where a thermodynamic cycle is possible. You can visualize the problem by reference to figure 2.

On the LHS, if we restrict attention to the red subspace, the path ABC is the only path from A to C.   On the RHS, within the dark-green subspace there are many ways of getting from A to C, including ABC, ABXBC, ABXBXBC, and so forth.

Within the red subspace, you can represent Q as height, and this Q is well defined everywhere in this small, cramped subspace.   You cannot define a Q value as a function of position in a way that is consistent throughout the dark-green subspace. The peculiar thing is that you can take almost any small sub-subspace of the dark-green subspace and define a consistent Q function there, but you cannot extend this to cover the entire dark-green subspace. The problem is nowhere in particular, yet the problem is everywhere: you cannot assign a consistent height to points in this space.
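The impossibility of assigning a consistent “height” can be illustrated by summing T dS around a closed loop. The sketch below uses a Carnot-style rectangle in the (T, S) plane with hypothetical temperatures and entropies. It is a stand-in for the situation in figure 2, not a reproduction of that figure.

```python
def heat_along(path):
    """Trapezoid-rule integral of T dS along a path of (T, S) points."""
    return sum(0.5 * (T0 + T1) * (S1 - S0)
               for (T0, S0), (T1, S1) in zip(path, path[1:]))

T_hot, T_cold = 400.0, 300.0     # K (hypothetical)
S_low, S_high = 1.0, 3.0         # J/K (hypothetical)

# Uncramped: a rectangle in the (T, S) plane, ending where it started.
cycle = [(T_hot, S_low), (T_hot, S_high),    # isothermal leg at T_hot
         (T_cold, S_high),                   # adiabatic leg (S constant)
         (T_cold, S_low),                    # isothermal leg at T_cold
         (T_hot, S_low)]                     # adiabatic leg, back to the start

# Cramped: a one-dimensional subspace, so the only way back is to retrace.
out_and_back = [(T_hot, S_low), (T_hot, S_high), (T_hot, S_low)]

q_cycle = heat_along(cycle)
q_retrace = heat_along(out_and_back)
print("T dS summed around the cycle:   ", q_cycle, "J")
print("T dS summed out and back again: ", q_retrace, "J")
```

If Q were a function of state, both sums would have to vanish, since each path returns to its starting point. Only the retraced path in the cramped subspace obliges.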

Pedagogical remarks: Virtually everyone begins the study of thermodynamics by considering cramped situations. This is traditional … but it is a pedagogical disaster for anyone trying to learn uncramped thermodynamics. Cramped thermodynamics is not a good foundation for learning uncramped thermodynamics; it is aggressively deceptive.

Virtually every newcomer to thermodynamics tries to extend the “heat content” idea from cramped thermodynamics to uncramped thermodynamics. It always almost works … but it never really works.

The next time you feel the need for a measure of “heat content” in the context of uncramped thermodynamics, lie down until the feeling goes away.

19  Ambiguous Terminology

There is only one technical meaning of “energy”, but the technical meaning conflicts with the vernacular meaning, as discussed in section 1.6.

There is only one technical meaning of “conservation”, but the technical meaning conflicts with the vernacular meaning, as discussed in section 1.6.

There are multiple inconsistent technical meanings for “heat”, not to mention innumerable nontechnical meanings, as discussed in section 16.

There are multiple inconsistent technical meanings for “work” as discussed in section 17.

There are multiple inconsistent technical meanings for “adiabatic” as discussed in section 14.

In the literature, the term “state” is used inconsistently. It can either mean microstate or macrostate, as discussed in section 11.1.

Similarly, “phase space” is ambiguous:

Phase-space means one thing in classical canonical mechanics; it corresponds to what we have been calling state-space, as discussed in section 11.2.   Phase space means something else in classical thermodynamics; it has to do with macroscopic phases such as the liquid phase and the solid phase.

(Ironically, Gibbs has his name associated with both of these notions.)

I’m not even talking about quantum mechanical phase φ, as in exp(i φ); that’s a third notion, which is not terribly troublesome because you can usually figure out the meaning based on context.

Given how messed-up our language is, it’s a miracle anybody ever communicates anything.

20  Thermodynamics, Restricted or Not

There are various ways of restricting the applicability of thermodynamics, including

Indeed, there are some people who seem to think that thermodynamics applies only to microcanonical reversible processes in a fully-equilibrated ideal gas.

To make progress, we need to carefully distinguish two ideas:

  a) Simplifying assumptions made in the context of a particular scenario. Depending on details, these may be entirely appropriate. Sometimes the gases involved are ideal, to an excellent approximation … but not always. Sometimes a process is reversible, to an excellent approximation … but not always.
  b) Restrictions applied to the foundations of thermodynamics. We must be very careful with this. There must not be too many restrictions, nor too few. Some restrictions are necessary, while other restrictions are worse than useless.

Some thermodynamic concepts and/or formulas necessarily have restricted validity.

In contrast, very importantly, the law of conservation of energy applies without restriction. Similarly, the law of paraconservation of entropy applies without restriction. You must not think of E and/or S as being undefined in regions where “non-ideal” processes are occurring. Otherwise, it would be possible for some energy and/or entropy to flow into the “non-ideal” region, become undefined, and never come out again, thereby undermining the entire notion of conservation.

The ideas in the previous paragraph should not be overstated, because an approximate conservation law is not necessarily useless. For example, ordinary chemistry is based on the assumption that each of the chemical elements is separately conserved. But we know that’s only approximately true; if we wait long enough uranium will decay into thorium. Still, on the timescale of ordinary chemical reactions, we can say that uranium is conserved, to an excellent approximation.
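To get a feel for how good the approximation is, here is a small sketch. It assumes the uranium in question is the isotope uranium-238 and uses an approximate half-life of 4.468 billion years; the text itself does not specify an isotope.

```python
import math

HALF_LIFE_U238_YEARS = 4.468e9   # approximate half-life of uranium-238

def fraction_decayed(years, half_life=HALF_LIFE_U238_YEARS):
    """Fraction of the original nuclei that have decayed after `years`."""
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-math.log(2.0) * years / half_life)

for years in (1.0, 100.0, 4.468e9):
    print(f"{years:10.3g} years: fraction decayed = {fraction_decayed(years):.3e}")
```

Over any span relevant to a chemistry experiment the decayed fraction is far below what a balance could detect, which is the sense in which the conservation law holds “to an excellent approximation”.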

When a law has small exceptions, you shouldn’t give up on the law entirely. You shouldn’t think that just because a process is slightly non-ideal, it becomes a free-for-all, where all the important quantities are undefined and none of the laws apply.

If you want to make simplifying assumptions in the context of a specific scenario, go ahead … but don’t confuse that with restrictions on the fundamental laws.

Also, in an elementary course, it might be necessary, for pedagogical reasons, to use simplified versions of the fundamental laws … but you need to be careful with this, lest it create misconceptions.

Finally, it must be emphasized that one should not ask whether thermodynamics “is” or “is not” applicable to a particular situation, as if it were an all-or-nothing proposition. Some concepts (such as energy and entropy) are always valid, while other concepts (such as equilibrium and temperature) might or might not be valid, depending on the situation.

21  The Relevance of Entropy

The concept of entropy is important in the following areas, among others:

  1. cryptography and cryptanalysis (secret codes)
  2. communications (error-correcting codes, as part of electronic engineering)
  3. computer science, including data-compression codes, machine learning, speech recognition, etc.
  4. librarianship
  5. the design of experiments (reference 9)
  6. the physics of computation
  7. the design of refrigerators, heat pumps, and engines (including piston, turbine, and rocket engines)
  8. nuclear engineering (reactors and weapons)
  9. fluid dynamics
  10. astrophysics and cosmology
  11. chemistry and chemical engineering

Very roughly speaking, the items higher on the list can be assigned to the “information theory” camp, while the items lower on the list can be assigned to the “thermodynamics” camp. However, there is tremendous overlap between the two camps. The approach of understanding the microscopic quantum states and using that to explain macroscopic observables such as energy, entropy, temperature, etc. is called statistical mechanics; see e.g. reference 22 and reference 36. Examples of this include

a)  The physics of computation is squarely in both camps; see reference 30, reference 31, and reference 37.
b)  Things like Maxwell demons and Szilard engines are squarely in both camps; see reference 38 and <