Why the Sky is Blue

1 Why “Is” The Sky Blue?

Before we ask why the sky is blue, we should ask whether the sky is blue. There are many times and places where the sky is, in fact, not blue. There is some nifty physics that explains the blue part of the story, which is our topic for today, but we should keep in mind that it is not the whole story.

We shall consider the restricted case of standing on the earth’s surface in the daytime under a clear sky, without clouds, dust, or pollution. This is sometimes a decent approximation in real life. As we shall see, under such conditions we expect the sky to be blue.

A crucial sub-goal will be to understand why the scattered intensity depends on the inverse fourth power of the wavelength.

There are lots of pseudo-explanations out there that focus attention on a piece of sky “one wavelength on a side”. However, I believe any alleged explanation of that sort is wrong physics, and misses the right physics, as will become clear below.

2 Executive Summary

Here is an “executive summary” that outlines where the discussion is going. For details see section ‍3.

- Divide the interaction region into P zones and N zones. The contributions from the P zones are roughly 180 degrees out of phase with the contributions from the N zones, so if the zones were balanced the amplitude of the scattered wave would be zero.
- Distribute the air molecules into these zones at random, like sand on a checkerboard.
- Due to statistical fluctuations, there will be some imbalance between P and N, and therefore some scattering. The chance of an imbalance is independent of wavelength.
- The significance of an imbalance does, however, depend on wavelength. A given imbalance is a stronger source when the wavelength is small. The scattering amplitude is proportional to k², and the intensity is proportional to k⁴.

3 Detailed Explanation

Consider the following diagram of the interaction:

        .
       .    .
      .    .    .
     .    .    .    .
    .    .    .    .    .
        .    .    .    .    .
            .    .    .    .    .               transmitted -->
    |   |   |   /   |/  | / |  /|   /   |   |   |   |   |   |   |
    |   |   |   |   /   |/  | / |  /|   /   |   |   |   |   |   |
    |   |   |   |   |   /   |/  | / |  /|   /   |   |   |   |   |
    |   |   |   |   |   |   /   |/  | / |  /|   /   |   |   |   |
    |   |   |   |   |   |   |   /   |/  | / |  /|   /   |   |   |
      incident -->                  /    /    /    /    /
                                        /    /    /    /    /
                                            /    /    /    /    /
                                                /    /    /    /    /
                                                    /    /    /    /    /
                                         scattered      /    /    /    /
                                                  -->       /    /    /
                                                                /    /
                                                                    /

Figure ‍1: The Interaction Region

The lines represent the crests of the waves.

The dots don’t have much physical significance. They are just the backwards extrapolation of the scattered beam. Loosely speaking, they represent the direction the scattered beam “appears” to be coming from. Ignore the dots if you like.

Here is the diagram again, with labels on some points in the interaction region:

        .
       .    .
      .    .    .
     .    .    .    .
    .    .    .    .    .
        .    .    .    .    .
            .    .    .    .    .               transmitted -->
    |   |   |   P   |/  | N |  /|   P   |   |   |   |   |   |   |
    |   |   |   |   P   |/  | N |  /|   P   |   |   |   |   |   |
    |   |   |   |   |   P   |/  | N |  /|   P   |   |   |   |   |
    |   |   |   |   |   |   P   |/  | N |  /|   P   |   |   |   |
    |   |   |   |   |   |   |   P   |/  | N |  /|   P   |   |   |
      incident -->                  /    /    /    /    /
                                        /    /    /    /    /
                                            /    /    /    /    /
                                                /    /    /    /    /
                                                    /    /    /    /    /
                                         scattered      /    /    /    /
                                                  -->       /    /    /
                                                                /    /
                                                                    /

Figure ‍2: The Interaction Region, Labeled

Let’s use this to identify a pattern in the index-deviations that will result in strong scattering.
- At each point P, a crest lines up with a crest. A positive deviation in the index at this point will make a positive contribution to the overall interaction.
- At each point N, a crest lines up with a trough. A negative deviation in the index at this point will make a positive contribution to the overall interaction.
In fact, we can classify every point in the interaction region (not just the special points mentioned in the previous item) according to whether a positive or negative index-deviation (at the given point) results in a positive contribution to the desired interaction. This defines a notion of “P zones” and “N zones”.
We have made some mildly arbitrary choices about the relative phases. This results in no loss of generality.

We assume that the interaction region is transparent to zeroth order. This is consistent with (and stronger than) previous assumptions.
Note that the P zones collectively cover half the interaction region, while the N zones cover the other half.
This assumes the interaction region is reasonably large relative to lambda. This is consistent with previous assumptions.

This assumes the scattered beam does not coincide with the transmitted beam. In the other case (i.e. forward scattering) this half-and-half property does not hold, which is a good thing; the distinction allows us to uphold the optical theorem, conservation of energy, and other good things.
We could, if we wanted, use what we have just learned to construct a fancy diffraction grating (actually a hologram) that would create the strongest possible scattered beam. We just put many scatterers in the P regions and few (if any) scatterers in the N regions.
Note that this part of the argument is independent of wavelength over the relevant range of wavelengths. In the atmosphere, the interaction region is vastly larger than the wavelength cubed. That means there are a vast number of P regions, covering half of the interaction region. If the wavelength is smaller, each P zone will be smaller, but there will be more of them, and collectively they will still cover 50% of the interaction region.
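The half-and-half claim is easy to check numerically. Here is a minimal sketch, under simplifying assumptions of my own: the region is reduced to one dimension, measured along the direction in which the relative phase between the incident and scattered waves advances, and a point counts as "P" when cos(q·x) is positive. The function name and the numbers are illustrative and are not taken from any library.

```python
import math

def p_zone_fraction(fringe_spacing, region_length, samples=200_000):
    """Fraction of a 1-D region where cos(q*x) > 0 (the "P" side).

    fringe_spacing is the distance over which the relative phase of the
    incident and scattered waves advances by one full cycle. At a fixed
    scattering angle it is proportional to the wavelength.
    """
    q = 2.0 * math.pi / fringe_spacing
    positive = 0
    for i in range(samples):
        x = (i + 0.5) * region_length / samples  # midpoint sampling
        if math.cos(q * x) > 0.0:
            positive += 1
    return positive / samples

# Illustrative numbers only: a region hundreds of fringes long,
# probed with two fringe spacings that differ by a factor of three.
for spacing in (0.00411, 0.00137):
    print(spacing, round(p_zone_fraction(spacing, 1.0), 4))
```

Both spacings should give a fraction very close to one half, as long as the region is many fringes long. If you shrink the region until it holds only a fraction of a fringe, the half-and-half property fails, which is why the argument requires a large interaction region.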
Now the statistical question reduces to this: what is the chance that the air will fluctuate into a configuration that has an extra-large number of molecules in the P zones, and an extra-small number of molecules in the N zones? Essentially we are talking about thermally-excited sound modes.
The amplitude of these excitations should be independent of frequency, since the compressibility of air doesn’t depend on wavelength.
We are approximating the air as an ideal gas. This should be a very good approximation.
Compressibility being independent of wavelength is another line of argument supporting the previously-mentioned point that we should not focus attention on a single region that has volume on the order of wavelength cubed.
If you don’t want to think of it in terms of compressibility, you can instead use elementary notions of statistical mechanics. That is, we model the distribution of air molecules as a random statistical process. Distributing the air molecules into P and N regions is like scattering sand particles at random onto a checkerboard. It doesn’t matter how large the black and white cells on the board are. It doesn’t even matter whether they are square. As long as half of the board is black and half of the board is white, and as long as the sand-grains are randomly and independently distributed, about half of them will land on black squares and about half will land on white squares.
Applying this idea to the air: We have every reason to believe that half of the interaction region is P-type, and half is N-type. We have every reason to believe that air is an ideal gas, to a good enough approximation, indeed more than good enough. Therefore it does not matter how big the P-regions and N-regions are.
This point is worth emphasizing because there are widespread misconceptions that somehow the light is being scattered by “small” regions where the density differs from the average.
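The checkerboard picture can be tried out directly. The following is a rough sketch, not a model of real air: it drops grains uniformly and independently onto a unit square, and measures the root-mean-square imbalance between black and white for boards with very different cell sizes. All names and parameter values are my own choices for illustration. The number of cells per side must be even, so that black and white each cover exactly half the board.

```python
import random

def imbalance(num_grains, cells_per_side, rng):
    """Drop grains uniformly on a unit checkerboard; return black minus white."""
    diff = 0
    for _ in range(num_grains):
        ix = int(rng.random() * cells_per_side)  # column index of the cell hit
        iy = int(rng.random() * cells_per_side)  # row index of the cell hit
        diff += 1 if (ix + iy) % 2 == 0 else -1
    return diff

def rms_imbalance(num_grains, cells_per_side, trials=300, seed=0):
    """Root-mean-square imbalance over many independent trials."""
    rng = random.Random(seed)
    total = sum(imbalance(num_grains, cells_per_side, rng) ** 2
                for _ in range(trials))
    return (total / trials) ** 0.5

# Coarse board (2 x 2 cells) versus fine board (50 x 50 cells).
for cells in (2, 50):
    print(cells, rms_imbalance(1000, cells))
```

For independent grains the expected RMS imbalance is the square root of the number of grains, whatever the cell size. The two boards should therefore give comparable numbers, differing only by sampling noise.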
The fluctuation in index of refraction is proportional to the fluctuation in the number of atoms, according to the Clausius-Mossotti relation.
This assumes the index is not too different from unity. This is a good assumption for gases under ordinary conditions. If you care about liquids, you could easily relax this assumption and redo the following derivation, thereby obtaining a more general result.
We can use a variant of the Born approximation to understand how a fluctuation in refractive index produces scattering. In a uniform medium we can write the wave equation as:

v² (∂ / ∂ x)² φ − (∂ / ∂ t)² φ = 0 ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍(1)

where v is the speed of light in the medium,¹ x is position, and t is time. We call equation ‍1 the “unperturbed” wave equation, for reasons that will be obvious in a moment.
There is a one-to-one relationship between the index of refraction and the speed of light in the medium. To first order, the relationship is:

v = v₀ / (1 + d) ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍(2)

where (1 + d) is the index of refraction expressed relative to its average value (so the 1 stands for the average index, not the index of the vacuum), d is the fractional deviation from the average index, and v₀ is the propagation speed associated with the average index. We assume d is small compared to unity.
So in a slightly non-uniform medium we have, to first order in d:

v₀² (1−2d) (∂ / ∂ x)² φ − (∂ / ∂ t)² φ = 0             (3)

and by re-arrangement:

v₀² (∂ / ∂ x)² φ − (∂ / ∂ t)² φ = 2d v₀² (∂ / ∂ x)² φ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍(4)

So, the term involving the index-deviation d can be moved to the RHS. This term can be considered a driving force, i.e. a source term added to the RHS of the unperturbed wave equation (equation ‍1).
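The step from equation 2 to equation 3 replaces the exact factor 1/(1 + d)² by its first-order expansion (1 − 2d). Here is a small numerical check of how good that is, as a sketch with illustrative values of d; the function names are mine.

```python
def exact_factor(d):
    """Exact ratio v^2 / v0^2 implied by v = v0 / (1 + d)."""
    return 1.0 / (1.0 + d) ** 2

def first_order_factor(d):
    """First-order expansion of the same ratio, as used in equation 3."""
    return 1.0 - 2.0 * d

for d in (1e-2, 1e-3, 1e-4):
    err = abs(exact_factor(d) - first_order_factor(d))
    # The neglected terms start at 3*d**2, so err / d**2 should be near 3.
    print(d, err, err / d ** 2)
```

The error shrinks like d², which is what "to first order in d" means. For a gas, where d is tiny, the neglected terms are negligible.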
We are assuming that the scattering is not too strong, and the interaction region is not too large, so that the light gets scattered at most once. We account for light scattering from the incident beam into the scattered beam, while ignoring any possible secondary scattering (out of the scattered beam). This is called the first Born approximation.
For simplicity, we are ignoring polarization. It would introduce some factors that depend on theta (the scattering angle). You can add them in if you like.
For clarity, I left out the y and z variables in equation ‍4. But you get the idea.
But wait, 2d is not the only factor on the RHS of equation ‍4. The source term is not 2d times the wavefunction, it is more like 2d times the second derivative of the wavefunction. For periodic waves, the scattered amplitude picks up a factor of k² (or ω²), and the scattered power picks up a factor of k⁴ (or ω⁴).
This factor of k⁴ is not “on top of” the factor of k⁴ you would get in the formula for scattering from a single molecule. It is the same factor, derived in a slightly more general context. Specifically, you can view an isolated molecule as a localized deviation in the index of refraction of the vacuum.
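To get a feel for the size of the k⁴ factor, recall that for a wave of the form exp(i(kx − ωt)) each spatial derivative brings down a factor of k, so the source term scales like k² and the scattered power like k⁴ = (2π/λ)⁴. The sketch below compares two wavelengths. The particular values, 450 nm for blue and 650 nm for red, are round numbers I picked for illustration. Everything else (index fluctuations, geometry) is assumed equal at the two wavelengths.

```python
def relative_scattered_intensity(wavelength_a, wavelength_b):
    """Scattered intensity at wavelength_a divided by that at wavelength_b,
    assuming intensity proportional to k**4, i.e. to wavelength**(-4)."""
    return (wavelength_b / wavelength_a) ** 4

blue_nm = 450.0  # illustrative value, not from the text
red_nm = 650.0   # illustrative value, not from the text
print(relative_scattered_intensity(blue_nm, red_nm))
```

With these numbers the blue light is scattered a bit more than four times as strongly as the red.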

4 Remarks

There is only a rather narrow set of conditions that can give a planet a blue sky. An atmosphere with too little depth and/or too little density would have very little scattering of any kind, so the sky would be black with only a slight tinge of deep blue. At the other extreme, an atmosphere with too much depth and/or too much density would have a lot of second-order and higher-order scattering, violating some of the assumptions made in section ‍3 and leading to a murky white sky.
As Einstein pointed out in reference ‍1, the color of the sky can be used to pin down the size of air molecules, within a rather narrow range. The argument goes like this: Macroscopic benchtop measurements tell us the overall refractive index of a parcel of air, but they don’t tell us whether that is due to a smallish number of highly refractive molecules, or a larger number of less refractive molecules. However, the fluctuations in the refractive index do depend on how many molecules there are. If there are N molecules in the parcel, the fluctuations in the number will scale like √(N). Meanwhile, the refraction per molecule must scale like 1/N, to be consistent with the macroscopic observations, so the fluctuations in the index scale like √(1/N). That means that if atoms were 100 times smaller than they really are, there would be 10 times less scattering in the atmosphere, and the sky would be almost black in the daytime.
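The scaling in this argument is simple enough to spell out as arithmetic. In the sketch below, the bulk refractivity (index minus one) is held fixed, as the benchtop measurement requires, while the assumed number of molecules in the parcel is varied. The value 3e-4 is a rough round number for air under ordinary conditions, used only for illustration, and the function name is mine.

```python
import math

def index_fluctuation(num_molecules, bulk_refractivity=3e-4):
    """Typical fluctuation of the refractivity of a parcel of gas.

    bulk_refractivity is fixed by macroscopic measurement, so the
    contribution per molecule must scale like 1/N, while the random
    fluctuation in the number of molecules scales like sqrt(N).
    """
    per_molecule = bulk_refractivity / num_molecules
    count_fluctuation = math.sqrt(num_molecules)
    return per_molecule * count_fluctuation

few = index_fluctuation(1e6)   # fewer, more refractive molecules
many = index_fluctuation(1e8)  # 100 times as many, each 100 times weaker
print(few, many, many / few)
```

Multiplying the number of molecules by 100 while keeping the bulk index fixed reduces the index fluctuation by a factor of 10, in line with the √(1/N) scaling above.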
Here is an analogy that might help: Consider an altocumulus standing lenticular (ACSL) cloud, such as the one shown in figure ‍3. The cloud remains stationary over the mountain as the air flows past the mountain and through the cloud. The air is nearly transparent before entering the cloud, cloudy while within the cloud, and nearly transparent again afterwards.

Figure 3: Standing Lenticular Cloud

Furthermore, suppose a similar cloud produced a quarter inch of rain. If you look at a sheet of water, a quarter inch thick, it is almost perfectly transparent.
Each water molecule is very small, so it is not, by itself, a very strong scatterer. On the other hand, they are very numerous, and if they scatter coherently, the effect is quite large.
We get strong scattering from the cloud because there are tiny droplets, a fraction of a micron across. All N molecules in a given droplet scatter more-or-less coherently, in phase with each other, so the scattered intensity is proportional to N squared rather than simply N. Each droplet, however, is out of phase with other droplets.
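The difference between N and N squared can be seen by adding up phasors. In this sketch each scatterer contributes a unit-amplitude complex number. Scatterers inside one droplet are taken to be exactly in phase, and separate droplets are modeled as having independent, uniformly random phases. Both are idealizations of mine, made to keep the example short.

```python
import cmath
import math
import random

def coherent_intensity(n):
    """Intensity when n unit scatterers are all in phase."""
    amplitude = sum(cmath.exp(0j) for _ in range(n))
    return abs(amplitude) ** 2

def incoherent_intensity(n, trials=500, seed=1):
    """Average intensity when n unit scatterers have random phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        amplitude = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                        for _ in range(n))
        total += abs(amplitude) ** 2
    return total / trials

n = 100
print(coherent_intensity(n))    # grows like n squared
print(incoherent_intensity(n))  # grows like n, up to sampling noise
```

The in-phase sum gives an intensity of n squared, while the random-phase sum averages to roughly n. That is the sense in which gathering molecules into droplets makes the cloud scatter so much more strongly than the same water spread out as vapor.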
In the sheet of liquid water, the density is the same everywhere, so in accordance with the Huygens construction, you get very strong forward scattering. In other words, you get an index of refraction but no sideways scattering.
The moral of the story is you can’t think in terms of scattering by individual molecules. The intensity of the scattering you see from the blue sky, or from the white cloud, is determined by the variations in the density.

5 References

1. Albert Einstein, “Theorie der Opaleszenz von homogenen Flüssigkeiten und Flüssigkeitsgemischen in der Nähe des kritischen Zustandes”, Annalen der Physik, 33, 1275–1298 (1910). http://www.physik.uni-augsburg.de/annalen/history/einstein-papers/1910_33_1275-1298.pdf
   English translation: https://einsteinpapers.press.princeton.edu/vol3-trans/245

1: We reserve the symbol c to denote the speed of light in a vacuum.
