Let’s consider waves on a string. The string has a uniform mass per unit length ρ and a uniform tension τ. For simplicity, we assume zero stiffness, so the only restoring force comes from the tension. Similarly we assume the amplitude is small compared to the shortest wavelength component of interest, so that we can make the small-angle approximation. We denote the ordinate of the wavefunction as Y, and we consider it to be a function of position x and time t. The equation of motion is then
$$ \frac{\partial^2 Y}{\partial t^2} = c^2 \, \frac{\partial^2 Y}{\partial x^2} \tag{1} $$
where the speed of propagation is
$$ c = \sqrt{\frac{\tau}{\rho}} \tag{2} $$
We can easily calculate the energy in the wave. At any given moment, in any given region, the kinetic energy is given in terms of mass and velocity squared, in the usual way:
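$$ \mathrm{KE} = \int \tfrac{1}{2} \, \rho \left( \frac{\partial Y}{\partial t} \right)^2 dx \tag{3} $$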
where the integral runs over the region of interest.
Meanwhile the potential energy is given by force times distance, in the usual way. The force is proportional to the tension, and to the local curvature. Since the system is linear, the average force is equal to half of the peak force.
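$$ \mathrm{PE} = - \int \tfrac{1}{2} \, \tau \, Y \, \frac{\partial^2 Y}{\partial x^2} \, dx \tag{4} $$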
The minus sign in equation 4 can be understood as follows: Imagine pulling up on the string. The PE is positive, Y is positive, and the curvature is downward, i.e. negative.
As a check, you can subtract equation 4 from equation 3 to form the Lagrangian, and then use the variational principle to derive the equation of motion (equation 1) in the usual way. This will require integrating the RHS of equation 3 by parts, to convert the squared first derivative into a second derivative.
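A sketch of that check, under stated assumptions: the kinetic energy is taken as $\int \tfrac12 \rho (\partial Y/\partial t)^2 dx$, the potential energy as $-\int \tfrac12 \tau \, Y (\partial^2 Y/\partial x^2) dx$, and the variation δY is assumed to vanish at the endpoints in both x and t, so that all boundary terms from the integrations by parts drop out. The symbol S for the action is introduced here for illustration only.

```latex
\begin{aligned}
S[Y] &= \int\!\!\int \left[ \tfrac{1}{2}\rho \left( \frac{\partial Y}{\partial t} \right)^2
        + \tfrac{1}{2}\tau \, Y \frac{\partial^2 Y}{\partial x^2} \right] dx \, dt \\
\delta S &= \int\!\!\int \left[ \rho \, \frac{\partial Y}{\partial t} \frac{\partial \, \delta Y}{\partial t}
        + \tfrac{1}{2}\tau \left( \delta Y \, \frac{\partial^2 Y}{\partial x^2}
        + Y \frac{\partial^2 \delta Y}{\partial x^2} \right) \right] dx \, dt \\
 &= \int\!\!\int \left[ -\rho \, \frac{\partial^2 Y}{\partial t^2}
        + \tau \, \frac{\partial^2 Y}{\partial x^2} \right] \delta Y \, dx \, dt \\
\delta S = 0 \;&\Rightarrow\; \frac{\partial^2 Y}{\partial t^2}
        = \frac{\tau}{\rho} \, \frac{\partial^2 Y}{\partial x^2}
\end{aligned}
```

The last line is the wave equation with c² = τ/ρ. The kinetic term needs one integration by parts in t. The potential term needs two in x, which is what turns the factor of one half into a factor of one.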
The string exhibits an interesting gauge invariance: Adding a constant to the ordinate of the wavefunction does not change the energy; it just moves the whole string to a new location. The physics is translation-invariant.
This is consistent with the factor of k^{2} that we will see in equation 6c: in the limit as k goes to zero, the energy goes to zero, even for nonzero wave amplitude.
Let us consider the special case of a monochromatic wave with amplitude A:
$$ Y = A \cos(k x - \omega t) \tag{5} $$
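Then there is a simple expression for the energy:
$$
\begin{aligned}
E &= \mathrm{KE} + \mathrm{PE} && \text{(6a)} \\
  &= \int \left[ \tfrac{1}{2} \rho \, \omega^2 A^2 \sin^2(k x - \omega t) + \tfrac{1}{2} \tau \, k^2 A^2 \cos^2(k x - \omega t) \right] dx && \text{(6b)} \\
  &= \tfrac{1}{2} \tau \, k^2 A^2 \int dx && \text{(6c)} \\
  &= \tfrac{1}{2} \rho \, \omega^2 A^2 \int dx && \text{(6d)}
\end{aligned}
$$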
where we have used the dispersion relation:
$$ \omega^2 = c^2 k^2 \tag{7} $$
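As a numerical sanity check, here is a minimal sketch that integrates the kinetic and potential energy densities over one wavelength of a monochromatic wave. It assumes the wave has the form Y = A cos(kx − ωt), that the potential energy density is −½ τ Y ∂²Y/∂x², and that ω = ck with c² = τ/ρ. The function name and the parameter values are made up for illustration.

```python
import math

def energy_per_length(A, k, rho, tau, t=0.0, n=2000):
    """Energy per unit length of Y = A*cos(k*x - omega*t) on an ideal string.

    Integrates KE and PE densities over one wavelength with the midpoint
    rule, then divides by the wavelength.
    """
    c = math.sqrt(tau / rho)          # speed of propagation
    omega = c * k                     # dispersion relation for the ideal string
    wavelength = 2.0 * math.pi / k
    dx = wavelength / n
    ke = 0.0
    pe = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        phase = k * x - omega * t
        y = A * math.cos(phase)
        y_t = A * omega * math.sin(phase)      # dY/dt
        y_xx = -A * k * k * math.cos(phase)    # d^2 Y / dx^2
        ke += 0.5 * rho * y_t * y_t * dx
        pe += -0.5 * tau * y * y_xx * dx       # note the minus sign
    return (ke + pe) / wavelength

# Hypothetical values, chosen only for illustration.
A, k, rho, tau = 0.01, 2.0, 0.5, 8.0
omega = k * math.sqrt(tau / rho)
print(energy_per_length(A, k, rho, tau))   # numerical integral
print(0.5 * rho * omega**2 * A**2)         # closed form with omega^2
print(0.5 * tau * k**2 * A**2)             # closed form with k^2
```

The three printed numbers should agree to within rounding. Doubling A should quadruple the result, and halving k (at fixed A) should cut it by a factor of four.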
For linear waves, it would be correct to say that for any given shape of wave, the energy scales like the square of the amplitude of the wave. That is, we can write the energy as a functional of Y:
$$ E[\alpha Y] = \alpha^2 \, E[Y] \tag{8} $$
for any uniform scalar scale factor α.
For nonlinear waves, all bets are off. Sound waves become nonlinear if the amplitude gets large enough.
As a separate matter, if the medium is dispersive, all bets are off. The electromagnetic field in a waveguide has a nontrivial dispersion relation. Waves on the surface of water have a nontrivial dispersion relation. It is possible to handle this case, but it’s a lot of work, and probably beyond the scope of the introductory course.
Even in a situation where plane waves are nondispersive, the wave equation becomes effectively dispersive if you write it in polar coordinates. This is relevant to the physics of sound coming from a point source.
We must take care to distinguish the electromagnetic field F from the electromagnetic potential A. We define the electromagnetic 4-vector potential A such that the electromagnetic field bivector is
$$ F = \nabla \wedge A \tag{9} $$
This works in four dimensions but not three, as discussed in reference 1. In this document, all operators, vectors, and bivectors reside in four-dimensional spacetime. This includes A, F, and ∇.
The potential exhibits gauge invariance. We now choose a gauge within the Lorenz gauge family. That is,
$$ \nabla \cdot A = 0 \tag{10} $$
That means we can rewrite equation 9 as:
$$ F = \nabla A \tag{11} $$
We now bring in the Maxwell equation for a region of space devoid of charge and current, and an obvious corollary thereof:
 (12) 
Using equation 11 to substitute for F, we obtain:
$$ \nabla (\nabla A) = 0 \tag{13a} $$
$$ \nabla^2 A = 0 \tag{13b} $$
We recognize equation 13b as the wave equation for the four-vector potential A. This is a well-known result: if we choose a gauge within the Lorenz gauge family, the Maxwell equations imply that the potential satisfies a wave equation.
Within the Lorenz gauge family, some gauge invariance remains. In particular, adding a constant to A does not change the energy, and indeed does not change the physics at all. This is analogous to the gauge invariance exhibited by the string.
We now turn from the electromagnetic potential (A) to the electromagnetic field (F). The Maxwell equation implies that F itself satisfies a wave equation, in source-free space. Starting from equation 12, take the divergence of both sides:
$$ \nabla \cdot \nabla F = 0 \tag{14a} $$
$$ \nabla^2 F = 0 \tag{14b} $$
We recognize equation 14b as the wave equation for the electromagnetic field.
As a corollary, if we choose a frame (so that we can define E and B), there is a similar wave equation for the E-field by itself, and for the B-field by itself. In this frame, the energy is proportional to E^{2}. It is also proportional to B^{2}. This is the Poynting energy, as discussed in reference 1.
It is sometimes amusing to rewrite equation 14b, expressing the field in terms of the potential:
$$ \nabla^2 (\nabla A) = 0 \tag{15} $$
Note that gauge invariance does not allow us to change the field F. It only allows us to shift the potential A.
For ideal plane waves in air, the wave equation is closely analogous to equation 14b. The energy is proportional to the square of the amplitude of the wave.
There is no gauge invariance here.
Note the following contrast:
For strings and for the electromagnetic potential: There is a factor of ω^{2} in equation 6d. The wave energy depends on the amplitude of the wave, and on the wavelength. The ordinate of the wavefunction can be shifted by a gauge transformation.

For plane waves in the electromagnetic field, and for ideal acoustic plane waves in air: No such factor appears. The energy is simply proportional to the squared magnitude of the wave, and the constant of proportionality does not depend on frequency (or wavelength). The ordinate of the wavefunction cannot be shifted by a gauge transformation.
To organize things the other way around:
factor of ω^{2} | no factor of ω^{2}
string | air
electromagnetic potential | electromagnetic field
In particular, the extra derivative in equation 15 (compared to equation 14b) guarantees that there will be an extra factor of k^{2} in the expression for energy in terms of potential (compared to energy in terms of field).
Alas, in reference 2 it says
The energy in a wave is proportional to the square of its amplitude. For a wave of complex shape, the energy in one period will be proportional to ∫_{0}^{T} f^{2}(t) dt.
In other words:
$$ E \propto \int_0^T f^2(t) \, dt \tag{16} $$
You can get away with that for waves in the electromagnetic field and for ideal acoustic waves in free air ... but not for waves on a string nor for waves in the electromagnetic potential.
As a qualitative check on what I’m saying:
For the string and for the electromagnetic potential: If you shift the position of the entire string, it does not change the energy. Similarly, if you add a constant to the electromagnetic potential, it does not change the energy. This is consistent with energy being proportional to ω^{2}f^{2}, not simply f^{2}.

For the electromagnetic field and for air: If you add a constant to the electromagnetic field, it changes the energy. Similarly, if you add a constant to the pressure of the air, it changes the energy. This is consistent with the energy being proportional to f^{2}.
It should be obvious that whenever the ordinate (f) of the wave equation exhibits some kind of gauge invariance, the energy cannot be simply proportional to f squared.
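The point can be illustrated numerically. The following is a minimal sketch, not a general proof: it compares ∫ f² dt with ∫ (df/dt)² dt over one period, for a sine wave and for the same wave shifted by a constant. The test function, the size of the shift, and the function names are arbitrary choices made for illustration, and the derivative is estimated by a central difference.

```python
import math

def period_integrals(f, T, n=20000):
    """Return (integral of f^2, integral of (df/dt)^2) over [0, T].

    Uses the midpoint rule for the integrals and a central difference
    for the derivative.
    """
    dt = T / n
    h = 1e-6 * T
    int_f2 = 0.0
    int_fdot2 = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        value = f(t)
        slope = (f(t + h) - f(t - h)) / (2.0 * h)
        int_f2 += value * value * dt
        int_fdot2 += slope * slope * dt
    return int_f2, int_fdot2

T = 1.0
omega = 2.0 * math.pi / T

def wave(t):
    return math.sin(omega * t)

def shifted_wave(t):
    # Same wave, with a constant added to the ordinate.
    return wave(t) + 3.0

print(period_integrals(wave, T))
print(period_integrals(shifted_wave, T))
```

The first number in each pair changes a great deal when the constant is added, while the second is unchanged to within numerical error. A quantity that is supposed to be invariant under the shift cannot be proportional to the first number.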
In more detail: Let f denote the ordinate of the wavefunction, as in equation 16. Possibilities include:
The book goes on to say:
We can also relate this energy to the Fourier coefficients. We write
$$ \int_0^T f^2(t) \, dt = \int_0^T \left[ a_0 + \sum_{n=1}^{\infty} a_n \cos(n \omega t) + \sum_{n=1}^{\infty} b_n \sin(n \omega t) \right]^2 dt \qquad \text{(50.22)} \tag{17} $$
Hypertechnically speaking, if we take equation 17 out of context, the equation is mathematically correct, in the sense that the LHS is equal to the RHS. This is basically Parseval’s identity, which can be considered a generalization of the Pythagorean theorem. The problem is that in context, the book is claiming this is proportional to the energy, which is just not correct in general. Evidently Feynman was thinking about waves in air and/or waves in the electromagnetic field, not waves on a string or waves in the electromagnetic potential.
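For reference, carrying out the integral using the orthogonality of sines and cosines over one period T = 2π/ω gives the standard Parseval form. This is the textbook identity, written out here as a sketch and not as a quotation from the book:

```latex
\int_0^T f^2(t) \, dt
  = T a_0^2 + \frac{T}{2} \sum_{n=1}^{\infty} \left( a_n^2 + b_n^2 \right)
```

Every harmonic enters with the same weight, and the constant term a₀ contributes. Both features are appropriate for a quantity proportional to f², and both are inappropriate for a wave whose ordinate can be shifted by a constant without changing the energy.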
This leaves us with a question: How best to fix these bugs?
This has the advantage that it adds only a few words.
However, this would make the discussion less useful, incomparably less elegant, and possibly misleading. It would leave readers with the burden of figuring out whether and how to generalize the result to other kinds of waves. As discussed in section 3.2, this is the only result in the whole chapter that is restricted to acoustics.

This would require a sentence or two of additional text, to explain the relationship between the two equations.
In cases where we care about (f′)^{2}, we can differentiate the Fourier expansion of f term by term. Applying this idea to equation 17 we get:
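$$ \int_0^T \left[ f'(t) \right]^2 dt = \int_0^T \left[ - \sum_{n=1}^{\infty} n \omega \, a_n \sin(n \omega t) + \sum_{n=1}^{\infty} n \omega \, b_n \cos(n \omega t) \right]^2 dt \tag{19} $$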
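Assuming the series converges well enough to be differentiated term by term, orthogonality over one period T = 2π/ω evaluates the integral of the squared derivative as follows. This is a sketch of the standard result, not a formula taken from the book:

```latex
\begin{aligned}
f'(t) &= \sum_{n=1}^{\infty} n \omega \left[ -a_n \sin(n \omega t) + b_n \cos(n \omega t) \right] \\
\int_0^T \left[ f'(t) \right]^2 dt
  &= \frac{T}{2} \sum_{n=1}^{\infty} n^2 \omega^2 \left( a_n^2 + b_n^2 \right)
\end{aligned}
```

Each harmonic is now weighted by (nω)², and the constant term a₀ has dropped out, as it must if adding a constant to f is not supposed to change the energy.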
In any case, we need some words restricting the discussion to linear, nondispersive situations.
Note that the factor of ρ that appears in equation 6d is (by hypothesis) a uniform scalar, so it drops out of equation 17, and does not affect the claim that the energy is “proportional” to the square of f.
Because the book is integrating over dt rather than dx, we need to do a little bit of extra work to tighten up the proof that equation 17 is not correct. So, consider a wave with the following “complex shape”:
$$ f(t) = \epsilon \sin(\omega t) + \sin(M \omega t) \tag{20} $$
where ε is very small compared to 1 and M is some integer. The idea here is that the “fundamental” component serves to ensure that the period of the wave (T) is independent of M. On the other hand, it is so small compared to the payload that it makes a negligible contribution to the energy.
By changing M in equation 20, we can construct waves on a string that have the same ∫_{0}^{T} f^{2}(t) dt but wildly different energies.
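Here is a minimal numerical sketch of that construction. It assumes the wave has the form f(t) = ε sin(ωt) + sin(Mωt), and it uses ∫ (df/dt)² dt as a stand-in for the string energy, on the assumption that the energy tracks the squared derivative of f. The function name and the values of ε and ω are arbitrary choices for illustration.

```python
import math

def payload_integrals(M, eps=1e-3, omega=2.0 * math.pi, n=20000):
    """Return (integral of f^2, integral of (df/dt)^2) over one period.

    f(t) = eps*sin(omega*t) + sin(M*omega*t), with period T = 2*pi/omega
    set by the tiny fundamental component.
    """
    T = 2.0 * math.pi / omega
    dt = T / n
    int_f2 = 0.0
    int_fdot2 = 0.0
    for i in range(n):
        t = (i + 0.5) * dt  # midpoint rule
        f = eps * math.sin(omega * t) + math.sin(M * omega * t)
        fdot = (eps * omega * math.cos(omega * t)
                + M * omega * math.cos(M * omega * t))
        int_f2 += f * f * dt
        int_fdot2 += fdot * fdot * dt
    return int_f2, int_fdot2

for M in (2, 5, 20):
    f2, fdot2 = payload_integrals(M)
    print(f"M={M:2d}  int f^2 dt = {f2:.6f}   int (df/dt)^2 dt = {fdot2:.3f}")
```

The first column should be the same for every M, while the second should grow roughly like M². So two waves with identical ∫ f² dt can differ in energy by a factor of 100 or more.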
It could be argued that the f(t) that appears in section 50-5 is restricted to acoustics only. This is how it is introduced in section 50-2, namely air pressure as a function of time. However:
“so that our formula will be completely general”
Maybe this generalization was intended to apply to equation 50.2 only, but even so, it means the definition of f was at least temporarily generalized.

“for a wide class of functions, in fact for all that are of interest to physicists”
So once again, the definition of f has been generalized.

“energy theorem”
Usually theorems are completely general, unless the restrictions are clearly stated, or obvious from context.

“the same equations have the same solutions”
“The equations for many different physical situations have exactly the same appearance. Of course, the symbols may be different–one letter is substituted for another–but the mathematical form of the equations is the same. This means that having studied one subject, we immediately have a great deal of direct and precise knowledge about the solutions of the equations of another.”
To summarize:
Maybe I’m old-fashioned, but when I am serving as writer, reviewer, or editor, I look at things from the readers’ point of view. I don’t see how an ordinary mortal reader is supposed to divine that the square-law energy formula is the only result in the whole chapter that is restricted to acoustics.
To be fair to students, we need to be consistent. Let me explain what I mean, by giving an example of inconsistency:
Sometimes we complain that students take things too literally, when they should be generalizing. I quote from Bruce Sherwood’s blog, expressing an idea that was also discussed on the phys-l discussion list.
His problem was that he knew a canned procedure that if you have an x, and there’s an exponent, you put the exponent in front and reduce the exponent by one, and that thing is called “dy/dx” but has no meaning. There is no way to evaluate dS/dE starting from aE^0.5, because there is no x, there is no y, and nowhere in calculus is there a thing called dS/dE.
In other words, the student was being too literal, analyzing things at the lexical level, not the semantic level. For context and additional details, see the blog.
In the same vein, a lot of people (including me!) like to emphasize the unity, power, and grandeur of physics. We quote Feynman’s dictum:
The same equations have the same solutions.
On the other hand ... Suppose a student takes some formula – such as the one that says wave energy is proportional to the square of the amplitude – and applies it outside its range of validity. Then we accuse them of “equation hunting”.
The twin problems of overgeneralization and undergeneralization go away if the student understands what’s going on.
However, the square-law energy formula was presented without proof, without derivation, without explanation, without any alternative, and without any expressed limits on its range of validity. Just a bold assertion.
In such a case, we are in no position to criticize the student for not fully understanding the formula. We are in no position to complain about overgeneralization or undergeneralization.
As I see it, the most interesting issues here are
When is the energy proportional to the square of the amplitude, and when is it not? How do we know? How do we teach people to think about such things properly?
As a very minor follow-on point, while we’re fixing the physics we might as well fix the terminology. Calling equation 17 “the energy theorem” makes no sense. As far as I can tell by googling, nobody uses that terminology. It would make more sense to call it an example of Parseval’s identity.