A lot of people think there is a unique, well-defined notion of “the” kinetic energy ... but in fact there is a range of different concepts, all of which are sometimes called “the” KE:
KE[microscopic] = ∑½ p_{i}^{2} / m_{i} (1) 
where p_{i} is the momentum of the ith particle, m_{i} is the mass of the ith particle, and the sum runs over all particles.
KE[holoscopic] = ½ P^{2} / M (2) 
On the other side of the same coin: If somebody asks you to calculate the thermal KE, you aren’t expected to include the organized rotational KE. So you would have to evaluate KE[microscopic] and subtract off KE[mesoscopic].
Note that we don’t consider the flywheel to be mesoscopic; rather, we partition an ordinary flywheel into mesoscopic cells when we evaluate KE[mesoscopic].
A more general approach would be to specify a lengthscale “λ” that sets the resolution, i.e. how closely we are going to look at things. Then we partition the object into cells of size λ, and define
KE[λ] := ∑½ p_{k}^{2} / m_{k} (3) 
where p_{k} is the total momentum of the kth cell, m_{k} is its mass, and the sum runs over all cells.
Remarkably, the value of KE[λ] is not very sensitive to the choice of λ, over a wide range, as we now discuss.
Consider a flywheel in the form of a solid cube with edge length L = 1 meter. Choose λ = 1 cm; that is, partition the object into a million cubelets, each 1 cm on a side.
We can make a scaling argument. The moment of inertia scales like r^{2} m. Since the cube and cubelets all have similar shape (similar in the strict sense of Euclidean geometry), we don’t need to worry about dimensionless factors in front of the scaling formula.
The moment of inertia of each cubelet scales like λ^{5} ... three factors of λ for the mass and two factors for the r^{2} in r^{2} dm. The number of cubelets scales like 1/λ^{3}, so when we sum over cubelets we find that the part of the total KE[є] that is tied up strictly inside the cubelets (not including the center-of-mass motion of each cubelet) scales like λ^{2}. That is,
KE[є] − KE[λ] ≈ (λ/L)^{2} KE[є] (4) 
where є is some lengthscale small compared to λ but large enough to wash out any ultramicroscopic motions (e.g. thermal agitation).
In our numerical example, λ = 1 cm, so
KE[1cm] ≈ 99.99% KE[є] (5) 
for any є that is small compared to 1 cm but still large compared to atoms.
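The scaling estimate can be checked numerically. What follows is only a minimal sketch, assuming the cube is rigid and spins about an axis through its center parallel to one edge (neither assumption is stated above). The function name `ke_fraction` and its parameters are made up for illustration. Each cell's center-of-mass speed is ω times its distance from the axis, and the exact moment of inertia of a uniform cube about such an axis is M L²/6.

```python
def ke_fraction(n, edge=1.0, mass=1.0, omega=1.0):
    """Return KE[lambda] / KE[exact] for a rigid cube cut into n**3 cells.

    Assumes rotation about the z axis through the center of the cube.
    """
    cell_mass = mass / n**3
    # Cell-center coordinates along one edge, measured from the cube center.
    centers = [edge * ((i + 0.5) / n - 0.5) for i in range(n)]
    # Sum of r_perp**2 over one horizontal layer of cells.
    layer = sum(x * x + y * y for x in centers for y in centers)
    # All n layers along z contribute equally, since r_perp ignores z.
    ke_cells = 0.5 * cell_mass * omega**2 * layer * n
    ke_exact = 0.5 * (mass * edge**2 / 6.0) * omega**2
    return ke_cells / ke_exact


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, ke_fraction(n))
```

Under these assumptions the ratio works out to 1 − (λ/L)^{2}, so for n = 100 (that is, λ = 1 cm) it comes out very close to 0.9999, in line with equation 5.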
In mechanics, the definition of work is ambiguous, but only mildly so. (By way of contrast, in thermodynamics, the ambiguities are more numerous and much more serious, as discussed in reference 3. Thermodynamics is beyond the scope of this document.)
In mechanics, all notions of “work” have something to do with force times distance.
The conventional definition of work done on an object is:
work = ∫_{Γ} F · ds (6)
where the integral runs along some path Γ, namely the path taken by the point of application of the force, and where the displacement ds is a step along the path.
The laws of physics require us to know the direction and magnitude of the force, and also the point of application of the force.
Beware: If you want to calculate the work, it is generally not safe to multiply the “average” force by the “average” displacement. Sometimes you can get away with that, but sometimes you can’t. For example, consider the wheel shown in figure 1. The hands pull on the strings (shown in blue) which in turn pull on the wheel, causing it to spin faster. The average force on the wheel is zero and the average displacement is zero, but the work being done on the wheel is definitely nonzero.
Instead, we can divide the object into cells and add up the work done on each cell:
work = ∑_{i} ∫ F_{i} · ds_{i} (7)
where the sum runs over all cells in the object. The idea here is that as each cell becomes smaller, the energy associated with internal motion within the cell becomes small ... not just small, but disproportionately small, so that even after summing over all cells the internal motions make a negligible contribution to the total energy. For example, in figure 1, the center of mass of the wheel as a whole is not moving, and the rotational kinetic energy of the wheel is considered “internal” to the wheel. In contrast, if we break the wheel into small cells, the center of mass of each cell is moving, and these center-of-mass motions carry the lion’s share of the kinetic energy. (Each cell is also rotating, but these “internal” rotational energies are disproportionately small, and don’t add up to much, in accordance with the scaling argument that leads to equation 4.)
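To make the warning about averages concrete, here is a rough sketch of the wheel being spun up by two strings. It assumes two equal and opposite tangential forces applied at opposite points on the rim, and a small rotation during which the force directions are held fixed. The function name `work_on_wheel` and the numbers are purely illustrative.

```python
import math


def work_on_wheel(force, radius, dtheta):
    """Sum F_i . ds_i for two strings pulling tangentially on opposite
    sides of a wheel that turns through a small angle dtheta (radians).

    Returns (cell-by-cell work, net force dotted with mean displacement).
    """
    # Points of application on the rim, and the tangential forces there.
    points = [(radius, 0.0), (-radius, 0.0)]
    forces = [(0.0, force), (0.0, -force)]
    total_work = 0.0
    net_force = [0.0, 0.0]
    mean_disp = [0.0, 0.0]
    for (x, y), (fx, fy) in zip(points, forces):
        # Displacement of a rim point when the wheel rotates by dtheta.
        dx = x * math.cos(dtheta) - y * math.sin(dtheta) - x
        dy = x * math.sin(dtheta) + y * math.cos(dtheta) - y
        total_work += fx * dx + fy * dy
        net_force[0] += fx
        net_force[1] += fy
        mean_disp[0] += dx / len(points)
        mean_disp[1] += dy / len(points)
    naive = net_force[0] * mean_disp[0] + net_force[1] * mean_disp[1]
    return total_work, naive


if __name__ == "__main__":
    print(work_on_wheel(force=10.0, radius=0.3, dtheta=0.01))
```

The first number returned is 2 F R sin(dθ), which is nonzero; the second (net force times mean displacement) is zero.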
We now introduce another definition of work. This definition is somewhat more sophisticated.
Rather than talking about the work done on the object, we talk about the work done on the boundary of the object, and more specifically on various parts of the boundary.
For example: suppose I am pushing a car up a ramp at a steady rate. I am pushing forward on the car, while other forces (notably a component of the gravitational force) are pushing backwards. It certainly feels to me like I am doing work. Indeed I am doing work, in the sense that energy is crossing the boundary from me into the car, via the part of the car I am pushing on. This energy flows through the car without accumulating in the car. That is, as quickly as the energy flows into the car (from me) it flows out again (into the gravitational field). If we consider the car as a whole, the work is zero in this situation, but if we divide the boundary into parts, there can be nonzero work on this or that part.
For each lengthscale λ, we can establish a work[λ]/KE[λ] theorem. Specifically, for each cell, we define work using equation 6, and the total work[λ] is just the sum over cells in the obvious way. As we shall see in section 7, the theorem states
Δ KE[λ] = work[λ] (8) 
We speak of the work/KE theorems, plural, because there is a separate theorem for each lengthscale λ.
When λ is large, work[λ] is sometimes called the pseudowork. See e.g. reference 4.
Returning to the example of pushing a car up a ramp: The KE of the car is not changing, which is consistent with the pseudowork/KE theorem, because the total work done on the car is zero ... even though my local contribution to the work is nonzero.
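The bookkeeping for the car on the ramp can be written out in a few lines. This is only a sketch, assuming a frictionless ramp, a push directed along the ramp, and g = 9.8 m/s²; the function name `ramp_work` and the numbers are invented for illustration.

```python
import math


def ramp_work(mass, slope_deg, distance, g=9.8):
    """Work done on a car pushed up a frictionless ramp at constant speed.

    Returns (work by the push, work by gravity, total work on the car).
    """
    # Component of gravity along the ramp; negative means downhill.
    f_gravity = -mass * g * math.sin(math.radians(slope_deg))
    # Steady speed means the push exactly balances that component.
    f_push = -f_gravity
    work_push = f_push * distance
    work_gravity = f_gravity * distance
    return work_push, work_gravity, work_push + work_gravity


if __name__ == "__main__":
    print(ramp_work(mass=1000.0, slope_deg=5.0, distance=10.0))
```

The push contributes positive work and gravity contributes an equal negative amount, so the total is zero, consistent with the unchanged KE of the car.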
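Here is another bit of terminology that may be helpful. For any object (or cell or subcell), the common-mode KE is the KE of the center-of-mass motion:
KE_{cm} = ½ P^{2} / M (9)
where P is the total momentum and M is the total mass. The corresponding center-of-mass velocity is
V = P/M (10)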
The differential-mode momentum of the ith particle is
p′_{i} = p_{i} − V m_{i} (11)
The differential-mode KE of the particle is ½ p′_{i}^{2} / m_{i} and the differential-mode KE of the object is
KE_{dm} := ∑½ p′_{i}^{2} / m_{i} (12) 
This is useful because
KE = KE_{cm} + KE_{dm} (13) 
which in turn helps you understand the work[λ]/KE[λ] theorem.
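Equation 13 is easy to check numerically. The following sketch uses arbitrary random masses and velocities (not taken from anything above), and the function name `ke_split` is made up for illustration.

```python
import random


def ke_split(masses, velocities):
    """Return (KE, KE_cm, KE_dm) for a set of particles in 3D."""
    total_mass = sum(masses)
    # Total momentum P and center-of-mass velocity V = P / M.
    P = [sum(m * v[a] for m, v in zip(masses, velocities)) for a in range(3)]
    V = [P[a] / total_mass for a in range(3)]
    ke = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    ke_cm = 0.5 * sum(c * c for c in P) / total_mass
    # Since p'_i = p_i - V m_i, we have p'_i**2 / m_i = m_i * |v_i - V|**2.
    ke_dm = sum(
        0.5 * m * sum((v[a] - V[a]) ** 2 for a in range(3))
        for m, v in zip(masses, velocities)
    )
    return ke, ke_cm, ke_dm


if __name__ == "__main__":
    rng = random.Random(0)
    masses = [rng.uniform(1.0, 5.0) for _ in range(50)]
    velocities = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
    ke, ke_cm, ke_dm = ke_split(masses, velocities)
    print(ke, ke_cm + ke_dm)
```

The two printed numbers should agree to within rounding error. The cross term drops out because the differential-mode momenta sum to zero.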
Starting from equation 7, we can reexpress the total work per unit time, i.e. the total power w, as:
w = ∑_{i} w~_{i} = ∑_{i} F~_{i} · v_{i} (14)
where v_{i} is the velocity of the point of application of the ith force, i.e. the force applied to the ith cell.
Here we have written a tilde over the F~_{i}, to remind us that force is extensive, so that the force on an average cell is that cell’s share of the total force. Ditto for w~_{i}, which represents the cell’s share of power. This stands in contrast to the velocity, which gets no tilde because it is intensive.
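We can define the average velocity as:
⟨v⟩ := (1/N) ∑_{i} v_{i} (15)
where N is the number of cells. Similarly we can define the average share of the force:
⟨F~⟩ := (1/N) ∑_{i} F~_{i} (16)
We now define the variations:
δv_{i} := v_{i} − ⟨v⟩ and δF~_{i} := F~_{i} − ⟨F~⟩ (17)
Hence
v_{i} = ⟨v⟩ + δv_{i} and F~_{i} = ⟨F~⟩ + δF~_{i} (18)
and of course
∑_{i} δv_{i} = 0 and ∑_{i} δF~_{i} = 0 (19)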
Plugging equation 18 into equation 14 we find:
w = F · ⟨v⟩ + ∑_{i} δF~_{i} · δv_{i} (20)
where F is the total force (not the average share of force). Note that the two terms that were linear in the variations dropped out, because of the sum rule, equation 19.
The last term in equation 20 can be considered an inner product twice over. The explicit dot product is an ordinary three-dimensional real-space inner product ... but the sum over i can also be considered an inner product, in an abstract N-dimensional space. This term goes to zero if the variations in force are perpendicular (in real space) to the variations in velocity, in which case every term in the sum over i is separately zero ... but it also goes to zero if these terms are individually nonzero but add up to zero, which is to say that the variations are uncorrelated.
In any case, whenever the sum over i goes to zero, it means we can think about the system in macroscopic terms. That is, we can calculate the work using the total force F and the average velocity ⟨v⟩.
Perhaps the simplest situation in which the sum goes to zero is the situation where all the cells in the system are moving with the same velocity, so that δv_{i} = 0 for all i.
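The split of the total power into a macroscopic piece (total force dotted with average velocity) plus a sum over products of variations can be verified numerically. This is a sketch only, assuming plain unweighted averages over the cells. The names `dot` and `power_terms` and the random test data are invented for illustration.

```python
import random


def dot(a, b):
    """Ordinary real-space dot product."""
    return sum(x * y for x, y in zip(a, b))


def power_terms(forces, velocities):
    """Return (sum_i F_i.v_i, F_total.<v>, sum_i dF_i.dv_i)."""
    n = len(forces)
    dim = len(forces[0])
    f_total = [sum(f[a] for f in forces) for a in range(dim)]
    f_mean = [c / n for c in f_total]
    v_mean = [sum(v[a] for v in velocities) / n for a in range(dim)]
    direct = sum(dot(f, v) for f, v in zip(forces, velocities))
    macroscopic = dot(f_total, v_mean)
    # Products of the variations about the unweighted averages.
    correlation = sum(
        dot([f[a] - f_mean[a] for a in range(dim)],
            [v[a] - v_mean[a] for a in range(dim)])
        for f, v in zip(forces, velocities)
    )
    return direct, macroscopic, correlation


if __name__ == "__main__":
    rng = random.Random(1)
    forces = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
    velocities = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
    direct, macroscopic, correlation = power_terms(forces, velocities)
    print(direct, macroscopic + correlation)
```

The two printed numbers should agree to within rounding. If every cell is given the same velocity, the correlation term vanishes and the macroscopic term alone reproduces the direct sum.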
We can learn something new from equation 14 if we divide the force by mass, and multiply the velocity by mass:
w = ∑_{i} (F~_{i} / m~_{i}) · (m~_{i} v_{i}) = ∑_{i} a_{i} · p~_{i} (21)
where a_{i} is the acceleration of the ith particle, m~_{i} is its share of the mass, and p~_{i} is its share of the momentum.
The rest of the calculation runs closely parallel to section 4.2. The only difference is that we are using differently-weighted averages. We obtain:
w = ⟨a⟩ · p + ∑_{i} δa_{i} · δp~_{i} (22)
where p is the total momentum of the system.
Perhaps the simplest situation in which this sum over i vanishes is when all cells have the same acceleration, such as might result from a uniform gravitational field. In such a case we can multiply and divide by the total mass to obtain:
w = F · v_{cm} (23)
where F is the total force and v_{cm} is the center-of-mass velocity.
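For the uniform-field case, the claim is that the cell-by-cell power equals the total force dotted with the center-of-mass velocity. Here is a small sketch of that check, assuming each cell feels F_i = m_i g with the same g. The function name `gravity_power` and the sample numbers are illustrative only.

```python
def gravity_power(masses, velocities, g=(0.0, 0.0, -9.8)):
    """Compare sum_i F_i . v_i with F_total . v_cm when F_i = m_i * g.

    Returns (cell-by-cell power, total force dotted with v_cm).
    """
    total_mass = sum(masses)
    direct = sum(
        m * sum(gc * vc for gc, vc in zip(g, v))
        for m, v in zip(masses, velocities)
    )
    # Center-of-mass velocity is the mass-weighted average velocity.
    v_cm = [
        sum(m * v[a] for m, v in zip(masses, velocities)) / total_mass
        for a in range(3)
    ]
    f_total = [total_mass * gc for gc in g]
    macroscopic = sum(fc * vc for fc, vc in zip(f_total, v_cm))
    return direct, macroscopic


if __name__ == "__main__":
    masses = [1.0, 2.0, 3.0]
    velocities = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 0.0, 0.5]]
    print(gravity_power(masses, velocities))
```

The two results agree even though the cells move with quite different velocities, because every cell has the same acceleration.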
Let’s take a slight detour to talk about momentum.
Energy is important. Momentum is important. Each obeys a local conservation law. The two concepts are intimately related, but they are not the same.
For instance, consider a box containing 13 particles moving to the left, plus 13 particles moving to the right, all with comparable speeds.
Momentum: Each of the 26 particles has some momentum, but the momentum of the left-moving particles is opposite to the momentum of the right-moving particles, so the system as a whole has little if any overall momentum.
Energy: Each of the 26 particles has some kinetic energy, and all of them make a positive contribution to the total energy of the system.
Momentum: If we apply a force to the system, it changes the momentum. If all we care about is the momentum of the system, it doesn’t matter where we apply the force; any force applied anywhere in the system has the same effect on the overall momentum.
Energy: If we care about the energy, it matters a great deal where we apply the force. A leftward force applied to a leftward-moving particle increases the system energy; the same force applied to a rightward-moving particle decreases the system energy.
Specifically, the change in total momentum is:
d(∑_{i} p_{i}) = ∑_{i} F_{i} dt = F_{tot} dt (24)
where the index i runs over all particles in the system, p_{i} is the momentum of the ith particle, F_{i} is the force applied to that particle, and where F_{tot} := ∑F_{i} refers to the total force on the system.
Specifically, the change in microscopic kinetic energy is:
d(∑_{i} ½ p_{i}^{2} / m_{i}) = ∑_{i} F_{i} · dx_{i} (25)
where x_{i} is the position of the ith particle, and m_{i} is its mass. This is called the work / kinetic-energy theorem. The notion of work is discussed in section 7; see also reference 3.
To summarize: Momentum is denoted p and is related to force times time. Kinetic energy is p^{2}/2m and is related to force times distance.
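The box of 26 particles makes a handy numerical illustration. The sketch below assumes one-dimensional motion, equal masses, and equal speeds (the text only says the speeds are comparable). The function name `apply_impulse` is invented for illustration.

```python
def apply_impulse(velocities, index, force, dt, mass=1.0):
    """Apply a constant 1D force to one particle for a time dt.

    Returns (change in total momentum, change in total kinetic energy).
    """
    p_before = sum(mass * v for v in velocities)
    ke_before = sum(0.5 * mass * v * v for v in velocities)
    after = list(velocities)
    after[index] += force * dt / mass
    p_after = sum(mass * v for v in after)
    ke_after = sum(0.5 * mass * v * v for v in after)
    return p_after - p_before, ke_after - ke_before


if __name__ == "__main__":
    # 13 left-movers (indices 0..12) and 13 right-movers (indices 13..25).
    box = [-1.0] * 13 + [1.0] * 13
    print(apply_impulse(box, 0, force=-1.0, dt=0.1))   # leftward push on a left-mover
    print(apply_impulse(box, 13, force=-1.0, dt=0.1))  # same push on a right-mover
```

Both calls change the total momentum by the same amount, but the kinetic energy goes up in the first case and down in the second.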
As discussed in section 1, we divide the system into cells. Let’s look at the square of the momentum of one of the cells, and see how it changes when we apply a force:
d(p^{2}) = 2 p · dp = 2 p · F dt = 2 m v · F dt = 2 m F · dx (26)
or, simply,
d(p^{2}) = 2 m F · dx (27)
where m is the total mass of this cell, v := p/m is the velocity of its center of mass, x is the distance traveled by the center of mass, and F is the total force (i.e. net force) applied to the cell.
We rearrange it so it has dimensions of energy:
d(½ p^{2} / m) = F · dx (28)
And then sum over all cells:
∑_{k} d(½ p_{k}^{2} / m_{k}) = ∑_{k} F_{k} · dx_{k} (29)
where the sum runs over all cells.
This proves the work/KE theorem, in particular the work[λ]/KE[λ] theorem, equation 8.
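The single-cell version of the theorem can be checked by stepping one cell forward in time. This is a sketch under stated assumptions: one-dimensional motion, a cell of fixed mass, and a net force that is constant within each small time step. The function name `integrate_cell` and the force law are made up for illustration.

```python
import math


def integrate_cell(mass, v0, force_of_t, dt, steps):
    """Step one cell's center of mass forward in 1D under a net force F(t).

    Returns (change in p**2 / (2 m), accumulated sum of F * dx).
    """
    v = v0
    work = 0.0
    for k in range(steps):
        f = force_of_t(k * dt)
        v_new = v + f / mass * dt
        # Distance moved by the center of mass, using the mean velocity in the step.
        dx = 0.5 * (v + v_new) * dt
        work += f * dx
        v = v_new
    delta_ke = 0.5 * mass * (v * v - v0 * v0)
    return delta_ke, work


if __name__ == "__main__":
    print(integrate_cell(2.0, 1.0, lambda t: 3.0 * math.cos(t), 1e-3, 5000))
```

With the displacement taken from the mean velocity over each step, F · dx equals the change in ½ m v^{2} for that step, so the two returned numbers agree to within rounding. Summing such results over cells gives the work[λ]/KE[λ] statement.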
Equation 24 is useful if you know a certain force is applied for a certain time; equation 27 is useful if you know a certain force is applied while the cell moves a certain distance.
You might be tempted to take the limit of ultramicroscopic cells (λ → 0). In theory, this would make the theorem very powerful, in the sense that you would have accounted for all the kinetic energy. In practice, the drawback is that such a theorem would be very hard to apply, because it would require knowing every detail of the motion and every detail of what force is applied at which point. For example, when calculating the KE of a flywheel, do you include the KE of electrons whizzing around inside individual atoms? You could, but it would be unconventional and almost certainly not worth the effort.
At the other extreme (large λ) the theorem is easy to apply, but you have to remind yourself (and remind all other stakeholders) that you are calculating the common-mode kinetic energy. You shouldn’t call it “the” kinetic energy unless it is super-clear from context that that’s what you mean.
Remarks:
Here’s another interesting contrast:
Pseudowork (large λ): F_{tot} · Δx_{cm}. All the forces are summed (and all the displacements are summed) before multiplying. The sum over displacements is a weighted sum, weighted by mass.
Microscopic work: ∑_{i} F_{i} · Δx_{i}. Each microscopic force is multiplied by the corresponding microscopic displacement before summing.
In general, it makes a huuuuge difference whether the multiplication occurs before or after the summation. It also makes a huuuuge difference whether or not a weighted sum is used.
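A two-particle toy example shows how far apart the two recipes can be. This is a sketch in one dimension with invented numbers; the function names `microscopic_work` and `pseudowork` are labels chosen for illustration.

```python
def microscopic_work(forces, displacements):
    """Multiply each force by its own displacement, then sum (1D)."""
    return sum(f * dx for f, dx in zip(forces, displacements))


def pseudowork(forces, displacements, masses):
    """Sum the forces, then multiply by the mass-weighted mean displacement (1D)."""
    total_force = sum(forces)
    dx_cm = sum(m * dx for m, dx in zip(masses, displacements)) / sum(masses)
    return total_force * dx_cm


if __name__ == "__main__":
    forces = [4.0, 0.0]          # newtons; only the first particle is pushed
    displacements = [1.0, 0.0]   # meters
    masses = [1.0, 3.0]          # kilograms
    print(microscopic_work(forces, displacements))    # 4.0
    print(pseudowork(forces, displacements, masses))  # 1.0
```

Here the multiply-then-sum recipe gives 4 J while the sum-then-multiply recipe gives 1 J, because the mass weighting dilutes the displacement of the light particle.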