For the sake of completeness, this section makes explicit the limits of validity for equation 2.2, and explains how to handle the unusual situations where it is not valid.
Equation 2.2 is almost the most general formulation of the entropy idea.  Equation 27.6 is truly the most general formulation. 
If you are using an ordinary computer and an ordinary communication channel, measuring bits and the probability of bits, equation 2.2 serves just fine.  If you are using a quantum computer and/or a quantum communication channel, measuring qubits and the amplitude of qubits, you presumably didn’t get into that situation by accident, and you will be well aware of the need to use equation 27.6. 
If you are dealing with a nondegenerate macroscopic system, equation 2.2 should serve just fine. 
If you are dealing with (say) the heat capacity of a superfluid, superconductor, or other system that exhibits macroscopic quantum coherence, you will presumably be aware of the need to use equation 27.6. Most commonly, equation 27.6 is used in conjunction with ultramicroscopic systems. As an example, consider the sp^{3} atomic orbitals, which are a coherent superposition of the {s,p_{x},p_{y},p_{z}} orbitals.
It is more or less impossible to formulate a theory of thermodynamics without a concept of microstate. Microstates are best understood in terms of quantum states, which is the approach taken throughout this document.
There is, of course, more to quantum mechanics than the notion of state. There is also the notion of probability amplitude (or simply amplitude); the absolute square of the amplitude is the probability.
For many purposes, the probability tells us everything we need to know, so we don’t directly care about the amplitude.  However, there are situations where the system exhibits coherence between the quantum states. We sometimes say the system is in an entangled state. Schrödinger’s Cat is a well-known example, perhaps an unnecessarily complicated example.
When a system exhibits both coherent superposition and incoherent (thermal) mixing, the best way to represent it is in terms of a density matrix.
Any pure quantum state |ψ⟩ is represented by a density matrix ρ which can be expressed as an outer product:
ρ = |ψ⟩⟨ψ| (27.1)
That means that for an N-dimensional state space, the density matrix will be an N×N matrix.
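The outer product in equation 27.1 can be sketched in a few lines of plain Python. This is only an illustration: the helper name `density_matrix` and the particular two-component state vector are assumptions chosen for the example, not something taken from the text.

```python
import cmath
import math

def density_matrix(psi):
    """Return the outer product |psi><psi| as an N x N nested list."""
    # Element (i, j) is psi_i times the complex conjugate of psi_j.
    return [[complex(a) * complex(b).conjugate() for b in psi] for a in psi]

# Hypothetical example state: equal weights, relative phase of 90 degrees.
psi = [1 / math.sqrt(2), cmath.exp(1j * math.pi / 2) / math.sqrt(2)]
rho = density_matrix(psi)

# The diagonal elements are probabilities, so the trace should be 1.
trace = sum(rho[i][i] for i in range(len(psi)))
print(rho)
print("trace =", trace.real)
```

A state vector with N components yields an N×N nested list, and the result is Hermitian: element (i, j) is the complex conjugate of element (j, i).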
Let’s look at some examples. Suppose the state space of the system is spanned by two basis states, |1⟩ and |2⟩. Each of these states can be represented by a state vector, or by the corresponding density matrix.
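|1⟩ = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad ρ_{1} = |1⟩⟨1| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
|2⟩ = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad ρ_{2} = |2⟩⟨2| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} (27.2)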
Things get more interesting when we consider a state that is a coherent superposition of the two basis states:
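|a⟩ = (|1⟩ + |2⟩)/\sqrt{2}, \quad ρ_{a} = |a⟩⟨a| = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} (27.3)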
Note that the diagonal elements of the density matrix can be interpreted as the probability of the basis states, and they necessarily sum to unity, as you can see in each of the examples. The off-diagonal elements represent correlations between the basis states.
Things get even more interesting if we allow an arbitrary phase, as follows:
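|b⟩ = (|1⟩ + e^{iφ}|2⟩)/\sqrt{2}, \quad ρ_{b} = |b⟩⟨b| = \begin{pmatrix} 1/2 & e^{-iφ}/2 \\ e^{iφ}/2 & 1/2 \end{pmatrix} (27.4)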
It turns out that in many situations, especially macroscopic situations, there are physical processes that perturb the phase of a superposition such as this. If we take the average over all phases, we get:
⟨ρ_{b}⟩_{φ} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} (27.5)
which for the first time shows us the power of the density-matrix formalism. The object in equation 27.5 does not represent a pure quantum state, but rather the incoherent (thermal) mixing of states. This stands in contrast to previous equations such as equation 27.4 which did represent pure quantum states.
Note that equation 27.5 could have been obtained by taking a 50/50 mixture of ρ_{1} and ρ_{2} as given in equation 27.2. This is an example of the general rule that thermal mixtures can be computed by averaging the density matrices of the ingredients.
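As a rough numerical check of the two routes just described, the sketch below averages the density matrix of a superposition with relative phase φ over equally spaced phases, and separately averages the density matrices of the two basis states. The helper names and the choice of 360 sample phases are assumptions made for illustration.

```python
import cmath
import math

def density_matrix(psi):
    """Outer product |psi><psi| as a nested list."""
    return [[complex(a) * complex(b).conjugate() for b in psi] for a in psi]

def average(matrices):
    """Element-by-element average of equally weighted density matrices."""
    count = len(matrices)
    dim = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / count for j in range(dim)]
            for i in range(dim)]

# Route 1: average the superposition over all relative phases.
K = 360  # number of sample phases (arbitrary choice)
samples = []
for k in range(K):
    phi = 2 * math.pi * k / K
    psi_b = [1 / math.sqrt(2), cmath.exp(1j * phi) / math.sqrt(2)]
    samples.append(density_matrix(psi_b))
rho_avg = average(samples)

# Route 2: a 50/50 mixture of the two basis states.
rho_1 = density_matrix([1, 0])
rho_2 = density_matrix([0, 1])
rho_mix = average([rho_1, rho_2])

print(rho_avg)
print(rho_mix)
```

Up to rounding error, both routes give the same matrix: one half on each diagonal element, with vanishing off-diagonal elements.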
Notice the great power of the density matrix: Whereas a quantum state vector |a⟩ represents a microstate, a suitable density matrix ρ can fully represent a macrostate.
Reference 35 contains many more examples of density matrices.
There is a wonderfully simple test to detect pure states, by looking at the square of the density matrix. If and only if ρ^{2} = ρ, the density matrix represents a pure state; otherwise it represents a mixture. Pure states have zero entropy; mixtures have entropy greater than zero, as we shall see in connection with equation 27.6.
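The purity test can be sketched as follows. The function names and the tolerance are assumptions for illustration; the two matrices are the equal-weight coherent superposition of two basis states and the 50/50 incoherent mixture of the same two states.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_pure(rho, tol=1e-12):
    """Return True if rho squared equals rho, within the tolerance tol."""
    rho_squared = matmul(rho, rho)
    n = len(rho)
    return all(abs(rho_squared[i][j] - rho[i][j]) <= tol
               for i in range(n) for j in range(n))

rho_superposition = [[0.5, 0.5], [0.5, 0.5]]  # coherent superposition
rho_mixture = [[0.5, 0.0], [0.0, 0.5]]        # incoherent 50/50 mixture

print(is_pure(rho_superposition))
print(is_pure(rho_mixture))
```

The two matrices have identical diagonal elements, yet only the first one passes the test; squaring the second one gives 1/4 on the diagonal rather than 1/2.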
Note that equation 27.4 represents a pure state while equation 27.5 does not – even though they have the same diagonal elements, i.e. the same state-by-state probabilities for the basis states. The off-diagonal terms, i.e. the correlations, make a significant contribution to the entropy.
In all generality, for a system characterized by a density matrix ρ, the entropy is given by
S := − Tr ρ log ρ (27.6) 
This is the most robust definition of entropy. This is the gold standard. For many cases, i.e. when we can ignore quantum entanglement, it reduces to equation 2.2. Other expressions may be useful in more-restricted cases (as in section 9.6 for example) but you can never go wrong using equation 27.6.
Since the expectation value of any observable operator O is given by Tr ρ O, equation 27.6 can be interpreted as the expectation value of the surprisal, as discussed in section 2.7.1, where we define the operator form of the surprisal to be:
$ := − log ρ (27.7) 
In case you are wondering how to take the logarithm of a matrix, here’s one way to do it: Expand log(x) in a Taylor series. (It is smarter to expand about x=1 than about x=0.) Then you can evaluate log(x) in terms of powers of x, which requires nothing beyond matrix multiplication, scalar multiplication, addition, and other well-understood operations. Specifically,
log(ρ) = − \sum_{N=1}^{∞} \frac{(1 − ρ)^{N}}{N} (27.8)
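A minimal sketch of the Taylor-series approach is shown here, using the standard expansion of the logarithm about 1, namely log(ρ) = −Σ (1 − ρ)^N / N summed over N = 1, 2, 3, and so on. The function names, the truncation at 400 terms, and the sample matrix are assumptions for illustration. The series converges only when every eigenvalue of ρ is strictly positive, so this sketch is not suitable for pure states or other singular density matrices. The natural logarithm is assumed.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_log_series(rho, terms=400):
    """Approximate log(rho) by the truncated series -sum (1 - rho)^N / N."""
    n = len(rho)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = [[identity[i][j] - rho[i][j] for j in range(n)] for i in range(n)]
    power = [row[:] for row in identity]     # holds (1 - rho)^order
    result = [[0.0] * n for _ in range(n)]
    for order in range(1, terms + 1):
        power = matmul(power, x)
        for i in range(n):
            for j in range(n):
                result[i][j] -= power[i][j] / order
    return result

# Hypothetical full-rank density matrix (trace 1, positive eigenvalues).
rho = [[0.7, 0.2], [0.2, 0.3]]
log_rho = matrix_log_series(rho)

# Entropy as -Tr(rho log rho), using the series result.
product = matmul(rho, log_rho)
entropy_series = -(product[0][0] + product[1][1])

# Cross-check using the closed-form eigenvalues of a 2x2 symmetric matrix.
tr = rho[0][0] + rho[1][1]
det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
entropy_eig = -sum(p * math.log(p) for p in eigenvalues)

print(entropy_series, entropy_eig)
```

The two entropy values should agree to within rounding error. The series converges slowly when ρ has an eigenvalue close to zero, which is one reason the diagonal-basis shortcut is usually preferable.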
Furthermore, in any basis where the density matrix is diagonal – i.e. where the off-diagonal elements vanish – there is an even easier way to evaluate the logarithm: just take the log of each diagonal element separately, element by element.
Also: In any basis where the density matrix is diagonal, equation 27.6 is manifestly identical to equation 2.2. Note that any density matrix, being Hermitian, can be made diagonal by a suitable change of basis. Also note that the value of the trace operator is unaffected by a change of basis; this can be seen as an immediate consequence of the “cyclic property” of the trace.
In a practical sense, what this section is saying is that if your density matrix ρ is not diagonal, it might be a good idea to perform a change of basis so that ρ becomes diagonal, and then evaluate equation 27.6 (or equivalently equation 2.2) in that basis. Equation 27.6 is just a compact way of saying this.
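The diagonalize-then-sum recipe can be sketched for the two-state case, where the eigenvalues of a Hermitian 2×2 matrix have a closed form, so no linear-algebra library is needed. The function names, the cutoff used to skip zero eigenvalues, and the use of the natural logarithm are assumptions for illustration.

```python
import cmath
import math

def eigenvalues_2x2_hermitian(rho):
    """Eigenvalues of a Hermitian 2x2 matrix from its trace and determinant."""
    tr = (complex(rho[0][0]) + complex(rho[1][1])).real
    det = (complex(rho[0][0]) * complex(rho[1][1])
           - complex(rho[0][1]) * complex(rho[1][0])).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))  # guard against rounding
    return [(tr + disc) / 2, (tr - disc) / 2]

def entropy(rho, cutoff=1e-12):
    """Entropy -sum p log p over the eigenvalues p of rho, in nats."""
    total = 0.0
    for p in eigenvalues_2x2_hermitian(rho):
        if p > cutoff:  # p log p tends to 0 as p tends to 0
            total -= p * math.log(p)
    return total

phi = 0.3  # arbitrary relative phase, in radians
rho_pure = [[0.5, 0.5 * cmath.exp(-1j * phi)],
            [0.5 * cmath.exp(1j * phi), 0.5]]   # coherent superposition
rho_mixed = [[0.5, 0.0], [0.0, 0.5]]            # phase-averaged mixture

print(entropy(rho_pure))
print(entropy(rho_mixed), math.log(2))
```

In the diagonal basis the eigenvalues play the role of the probabilities in equation 2.2. The coherent superposition has eigenvalues 1 and 0, hence zero entropy, while the mixture has eigenvalues 1/2 and 1/2, hence entropy log 2.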