ML for dummies: QM for dummies

"I will take just this one experiment, which has been designed to contain all of the mystery of quantum mechanics, to put you up against the paradoxes and mysteries and peculiarities of nature one hundred per cent. Any other situation in quantum mechanics, it turns out, can always be explained by saying, 'You remember the case of the experiment with the two holes? It's the same thing.'" -- Richard Feynman

Intro

Quantum Mechanics is hard. To use it expertly, you're going to have to understand math that makes my eyes bleed:

Your only other option seems to be reading popular renditions about half-dead zombie cats which give you no insight at all. I hope to bridge the gap and give you a useful understanding with minimal math.

credit thelifeofpsi.com. their site seems pretty rad.

Quantum states

Suppose you have a particle and some experiment you can run on it that can yield only two possible outcomes. For example, an electron can be spin-up or spin-down. A photon can be vertically polarized or horizontally polarized (it turns out that other forms of polarization, such as circular or elliptical, can be written in terms of those states). We'd like a mathematical model to describe the state of the particle and help us make predictions about it.

It turns out that we can represent a particle's state as a vector. Since there are two possible outcomes, we use a 2-dimensional space. And for reasons we'll see later, we want the states representing those two outcomes to be orthogonal (perpendicular).

Let's stick with the example of spin, which can be measured in any of the spatial directions (x, y, z). If we measure it along the z-axis, the possible outcomes are spin-up ($z_+$) or spin-down ($z_-$). We'll assign those states their own orthogonal vectors (which we write as letters with arrows on top):

$\vec{z_+}$

$\vec{z_-}$

In actuality, QM requires complex numbers instead of real numbers, and for some properties the number of dimensions is infinite. But my hands got tired trying to draw infinitely many complex dimensions, so you'll have to settle for two real ones. Surprisingly, it turns out to be enough to understand most of the important stuff.

What's important to understand is that this is just an abstract, mathematical description of our state. The fact that the states are orthogonal has nothing to do with orthogonality in physical space. For example, spin-up and spin-down are represented like this even though "up" and "down" aren't perpendicular in physical space. Similarly, you may be used to thinking of the above lines as the x- and y- axes, but resist that temptation: here we'll call them the $z_+$ and $z_-$ axes.

This 2D plane is called the "state space" of the particle, and represents all possible values of the states it can have. Do not think of it as physical space in any way.

Superposition

In classical physics, a particle can only be in one state or the other. In QM, it can be in a superposition. It's a fancy word for any vector that's not directly along one axis:

$\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})$

The $\frac{\sqrt{2}}{2}$ is there just to make the length equal to 1.
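As a quick check, assuming $\vec{z_+}$ and $\vec{z_-}$ each have length 1 and are orthogonal (so the Pythagorean theorem applies), the squared length works out to:

$$\left|\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})\right|^2 = \left(\frac{\sqrt{2}}{2}\right)^2 + \left(\frac{\sqrt{2}}{2}\right)^2 = \frac{1}{2} + \frac{1}{2} = 1$$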

To understand what states "mean," we have to understand what we do with states.

Measurement

If we have a particle in the above state ($\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})$), we can measure it to see if it's "really" in state $\vec{z_+}$ or state $\vec{z_-}$. The probability of that measurement resulting in state $\vec{z_+}$ is given by projecting the state vector onto the $z_+$-axis and squaring the resulting length. You can think of it as "dropping a perpendicular" (technically, it's taking the inner product and squaring its absolute value):

You can see that the green line touches the $z_+$-axis at $\frac{\sqrt{2}}{2}$, which squared gives $\frac{{1}}{2}$. If you do the same with the other axis, you'll get the same result. This tells us that the particle has a 50-50 chance of being measured as either $z_+$ or $z_-$. Once the state has been measured as (say) $z_+$, the state becomes simply $z_+$ no matter what it was before. This is sometimes called a "collapse."

Another thing to notice is that if the particle state is $\vec{z_-}$, then measuring it as $\vec{z_+}$ (or vice versa) will happen 0% of the time, since projecting onto an orthogonal vector gives you a zero vector. This is why we require the corresponding states to be orthogonal: if the particle is definitely in one state, it is definitely not in the other state, and this is how we make the math work out.
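If you'd rather see the projection rule as arithmetic, here is a minimal sketch in plain Python. It assumes our simplified real 2D model, with each state written as a pair of components (first the $z_+$ part, then the $z_-$ part). The names `dot` and `probability` are made up for this illustration.

```python
import math

# Basis states written as (z_plus component, z_minus component).
Z_PLUS = (1.0, 0.0)
Z_MINUS = (0.0, 1.0)

def dot(a, b):
    """Real inner product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

def probability(state, outcome):
    """Project `state` onto `outcome` and square the resulting length."""
    return dot(state, outcome) ** 2

s = math.sqrt(2) / 2
superposition = (s, s)  # sqrt(2)/2 * (z_plus + z_minus)

print(probability(superposition, Z_PLUS))   # roughly 0.5, up to floating-point rounding
print(probability(superposition, Z_MINUS))  # roughly 0.5, up to floating-point rounding
print(probability(Z_MINUS, Z_PLUS))         # orthogonal states give 0.0
```

The two probabilities for the superposition add up to 1, which is what the normalization factor was for.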

A key point is that we chose to measure it with respect to the $z_+$/$z_-$ axis. But as we shall soon see, that's not our only choice.

Change of basis

In a 2-dimensional plane, there are many choices of axes. For example, we could choose the blue and green lines:

A different axis

What shall we call this axis? Well, it turns out that we got lucky: just like in our previous example, this particular axis has a physical interpretation. The blue line corresponds to spin-up in the (physical) x-direction, and the green line to spin-down.

Why do I say that? The answer, as always, is that it produces a model that agrees with physical experiments. Specifically, if we have an electron that's just been measured to be $x_+$ (i.e., spin-up in the x-direction, corresponding to the blue line), and we measure it in the $z_+$/$z_-$ basis, we actually get each result with 50% probability -- just as the result from the previous section suggests. You can see that the model predicts the same result (50%) for each of the following experiments:

Particle starts in $\vec{x_+}$. What are the odds of it being measured as $z_+$?
Particle starts in $\vec{z_+}$. What are the odds of it being measured as $x_+$?
Particle starts in $\vec{x_-}$. What are the odds of it being measured as $z_+$?
Particle starts in $\vec{z_+}$. What are the odds of it being measured as $x_-$?
Particle starts in $\vec{x_+}$. What are the odds of it being measured as $z_-$?
Particle starts in $\vec{z_-}$. What are the odds of it being measured as $x_+$?
Particle starts in $\vec{x_-}$. What are the odds of it being measured as $z_-$?
Particle starts in $\vec{z_-}$. What are the odds of it being measured as $x_-$?

And sure enough, these predictions match up with experimental observations. And that is why we call the blue-green axis the $x_+$/$x_-$ axis: because it makes the math match the experiments, and that's what we want out of a model!
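The eight experiments above can be checked mechanically. This is only a sketch under the assumptions of our real 2D model: $\vec{x_+}$ is taken to be $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ and $\vec{x_-}$ to be $(\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2})$ in the $z_+$/$z_-$ basis, and the `STATES` table and `probability` helper are names invented for this example.

```python
import math

s = math.sqrt(2) / 2

# Each state as (z_plus component, z_minus component) in the simplified model.
STATES = {
    "z+": (1.0, 0.0),
    "z-": (0.0, 1.0),
    "x+": (s, s),
    "x-": (s, -s),
}

def probability(start, outcome):
    """Squared inner product between the starting state and the outcome state."""
    a, b = STATES[start], STATES[outcome]
    return (a[0] * b[0] + a[1] * b[1]) ** 2

# The eight (start, outcome) pairs, in the order listed above.
experiments = [("x+", "z+"), ("z+", "x+"), ("x-", "z+"), ("z+", "x-"),
               ("x+", "z-"), ("z-", "x+"), ("x-", "z-"), ("z-", "x-")]

results = {pair: probability(*pair) for pair in experiments}
for (start, outcome), p in results.items():
    print(f"{start} -> {outcome}: {p:.2f}")
```

Every line comes out as 0.50, because each pair of vectors meets at 45 degrees (or 135 degrees, and the minus sign disappears when we square).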

As an aside, the above is a rough demonstration of the Heisenberg Uncertainty Principle: knowing the value of a property along one axis makes it uncertain with respect to other axes in the same state space. Momentum and position are similarly just different bases for the same infinite-dimensional space, but that's harder to see visually.

In general, we could pick any axis we want, and it may correspond to some interesting physical property. This just happens to be an easily drawn case with easily interpreted physical significance.
(If you're wondering which axis corresponds to the $y_+$/$y_-$ states, the answer is that we'd have to introduce complex numbers (i.e., numbers with imaginary parts), which can't be drawn on our real 2D plane.)

Interference

The interesting thing about the two-slit experiment is the interference pattern, which we sometimes see and sometimes don't. It turns out that we can model interference with our greatly simplified system, even though it won't look anything like the pretty pattern we see on the screen.

Recall from above that we can write out a state vector in different ways depending on the basis we choose. For example, $\vec{x_+}$ can also be rewritten as $\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})$. It's the same state, just rewritten.

Now, we can ask about the odds of measuring that state as $x_-$. From the former description ($\vec{x_+}$) it should be immediately obvious that the probability is zero: the state in question is orthogonal to the outcome we want. A particle that's spin-up in the x direction is by definition not spin-down in the same direction.

But if we write it the second way, it may not be so obvious:

What is the probability that a particle in state $\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})$ is measured as $x_-$?

The answer is the same, because rewriting a vector in a different basis doesn't actually change anything. But if we're not being careful, we might try to reason as follows:

Hmm... well the particle is either in state $\vec{z_+}$ or $\vec{z_-}$. In both cases, the probability of being measured as $x_-$ is 50%. So on average, we still have a 50% chance!

The above reasoning would be correct in classical mechanics, but it fails in quantum mechanics! Somehow the two "components" ($\vec{z_+}$ and $\vec{z_-}$) have "destructively interfered" to give a result of zero! Similarly, if the question had been to calculate the probability of getting result $x_+$ (for which the result is 100%) we might say that they "constructively interfere."

And that is the essence of the difference between classical and quantum physics: the either/or reasoning fails, because the rules involve these (generally complex-valued) vectors that can represent superpositions whose behavior differs from anything we know in the classical world. Interference is the manifestation of this fact.
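To make the contrast concrete, here is a small sketch that computes the answer both ways for the question above. It assumes our real 2D model; the variable names (`weights`, `classical`, `quantum`) are made up for the illustration. The "classical" number averages the probabilities of the two branches, while the "quantum" number adds the signed amplitudes first and squares afterwards.

```python
import math

s = math.sqrt(2) / 2
Z_PLUS, Z_MINUS = (1.0, 0.0), (0.0, 1.0)
X_MINUS = (s, -s)

def dot(a, b):
    """Real inner product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

# The state sqrt(2)/2 * (z_plus + z_minus), as a weight on each basis vector.
weights = {Z_PLUS: s, Z_MINUS: s}

# Either/or reasoning: pick a branch with probability weight**2,
# then ask how likely that branch is to be measured as x_minus.
classical = sum(w ** 2 * dot(basis, X_MINUS) ** 2 for basis, w in weights.items())

# Quantum rule: add the signed amplitudes of both branches, then square.
amplitude = sum(w * dot(basis, X_MINUS) for basis, w in weights.items())
quantum = amplitude ** 2

print("either/or reasoning:", round(classical, 3))  # about 0.5
print("quantum rule:       ", round(quantum, 3))    # about 0.0
```

The only difference between the two calculations is where the squaring happens, and that is where the minus sign gets a chance to cancel something.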

Also note that in the first way of writing it ($\vec{x_+}$), it's not a "superposition" of anything. This means that "superposition" is not an intrinsic feature of a system, but of our way of thinking about it. Similarly, the word "interference" implies that the "components" in the superposition are interfering, and is thus just one way of thinking about things.

(For a more mathematical treatment, see details in footnote [1].)

Entanglement

Another interesting property of quantum systems is entanglement. If you want to understand it properly, you have to become familiar with tensor products.

"If you really want to impress your friends and confound your enemies, you can invoke tensor products… People run in terror from the $\otimes$ symbol." -- some Stanford professor, apparently

They're really not that scary, but they're beyond the scope of this guide. So we'll have to make do with another simplification.

Suppose we have two systems: on the left, a particle whose z-spin we care about, and on the right, a detector that's going to measure the z-spin of the particle. Let's call the detector states $\vec{dz_+}$ and $\vec{dz_-}$. To represent the joint state of the two systems, we introduce this operator ($\otimes$) that is vaguely analogous to multiplication:

Joint state = particle state $\otimes$ detector state

So, for example, we might have:
$$(\vec{z_+} + \vec{z_-}) \otimes (\vec{dz_+} + \vec{dz_-})$$

Just like regular multiplication, it distributes over addition, so we can rewrite this:

$$(\vec{z_+} \otimes \vec{dz_+}) + (\vec{z_+} \otimes \vec{dz_-}) + (\vec{z_-} \otimes \vec{dz_+}) + (\vec{z_-} \otimes \vec{dz_-})$$

We can interpret the above in the following way. There are four possible outcomes, corresponding to each term in the sum:

$\vec{z_+} \otimes \vec{dz_+}$: particle is spin-up and detector reads spin-up
$\vec{z_+} \otimes \vec{dz_-}$: particle is spin-up and detector reads spin-down
$\vec{z_-} \otimes \vec{dz_+}$: particle is spin-down and detector reads spin-up
$\vec{z_-} \otimes \vec{dz_-}$: particle is spin-down and detector reads spin-down

If the particle has not yet encountered the detector, it's okay that we have some states (2 and 3) where the particle and detector don't agree. But if we have a functioning detector, then after the measurement, we will get only two components in our joint state:

$$(\vec{z_+} \otimes \vec{dz_+}) + (\vec{z_-} \otimes \vec{dz_-})$$

It turns out that the distributive law doesn't help us reduce this state to one of the form:

(some state for the particle) $\otimes$ (some state for the detector)

So the two systems are said to be entangled. The physical interpretation is that knowing something about the particle's state gives us information about the detector's state (and vice versa) -- which is just what you'd expect out of a working detector.
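Here is one way to sketch the "can it be factored?" question in code. The joint state is stored as a 2x2 table of coefficients, one entry per term in the sum (rows for the particle, columns for the detector). The test itself is a standard linear-algebra fact rather than anything from this guide: a 2x2 table can be written as (particle state) $\otimes$ (detector state) exactly when its determinant is zero. The function names are invented for the example.

```python
def tensor(particle, detector):
    """Coefficient table c[i][j] = particle[i] * detector[j]."""
    return [[p * d for d in detector] for p in particle]

def is_product_state(c, tol=1e-12):
    """True when the 2x2 table factors into particle (x) detector (determinant is zero)."""
    return abs(c[0][0] * c[1][1] - c[0][1] * c[1][0]) < tol

# (z+ + z-) (x) (dz+ + dz-): all four terms present with equal weight.
before = tensor((1.0, 1.0), (1.0, 1.0))

# (z+ (x) dz+) + (z- (x) dz-): only the two "agreeing" terms survive.
after = [[1.0, 0.0],
         [0.0, 1.0]]

print(before)                    # [[1.0, 1.0], [1.0, 1.0]]
print(is_product_state(before))  # True: not entangled
print(is_product_state(after))   # False: entangled
```

Like the formulas above, the tables are left unnormalized, which doesn't affect whether they factor.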

Loss of interference

In the previous section, we saw one example where the systems were perfectly unentangled (knowing information about one system told you nothing about the other) and one where they were perfectly entangled (knowing the state of one system told you exactly the state of the other). Given different weights for the four terms we looked at, a system can also be partially entangled.

It turns out that the more entangled the system is, the less interference we can exhibit within the individual systems. When it is unentangled, it is as though the particle has never interacted with the detector, and it is free to exhibit interference. When it becomes perfectly entangled, it shows no interference. And with partial entanglement, there will be partial loss of interference.

This is all well understood and easy to show mathematically (though I won't do it here). For "spookiness" we will have to look elsewhere.
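For the curious, here is a rough numerical sketch of that claim, with some assumptions of my own. The particle starts in $\vec{x_+}$. The detector ends up in the vector (1, 0) if the particle was $\vec{z_+}$, and in (cos θ, sin θ) if it was $\vec{z_-}$. The angle θ is a hypothetical knob: 0 degrees means the detector can't tell the two cases apart (no entanglement), and 90 degrees means it tells them apart perfectly. We then ask how likely the particle is to be measured as $x_-$, ignoring what the detector reads.

```python
import math

def prob_x_minus(theta):
    """Chance the particle is measured as x_minus, whatever the detector reads."""
    s = math.sqrt(2) / 2
    d_plus = (1.0, 0.0)                           # detector state paired with z_plus
    d_minus = (math.cos(theta), math.sin(theta))  # detector state paired with z_minus
    # Joint state: s * (z_plus (x) d_plus + z_minus (x) d_minus).
    # Since x_minus = s * (z_plus - z_minus), projecting the particle onto
    # x_minus leaves the detector vector s*s*d_plus - s*s*d_minus.
    leftover = tuple(s * s * (p - m) for p, m in zip(d_plus, d_minus))
    # The probability is the squared length of what is left over.
    return leftover[0] ** 2 + leftover[1] ** 2

for degrees in (0, 30, 60, 90):
    print(degrees, round(prob_x_minus(math.radians(degrees)), 3))
```

At 0 degrees the probability is 0 (full destructive interference, just like the lone particle). It climbs to 0.25 at 60 degrees and reaches 0.5 at 90 degrees, which is the "either/or" answer with no interference left.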

Collapse

You'll recall that earlier I said that if a particle is in $\frac{\sqrt{2}}{2}(\vec{z_+} + \vec{z_-})$ and we measure it, it collapses to one of those two basis states. But this presents two difficulties.

First, this process is not deterministic: there seems to be no way of knowing, a priori, whether the result of the collapse is going to be $\vec{z_+}$ or $\vec{z_-}$.

Second, it is not invertible. Whatever the state was before the collapse, the state is now (say) $\vec{z_+}$. You can't tell where it came from; some information is lost.

These are not just aesthetic concerns. Everything else we know about the dynamical evolution of quantum systems is deterministic and invertible (among other properties of unitary operators), so we're left with the question: what constitutes a collapse? After all, if our theory is complete, it needs to tell us when we should apply the general unitary evolution rules, and when we have to invoke this special one-off rule.
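A toy sketch of the invertibility difference, assuming our real 2D model: a rotation of the state space stands in for unitary evolution (real unitary operators are more general than this), and `collapse_to_z_plus` is a made-up helper that projects onto $\vec{z_+}$ and rescales to length 1.

```python
import math

def rotate(state, theta):
    """Rotate the state vector by theta: a real-valued stand-in for unitary evolution."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * state[0] - s * state[1], s * state[0] + c * state[1])

def collapse_to_z_plus(state):
    """Collapse onto z_plus: keep only the z_plus component, then rescale to length 1."""
    length = abs(state[0])
    if length == 0:
        raise ValueError("this state can never be measured as z_plus")
    return (state[0] / length, 0.0)

r = math.sqrt(2) / 2
x_plus, x_minus = (r, r), (r, -r)

# Rotation is invertible: rotating back by -theta recovers the original state.
theta = 0.3  # arbitrary angle, chosen only for illustration
recovered = rotate(rotate(x_plus, theta), -theta)
print(recovered)  # x_plus again, up to floating-point rounding

# Collapse is not: two different starting states end up identical.
same_after_collapse = collapse_to_z_plus(x_plus) == collapse_to_z_plus(x_minus)
print(same_after_collapse)  # True
```

Once both states have landed on (1.0, 0.0), no function can tell you which one you started with.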

Did the particle collapse when it met the detector, or did it just become entangled (which is a unitary operation)? With small examples (for example, two particles), we can do experiments that show that the system is in a superposition. In fact, the whole field of quantum computing is based on this kind of behavior. And yet in the macroscopic world we never see superpositions -- whatever that would even mean.

So then when does the collapse really happen?

The heart of the matter

Let's back up and figure out why we're even talking of a "collapse."

Recall the two-slit experiment. In one case, the light was spread out more like a wave, and in the other, it behaved more like a particle -- as though the broad wave function "collapsed" into just two possible paths (one for each slit). In that latter case, the interference pattern disappeared.

From Physics Stack Exchange

But this fact can be explained through entanglement: the particle becomes entangled with the detector, and so the particle (by itself) will not show interference. As we said, that's easily shown with the math. So is that it? Is the problem solved?

Not quite. The particle still could have taken either of two paths. The same math describes the system as a whole as being in a superposition of states -- one in which the particle went left (and the detector says left) and one in which it went right (and the detector says right):
$$(\vec{particle_{left}} \otimes \vec{detector_{left}}) + (\vec{particle_{right}} \otimes \vec{detector_{right}})$$

So how come we only see one state? This fact can surely still be called a "collapse," and so the question remains: when does it happen?

Founders of QM to the rescue!

According to the Copenhagen Interpretation -- which for most of the 20th century was by far the most accepted, and continues to be the preferred school of thought -- the answer is basically a giant shrug.

The next best thing: a giant slug

Niels Bohr (one of the founders of QM and the Copenhagen school) never mentioned "collapse". Heisenberg (the other main dude who developed the interpretation) basically just said that it happens whenever an apparatus makes a measurement.

So much for that excursion.

Decoherence

Unsurprisingly, some in the physics community found this giant question mark a little troubling. One of the key developments in the latter half of the 20th century was decoherence. If there's anything surprising about it, it's that it wasn't developed much sooner.

Basically, in any realistic system, the particles are going to encounter lots of other particles in the surrounding environment. Each interaction only leads to partial entanglement, since any one random particle is unlikely to give you perfect information about the rest of the system. In aggregate, however, trillions of particles will cause enough entanglement that any hope of exhibiting interference will be effectively lost. This process is known as decoherence. Even if it doesn't happen in well-controlled lab settings, your own body and brain are going to cause it.

This explains why the macroscopic world doesn't look spooky. It also has a big advantage over the "collapse" interpretation, which, being binary (collapsed vs not-collapsed), has a hard time accounting for partial loss of interference. But in order to answer the core question of why you see only one outcome, it has to do something a little funny.

You see, dear reader, when you observe the system, you become entangled with it:

$$(\vec{z_+} \otimes \vec{dz_+} \otimes \vec{reader_+}) + (\vec{z_-} \otimes \vec{dz_-} \otimes \vec{reader_-})$$

In other words, no collapse ever "really" takes place. Instead, there are now two copies of you, inhabiting roughly parallel universes. In one, the particle is in $\vec{z_+}$, and in the other, it's in $\vec{z_-}$.

Infinite parallel universes!

The explanation is known as the Many Worlds Interpretation, and it's very much in vogue:

The Many-Worlds Interpretation (MWI) of quantum mechanics holds that there are many worlds which exist in parallel at the same space and time as our own. The existence of the other worlds makes it possible to remove randomness and action at a distance from quantum theory and thus from all physics.

According to this explanation, each time there is a measurement with multiple possible outcomes, the universe splits so that all of them happen. When you encounter the system, you become entangled with it, and thus you also split into two copies, each inhabiting universes with different results, that effectively cannot interfere with each other.

This may leave you feeling a little dissatisfied. If there are two (actually, infinitely many!) copies of you inhabiting parallel universes, which one is "the real you?" Why are you inhabiting this one and not some other one where things went differently? Have we really solved anything?

Such questions belong to philosophy, and not physics, they say. A common explanation is that there's no reason to privilege "this you" over all the "other you"s: in each parallel universe, there's probably another you asking why she inhabits that universe. So, you see, there's nothing very strange going on after all! Problem solved!

An experimental test

That's all well and good, but it still feels meaningful to ask why "I" ended up in this universe and not that one, doesn't it? Sure, all those copies of me may technically also be "me," but not in the way that really seems to matter. And so the original itch isn't really scratched.

So maybe you start thinking: do I have any freedom of choice over which path I take? And so I leave you with the following challenge.

Suppose you are sitting in front of a machine that is generating electrons in the $\frac{\sqrt{2}}{2}(\vec{z_+}+ \vec{z_-})$ state once a second. Each time, you measure it in the $z_+ / z_-$ basis, and the universe splits:

The solid red circle represents the "real you" (from your perspective), and the empty ones are all the "other yous." (Ignore the fact that it splits in three in the upper-right. I stole this picture.)

Looking at the diagram from bottom to top, you might ask, at any given branching point: which path will the "real me" take? And such a question might be discarded as meaningless, since as we said, maybe in some funny sense they are all really you.

On the other hand, the "you" that is reading this can meaningfully ask, looking from the top down: what happened along the path that "this me" took? In other words, what does my history look like? How many times did the experiment yield $z_+$ as opposed to $z_-$? That's well-defined physically.

If the above description of QM is true, you should expect that in the long run, the results become arbitrarily close to 50-50. Conversely, if your history looks like it was basically a bunch of random and unsurprising events -- not just for electron experiments, but also macroscopic ones, which after all started out in the quantum domain -- then maybe it's not so interesting to know "why this path?" Because randomness, that's why.
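If you don't have an electron gun handy, here is a crude simulation of one observer's history. To be clear about the assumptions: it uses Python's ordinary pseudo-random generator as a stand-in for quantum randomness, the seed is arbitrary, and `run_history` is a name invented for this sketch. All it illustrates is the "long run" part of the claim.

```python
import random

def run_history(trials, seed=0):
    """Fraction of z_plus results along one simulated history.

    The seed only makes the run repeatable; it has no physical meaning.
    """
    rng = random.Random(seed)
    p_plus = 0.5  # probability of z_plus for sqrt(2)/2 * (z_plus + z_minus)
    plus_count = sum(1 for _ in range(trials) if rng.random() < p_plus)
    return plus_count / trials

for trials in (10, 1000, 100000):
    print(trials, run_history(trials))
```

Short histories can wander quite far from one half, but the fraction settles down as the number of measurements grows.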

But if not -- if the odds diverge significantly -- then you might start to wonder: why did "this me" end up in a relatively unexpected path? Was it pure chance? Or do I have some control over the path?

Such a question is meaningless within the structure of QM and the Many Worlds Interpretation. The fact that "this" you keeps getting lucky is neither here nor there; they're all you, and on average they didn't beat house odds. If you happen to have some way to choose your path and the quantum police come a-knockin', you can just plead ignorance. Some of "you" had to get lucky, after all.

So, that's it! Get crackin' on your psychic, reality-shifting abilities! If you do manage to somehow choose your path, you've got nobody to answer to!

Footnotes

[1] Recall that to determine the probability of some state $\psi$ being measured as some outcome $\phi$, we take the inner product (and then square its absolute value). With real vectors we call this the dot product, and if you write out the vectors in component form it works like so:
$$(x_1, y_1) \cdot (x_2, y_2) = x_1 x_2 + y_1 y_2$$
So to determine the probability of transition from $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ to $(\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2})$ we get:
$$(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}) \cdot (\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}) = \frac{1}{2} + \frac{-1}{2} = 0$$
This canceling of positive and negative terms can be considered interference. Why? Because we're deciding to think of the components $(a, b)$ as representing two distinct possibilities, one of which produces a positive term ($\frac{1}{2}$) and the other a negative ($\frac{-1}{2}$) which cancel.

Note, however, that if we had written those vectors in the $x_+$/$x_-$ basis, it would have looked like this:
$$(1, 0) \cdot (0, 1) = 0 + 0 = 0$$
Same result, but no reason to call it "interference" here.
