Intro to Differentiable Swift, Part 3: Differentiable API Introduction
[Image: code going forward, and then going backwards. (This is from an earlier version of autodiff.)]

In Part 2, we got an introduction to Automatic Differentiation (aka AutoDiff).

Let’s get a feel for how to use it beyond a trivial example. In order to do so, we’ll need to understand a few things about how it works.

Disclaimer: There’s a risk that I over-simplify here. But an intuition is the first step to a rigorous mathematical understanding, right?

And just between you and me, you don’t actually need a perfectly rigorous understanding to use AutoDiff effectively. I certainly don’t! Don’t tell the mathematicians…

Here’s a start if you want the detailed version: https://github.com/apple/swift/blob/main/docs/DifferentiableProgramming.md

We’ll keep this high-level: when you ask AutoDiff for derivatives with respect to the inputs of your function, it starts by calling your function normally, with the inputs you gave it. This call is the ‘forward pass’ in AutoDiff terms. After the forward pass completes, the output of the function is known. Then the ‘reverse pass’ can happen, which goes through your function backwards, starting at the output, and partial derivatives propagate all the way back to the inputs. (Yes, that’s why it’s called backpropagation in Deep Learning. It’s propagating backwards, from the end of the function to the beginning.)
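
For example: for f(x) = x * x at x = 3, the forward pass computes 9. The reverse pass then starts at the output with a seed of 1 and multiplies it by the local derivative of each operation on the way back; here that’s 2x = 6, so a gradient of 6 arrives at the input.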

Once it’s done, you have derivatives for all the inputs!

So here are the corresponding API bits:

Let’s start with a simple function, which we’ll label with @differentiable(reverse).

import _Differentiation

let input: Double = 3

@differentiable(reverse)
func f(x: Double) -> Double {
    return x * x
}

We’ve already seen gradient(at:of:). It gives you the derivatives for all inputs.

let gradientAtThree = gradient(at: input, of: f)
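// gradientAtThree == 6.0, since d/dx (x * x) = 2x and 2 * 3 = 6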

Here’s how to do the same thing, but also get the function output:

let (output, gradientAtThree) = valueWithGradient(at: input, of: f)
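// output == 9.0, gradientAtThree == 6.0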

You now know enough to be dangerous!

Here’s a little more detail, if you’re interested:

It’s possible to split up the forward and reverse pass into separate calls. To understand the API, you have to know what a ‘pullback’ is. There’s not necessarily anything to understand except the word: the reverse pass is done by a function called the pullback. It’s the function that embodies the reverse pass. The pullback of your entire function is actually made up of many small pullbacks, a matching one for each operation in your function. It’s all composed automatically into a reverse “mirror” of the forward function that you wrote.*
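
To make that concrete, here’s a hand-rolled sketch of the idea. This is not the compiler’s actual output, and the names are made up; it just shows the shape of what AutoDiff generates:

func addOneWithPullback(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    // d/dx (x + 1) = 1, so the derivative passes through unchanged.
    return (x + 1, { upstream in upstream })
}

func squareWithPullback(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    // Reverse: scale the incoming derivative by the local derivative,
    // d/dx (x * x) = 2x.
    return (x * x, { upstream in upstream * 2 * x })
}

// h(x) = square(addOne(x)): the forward passes run first-to-last,
// and the pullbacks compose last-to-first (that’s the chain rule).
func hWithPullback(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let (y, pullbackAddOne) = addOneWithPullback(x)
    let (z, pullbackSquare) = squareWithPullback(y)
    return (z, { upstream in pullbackAddOne(pullbackSquare(upstream)) })
}

let (h, pullbackOfH) = hWithPullback(3)
// h == 16.0, pullbackOfH(1) == 8.0, i.e. 2 * (3 + 1)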

Here we’ll do the same thing as above, a forward pass and reverse pass, but split it into two calls:

forward pass:

let (output, pullback) = valueWithPullback(at: input, of: f)

and reverse pass:

let gradientAtThree = pullback(1)
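// gradientAtThree == 6.0, just like the single-call version above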

You’ll notice I called pullback with an argument of 1, which might have looked like a magic number. It’s not. The partial derivatives that propagate backwards through the function during the reverse pass have to start with a seed value at the output. That seed is the derivative of the output with respect to itself, which is exactly 1. So that’s why I called the pullback with 1.

(When you call gradient(at:of:), autodiff calls the pullback with 1 as well; it’s just implicit.)
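
(And since the pullback is a linear function, seeding it with something other than 1 simply scales the result: pullback(2) would return 12 here, twice the derivative.)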

If you had a function with two outputs, you’d seed the pullback with a 1 for each output, like this: pullback(1, 1). When you’re using the derivatives for optimization purposes, though, you almost always end your function with a cost/loss calculation, which is a single number.

You now understand a good chunk of the high-level API. Well done!

In Part 4, we’ll get some more details about the AutoDiff API.

Automatic Differentiation encompasses at least two ‘modes’, or methods of getting derivatives. Here we’ve only talked about Reverse Mode. The other is Forward Mode. Reverse Mode is the one that gives you derivatives for all the inputs in an efficient way (assuming you’ve got relatively few outputs). Forward Mode is cool too, but that’s for another day. Here’s a useful article.
Automatic Differentiation in Swift is still in beta. You can download an Xcode toolchain with import _Differentiation included from here (you must use a toolchain under the title “Snapshots -> Trunk Development (main)”).
When the compiler starts giving you errors you don’t recognize, check out this guide on the less mature aspects of Differentiable Swift. (Automatic Differentiation in Swift has come a long way, but there are still sharp edges. They are slowly disappearing, though!)
Automatic Differentiation in Swift exists thanks to the Differentiable Swift authors (Richard Wei, Dan Zheng, Marc Rasi, Brad Larson, et al.) and the Swift Community!
See the latest pull requests involving AutoDiff here.

*It’s not a one-to-one mirror (because of something called the chain rule), but it’s good enough to think about it that way for your first intuition.
