We will start by discussing what a differential is. In some ways this is familiar, but in other ways it might not be.
Let us start by revisiting an example in the one-dimensional case. We proceed informally for now. Intuitively, a differential is just a very small change in some function $\varphi$ caused by some very small change in its input. That is, when $\varphi$ is a function evaluated at $x$ and when $dx$ is very small, we have that:
$$\varphi(x + dx) - \varphi(x) \approx d\varphi_x$$where the subscript $x$ specifies the point at which we are differentiating. Let us work out a simple example. Suppose we have the function $\varphi(x) = x^3$. It is easy to compute the differential using the binomial theorem. Since $dx \approx 0$, and squaring a number of magnitude less than 1 makes it smaller, the higher-order terms in $dx^2$ and $dx^3$ are negligible compared to the term in $dx$, so:
$$ \begin{align*} \varphi(x + dx) - \varphi(x) &= (x + dx)^3 - x^3\\ &= x^3 + 3x^2\,dx + 3x\,dx^2 + dx^3 - x^3\\ &\approx 3x^2\,dx \end{align*} $$Therefore $d\varphi_x = 3x^2\,dx$. What it means for a function to be differentiable is that when we do this sort of thing, we always end up with a sensible result like this one for the differential: it is just some number, the derivative $\varphi'(x)$, multiplied by the size of the tiny nudge $dx$. That is, $d\varphi_x = \varphi'(x)\,dx$.
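The claim that the dropped terms are negligible can be checked numerically. The following is a minimal sketch in plain Python; the helper names `phi`, `dphi`, and `linearization_error` are illustrative choices, not notation from the text. It compares the actual change $\varphi(x + dx) - \varphi(x)$ with the differential $3x^2\,dx$ for shrinking nudges.

```python
def phi(x):
    """The example function phi(x) = x^3."""
    return x ** 3


def dphi(x, dx):
    """The differential at x: the derivative 3x^2 times the nudge dx."""
    return 3 * x ** 2 * dx


def linearization_error(x, dx):
    """Difference between the actual change in phi and the differential.

    Algebraically this equals the dropped terms 3x*dx^2 + dx^3.
    """
    actual_change = phi(x + dx) - phi(x)
    return actual_change - dphi(x, dx)


if __name__ == "__main__":
    x = 2.0
    for dx in (1e-1, 1e-2, 1e-3):
        err = linearization_error(x, dx)
        # err / dx is the error relative to the size of the nudge.
        print(f"dx={dx:g}  error={err:.3e}  error/dx={err / dx:.3e}")
```

If the reasoning above holds, the error should shrink roughly like $dx^2$, so the ratio of the error to $dx$ itself tends to zero as the nudge gets smaller.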
To understand the generalization into higher dimensions, it is crucial to understand a few details:
With that out of the way, we can finally ask: what does it take to generalize the derivative and differential to higher dimensions? The answer is basically nothing. A computational example should help illustrate this. Consider a function $\varphi: \mathbb{R}^3 \rightarrow \mathbb{R}^2$ defined by:
$$\varphi\begin{pmatrix}x \\ y \\ z \\ \end{pmatrix} = \begin{pmatrix}x^2y\\ y^2 + z\end{pmatrix}$$We can find the differential by fixing $a = (x, y, z)$ and considering a nudge vector $da = (dx, dy, dz)$, then doing the same thing we did last time.
$$ \begin{align*} \varphi(a + da) - \varphi(a) &= \begin{pmatrix} (x + dx)^2(y + dy) \\ (y + dy)^2 + (z + dz) \end{pmatrix} - \begin{pmatrix}x^2y\\ y^2 + z\end{pmatrix}\\ &= \begin{pmatrix} (x^2 + 2x\,dx + dx^2)(y + dy) \\ y^2 + 2y\,dy + dy^2 + z + dz \end{pmatrix} - \begin{pmatrix}x^2y\\ y^2 + z\end{pmatrix}\\ &= \begin{pmatrix} 2xy\,dx + y\,dx^2 + x^2\,dy + 2x\,dx\,dy + dx^2\,dy \\ 2y\,dy + dy^2 + dz \end{pmatrix}\\ &\approx \begin{pmatrix} 2xy\,dx + x^2\,dy \\ 2y\,dy + dz \end{pmatrix}\\ &= \begin{pmatrix} 2xy & x^2 & 0 \\ 0 & 2y & 1 \end{pmatrix}\begin{pmatrix} dx \\ dy \\ dz \end{pmatrix}\\ &= \varphi'(a)\,da \end{align*} $$Once again, the same sort of thing happens: the differential is a matrix $\varphi'(a)$ applied to the nudge vector $da$. How can we make this more formal? [TODO: Add detailed motivation]
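Before formalizing anything, the matrix computation above can be sanity-checked numerically. This is a minimal sketch in plain Python, with vectors as tuples; the helper names `jacobian`, `apply_matrix`, and `linearization_error` are hypothetical and chosen only for illustration. It compares $\varphi(a + da) - \varphi(a)$ with the matrix-vector product $\varphi'(a)\,da$.

```python
def phi(a):
    """The example map from R^3 to R^2: (x, y, z) -> (x^2 y, y^2 + z)."""
    x, y, z = a
    return (x ** 2 * y, y ** 2 + z)


def jacobian(a):
    """The 2x3 matrix phi'(a) read off from the hand computation."""
    x, y, z = a
    return (
        (2 * x * y, x ** 2, 0.0),
        (0.0, 2 * y, 1.0),
    )


def apply_matrix(matrix, vector):
    """Multiply a matrix (tuple of rows) by a vector (tuple of numbers)."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


def linearization_error(a, da):
    """Largest componentwise gap between the actual change and phi'(a) da."""
    shifted = tuple(ai + di for ai, di in zip(a, da))
    actual_change = tuple(p - q for p, q in zip(phi(shifted), phi(a)))
    predicted_change = apply_matrix(jacobian(a), da)
    return max(abs(u - v) for u, v in zip(actual_change, predicted_change))


if __name__ == "__main__":
    a = (1.0, 2.0, 3.0)
    for h in (1e-1, 1e-2, 1e-3):
        da = (h, -2 * h, 3 * h)  # an arbitrary nudge direction, scaled by h
        print(f"h={h:g}  max error={linearization_error(a, da):.3e}")
```

As in the one-dimensional case, the gap is made up only of the second- and third-order terms that were dropped, so it should shrink roughly like $h^2$ as the nudge is scaled down.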