Week 2: Exercises#

The exercises are intended to be done by hand unless otherwise stated (such as when you are asked to plot a graph or run a script).

Exercises – Long Day#

1: Jacobian Matrices for Various Functions#

Below we define functions of the form \(\pmb{f}: \operatorname{dom}(\pmb{f}) \to \mathbb{R}^k\), where \(\operatorname{dom}(\pmb{f}) \subseteq \mathbb{R}^n\), and where \(n\) and \(k\) can be read off from the functional expression. In this exercise we will not concern ourselves with determining the precise domain \(\operatorname{dom}(\pmb{f})\), but simply mention that if, for example, \(\ln(x_3)\) appears in the functional expression, it is of course a requirement that \(x_3 > 0\).

Question a#

  1. Let \({f}(x_1, x_2, x_3) = x_1^2x_2 + 2x_3\). Compute the Jacobian matrix \(J_{f}(\pmb{x})\) and evaluate it at the point \(\pmb{x} = (1, -1, 3)\). Confirm that the Jacobian matrix of a scalar function of multiple variables has only one row.

  2. Let \(\pmb{f}(x) = (3x, x^2, \sin(2x))\). Compute the Jacobian matrix \(J_{\pmb{f}}(x)\) and evaluate it at the point \(x = 2\). Confirm that the Jacobian matrix of a vector function of a single variable has only one column.

  3. Let \(\pmb{f}(x_1, x_2) = (x_1^2, -3x_2, 12x_1)\). Compute the Jacobian matrix \(J_{\pmb{f}}(\pmb{x})\) and evaluate it at the point \(\pmb{x} = (2, 0)\).

  4. Let \(\pmb{f}(x_1, x_2, x_3) = (x_2 \sin(x_3), 3x_1x_2 \ln(x_3))\). Compute the Jacobian matrix \(J_{\pmb{f}}(\pmb{x})\) and evaluate it at the point \(\pmb{x} = (-1, 3, 2)\).

  5. Let \(\pmb{f}(x_1, x_2, x_3) = (x_1 e^{x_2}, 3x_2 \sin(x_2), -x_1^2 \ln(x_2 + x_3))\). Compute the Jacobian matrix \(J_{\pmb{f}}(\pmb{x})\) and evaluate it at the point \(\pmb{x} = (1, 0, 1)\).
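Although the exercises are meant to be done by hand, a computed Jacobian matrix is easy to double-check afterwards. A minimal SymPy sketch for the function in part 1 (assuming SymPy is installed):

```python
from sympy import symbols, Matrix

x1, x2, x3 = symbols('x1 x2 x3')
f = Matrix([x1**2 * x2 + 2*x3])   # a scalar function, written as a 1-vector

# the Jacobian with respect to (x1, x2, x3): a single row, as expected
J = f.jacobian(Matrix([x1, x2, x3]))
J_val = J.subs({x1: 1, x2: -1, x3: 3})
print(J_val)
```

The same pattern (a `Matrix` of component functions followed by `.jacobian`) works for parts 2 to 5 as well.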

Question b#

All the functions from the previous question are differentiable. How can this be argued? For which of the functions can we compute the Hessian matrix? Compute the Hessian matrix for the functions where it is defined.

Question c#

Let \(\pmb{v} = (1,1,1)\). Normalize the vector \(\pmb{v}\) and denote the result by \(\pmb{e}\). Check that \(||\pmb{e}||=1\). Calculate the directional derivative of the scalar function \({f}(x_1, x_2, x_3) = x_1^2x_2 + 2x_3\) at the point \(\pmb{x} = (1, -1, 3)\) in the direction given by \(\pmb{v}\). Then calculate \(J_f(\pmb{x}) \pmb{e}\). Compare it with the directional derivative. Are they equal? If so, is that a coincidence?
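If you want to verify your hand computation, both the normalization and the product \(J_f(\pmb{x})\pmb{e}\) can be checked with SymPy. A sketch:

```python
from sympy import symbols, Matrix, sqrt, simplify

x1, x2, x3 = symbols('x1 x2 x3')
f = Matrix([x1**2 * x2 + 2*x3])

v = Matrix([1, 1, 1])
e = v / v.norm()                       # the normalized direction vector

J = f.jacobian(Matrix([x1, x2, x3]))
J_at = J.subs({x1: 1, x2: -1, x3: 3})  # Jacobian at the point (1, -1, 3)
dir_deriv = (J_at * e)[0]              # the product J_f(x) e
print(dir_deriv)                       # compare with your hand computation
```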

2: The Jacobian Matrix of a Neural Network#

We consider a neural network \(\Phi: \mathbb{R}^2 \to \mathbb{R}^3\) with one hidden layer. The network sends an input \(\pmb{x}\) to an output \(\pmb{y}\) via the following steps:

  1. Affine transformation (layer 1): \(\pmb{z} = W_1 \pmb{x} + \pmb{b}_1\). We write \(T_{W_1,\pmb{b}_1}(\pmb{x}) = W_1 \pmb{x} + \pmb{b}_1\).

  2. Activation (ReLU): \(\pmb{h} = \operatorname{ReLU}(\pmb{z})\)

  3. Linear transformation (output without activation): \(\pmb{y} = W_2 \pmb{h}\). We write \(L_{W_2}(\pmb{h}) = W_2 \pmb{h}\).

So, the last activation function is the identity.

The parameters are given by:

\[\begin{split} W_1 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad \pmb{b}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 2 & 1 \\ 0 & -2 \\ -1 & 3 \end{bmatrix} \end{split}\]

We wish to determine the Jacobian matrix \(J_{\Phi}(\pmb{x})\).

Question a#

Let us first consider the scalar ReLU function \(\sigma = \text{ReLU}\) from \(\mathbb{R}\) to \(\mathbb{R}\). Find an expression for \(\sigma'(z)\) given by a piecewise-defined function. The expression is not defined for \(z=0\). Why not? Then find an expression for \(\pmb{J}_{\pmb{\operatorname{ReLU}}}\) for the ReLU vector function from \(\mathbb{R}^2\) to \(\mathbb{R}^2\).

We write \(\Lambda = \pmb{J}_{\pmb{\operatorname{ReLU}}}\) in the rest of this exercise.
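As a quick sanity check of your piecewise expression, here is a small numerical sketch of \(\sigma'\) and of \(\Lambda\) (NumPy assumed; the value returned at \(z=0\) is purely a convention, since the derivative does not exist there):

```python
import numpy as np

def relu_prime(z):
    # derivative of the scalar ReLU; it does not exist at z = 0,
    # and returning 0.0 there is only a convention
    return 1.0 if z > 0 else 0.0

def Lambda(z):
    # Jacobian of the ReLU vector function: a diagonal matrix
    # with relu_prime(z_i) on the diagonal
    return np.diag([relu_prime(zi) for zi in z])

print(Lambda(np.array([-3.0, 2.0])))
```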

Question b#

Find the Jacobian matrices for the two functions from, respectively, steps 1 and 3.

Question c#

Use the chain rule to establish a general expression for the Jacobian matrix \(J_{\Phi}(\pmb{x})\).

Question d#

We now consider the specific input \(\pmb{x}_0 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}\). Find the Jacobian matrix \(J_{\Phi}(\pmb{x}_0)\).

Question e#

We have now found the Jacobian matrix \(J_{\Phi}(\pmb{x}_0)\). If we change the input \(x_1\) by a tiny amount \(\epsilon\) (that is, \(\Delta \pmb{x} = [\epsilon, 0]^T\)), by about how much will the output vector \(\pmb{y}\) then change? Use your Jacobian matrix to answer this question.
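Your answer can be sanity-checked numerically: evaluate the network at \(\pmb{x}_0\) and at \(\pmb{x}_0 + \Delta\pmb{x}\), and compare the difference quotient with \(J_{\Phi}(\pmb{x}_0)\Delta\pmb{x}\). A NumPy sketch using the parameters of the exercise:

```python
import numpy as np

# the parameters from the exercise
W1 = np.array([[1.0, -1.0], [1.0, 1.0]])
b1 = np.array([-1.0, 0.0])
W2 = np.array([[2.0, 1.0], [0.0, -2.0], [-1.0, 3.0]])

def phi(x):
    z = W1 @ x + b1
    h = np.maximum(z, 0.0)   # ReLU
    return W2 @ h

x0 = np.array([0.0, 2.0])
eps = 1e-6
dy = phi(x0 + np.array([eps, 0.0])) - phi(x0)
print(dy / eps)   # should match J_Phi(x0) applied to [1, 0]^T
```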

Note

The Jacobian matrix describes the sensitivity of the network: it tells us precisely how much, and in which direction, the output changes when we make small adjustments to the input. This insight is important for neural networks, since it tells us which input parameters have the largest influence on the result.

Question f (Optional)#

Verify the calculation symbolically with Sympy. You can use this example for inspiration:

```python
from sympy import symbols, Matrix, Max

x1, x2 = symbols('x1 x2')
x = Matrix([x1, x2])

# note: these matrices differ from those in the exercise --
# the example is only meant as a template
W1 = Matrix([[1, -1], [2, 1]])
b1 = Matrix([-1, 0])
W2 = Matrix([[2, 1], [0, -3]])

z = W1 * x + b1
h = Matrix([Max(0, z[0]), Max(0, z[1])])  # ReLU(z)

y = W2 * h

J = y.jacobian(x)
J_val = J.subs({x1: 0, x2: 2})

print("Evaluated Jacobian matrix:")
display(J_val)  # display() requires a Jupyter/IPython environment
```

Evaluated Jacobian matrix:

\[\begin{split}\displaystyle \left[\begin{matrix}2 & 1\\-6 & -3\end{matrix}\right]\end{split}\]

3: Description of Sets in the Plane#

In each of the four cases below, draw a sketch of the given set \(\,A\,\), its interior \(\,A^{\circ}\,\), its boundary \(\,\partial A\,\) and its closure \(\,\bar{A}\,\). Furthermore, determine whether \(\,A\,\) is open, closed or neither. Finally, specify whether \(\,A\,\) is bounded or unbounded.

  1. \(\{(x,y) \mid xy\neq 0\}\)

  2. \(\{(x,y) \mid 0<x<1 \wedge 1\leq y\leq 3\}\)

  3. \(\{(x,y) \mid y\geq x^2 \wedge y<2 \}\)

  4. \(\{(x,y) \mid x^2+y^2-2x+6y\leq 15 \}\)
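For case 4, a few well-chosen test points let you check the description you obtain after completing the square. A small sketch (the sample points are chosen for illustration):

```python
def in_A4(x, y):
    # test the defining inequality of set 4 directly
    return x**2 + y**2 - 2*x + 6*y <= 15

# after completing the square you should be able to predict these
# results from the centre and radius you found
print(in_A4(1, -3), in_A4(1, 2), in_A4(6, -3), in_A4(1, 2.0001))
```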

4: All Linear Maps from \(\mathbb{R}^n\) to \(\mathbb{R}\)#

Let \(L: \mathbb{R}^n \to \mathbb{R}\) be an (arbitrary) linear mapping. Let \(e = \pmb{e}_1, \pmb{e}_2, \dots, \pmb{e}_n\) be the standard basis for \(\mathbb{R}^n\), and let \(\beta\) be the standard basis for \(\mathbb{R}\). Recall the standard basis from Mathematics 1a. Since the dimension of \(\mathbb{R}\) (over \(\mathbb{R}\)) is one, the standard basis for \(\mathbb{R}\) is simply the number \(1\).

Show that there exists a column vector \(\pmb{c} \in \mathbb{R}^n\) such that

\[\begin{equation*} L(\pmb{x}) = \pmb{c}^T \pmb{x} = \langle \pmb{x}, \pmb{c} \rangle \end{equation*}\]

where \(\langle \cdot, \cdot \rangle\) denotes the usual inner product on \(\mathbb{R}^n\). (The column vector is uniquely determined, but proving this is not part of this question.)
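The heart of the argument is that a linear map is determined by its values on the basis vectors, which suggests \(c_i = L(\pmb{e}_i)\). A numerical sketch with a made-up linear map (the map and its coefficients are purely illustrative):

```python
import numpy as np

def L(x):
    # a hypothetical linear map R^3 -> R, chosen only for illustration
    return 3*x[0] - x[1] + 2*x[2]

n = 3
# build c by applying L to the standard basis vectors e_1, ..., e_n
c = np.array([L(np.eye(n)[i]) for i in range(n)])

x = np.array([1.0, 4.0, -2.0])
print(L(x), c @ x)   # the two numbers agree, as the exercise predicts
```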

5: Linear(?) Vector Functions#

We consider the following two functions:

  1. \(f: \mathbb{R}^{2 \times 2} \to \mathbb{R}^{2 \times 2}, f(X) = C X B\), where \(C = \operatorname{diag}(2,1) \in \mathbb{R}^{2 \times 2}\) and \(B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\).

  2. \(g: \mathbb{R}^n \to \mathbb{R}, g(\pmb{x}) = \pmb{x}^T A \pmb{x}\), where \(A\) is an \(n \times n\) matrix (and isn’t the zero matrix).

Determine for each function whether it is a linear map. If the map is linear, find the mapping matrix with respect to:

  1. the standard basis \(E=\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\) in \(\mathbb{R}^{2 \times 2}\). Recall this example from Math1a

  2. the standard basis \(e\) in \(\mathbb{R}^n\). Recall this result from Math1a
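Before proving or disproving linearity, you can probe it numerically: test whether \(f(\alpha X+\beta Y)=\alpha f(X)+\beta f(Y)\) on random inputs. A single counterexample disproves linearity, while agreement is only evidence, not a proof. A sketch (the choice of \(A\) is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.diag([2.0, 1.0])
B = np.array([[1.0, 1.0], [0.0, 1.0]])
A = np.eye(2)   # any nonzero A will do for probing g

f = lambda X: C @ X @ B       # f(X) = C X B
g = lambda x: x @ A @ x       # g(x) = x^T A x

X, Y = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
x, y = rng.standard_normal(2), rng.standard_normal(2)
a, b = 2.0, -3.0

print(np.allclose(f(a*X + b*Y), a*f(X) + b*f(Y)))  # consistent with linearity
print(np.isclose(g(a*x + b*y), a*g(x) + b*g(y)))   # generically fails for g
```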

6: The Simple Chain Rule#

In this exercise we will be working with the simple chain rule given here.

We first consider a real function of two real variables given by the expression

\[\begin{equation*} g(x,y)=\ln(9-x^2-y^2). \end{equation*}\]

Question a#

Determine the largest possible domain of \(g\), and characterize it using concepts such as open, closed, bounded, and unbounded.

We now consider a parametrized curve \(\pmb{r}\) in the \((x,y)\) plane given by

\[\begin{equation*} \pmb{r}(u)=(u,u^3)\,,\,u\in \left[-1.2\,,\,1.2\right]. \end{equation*}\]

Question b#

Which curve are we talking about (you are familiar with its equation)?

We now consider the composite function

\[\begin{equation*} h(u) = g(\pmb{r}(u)). \end{equation*}\]

Question c#

What are the domain and co-domain of \(h = g \circ \pmb{r}\)?

Question d#

Determine \(h'(1)\) using two different approaches:

  1. Determine a functional expression for \(h(u)\) and differentiate it as usual.

  2. Use the chain rule from Section 3.7.
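Both approaches can be carried out in SymPy and compared; the following sketch mirrors the two steps above (assuming SymPy is installed):

```python
from sympy import symbols, ln, diff, simplify

u = symbols('u')
x, y = u, u**3                       # the curve r(u) = (u, u^3)

# approach 1: substitute first, then differentiate as usual
h = ln(9 - x**2 - y**2)
h1 = diff(h, u).subs(u, 1)

# approach 2: the chain rule h'(u) = grad g(r(u)) . r'(u)
xs, ys = symbols('x y')
g = ln(9 - xs**2 - ys**2)
chain = (diff(g, xs)*diff(x, u) + diff(g, ys)*diff(y, u)).subs({xs: x, ys: y})
h2 = chain.subs(u, 1)

print(simplify(h1 - h2))   # 0: the two approaches agree
```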

7: Partial Derivatives but not Differentiable#

We start with a simple function \(f\), which is differentiable everywhere. Let \(f:\mathbb{R}^2 \to \mathbb{R}\) be given by

\[\begin{equation*} f(x_1,x_2)=x_1^2-4x_1+x_2^2. \end{equation*}\]

Question a#

Let \(\pmb{x}_0 = (x_1,x_2) \in \mathbb{R}^2\) be an arbitrary point. Show that \(f\) is differentiable at \(\pmb{x}_0\), and calculate the gradient of \(f\) at \(\pmb{x}_0\).

Soft version: Use the result in this theorem

Hard version: Solve the question directly using the definition of differentiability in Section 3.6. We follow this latter approach in the hints and answer below.

Question b#

To conclude differentiability from the partial derivatives (see this theorem), the partial derivatives are required to be continuous. Why is it not enough that the partial derivatives merely exist? We will investigate this with an example. But first we generalize a well-known theorem (from high school) about functions of one variable: if a function is differentiable at a point, it is also continuous at that point.

Show that if a function of two variables is differentiable at a point \(\pmb{x}_0\), then it is also continuous at that point.

And now to the example that has named the exercise. We consider the function

\[\begin{equation*} g(x_1,x_2) = \begin{cases} \frac{x_1^2x_2}{x_1^4+x_2^2}, & \text{for } (x_1,x_2) \neq (0,0) \\ 0, & \text{for } (x_1,x_2)=(0,0) \end{cases} \end{equation*}\]

Question c#

Show that the partial derivatives of \(g\) exist at \((0,0)\), but that \(g\) is not differentiable at this point.
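A numerical experiment can guide your argument: compare the behaviour of \(g\) along the coordinate axes with its behaviour along the parabola \(x_2 = x_1^2\). A sketch:

```python
def g(x1, x2):
    # the function from the exercise, with the special value at the origin
    return x1**2 * x2 / (x1**4 + x2**2) if (x1, x2) != (0, 0) else 0.0

# approach (0,0) along the axes and along the parabola x2 = x1**2
for t in [0.1, 0.01, 0.001]:
    print(g(t, 0), g(0, t), g(t, t**2))
```

The values along the parabola do not tend to \(g(0,0) = 0\), which is the key observation for the non-differentiability argument.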

8: The Generalized Chain Rule#

In this exercise we will be using this theorem: Generalized chain rule

We are given the functions:

  1. \(\pmb{f} : \mathbb{R}^3 \to \mathbb{R}^2\) defined by \(\pmb{f}(x_1, x_2, x_3) = (f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3))\), where

    \[\begin{align*} f_1(x_1, x_2, x_3) &= x_1^2 + x_2^2 + x_3^2, \\ f_2(x_1, x_2, x_3) &= e^{x_1 + x_2} \, \cos(x_3). \end{align*}\]
  2. \(g : \mathbb{R}^2 \to \mathbb{R}\) defined by \(g(y_1, y_2) = y_1 \, \sin(y_2)\).

  3. The composition of these two functions: \(h = g \circ \pmb{f}\).

In this task we will calculate the Jacobian matrix of \(h\) (with respect to the variables \(x_1, x_2,\) and \(x_3\)) using the generalized chain rule. You are welcome to do the calculations in SymPy.

Question a#

Find a functional expression for \(h\) as well as the domain and co-domain. Calculate the gradient of \(h\).

Question b#

Calculate the Jacobian matrix of \(\pmb{f}\). Calculate the Jacobian matrix of \(g\). What is the connection between the gradient and the Jacobian matrix of \(g\)?

Question c#

Now apply the chain rule and the Jacobian matrices from the previous questions to find the Jacobian matrix of \(h\). Compare it with the answer in question a.
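If you do the calculations in SymPy, the comparison in this question can be automated; the sketch below forms \(J_g(\pmb{f}(\pmb{x}))\,J_{\pmb{f}}(\pmb{x})\) and subtracts the directly computed Jacobian of \(h\):

```python
from sympy import symbols, Matrix, exp, cos, sin, simplify

x1, x2, x3, y1, y2 = symbols('x1 x2 x3 y1 y2')
X = Matrix([x1, x2, x3])

f = Matrix([x1**2 + x2**2 + x3**2, exp(x1 + x2)*cos(x3)])
g = Matrix([y1*sin(y2)])

Jf = f.jacobian(X)                      # 2x3 Jacobian of f
Jg = g.jacobian(Matrix([y1, y2]))       # 1x2 Jacobian of g

# chain rule: J_h(x) = J_g(f(x)) * J_f(x)
Jh_chain = Jg.subs({y1: f[0], y2: f[1]}) * Jf

# direct computation of J_h for comparison
h = g.subs({y1: f[0], y2: f[1]})
Jh_direct = h.jacobian(X)

print(simplify(Jh_chain - Jh_direct))   # the zero matrix
```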

9: Level Curves and Directional Derivatives of Scalar Functions#

A function \(f:\mathbb{R}^2\rightarrow\mathbb{R}\) is given by the expression

\[\begin{equation*} f(x,y)=x^2+y^2. \end{equation*}\]

Another function \(g:\mathbb{R}^2\rightarrow\mathbb{R}\) is given by the expression

\[\begin{equation*} g(x,y)=x^2-4x+y^2. \end{equation*}\]

Question a#

Describe the level curves given by \(f(x,y)=c\) for the values \(c\in\{1,2,3,4,5\}\).

Question b#

Determine the gradient of \(f\) at the point \((1,1)\) and find the directional derivative of \(f\) at \((1,1)\) in the direction given by the unit direction vector \(\pmb{e}=(1,0)\).

Question c#

Describe the level curves given by \(g(x,y)=c\) for the values \(c \in\{-3,-2,-1,0,1\}\).

Question d#

Determine the gradient of \(g\) at the point \((1,2)\) and find the directional derivative of \(g\) at \((1,2)\) in the direction towards the origin, \((0,0)\).

10: Gradient Vector Fields and the Hessian Matrix#

Question a#

The gradient vector of \(f(x_1, x_2) = x_1^2 \sin(x_2)\) is \(\nabla f(\pmb{x}) = (2x_1 \sin(x_2), x_1^2 \cos(x_2))\). The gradient vector can therefore be considered as a map \(\nabla f : \operatorname{dom}(f) \to \mathbb{R}^2\). Write down this map as a function (where you specify \(\operatorname{dom}(f)\)), and plot it as a vector field.
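A possible plotting sketch with NumPy and Matplotlib (the grid, the backend, and the file name are arbitrary choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # non-interactive backend; adjust as needed
import matplotlib.pyplot as plt

def grad_f(x1, x2):
    # the gradient field from the exercise
    return 2*x1*np.sin(x2), x1**2*np.cos(x2)

x1, x2 = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-3, 3, 15))
u, v = grad_f(x1, x2)

plt.quiver(x1, x2, u, v)
plt.xlabel("x1"); plt.ylabel("x2"); plt.title("Gradient field of f")
plt.savefig("gradient_field.png")
```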

Question b#

Now calculate the Jacobian matrix \(\pmb{J}_{\nabla f}(x_1,x_2)\) of \(\nabla f : \mathbb{R}^2 \to \mathbb{R}^2\) at the point \((x_1,x_2)\).

Question c#

Calculate the Hessian matrix \(\pmb{H}_{f}(x_1,x_2)\) of \(f : \mathbb{R}^2 \to \mathbb{R}\) at the point \((x_1,x_2)\) and compare it to the answer to the previous question.
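The comparison can also be verified symbolically: SymPy's `hessian` and the Jacobian of the gradient should produce the same matrix. A sketch:

```python
from sympy import symbols, sin, hessian, Matrix

x1, x2 = symbols('x1 x2')
f = x1**2 * sin(x2)

grad = Matrix([f.diff(x1), f.diff(x2)])     # the gradient as a column vector
J_grad = grad.jacobian(Matrix([x1, x2]))    # Jacobian of the gradient field
H = hessian(f, (x1, x2))                    # Hessian of f

print(J_grad == H)   # True: the two matrices coincide
```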


Theme Exercise – Short Day#

This day is dedicated to the Theme Exercise: Theme 1: The Gradient Method.