Week 1: Exercises#

The exercises are intended to be done by hand unless otherwise stated (such as when you are asked to plot a graph or run a script).

Exercises – Long Day#

1: Function or Not?#

Consider the following correspondence between \(a\) and \(b\) values:

| \(a\) | \(b\) |
|:---:|:---:|
| 1 | 0 |
| 2 | 1 |
| 0 | 3 |
| 1 | 2 |

We consider functions whose domain is a subset of \(\{0,1,2,3\}\) and whose co-domain is \(\{0,1,2,3\}\).

We must determine whether \(f\) and \(g\) define functions. Let \(f\) follow the rule that the first column (the \(a\) values) contains the input and the second column (the \(b\) values) contains the output, with domain \(\{0,1,2\}\). Let \(g\) follow the opposite rule: the second column is the input and the first column is the output, with domain \(\{0,1,2,3\}\).

Does \(f\) define a function? Does \(g\)? If so, determine the range/image of the function, and decide whether the function is injective and/or surjective.
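If you want to check your hand analysis mechanically, the defining property (every element of the domain is assigned exactly one output) can be tested with a short Python sketch; the helper name `is_function` is our own invention:

```python
def is_function(pairs, domain):
    """Check that every element of `domain` is assigned exactly one output."""
    assignments = {}
    for a, b in pairs:
        if a in assignments and assignments[a] != b:
            return False  # the same input is assigned two different outputs
        assignments[a] = b
    # every element of the domain must receive an output
    return set(assignments) == set(domain)

# pairs read off the table: (a, b) for f, and the reversed (b, a) for g
pairs_f = [(1, 0), (2, 1), (0, 3), (1, 2)]
pairs_g = [(b, a) for a, b in pairs_f]

print(is_function(pairs_f, {0, 1, 2}))
print(is_function(pairs_g, {0, 1, 2, 3}))
```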

2: Same Functional Expressions?#

We consider functions \(f_i : \mathbb{R} \to \mathbb{R}\) given by:

\[\begin{gather*} f_1(x) = |x| \\ f_2(x) = \begin{cases} x, & x > 0 \\ -x, & x \le 0 \end{cases} \\ f_3(x) = \max(x,0) \\ f_4(x) = \operatorname{ReLU}(x) + \operatorname{ReLU}(-x) \end{gather*}\]

where \(x \in \mathbb{R}\).

Some of the functions are the same function. Which?
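A quick numerical comparison (evidence, not a proof) is to evaluate all four expressions on sample points covering negative, zero, and positive values:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

fs = {
    "f1": lambda x: np.abs(x),
    "f2": lambda x: np.where(x > 0, x, -x),
    "f3": lambda x: np.maximum(x, 0),
    "f4": lambda x: relu(x) + relu(-x),
}

xs = np.linspace(-3, 3, 7)  # sample points of both signs, and zero
for name, f in fs.items():
    print(name, f(xs))
```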

3: Unknown Functional Expression#

Consider a function \(f: \mathbb{R} \to \mathbb{R}\) for which \(\lim_{x \to 2} f(x) = 5\) and \(f(2) = 3\). Which of the following propositions is true?

  1. The function is continuous at \(x=2\).

  2. The function is differentiable at \(x=2\).

  3. The function is discontinuous at \(x=2\).

  4. The function is not well-defined at \(x=2\).

  5. The truth value of the above propositions cannot be determined without knowing the functional expression.

4: Non-Linearity of ReLU#

Consider the ReLU function \(\operatorname{ReLU}: \mathbb{R}^n \to \mathbb{R}^n\). Show that this function is not linear.
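A single numerical counterexample already hints at the argument (for instance, homogeneity \(\operatorname{ReLU}(c\pmb{x}) = c\operatorname{ReLU}(\pmb{x})\) fails for \(c = -1\)); turning this into a proof is the exercise:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

x = np.array([1.0, -2.0])
c = -1.0

lhs = relu(c * x)   # ReLU(-x)
rhs = c * relu(x)   # -ReLU(x)
print(lhs, rhs)     # a linear map would make these equal
```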

5: Possible Visualizations#

Discuss whether it is possible to visualize the following functions – if so, plot them with Python:

  1. A scalar function of two variables \(f: \mathbb{R}^2 \to \mathbb{R}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert}\)

  2. A scalar function of four variables \(f: \mathbb{R}^4 \to \mathbb{R}, \, f(x_1,x_2,x_3,x_4) = \sqrt{\vert x_1 x_2 x_3 x_4 \vert}\)

  3. A complex(-valued) scalar function of two variables \(f: \mathbb{R}^2 \to \mathbb{C}, \, f(x_1,x_2) = \sqrt{\vert x_1 x_2 \vert} + i \cos(x_1 + x_2)\)

  4. A vector field in 2D \(\pmb{f}: \mathbb{R}^2 \to \mathbb{R}^2, \, \pmb{f}(x_1,x_2) = (-x_2/3, x_1/3)\)

  5. A vector field in 3D \(\pmb{f}: \mathbb{R}^3 \to \mathbb{R}^3, \, \pmb{f}(x,y,z)= (x^3+yz^2, y^3-xz^2, z^3)\)

  6. A function of the form \(\pmb{r}: [0,10] \to \mathbb{R}^3, \, \pmb{r}(t) = (\cos(t),\sin(t),t)\)

Note

The following Python commands from the dtumathtools package are useful: dtuplot.plot3d, dtuplot.plot_vector, dtuplot.plot3d_parametric_line.
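As a sketch of item 4 using plain matplotlib instead of dtuplot (a quiver plot of the 2D vector field; the grid resolution 15 is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

# the 2D vector field from item 4: f(x1, x2) = (-x2/3, x1/3)
x1, x2 = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
u, v = -x2 / 3, x1 / 3

plt.quiver(x1, x2, u, v)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.gca().set_aspect("equal")
plt.show()
```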

6: Evaluating a Neural Network#

Consider a simple “shallow” neural network \(\pmb{\Phi}: \mathbb{R}^2 \to \mathbb{R}\) with one hidden layer (\(L=2\)). The network is defined by the parameters:

\[\begin{split} A_1 = \begin{bmatrix} 2 & 0 \\ -1 & 1 \end{bmatrix}, \quad \pmb{b}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 2 \end{bmatrix}, \quad b_2 = 0, \end{split}\]

where the activation function in the hidden layer is the ReLU function, \(\pmb{\sigma}(\pmb{z}) = \operatorname{ReLU}(\pmb{z})\), while the activation function in the last layer is the identity map \(\pmb{\sigma}(\pmb{z}) = \pmb{z}\). The network function is thus given by:

\[ \pmb{\Phi}(\pmb{x}) = A_2 \, \operatorname{ReLU}(A_1 \pmb{x} + \pmb{b}_1) + b_2. \]

Question a#

Calculate the value of the network at the point \(\pmb{x} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}\).

Question b#

Find a point \(\pmb{x}\) where the output \(\pmb{\Phi}(\pmb{x})\) of the network is negative. Justify your answer.

Question c#

How many adjustable parameters (the so-called weights and biases) does this network have in total?

Question d#

We now replace the activation function (ReLU) with a hard-limiter function (a variant of the signum function), which we will call \(\sigma_{\text{step}}\).

As a scalar function, \(\sigma_{\text{step}}: \mathbb{R} \to \mathbb{R}\), it is defined by:

\[\begin{split} \sigma_{\text{step}}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 . \end{cases} \end{split}\]

As a vector function, \(\pmb{\sigma}_{\text{step}}: \mathbb{R}^n \to \mathbb{R}^n\), it is defined by the scalar function being applied on each coordinate:

\[\begin{split} \pmb{\sigma}_{\text{step}}(\pmb{z}) = \begin{bmatrix} \sigma_{\text{step}}(z_1) \\ \vdots \\ \sigma_{\text{step}}(z_n) \end{bmatrix}. \end{split}\]

The network with this new activation function looks like this: \(\pmb{\Phi}(\pmb{x}) = A_2 \pmb{\sigma}_{\text{step}}(A_1 \pmb{x} + \pmb{b}_1) + b_2\).

State the subset of the domain over which the network function \(\pmb{\Phi}(\pmb{x})\) is discontinuous.

7: Visualizing the Network#

In this exercise you should use Python to visualize the function \(\pmb{\Phi}(\pmb{x})\) from the previous Exercise 6: Evaluating a Neural Network (with ReLU as the activation function).

You are to plot the graph of the network over the region \(x_1, x_2 \in [-2, 2]\). We will here use a numerical approach (with the matplotlib package) instead of SymPy in order to achieve a better 3D visualization. Run the following code (you do not have to understand every technical detail of it).

import numpy as np
import matplotlib.pyplot as plt

# Parameters
A1 = np.array([[2, 0], 
               [-1, 1]])
b1 = np.array([[-1], 
               [0]])
A2 = np.array([[-1, 2]]) 
b2 = 0

def ReLU(x):
    return np.maximum(x, 0)

def shallow_nn(x1, x2):
    # Gathering the input in a vector x
    vals = np.zeros(x1.shape)
    
    for i in range(x1.shape[0]):
        for j in range(x1.shape[1]):
            x_vec = np.array([[x1[i,j]], [x2[i,j]]])
            
            # Layer 1: Affine transformation + ReLU
            z1 = A1 @ x_vec + b1
            a1 = ReLU(z1)
            
            # Layer 2: Affine transformation
            output = A2 @ a1 + b2
            
            vals[i,j] = output[0,0]
            
    return vals

x_range = np.linspace(-2, 2, 80)
y_range = np.linspace(-2, 2, 80)
X, Y = np.meshgrid(x_range, y_range)
Z = shallow_nn(X, Y)

# Plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

surf = ax.plot_surface(X, Y, Z, cmap='viridis', 
                       edgecolor='k', linewidth=0.3, alpha=0.5,
                       rcount=20, ccount=20, antialiased=True)

# Lines where the ReLU has a "kink" (where it changes suddenly from 0 to positive values)
y_vals = np.linspace(-2, 2, 100)
x_vals_1 = np.full_like(y_vals, 0.5)
z_vals_1 = shallow_nn(np.array([x_vals_1]), np.array([y_vals]))[0]
x_vals_2 = np.linspace(-2, 2, 100)
y_vals_2 = x_vals_2
z_vals_2 = shallow_nn(np.array([x_vals_2]), np.array([y_vals_2]))[0]

ax.plot(x_vals_1, y_vals, z_vals_1, color='red', linewidth=3, label='$2x_1 - 1 = 0$ (Kink line 1)')
ax.plot(x_vals_2, y_vals_2, z_vals_2, color='orange', linewidth=3, label='$-x_1 + x_2 = 0$ (Kink line 2)')

ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel(r'$\Phi(x)$')
ax.legend()
ax.view_init(elev=15, azim=-100) 

plt.tight_layout()
plt.show()

Note

Note how the graph consists of plane surfaces that are “cut off” and stitched together. This is because the ReLU function is piecewise linear. In large networks used in real-life applications, millions of such pieces are stitched together.

Can you plot the graph of the same neural network where \(\pmb{\sigma}_{\text{step}}\) is used in place of ReLU? The planes in this new graph should not touch (why not?).

8: Linear Vector Function#

Let \(A \in \mathsf{M}_{3 \times 5}(\mathbb{R})\) be given by

\[\begin{equation*} A = \begin{bmatrix} 1 & 0 & 2 & 3 & 4 \\ 0 & -1 & 5 & 6 & 7 \\ 0 & 0 & -3 & 8 & 9 \end{bmatrix}. \end{equation*}\]

Consider the vector function \(\pmb{f}: \mathbb{R}^5 \to \mathbb{R}^3\) given by \(\pmb{f}(\pmb{x}) = A\pmb{x}\), where \(\pmb{x}\) is a column vector in \(\mathbb{R}^5\).

Question a#

State the three coordinate functions of \(\pmb{f}\).

Question b#

Provide the image set \(\operatorname{im}(\pmb{f})\) of \(\pmb{f}\).

Question c#

Is the vector function \(\pmb{f}\) surjective and/or injective?
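Your hand calculations can be checked symbolically with SymPy; a sketch:

```python
import sympy as sp

A = sp.Matrix([[1, 0, 2, 3, 4],
               [0, -1, 5, 6, 7],
               [0, 0, -3, 8, 9]])
x = sp.Matrix(sp.symbols("x1:6"))  # the column vector [x1, ..., x5]^T

fx = A * x
for i, fi in enumerate(fx, start=1):
    print(f"f{i} =", fi)  # the coordinate functions of f
```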

9: Next-Prime Function (Optional)#

This is an optional extra exercise for those who seek extra training.

Let \(f: \mathbb{N} \to \mathbb{N}\) be a function that returns the next prime number (strictly) greater than a given natural number \(n\). In this question, you should first determine the value of the function for two specific inputs, and then show that the function is well-defined before implementing it in Python.

Question a#

Find, with simple reasoning, \(f(10)\) and \(f(13)\).

Question b#

Argue that the function \(f(n)\) is well-defined.

Question c#

Can a functional expression for \(f(n)\) be formulated? Discuss why it is or isn’t possible.

Question d#

Implement the function \(f(n)\) in Python, such that it takes an integer \(n\) as input and returns the next prime number greater than \(n\). Define an auxiliary function (a “helper function”) is_prime(x) to determine if a number is a prime.

from math import sqrt  

def is_prime(x: int) -> bool:  
    """Returns True if x is a prime, otherwise False."""  
    if x < 2:  
        return False  
    # Three lines of code missing 
    return True  

def f(n: int) -> int:  
    """Returns the next prime number greater than n."""  
    candidate = n + 1  
    # Two lines of code missing
    return candidate  

Question e#

Can you update your Python function from the previous question so that the domain of \(f\) is extended from \(\mathbb{N}\) to \(\mathbb{R}\)? E.g., \(f(\pi)\) should return 5.


Exercises – Short Day#

1: Magnitude of Vectors#

Consider the following three vectors in \(\mathbb{R}^3\):

\[\begin{equation*} \pmb{v}_1 = \left[\begin{matrix}-10\\ -10\\ -10\end{matrix}\right], \, \pmb{v}_2 = \left[\begin{matrix}-10\\ -4\\ 14\end{matrix}\right], \, \pmb{v}_3 = \left[\begin{matrix}-10\\ -8\\-12\end{matrix}\right] . \end{equation*}\]

Which vector is the longest? Which vectors are orthogonal to each other? Which two vectors are closest to each other?

Note

We can imagine the vectors as (geometrical) position vectors with their origin at \(\pmb{0}=[0,0,0]^T\) and the endpoint at \(\pmb{v}_i\) for \(i = 1, 2, 3\). Sometimes this is written as \(\overrightarrow{\pmb{0}\pmb{v}_i}\).
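After computing by hand, the lengths, dot products, and mutual distances can be verified with NumPy; a sketch:

```python
import numpy as np

v1 = np.array([-10, -10, -10])
v2 = np.array([-10, -4, 14])
v3 = np.array([-10, -8, -12])

for name, v in [("v1", v1), ("v2", v2), ("v3", v3)]:
    print(name, "has length", np.linalg.norm(v))

# orthogonal vectors have dot product zero
print("dot products:", v1 @ v2, v1 @ v3, v2 @ v3)

# distances between the endpoints of the position vectors
print("distances:", np.linalg.norm(v1 - v2), np.linalg.norm(v1 - v3),
      np.linalg.norm(v2 - v3))
```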

2: Partial Derivatives of a Simple Scalar Function#

Find the partial derivatives \(\frac{\partial f}{\partial x_1}\) and \(\frac{\partial f}{\partial x_2}\) of \(f(x_1, x_2) = x_1^3 + 3x_1 x_2 + x_2^3\). Determine the value of the partial derivatives at the point \((x_1, x_2) = (1, 2)\).
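The hand calculation can be verified with SymPy; a sketch:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 + 3*x1*x2 + x2**3

df_dx1 = sp.diff(f, x1)
df_dx2 = sp.diff(f, x2)
print(df_dx1, "and", df_dx2)

# values of the partial derivatives at the point (1, 2)
point = {x1: 1, x2: 2}
print(df_dx1.subs(point), "and", df_dx2.subs(point))
```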

3: Different(?) Quadratic Forms#

Let \(\pmb{x} = [x_1,x_2]^T\) be a column vector in \(\mathbb{R}^2\). Define:

\[\begin{equation*} A_1 =\left[\begin{matrix}11 & -12 \\ -12 & 4\end{matrix}\right], \, A_2 =\left[\begin{matrix}11 & 0 \\ -24 & 4\end{matrix}\right], \, A_3 =\left[\begin{matrix} 73/5 & -36/5 \\ -36/5 & 52/5 \end{matrix}\right], \, \end{equation*}\]

and

\[\begin{equation*} \pmb{b}_1 = \left[\begin{matrix}-20\\ 40\end{matrix}\right], \, \pmb{b}_2 = \pmb{b}_1, \, \pmb{b}_3 = \left[\begin{matrix}-44\\ 8\end{matrix}\right], \, c = -60. \end{equation*}\]

Let \(q_i: \mathbb{R}^2 \to \mathbb{R}\) be given by:

\[\begin{equation*} q_i(\pmb{x}) = \pmb{x}^T A_i \pmb{x} + \pmb{b}_i^T \pmb{x} + c \end{equation*}\]

for \(i=1,2,3\). Such functions are called quadratic forms; see the definition in the textbook.

Question a#

Expand the expression for \(q_1(x_1,x_2)\). First by hand, then using Python. Also expand the expressions for \(q_2(x_1,x_2)\) and \(q_3(x_1,x_2)\) (either by hand or using Python).
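For the Python part, a SymPy sketch for \(q_1\) (the same pattern works for \(q_2\) and \(q_3\)):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
x = sp.Matrix([x1, x2])

A1 = sp.Matrix([[11, -12], [-12, 4]])
b1 = sp.Matrix([-20, 40])
c = -60

# x^T A x and b^T x are 1x1 matrices, hence the [0, 0] indexing
q1 = (x.T * A1 * x)[0, 0] + (b1.T * x)[0, 0] + c
print(sp.expand(q1))
```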

Question b#

Is the square matrix \(A\) in a quadratic form (such as \(\pmb{x}^T A \pmb{x}\)) uniquely determined?

Question c#

Plot the graph of the function \(q_1\). Then plot some level curves. What geometric shape do the level curves have? Do the same for \(q_3\).

Question d#

One of the functions has a minimum. Which one? Where is it approximately located? What is this point called for the functions that do not have a minimum?

4: The Softmax Function#

In this exercise we will investigate the softmax function (see the textbook section that covers this function).

Question a#

Calculate softmax of the following three vectors in \(\mathbb{R}^3\). You may do this by hand (use a calculator for values of the exponential function) or by use of Python. State the answers with at least three decimals.

  1. \(\pmb{x}_1=[1,2,-5]^T\)

  2. \(\pmb{x}_2=[10,2,-5]^T\)

  3. \(\pmb{x}_3=[100,2,-5]^T\)
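If you choose the Python route, a minimal softmax sketch (subtracting the maximum before exponentiating is a standard numerical-stability trick, relevant for large inputs such as \(\pmb{x}_3\)):

```python
import numpy as np

def softmax(x):
    # subtracting the maximum avoids overflow for large inputs
    z = np.exp(x - np.max(x))
    return z / z.sum()

for x in ([1, 2, -5], [10, 2, -5], [100, 2, -5]):
    print(np.round(softmax(np.array(x, dtype=float)), 3))
```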

Question b#

What do you observe when the difference between the largest value in the input and the other values increases (such as at the shift from \(\pmb{x}_1\) to \(\pmb{x}_2\) and \(\pmb{x}_3\))? Why might the function be called softmax?

Question c#

Is the softmax function continuous?

Question d#

Consider softmax as a map from \(\mathbb{R}^n\) to \(\mathbb{R}^n\). Is the function injective?

Is the function surjective if we consider the co-domain \(\mathbb{R}^n\)?

5: Quadratic Forms with Symmetric Matrices#

Let \(A\) be an arbitrary \(n \times n\) matrix, and let \(\pmb{x}\) be a column vector in \(\mathbb{R}^n\). Define \(B\) by \(B = (A + A^T)/2\).

Question a#

Show that the matrix \(B\) is symmetric.

Question b#

Show that \(\pmb{x}^T A \pmb{x} = \pmb{x}^T B \pmb{x}\).

Question c#

Conclude that it is always possible to define quadratic forms of the type \(q(\pmb{x}) = \pmb{x}^T A \pmb{x} + \pmb{b}^T \pmb{x} + c\) using a symmetric matrix \(A\).
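A numerical sanity check (not a proof) of Questions a and b, using a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = (A + A.T) / 2

print(np.allclose(B, B.T))  # B is symmetric

x = rng.standard_normal(4)
print(np.isclose(x @ A @ x, x @ B @ x))  # the two quadratic forms agree
```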