Deeplearn week 4

Fourth week

Just a reminder: this material is not my own and comes from the Deep Learning course by Andrew Ng. These posts are my notes and digressions, which help me memorise the material.

This week is about stepping up from the shallow networks we used for classifying images to deep networks with more than two layers.

Deep neural networks

Deep neural networks are like shallow networks, just with more layers; we index a layer by \(l\) and write \(L\) for the total number of layers.

The notation is similar, but with a bracketed superscript denoting the layer $$z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]} \; \forall \; l \in \{1\ldots L\},\\a^{[l]} = \sigma^{[l]}(z^{[l]})$$

where \(x=a^{[0]}\) and \(\hat{y}=a^{[L]}\), because we treat the input \(x\) as the activation of the zeroth layer and \(\hat{y}\) as the activation of the output layer.
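As a quick sanity check, here is a minimal NumPy sketch of a single forward step for one layer and one training example. This is my own illustration, not code from the course; the helper name `layer_forward` and the tanh activation are my assumptions.

```python
import numpy as np

# Minimal sketch of one forward step for layer l, single training example.
# The helper name and the tanh activation are assumptions for illustration.
def layer_forward(a_prev, W, b, activation=np.tanh):
    """a_prev: (n[l-1], 1), W: (n[l], n[l-1]), b: (n[l], 1)."""
    z = W @ a_prev + b      # z^[l] = W^[l] a^[l-1] + b^[l]
    a = activation(z)       # a^[l] = sigma^[l](z^[l])
    return z, a

x = np.random.randn(2, 1)                         # x = a^[0], two input features
W1, b1 = np.random.randn(3, 2), np.zeros((3, 1))  # layer 1 with three units
z1, a1 = layer_forward(x, W1, b1)
print(z1.shape, a1.shape)                         # (3, 1) (3, 1)
```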

The vectorized notation, applied layer by layer with the training examples stacked as columns, is

$$ \mathbf{Z}^{[l]} = \mathbf{W}^{[l]}\mathbf{A}^{[l-1]}+\mathbf{b}^{[l]},\\ \mathbf{A}^{[l]} = \sigma^{[l]}(\mathbf{Z}^{[l]})$$
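A hedged sketch of the full vectorized forward pass follows. The dictionary layout of `parameters` (keys "W1", "b1", ..., "WL", "bL"), the tanh hidden activation and the sigmoid output layer are my assumptions, not something prescribed by the course.

```python
import numpy as np

# Sketch of the vectorized forward pass. X has shape (n[0], m); parameters
# holds "W1", "b1", ..., "WL", "bL" with the shapes discussed below.
def forward_pass(X, parameters, hidden_act=np.tanh):
    sigmoid = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    L = len(parameters) // 2                          # number of layers
    A = X                                             # A^[0] = X
    for l in range(1, L + 1):
        W, b = parameters[f"W{l}"], parameters[f"b{l}"]
        Z = W @ A + b                                 # Z^[l] = W^[l] A^[l-1] + b^[l]
        A = sigmoid(Z) if l == L else hidden_act(Z)   # A^[l] = sigma^[l](Z^[l])
    return A                                          # Y_hat = A^[L], shape (n[L], m)
```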

Dimensions of the matrices

A graphical representation of an example deep neural network: https://media.githubusercontent.com/media/mashu/hugo-blog/master/static/img/deeplearn-w4-fig1.png

In this network the layer sizes are \(n^{[0]}=2, n^{[1]}=3, n^{[2]}=5, n^{[3]}=4, n^{[4]}=2, n^{[5]}=1 \).

For each layer, working out the dimensions of \(\mathbf{w}\), \(\mathbf{a}\) and \(\mathbf{b}\), we get $$\underbrace{\mathbf{z}^{[1]}}_{(3,1)}= \underbrace{\mathbf{w}^{[1]}}_{(3,2)}\underbrace{\mathbf{a}^{[0]}}_{(2,1)}+ \underbrace{\mathbf{b}^{[1]}}_{(3,1)}$$ $$\underbrace{\mathbf{z}^{[2]}}_{(5,1)}= \underbrace{\mathbf{w}^{[2]}}_{(5,3)}\underbrace{\mathbf{a}^{[1]}}_{(3,1)}+ \underbrace{\mathbf{b}^{[2]}}_{(5,1)}$$ $$\underbrace{\mathbf{z}^{[3]}}_{(4,1)}= \underbrace{\mathbf{w}^{[3]}}_{(4,5)}\underbrace{\mathbf{a}^{[2]}}_{(5,1)}+ \underbrace{\mathbf{b}^{[3]}}_{(4,1)}$$ $$\underbrace{\mathbf{z}^{[4]}}_{(2,1)}= \underbrace{\mathbf{w}^{[4]}}_{(2,4)}\underbrace{\mathbf{a}^{[3]}}_{(4,1)}+ \underbrace{\mathbf{b}^{[4]}}_{(2,1)}$$ $$\underbrace{\mathbf{z}^{[5]}}_{(1,1)}= \underbrace{\mathbf{w}^{[5]}}_{(1,2)}\underbrace{\mathbf{a}^{[4]}}_{(2,1)}+ \underbrace{\mathbf{b}^{[5]}}_{(1,1)}$$

So the general rule for the dimensions of \(\mathbf{w}^{[l]}\) in layer \(l\) is $$ \underbrace{\mathbf{w}^{[l]}}_{(n^{[l]}, n^{[l-1]})} $$ That is, \(\mathbf{w}^{[l]}\) has as many rows as there are nodes in layer \(l\) and as many columns as there are nodes in layer \(l-1\). The dimensions of \(\mathbf{z}^{[l]}\) and \(\mathbf{b}^{[l]}\) are both \((n^{[l]}, 1)\), and the remaining dimensions follow easily.
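This rule translates directly into parameter initialization. A minimal sketch follows; the small random scaling and zero biases mirror the course's general recipe, but the function name and the way the layer sizes are passed are just my illustration.

```python
import numpy as np

# Initialise parameters so that W^[l] is (n[l], n[l-1]) and b^[l] is (n[l], 1).
def init_parameters(layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_sizes)):
        parameters[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
        parameters[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return parameters

params = init_parameters([2, 3, 5, 4, 2, 1])     # the example network above
print(params["W3"].shape, params["b3"].shape)    # (4, 5) (4, 1)
```

The resulting dictionary is exactly the kind of `parameters` argument assumed in the forward-pass sketch earlier.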

In the back-propagation step the dimensions stay the same: each derivative has the shape of the quantity it differentiates, e.g. \(\mathrm{d}\mathbf{w}^{[l]}\) has the same dimensions as \(\mathbf{w}^{[l]}\) and \(\mathrm{d}\mathbf{b}^{[l]}\) the same as \(\mathbf{b}^{[l]}\).
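As a shape check, the vectorized back-propagation formulas from the course line up like this: $$\underbrace{\mathrm{d}\mathbf{W}^{[l]}}_{(n^{[l]},n^{[l-1]})} = \frac{1}{m}\underbrace{\mathrm{d}\mathbf{Z}^{[l]}}_{(n^{[l]},m)}\underbrace{\mathbf{A}^{[l-1]T}}_{(m,n^{[l-1]})},\\ \underbrace{\mathrm{d}\mathbf{A}^{[l-1]}}_{(n^{[l-1]},m)} = \underbrace{\mathbf{W}^{[l]T}}_{(n^{[l-1]},n^{[l]})}\underbrace{\mathrm{d}\mathbf{Z}^{[l]}}_{(n^{[l]},m)}$$ and \(\mathrm{d}\mathbf{b}^{[l]}\) is the row-wise sum of \(\mathrm{d}\mathbf{Z}^{[l]}\) divided by \(m\), so it keeps the shape \((n^{[l]},1)\).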

For the vectorized case, with \(m\) training examples stacked as columns, the dimensions are $$\underbrace{\mathbf{Z}^{[l]}}_{(n^{[l]},m)}= \underbrace{\mathbf{W}^{[l]}}_{(n^{[l]},n^{[l-1]})}\underbrace{\mathbf{A}^{[l-1]}}_{(n^{[l-1]},m)}+ \underbrace{\mathbf{b}^{[l]}}_{(n^{[l]},1)}$$
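A quick numerical check of these shapes on the example network from the figure; \(m=10\) is an arbitrary choice for illustration, and note how the \((n^{[l]},1)\) bias is broadcast across the \(m\) columns.

```python
import numpy as np

# Shape check for the vectorized formula on the example network from the
# figure; m = 10 training examples is an arbitrary choice for illustration.
n = [2, 3, 5, 4, 2, 1]
m = 10
A = np.random.randn(n[0], m)             # A^[0] = X, shape (n[0], m)
for l in range(1, len(n)):
    W = np.random.randn(n[l], n[l - 1])  # (n[l], n[l-1])
    b = np.zeros((n[l], 1))              # (n[l], 1), broadcast across m columns
    Z = W @ A + b                        # Z^[l], shape (n[l], m)
    assert Z.shape == (n[l], m)
    A = np.tanh(Z)                       # A^[l], same shape as Z^[l]
print(A.shape)                           # (1, 10) -- one prediction per example
```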

Intuition of what neural networks compute

  • Initial layers detect simple functions (edges)
  • Later layers combine functions to detect more complex functions (shapes)

Simpler shapes are combined into more complex shapes (a hierarchical representation).
