6. Multi-Layer Perceptron

Can you calculate this by hand? ✍️

Jan 08, 2024

One of the struggles of teaching a large course on deep learning is figuring out how to set up a unified coding environment for all the students. Should students use our department's own cloud infrastructure? Should they use the free Google Colab, Hugging Face, or AWS? Should they install the environment on their own laptops (i.e., BYOD)?

Questions like these are important. Without a unified environment, it would be a nightmare for me and my TAs to support the variety of environments students may be using. But these questions often distract us from the main goal: learning the key programming concepts.

Is it possible to go old-school? Can students practice coding a deep learning framework with pen and paper, while still connecting theory to practice in a meaningful way? If we didn't even need computers, it would certainly lower the barrier to entry for learning AI.

Here's a hands-on coding exercise I created for this purpose.

𝗪𝗮𝗹𝗸𝘁𝗵𝗿𝗼𝘂𝗴𝗵

1️⃣ Given a code template (left), implement the multi-layer perceptron as depicted (right).
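Because the figure is not reproduced here, it may help to write the architecture from steps 2 to 7 down as data. The sketch below is plain Python; `LAYERS` and `check_shapes` are hypothetical names, not part of the original template. It checks that each layer's input size matches the previous layer's output size and reports each weight matrix's shape, counting the extra bias column when bias is on. If the template is PyTorch (an assumption, since its code is only shown in the image), bias = T / F presumably corresponds to the `bias` argument of `torch.nn.Linear`.

```python
# Hypothetical layer specs mirroring the walkthrough:
# (in_features, out_features, bias, activation)
LAYERS = [
    (3, 4, True, "relu"),
    (4, 2, False, "relu"),
    (2, 5, True, "sigmoid"),
]


def check_shapes(layers, input_size):
    """Return each layer's weight-matrix shape, including the bias column.

    Raises ValueError if a layer's input size does not match the size
    produced by the previous layer.
    """
    shapes = []
    size = input_size
    for in_features, out_features, bias, _activation in layers:
        if in_features != size:
            raise ValueError(f"expected {size} input features, got {in_features}")
        # Weights are out_features x in_features; bias = T adds one extra column.
        cols = in_features + (1 if bias else 0)
        shapes.append((out_features, cols))
        size = out_features
    return shapes


print(check_shapes(LAYERS, input_size=3))  # [(4, 4), (2, 4), (5, 3)]
```

The three shapes, 4 by 4, 2 by 4 and 5 by 3, are the grids you would draw on paper: each layer's weights plus, where bias = T, one bias column.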

2️⃣ 𝗙𝗶𝗿𝘀𝘁 𝗹𝗶𝗻𝗲𝗮𝗿 𝗹𝗮𝘆𝗲𝗿. The input feature size is 3 and the output feature size is 4, so the weight matrix is 4 by 3. There is also an extra column for the biases (bias = T).
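One way to see why the bias becomes an extra column: append a 1 to the input vector, and the bias is just one more weight multiplied by that 1. Below is a minimal plain-Python sketch of this augmented form; the function name `linear_augmented` and all the numbers are hypothetical, not the values from the figure.

```python
def linear_augmented(w_aug, x):
    """Compute [W | b] @ [x; 1], i.e. W @ x + b, using the augmented form."""
    x_aug = list(x) + [1]  # the trailing 1 multiplies the bias column
    return [sum(w * xi for w, xi in zip(row, x_aug)) for row in w_aug]


# Hypothetical first-layer parameters: a 4x3 weight matrix with the bias
# appended as a 4th column (these are NOT the numbers from the figure).
W_AUG = [
    [1, 0, -1, 0],
    [0, 1, 1, -1],
    [1, 1, 0, 2],
    [-1, 0, 1, 1],
]

x = [2, 1, 3]  # 3 input features
print(linear_augmented(W_AUG, x))  # [-1, 3, 5, 2]
```

For the first row, 1·2 + 0·1 + (-1)·3 + 0·1 = -1, which is the same dot product plus bias you would compute by hand.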

3️⃣ The activation function is ReLU. We can see the effect of ReLU on the first feature (-1 -> 0).
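ReLU is simple enough to apply by eye, but as a sketch it is a one-line list comprehension in plain Python. The input vector below is hypothetical, except that its first entry is -1, mirroring step 3.

```python
def relu(v):
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return [max(0, x) for x in v]


# Hypothetical pre-activation vector; the first entry -1 mirrors step 3.
print(relu([-1, 3, 5, 2]))  # [0, 3, 5, 2]
```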

4️⃣ 𝗦𝗲𝗰𝗼𝗻𝗱 𝗹𝗶𝗻𝗲𝗮𝗿 𝗹𝗮𝘆𝗲𝗿. The input feature size is 4, matching the output feature size of the previous layer. The output feature size is 2, so the weight matrix is 2 by 4. This time, there is no extra column for the biases (bias = F).
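Without a bias, the layer is a plain matrix-vector product and no 1 is appended to the input. A minimal sketch; `linear_no_bias`, `W2`, and the input values are hypothetical, chosen only to have the shapes from this step (2 by 4 weights, 4 inputs).

```python
def linear_no_bias(w, x):
    """Plain matrix-vector product W @ x; no 1 is appended because bias = F."""
    if any(len(row) != len(x) for row in w):
        raise ValueError("each row of W must have len(x) entries")
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Hypothetical 2x4 weights (not the figure's values).
W2 = [
    [1, 0, -1, 1],
    [0, 1, 0, -1],
]
h = [0, 3, 5, 2]  # 4 features coming out of the previous ReLU
print(linear_no_bias(W2, h))  # [-3, 1]
```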

5️⃣ The activation function is ReLU.

6️⃣ 𝗙𝗶𝗻𝗮𝗹 𝗹𝗶𝗻𝗲𝗮𝗿 𝗹𝗮𝘆𝗲𝗿. The input feature size is 2, matching the output feature size of the previous layer. The output feature size is 5, so the weight matrix is 5 by 2. Again, there is an extra column for the biases (bias = T).
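Chaining the layers gives the forward pass up to the raw scores that step 7 feeds into Sigmoid. This plain-Python sketch uses hypothetical parameters with the shapes described in the walkthrough (4 by 3 plus bias, 2 by 4 without bias, 5 by 2 plus bias); they are not the figure's values, so the scores differ from the ones in step 7. Here the bias is added as a separate vector, which is equivalent to the extra-column form.

```python
def linear(w, x, b=None):
    """Compute W @ x, then add the bias vector b when one is given."""
    out = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    if b is not None:
        out = [o + bi for o, bi in zip(out, b)]
    return out


def relu(v):
    """Element-wise ReLU."""
    return [max(0, x) for x in v]


# Hypothetical parameters with the walkthrough's shapes (NOT the figure's values).
W1, B1 = [[1, 0, -1], [0, 1, 1], [1, 1, 0], [-1, 0, 1]], [0, -1, 2, 1]  # 4x3, bias = T
W2 = [[1, 0, -1, 1], [0, 1, 0, -1]]                                    # 2x4, bias = F
W3, B3 = [[1, 2], [0, -1], [2, 0], [-1, 3], [1, 1]], [1, 0, -2, 1, -3]  # 5x2, bias = T

x = [2, 1, 3]
h1 = relu(linear(W1, x, B1))  # 3 -> 4 features
h2 = relu(linear(W2, h1))     # 4 -> 2 features, no bias
scores = linear(W3, h2, B3)   # 2 -> 5 raw scores, before the sigmoid
print(h1, h2, scores)  # [0, 3, 5, 2] [0, 1] [3, -1, -2, 4, -2]
```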

7️⃣ The activation function is Sigmoid. We can see its effect: Sigmoid non-linearly maps the raw scores (3, 0, -2, 5, -5) to probability values between 0 and 1.
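The sigmoid values are easy to check in a few lines of Python. This sketch applies the standard logistic function, 1 / (1 + e^(-z)), to the raw scores from step 7 and rounds to two decimals, roughly the precision you would use by hand. Two facts help when computing by hand: sigmoid(0) is exactly 0.5, and sigmoid(-z) = 1 - sigmoid(z), which is why the scores 5 and -5 map to 0.99 and 0.01.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))


raw_scores = [3, 0, -2, 5, -5]  # raw scores from step 7
probs = [round(sigmoid(z), 2) for z in raw_scores]
print(probs)  # [0.95, 0.5, 0.12, 0.99, 0.01]
```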

Thank you for reading AI by Hand ✍️. This post is public so feel free to share it.

Attachment: 6 Code Mlp Export (PDF, 272 KB)
