I would like to share with you my "full-stack" Excel implementation of the famous Transformer architecture, bringing together all of the following components (a short code sketch of the core operations follows the list):
Input Embeddings
Output Embeddings
Decoder
Encoder
Positional Encoding
Self-Attention
Cross-Attention
Multi-head Attention
Causal Masking
Scaled Dot-Product
Skip Connection
LayerNorm
ReLU Activation
Feed Forward
Softmax
Output Probabilities
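To make the pieces above concrete, here is a minimal NumPy sketch of the core operations: scaled dot-product attention with an optional causal mask, a multi-head wrapper that covers both self- and cross-attention, LayerNorm, and the ReLU feed-forward block. This is not the spreadsheet's formulas; all function and variable names here are my own illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    # Scores are scaled by sqrt(d_k); the causal mask blocks attention to future positions.
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (heads, seq_q, seq_kv)
    if causal:
        seq = scores.shape[-1]
        mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ V

def multi_head_attention(x_q, x_kv, W_q, W_k, W_v, W_o, n_heads, causal=False):
    # Project, split into heads, attend, concatenate, and project back.
    # Self-attention: x_q is x_kv; cross-attention: x_kv comes from the encoder.
    seq_q, d_model = x_q.shape
    d_head = d_model // n_heads
    def split(x, W):
        return (x @ W).reshape(-1, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(x_q, W_q), split(x_kv, W_k), split(x_kv, W_v)
    out = scaled_dot_product_attention(Q, K, V, causal)
    return out.transpose(1, 0, 2).reshape(seq_q, d_model) @ W_o

def layer_norm(x, eps=1e-5):
    # Normalize each position over the feature dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise two-layer MLP with ReLU activation.
    return np.maximum(0, x @ W1 + b1) @ W2 + b2
```

In a decoder block these pieces are chained with skip connections, e.g. x = layer_norm(x + multi_head_attention(x, x, ..., causal=True)), followed by cross-attention against the encoder output and the feed-forward sublayer, with a final softmax producing the output probabilities.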
Demo Video
Download
Very informative
I hope to learn quickly from your course. Thank you so much!