Understanding Neural Network Architectures with Attention and Diffusion

Neural networks have revolutionized AI, enabling machines to learn from data and make intelligent decisions. In this talk, we'll explore two popular architectures: Attention models and Diffusion models.

First up, we'll discuss Attention models and how they've contributed to the success of large language models like ChatGPT. We'll explore how the Attention mechanism helps GPT focus on specific parts of a text sequence and how this mechanism has been applied to different tasks in natural language processing.

Next, we'll dive into Diffusion models, a class of generative models that have shown remarkable performance in image synthesis. We'll explain how they work and their potential applications in the creative industry.

By the end of the talk, you'll have a better understanding of these cutting-edge neural network architectures.

Michał Karzyński

July 19, 2023

Transcript

  1. ATTENTION AND DIFFUSION
    Understanding Neural Network Architectures
    Michał Karzyński
    EuroPython 2023

  2. THE SPEAKER
    Michał Karzyński (@postrational)
    Software Architect
    ONNX Operators group

  3. THE TALK: MODELS
    Transformers                  Diffusion
    Natural Language              Images
    GPT, BERT, T5                 Stable Diffusion, Midjourney, DALL-E
    Attention                     Convolution & Attention

  4. CHATGPT

  5. DIFFUSION MODELS
    Stable Diffusion, Midjourney, DALL-E

  6. THE TALK: OPERATIONS
    Linear: a.k.a. Dense, Fully-connected
    Convolution: filter scan to produce feature maps
    Attention: key-value store lookup
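
    A minimal NumPy sketch of the linear operation; the shapes and values
    are illustrative:

    import numpy as np

    # A linear (dense, fully-connected) layer is a matrix multiply plus a
    # bias: every output unit is a weighted sum of every input unit.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)         # input vector, 4 features
    W = rng.standard_normal((3, 4))    # weights: 3 outputs x 4 inputs
    b = rng.standard_normal(3)         # bias, one per output
    y = W @ x + b                      # output vector, 3 features
    print(y.shape)                     # (3,)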

  7. MULTI-LAYER PERCEPTRON

  8. [Diagram: Multi-layer Perceptron: an input layer feeding stacked linear layers]

  9. [Diagram: Multi-layer Perceptron: stacked linear layers]
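
    A minimal sketch of a multi-layer perceptron in NumPy, assuming a ReLU
    nonlinearity between the linear layers; the layer sizes are illustrative:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def mlp(x, layers):
        """Apply a stack of (W, b) linear layers with ReLU in between."""
        for i, (W, b) in enumerate(layers):
            x = W @ x + b
            if i < len(layers) - 1:    # no activation after the last layer
                x = relu(x)
        return x

    rng = np.random.default_rng(0)
    layers = [
        (rng.standard_normal((8, 4)), np.zeros(8)),   # 4 inputs -> 8 hidden
        (rng.standard_normal((2, 8)), np.zeros(2)),   # 8 hidden -> 2 outputs
    ]
    print(mlp(rng.standard_normal(4), layers))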

  10. OPERATION: CONVOLUTION

  11. OPERATION: CONVOLUTION
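
    A minimal NumPy sketch of the convolution operation: a small filter is
    scanned across the image, and each windowed dot product becomes one entry
    of the feature map. The filter values are illustrative:

    import numpy as np

    def conv2d(image, kernel):
        """Scan `kernel` over `image` (stride 1, no padding)."""
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1
        ow = image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # vertical-edge detector
    print(conv2d(image, edge_filter))                # 3x3 feature map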

  12. CONVOLUTIONAL NETWORKS

  13. [Diagram: VGG-16: stacked convolution and max pool blocks followed by linear layers]
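
    A PyTorch sketch of the VGG pattern, reduced to two convolution/max-pool
    stages for brevity; the layer sizes are illustrative, not the actual
    VGG-16 configuration:

    import torch
    from torch import nn

    # VGG pattern: convolution + max pool blocks, then linear layers.
    net = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                        # 32x32 -> 16x16
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                        # 16x16 -> 8x8
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 64), nn.ReLU(),
        nn.Linear(64, 10),                      # class scores
    )
    print(net(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])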

  14. ENCODER-DECODER ARCHITECTURE

  15. [Diagram: Convolutional Autoencoder: convolution and max pool layers encode the input; unpooling and deconvolution layers decode it back]
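
    A PyTorch sketch of the encoder-decoder idea: convolution and pooling
    compress the input into a small code, and transposed convolutions stand
    in for the unpooling and deconvolution steps that expand it back; the
    sizes are illustrative:

    import torch
    from torch import nn

    encoder = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),   # 28x28 -> 14x14
        nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),   # 14x14 -> 7x7: the compressed code
    )
    decoder = nn.Sequential(
        nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
        nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2), nn.Sigmoid(),  # 14x14 -> 28x28
    )

    x = torch.randn(1, 1, 28, 28)
    print(decoder(encoder(x)).shape)   # torch.Size([1, 1, 28, 28])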

  16. SKIP CONNECTIONS AND RESIDUAL NETWORKS

  17. [Diagram: ResNet-18: convolution and max pool layers with skip connections that add each block's input to its output, followed by average pool and linear layers]
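
    A PyTorch sketch of a residual block: the block's input is added to its
    output through a skip connection, so the layers only have to learn a
    residual correction; the channel counts are illustrative:

    import torch
    from torch import nn

    class ResidualBlock(nn.Module):
        """Two convolutions whose output is added to the block's input."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            y = self.relu(self.conv1(x))
            y = self.conv2(y)
            return self.relu(y + x)   # skip connection: add input to output

    x = torch.randn(1, 64, 16, 16)
    print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 16, 16])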

  18. U-NET

  19. [Diagram: Convolutional U-Net: a downsampling path of convolution and max pool layers, an upsampling path of unpooling and deconvolution layers, and skip connections that concatenate encoder feature maps into the decoder]
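
    A PyTorch sketch of the U-Net skip connection: unlike ResNet's addition,
    the encoder's feature map is concatenated onto the decoder's channels.
    Only one down/up level is shown and all sizes are illustrative:

    import torch
    from torch import nn

    class TinyUNet(nn.Module):
        """One U-Net level: downsample, upsample, concatenate the skip."""
        def __init__(self):
            super().__init__()
            self.down = nn.Conv2d(1, 8, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(2)
            self.bottom = nn.Conv2d(8, 16, kernel_size=3, padding=1)
            self.up = nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2)
            # 8 upsampled + 8 skip channels enter the final convolution
            self.out = nn.Conv2d(16, 1, kernel_size=3, padding=1)

        def forward(self, x):
            skip = torch.relu(self.down(x))               # full resolution
            y = torch.relu(self.bottom(self.pool(skip)))  # half resolution
            y = self.up(y)                                # back to full resolution
            y = torch.cat([y, skip], dim=1)               # concatenate the skip
            return self.out(y)

    print(TinyUNet()(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 1, 32, 32])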

  20. OPERATION: ATTENTION
    store = {
        'key1': 'value1',
        'key2': 'value2',
        'key3': 'value3',
    }
    query = 'key1'
    value = store[query]
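
    Attention generalizes this exact-key lookup into a soft one: a query is
    compared with every key, and the result is a similarity-weighted mix of
    all the values. A minimal NumPy sketch of scaled dot-product attention,
    following Vaswani et al.:

    import numpy as np

    def attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
        d = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d)   # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V              # weighted mix of the values

    rng = np.random.default_rng(0)
    Q = rng.standard_normal((1, 8))   # one query
    K = rng.standard_normal((3, 8))   # three keys ...
    V = rng.standard_normal((3, 8))   # ... with their three values
    print(attention(Q, K, V).shape)   # (1, 8)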

  21. Vaswani, Ashish, et al. "Attention Is All You Need." arXiv:1706.03762v5 [cs.CL], 2017

  22. TRANSFORMER ARCHITECTURE

  23. [Diagram: Transformer: inputs pass through embedding and positional encoding, then Nx repeated blocks of multi-head attention (Q, K, V) with residual add connections and linear layers; each generated word is appended to the output]
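
    A PyTorch sketch combining the labelled pieces of the diagram into one
    Transformer block: embedding plus positional encoding, multi-head
    self-attention where the sequence supplies its own Q, K and V, and
    residual adds. Dimensions are illustrative; layer normalization and the
    full feed-forward network are omitted for brevity:

    import torch
    from torch import nn

    d_model, vocab, seq_len = 32, 100, 6
    embedding = nn.Embedding(vocab, d_model)
    pos_encoding = torch.randn(seq_len, d_model)   # stand-in for sinusoidal encoding
    attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
    linear = nn.Linear(d_model, d_model)

    tokens = torch.randint(0, vocab, (1, seq_len))
    x = embedding(tokens) + pos_encoding    # embedding + positional encoding
    y, _ = attn(x, x, x)                    # self-attention: Q = K = V = x
    x = x + y                               # residual add
    x = x + torch.relu(linear(x))           # linear layer with residual add
    print(x.shape)                          # torch.Size([1, 6, 32])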

  24. FORWARD AND BACKWARD DIFFUSION

  25. [Diagram: FORWARD diffusion repeatedly adds generated noise to an image; BACKWARD diffusion repeatedly subtracts the estimated noise]
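
    A minimal NumPy sketch of both processes, assuming a fixed noise scale
    per step; `estimate_noise` is a hypothetical stand-in for the trained
    noise predictor used in the backward pass:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 0.95                       # illustrative signal-keep factor per step
    x = rng.standard_normal((8, 8))    # stand-in for a real image

    # Forward diffusion: repeatedly mix in freshly generated noise.
    for _ in range(50):
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * noise

    # Backward diffusion: repeatedly subtract the estimated noise,
    # inverting the forward update one step at a time.
    def estimate_noise(x_t):
        return 0.1 * x_t               # hypothetical stand-in for a trained model

    for _ in range(50):
        x = (x - np.sqrt(1 - alpha) * estimate_noise(x)) / np.sqrt(alpha)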

  26. LATENT DIFFUSION

  27. [Diagram: Latent Diffusion: the prompt "Logo for EuroPython in Prague" is encoded by a BERT encoder (embedding plus positional encoding); denoising runs in latent space through a U-Net of ResBlocks and Spatial Transformers, whose multi-head attention (Q, K, V) and linear layers inject the text conditioning; convolution, deconvolution and up/down sampling layers move between resolutions]
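
    A pseudocode-level Python sketch of the whole pipeline; `text_encoder`,
    `denoiser` and `decoder` are hypothetical stand-ins for the trained BERT
    encoder, the conditioned U-Net and the image decoder:

    import numpy as np

    rng = np.random.default_rng(0)

    def text_encoder(prompt):
        # BERT encoder: prompt -> one conditioning vector per token
        return rng.standard_normal((len(prompt.split()), 16))

    def denoiser(latent, conditioning):
        # U-Net of ResBlocks and Spatial Transformers; attention injects
        # `conditioning` through its K and V inputs (ignored in this stub)
        return 0.1 * latent

    def decoder(latent):
        # deconvolutions upscale the latent back to a full-size image
        return np.repeat(np.repeat(latent, 8, axis=0), 8, axis=1)

    prompt = "Logo for EuroPython in Prague"
    conditioning = text_encoder(prompt)
    latent = rng.standard_normal((8, 8))   # start from pure noise in latent space

    alpha = 0.95                           # illustrative noise schedule
    for _ in range(50):                    # denoise step by step
        noise_estimate = denoiser(latent, conditioning)
        latent = (latent - np.sqrt(1 - alpha) * noise_estimate) / np.sqrt(alpha)

    image = decoder(latent)
    print(image.shape)                     # (64, 64)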

  28. CLOSING REMARKS

  29. THANK YOU
