
Blog

  1. The Gumbel-max trick for the Bernoulli distribution
  2. Zero-truncated count distributions and their negative log likelihoods
  3. Hacking "vanilla" FlashAttention for variable-length inputs
  4. Visualizing equivariances in transformer neural networks
  5. Mapping travels with Folium