Tagged “research”

  1. The Gumbel-max trick for the Bernoulli distribution
  2. Zero-truncated count distributions and their negative log-likelihoods
  3. Hacking "vanilla" FlashAttention for variable-length inputs
  4. Visualizing equivariances in transformer neural networks
