Tagged “research”
- The Gumbel-max trick for the Bernoulli distribution
- Zero truncated count distributions and their negative log likelihoods.
- Hacking "vanilla" FlashAttention for variable-length inputs
- Visualizing equivariances in transformer neural networks