Blog
- The Gumbel-max trick for the Bernoulli distribution
- Zero truncated count distributions and their negative log likelihoods.
- Hacking "vanilla" FlashAttention for variable-length inputs
- Visualizing equivariances in transformer neural networks
- Mapping travels with Folium