andy's blog

euclidean vs cosine distance: an experiment with t-SNE from scratch.

i was recently learning about t-SNE in one of my ML modules at college, and something caught my attention.

the algorithm models pairwise similarities between points in the high-dimensional space as probabilities: each point places a gaussian over its neighbours, with (squared) euclidean distance as the default metric.
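to make the affinity step concrete, here is a minimal numpy sketch of how those probabilities could be computed, assuming the standard recipe: each point gets its own gaussian bandwidth, found by binary search so that its neighbour distribution hits a target perplexity, and the conditional probabilities are then symmetrised into one joint distribution. the function names and defaults (perplexity=30, the tolerance, the iteration cap) are my own illustrative choices, not a reference implementation.

```python
import numpy as np

# sketch only: helper names and defaults are hypothetical


def conditional_probs(dist_row, beta):
    """gaussian conditional probabilities p_{j|i} for one point i.

    dist_row holds squared distances from i to every other point
    (i itself already removed); beta = 1 / (2 * sigma_i**2).
    """
    # shifting by the minimum avoids underflow; it cancels after normalising
    p = np.exp(-(dist_row - dist_row.min()) * beta)
    p /= p.sum()
    # shannon entropy in bits; perplexity is 2**entropy
    entropy = -np.sum(p * np.log2(np.maximum(p, 1e-12)))
    return p, entropy


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """symmetric t-SNE affinities P built from squared euclidean distances."""
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T, 0.0)
    target = np.log2(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        row = np.delete(D[i], i)
        lo, hi, beta = 0.0, np.inf, 1.0
        # binary search on beta until the entropy of p_{.|i} matches log2(perplexity)
        for _ in range(max_iter):
            p, h = conditional_probs(row, beta)
            if abs(h - target) < tol:
                break
            if h > target:  # too flat: narrow the kernel (raise beta)
                lo = beta
                beta = beta * 2 if hi == np.inf else (beta + hi) / 2
            else:           # too peaked: widen the kernel (lower beta)
                hi = beta
                beta = (beta + lo) / 2
        P[i, np.arange(n) != i] = p
    # p_ij = (p_{j|i} + p_{i|j}) / 2n, so all entries sum to 1
    return (P + P.T) / (2 * n)
```

calibrating beta per point is what lets dense and sparse regions both end up with roughly the same effective number of neighbours.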

it was fascinating to see how it minimizes information loss (the KL divergence between the high-dimensional and low-dimensional similarity distributions) while projecting high-dimensional data to a lower-dimensional space, all while preserving local structure.
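here is a minimal sketch of that objective, assuming the standard formulation: the low-dimensional similarities q_ij come from a student-t kernel with one degree of freedom, the cost is KL(P || Q), and the embedding is updated with the usual closed-form gradient. `kl_and_grad`, `optimise` and the learning rate of 200 are illustrative choices of mine; real implementations also add momentum and early exaggeration, which are left out here.

```python
import numpy as np

# sketch only: helper names and the learning rate are hypothetical


def student_t_q(Y):
    """low-dimensional affinities q_ij from a student-t (1 dof) kernel."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + d2)        # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(num, 0.0)    # q_ii is defined as 0
    return num / num.sum(), num


def kl_and_grad(P, Y):
    """KL(P || Q) and its gradient with respect to the embedding Y."""
    Q, num = student_t_q(Y)
    eps = 1e-12
    kl = np.sum(P * np.log(np.maximum(P, eps) / np.maximum(Q, eps)))
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * (1 + ||y_i - y_j||^2)^-1
    W = (P - Q) * num
    grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
    return kl, grad


def optimise(P, n_components=2, lr=200.0, n_iter=500, seed=0):
    """plain gradient descent on KL(P || Q), without momentum or exaggeration."""
    rng = np.random.default_rng(seed)
    # start from a tiny random cloud so no initial structure is imposed
    Y = rng.normal(scale=1e-4, size=(P.shape[0], n_components))
    for _ in range(n_iter):
        _, grad = kl_and_grad(P, Y)
        Y -= lr * grad
    return Y
```

the heavy-tailed student-t kernel in the low-dimensional space is what lets moderately distant points spread further apart, which is the usual explanation for why t-SNE clusters look so well separated.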

but that got me thinking: why not cosine distance?

in high-dimensional spaces, euclidean distances often “flatten out,” with most points ending up nearly equidistant. cosine similarity, on the other hand, measures the angle between vectors regardless of their lengths, so cosine distance (1 − cosine similarity) might preserve different structures.
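one quick way to see the flattening is the relative contrast, (max − min) / min, of the distances from one query point to a set of other points: as it approaches 0, the nearest and farthest neighbours become hard to tell apart. this sketch assumes i.i.d. standard-normal data, the textbook setting for the effect rather than the data used in the experiment, and `relative_contrast` is a made-up helper name. it only measures euclidean distance, so it says nothing yet about how cosine distance behaves.

```python
import numpy as np


def relative_contrast(dim, n_points=500, seed=0):
    """(max - min) / min of euclidean distances from one random query point
    to n_points random points, all drawn from a standard normal (assumption)."""
    rng = np.random.default_rng(seed)
    query = rng.normal(size=dim)
    points = rng.normal(size=(n_points, dim))
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()


# the contrast should shrink as the dimension grows
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  relative contrast={relative_contrast(dim):.3f}")
```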

so, in this post, i’m building t-SNE from scratch, swapping in cosine distance, and seeing how the visualizations compare.
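swapping the metric only touches the distance matrix that feeds the gaussian kernel; the perplexity calibration and the KL optimisation downstream stay the same. here is a minimal sketch of that swap, assuming squared euclidean distance for the default (as standard t-SNE uses) and plain, unsquared cosine distance for the alternative. whether to square the cosine distance is a judgment call, and `tsne_distances` is a hypothetical helper, not a library function.

```python
import numpy as np


def tsne_distances(X, metric="euclidean"):
    """hypothetical helper: pairwise distances fed into the gaussian kernel.

    "euclidean" -> squared euclidean distance (the standard t-SNE input)
    "cosine"    -> 1 - cos(angle between x_i and x_j), not squared (assumption)
    """
    if metric == "euclidean":
        sq = np.sum(X ** 2, axis=1)
        return np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    if metric == "cosine":
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        U = X / np.maximum(norms, 1e-12)  # unit-length rows; guards zero vectors
        return np.clip(1.0 - U @ U.T, 0.0, 2.0)  # clip round-off outside [0, 2]
    raise ValueError(f"unknown metric: {metric!r}")
```

one property worth keeping in mind when comparing the plots: for unit-length vectors u and v, ||u − v||² = 2(1 − cos θ), so on L2-normalised data squared euclidean distance is exactly twice the cosine distance. the two metrics can only disagree when vector lengths vary from point to point.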

a teaser of the experiment:

complete blog post coming soon! thank you for your patience!