This is a brief summary of the paper From Word Embeddings to Document Distances (Kusner et al., ICML 2015), written to help me study and organize what I read.
WMD (Word Mover's Distance) uses word embeddings to measure the distance between two documents, so it can compute a meaningful distance even when the documents share no words in common. The underlying assumption is that semantically similar words have similar vectors.
First of all, lowercasing and removing stopwords are essential preprocessing steps: they reduce complexity and keep frequent but uninformative words from misleading the distance. The paper's running example becomes:
Sentence 1: obama speaks media illinois
Sentence 2: president greets press chicago
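A minimal sketch of this preprocessing step, assuming NLTK's English stopword list (any stopword list would do):

```python
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

STOPWORDS = set(stopwords.words("english"))

def preprocess(sentence: str) -> list[str]:
    """Lowercase and drop stopwords, keeping only the content words WMD compares."""
    return [word for word in sentence.lower().split() if word not in STOPWORDS]

print(preprocess("Obama speaks to the media in Illinois"))
# ['obama', 'speaks', 'media', 'illinois']
print(preprocess("The President greets the press in Chicago"))
# ['president', 'greets', 'press', 'chicago']
```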
Next, retrieve a vector for each remaining word from any pre-trained word embedding model; GloVe, word2vec, fastText, or custom vectors all work. Each document is then represented as a normalized bag-of-words (nBOW): a word appearing c_i times gets weight d_i = c_i / Σ_j c_j, the assumption being that higher relative frequency means higher importance. WMD is then the minimum cumulative cost of moving one document's nBOW weight onto the other's, where moving weight between two words costs the Euclidean distance between their embedding vectors.
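A small sketch of the nBOW weighting (the helper name is my own):

```python
from collections import Counter

def nbow_weights(tokens):
    """Normalized bag-of-words: each word's count divided by the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(nbow_weights(["obama", "speaks", "media", "illinois"]))
# {'obama': 0.25, 'speaks': 0.25, 'media': 0.25, 'illinois': 0.25}
```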
Strengths of WMD:
- Hyperparameter-free
- Straightforward to understand and use, and highly interpretable (see the gensim sketch after this list)
- Leads to unprecedentedly low k-nearest neighbor document classification error rates
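As a usage example of how little setup WMD needs, gensim exposes it directly on its KeyedVectors. A minimal sketch, assuming pre-trained Google News word2vec vectors via gensim-data (any embedding model would do; recent gensim versions also need the POT package installed for the transport solver):

```python
import gensim.downloader as api

# Load pre-trained word2vec embeddings (large download on first use).
vectors = api.load("word2vec-google-news-300")

# The already-preprocessed example sentences from above.
s1 = "obama speaks media illinois".split()
s2 = "president greets press chicago".split()

# wmdistance builds the nBOW weights and solves the transport problem internally.
print(vectors.wmdistance(s1, s2))  # smaller = more similar
```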
References
- Paper: From Word Embeddings to Document Distances (Kusner et al., ICML 2015)
- Word Embedding to Document Distances on SlideShare
- From Word Embeddings To Document Distances (pdf)
- Earth mover's distance on Towards Data Science
- Paper: The Earth Mover's Distance as a Metric for Image Retrieval (Rubner et al., 2000, International Journal of Computer Vision)
- The Earth Mover’s Distance
- Finding similar documents with Word2Vec and WMD
- Navigating themes in restaurant reviews with Word Mover’s Distance
- A Word is Worth a Thousand Vectors
- Word Distance between Word Embeddings on Towards Data Science
- Word Mover’s Distance as a Linear Programming Problem on Medium