The paper Enriching Word Vectors with Subword Information (Bojanowski et al., arXiv 2017) argues that using character-level n-grams makes it possible to infer better vectors for out-of-vocabulary words.
That is, word2vec treats each word in the corpus as an atomic entity and generates a vector for each word.
This paper, in contrast, treats each word as composed of character n-grams.
So the vector for a word is the sum of the vectors of these character n-grams.
Let's look at an example:
the vector for the word apple is the sum of the vectors of n-grams such as
“<ap”, “app”, “appl”, “apple”, “apple>”, “ppl”, “pple”, “pple>”, “ple”, “ple>”, “le>”
“<” and “>” are special symbols that mark the beginning and end of a word, so prefixes and suffixes can be told apart from other character sequences.
In this example, the hyperparameters are a minimum n-gram size of 3 and a maximum of 6.
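To make the decomposition concrete, here is a minimal Python sketch of this kind of n-gram extraction (the function name `char_ngrams` is my own, not from the paper or the fastText code; note that a full enumeration also yields n-grams starting with “<”, such as “<app”, which the illustrative list above leaves out):

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, with '<' and '>' marking its boundaries."""
    token = "<" + word + ">"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

print(char_ngrams("apple"))
# ['<ap', 'app', 'ppl', 'ple', 'le>', '<app', 'appl', 'pple', 'ple>',
#  '<appl', 'apple', 'pple>', '<apple', 'apple>']
```

The paper also keeps the full word itself (here “<apple>”) as one extra unit, so an in-vocabulary word still gets a vector of its own alongside its n-gram vectors.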
That is also why they argue that on a small corpus, vectors built from character-level n-grams are better than vectors that treat each word as an atomic entity.
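As a rough sketch of why this helps with rare or out-of-vocabulary words, the snippet below composes a word vector by summing the vectors of its character n-grams. Here `ngram_vectors` is a hypothetical lookup table standing in for the model's learned n-gram embeddings (the real fastText implementation hashes n-grams into a fixed number of buckets), and it reuses `char_ngrams` from the sketch above.

```python
import numpy as np

def word_vector(word, ngram_vectors, dim=100, min_n=3, max_n=6):
    """Compose a vector for `word` by summing the vectors of its character n-grams.

    Because n-grams are shared across words, this still yields a usable vector
    for a word that never appeared in the training corpus.
    """
    vec = np.zeros(dim)
    for g in char_ngrams(word, min_n, max_n):
        if g in ngram_vectors:  # skip n-grams we have no vector for
            vec += ngram_vectors[g]
    return vec
```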
The paper: Enriching Word Vectors with Subword Information (Bojanowski et al., arXiv 2017)