Gargantext.Core.Text.Metrics.Count

Token and occurrence

An occurrence is not necessarily a token. Consider the sentence "A rose is a rose is a rose". We may equally correctly state that it contains eight words or three words. There are, in fact, three word types in the sentence: "rose", "is" and "a". There are eight word tokens in a token copy of the line. The line itself is a type, so it does not contain eight word types; it contains only the three word types "a", "is" and "rose", each of which is unique. So what are there eight of? Occurrences of words: three occurrences of the word type "a", two of "is" and three of "rose".

Source: https://en.wikipedia.org/wiki/Type%E2%80%93token_distinction#Occurrences
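
The distinction can be checked on the example sentence with a minimal sketch. The names `tokens` and `occurrences` are hypothetical helpers introduced for illustration, not necessarily the functions this module exports, and the tokenizer is deliberately naive (letters only, lower-cased).

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map

-- | Split a sentence into lower-cased words (hypothetical helper).
-- Every non-letter character is treated as a separator.
tokens :: String -> [String]
tokens = words . map normalise
  where
    normalise c
      | isAlpha c = toLower c
      | otherwise = ' '

-- | Count how many times each word type occurs (hypothetical helper).
occurrences :: [String] -> Map.Map String Int
occurrences ts = Map.fromListWith (+) [(t, 1) | t <- ts]

-- | The example sentence from the text above.
sentence :: String
sentence = "A rose is a rose is a rose"
```

With these definitions, `length (tokens sentence)` gives the eight word occurrences, `Map.size (occurrences (tokens sentence))` gives the three word types, and `Map.toList (occurrences (tokens sentence))` lists the count for each type.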

