Copyright | (c) CNRS 2017-Present |
License | AGPL + CECILL v3 |
Maintainer | |
Stability | experimental |
Portability | POSIX |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Token and occurrence
An occurrence is not necessarily a token. Consider the sentence
"A rose is a rose is a rose". We may equally correctly state that there
are eight words or three words in the sentence. There are, in fact, three word
types in the sentence: "rose", "is" and "a". There are eight word tokens
in a token copy of the line. The line itself is a type. There are not
eight word types in the line. It contains (as stated) only the three
word types "a", "is" and "rose", each of which is unique. So what do we
call what there are eight of? They are occurrences of words. There are
three occurrences of the word type "a" (counting "A" and "a" together),
two of "is" and three of "rose".
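The counts in this example can be checked mechanically. The sketch below is an illustration only: it assumes whitespace tokenisation and lower-casing (so "A" and "a" fall under the same type), and `tokens`, `types` and `occurrencesOf` are hypothetical helpers, not functions exported by this module. The occurrence map has the same shape as the module's `Occ a = Map a Int`.

```haskell
import qualified Data.Map.Strict as M
import Data.Char (toLower)

-- Hypothetical helper: split a sentence into word tokens on whitespace,
-- lower-casing them so that "A" and "a" count as the same word type.
tokens :: String -> [String]
tokens = words . map toLower

-- Hypothetical helper: occurrences, i.e. how many tokens there are of each
-- word type. Same shape as Occ String = Map String Int.
occurrencesOf :: [String] -> M.Map String Int
occurrencesOf ts = M.fromListWith (+) [ (t, 1) | t <- ts ]

-- Hypothetical helper: the word types are the distinct tokens
-- (returned in ascending order, as Map keys are).
types :: [String] -> [String]
types = M.keys . occurrencesOf

-- The example sentence from the text.
sentence :: String
sentence = "A rose is a rose is a rose"
```

Loaded in GHCi, `occurrencesOf (tokens sentence)` evaluates to `fromList [("a",3),("is",2),("rose",3)]`: three types whose occurrence counts add up to the eight tokens.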
Synopsis
- type Occ a = Map a Int
- type Cooc a = Map (a, a) Int
- type FIS a = Map (Set a) Int
- data Group = ByStem | ByOntology
- type Grouped = Stems
- type Occs = Int
- type Coocs = Int
- type Threshold = Int
- removeApax :: Threshold -> Map ([Text], [Text]) Int -> Map ([Text], [Text]) Int
- cooc :: [[Terms]] -> Map ([Text], [Text]) Int
- coocOnWithLabel :: (Ord label, Ord b) => (a -> b) -> (b -> label) -> [[a]] -> Map (label, label) Coocs
- mkLabelPolicy :: Map Grouped (Map Terms Occs) -> Map Grouped [Text]
- useLabelPolicy :: Map Grouped [Text] -> Grouped -> [Text]
- coocOn :: Ord b => (a -> b) -> [[a]] -> Map (b, b) Int
- coocOn' :: Ord b => (a -> b) -> [a] -> Map (b, b) Int
- coocOnContexts :: (a -> [Text]) -> [[a]] -> Map ([Text], [Text]) Int
- coocOnSingleContext :: (a -> [Text]) -> [a] -> [(([Text], [Text]), Int)]
- occurrences :: [Terms] -> Map Grouped (Map Terms Int)
- occurrencesOn :: (Ord a, Ord b) => (a -> b) -> [a] -> Map b (Map a Int)
- occurrencesWith :: (Foldable list, Ord k, Num a, Show k, Show a, Show (list b)) => (b -> k) -> list b -> Map k a
- sumOcc :: Ord a => [Occ a] -> Occ a
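The synopsis gives only signatures. As a rough illustration of what the occurrence and co-occurrence helpers compute, here is a small self-contained sketch built on `Data.Map.Strict`. Its semantics are assumptions inferred from the names and types, not this module's actual code: co-occurrence is counted once per context for each unordered pair of distinct keys (self-pairs excluded), and `removeApax` keeps counts strictly above the threshold. `removeApax` is also generalised here to any key type instead of `([Text], [Text])`.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.List (foldl')

-- Type synonyms as listed in the synopsis.
type Occ a = M.Map a Int
type Cooc a = M.Map (a, a) Int
type Threshold = Int

-- Sketch (assumed semantics): add occurrence maps together key by key.
sumOcc :: Ord a => [Occ a] -> Occ a
sumOcc = M.unionsWith (+)

-- Sketch (assumed semantics): group elements under the key computed by f,
-- counting how often each element occurs within its group.
occurrencesOn :: (Ord a, Ord b) => (a -> b) -> [a] -> M.Map b (M.Map a Int)
occurrencesOn f = foldl' step M.empty
  where
    step m x = M.insertWith (M.unionWith (+)) (f x) (M.singleton x 1) m

-- Sketch (assumed semantics): co-occurrences inside a single context.
-- Keys are deduplicated first, so each unordered pair (x, y) with x < y
-- is counted at most once per context.
coocOn' :: Ord b => (a -> b) -> [a] -> Cooc b
coocOn' f xs = M.fromList [ ((x, y), 1) | x <- ks, y <- ks, x < y ]
  where
    ks = S.toAscList (S.fromList (map f xs))

-- Sketch (assumed semantics): sum the per-context counts over all contexts.
coocOn :: Ord b => (a -> b) -> [[a]] -> Cooc b
coocOn f = M.unionsWith (+) . map (coocOn' f)

-- Sketch (assumed semantics): drop entries whose count does not exceed
-- the threshold (with threshold 1, this removes items seen only once).
removeApax :: Threshold -> M.Map k Int -> M.Map k Int
removeApax t = M.filter (> t)
```

For instance, `coocOn id [["a","b","c"],["a","b"],["c"]]` yields a count of 2 for the pair `("a","b")` and 1 for `("a","c")` and `("b","c")`; applying `removeApax 1` to that map keeps only `("a","b")`.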