gargantext-0.0.7.1.5.3: Search, map, share
Copyright: (c) CNRS 2017-Present
License: AGPL + CECILL v3
Maintainer: team@gargantext.org
Stability: experimental
Portability: POSIX
Safe Haskell: Safe-Inferred
Language: Haskell2010

Gargantext.Core.Text.Terms.Mono.Stem

Description

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Source: https://en.wikipedia.org/wiki/Stemming

A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stems", "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word, "fish". On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root), but "argument" and "arguments" reduce to the stem "argument".
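To make the idea of mapping related words to one stem concrete, here is a minimal sketch of naive suffix stripping. It is not the Porter or Lancaster algorithm and not what this module implements; the names toySuffixes and toyStem, the suffix list and the minimum stem length of 3 are all assumptions made for illustration.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe, listToMaybe, mapMaybe)

-- | Remove a suffix if the word ends with it (base has no stripSuffix
-- for String, so we reverse both sides and reuse stripPrefix).
stripSuffix' :: String -> String -> Maybe String
stripSuffix' suffix word =
  reverse <$> stripPrefix (reverse suffix) (reverse word)

-- | Hypothetical suffix list, tried in order (longest first).
toySuffixes :: [String]
toySuffixes = ["ing", "ed", "er", "s"]

-- | Toy stemmer: strip the first matching suffix, but only when at
-- least 3 characters remain; otherwise return the word unchanged.
toyStem :: String -> String
toyStem word = fromMaybe word (listToMaybe (mapMaybe try toySuffixes))
  where
    try suffix = case stripSuffix' suffix word of
      Just rest | length rest >= 3 -> Just rest
      _                            -> Nothing
```

With these assumptions, "fishing", "fished" and "fisher" all map to "fish", and "cats" maps to "cat". A rule set this crude leaves "stemming" as "stemm", which is the kind of case that real algorithms handle with additional rules.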

Synopsis

Types

data StemmingAlgorithm #

A stemming algorithm. There are different stemming algorithms, each with its own tradeoffs, strengths and weaknesses. Typically one picks an algorithm based on the task at hand.

Constructors

PorterAlgorithm

The Porter algorithm is the classic stemming algorithm, and possibly one of the most widely used.

LancasterAlgorithm

Slight variation of the Porter algorithm; it is more aggressive with stemming, which might or might not be what you want. It also makes some subtle changes to the stem; for example, stemming "dancer" with Porter simply yields "dancer" (i.e. it cannot be stemmed further), whereas Lancaster yields "dant", which is no longer a prefix of the original word.

GargPorterAlgorithm

A variation of the Porter algorithm tailored for Gargantext.

Universal stemming function

stem :: Lang -> StemmingAlgorithm -> Text -> Text #

Stems the input Text based on the input Lang and using the given StemmingAlgorithm.

Handy re-exports

data Lang #

Language of a Text. For simplicity, we assume that a text has a single, homogeneous language.

  • EN == English
  • FR == French
  • DE == German
  • IT == Italian
  • ES == Spanish
  • PL == Polish
  • ZH == Chinese

... add your language and help us to implement it (:

All languages supported.

NOTE: Use international country codes: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes

TODO: This should be deprecated in favor of the iso-639 library.

Constructors

DE 
EL 
EN 
ES 
FR 
IT 
PL 
PT 
RU 
UK 
ZH 
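Because Lang has Bounded, Enum and Read instances, the supported languages can be enumerated and parsed generically. The sketch below uses a local stand-in type that mirrors the constructors listed above so that it runs without the gargantext package. The derived instances and the helper names allLangs and parseLang are assumptions; the real instances in Gargantext.Core may behave differently.

```haskell
-- Local stand-in mirroring the constructors listed above; the real type
-- lives in Gargantext.Core and its instances may be defined differently.
data Lang = DE | EL | EN | ES | FR | IT | PL | PT | RU | UK | ZH
  deriving (Show, Read, Eq, Ord, Enum, Bounded)

-- | Every language, in constructor order, via Bounded and Enum.
allLangs :: [Lang]
allLangs = [minBound .. maxBound]

-- | Parse a language code such as "FR"; Nothing for unknown codes or
-- for input with trailing characters.
parseLang :: String -> Maybe Lang
parseLang s = case reads s of
  [(lang, "")] -> Just lang
  _            -> Nothing
```

Enumerating with minBound and maxBound avoids a hand-maintained list that could drift out of date when a constructor is added.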

Instances

Arbitrary Lang # 
Instance details

Defined in Gargantext.Core

Methods

arbitrary :: Gen Lang #

shrink :: Lang -> [Lang] #

FromJSON Lang # 
Instance details

Defined in Gargantext.Core

ToJSON Lang # 
Instance details

Defined in Gargantext.Core

Bounded Lang # 
Instance details

Defined in Gargantext.Core

Enum Lang # 
Instance details

Defined in Gargantext.Core

Methods

succ :: Lang -> Lang #

pred :: Lang -> Lang #

toEnum :: Int -> Lang #

fromEnum :: Lang -> Int #

enumFrom :: Lang -> [Lang] #

enumFromThen :: Lang -> Lang -> [Lang] #

enumFromTo :: Lang -> Lang -> [Lang] #

enumFromThenTo :: Lang -> Lang -> Lang -> [Lang] #

Generic Lang # 
Instance details

Defined in Gargantext.Core

Associated Types

type Rep Lang :: Type -> Type #

Methods

from :: Lang -> Rep Lang x #

to :: Rep Lang x -> Lang #

Read Lang # 
Instance details

Defined in Gargantext.Core

Show Lang # 
Instance details

Defined in Gargantext.Core

Methods

showsPrec :: Int -> Lang -> ShowS #

show :: Lang -> String #

showList :: [Lang] -> ShowS #

HasDBid Lang # 
Instance details

Defined in Gargantext.Core

Methods

toDBid :: Lang -> Int #

lookupDBid :: Int -> Maybe Lang #

Eq Lang # 
Instance details

Defined in Gargantext.Core

Methods

(==) :: Lang -> Lang -> Bool #

(/=) :: Lang -> Lang -> Bool #

Ord Lang # 
Instance details

Defined in Gargantext.Core

Methods

compare :: Lang -> Lang -> Ordering #

(<) :: Lang -> Lang -> Bool #

(<=) :: Lang -> Lang -> Bool #

(>) :: Lang -> Lang -> Bool #

(>=) :: Lang -> Lang -> Bool #

max :: Lang -> Lang -> Lang #

min :: Lang -> Lang -> Lang #

Hashable Lang # 
Instance details

Defined in Gargantext.Core

Methods

hashWithSalt :: Int -> Lang -> Int #

hash :: Lang -> Int #

FromHttpApiData Lang # 
Instance details

Defined in Gargantext.Core

ToHttpApiData Lang # 
Instance details

Defined in Gargantext.Core

GQLType Lang # 
Instance details

Defined in Gargantext.Core

Associated Types

type KIND Lang :: DerivingKind #

ToSchema Lang # 
Instance details

Defined in Gargantext.Core

type Rep Lang # 
Instance details

Defined in Gargantext.Core

type Rep Lang = D1 ('MetaData "Lang" "Gargantext.Core" "gargantext-0.0.7.1.5.3-inplace" 'False) (((C1 ('MetaCons "DE" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "EL" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "EN" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "ES" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "FR" 'PrefixI 'False) (U1 :: Type -> Type)))) :+: ((C1 ('MetaCons "IT" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "PL" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "PT" 'PrefixI 'False) (U1 :: Type -> Type))) :+: (C1 ('MetaCons "RU" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "UK" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "ZH" 'PrefixI 'False) (U1 :: Type -> Type)))))
type KIND Lang # 
Instance details

Defined in Gargantext.Core

type KIND Lang = TYPE