Copyright | (c) CNRS 2017 |
---|---|
License | AGPL + CECILL v3 |
Maintainer | team@gargantext.org |
Stability | experimental |
Portability | POSIX |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Gargantext analyzes semi-structured text, which must be parsed before it can be analyzed.
The parsers assume that the format of the text is known (the TextFormat data type); the matching parser is then chosen from the list of available parsers.
This module mainly describes how to add a new parser to Gargantext: please follow the types.
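As an illustration of "follow the types", the sketch below shows one way a format value can select a parser. It is not Gargantext's actual code: `DocFormat`, `Doc`, `DocParser`, `parseTitles`, `parseKeyValue` and `parserFor` are hypothetical names, and plain `String` stands in for `Text` and `ByteString` so that the example depends on `base` only.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical stand-ins for illustration; these are NOT Gargantext's types.
data DocFormat = OneTitlePerLine | KeyValueLines
  deriving (Show, Eq)

data Doc = Doc { docTitle :: String, docBody :: String }
  deriving (Show, Eq)

-- A parser turns raw input into documents, or reports an error.
type DocParser = String -> Either String [Doc]

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Every non-blank line becomes a document with an empty body.
parseTitles :: DocParser
parseTitles input =
  Right [ Doc (trim l) "" | l <- lines input, not (null (trim l)) ]

-- Every non-blank line must look like "title: body".
parseKeyValue :: DocParser
parseKeyValue input =
  traverse parseLine (filter (not . null . trim) (lines input))
  where
    parseLine l = case break (== ':') l of
      (k, ':' : v) -> Right (Doc (trim k) (trim v))
      _            -> Left ("missing ':' in line: " ++ l)

-- Adding a new format means adding a constructor to DocFormat and a
-- case here; with -Wall the compiler flags any format left unhandled.
parserFor :: DocFormat -> DocParser
parserFor OneTitlePerLine = parseTitles
parserFor KeyValueLines   = parseKeyValue
```

In this sketch the only place that knows about every format is `parserFor`, so a new parser is one new constructor, one new function and one new case.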
Synopsis
- data FileFormat
- data FileType
- newtype ParseFormatError = ParseFormatError {}
- clean :: ByteString -> ByteString
- parseFile :: FileType -> FileFormat -> FilePath -> IO (Either Text [HyperdataDocument])
- cleanText :: Text -> Text
- parseFormatC :: forall m. MonadBaseControl IO m => FileType -> FileFormat -> ByteString -> m (Either ParseFormatError (Integer, ConduitT () HyperdataDocument IO ()))
- splitOn :: NgramsType -> Maybe Text -> Text -> [Text]
- etale :: [HyperdataDocument] -> [HyperdataDocument]
Documentation
data FileFormat
Different parsers are available, depending on the format of the input file.
newtype ParseFormatError
clean :: ByteString -> ByteString
parseFile :: FileType -> FileFormat -> FilePath -> IO (Either Text [HyperdataDocument])
Parse a file into documents.
TODO: manage errors here.
TODO: to aid debugging, maybe add the file path to the error message.
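The second TODO could be approached along the following lines. This is a minimal sketch under stated assumptions, not the module's implementation: `FileParser`, `withPath` and `parseFileWith` are hypothetical names, `String` replaces `Text`, and the format-specific parser is passed in as an argument instead of being selected from `FileType` and `FileFormat`.

```haskell
import Control.Exception (IOException, try)
import Data.Bifunctor (first)
import System.IO.Error (ioeGetErrorString)

-- Hypothetical stand-in for a format-specific parser (not Gargantext's API).
type FileParser a = String -> Either String a

-- Prefix an error message with the path of the file it concerns.
withPath :: FilePath -> String -> String
withPath path msg = path ++ ": " ++ msg

-- Read a file and parse it, so that both I/O failures and parse
-- failures come back as a Left that mentions the file path.
parseFileWith :: FileParser a -> FilePath -> IO (Either String a)
parseFileWith parser path = do
  contents <- try (readFile path) :: IO (Either IOException String)
  pure $ case contents of
    Left ioErr -> Left (withPath path (ioeGetErrorString ioErr))
    Right txt  -> first (withPath path) (parser txt)
```

Keeping the prefixing in a single helper means every parser stays unaware of file paths, and the path is attached once at the point where the file is known.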
parseFormatC :: forall m. MonadBaseControl IO m => FileType -> FileFormat -> ByteString -> m (Either ParseFormatError (Integer, ConduitT () HyperdataDocument IO ()))
etale :: [HyperdataDocument] -> [HyperdataDocument]