# Introduction
`fs2-data` consists of several modules, each handling a different data format, and sub-modules for each data format that add more features (but also more dependencies). The core module for each data format depends only on `fs2` and provides tools to parse and transform data in a streaming manner.
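For example, to use only the JSON support, you would add a single module to your build. A minimal sketch, assuming an sbt build (the coordinates reflect the `org.gnieh` organization `fs2-data` is published under; the version is a placeholder):

```scala
// build.sbt: pull in just the JSON module; its core parsing
// support has no dependency other than fs2
libraryDependencies += "org.gnieh" %% "fs2-data-json" % "<version>"
```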
For each module, the entry point is an fs2 `Pipe` transforming a stream of some input (typically `Char`, `String`, or `Byte`) into tokens or events. All parsers are streaming parsers: the parsed tree is never constructed in memory by the parser but is instead emitted as a sequence of tokens or events (similar to SAX), and the input data is not fully held in memory either. The parsers read the input as it comes, emit tokens as soon as possible, and discard input data that is no longer useful.
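To make this concrete, here is a minimal sketch using the JSON module (it assumes `fs2-data-json` is on the classpath): the parser emits one token per syntactic element as it consumes the input, without ever building the JSON tree.

```scala
import fs2._
import fs2.data.json._

// parsing {"a": 1} emits StartObject, Key("a"), NumberValue("1"),
// and EndObject one after the other; no tree is materialized
val toks = Stream
  .emit("""{"a": 1}""")
  .through(tokens[Fallible, String])
  .compile
  .toList
// toks: Either[Throwable, List[Token]]
```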
## Parsing textual formats
The general pattern for the text parsers (JSON, XML, CSV, ...) is to have a pipe with this signature:
```scala
import fs2._
import fs2.data.text.CharLikeChunks

// the `Token` type depends on the format
trait Token

def tokens[F[_], T](implicit T: CharLikeChunks[F, T]): Pipe[F, T, Token] =
  s => Stream.suspend(???)
```
The input stream can be of any type `T` from which characters can be read. `fs2-data` provides default implementations for the types `Char` and `String`, so that you can write:
```scala
// Stream of `Char`
Stream.emits("Some input string").through(tokens[Pure, Char])
// res0: Stream[[x]Pure[x], Token] = Stream(..)

// Stream of `String`
Stream.emit("Some input string").through(tokens[Pure, String])
// res1: Stream[[x]Pure[x], Token] = Stream(..)
```
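The `CharLikeChunks` abstraction reads characters from the stream's chunks, so how the input happens to be split does not matter. A small sketch, reusing the hypothetical `tokens` pipe above:

```scala
// the same characters, arriving split across several strings;
// the CharLikeChunks[Pure, String] instance reads across
// chunk boundaries transparently
Stream.emits(List("Some in", "put st", "ring")).through(tokens[Pure, String])
```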
## Reading text inputs from a file
A common pattern when using this library to read data from a file is to start by building a `Stream[F, Byte]` as follows:
```scala
import cats.effect._
import fs2._
import fs2.io.file.{Files, Flags, Path}

Files[IO]
  .readAll(Path("/some/path/to/a/file.data"), 1024, Flags.Read)
  // perform your decoding, parsing, and transformation here
  .compile
  .drain
```
For textual data formats (JSON, XML, CSV, ...) this stream needs to be decoded according to the file encoding.
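One way to do this, independently of `fs2-data`, is to decode the bytes with fs2's own text pipes, turning them into a `Stream[IO, String]` that the parsers accept (a sketch assuming UTF-8 content):

```scala
Files[IO]
  .readAll(Path("/some/path/to/a/file.data"), 1024, Flags.Read)
  // fs2's standard UTF-8 decoder turns the bytes into strings
  .through(fs2.text.utf8.decode)
  // feed the resulting Stream[IO, String] into a `tokens` pipe here
  .compile
  .drain
```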
## Decoding textual inputs
If your file is encoded using UTF-8 or a common single-byte encoding, you can use the built-in support `fs2-data` has for these encodings, which lives in the `fs2.data.text` package.
```scala
// for instance, if your input is encoded in ISO-8859-1 aka latin1
import fs2.data.text.latin1._
// if you have UTF-8 instead:
// import fs2.data.text.utf8._

Files[IO]
  .readAll(Path("/some/path/to/a/file.data"), 1024, Flags.Read)
  // decoding is done by the now in-scope `CharLikeChunks[IO, Byte]` instance
  .through(tokens)
  .compile
  .drain
```
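Putting the pieces together, here is an end-to-end sketch with the JSON module (again assuming `fs2-data-json` is on the classpath; the path is purely illustrative) that parses straight from the raw bytes:

```scala
import cats.effect._
import fs2.data.json
import fs2.data.text.utf8._
import fs2.io.file.{Files, Flags, Path}

Files[IO]
  .readAll(Path("/some/path/to/a/file.json"), 1024, Flags.Read)
  // the utf8 import provides the CharLikeChunks[IO, Byte]
  // instance the JSON parser uses to decode the bytes
  .through(json.tokens)
  .compile
  .drain
```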