SPaC (short for "Streaming Parser Combinators") is a library for building stream consumers in a declarative style, specialized for tree-like data types like XML and JSON.
Many utilities for handling XML and JSON data involve parsing the entire "document" into a DOM, then inspecting and transforming that DOM to extract information. The downside to these utilities is that when the document is very large, the DOM may not fit in memory. The workaround for this type of problem is to treat the document as a stream of "events", e.g. "StartElement" and "EndElement" for XML, or "StartObject" and "EndObject" for JSON. The downside to this workaround is that code to handle these streams can be complicated and error-prone to write, especially when the document's structure is complex.
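Here is a small hand-written stream consumer that makes the "complicated and error-prone" point concrete. It collects the text of every <title> that sits directly inside a <book> without building a DOM, but it has to maintain the element stack and a text buffer by hand. The event type and its cases (Event, StartElement, Text, EndElement) are hypothetical illustrations, not SPaC's own types.

```scala
// Hypothetical event model (illustrative names, not SPaC's XmlEvent).
sealed trait Event extends Product with Serializable
final case class StartElement(name: String) extends Event
final case class Text(value: String) extends Event
final case class EndElement(name: String) extends Event

// Collects the text of every <title> whose parent is a <book>, without a DOM.
// The element stack has to be maintained by hand, innermost element first.
def bookTitles(events: Iterator[Event]): List[String] = {
  var stack = List.empty[String]
  val buffer = new StringBuilder
  val titles = List.newBuilder[String]
  def inBookTitle: Boolean = stack.take(2) == List("title", "book")

  events.foreach {
    case StartElement(name) =>
      stack = name :: stack
      if (inBookTitle) buffer.clear() // a new title starts: reset the buffer
    case Text(value) =>
      if (inBookTitle) buffer.append(value)
    case EndElement(_) =>
      if (inBookTitle) titles += buffer.toString // title finished: emit it
      stack = stack.drop(1)
  }
  titles.result()
}
```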
SPaC's goal is to drastically simplify the process of creating code to handle these streams.
This package contains the "core" SPaC traits: Parser, Transformer, Splitter, and ContextMatcher.
See the xml and json subpackages (provided by the xml-spac and json-spac libraries, respectively) for specific utilities related to handling XML and JSON event streams.
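A "stream consumer" can be pictured as a fold: it takes events one at a time and produces a result at the end. The sketch below models that idea with a hypothetical MiniParser trait that has init, step, and finish members. It is not SPaC's Parser trait, whose real definition and method names may differ; it only illustrates incremental consumption.

```scala
// Hypothetical, simplified model of a streaming consumer (not SPaC's Parser).
trait MiniParser[In, Out] { self =>
  type State
  def init: State
  def step(state: State, in: In): State
  def finish(state: State): Out

  // Runs over any iterator while holding only the current state in memory.
  def parse(events: Iterator[In]): Out = finish(events.foldLeft(init)(step))

  // Transforms the final result without changing how events are consumed.
  def map[Out2](f: Out => Out2): MiniParser[In, Out2] = new MiniParser[In, Out2] {
    type State = self.State
    def init: State = self.init
    def step(state: State, in: In): State = self.step(state, in)
    def finish(state: State): Out2 = f(self.finish(state))
  }
}

object MiniParser {
  // Collects every event; note that its state grows with the input.
  def toList[In]: MiniParser[In, List[In]] = new MiniParser[In, List[In]] {
    type State = List[In]
    def init: List[In] = Nil
    def step(state: List[In], in: In): List[In] = in :: state
    def finish(state: List[In]): List[In] = state.reverse
  }

  // Counts events in constant memory.
  def count[In]: MiniParser[In, Int] = new MiniParser[In, Int] {
    type State = Int
    def init: Int = 0
    def step(state: Int, in: In): Int = state + 1
    def finish(state: Int): Int = state
  }
}
```

parse keeps only the fold state, so a consumer such as count runs in constant memory no matter how long the stream is. toList, by contrast, grows with its input.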
Provides implicits to allow for interop between the core SPaC classes and fs2 / cats-effect.
- Parser gets toPipe and parseF
- Transformer gets toPipe
- Source gets toResource and toStream
This package provides extensions to the core "spac" library which allow for the handling of JSON data.
Rather than creating explicit classes that extend Parser, Transformer, and Splitter, this package provides type aliases and implicit extensions. For example, JsonParser[A] is just a type alias for Parser[JsonEvent, A], and JsonParser is just a call to Parser[JsonEvent].
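The sketch below shows the alias-plus-extension pattern in a self-contained way. A type alias fixes the input type, a value obtained from Parser[JsonEvent] plays the role of a companion, and an implicit class adds JSON-specific constructors to that value. Every name here is a hypothetical stand-in: the tiny Parser class, ApplyOps, the JsonEvent cases, JsonParserApplyOps, and firstString. SPaC's real definitions are richer.

```scala
// Hypothetical JSON events (not SPaC's real JsonEvent).
sealed trait JsonEvent extends Product with Serializable
final case class JString(value: String) extends JsonEvent
final case class JNumber(value: Double) extends JsonEvent

// Hypothetical minimal generic parser: a function over the whole event list.
final class Parser[In, Out](val run: List[In] => Out)

object Parser {
  // `Parser[In]` fixes the input type and returns an empty "ops" value that
  // format-specific implicit classes can add constructors to.
  final class ApplyOps[In]
  def apply[In]: ApplyOps[In] = new ApplyOps[In]
}

// The alias: a JsonParser[A] is exactly a Parser[JsonEvent, A].
type JsonParser[A] = Parser[JsonEvent, A]

// The "companion": simply the value returned by Parser[JsonEvent].
val JsonParser = Parser[JsonEvent]

// JSON-specific constructors attached through an implicit class.
implicit class JsonParserApplyOps(ops: Parser.ApplyOps[JsonEvent]) {
  // Hypothetical helper: the first string value in the stream, if any.
  def firstString: JsonParser[Option[String]] =
    new Parser[JsonEvent, Option[String]](_.collectFirst { case JString(s) => s })
}
```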
Implicit JsonParsers are available for each of the JSON primitive types:
- string
- number (expressed as Int, Long, Float, or Double)
- boolean
- null (expressed as None.type)
Helpers are available for parsing JSON arrays and objects:
- JsonParser.listOf[A] to parse an array where each value is an A
- JsonParser.objectOf[A] to parse an object where the value for each field is an A
- JsonParser.objectOfNullable[A] to parse an object where the value for each field is either null or an A, filtering out the nulls
- JsonParser.fieldOf[A](fieldName) to parse a specific field from an object
A DSL for creating JSON-specific ContextMatchers is provided to make it more convenient to call Splitter.json.
For example:
Splitter.json("foo" \ "bar").as[String].parseFirst
Can be used to capture rootJson.foo.bar as a String in
{ "foo": { "bar": "hello" } }
To "split" values inside arrays, index-related context matchers are available, e.g.
Splitter.json("foo" \ anyIndex).as[Int].parseToList
Can be used to capture each of the numbers in the "foo" array in
{ "foo": [1, 2, 3] }
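One way to picture the path DSL is as a list of per-level matchers compared against the context, outermost level first. The sketch below is a hypothetical, simplified model: Frame, Step, Path, AnyIndex, PathSyntax, and select are illustrative names, and leaf values are supplied together with their full paths rather than arriving as a live event stream. It is not how SPaC implements ContextMatcher.

```scala
// Hypothetical context frames: an object field or an array index.
sealed trait Frame extends Product with Serializable
final case class Field(name: String) extends Frame
final case class Index(i: Int) extends Frame

// Hypothetical single-level matchers.
sealed trait Step extends Product with Serializable {
  def matches(frame: Frame): Boolean
}
final case class FieldStep(name: String) extends Step {
  def matches(frame: Frame): Boolean = frame == Field(name)
}
case object AnyIndex extends Step {
  def matches(frame: Frame): Boolean = frame match {
    case Index(_) => true
    case _        => false
  }
}

// A path is a list of steps, outermost first, extended with `\`.
final case class Path(steps: List[Step]) {
  def \(next: Step): Path = Path(steps :+ next)
  def \(next: String): Path = this \ FieldStep(next)
  // Matches a context (outermost frame first) level by level.
  def matches(context: List[Frame]): Boolean =
    context.length == steps.length &&
      steps.zip(context).forall { case (step, frame) => step.matches(frame) }
}

// Lets a plain string start a path, as in "foo" \ "bar".
implicit class PathSyntax(name: String) {
  def \(next: Step): Path = Path(List(FieldStep(name))) \ next
  def \(next: String): Path = Path(List(FieldStep(name))) \ next
}

// Keeps the leaf values whose full context matches the path.
def select[A](leaves: List[(List[Frame], A)], path: Path): List[A] =
  leaves.collect { case (context, value) if path.matches(context) => value }
```

A streaming implementation would check the matcher against the current context as events arrive instead of working from pre-collected leaves. The sketch only shows the matching rule.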
A note about JsonEvents in spac: JSON doesn't have any explicit markers for when a field ends, or when an array index starts or ends; those context changes are essentially inferred by the presence of some other event. For example, instead of a "field end" event, typically there will be either a new "field start" or a token representing the end of the current object. With spac, splitters and context matchers generally operate under the assumption that a "stack push" event (like a field start) will eventually be followed by a corresponding "stack pop" event (i.e. field end).
To allow for this, these "inferred" events (FieldEnd, IndexStart, IndexEnd) are explicitly represented as JsonEvents in the stream being parsed. Keep this in mind when creating JSON ContextMatchers:
- field-related matchers will match a stack like case ObjectStart :: FieldStart(_) :: _
- index-related matchers will match a stack like case ArrayStart :: IndexStart(_) :: _
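The sketch below shows how such "inferred" events can be derived. It rewrites a raw token stream into an event stream where FieldEnd, IndexStart, and IndexEnd appear explicitly. The token and event names are hypothetical and only loosely modeled on the events named above. SPaC's real JsonEvent cases may carry different data, and its tokenizer is not shown here.

```scala
// Hypothetical raw tokens, as a low-level JSON tokenizer might produce them.
sealed trait Token extends Product with Serializable
case object TObjectStart extends Token
case object TObjectEnd extends Token
case object TArrayStart extends Token
case object TArrayEnd extends Token
final case class TFieldName(name: String) extends Token
final case class TValue(raw: String) extends Token

// Hypothetical events with the inferred markers made explicit.
sealed trait Event extends Product with Serializable
case object ObjectStart extends Event
case object ObjectEnd extends Event
case object ArrayStart extends Event
case object ArrayEnd extends Event
final case class FieldStart(name: String) extends Event
case object FieldEnd extends Event
final case class IndexStart(index: Int) extends Event
case object IndexEnd extends Event
final case class Value(raw: String) extends Event

// Open containers, innermost first: objects remember whether a field is open,
// arrays remember the index of their next element.
sealed trait Open extends Product with Serializable
final case class InObject(fieldOpen: Boolean) extends Open
final case class InArray(nextIndex: Int) extends Open

def inferEvents(tokens: List[Token]): List[Event] = {
  val out = List.newBuilder[Event]
  var stack = List.empty[Open]

  // Before any value starts, an enclosing array gets an inferred IndexStart.
  def beforeValue(): Unit = stack match {
    case InArray(i) :: _ => out += IndexStart(i)
    case _               => ()
  }
  // After a value completes, close the enclosing index or field.
  def afterValue(): Unit = stack match {
    case InArray(i) :: rest =>
      out += IndexEnd
      stack = InArray(i + 1) :: rest
    case InObject(true) :: rest =>
      out += FieldEnd
      stack = InObject(false) :: rest
    case _ => ()
  }

  tokens.foreach {
    case TFieldName(name) =>
      out += FieldStart(name)
      stack = InObject(fieldOpen = true) :: stack.drop(1)
    case TObjectStart =>
      beforeValue(); out += ObjectStart; stack = InObject(fieldOpen = false) :: stack
    case TArrayStart =>
      beforeValue(); out += ArrayStart; stack = InArray(0) :: stack
    case TValue(raw) =>
      beforeValue(); out += Value(raw); afterValue()
    case TObjectEnd =>
      stack = stack.drop(1); out += ObjectEnd; afterValue()
    case TArrayEnd =>
      stack = stack.drop(1); out += ArrayEnd; afterValue()
  }
  out.result()
}
```

For the tokens of {"foo": [1, 2]}, this produces ObjectStart, FieldStart("foo"), ArrayStart, IndexStart(0), Value("1"), IndexEnd, IndexStart(1), Value("2"), IndexEnd, ArrayEnd, FieldEnd, ObjectEnd. Neither the FieldEnd nor the index markers appear in the raw tokens.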
This package provides extensions to the core "spac" library which allow for the handling of XML data.
Rather than creating explicit classes that extend Parser, Transformer, and Splitter, this package provides type aliases and implicit extensions. For example, XmlParser[A] is just a type alias for Parser[XmlEvent, A], and XmlParser is just a call to Parser[XmlEvent].
Three main Parser methods are added to Parser[XmlEvent] via the XmlParserApplyOps implicit class:
- XmlParser.forText - for capturing raw text
- XmlParser.attr - for capturing mandatory attributes from elements
- XmlParser.attrOpt - for capturing optional attributes from elements
One main Splitter constructor method is added to Splitter via the XmlSplitterApplyOps implicit class:
- Splitter.xml - for creating splitters based on an inspection of an "element stack"
Three main Splitter member methods are added to Splitter[XmlEvent, C] via the XmlSplitterOps implicit class:
- .attr - alias for .joinBy(XmlParser.attr(...))
- .attrOpt - alias for .joinBy(XmlParser.attrOpt(...))
- .text - alias for .joinBy(XmlParser.forText)
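The difference between a mandatory and an optional attribute parser, and what a text parser collects, can be sketched with plain functions over one element's events. XmlEvent, ElemStart, ElemEnd, Chars, firstStart, and the signatures below are hypothetical. The real XmlParser methods construct parsers rather than plain functions returning Either or Option.

```scala
// Hypothetical XML events (illustrative, not SPaC's XmlEvent).
sealed trait XmlEvent extends Product with Serializable
final case class ElemStart(name: String, attrs: Map[String, String]) extends XmlEvent
final case class ElemEnd(name: String) extends XmlEvent
final case class Chars(text: String) extends XmlEvent

def firstStart(events: List[XmlEvent]): Option[ElemStart] =
  events.collectFirst { case e: ElemStart => e }

// Rough analogue of a mandatory attribute parser: a missing attribute is an error.
def attr(key: String)(events: List[XmlEvent]): Either[String, String] =
  firstStart(events) match {
    case Some(e) => e.attrs.get(key).toRight(s"<${e.name}> has no '$key' attribute")
    case None    => Left("no start element")
  }

// Rough analogue of an optional attribute parser: a missing attribute is just None.
def attrOpt(key: String)(events: List[XmlEvent]): Option[String] =
  firstStart(events).flatMap(_.attrs.get(key))

// Rough analogue of a text parser: concatenates all character data in the events.
def forText(events: List[XmlEvent]): String =
  events.collect { case Chars(text) => text }.mkString
```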
A DSL for creating XML-specific ContextMatchers is provided to make it more convenient to call Splitter.xml.
For example:
Splitter.xml("things" \ "thing").attr("foo").parseToList
Can be used to capture a list of the "foo" attributes in the <thing> elements in
<things>
  <thing foo="hello" />
  <thing foo="Goodbye">
    <extra>junk</extra>
  </thing>
</things>
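The sketch below hand-rolls an approximation of Splitter.xml("things" \ "thing").attr("foo") over the example document. It first splits the events into one substream per matching <thing>, then runs an attribute parser on each substream, which is the joinBy step. All names here (XmlEvent, ElemStart, ElemEnd, Chars, split, joinBy, attr) are hypothetical stand-ins for illustration. For brevity, attr simply throws if the attribute is missing.

```scala
// Hypothetical XML events (illustrative, not SPaC's XmlEvent).
sealed trait XmlEvent extends Product with Serializable
final case class ElemStart(name: String, attrs: Map[String, String]) extends XmlEvent
final case class ElemEnd(name: String) extends XmlEvent
final case class Chars(text: String) extends XmlEvent

// Splits the stream into one substream per element whose element stack
// (root first) equals `path`, e.g. List("things", "thing").
def split(path: List[String])(events: List[XmlEvent]): List[List[XmlEvent]] = {
  val target = path.toVector
  var stack = Vector.empty[String]
  var current: Option[Vector[XmlEvent]] = None // events of the open substream
  val out = List.newBuilder[List[XmlEvent]]
  events.foreach {
    case e @ ElemStart(name, _) =>
      stack = stack :+ name
      current = if (stack == target) Some(Vector(e)) else current.map(_ :+ e)
    case e @ ElemEnd(_) =>
      current = current.map(_ :+ e)
      if (stack == target) { // the matched element just closed
        current.foreach(sub => out += sub.toList)
        current = None
      }
      stack = stack.dropRight(1)
    case e =>
      current = current.map(_ :+ e)
  }
  out.result()
}

// Runs a "parser" on each substream, in order (the joinBy step).
def joinBy[A](substreams: List[List[XmlEvent]])(parser: List[XmlEvent] => A): List[A] =
  substreams.map(parser)

// Reads an attribute from the first start tag; throws if it is missing.
def attr(key: String): List[XmlEvent] => String = events =>
  events.collectFirst { case ElemStart(_, attrs) => attrs(key) }
    .getOrElse(throw new NoSuchElementException("no start element"))
```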