package io.dylemma.spac

SPaC (short for "Streaming Parser Combinators") is a library for building stream consumers in a declarative style, specialized for tree-like data types like XML and JSON.

Many utilities for handling XML and JSON data involve parsing the entire "document" to some DOM model, then inspecting and transforming that model to extract information. The downside to these utilities is that when the document is very large, the DOM may not fit in memory. The workaround for this type of problem is to treat the document as a stream of "events", e.g. "StartElement" and "EndElement" for XML, or "StartObject" and "EndObject" for JSON. The downside to this workaround is that writing code to handle these streams can be complicated and error-prone, especially when the DOM is complicated.
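The sketch below shows the event-stream approach in plain Scala, without spac. The event types (StartElement, EndElement, Characters) and the helpers countElements and hugeDocument are hypothetical and are not part of spac's API. The point is that a consumer which looks at one event at a time never needs the whole document in memory.

```scala
// Hypothetical toy model, not spac's API: a tiny XML-like event type.
object StreamingVsDom {
  sealed trait XmlEvent
  final case class StartElement(name: String) extends XmlEvent
  final case class EndElement(name: String) extends XmlEvent
  final case class Characters(text: String) extends XmlEvent

  // Counts <name> elements by inspecting one event at a time; memory use stays
  // constant no matter how many events the document contains.
  def countElements(events: Iterator[XmlEvent], name: String): Int =
    events.count {
      case StartElement(`name`) => true
      case _                    => false
    }

  // Lazily generates <root><item/>...<item/></root>; the whole document never exists in memory.
  def hugeDocument(n: Int): Iterator[XmlEvent] =
    Iterator[XmlEvent](StartElement("root")) ++
      Iterator.range(0, n).flatMap(_ => Iterator[XmlEvent](StartElement("item"), EndElement("item"))) ++
      Iterator[XmlEvent](EndElement("root"))
}
```

Because hugeDocument returns a lazy Iterator, countElements works through any number of <item> events without ever materializing a DOM.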

SPaC's goal is to drastically simplify the process of creating code to handle these streams.

This package contains the "core" SPaC traits: Parser, Transformer, Splitter, and ContextMatcher.

See the xml and json subpackages (provided by the xml-spac and json-spac libraries respectively) for specific utilities related to handling XML and JSON event streams.

Source: package.scala
Linear Supertypes: AnyRef, Any

Package Members

  1. package interop
  2. package json

    This package provides extensions to the core "spac" library which allow for the handling of JSON data.

    Rather than creating explicit classes that extend Parser, Transformer, and Splitter, this package provides type aliases and implicit extensions. For example, JsonParser[A] is just a type alias for Parser[JsonEvent, A], and JsonParser is just a call to Parser[JsonEvent].

    Implicit JsonParsers are available for each of the JSON primitive types:

    • string
    • number (expressed as Int, Long, Float, or Double)
    • boolean
    • null (expressed as None.type)

    Helpers are available for parsing JSON arrays and objects:

    • JsonParser.listOf[A] to parse an array where each value is an A
    • JsonParser.objectOf[A] to parse an object where the value for each field is an A
    • JsonParser.objectOfNullable[A] to parse an object where the value for each field is either null or an A, filtering out the nulls
    • JsonParser.fieldOf[A](fieldName) to parse a specific field from an object
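    The "one implicit parser per primitive type, plus helpers for containers" design can be sketched as an ordinary typeclass. This is a hypothetical toy, not spac's implementation. It reads from a flat token list instead of a JsonEvent stream, and the names Reads, Tok, StrTok, NumTok, listOf and parse are invented for illustration.

```scala
// Hypothetical toy model of implicit per-type parsers; not spac's API.
object ToyJsonTypeclass {
  sealed trait Tok
  final case class StrTok(s: String) extends Tok
  final case class NumTok(n: Double) extends Tok
  final case class BoolTok(b: Boolean) extends Tok
  case object NullTok extends Tok
  case object ArrOpen extends Tok
  case object ArrClose extends Tok

  // Reads one value from the front of the token list, returning it with the remaining tokens.
  trait Reads[A] { def read(toks: List[Tok]): Option[(A, List[Tok])] }

  private def primitive[A](pf: PartialFunction[Tok, A]): Reads[A] = new Reads[A] {
    def read(toks: List[Tok]): Option[(A, List[Tok])] = toks match {
      case head :: rest if pf.isDefinedAt(head) => Some((pf(head), rest))
      case _                                    => None
    }
  }

  // One implicit instance per primitive: string, number (Int / Double), boolean, null.
  implicit val readString: Reads[String] = primitive { case StrTok(s) => s }
  implicit val readDouble: Reads[Double] = primitive { case NumTok(n) => n }
  implicit val readInt: Reads[Int] = primitive { case NumTok(n) if n.isWhole => n.toInt }
  implicit val readBoolean: Reads[Boolean] = primitive { case BoolTok(b) => b }
  implicit val readNull: Reads[None.type] = primitive { case NullTok => None }

  // Analogue of listOf[A]: an array whose every element is read with the implicit Reads[A].
  def listOf[A](implicit elem: Reads[A]): Reads[List[A]] = new Reads[List[A]] {
    def read(toks: List[Tok]): Option[(List[A], List[Tok])] = toks match {
      case ArrOpen :: afterOpen =>
        @annotation.tailrec
        def loop(rest: List[Tok], acc: List[A]): Option[(List[A], List[Tok])] = rest match {
          case ArrClose :: after => Some((acc.reverse, after))
          case _ =>
            elem.read(rest) match {
              case Some((a, next)) => loop(next, a :: acc)
              case None            => None
            }
        }
        loop(afterOpen, Nil)
      case _ => None
    }
  }

  // Succeeds only if the whole token list is consumed.
  def parse[A](toks: List[Tok])(implicit r: Reads[A]): Option[A] =
    r.read(toks).collect { case (a, Nil) => a }
}
```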

    A DSL for creating json-specific ContextMatchers is provided to make it more convenient to call Splitter.json. For example:

    Splitter.json("foo" \ "bar").as[String].parseFirst

    Can be used to capture rootJson.foo.bar as a String in

    {
      "foo": {
        "bar": "hello"
      }
    }

    To "split" values inside arrays, index-related context matchers are available, e.g.

    Splitter.json("foo" \ anyIndex).as[Int].parseToList

    Can be used to capture each of the numbers in the "foo" array in

    {
      "foo": [1, 2, 3]
    }
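    The `\` DSL can be approximated with an implicit class on String. The class builds a path of segments, and the path is then checked against a stack of field and index frames. Every name here (Path, Field, anyIndex, InField, InIndex, StringPathOps) is hypothetical. The exact-length match is also a simplification, since spac's real ContextMatcher is more general.

```scala
// Hypothetical toy model of a path DSL; not spac's ContextMatcher API.
object ToyJsonPath {
  sealed trait Seg
  final case class Field(name: String) extends Seg
  case object AnyIndex extends Seg
  val anyIndex: Seg = AnyIndex

  // Frames of the current JSON location, outermost first.
  sealed trait Frame
  final case class InField(name: String) extends Frame
  final case class InIndex(index: Int) extends Frame

  final case class Path(segs: List[Seg]) {
    def \(next: String): Path = Path(segs :+ Field(next))
    def \(next: Seg): Path = Path(segs :+ next)

    // True when the frames correspond one-to-one with the path segments.
    def matches(frames: List[Frame]): Boolean =
      frames.length == segs.length && segs.zip(frames).forall {
        case (Field(n), InField(m)) => n == m
        case (AnyIndex, InIndex(_)) => true
        case _                      => false
      }
  }

  // Lets a plain String start a path, as in "foo" \ "bar".
  implicit class StringPathOps(first: String) {
    def \(next: String): Path = Path(List(Field(first), Field(next)))
    def \(next: Seg): Path = Path(List(Field(first), next))
  }
}
```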

    A note about JsonEvents in spac: JSON doesn't have any explicit markers for when a field ends, or when an array index starts or ends; those context changes are essentially inferred by the presence of some other event. For example, instead of a "field end" event, typically there will be either a new "field start" or a token representing the end of the current object. With spac, splitters and context matchers generally operate under the assumption that a "stack push" event (like a field start) will eventually be followed by a corresponding "stack pop" event (such as a field end).

    To allow for this, these "inferred" events (FieldEnd, IndexStart, IndexEnd) are explicitly represented as JsonEvents in the stream being parsed. Keep this in mind when creating JSON ContextMatchers:

    • field-related matchers will match a stack like case ObjectStart :: FieldStart(_) :: _
    • index-related matchers will match a stack like case ArrayStart :: IndexStart(_) :: _
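    The sketch below shows how a raw JSON token stream could be enriched with FieldEnd, IndexStart and IndexEnd. The raw token names (ObjStart, FieldName, Prim, ...) and the output event classes are hypothetical stand-ins that mirror the names used above. They are not spac's actual JsonEvent types. A FieldEnd is emitted only once the next field name or the end of the object is seen. Likewise, an IndexEnd is emitted only once the next array value or the end of the array is seen, which is the inference described above.

```scala
// Hypothetical token and event types, loosely mirroring the names used above.
object InferredEvents {
  // Raw tokens, as a low-level JSON tokenizer might report them.
  sealed trait Raw
  case object ObjStart extends Raw
  case object ObjEnd extends Raw
  case object ArrStart extends Raw
  case object ArrEnd extends Raw
  final case class FieldName(name: String) extends Raw
  final case class Prim(value: String) extends Raw

  // Enriched events, with the inferred FieldEnd / IndexStart / IndexEnd made explicit.
  sealed trait Ev
  case object ObjectStart extends Ev
  case object ObjectEnd extends Ev
  case object ArrayStart extends Ev
  case object ArrayEnd extends Ev
  final case class FieldStart(name: String) extends Ev
  case object FieldEnd extends Ev
  final case class IndexStart(index: Int) extends Ev
  case object IndexEnd extends Ev
  final case class Value(value: String) extends Ev

  // Per-container state: whether a field is open, or the last index started (-1 = none yet).
  private sealed trait Frame
  private final case class ObjFrame(fieldOpen: Boolean) extends Frame
  private final case class ArrFrame(lastIndex: Int) extends Frame

  def infer(raw: List[Raw]): List[Ev] = {
    val out = scala.collection.mutable.ListBuffer.empty[Ev]
    var stack: List[Frame] = Nil

    // Inside an array, a new value closes the previous index (if any) and opens the next one.
    def beforeValue(): Unit = stack match {
      case ArrFrame(last) :: rest =>
        if (last >= 0) out += IndexEnd
        out += IndexStart(last + 1)
        stack = ArrFrame(last + 1) :: rest
      case _ => ()
    }

    raw.foreach {
      case ObjStart =>
        beforeValue()
        out += ObjectStart
        stack = ObjFrame(fieldOpen = false) :: stack
      case ArrStart =>
        beforeValue()
        out += ArrayStart
        stack = ArrFrame(-1) :: stack
      case Prim(v) =>
        beforeValue()
        out += Value(v)
      case FieldName(n) =>
        stack match {
          case ObjFrame(open) :: rest =>
            if (open) out += FieldEnd // inferred: the previous field has ended
            out += FieldStart(n)
            stack = ObjFrame(fieldOpen = true) :: rest
          case _ => throw new IllegalArgumentException(s"field '$n' outside an object")
        }
      case ObjEnd =>
        stack match {
          case ObjFrame(open) :: rest =>
            if (open) out += FieldEnd
            out += ObjectEnd
            stack = rest
          case _ => throw new IllegalArgumentException("unbalanced object end")
        }
      case ArrEnd =>
        stack match {
          case ArrFrame(last) :: rest =>
            if (last >= 0) out += IndexEnd
            out += ArrayEnd
            stack = rest
          case _ => throw new IllegalArgumentException("unbalanced array end")
        }
    }
    out.toList
  }
}
```

    For {"foo": [1, 2]}, this yields ObjectStart, FieldStart("foo"), ArrayStart, IndexStart(0), Value("1"), IndexEnd, IndexStart(1), Value("2"), IndexEnd, ArrayEnd, FieldEnd, ObjectEnd.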
  3. package xml

    This package provides extensions to the core "spac" library which allow for the handling of XML data.

    Rather than creating explicit classes that extend Parser, Transformer, and Splitter, this package provides type aliases and implicit extensions. For example, XmlParser[A] is just a type alias for Parser[XmlEvent, A], and XmlParser is just a call to Parser[XmlEvent].

    Three main Parser methods are added to Parser[XmlEvent] via the XmlParserApplyOps implicit class:

    • XmlParser.forText - for capturing raw text
    • XmlParser.attr - for capturing mandatory attributes from elements
    • XmlParser.attrOpt - for capturing optional attributes from elements

    One main Splitter constructor method is added to Splitter via the XmlSplitterApplyOps implicit class:

    • Splitter.xml - for creating splitters based on an inspection of an "element stack"

    Three main Splitter member methods are added to Splitter[XmlEvent, C] via the XmlSplitterOps implicit class:

    • .attr - alias for .joinBy(XmlParser.attr(...))
    • .attrOpt - alias for .joinBy(XmlParser.attrOpt(...))
    • .text - alias for .joinBy(XmlParser.forText)

    A DSL for creating xml-specific ContextMatchers is provided to make it more convenient to call Splitter.xml. For example:

    Splitter.xml("things" \ "thing").attr("foo").parseToList

    Can be used to capture a list of the "foo" attributes in the <thing> elements in

    <things>
       <thing foo="hello" />
       <thing foo="Goodbye">
          <extra>junk</extra>
       </thing>
    </things>
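    The sketch below approximates what Splitter.xml("things" \ "thing").attr("foo").parseToList computes. It is written against a hypothetical in-memory event type rather than spac's XmlEvent. The attrAt helper tracks the open-element stack and collects the attribute whenever the stack equals the requested path. Unlike spac's .attr, which treats the attribute as mandatory, this sketch simply skips elements that lack it.

```scala
// Hypothetical toy model; not spac's XmlEvent or Splitter API.
object ToyXmlAttr {
  sealed trait XmlEv
  final case class ElemStart(name: String, attrs: Map[String, String]) extends XmlEv
  final case class ElemEnd(name: String) extends XmlEv
  final case class Text(text: String) extends XmlEv

  // Collects attribute `attr` from every element whose open-element path
  // (outermost first) is exactly `path`. Elements lacking the attribute are skipped.
  def attrAt(path: List[String], attr: String)(events: Iterator[XmlEv]): List[String] = {
    var stack = List.empty[String] // open element names, innermost first
    val found = List.newBuilder[String]
    events.foreach {
      case ElemStart(name, attrs) =>
        stack = name :: stack
        if (stack.reverse == path) attrs.get(attr).foreach(found += _)
      case ElemEnd(_) => stack = stack.drop(1)
      case Text(_)    => ()
    }
    found.result()
  }
}
```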

Type Members

  1. case class CallerPos(filename: String, line: Int) extends Product with Serializable

    Represents a location in code that called a method. An implicit instance of this class will be automatically derived by a macro on-demand. CallerPos's ultimate purpose is to be present in certain SpacTraceElement classes, helping to point to specific splitters or parse calls in the event of a parsing error.

  2. sealed trait ContextChange[+In, +C] extends AnyRef

    Represents either entering (ContextPush) or exiting (ContextPop) some matched context within a stream of inputs.

    ContextChanges will generally be used to designate "sub-stream" boundaries, i.e. a selection of xml elements from within a stream, but may be used more generally to attach a stack-like state to stream transformers.

    In

    The value type of the elements in the stream being inspected

    C

    The type of the matched context

  3. trait ContextLocation extends AnyRef

    A map-like representation of some location in a stream, used like stack trace elements for reporting errors in stream processing.

  4. trait ContextMatcher[Elem, +A] extends AnyRef

    An object responsible for inspecting a stack of StartElement events and determining if they correspond to some "context" value of type A.

    ContextMatchers play a primary role in splitting an XML event stream into "substreams", i.e. each substream is defined as the series of consecutive events during which the XML tag stack matches a context.

    ContextMatchers are intended to be transformed and combined with each other in order to build up more complex matching functionality. See also: SingleItemContextMatcher, which contains additional combination methods and some specialized transformation methods.

    A

    The type of the matched context.

  5. case class ContextPush[+In, +C](location: ContextTrace[In], context: C) extends ContextChange[In, C] with Product with Serializable

  6. case class ContextTrace[+A](elems: Chain[(ContextLocation, A)]) extends Product with Serializable

  7. trait HasLocation extends AnyRef

    Marker trait used by SpacTraceElement.InInput to extract location information from inputs that cause parsing exceptions.

  8. trait LowPriorityTypeReduceImplicits extends AnyRef

  9. trait Parser[-In, +Out] extends AnyRef

    Primary "spac" abstraction which represents a sink for data events.

    Parsers are responsible for interpreting a stream of In events as a single result of type Out. The actual interpretation is performed by a Parser.Handler which the Parser is responsible for constructing. Handlers may be internally-mutable, and so they are generally only constructed by the parse helper methods or by other handlers. Parsers themselves are immutable, acting as "handler factories", and so they may be freely reused.

    A parser differs from typical "fold" operations in that it may choose to abort early with a result, leaving the remainder of the data stream untouched.

    In

    event/input type

    Out

    result type
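    The "handler factory" relationship between Parser and Parser.Handler, including early abort, can be sketched as follows. This is a simplified, hypothetical model. The Handler shape with step and finish, and the Option-based early exit, are assumptions rather than spac's actual Handler signature.

```scala
// Hypothetical model of parsers as immutable handler factories; not spac's API.
object ToyParsers {
  // A handler consumes inputs one at a time; step returning Some(result) means "finished early".
  trait Handler[-In, +Out] {
    def step(in: In): Option[Out]
    def finish(): Out // called at end of input if the handler never finished early
  }

  // The parser itself is immutable: every parse call builds a fresh (possibly mutable) handler.
  trait Parser[-In, +Out] {
    def newHandler(): Handler[In, Out]

    def parse(inputs: Iterator[In]): Out = {
      val h = newHandler()
      while (inputs.hasNext) {
        h.step(inputs.next()) match {
          case Some(out) => return out // abort early; the rest of `inputs` is left untouched
          case None      => ()
        }
      }
      h.finish()
    }
  }

  // A fold-like parser that needs every input plus end-of-input to produce its result.
  def sum: Parser[Int, Int] = new Parser[Int, Int] {
    def newHandler(): Handler[Int, Int] = new Handler[Int, Int] {
      private var total = 0
      def step(in: Int): Option[Int] = { total += in; None }
      def finish(): Int = total
    }
  }

  // A parser that aborts with a result as soon as it sees its first input.
  def firstOption[A]: Parser[A, Option[A]] = new Parser[A, Option[A]] {
    def newHandler(): Handler[A, Option[A]] = new Handler[A, Option[A]] {
      def step(in: A): Option[Option[A]] = Some(Some(in))
      def finish(): Option[A] = None
    }
  }
}
```

    Here firstOption stops after the first input and leaves the rest of the iterator unconsumed, while sum reads everything. The same sum value can be reused because each parse call creates its own handler.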

  10. class ParserApplyWithBoundInput[In] extends AnyRef

    Convenience version of the Parser companion object, which provides parser constructors with the In type already specified. Integrations for XML and JSON will generally create implicit classes to add methods to this class for In = XmlEvent and In = JsonEvent respectively.
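    The "bound input" pattern can be sketched in a few lines. Fixing In up front lets callers write Parser[Word].fold(...) and have only Out inferred. An event-specific integration then adds constructors with an implicit class, which is how the XML and JSON modules are described as working. Word, WordParser, fold, toList and WordParserApplyOps are hypothetical names, and the real ParserApplyWithBoundInput has its own set of constructors.

```scala
// Hypothetical sketch of the bound-input pattern; not spac's actual constructors.
object BoundInputSketch {
  final class Parser[-In, +Out](val run: List[In] => Out)

  // Constructors with the input type already fixed, so only Out needs to be inferred.
  class ParserApplyWithBoundInput[In] {
    def fold[Out](init: Out)(f: (Out, In) => Out): Parser[In, Out] =
      new Parser[In, Out](_.foldLeft(init)(f))
    def toList: Parser[In, List[In]] = new Parser[In, List[In]](identity)
  }
  object Parser {
    // Parser[Word] is shorthand for Parser.apply[Word].
    def apply[In]: ParserApplyWithBoundInput[In] = new ParserApplyWithBoundInput[In]
  }

  // A hypothetical event type standing in for XmlEvent / JsonEvent.
  final case class Word(text: String)
  type WordParser[A] = Parser[Word, A]                    // type alias, like JsonParser[A]
  val WordParser: ParserApplyWithBoundInput[Word] = Parser[Word] // value, like JsonParser

  // An integration adds event-specific constructors via an implicit class.
  implicit class WordParserApplyOps(bound: ParserApplyWithBoundInput[Word]) {
    def concatenated: WordParser[String] = bound.fold("")(_ + _.text)
  }
}
```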

  11. sealed trait Signal extends AnyRef

    Value used by Transformer.Handler to indicate to its upstream producer whether or not the handler wants to continue receiving values.

  12. trait SingleItemContextMatcher[Item, +A] extends ContextMatcher[Item, A]

    Specialization of ContextMatcher which only checks the first element in the stack for matching operations. Transformation operations on single-element matchers will yield other single-element matchers (rather than the base ContextMatcher type). Combination operations involving other single-element matchers will also yield single-element matchers. SingleItemContextMatchers form the building blocks of more complex matchers.

    A

    The type of the matched context.

  13. trait Source[+A] extends AnyRef

    A Source[A] is like an Iterable[A] but with a built-in assumption that the iterator may be closeable, intended for use as a convenient argument to a Parser's parse method.

    The spac core library avoids depending on Cats-Effect and FS2 (to avoid introducing "dependency hell" situations for projects that must depend on pre-3.0 versions of those projects), so this class acts as a stand-in for both cats.effect.Resource and fs2.Stream for non-async usage.

    A

    Type of item emitted by Iterators from this Source
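    Below is a hypothetical sketch of the Source idea that uses only the standard library. Here open() hands out a fresh iterator together with a close action, and use guarantees that the close action runs, much like cats.effect.Resource. The method names open, use and fromList are assumptions, not spac's Source API.

```scala
// Hypothetical sketch of a closeable-iterator source; not spac's Source API.
object SourceSketch {
  trait Source[+A] {
    // Returns a fresh iterator together with the action that releases its resources.
    def open(): (Iterator[A], () => Unit)

    // Bracket-style helper: always runs the close action, even if `f` throws.
    def use[B](f: Iterator[A] => B): B = {
      val (it, close) = open()
      try f(it)
      finally close()
    }
  }

  def fromList[A](items: List[A], onClose: () => Unit): Source[A] = new Source[A] {
    def open(): (Iterator[A], () => Unit) = (items.iterator, onClose)
  }
}
```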

  14. abstract class SpacException[Self <: SpacException[Self]] extends Exception with NoStackTrace

    Base class for all exceptions thrown by Spac parsers. A SpacException holds a spacTrace, which is similar to a *stack* trace, but uses a specialized element type to hold helpful debug information about the cause and context of the exception, and the input that caused it.

    SpacException uses NoStackTrace to suppress the usual stack trace, since exceptions thrown by a Parser will not have useful stack trace information for end users of the Spac framework.

    Self

    self-type used in the type signature of withSpacTrace
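    The sketch below shows a simplified version of this design. An F-bounded base class mixes in scala.util.control.NoStackTrace and carries its own trace, and withTrace returns the concrete subtype. TracedException, MissingField, InInput (loosely named after SpacTraceElement.InInput) and InParse are hypothetical. spac's actual SpacTraceElement cases and the ordering of trace elements may differ.

```scala
import scala.util.control.NoStackTrace

// Hypothetical sketch of a self-typed, trace-carrying exception; not spac's classes.
object TraceSketch {
  sealed trait TraceElem
  final case class InInput(input: String) extends TraceElem
  final case class InParse(file: String, line: Int) extends TraceElem

  // Self lets withTrace/addTrace return the concrete exception type.
  abstract class TracedException[Self <: TracedException[Self]](msg: String)
      extends Exception(msg) with NoStackTrace {
    def trace: List[TraceElem]
    def withTrace(newTrace: List[TraceElem]): Self
    def addTrace(elem: TraceElem): Self = withTrace(trace :+ elem)
  }

  final case class MissingField(field: String, trace: List[TraceElem])
      extends TracedException[MissingField](s"missing field '$field'") {
    def withTrace(newTrace: List[TraceElem]): MissingField = copy(trace = newTrace)
  }
}
```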

  15. trait SpacTraceElement extends AnyRef

    A play on words on StackTraceElement: a *Spac* trace element represents some contextual location inside the logic of a spac Parser, or the location of an input to that parser. SpacTraceElements are used by SpacException to provide useful debugging information when a Parser fails.

  16. trait Splitter[In, +C] extends AnyRef

    Primary "spac" abstraction that acts as a selector for sub-streams within a single input stream.

    A "sub-stream" is some series of consecutive values from the original stream, identified by a "context" value. Sub-streams do not overlap with each other.

    For example, when handling a stream of XML events, you might want to create a Splitter that identifies the events representing elements at a specific location within the XML; something like an XPath that operates on streams. When using xml-spac, you might construct a splitter like Splitter.xml("rootElem" \ "things" \ "thing"). This would identify a new sub-stream for each <thing> element that appears inside a <things> element, inside the <rootElem> element. An example sub-stream for a <thing> element might be ElemStart("thing"), Text("hello"), ElemEnd("thing").

    A Splitter's general goal is to attach a Parser or Transformer to each sub-stream, passing the contents of that sub-stream through the attached Parser or Transformer in order to get an interpretation of that sub-stream (i.e. the Parser's result, or some emitted outputs from a Transformer). With the <thing> example above, you might attach a parser that concatenates the contents of all Text events it sees, i.e. XmlParser.forText. Since a separate parser handler will run for each sub-stream, this becomes something like "a stream of Strings, each representing the concatenated text from an individual <thing> element".

    In

    Data event type for the input stream

    C

    Context type used to identify each sub-stream
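    The sub-stream idea from the <thing> example can be sketched with a hypothetical in-memory event type. The splitAndParse helper watches the element stack, collects the events of each matching element, and runs a per-sub-stream parser such as forText on them. For simplicity it buffers each sub-stream in a List, whereas spac feeds events to a fresh parser handler incrementally. All names are illustrative, not spac's API.

```scala
// Hypothetical toy model of splitting; not spac's Splitter API.
object ToySplitter {
  sealed trait XmlEv
  final case class ElemStart(name: String) extends XmlEv
  final case class ElemEnd(name: String) extends XmlEv
  final case class Text(text: String) extends XmlEv

  // A sub-stream starts at an ElemStart whose path (outermost first) equals `path`
  // and runs through the matching ElemEnd. Each one is handed to `parser`.
  def splitAndParse[A](events: Iterator[XmlEv], path: List[String])(parser: List[XmlEv] => A): List[A] = {
    val results = List.newBuilder[A]
    var stack = List.empty[String]  // open elements, innermost first
    var current: List[XmlEv] = Nil  // events of the open sub-stream, reversed
    var inside = false
    events.foreach { ev =>
      ev match {
        case ElemStart(name) =>
          stack = name :: stack
          if (!inside && stack.reverse == path) inside = true
        case _ => ()
      }
      if (inside) current = ev :: current
      ev match {
        case ElemEnd(_) =>
          if (inside && stack.reverse == path) { // the matched element itself is closing
            results += parser(current.reverse)
            current = Nil
            inside = false
          }
          stack = stack.drop(1)
        case _ => ()
      }
    }
    results.result()
  }

  // Analogue of XmlParser.forText: concatenates every Text event in a sub-stream.
  def forText(sub: List[XmlEv]): String = sub.collect { case Text(t) => t }.mkString
}
```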

  17. class SplitterApplyWithBoundInput[In] extends AnyRef

  18. sealed trait StackInterpretation[+In, +Elem] extends AnyRef

    Outcome of a StackLike[In, Elem], indicating whether a given input was a stack push/pop, and whether that push/pop should be treated as happening before or after the input that caused it.

  19. trait StackLike[In, +Elem] extends AnyRef

    Typeclass that perceives a subset of In values as either "stack push" or "stack pop" events. For example, with XML, an ElemStart event can be perceived as a "stack push", and a corresponding ElemEnd event can be perceived as a "stack pop".
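    Below is a hypothetical sketch of such a typeclass. An instance classifies each input as a push, a pop, or neither, and generic code (stacks) can replay any stream through it. The sketch omits the before/after distinction that StackInterpretation carries. The names StackOp, Push, Pop, interpret and stacks are assumptions.

```scala
// Hypothetical sketch of a StackLike-style typeclass; not spac's actual signature.
object StackLikeSketch {
  sealed trait StackOp[+Elem]
  final case class Push[+Elem](elem: Elem) extends StackOp[Elem]
  case object Pop extends StackOp[Nothing]

  trait StackLike[In, +Elem] { def interpret(in: In): Option[StackOp[Elem]] }

  sealed trait XmlEv
  final case class ElemStart(name: String) extends XmlEv
  final case class ElemEnd(name: String) extends XmlEv
  final case class Text(s: String) extends XmlEv

  // ElemStart pushes its name, ElemEnd pops, Text leaves the stack alone.
  implicit val xmlStackLike: StackLike[XmlEv, String] = new StackLike[XmlEv, String] {
    def interpret(in: XmlEv): Option[StackOp[String]] = in match {
      case ElemStart(n) => Some(Push(n))
      case ElemEnd(_)   => Some(Pop)
      case Text(_)      => None
    }
  }

  // Replays a stream through its StackLike instance; returns the stack (innermost first) after each input.
  def stacks[In, Elem](inputs: List[In])(implicit S: StackLike[In, Elem]): List[List[Elem]] =
    inputs.scanLeft(List.empty[Elem]) { (stack, in) =>
      S.interpret(in) match {
        case Some(Push(e)) => e :: stack
        case Some(Pop)     => stack.drop(1)
        case None          => stack
      }
    }.tail
}
```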

  20. trait Transformer[-In, +Out] extends AnyRef

    Primary "spac" abstraction which represents a transformation stage for a stream of data events.

    Transformers effectively transform a stream of In events into a stream of Out events. The actual stream handling logic is defined by a Transformer.Handler, which a Transformer is responsible for constructing. Handlers may be internally-mutable, and so they are generally only constructed by other handlers. Transformers themselves are immutable, acting as "handler factories", and so they may be freely reused.

    A transformer may choose to abort in response to any input event, as well as emit any number of outputs in response to an input event or the EOF signal.

    In

    The incoming event type

    Out

    The outgoing event type
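    The sketch below is a hypothetical model of a Transformer handler. Here push may emit any number of outputs and returns a signal that either asks for more input or stops. At end of input, finish may emit further outputs. The names Continue, Stop, push, finish and run are assumptions made for this sketch, not spac's actual Transformer.Handler or Signal API.

```scala
// Hypothetical model of transformers and their handlers; not spac's API.
object ToyTransformers {
  // Returned by a handler after each input: keep feeding it, or stop.
  sealed trait Signal
  case object Continue extends Signal
  case object Stop extends Signal

  trait Handler[-In, +Out] {
    def push(in: In, emit: Out => Unit): Signal // may emit zero or more outputs
    def finish(emit: Out => Unit): Unit         // end of input: may emit final outputs
  }

  trait Transformer[-In, +Out] {
    def newHandler(): Handler[In, Out]

    // Feeds inputs to a fresh handler until input runs out or it signals Stop.
    def run(inputs: Iterator[In]): List[Out] = {
      val buf = scala.collection.mutable.ListBuffer.empty[Out]
      val emit: Out => Unit = o => { buf += o; () }
      val h = newHandler()
      var stopped = false
      while (!stopped && inputs.hasNext) {
        if (h.push(inputs.next(), emit) == Stop) stopped = true
      }
      if (!stopped) h.finish(emit)
      buf.toList
    }
  }

  // Emits a chunk every n inputs, plus any leftover chunk at end of input.
  def chunks[A](n: Int): Transformer[A, List[A]] = new Transformer[A, List[A]] {
    def newHandler(): Handler[A, List[A]] = new Handler[A, List[A]] {
      private val pending = scala.collection.mutable.ListBuffer.empty[A]
      def push(in: A, emit: List[A] => Unit): Signal = {
        pending += in
        if (pending.size == n) { emit(pending.toList); pending.clear() }
        Continue
      }
      def finish(emit: List[A] => Unit): Unit =
        if (pending.nonEmpty) emit(pending.toList)
    }
  }

  // Passes inputs through until one fails the predicate, then aborts.
  def takeWhile[A](p: A => Boolean): Transformer[A, A] = new Transformer[A, A] {
    def newHandler(): Handler[A, A] = new Handler[A, A] {
      def push(in: A, emit: A => Unit): Signal =
        if (p(in)) { emit(in); Continue } else Stop
      def finish(emit: A => Unit): Unit = ()
    }
  }
}
```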

  21. class TransformerApplyWithBoundInput[In] extends AnyRef

    Convenience version of the Transformer companion object, which provides transformer constructors with the In type already specified.

  22. trait TypeReduce[-In1, -In2] extends AnyRef

    Type-level tuple reduction function that treats Unit as an identity. For example:

    TypeReduce[Unit, Unit]{ type Out = Unit }
    TypeReduce[T, Unit]{ type Out = T }
    TypeReduce[Unit, T]{ type Out = T }
    TypeReduce[L, R]{ type Out = (L, R) }

  23. trait Unconsable[C[_]] extends AnyRef

    Typeclass for collections that can be efficiently split into a head element and a tail collection as long as they are not empty.
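    Below is a hypothetical shape for such a typeclass, with instances for List and Vector and a generic function written only against it. The method name uncons and its Option-based result are assumptions, not necessarily spac's signature.

```scala
// Hypothetical sketch of an uncons typeclass; not necessarily spac's signature.
object UnconsableSketch {
  trait Unconsable[C[_]] {
    // Some((head, tail)) for a non-empty collection, None for an empty one.
    def uncons[A](coll: C[A]): Option[(A, C[A])]
  }

  implicit val listUnconsable: Unconsable[List] = new Unconsable[List] {
    def uncons[A](coll: List[A]): Option[(A, List[A])] = coll match {
      case head :: tail => Some((head, tail))
      case Nil          => None
    }
  }

  implicit val vectorUnconsable: Unconsable[Vector] = new Unconsable[Vector] {
    def uncons[A](coll: Vector[A]): Option[(A, Vector[A])] =
      if (coll.isEmpty) None else Some((coll.head, coll.tail))
  }

  // Generic code that depends only on the typeclass: collects elements front to back.
  def drain[C[_], A](coll: C[A])(implicit U: Unconsable[C]): List[A] = {
    @annotation.tailrec
    def loop(rest: C[A], acc: List[A]): List[A] = U.uncons(rest) match {
      case Some((a, tail)) => loop(tail, a :: acc)
      case None            => acc.reverse
    }
    loop(coll, Nil)
  }
}
```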

Value Members

  1. object CallerPos extends Serializable

  2. object ContextLocation

  3. object ContextMatcher

  4. object ContextPop extends ContextChange[Nothing, Nothing] with Product with Serializable

  5. object ContextTrace extends Serializable

  6. object Parser

  7. object Signal

  8. object SingleItemContextMatcher

  9. object Source

    Note: this companion object provides a few very basic Source-constructor helpers, but the real useful functionality is provided by the "parser backend" modules like xml-spac-javax and json-spac-jackson, via JavaxSource and JacksonSource.

  10. object SpacException extends Serializable

  11. object SpacTraceElement

  12. object Splitter

  13. object StackInterpretation

  14. object Transformer

  15. object TypeReduce extends LowPriorityTypeReduceImplicits

  16. object Unconsable

Main Concepts

All event consumers in SPaC are defined in terms of Parser, Splitter, and Transformer. These three classes are interrelated, and together they serve the eventual goal of producing one or more interpreted values from an incoming stream of event data.

Error Handling

SPaC parsers should only ever throw SpacException from their parse and parseF methods. SpacException is a specialized exception type which uses "Spac Trace" elements instead of the usual "Stack Trace"; these provide more useful information, such as which part of the parser failed and contextual information about the event that caused the failure.

Capturing Context Data

When dealing with tree-like documents, it is often important to be able to express a relative location in that data, or to produce some value based on the current location within the tree. SPaC refers to these locations as "context".

Utility and Supporting Classes

Most of these classes and traits are typeclasses that the primary types operate in terms of. Generally you don't directly interact with these.
