JSON-LD 1.0

A Context-based JSON Serialization for Linking Data

Unofficial Draft 12 September 2011

Editors:
Manu Sporny, Digital Bazaar
Gregg Kellogg, Kellogg Associates
Dave Longley, Digital Bazaar
Authors:
Manu Sporny, Digital Bazaar
Gregg Kellogg, Kellogg Associates
Dave Longley, Digital Bazaar
Mark Birbeck, Backplane Ltd.

This document is also available in this non-normative format: diff to previous version.


Abstract

JSON [RFC4627] has proven to be a highly useful object serialization and messaging format. In an attempt to harmonize the representation of Linked Data in JSON, this specification outlines a common JSON representation format for expressing directed graphs, mixing both Linked Data and non-Linked Data in a single document.

Status of This Document

This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

This document is an experimental work in progress.

1. Introduction

JSON, as specified in [RFC4627], is a simple language for representing data on the Web. Linked Data is a technique for creating a graph of interlinked data across different documents or Web sites. Data entities are described using IRIs, which are typically dereferencable and thus may be used to find more information about an entity, creating a "Web of Knowledge". JSON-LD is intended to be a simple publishing method for expressing not only Linked Data in JSON, but also for adding semantics to existing JSON.

JSON-LD is designed as a light-weight syntax that can be used to express Linked Data. It is primarily intended to be a way to use Linked Data in JavaScript and other Web-based programming environments. It is also useful when building interoperable Web services and when storing Linked Data in JSON-based document storage engines. It is practical and designed to be as simple as possible, utilizing the large number of JSON parsers and libraries available today. It is designed to be able to express key-value pairs, RDF data, RDFa [RDFA-CORE] data, Microformats [MICROFORMATS] data, and Microdata [MICRODATA]. That is, it supports every major Web-based structured data model in use today.

The syntax does not necessarily require applications to change their JSON, but allows developers to easily add meaning by adding context in a way that is either in-band or out-of-band. The syntax is designed to not disturb already deployed systems running on JSON, but to provide a smooth upgrade path from JSON to JSON with added semantics. Finally, the format is intended to be easy to parse, efficient to generate, compatible with both stream-based and document-based processing, and to require a very small memory footprint in order to operate.

1.1 How to Read this Document

This document is a detailed specification for a serialization of Linked Data in JSON. The document is primarily intended for the following audiences:

To understand the basics in this specification you must first be familiar with JSON, which is detailed in [RFC4627]. To understand the API and how it is intended to operate in a programming environment, it is useful to have working knowledge of the JavaScript programming language [ECMA-262] and WebIDL [WEBIDL]. To understand how JSON-LD maps to RDF, it is helpful to be familiar with the basic RDF concepts [RDF-CONCEPTS].

Examples may contain references to existing vocabularies and use prefixes to refer to Web Vocabularies. The following is a list of all vocabularies and their prefix abbreviations, as used in this document:

JSON [RFC4627] defines several terms which are used throughout this document:

JSON Object
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object should be unique.
array
An array is an ordered collection of values. An array structure is represented as square brackets surrounding zero or more values (or elements). Elements are separated by commas. Within JSON-LD, array order is not preserved by default, unless specific markup is provided (see Lists). This is because the basic data model of JSON-LD is a directed graph, which is inherently unordered.
string
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string.
number
A number is similar to that used in most programming languages, except that the octal and hexadecimal formats are not used and that leading zeros are not allowed.
true and false
Boolean values.
null
The use of the null value is undefined within JSON-LD.
Supporting null in JSON-LD might have a number of advantages and should be evaluated. This is currently an open issue.

1.2 Contributing

There are a number of ways that one may participate in the development of this specification:

2. Design

The following section outlines the design goals and rationale behind the JSON-LD markup language.

2.1 Goals and Rationale

A number of design considerations were explored during the creation of this markup language:

Simplicity
Developers need only know JSON and three keywords to use the basic functionality in JSON-LD. No extra processors or software libraries are necessary to use JSON-LD in its most basic form. The language attempts to ensure that developers have an easy learning curve.
Compatibility
The JSON-LD markup must be 100% compatible with JSON. This ensures that all of the standard JSON libraries work seamlessly with JSON-LD documents.
Expressiveness
The syntax must be able to express directed graphs, which have been proven to be able to simply express almost every real world data model.
Terseness
The JSON-LD syntax must be very terse and human readable, requiring as little effort as possible from the developer.
Zero Edits, most of the time
JSON-LD provides a mechanism that allows developers to specify context in a way that is out-of-band. This allows organizations that have already deployed large JSON-based infrastructure to add meaning to their JSON documents in a way that is not disruptive to their day-to-day operations and is transparent to their current customers. At times, mapping JSON to a graph representation can become difficult. In these instances, rather than having JSON-LD support esoteric markup, we chose not to support the use case and support a simplified syntax instead. So, while Zero Edits is a goal, it is not always possible without adding great complexity to the language.
Streaming
The format supports both document-based and stream-based processing.

2.2 Linked Data

The following definition for Linked Data is the one that will be used for this specification.

  1. Linked Data is a set of documents, each containing a representation of a linked data graph.
  2. A linked data graph is an unordered labeled directed graph, where nodes are subjects or objects, and edges are properties.
  3. A subject is any node in a linked data graph with at least one outgoing edge.
  4. A subject should be labeled with an IRI (an Internationalized Resource Identifier as described in [RFC3987]).
  5. An object is a node in a linked data graph with at least one incoming edge.
  6. An object may be labeled with an IRI.
  7. An object may be a subject and object at the same time.
  8. A property is an edge of the linked data graph.
  9. A property should be labeled with an IRI.
  10. An IRI that is a label in a linked data graph should be dereferencable to a Linked Data document describing the labeled subject, object or property.
  11. A literal is an object with a label that is not an IRI.

Note that the definition for Linked Data above is silent on the topic of unlabeled nodes. Unlabeled nodes are not considered Linked Data. However, this specification allows for the expression of unlabeled nodes, as most graph-based data sets on the Web contain a number of associated nodes that are not named and thus are not directly de-referenceable.

2.3 Linking Data

An Internationalized Resource Identifier (IRI), as described in [RFC3987], is a mechanism for representing unique identifiers on the web. In Linked Data, an IRI is commonly used for expressing a subject, a property or an object.

JSON-LD defines a mechanism to map JSON terms, i.e., keys and values, to IRIs. This does not mean that JSON-LD requires every key or value to be an IRI, but rather ensures that keys and values can be mapped to IRIs if the developer desires to transform their data into Linked Data. There are a few techniques that can ensure that developers will generate good Linked Data for the Web. JSON-LD formalizes those techniques.

We will be using the following JSON markup as the example for the rest of this section:

{
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "avatar": "http://twitter.com/account/profile_image/manusporny"
}

2.4 The Context

In JSON-LD, a context is used to map terms, i.e., keys and values in a JSON document, to IRIs. A term is a short word that may be expanded to an IRI. The Web uses IRIs for unambiguous identification. The idea is that these terms mean something that may be of use to other developers and that it is useful to give them an unambiguous identifier. That is, it is useful for terms to expand to IRIs so that developers don't accidentally step on each other's Web Vocabulary terms. For example, the term name may map directly to the IRI http://xmlns.com/foaf/0.1/name. This allows JSON-LD documents to be constructed using the common JSON practice of simple name/value pairs while ensuring that the data is useful outside of the page, API or database in which it resides.

These Linked Data terms are typically collected in a context document that would look something like this:

{
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "avatar": "http://xmlns.com/foaf/0.1/avatar"
}

This context document can then be used in a JSON-LD document by adding a single line. The JSON markup as shown in the previous section could be changed as follows to link to the context document:

{
  "@context": "http://example.org/json-ld-contexts/person",
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "avatar": "http://twitter.com/account/profile_image/manusporny"
}

The addition above transforms the previous JSON document into a JSON document with added semantics because the @context specifies how the name, homepage, and avatar terms map to IRIs. Mapping those keys to IRIs gives the data global context. If two developers use the same IRI to describe a property, they are more than likely expressing the same concept. This allows both developers to re-use each other's data without having to agree on how their data will inter-operate on a site-by-site basis. Contexts may also contain datatype information for certain terms as well as other processing instructions for the JSON-LD processor.

Contexts may be specified in-line. This ensures that JSON-LD documents can be processed when a JSON-LD processor does not have access to the Web.

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "avatar": "http://xmlns.com/foaf/0.1/avatar"
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "avatar": "http://twitter.com/account/profile_image/manusporny"
}

JSON-LD strives to ensure that developers don't have to change the JSON that is going into and being returned from their Web APIs. This means that developers can also specify a context for JSON data in an out-of-band fashion. This is described later in this document.

JSON-LD uses a special type of machine-readable document called a Web Vocabulary to define terms that are then used to describe concepts and "things" in the world. Typically, these Web Vocabulary documents have prefixes associated with them and contain a number of term declarations. A prefix, like a term, is a short word that expands to a Web Vocabulary base IRI. Prefixes are helpful when a developer wants to mix multiple vocabularies together in a context, but does not want to go to the trouble of defining every single term in every single vocabulary. Some Web Vocabularies may have dozens of terms defined. If a developer wants to use 3-4 different vocabularies, the number of terms that would have to be declared in a single context could become quite large. To reduce the number of different terms that must be defined, JSON-LD also allows prefixes to be used to compact IRIs.

For example, the IRI http://xmlns.com/foaf/0.1/ specifies a Web Vocabulary which may be represented using the foaf prefix. The foaf Web Vocabulary contains a term called name. If you join the foaf prefix with the name suffix, you can build a compact IRI that will expand out into an absolute IRI for the http://xmlns.com/foaf/0.1/name vocabulary term. That is, the compact IRI, or short-form, is foaf:name and the expanded-form is http://xmlns.com/foaf/0.1/name. This vocabulary term is used to specify a person's name.

Developers, and machines, are able to use this IRI (plugging it directly into a web browser, for instance) to go to the term and get a definition of what the term means. Much as we can use WordNet today to look up the definitions of words in the English language, developers and machines need the same sort of definition for terms. IRIs provide a way to ensure that these terms are unambiguous.

The context provides a collection of vocabulary terms and prefixes that can be used to expand JSON keys and values into IRIs.

2.5 From JSON to JSON-LD

If a set of terms, such as name, homepage, and avatar, is defined in a context, and that context is used to resolve the names in JSON objects, machines are able to automatically expand the terms to something meaningful and unambiguous, like this:

{
  "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
  "http://xmlns.com/foaf/0.1/homepage": "http://manu.sporny.org/",
  "http://xmlns.com/foaf/0.1/avatar": "http://twitter.com/account/profile_image/manusporny"
}

Doing this allows JSON to be unambiguously machine-readable without requiring developers to drastically change their workflow.

Please note that this JSON-LD document doesn't define the subject and will thus result in an unlabeled or blank node.
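
The key expansion described in this section can be sketched in a few lines of Python. This is only an illustration, not the normative algorithm: the helper name expand_keys is hypothetical, and the context from section 2.4 is assumed to have been retrieved already and to be a flat mapping of terms to IRIs.

```python
import json

# Context from section 2.4, assumed to be already loaded as a dictionary.
context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "avatar": "http://xmlns.com/foaf/0.1/avatar",
}

document = {
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "avatar": "http://twitter.com/account/profile_image/manusporny",
}


def expand_keys(doc, ctx):
    """Replace every key that is a term in ctx with the IRI it maps to.

    Keys without a mapping are kept unchanged (hypothetical helper).
    """
    return {ctx.get(key, key): value for key, value in doc.items()}


expanded = expand_keys(document, context)
print(json.dumps(expanded, indent=2))
```

Because the sketch only rewrites keys, the values are left untouched; turning a value such as the homepage into an IRI requires the coercion rules introduced later.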

3. Basic Concepts

JSON-LD is designed to ensure that Linked Data concepts can be marked up in a way that is simple to understand and author by Web developers. In many cases, regular JSON markup can become Linked Data with the simple addition of a context. As more JSON-LD features are used, more semantics are added to the JSON markup.

3.1 IRIs

Expressing IRIs is fundamental to Linked Data, as that is how most subjects and many objects are named. IRIs can be expressed in a variety of different ways in JSON-LD.

  1. In general, terms in the key position in a JSON object that have a mapping to an IRI or another key in the context are expanded to an IRI by JSON-LD processors. There are special rules for processing keys in @context and when dealing with keys that start with the @ character.
  2. An IRI is generated for the value specified using @subject, if it is a string.
  3. An IRI is generated for the value specified using @type.
  4. An IRI is generated for the value specified using the @iri keyword.
  5. An IRI is generated when there are @coerce rules in effect for a key named @iri.

IRIs can be expressed directly in the key position like so:

{
...
  "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
...
}

In the example above, the key http://xmlns.com/foaf/0.1/name is interpreted as an IRI, as opposed to being interpreted as a string.

Term expansion occurs for IRIs if a term is defined within the active context:

{
  "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
...
  "name": "Manu Sporny",
...
}

Prefixes are expanded when used in keys:

{
  "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
...
  "foaf:name": "Manu Sporny",
...
}

foaf:name above will automatically expand out to the IRI http://xmlns.com/foaf/0.1/name.
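
A minimal sketch of prefix expansion, assuming the context is a plain mapping from prefixes to base IRIs. The function name expand_compact_iri is hypothetical, and the rule of leaving undeclared prefixes untouched is a simplification chosen for this example.

```python
def expand_compact_iri(value, context):
    """Expand 'prefix:suffix' to an absolute IRI when the prefix is declared.

    Values whose prefix is not in the context, including absolute IRIs such
    as 'http://...', are returned unchanged (hypothetical helper).
    """
    prefix, separator, suffix = value.partition(":")
    if separator and prefix in context:
        return context[prefix] + suffix
    return value


context = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_compact_iri("foaf:name", context))
print(expand_compact_iri("http://xmlns.com/foaf/0.1/name", context))
```

Both calls print the same absolute IRI: the first because the foaf prefix is declared, the second because http is not a declared prefix and the value is passed through.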

An IRI is generated when a value is associated with a key using the @iri keyword:

{
...
  "homepage": { "@iri": "http://manu.sporny.org" }
...
}

If type coercion rules are specified in the @context for a particular vocabulary term, an IRI is generated:

{
  "@context":
  {
    ...
    "@coerce":
    {
      "@iri": "homepage"
    }
  },
...
  "homepage": "http://manu.sporny.org/",
...
}

Even though the value http://manu.sporny.org/ is a string, the type coercion rules will transform the value into an IRI when processed by a JSON-LD Processor.
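
The effect of an @iri coercion rule can be sketched as follows. The helper name apply_iri_coercion is hypothetical, the homepage term mapping stands in for the elided part of the context, and accepting either a single key or a list of keys under @iri is an assumption made for this example.

```python
def apply_iri_coercion(doc):
    """Wrap string values of keys listed under @coerce/@iri in {"@iri": ...}."""
    coerce = doc.get("@context", {}).get("@coerce", {})
    iri_keys = coerce.get("@iri", [])
    # Assumption: one key is given as a string, several keys as a list.
    if isinstance(iri_keys, str):
        iri_keys = [iri_keys]
    result = {}
    for key, value in doc.items():
        if key in iri_keys and isinstance(value, str):
            result[key] = {"@iri": value}
        else:
            result[key] = value
    return result


doc = {
    "@context": {
        # Stand-in for the elided term definitions in the example above.
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
        "@coerce": {"@iri": "homepage"},
    },
    "homepage": "http://manu.sporny.org/",
}
coerced = apply_iri_coercion(doc)
print(coerced["homepage"])
```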

3.2 Identifying the Subject

To be able to externally reference nodes, it is important that each node has an unambiguous identifier. IRIs are a fundamental concept of Linked Data, and nodes should have a de-referencable identifier used to name and locate them. For nodes to be truly linked, de-referencing the identifier should result in a representation of that node. Associating an IRI with a node tells an application that the returned document contains a description of the node requested.

JSON-LD documents may also contain descriptions of other nodes, so it is necessary to be able to uniquely identify each node which may be externally referenced.

A subject of an object in JSON is declared using the @subject key. The subject is the first piece of information needed by the JSON-LD processor in order to create the (subject, property, object) tuple, also known as a triple.

{
...
  "@subject": "http://example.org/people#joebob",
...
}

The example above would set the subject to the IRI http://example.org/people#joebob.

3.3 Specifying the Type

The type of a particular subject can be specified using the @type key. Specifying the type in this way will generate a triple of the form (subject, type, type-iri).

To be Linked Data, types must be uniquely identified by an IRI.

{
...
  "@subject": "http://example.org/people#joebob",
  "@type": "http://xmlns.com/foaf/0.1/Person",
...
}

The example above would generate the following triple if the JSON-LD document is mapped to RDF (in N-Triples notation):

<http://example.org/people#joebob>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
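
As an illustration, the type triple above could be produced by a small Python helper. The name type_triple is hypothetical, and the sketch assumes that @subject and @type already hold absolute IRIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_triple(node):
    """Return the N-Triples line stating the type of a node.

    Assumes @subject and @type are absolute IRIs (hypothetical helper).
    """
    return "<%s> <%s> <%s> ." % (node["@subject"], RDF_TYPE, node["@type"])


node = {
    "@subject": "http://example.org/people#joebob",
    "@type": "http://xmlns.com/foaf/0.1/Person",
}
triple = type_triple(node)
print(triple)
```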

3.4 Strings

Regular text strings, also referred to as plain literals, are easily expressed using regular JSON strings.

{
...
  "name": "Mark Birbeck",
...
}

3.5 String Internationalization

JSON-LD makes an assumption that strings with associated language encoding information are not very common when used in JavaScript and Web Services. Thus, it takes a little more effort to express strings with associated language information.

{
...
  "name": 
  {
    "@literal": "花澄",
    "@language": "ja"
  }
...
}

The example above would generate a plain literal for 花澄 and associate the ja language code with the triple that is generated. Languages must be expressed in [BCP47] format.

3.6 Datatypes

A value with an associated datatype, also known as a typed literal, is indicated by associating a literal with an IRI which indicates the literal's datatype. Typed literals may be expressed in JSON-LD in three ways:

  1. By utilizing the @coerce keyword.
  2. By utilizing the expanded form for specifying objects.
  3. By using a native JSON datatype.

The first example uses the @coerce keyword to express a typed literal:

{
  "@context":
  {
    "modified":  "http://purl.org/dc/terms/modified",
    "dateTime": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@coerce":
    {
      "dateTime": "modified"
    }
  },
...
  "modified": "2010-05-29T14:17:39+02:00",
...
}

The second example uses the expanded form for specifying objects:

{
...
  "modified": 
  {
    "@literal": "2010-05-29T14:17:39+02:00",
    "@datatype": "dateTime"
  }
...
}

Both examples above would generate an object with the literal value of 2010-05-29T14:17:39+02:00 and the datatype of http://www.w3.org/2001/XMLSchema#dateTime.

The third example uses a built-in native JSON type, a number, to express a datatype:

{
...
  "@subject": "http://example.org/people#joebob",
  "age": 31
...
}

The example above would generate the following triple:

<http://example.org/people#joebob>
   <http://xmlns.com/foaf/0.1/age>
      "31"^^<http://www.w3.org/2001/XMLSchema#integer> .

3.7 Multiple Objects for a Single Property

A JSON-LD author can express multiple triples in a compact way by using arrays. If a subject has multiple values for the same property, the author may express the values of that property as an array.

In JSON-LD, multiple objects on a property are not ordered. This is because typically graphs are not inherently ordered data structures. To see more on creating ordered collections in JSON-LD, see Lists.

{
...
  "@subject": "http://example.org/people#joebob",
  "nick": ["joe", "bob", "jaybee"],
...
}

The markup shown above would generate the following triples:

<http://example.org/people#joebob>
   <http://xmlns.com/foaf/0.1/nick>
      "joe" .
<http://example.org/people#joebob>
   <http://xmlns.com/foaf/0.1/nick>
      "bob" .
<http://example.org/people#joebob>
   <http://xmlns.com/foaf/0.1/nick>
      "jaybee" .

3.8 Multiple Typed Literals for a Single Property

Multiple typed literals may also be expressed using the expanded form for objects:

{
...
  "@subject": "http://example.org/articles/8",
  "modified": 
  [
    {
      "@literal": "2010-05-29T14:17:39+02:00",
      "@datatype": "dateTime"
    },
    {
      "@literal": "2010-05-30T09:21:28-04:00",
      "@datatype": "dateTime"
    }
  ]
...
}

The markup shown above would generate the following triples:

<http://example.org/articles/8>
   <http://purl.org/dc/terms/modified>
      "2010-05-29T14:17:39+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://example.org/articles/8>
   <http://purl.org/dc/terms/modified>
      "2010-05-30T09:21:28-04:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

3.9 Expansion

Expansion is the process of taking a JSON-LD document and applying a context such that all IRIs, datatypes, and literal values are expanded so that the context is no longer necessary. JSON-LD document expansion is typically used as a part of Framing or Normalization.

For example, assume the following JSON-LD input document:

{
   "@context":
   {
      "name": "http://xmlns.com/foaf/0.1/name",
      "homepage": "http://xmlns.com/foaf/0.1/homepage",
      "@coerce":
      {
         "@iri": "homepage"
      }
   },
   "name": "Manu Sporny",
   "homepage": "http://manu.sporny.org/"
}

Running the JSON-LD Expansion algorithm against the JSON-LD input document provided above would result in the following output:

{
   "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
   "http://xmlns.com/foaf/0.1/homepage":
   {
      "@iri": "http://manu.sporny.org/"
   }
}
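
The transformation shown above can be approximated in Python. This is a sketch under simplifying assumptions, not the JSON-LD Expansion algorithm itself: the function name expand is hypothetical, only an inline context with flat term mappings and an @iri coercion rule is handled, and nested objects, prefixes and datatypes are ignored.

```python
def expand(doc):
    """Expand keys and coerced values using the document's inline @context."""
    ctx = dict(doc.get("@context", {}))
    coerce = ctx.pop("@coerce", {})
    iri_keys = coerce.get("@iri", [])
    # Assumption: one key is given as a string, several keys as a list.
    if isinstance(iri_keys, str):
        iri_keys = [iri_keys]
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is no longer needed after expansion
        if key in iri_keys and isinstance(value, str):
            value = {"@iri": value}
        out[ctx.get(key, key)] = value
    return out


doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
        "@coerce": {"@iri": "homepage"},
    },
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}
expanded = expand(doc)
print(expanded)
```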

3.10 Compaction

Compaction is the process of taking a JSON-LD document and applying a context such that the most compact form of the document is generated. JSON is typically expressed in a very compact, key-value format. That is, full IRIs are rarely used as keys. At times, a JSON-LD document may be received that is not in its most compact form. JSON-LD, via the API, provides a way to compact a JSON-LD document.

For example, assume the following JSON-LD input document:

{
   "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
   "http://xmlns.com/foaf/0.1/homepage":
   {
      "@iri": "http://manu.sporny.org/"
   }
}

Additionally, assume the following developer-supplied JSON-LD context:

{
   "name": "http://xmlns.com/foaf/0.1/name",
   "homepage": "http://xmlns.com/foaf/0.1/homepage",
   "@coerce":
   {
      "@iri": "homepage"
   }
}

Running the JSON-LD Compaction algorithm given the context supplied above against the JSON-LD input document provided above would result in the following output:

{
   "@context":
   {
      "name": "http://xmlns.com/foaf/0.1/name",
      "homepage": "http://xmlns.com/foaf/0.1/homepage",
      "@coerce":
      {
         "@iri": "homepage"
      }
   },
   "name": "Manu Sporny",
   "homepage": "http://manu.sporny.org/"
}

The compaction algorithm also enables the developer to map any expanded format into an application-specific compacted format. While the context provided above mapped http://xmlns.com/foaf/0.1/name to name, it could have also mapped it to any arbitrary string provided by the developer.
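
A rough Python sketch of compaction, mirroring the example in this section. The function name compact is hypothetical and the sketch makes simplifying assumptions: the context is a flat term mapping with an optional @iri coercion rule, and nested objects and prefixes are not handled.

```python
def compact(doc, ctx):
    """Shorten IRI keys to terms and unwrap coerced @iri values (sketch)."""
    # Reverse the term -> IRI mapping, skipping the @coerce entry.
    terms = {iri: term for term, iri in ctx.items() if term != "@coerce"}
    iri_keys = ctx.get("@coerce", {}).get("@iri", [])
    if isinstance(iri_keys, str):
        iri_keys = [iri_keys]
    out = {"@context": ctx}
    for key, value in doc.items():
        term = terms.get(key, key)
        # A coerced key no longer needs the explicit {"@iri": ...} wrapper.
        if term in iri_keys and isinstance(value, dict) and set(value) == {"@iri"}:
            value = value["@iri"]
        out[term] = value
    return out


context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "@coerce": {"@iri": "homepage"},
}
expanded = {
    "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
    "http://xmlns.com/foaf/0.1/homepage": {"@iri": "http://manu.sporny.org/"},
}
compacted = compact(expanded, context)
print(compacted["name"], compacted["homepage"])
```

Supplying a context that maps the same IRI to a different term, for example fullName instead of name, would yield that application-specific key in the output.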

3.11 Framing

A JSON-LD document is a representation of a directed graph. A single directed graph can have many different serializations, each expressing exactly the same information. Developers typically work with trees, represented as JSON objects. While mapping a graph to a tree can be done, the layout of the end result must be specified in advance. A Frame can be used by a developer on a JSON-LD document to specify a deterministic layout for a graph.

Framing is the process of taking a JSON-LD document, which expresses a graph of information, and applying a specific graph layout (called a Frame).

The JSON-LD document below expresses a library, a book and a chapter:

{
  "@context": {
    "Book":         "http://example.org/vocab#Book",
    "Chapter":      "http://example.org/vocab#Chapter",
    "contains":     "http://example.org/vocab#contains",
    "creator":      "http://purl.org/dc/terms/creator",
    "description":  "http://purl.org/dc/terms/description",
    "Library":      "http://example.org/vocab#Library",
    "title":        "http://purl.org/dc/terms/title",
    "@coerce":
    {
      "@iri": "contains"
    }
  },
  "@subject":
  [{
    "@subject": "http://example.org/library",
    "@type": "Library",
    "contains": "http://example.org/library/the-republic"
  },
  {
    "@subject": "http://example.org/library/the-republic",
    "@type": "Book",
    "creator": "Plato",
    "title": "The Republic",
    "contains": "http://example.org/library/the-republic#introduction"
  },
  {
    "@subject": "http://example.org/library/the-republic#introduction",
    "@type": "Chapter",
    "description": "An introductory chapter on The Republic.",
    "title": "The Introduction"
  }]
}

Developers typically like to operate on items in a hierarchical, tree-based fashion. Ideally, a developer would want the data above sorted into top-level libraries, then the books that are contained in each library, and then the chapters contained in each book. To achieve that layout, the developer can define the following frame:

{
  "@context": {
    "Book":         "http://example.org/vocab#Book",
    "Chapter":      "http://example.org/vocab#Chapter",
    "contains":     "http://example.org/vocab#contains",
    "creator":      "http://purl.org/dc/terms/creator",
    "description":  "http://purl.org/dc/terms/description",
    "Library":      "http://example.org/vocab#Library",
    "title":        "http://purl.org/dc/terms/title"
  },
  "@type": "Library",
  "contains": {
    "@type": "Book",
    "contains": {
      "@type": "Chapter"
    }
  }
}

When the framing algorithm is run against the previously defined JSON-LD document, paired with the frame above, the following JSON-LD document is the end result:

{
  "@context": {
    "Book":         "http://example.org/vocab#Book",
    "Chapter":      "http://example.org/vocab#Chapter",
    "contains":     "http://example.org/vocab#contains",
    "creator":      "http://purl.org/dc/terms/creator",
    "description":  "http://purl.org/dc/terms/description",
    "Library":      "http://example.org/vocab#Library",
    "title":        "http://purl.org/dc/terms/title"
  },
  "@subject": "http://example.org/library",
  "@type": "Library",
  "contains": {
    "@subject": "http://example.org/library/the-republic",
    "@type": "Book",
    "creator": "Plato",
    "title": "The Republic",
    "contains": {
      "@subject": "http://example.org/library/the-republic#introduction",
      "@type": "Chapter",
      "description": "An introductory chapter on The Republic.",
      "title": "The Introduction"
    }
  }
}

The JSON-LD framing algorithm allows developers to query by example and force a specific tree layout onto a JSON-LD document.
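
The idea of framing can be illustrated with a much simplified Python sketch. It is not the JSON-LD framing algorithm: the helper names frame_node and apply_frame are hypothetical, contexts are ignored (types are compared as plain terms), each property is assumed to hold a single IRI reference, and the graph is assumed to be free of cycles.

```python
def frame_node(node, frame, index):
    """Copy node and embed referenced nodes wherever the frame nests a sub-frame."""
    result = dict(node)
    for key, subframe in frame.items():
        # Keywords such as @type are used for matching, not for embedding.
        if key.startswith("@") or not isinstance(subframe, dict):
            continue
        ref = node.get(key)
        target = index.get(ref) if isinstance(ref, str) else None
        if target is not None and target.get("@type") == subframe.get("@type"):
            result[key] = frame_node(target, subframe, index)
    return result


def apply_frame(nodes, frame):
    """Return every node whose @type matches the frame, with embeds applied."""
    index = {node["@subject"]: node for node in nodes}
    return [
        frame_node(node, frame, index)
        for node in nodes
        if node.get("@type") == frame.get("@type")
    ]


nodes = [
    {"@subject": "http://example.org/library", "@type": "Library",
     "contains": "http://example.org/library/the-republic"},
    {"@subject": "http://example.org/library/the-republic", "@type": "Book",
     "creator": "Plato", "title": "The Republic",
     "contains": "http://example.org/library/the-republic#introduction"},
    {"@subject": "http://example.org/library/the-republic#introduction",
     "@type": "Chapter",
     "description": "An introductory chapter on The Republic.",
     "title": "The Introduction"},
]
frame = {
    "@type": "Library",
    "contains": {"@type": "Book", "contains": {"@type": "Chapter"}},
}
framed = apply_frame(nodes, frame)
print(framed[0]["contains"]["contains"]["title"])
```

The flat list of three nodes becomes a single Library object with the Book embedded under contains and the Chapter embedded under the Book, which matches the layout of the framed document above.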

4. Advanced Concepts

JSON-LD has a number of features that provide functionality above and beyond the core functionality described above. The following sections outline the features that are specific to JSON-LD.

4.1 Vocabulary Prefixes

Vocabulary terms in Linked Data documents may draw from a number of different Web vocabularies. At times, declaring every single term that a document uses can require the developer to declare tens, if not hundreds of potential vocabulary terms that may be used across an application. This is a concern for at least three reasons: the first is the cognitive load on the developer, the second is the serialized size of the context, and the third is future-proofing application contexts. In order to address these issues, the concept of a prefix mechanism is introduced.

A prefix is a compact way of expressing a base IRI to a Web Vocabulary. Generally, these prefixes are used by concatenating the prefix and a term separated by a colon (:). The prefix is a short string that identifies a particular Web vocabulary. For example, the prefix foaf may be used as a short hand for the Friend-of-a-Friend Web Vocabulary, which is identified using the IRI http://xmlns.com/foaf/0.1/. A developer may append any of the FOAF Vocabulary terms to the end of the prefix to specify a short-hand version of the full IRI for the vocabulary term. For example, foaf:name would be expanded out to the IRI http://xmlns.com/foaf/0.1/name. Instead of having to remember and type out the entire IRI, the developer can instead use the prefix in their JSON-LD markup.

The ability to use prefixes reduces the need for developers to declare every vocabulary term that they intend to use in the JSON-LD context. This reduces document serialization size because every vocabulary term need not be declared in the context. Prefixes also reduce the cognitive load on the developer. It is far easier to remember foaf:name than it is to remember http://xmlns.com/foaf/0.1/name. The use of prefixes also ensures that a context document does not have to be updated in lock-step with an externally defined Web Vocabulary. Without prefixes, a developer would need to keep their application context terms in lock-step with an externally defined Web Vocabulary. Rather, by just declaring the Web Vocabulary prefix, one can use new terms as they're declared without having to update the application's JSON-LD context.

Consider the following example:

{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/vocab#"
  },
  "@subject": "http://example.org/library",
  "@type": "ex:Library",
  "ex:contains": {
    "@subject": "http://example.org/library/the-republic",
    "@type": "ex:Book",
    "dc:creator": "Plato",
    "dc:title": "The Republic",
    "ex:contains": {
      "@subject": "http://example.org/library/the-republic#introduction",
      "@type": "ex:Chapter",
      "dc:description": "An introductory chapter on The Republic.",
      "dc:title": "The Introduction"
    }
  }
}

In this example, two different vocabularies are referred to using prefixes. Those prefixes are then used as type and property values using the prefix:term notation.

Prefixes, also known as CURIEs, are defined more formally in RDFa Core 1.1, Section 6 "CURIE Syntax Definition" [RDFA-CORE]. JSON-LD does not support the square-bracketed CURIE syntax as the mechanism is not required to disambiguate IRIs in a JSON-LD document like it is in HTML documents.

4.2 Automatic Typing

JSON is capable of expressing typed information such as doubles, integers, and boolean values. As demonstrated below, JSON-LD utilizes that information to create typed literals:

{
...
  // The following two values are automatically converted to a type of xsd:double
  // and both values are equivalent to each other.
  "measure:cups": 5.3,
  "measure:cups": 5.3e0,
  // The following value is automatically converted to a type of xsd:double as well
  "space:astronomicUnits": 6.5e73,
  // The following value should never be converted to a language-native type
  "measure:stones": { "@literal": "4.8", "@datatype": "xsd:decimal" },
  // This value is automatically converted to having a type of xsd:integer
  "chem:protons": 12,
  // This value is automatically converted to having a type of xsd:boolean
  "sensor:active": true,
...
}

When dealing with a number of modern programming languages, including JavaScript ECMA-262, there is no distinction between xsd:decimal and xsd:double values. That is, the number 5.3 and the number 5.3e0 are treated as if they were the same. When converting from JSON-LD to a language-native format and back, datatype information is lost in a number of these languages. Thus, one could say that 5.3 is an xsd:decimal and 5.3e0 is an xsd:double in JSON-LD, but when both values are converted to a language-native format the datatype difference between the two is lost because the machine-level representation will almost always be a double. Implementers should be aware of this potential round-tripping issue between xsd:decimal and xsd:double. Specifically, objects with a datatype of xsd:decimal must not be converted to a language-native type.
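The round-tripping issue can be observed with any parser that maps JSON numbers to native doubles. The sketch below uses Python's standard json and decimal modules, which are an assumption of this example and not part of the specification. It shows that the two lexical forms become indistinguishable after parsing, while a value kept in its expanded @literal form can still be handed to an exact decimal type.

```python
import json
from decimal import Decimal

# Both lexical forms parse to the same native double, so the
# xsd:decimal / xsd:double distinction cannot be recovered afterwards.
doc = json.loads('{"a": 5.3, "b": 5.3e0}')
same_native_value = doc["a"] == doc["b"]
same_native_type = type(doc["a"]) is type(doc["b"])

# An xsd:decimal kept in its expanded form stays a string in JSON, so a
# processor can pass it to an exact decimal type instead of a double.
stones = json.loads('{"@literal": "4.8", "@datatype": "xsd:decimal"}')
exact = Decimal(stones["@literal"])
```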

4.3 Type Coercion

JSON-LD supports the coercion of values to particular data types. Type coercion allows someone deploying JSON-LD to coerce the incoming or outgoing types to the proper data type based on a mapping of data type IRIs to property types. Using type coercion, one may convert simple JSON data to properly typed RDF data.

The example below demonstrates how a JSON-LD author can coerce values to plain literals, typed literals and IRIs.

{
  "@context":
  {
     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
     "xsd": "http://www.w3.org/2001/XMLSchema#",
     "name": "http://xmlns.com/foaf/0.1/name",
     "age": "http://xmlns.com/foaf/0.1/age",
     "homepage": "http://xmlns.com/foaf/0.1/homepage",
     "@coerce":
     {
        "xsd:integer": "age",
        "@iri": "homepage"
     }
  },
  "name": "John Smith",
  "age": "41",
  "homepage": "http://example.org/home/"
}

The example above would generate the following triples:

_:bnode1
   <http://xmlns.com/foaf/0.1/name>
      "John Smith" .
_:bnode1
   <http://xmlns.com/foaf/0.1/age>
      "41"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:bnode1
   <http://xmlns.com/foaf/0.1/homepage>
      <http://example.org/home/> .
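The mapping from the coerced document to the triples above can be sketched as follows. This is a minimal illustration, assuming a flat object with an embedded context; the function name coerce_to_triples and the fixed blank node label are hypothetical, and string escaping is ignored.

```python
def coerce_to_triples(doc):
    """Return N-Triples-like strings for a flat JSON-LD object (illustrative only)."""
    context = doc["@context"]

    # Invert the @coerce mapping: property term -> coercion target.
    targets = {}
    for target, terms in context.get("@coerce", {}).items():
        for term in ([terms] if isinstance(terms, str) else terms):
            targets[term] = target

    def expand(value):
        # Expand "prefix:suffix" using the context; plain terms map directly.
        prefix, sep, suffix = value.partition(":")
        if sep and prefix in context:
            return context[prefix] + suffix
        return context.get(value, value)

    triples = []
    for key, value in doc.items():
        if key == "@context":
            continue
        target = targets.get(key)
        if target == "@iri":
            obj = "<%s>" % value                          # coerced to an IRI
        elif target is not None:
            obj = '"%s"^^<%s>' % (value, expand(target))  # typed literal
        else:
            obj = '"%s"' % value                          # plain literal
        # Hypothetical fixed label; a processor would generate one per node.
        triples.append("_:bnode1 <%s> %s ." % (expand(key), obj))
    return triples


doc = {
    "@context": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "name": "http://xmlns.com/foaf/0.1/name",
        "age": "http://xmlns.com/foaf/0.1/age",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
        "@coerce": {"xsd:integer": "age", "@iri": "homepage"},
    },
    "name": "John Smith",
    "age": "41",
    "homepage": "http://example.org/home/",
}
triples = coerce_to_triples(doc)
```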

4.4 Chaining

Object chaining is a JSON-LD feature that allows an author to use the definition of JSON-LD objects as property values. This is a commonly used mechanism for creating a parent-child relationship between two subjects.

The example below shows two subjects related by a property of the first subject:

{
...
  "name": "Manu Sporny",
  "knows": {
    "@type": "Person",
    "name": "Gregg Kellogg"
  }
...
}

An object definition, like the one used above, may be used as a JSON value at any point in JSON-LD.

4.5 Identifying Unlabeled Nodes

At times, it becomes necessary to be able to express information without being able to specify the subject. Typically, this type of node is called an unlabeled node or a blank node. In JSON-LD, unlabeled node identifiers are automatically created if a subject is not specified using the @subject keyword. However, authors may provide identifiers for unlabeled nodes by using the special _ (underscore) prefix. This allows the node to be referenced locally within the document, but not from an external document.

{
...
  "@subject": "_:foo",
...
}

The example above would set the subject to _:foo, which can then be used later on in the JSON-LD markup to refer back to the unlabeled node. This practice, however, is usually frowned upon when generating Linked Data. If a developer finds that they refer to the unlabeled node more than once, they should consider naming the node using a resolvable IRI.

4.6 Aliasing Keywords

JSON-LD allows all of the syntax keywords, except for @context, to be aliased. This feature allows more legacy JSON content to be supported by JSON-LD. It also allows developers to design domain-specific implementations using only the JSON-LD context.

{
  "@context":
  {
     "url": "@subject",
     "a": "@type",
     "name": "http://schema.org/name"
  },
  "url": "http://example.com/about#gregg",
  "a": "http://schema.org/Person",
  "name": "Gregg Kellogg"
}

In the example above, the @subject and @type keywords have been given the aliases url and a, respectively.
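One way a processor might undo such aliases before further processing is sketched below. The function name resolve_aliases and the KEYWORDS set are hypothetical helpers for this illustration; the sketch only handles the top-level keys of a single object.

```python
# Keywords that may be aliased (everything except @context).
KEYWORDS = {
    "@subject", "@type", "@literal", "@iri", "@language",
    "@datatype", "@coerce", "@base", "@vocab",
}


def resolve_aliases(doc):
    """Return a copy of a JSON-LD object with aliased keys replaced by keywords."""
    context = doc.get("@context", {})
    # An alias is a context entry whose value is itself a keyword.
    aliases = {
        term: target
        for term, target in context.items()
        if isinstance(target, str) and target in KEYWORDS
    }
    return {aliases.get(key, key): value for key, value in doc.items()}


doc = {
    "@context": {
        "url": "@subject",
        "a": "@type",
        "name": "http://schema.org/name",
    },
    "url": "http://example.com/about#gregg",
    "a": "http://schema.org/Person",
    "name": "Gregg Kellogg",
}
resolved = resolve_aliases(doc)
```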

4.7 Normalization

Normalization is the process of taking JSON-LD input and performing a deterministic transformation on that input that results in a JSON-LD output that any conforming JSON-LD processor would have generated given the same input. The problem is a fairly difficult technical problem to solve because it requires a directed graph to be ordered into a set of nodes and edges in a deterministic way. This is easy to do when all of the nodes have unique names, but very difficult to do when some of the nodes are not labeled.

Normalization is useful when comparing two graphs against one another, when generating a detailed list of differences between two graphs, and when generating a cryptographic digital signature for information contained in a graph or when generating a hash of the information contained in a graph.

The example below is an un-normalized JSON-LD document:

{
   "@context":
   {
      "name": "http://xmlns.com/foaf/0.1/name",
      "homepage": "http://xmlns.com/foaf/0.1/homepage",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "@coerce":
      {
         "@iri": ["homepage"]
      }
   },
   "name": "Manu Sporny",
   "homepage": "http://manu.sporny.org/"
}

The example below is the normalized form of the JSON-LD document above:

Whitespace is used below to aid readability. The normalization algorithm for JSON-LD removes all unnecessary whitespace in the fully normalized form.

[{
    "@subject":
    {
        "@iri": "_:c14n0"
    },
    "http://xmlns.com/foaf/0.1/homepage":
    {
        "@iri": "http://manu.sporny.org/"
    },
    "http://xmlns.com/foaf/0.1/name": "Manu Sporny"
}]

Notice how all of the terms have been expanded and sorted in alphabetical order. Also, notice how the subject has been labeled with a blank node identifier. Normalization ensures that any arbitrary graph containing exactly the same information would be normalized to exactly the same form shown above.
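The value of a deterministic serialization for hashing and signing can be illustrated with a short sketch. It covers only the final serialization step (sorted keys, no unnecessary whitespace) and assumes the input has already been expanded and labeled; it does not perform blank node labeling. The function name serialize_normalized is hypothetical.

```python
import hashlib
import json


def serialize_normalized(normalized):
    """Serialize already-normalized JSON-LD with sorted keys and no extra whitespace."""
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"))


# The same information written with two different key orders.
a = [{
    "@subject": {"@iri": "_:c14n0"},
    "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
    "http://xmlns.com/foaf/0.1/homepage": {"@iri": "http://manu.sporny.org/"},
}]
b = [{
    "http://xmlns.com/foaf/0.1/homepage": {"@iri": "http://manu.sporny.org/"},
    "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
    "@subject": {"@iri": "_:c14n0"},
}]

# Identical serializations yield identical digests.
digest_a = hashlib.sha256(serialize_normalized(a).encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(serialize_normalized(b).encode("utf-8")).hexdigest()
```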

5. The Application Programming Interface

This API provides a clean mechanism that enables developers to convert JSON-LD data into a variety of output formats that are easier to work with in various programming languages. If a JSON-LD API is provided in a programming environment, the entirety of the following API must be implemented.

5.1 JsonLdProcessor

[NoInterfaceObject]
interface JsonLdProcessor {
    object expand (object input, optional object? context) raises (InvalidContext);
    object compact (object input, optional object? context) raises (InvalidContext, ProcessingError);
    object frame (object input, object frame, object options) raises (InvalidFrame);
    object normalize (object input, optional object? context) raises (InvalidContext);
    object triples (object input, JsonLdTripleCallback tripleCallback, optional object? context) raises (InvalidContext);
};

5.1.1 Methods

compact
Compacts the given input according to the steps in the Compaction Algorithm. The input must be copied, compacted and returned if there are no errors. If the compaction fails, an appropriate exception must be thrown.
Parameter | Type | Nullable | Optional | Description
input | object | no | no | The JSON-LD object to perform compaction on.
context | object | yes | yes | The base context to use when compacting the input.
Exception | Code | Description
InvalidContext | INVALID_SYNTAX | A general syntax error was detected in the @context. For example, if a @coerce key maps to anything other than a string or an array of strings, this exception would be raised.
InvalidContext | MULTIPLE_DATATYPES | There is more than one target datatype specified for a single property in the list of coercion rules. This means that the processor does not know what the developer intended for the target datatype for a property.
ProcessingError | LOSSY_COMPACTION | The compaction would lead to a loss of information, such as a @language value.
ProcessingError | CONFLICTING_DATATYPES | The target datatype specified in the coercion rule and the datatype for the typed literal do not match.
Return type: object
expand
Expands the given input according to the steps in the Expansion Algorithm. The input must be copied, expanded and returned if there are no errors. If the expansion fails, an appropriate exception must be thrown.
Parameter | Type | Nullable | Optional | Description
input | object | no | no | The JSON-LD object to copy and perform the expansion upon.
context | object | yes | yes | An external context to use additionally to the context embedded in input when expanding the input.
Exception | Code | Description
InvalidContext | INVALID_SYNTAX | A general syntax error was detected in the @context. For example, if a @coerce key maps to anything other than a string or an array of strings, this exception would be raised.
InvalidContext | MULTIPLE_DATATYPES | There is more than one target datatype specified for a single property in the list of coercion rules. This means that the processor does not know what the developer intended for the target datatype for a property.
Return type: object
frame
Frames the given input using the frame according to the steps in the Framing Algorithm. The input is used to build the framed output, which is returned if there are no errors. If there are no matches for the frame, null must be returned. Exceptions must be thrown if there are errors.
Parameter | Type | Nullable | Optional | Description
input | object | no | no | The JSON-LD object to perform framing on.
frame | object | no | no | The frame to use when re-arranging the data.
options | object | no | no | A set of options that will affect the framing algorithm.
Exception | Code | Description
InvalidFrame | INVALID_SYNTAX | A frame must be either an object or an array of objects; if the frame is neither of these types, this exception is thrown.
InvalidFrame | MULTIPLE_EMBEDS | A subject IRI was specified in more than one place in the input frame. More than one embed of a given subject IRI is not allowed, and if requested, must result in this exception.
Return type: object
normalize
Normalizes the given input according to the steps in the Normalization Algorithm. The input must be copied, normalized and returned if there are no errors. If the normalization fails, null must be returned.
Parameter | Type | Nullable | Optional | Description
input | object | no | no | The JSON-LD object to perform normalization upon.
context | object | yes | yes | An external context to use additionally to the context embedded in input when expanding the input.
Exception | Code | Description
InvalidContext | INVALID_SYNTAX | A general syntax error was detected in the @context. For example, if a @coerce key maps to anything other than a string or an array of strings, this exception would be raised.
InvalidContext | MULTIPLE_DATATYPES | There is more than one target datatype specified for a single property in the list of coercion rules. This means that the processor does not know what the developer intended for the target datatype for a property.
Return type: object
triples
Processes the input according to the RDF Conversion Algorithm, calling the provided tripleCallback for each triple generated.
Parameter | Type | Nullable | Optional | Description
input | object | no | no | The JSON-LD object to process when outputting triples.
tripleCallback | JsonLdTripleCallback | no | no | A callback that is called for each triple generated from the given input. This callback should be aligned with the RDF API.
context | object | yes | yes | An external context to use additionally to the context embedded in input when expanding the input.
Exception | Code | Description
InvalidContext | INVALID_SYNTAX | A general syntax error was detected in the @context. For example, if a @coerce key maps to anything other than a string or an array of strings, this exception would be raised.
InvalidContext | MULTIPLE_DATATYPES | There is more than one target datatype specified for a single property in the list of coercion rules. This means that the processor does not know what the developer intended for the target datatype for a property.
Return type: object

5.2 JsonLdTripleCallback

The JsonLdTripleCallback is called whenever the processor generates a triple during the triples() call.

[NoInterfaceObject Callback]
interface JsonLdTripleCallback {
    void triple (DOMString subject, DOMString property, DOMString objectType, DOMString object, DOMString? datatype, DOMString? language);
};

5.2.1 Methods

triple
This callback is invoked whenever a triple is generated by the processor.
Parameter | Type | Nullable | Optional | Description
subject | DOMString | no | no | The subject IRI that is associated with the triple.
property | DOMString | no | no | The property IRI that is associated with the triple.
objectType | DOMString | no | no | The type of object that is associated with the triple. Valid values are IRI and literal.
object | DOMString | no | no | The object value associated with the subject and the property.
datatype | DOMString | yes | no | The datatype associated with the object.
language | DOMString | yes | no | The language associated with the object in BCP47 format.
No exceptions.
Return type: void
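A callback implementation might simply collect the triples it receives. The sketch below mirrors the parameters of the triple method in Python; the names Triple and CollectingTripleCallback are hypothetical, the parameters prop and obj stand in for property and object, and the final call imitates what a processor would do rather than invoking a real one.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One generated triple, mirroring the parameters of the triple() method."""
    subject: str
    prop: str            # "property" in the interface definition
    object_type: str     # "IRI" or "literal"
    obj: str             # "object" in the interface definition
    datatype: Optional[str]
    language: Optional[str]


class CollectingTripleCallback:
    """Hypothetical callback that stores every triple it is given."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def triple(self, subject, prop, object_type, obj, datatype=None, language=None):
        self.triples.append(
            Triple(subject, prop, object_type, obj, datatype, language)
        )


# Imitate a processor reporting one typed literal.
callback = CollectingTripleCallback()
callback.triple(
    "_:bnode1",
    "http://xmlns.com/foaf/0.1/age",
    "literal",
    "41",
    "http://www.w3.org/2001/XMLSchema#integer",
    None,
)
```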

6. Algorithms

All algorithms described in this section are intended to operate on language-native data structures. That is, the serialization to a text-based JSON document isn't required as input or output to any of these algorithms and language-native data structures must be used where applicable.

6.1 Syntax Tokens and Keywords

JSON-LD specifies a number of syntax tokens and keywords that are used in all algorithms described in this section:

@context
Used to set the local context.
@base
Used to set the base IRI for all object IRIs affected by the active context.
@vocab
Used to set the base IRI for all property IRIs affected by the active context.
@coerce
Used to specify type coercion rules.
@literal
Used to specify a literal value.
@iri
Used to specify an IRI value.
@language
Used to specify the language for a literal.
@datatype
Used to specify the datatype for a literal.
:
The separator for JSON keys and values that use the prefix mechanism.
@subject
Sets the active subjects.
@type
Used to set the type of the active subjects.

6.2 Algorithm Terms

initial context
a context that is specified to the algorithm before processing begins.
active subject
the currently active subject that the processor should use when processing.
active property
the currently active property that the processor should use when processing.
active object
the currently active object that the processor should use when processing.
active context
a context that is used to resolve prefixes and terms while the processing algorithm is running. The active context is the context contained within the processor state.
local context
a context that is specified within a JSON object, specified via the @context keyword.
processor state
the processor state, which includes the active context, current subject, and current property. The processor state is managed as a stack with elements from the previous processor state copied into a new processor state when entering a new JSON object.
JSON-LD input
The JSON-LD data structure that is provided as input to the algorithm.
JSON-LD output
The JSON-LD data structure that is produced as output by the algorithm.

6.3 Context

Processing of a JSON-LD data structure is managed recursively. During processing, each rule is applied using information provided by the active context. Processing begins by pushing a new processor state onto the processor state stack and initializing the active context with the initial context. If a local context is encountered, information from the local context is merged into the active context.

The active context is used for expanding keys and values of a JSON object (or elements of a list (see List Processing)).

A local context is identified within a JSON object having a key of @context with a string or a JSON object value. When processing a local context, special processing rules apply:

  1. Create a new, empty local context.
  2. If the value is a simple string, it must have the lexical form of an IRI and is used to initialize a new JSON document, which replaces the value for subsequent processing.
  3. If the value is a JSON object, perform the following steps:
    1. If the JSON object has a @base key, it must have a value of a simple string with the lexical form of an absolute IRI. Add the base mapping to the local context.

      Turtle allows @base to be relative. If we did this, we would have to add IRI Expansion.

    2. If the JSON object has a @vocab key, it must have a value of a simple string with the lexical form of an absolute IRI. Add the vocabulary mapping to the local context after performing IRI Expansion on the associated value.
    3. If the JSON object has a @coerce key, it must have a value of a JSON object. Add the @coerce mapping to the local context performing IRI Expansion on the associated value(s).
    4. Otherwise, the key must have the lexical form of NCName and must have the value of a simple string with the lexical form of IRI. Merge the key-value pair into the local context.
  4. Merge the local context's @coerce mapping into the active context's @coerce mapping as described below.
  5. Merge all entries other than the @coerce mapping from the local context into the active context, overwriting any duplicate values.

6.3.1 Coerce

Map each key-value pair in the local context's @coerce mapping into the active context's @coerce mapping, overwriting any duplicate values in the active context's @coerce mapping. The @coerce mapping has either a single prefix:term value, a single term value or an array of prefix:term or term values. When merging with an existing mapping in the active context, map all prefix and term values to array form and replace with the union of the value from the local context and the value of the active context. If the result is an array with a single value, the processor may represent this as a string value.
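The merge described above can be sketched as follows. This is a minimal illustration that assumes both mappings are plain dictionaries keyed by coercion target; the function name merge_coerce and the ordering of the union (active context values first) are assumptions of the example.

```python
def as_list(value):
    """Map a single term or prefix:term value to array form."""
    return [value] if isinstance(value, str) else list(value)


def merge_coerce(active, local):
    """Return a new @coerce mapping holding the union of both mappings."""
    merged = {target: as_list(terms) for target, terms in active.items()}
    for target, terms in local.items():
        combined = merged.get(target, [])
        for term in as_list(terms):
            if term not in combined:
                combined.append(term)
        merged[target] = combined
    # A single-element array may be represented as a plain string.
    return {
        target: (terms[0] if len(terms) == 1 else terms)
        for target, terms in merged.items()
    }


merged = merge_coerce(
    {"@iri": "homepage"},
    {"@iri": ["knows"], "xsd:integer": "age"},
)
```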

6.3.2 Initial Context

The initial context is initialized as follows:

  • @base is set using section 5.1 Establishing a Base URI of [RFC3986]. Processors may provide a means of setting the base IRI programmatically.
  • @coerce is set with a single mapping from @iri to @type.
{
    "@base": document-location,
    "@coerce": {
      "@iri": "@type"
    }
}

6.4 IRI Expansion

Keys and some values are evaluated to produce an IRI. This section defines an algorithm for transforming a value representing an IRI into an actual IRI.

IRIs may be represented as an absolute IRI, a term, a prefix:term construct, or as a value relative to @base or @vocab.

The algorithm for generating an IRI is:

  1. Split the value into a prefix and suffix from the first occurrence of ':'.
  2. If the prefix is a '_' (underscore), the IRI is unchanged.
  3. If the active context contains a mapping for prefix, generate an IRI by prepending the mapped prefix to the (possibly empty) suffix using textual concatenation. Note that an empty suffix and no suffix (meaning the value contains no ':' string at all) are treated equivalently.
  4. If the IRI being processed is for a property (i.e., a key's value in a JSON object, or a value in a @coerce mapping) and the active context has a @vocab mapping, join the mapped value to the suffix using textual concatenation.
  5. If the IRI being processed is for a subject or object (i.e., not a property) and the active context has a @base mapping, join the mapped value to the suffix using the method described in [RFC3986].
  6. Otherwise, use the value directly as an IRI.
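The steps above can be sketched as a single function. This is an illustration under stated assumptions rather than a reference implementation: the function name expand_iri is hypothetical, the active context is modeled as a flat dictionary, @vocab is applied only to values that contain no ':', and in the @vocab and @base steps the whole value (not only the suffix) is joined so that absolute IRIs pass through unchanged.

```python
from urllib.parse import urljoin


def expand_iri(value, active_context, is_property):
    """Expand a term, prefix:term, or relative value to an IRI (illustrative)."""
    # Step 1: split on the first ':'; without one, the suffix is empty.
    prefix, sep, suffix = value.partition(":")

    # Step 2: blank node identifiers are left unchanged.
    if sep and prefix == "_":
        return value

    # Step 3: a term or prefix with a mapping in the active context.
    mapping = active_context.get(prefix)
    if isinstance(mapping, str) and not prefix.startswith("@"):
        return mapping + suffix

    # Step 4: properties are joined to @vocab by textual concatenation.
    if is_property and not sep and "@vocab" in active_context:
        return active_context["@vocab"] + value

    # Step 5: subjects and objects are resolved against @base per RFC 3986.
    if not is_property and "@base" in active_context:
        return urljoin(active_context["@base"], value)

    # Step 6: otherwise the value is used directly.
    return value


ctx = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "name": "http://xmlns.com/foaf/0.1/name",
    "@vocab": "http://example.org/vocab#",
    "@base": "http://example.org/library/",
}
```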

6.5 IRI Compaction

Some keys and values are expressed using IRIs. This section defines an algorithm for transforming an IRI to a compact IRI using the terms and prefixes specified in the local context.

The algorithm for generating a compacted IRI is:

  1. Search every key-value pair in the active context for a term that is a complete match against the IRI. If a complete match is found, the resulting compacted IRI is the term associated with the IRI in the active context.
  2. If a complete match is not found, search for a partial match from the beginning of the IRI. For all matches that are found, the resulting compacted IRI is the prefix associated with the partially matched IRI in the active context concatenated with a colon (:) character and the unmatched part of the string. If there is more than one compacted IRI produced, the final value is the shortest and lexicographically least value of the entire set of compacted IRIs.
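A sketch of the two steps follows. The function name compact_iri is hypothetical, and two behaviours are assumptions not stated above: when several terms match completely, the shortest and lexicographically least term is chosen, and when nothing matches, the IRI is returned unchanged.

```python
def compact_iri(iri, active_context):
    """Compact an IRI to a term or prefix:suffix form (illustrative)."""
    # Only simple term/prefix entries are considered, not keywords.
    entries = {
        key: value
        for key, value in active_context.items()
        if isinstance(value, str) and not key.startswith("@")
    }

    def shortest_then_least(candidates):
        return min(candidates, key=lambda c: (len(c), c))

    # Step 1: a complete match yields the term itself.
    exact = [term for term, mapped in entries.items() if mapped == iri]
    if exact:
        return shortest_then_least(exact)

    # Step 2: partial matches from the beginning of the IRI yield prefix:suffix.
    partial = [
        prefix + ":" + iri[len(mapped):]
        for prefix, mapped in entries.items()
        if iri.startswith(mapped)
    ]
    if partial:
        return shortest_then_least(partial)

    # Assumption: without any match the IRI is left as-is.
    return iri


ctx = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "f": "http://xmlns.com/foaf/",
    "name": "http://xmlns.com/foaf/0.1/name",
}
```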

6.6 Value Expansion

Some values in JSON-LD can be expressed in a compact form. These values are required to be expanded at times when processing JSON-LD documents.

The algorithm for expanding a value is:

  1. If the key that is associated with the value has an associated coercion entry in the local context, the resulting expansion is an object populated according to the following steps:
    1. If the coercion target is @iri, expand the value by adding a new key-value pair where the key is @iri and the value is the expanded IRI according to the IRI Expansion rules.
    2. If the coercion target is a typed literal, expand the value by adding two new key-value pairs. The first key-value pair will be @literal and the unexpanded value. The second key-value pair will be @datatype and the associated coercion datatype expanded according to the IRI Expansion rules.
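The value expansion steps can be sketched as shown below. The helper names coercion_target, expand_prefixed and expand_value are hypothetical; expand_prefixed is a reduced stand-in for the IRI Expansion rules that only handles terms and prefixes, and values without a coercion entry are assumed to be returned unchanged.

```python
def coercion_target(key, context):
    """Return the @coerce target for a key, or None if there is none."""
    for target, terms in context.get("@coerce", {}).items():
        if key in ([terms] if isinstance(terms, str) else terms):
            return target
    return None


def expand_prefixed(value, context):
    """Reduced IRI expansion: terms and prefix:suffix values only."""
    prefix, sep, suffix = value.partition(":")
    mapping = context.get(prefix)
    if isinstance(mapping, str) and prefix != "_":
        return mapping + suffix
    return value


def expand_value(key, value, context):
    """Expand a compact value into its object form when a coercion applies."""
    target = coercion_target(key, context)
    if target is None:
        return value  # assumption: values without a coercion entry are kept
    if target == "@iri":
        return {"@iri": expand_prefixed(value, context)}
    return {"@literal": value, "@datatype": expand_prefixed(target, context)}


ctx = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "age": "http://xmlns.com/foaf/0.1/age",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "@coerce": {"xsd:integer": "age", "@iri": "homepage"},
}
```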

6.7 Value Compaction

Some values, such as IRIs and typed literals, may be expressed in an expanded form in JSON-LD. These values are required to be compacted at times when processing JSON-LD documents.

The algorithm for compacting a value is:

  1. If the local context contains a coercion target for the key that is associated with the value, compact the value using the following steps:
    1. If the coercion target is an @iri, the compacted value is the value associated with the @iri key, processed according to the IRI Compaction steps.
    2. If the coercion target is a typed literal, the compacted value is the value associated with the @literal key.
    3. Otherwise, the value is not modified.
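The compaction steps mirror value expansion and can be sketched as follows. The names coercion_target and compact_value are hypothetical, and the IRI Compaction step is passed in as a function (defaulting to the identity) so that the example stays self-contained; datatype conflict checks are omitted.

```python
def coercion_target(key, context):
    """Return the @coerce target for a key, or None if there is none."""
    for target, terms in context.get("@coerce", {}).items():
        if key in ([terms] if isinstance(terms, str) else terms):
            return target
    return None


def compact_value(key, value, context, compact_iri=lambda iri: iri):
    """Compact an expanded value when the context has a coercion target for key."""
    target = coercion_target(key, context)
    if target is None or not isinstance(value, dict):
        return value
    # Step 1: IRIs are compacted from the @iri key.
    if target == "@iri" and "@iri" in value:
        return compact_iri(value["@iri"])
    # Step 2: typed literals are reduced to the @literal value.
    if "@literal" in value:
        return value["@literal"]
    # Step 3: otherwise the value is not modified.
    return value


ctx = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@coerce": {"xsd:integer": "age", "@iri": "homepage"},
}
```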

6.8 Expansion

This algorithm is a work in progress, do not implement it.

As stated previously, expansion is the process of taking a JSON-LD input and expanding all IRIs and typed literals to their fully-expanded form. The output will not contain any context declarations and will have all IRIs and typed literals fully expanded.

6.8.1 Expansion Algorithm

  1. If the top-level item in the JSON-LD input is an array, process each item in the array recursively using this algorithm.
  2. If the top-level item in the JSON-LD input is an object, update the local context according to the steps outlined in the context section. Process each key, expanding the key according to the IRI Expansion rules.
    1. Process each value associated with each key:
      1. If the value is an array, process each item in the array recursively using this algorithm.
      2. If the value is an object, process the object recursively using this algorithm.
      3. Otherwise, check to see the associated key has an associated coercion rule. If the value should be coerced, expand the value according to the Value Expansion rules. If the value does not need to be coerced, leave the value as-is.
    2. Remove the context from the object.

6.9 Compaction

This algorithm is a work in progress, do not implement it.

As stated previously, compaction is the process of taking a JSON-LD input and compacting all IRIs using a given context. The output will contain a single top-level context declaration and will only use terms and prefixes and will ensure that all typed literals are fully compacted.

6.9.1 Compaction Algorithm

  1. Perform the Expansion Algorithm on the JSON-LD input. This removes any existing context to allow the given context to be cleanly applied.
  2. Set the active context to the given context.
  3. If the top-level item is an array, process each item in the array recursively, starting at this step.
  4. If the top-level item is an object, compress each key using the steps defined in IRI Compaction and compress each value using the steps defined in Value Compaction.

6.10 Framing

This algorithm is a work in progress, do not implement it.

A JSON-LD document is a representation of a directed graph. A single directed graph can have many different serializations, each expressing exactly the same information. Developers typically don't work directly with graphs, but rather, prefer trees when dealing with JSON. While mapping a graph to a tree can be done, the layout of the end result must be specified in advance. This section defines an algorithm for mapping a graph to a tree given a frame.

6.10.1 Framing Algorithm Terms

input frame
the initial frame provided to the framing algorithm.
framing context
a context containing the object embed flag, the explicit inclusion flag and the omit missing properties flag.
object embed flag
a flag specifying that objects should be directly embedded in the output, instead of being referred to by their IRI.
explicit inclusion flag
a flag specifying that for properties to be included in the output, they must be explicitly declared in the framing context.
omit missing properties flag
a flag specifying that properties that are missing from the JSON-LD input should be omitted from the output.
match limit
A value specifying the maximum number of matches to accept when building arrays of values during the framing algorithm. A value of -1 specifies that there is no match limit.
map of embedded subjects
A map that tracks if a subject has been embedded in the output of the Framing Algorithm.

6.10.2 Framing Algorithm

The framing algorithm takes JSON-LD input that has been normalized according to the Normalization Algorithm (normalized input), an input frame that has been expanded according to the Expansion Algorithm (expanded frame), and a number of options and produces JSON-LD output. The following series of steps is the recursive portion of the framing algorithm:

  1. Initialize the framing context by setting the object embed flag, clearing the explicit inclusion flag, and clearing the omit missing properties flag. Override these values based on input options provided to the algorithm by the application.
  2. Generate a list of frames by processing the expanded frame:
    1. If the expanded frame is not an array, set match limit to 1, place the expanded frame into the list of frames, and set the JSON-LD output to null.
    2. If the expanded frame is an empty array, place an empty object into the list of frames, set the JSON-LD output to an array, and set match limit to -1.
    3. If the expanded frame is a non-empty array, add each item in the expanded frame into the list of frames, set the JSON-LD output to an array, and set match limit to -1.
  3. Create a match array for each expanded frame in the list of frames halting when either the match limit is zero or the end of the list of frames is reached. If an expanded frame is not an object, the processor must throw an Invalid Frame Format exception. Add each matching item from the normalized input to the matches array and decrement the match limit by 1 if:
    1. The expanded frame has an rdf:type that exists in the item's list of rdf:types. Note: the rdf:type can be an array, but only one value needs to be in common between the item and the expanded frame for a match.
    2. The expanded frame does not have an rdf:type property, but every property in the expanded frame exists in the item.
  4. Process each item in the match array with its associated match frame:
    1. If the match frame contains an @embed keyword, set the object embed flag to its value. If the match frame contains an @explicit keyword, set the explicit inclusion flag to its value. Note: if the keyword exists, but the value is neither true nor false, set the associated flag to true.
    2. If the object embed flag is cleared and the item has the @subject property, replace the item with the value of the @subject property.
    3. If the object embed flag is set and the item has the @subject property, and its IRI is in the map of embedded subjects, throw a Duplicate Embed exception.
    4. If the object embed flag is set and the item has the @subject property and its IRI is not in the map of embedded subjects:
      1. If the explicit inclusion flag is set, then delete any key from the item that does not exist in the match frame, except @subject.
      2. For each key in the match frame, except for keywords and rdf:type:
        1. If the key is in the item, then build a new recursion input list using the object or objects associated with the key. If any object contains an @iri value that exists in the normalized input, replace the object in the recursion input list with a new object containing the @subject key where the value is the value of the @iri, and all of the other key-value pairs for that subject. Set the recursion match frame to the value associated with the match frame's key. Replace the value associated with the key by recursively calling this algorithm using recursion input list, recursion match frame as input.
        2. If the key is not in the item, add the key to the item and set the associated value to an empty array if the match frame key's value is an array or null otherwise.
        3. If the value associated with the item's key is null, process the omit missing properties flag:
          1. If the value associated with the key in the match frame is an array, use the first frame from the array as the property frame, otherwise set the property frame to an empty object.
          2. If the property frame contains an @omitDefault keyword, set the omit missing properties flag to its value. Note: if the keyword exists, but the value is neither true nor false, set the associated flag to true.
          3. If the omit missing properties flag is set, delete the key in the item. Otherwise, if the @default keyword is set in the property frame set the item's value to the value of @default.
    5. If the JSON-LD output is null set it to the item, otherwise, append the item to the JSON-LD output.
  5. Return the JSON-LD output.
The final, non-recursive step of the framing algorithm requires the JSON-LD output to be compacted according to the Compaction Algorithm by using the context provided in the input frame. The resulting value is the final output of the compaction algorithm and is what should be returned to the application.
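The matching conditions in step 3 of the framing algorithm can be illustrated in isolation. The sketch below is not an implementation of framing: it only shows how items might be selected for one frame and how the match limit is consumed. The function names are hypothetical, and the sketch assumes that keywords (keys starting with @) in a frame are not treated as properties when no rdf:type is present.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def iri_of(value):
    """Return the IRI held by an expanded {"@iri": ...} object or a plain string."""
    return value["@iri"] if isinstance(value, dict) else value


def matches_frame(item, frame):
    """Apply the two matching conditions of step 3 to one item."""
    if RDF_TYPE in frame:
        frame_types = {iri_of(t) for t in as_list(frame[RDF_TYPE])}
        item_types = {iri_of(t) for t in as_list(item.get(RDF_TYPE, []))}
        # One type in common is enough for a match.
        return bool(frame_types & item_types)
    # Assumption: keywords such as @embed are not treated as properties.
    return all(key in item for key in frame if not key.startswith("@"))


def find_matches(items, frame, match_limit=-1):
    """Collect matching items, stopping when the match limit reaches zero."""
    matches = []
    for item in items:
        if match_limit == 0:
            break
        if matches_frame(item, frame):
            matches.append(item)
            match_limit -= 1
    return matches


library = {
    "@subject": {"@iri": "http://example.org/library"},
    RDF_TYPE: {"@iri": "http://example.org/vocab#Library"},
}
book = {
    "@subject": {"@iri": "http://example.org/library/the-republic"},
    RDF_TYPE: {"@iri": "http://example.org/vocab#Book"},
    "http://purl.org/dc/elements/1.1/title": "The Republic",
}
book_frame = {RDF_TYPE: {"@iri": "http://example.org/vocab#Book"}}
```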

6.11 Normalization

This algorithm is a work in progress, do not implement it.

Normalization is the process of taking JSON-LD input and performing a deterministic transformation on that input that results in all aspects of the graph being fully expanded and named in the JSON-LD output. The normalized output is generated in such a way that any conforming JSON-LD processor will generate identical output given the same input. The problem is a fairly difficult technical problem to solve because it requires a directed graph to be ordered into a set of nodes and edges in a deterministic way. This is easy to do when all of the nodes have unique names, but very difficult to do when some of the nodes are not labeled.

In time, there may be more than one normalization algorithm that will need to be identified. For identification purposes, this algorithm is named UGNA2011.

6.11.1 Normalization Algorithm Terms

label
The subject IRI associated with a graph node. The subject IRI is expressed using a key-value pair in a JSON object where the key is @subject and the value is a string that is an IRI or a JSON object containing the key @iri and a value that is a string that is an IRI.
list of expanded nodes
A list of all nodes in the JSON-LD input graph containing no embedded objects and having all keys and values expanded according to the steps in the Expansion Algorithm.
alpha and beta values
The words alpha and beta refer to the first and second nodes or values being examined in an algorithm. The names are merely used to refer to each input value to a comparison algorithm.
renaming counter
A counter that is used during the Node Relabeling Algorithm. The counter typically starts at one (1) and counts up for every node that is relabeled. There will be two such renaming counters in an implementation of the normalization algorithm. The first is the labeling counter and the second is the deterministic labeling counter.
serialization label
An identifier that is created to aid in the normalization process in the Deep Comparison Algorithm. The value typically takes the form of s<NUMBER> or c<NUMBER>, where NUMBER is a number.

6.11.2 Normalization State

When performing the steps required by the normalization algorithm, it is helpful to track the many pieces of information in a data structure called the normalization state. Many of these pieces simply provide indexes into the graph. The information contained in the normalization state is described below.

node state
Each node in the graph will be assigned a node state. This state contains the information necessary to deterministically label all nodes in the graph. A node state includes:
node reference
A node reference is a reference to a node in the graph. For a given node state, its node reference refers to the node that the state is for. When a node state is created, its node reference should be to the node it is created for.
outgoing list
Lists the labels for all nodes that are properties of the node reference. This list should be initialized by iterating over every object associated with a property in the node reference and adding its label if it is another node.
incoming list
Lists the labels for all nodes in the graph for which the node reference is a property. This list is initialized to an empty list.
outgoing serialization map
Maps node labels to serialization labels. This map is initialized to an empty map. When this map is populated, it will be filled with keys that are the labels of every node in the graph with a label that begins with _: and that has a path, via properties, that starts with the node reference.
outgoing serialization
A string that can be lexicographically compared to the outgoing serializations of other node states. It is a representation of the outgoing serialization map and other related information. This string is initialized to an empty string.
incoming serialization map
Maps node labels to serialization labels. This map is initialized to an empty map. When this map is populated, it will be filled with keys that are the labels of every node in the graph with a label that begins with _: and that has a path, via properties, that ends with the node reference.
incoming serialization
A string that can be lexicographically compared to the incoming serializations of other node states. It is a representation of the incoming serialization map and other related information. This string is initialized to an empty string.
node state map
A mapping from a node's label to a node state. It is initialized to an empty map.
labeling prefix
The labeling prefix is a string that is used as the beginning of a node label. It should be initialized to a random base string that starts with the characters _:, is not used by any other node's label in the JSON-LD input, and does not start with the characters _:c14n. The prefix has two uses. First, it is used to temporarily name nodes during the normalization algorithm in a way that doesn't collide with the names that already exist or with the names that will be generated by the normalization algorithm. Second, it will eventually be set to _:c14n to generate the final, deterministic labels for nodes in the graph. This prefix will be concatenated with the labeling counter to produce a node label. For example, _:j8r3k is a proper initial value for the labeling prefix.
labeling counter
A counter that is used to label nodes. It is appended to the labeling prefix to create a node label. It is initialized to 1.
map of flattened nodes
A map containing a representation of all nodes in the graph where the key is a node label and the value is a single JSON object that has no nested sub-objects and has had all properties for the same node merged into a single JSON object.
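
The normalization state described above can be pictured as a pair of plain JavaScript objects. The following is a minimal sketch only: the helper names createNodeState and createNormalizationState and all property names are illustrative assumptions and are not defined by this specification. It also assumes that a flattened node refers to another node with an object of the form { "@iri": label }.

```javascript
// Hypothetical helper: builds the node state for one flattened node.
function createNodeState(node) {
  const outgoingList = [];
  Object.keys(node).forEach(function (property) {
    if (property === '@subject') {
      return;
    }
    const values = Array.isArray(node[property]) ? node[property] : [node[property]];
    values.forEach(function (value) {
      // Assumption: a flattened node refers to another node as { "@iri": label }.
      if (value !== null && typeof value === 'object' && '@iri' in value) {
        outgoingList.push(value['@iri']);
      }
    });
  });
  return {
    nodeReference: node,          // the node this state is for
    outgoingList: outgoingList,   // labels of nodes referenced by this node
    incomingList: [],             // labels of nodes referencing this node
    outgoingSerializationMap: {}, // node label -> serialization label
    outgoingSerialization: '',
    incomingSerializationMap: {}, // node label -> serialization label
    incomingSerialization: ''
  };
}

// Hypothetical helper: builds an empty normalization state. The caller is
// responsible for choosing a prefix that no label in the input already uses.
function createNormalizationState(labelingPrefix) {
  if (labelingPrefix.indexOf('_:') !== 0 || labelingPrefix.indexOf('_:c14n') === 0) {
    throw new Error('Invalid labeling prefix: ' + labelingPrefix);
  }
  return {
    nodeStateMap: {},   // node label -> node state
    labelingPrefix: labelingPrefix,
    labelingCounter: 1,
    flattenedNodes: {}  // node label -> flattened node
  };
}

// Usage: label one previously unlabeled node and register its node state.
const state = createNormalizationState('_:j8r3k');
const node = {
  '@subject': state.labelingPrefix + state.labelingCounter,
  'http://example.com/vocab#knows': { '@iri': 'http://example.com/people#bob' },
  'http://example.com/vocab#name': 'Alice'
};
state.labelingCounter += 1;
state.flattenedNodes[node['@subject']] = node;
state.nodeStateMap[node['@subject']] = createNodeState(node);
```

Keeping the maps keyed by node label mirrors the definitions above, where both the node state map and the map of flattened nodes are indexed by label.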

6.11.3 Normalization Algorithm

The normalization algorithm expands the JSON-LD input, flattens the data structure, and creates an initial set of names for all nodes in the graph. The flattened data structure is then processed by a node labeling algorithm in order to get a fully expanded and named list of nodes which is then sorted. The result is a deterministically named and ordered list of graph nodes.

  1. Expand the JSON-LD input according to the steps in the Expansion Algorithm and store the result as the expanded input.
  2. Create a normalization state.
  3. Initialize the map of flattened nodes by recursively processing every expanded node in the expanded input in depth-first order:
    1. If the expanded node is an unlabeled node, add a new key-value pair to the expanded node where the key is @subject and the value is the concatenation of the labeling prefix and the string value of the labeling counter. Increment the labeling counter.
    2. Add the expanded node to the map of flattened nodes:
      1. If the expanded node's label is already in the map of flattened nodes, merge all properties from the entry in the map of flattened nodes into the expanded node.
      2. Go through every property associated with an array in the expanded node and remove any duplicate IRI entries from the array. If the resulting array only has one IRI entry, change it from an array to an object.
      3. Set the entry for the expanded node's label in the map of flattened nodes to the expanded node.
    3. After exiting the recursive step, replace the reference to the expanded node with an object containing a single key-value pair where the key is @iri and the value is the value of the @subject key in the node.
  4. For every entry in the map of flattened nodes, insert a key-value pair into the node state map where the key is the key from the map of flattened nodes and the value is a node state where its node reference refers to the value from the map of flattened nodes.
  5. Populate the incoming list for each node state by iterating over every node in the graph and adding its label to the incoming list associated with each node found in its properties.
  6. For every entry in the node state map that has a label that begins with _:c14n, relabel the node using the Node Relabeling Algorithm.
  7. Label all of the nodes that contain a @subject key associated with a value starting with _: according to the steps in the Deterministic Labeling Algorithm.

6.11.4 Node Relabeling Algorithm

This algorithm renames a node by generating a unique new label and updating all references to that label in the node state map. The old label and the normalization state must be given as an input to the algorithm. The old label is the current label of the node that is to be relabeled.

The node relabeling algorithm is as follows:

  1. If the labeling prefix is _:c14n and the old label begins with _:c14n then return as the node has already been renamed.
  2. Generate the new label by concatenating the labeling prefix with the string value of the labeling counter. Increment the labeling counter.
  3. For the node state associated with the old label, update every node in the incoming list by changing all the properties that reference the old label to the new label.
  4. Change the old label key in the node state map to the new label and set the associated node reference's label to the new label.
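
The four steps above can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not a reference implementation: the function name relabelNode and the shape of the state object (nodeStateMap, labelingPrefix, labelingCounter, and node states carrying nodeReference and incomingList) are hypothetical, and references between nodes are assumed to take the form { "@iri": label }.

```javascript
// Sketch of the Node Relabeling Algorithm. Returns the label the node has
// after the call.
function relabelNode(state, oldLabel) {
  // Step 1: nothing to do if the node already has a final label.
  if (state.labelingPrefix === '_:c14n' && oldLabel.indexOf('_:c14n') === 0) {
    return oldLabel;
  }

  // Step 2: generate the new label and advance the counter.
  const newLabel = state.labelingPrefix + String(state.labelingCounter);
  state.labelingCounter += 1;

  // Step 3: rewrite every property of every referring node that points at
  // the old label.
  const nodeState = state.nodeStateMap[oldLabel];
  nodeState.incomingList.forEach(function (incomingLabel) {
    const referrer = state.nodeStateMap[incomingLabel].nodeReference;
    Object.keys(referrer).forEach(function (property) {
      if (property === '@subject') {
        return;
      }
      const values = Array.isArray(referrer[property]) ? referrer[property] : [referrer[property]];
      values.forEach(function (value) {
        if (value !== null && typeof value === 'object' && value['@iri'] === oldLabel) {
          value['@iri'] = newLabel;
        }
      });
    });
  });

  // Step 4: move the node state to its new key and rename the node itself.
  delete state.nodeStateMap[oldLabel];
  state.nodeStateMap[newLabel] = nodeState;
  nodeState.nodeReference['@subject'] = newLabel;
  return newLabel;
}

// Usage: one named node refers to one temporarily labeled node.
const a = { '@subject': 'http://example.com/a', 'http://example.com/knows': { '@iri': '_:tmp1' } };
const b = { '@subject': '_:tmp1' };
const state = {
  labelingPrefix: '_:c14n',
  labelingCounter: 1,
  nodeStateMap: {
    'http://example.com/a': { nodeReference: a, incomingList: [] },
    '_:tmp1': { nodeReference: b, incomingList: ['http://example.com/a'] }
  }
};
const relabeled = relabelNode(state, '_:tmp1'); // "_:c14n1"
```

The sketch follows the listed steps literally. It does not rewrite old labels that are stored in the outgoing or incoming lists of other node states; an implementation that keeps those lists would also need to update them.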

6.11.5 Deterministic Labeling Algorithm

The deterministic labeling algorithm takes the normalization state and produces a list of finished nodes that is sorted and contains deterministically named and expanded nodes from the graph.

  1. Set the labeling prefix to _:c14n, the labeling counter to 1, the list of finished nodes to an empty array, and create an empty array, the list of unfinished nodes.
  2. For each node reference in the node state map:
    1. If the node's label does not start with _: then put the node reference in the list of finished nodes.
    2. If the node's label does start with _: then put the node reference in the list of unfinished nodes.
  3. Append to the list of finished nodes by processing the remainder of the list of unfinished nodes until it is empty:
    1. Sort the list of unfinished nodes in descending order according to the Deep Comparison Algorithm to determine the sort order.
    2. Create a list of labels and initialize it to an empty array.
    3. For the first node from the list of unfinished nodes:
      1. Add its label to the list of labels.
      2. For each key-value pair from its associated outgoing serialization map, add the key to a list and then sort the list according to the lexicographical order of the keys' associated values. Append the list to the list of labels.
      3. For each key-value pair from its associated incoming serialization map, add the key to a list and then sort the list according to the lexicographical order of the keys' associated values. Append the list to the list of labels.
    4. For each label in the list of labels, relabel the associated node according to the Node Relabeling Algorithm. If any outgoing serialization map contains a key that matches the label, clear the map and set the associated outgoing serialization to an empty string. If any incoming serialization map contains a key that matches the label, clear the map and set the associated incoming serialization to an empty string.
    5. Remove each node with a label that starts with _:c14n from the list of unfinished nodes and add it to the list of finished nodes.
  4. Sort the list of finished nodes in descending order according to the Deep Comparison Algorithm to determine the sort order.

6.11.6 Shallow Comparison Algorithm

The shallow comparison algorithm takes two unlabeled nodes, alpha and beta, as input and determines which one should come first in a sorted list. The following algorithm determines the steps that are executed in order to determine the node that should come first in a list:

  1. Compare the total number of node properties. The node with fewer properties is first.
  2. Lexicographically sort the property IRIs for each node and compare the sorted lists. If an IRI is found to be lexicographically smaller, the node containing that IRI is first.
  3. Compare the values of each property against one another:
    1. The node associated with fewer property values is first.
    2. Create an alpha list by adding all values associated with the alpha property that are not unlabeled nodes.
    3. Create a beta list by adding all values associated with the beta property that are not unlabeled nodes.
    4. Compare the length of alpha list and beta list. The node associated with the list containing fewer items is first.
    5. Sort alpha list and beta list according to the Object Comparison Algorithm. For each offset into the alpha list, compare the item at the offset against the item at the same offset in the beta list according to the Object Comparison Algorithm. The node associated with the lesser item is first.
  4. Process the incoming lists associated with each node to determine order:
    1. The node with the shortest incoming list is first.
    2. Sort the incoming lists according to incoming property and then incoming label.
    3. The node associated with the fewest number of incoming nodes is first.
    4. For each offset into the incoming lists, compare the associated properties and labels:
      1. The node associated with a label that does not begin with _: is first.
      2. If the nodes' labels do not begin with _:, then the node associated with the lexicographically lesser label is first.
      3. The node associated with the lexicographically lesser associated property is first.
      4. The node with the label that does not begin with _:c14n is first.
      5. The node with the lexicographically lesser label is first.
  5. Otherwise, the nodes are equivalent.
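
As a partial illustration, steps 1 and 2 of the algorithm above might look like the following in JavaScript. The function names propertyIris and comparePropertyIris are hypothetical. The sketch assumes that each node is a flattened JSON object whose keys, other than @subject, are property IRIs, and it returns a negative number when alpha is first, a positive number when beta is first, and 0 when the remaining steps must decide.

```javascript
// Returns the property IRIs of a node in lexicographical order,
// leaving out the @subject key.
function propertyIris(node) {
  return Object.keys(node).filter(function (key) {
    return key !== '@subject';
  }).sort();
}

// Steps 1 and 2 of the shallow comparison only.
function comparePropertyIris(alpha, beta) {
  const alphaIris = propertyIris(alpha);
  const betaIris = propertyIris(beta);

  // Step 1: the node with fewer properties is first.
  if (alphaIris.length !== betaIris.length) {
    return alphaIris.length < betaIris.length ? -1 : 1;
  }

  // Step 2: the node holding the lexicographically smaller IRI is first.
  for (let i = 0; i < alphaIris.length; i++) {
    if (alphaIris[i] !== betaIris[i]) {
      return alphaIris[i] < betaIris[i] ? -1 : 1;
    }
  }
  return 0;
}

const one = { '@subject': '_:a', 'http://example.com/name': 'x' };
const two = { '@subject': '_:b', 'http://example.com/name': 'x', 'http://example.com/nick': 'y' };
const three = { '@subject': '_:c', 'http://example.com/nick': 'y' };
const fewerFirst = comparePropertyIris(one, two);     // -1
const smallerIriFirst = comparePropertyIris(one, three); // -1
```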

6.11.7 Object Comparison Algorithm

The object comparison algorithm is designed to compare two graph node property values, alpha and beta, against each other. The algorithm is useful when sorting two lists of graph node properties.

  1. If one of the values is a string and the other is not, the value that is a string is first.
  2. If both values are strings, the lexicographically lesser string is first.
  3. If one of the values is a literal and the other is not, the value that is a literal is first.
  4. If both values are literals:
    1. The lexicographically lesser string associated with @literal is first.
    2. The lexicographically lesser string associated with @datatype is first.
    3. The lexicographically lesser string associated with @language is first.
  5. If both values are expanded IRIs, the lexicographically lesser string associated with @iri is first.
  6. Otherwise, the two values are equivalent.
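
The comparison above maps naturally onto a JavaScript sort comparator. The sketch below is illustrative only: the name compareObjects is hypothetical, a literal is assumed to be any object with an @literal key, an expanded IRI is assumed to be any object with an @iri key, and a missing @datatype or @language is treated as an empty string (the steps above do not say how a missing key should be compared).

```javascript
function compareStrings(a, b) {
  return a < b ? -1 : (a > b ? 1 : 0);
}

function hasKey(value, key) {
  return value !== null && typeof value === 'object' && key in value;
}

// Returns a negative number if alpha is first, a positive number if beta
// is first, and 0 if the two values are equivalent.
function compareObjects(alpha, beta) {
  // Steps 1 and 2: strings come first and are compared lexicographically.
  const alphaIsString = typeof alpha === 'string';
  const betaIsString = typeof beta === 'string';
  if (alphaIsString !== betaIsString) {
    return alphaIsString ? -1 : 1;
  }
  if (alphaIsString) {
    return compareStrings(alpha, beta);
  }

  // Steps 3 and 4: literals come next.
  const alphaIsLiteral = hasKey(alpha, '@literal');
  const betaIsLiteral = hasKey(beta, '@literal');
  if (alphaIsLiteral !== betaIsLiteral) {
    return alphaIsLiteral ? -1 : 1;
  }
  if (alphaIsLiteral) {
    const keys = ['@literal', '@datatype', '@language'];
    for (let i = 0; i < keys.length; i++) {
      // Assumption: a missing key compares as an empty string.
      const result = compareStrings(alpha[keys[i]] || '', beta[keys[i]] || '');
      if (result !== 0) {
        return result;
      }
    }
    return 0;
  }

  // Step 5: expanded IRIs are compared by their @iri value.
  if (hasKey(alpha, '@iri') && hasKey(beta, '@iri')) {
    return compareStrings(alpha['@iri'], beta['@iri']);
  }

  // Step 6: otherwise the values are equivalent.
  return 0;
}

const values = [
  { '@iri': 'http://example.com/b' },
  { '@literal': '5', '@datatype': 'http://www.w3.org/2001/XMLSchema#integer' },
  'plain',
  { '@iri': 'http://example.com/a' }
];
values.sort(compareObjects);
// values is now: "plain", the typed literal, then the IRIs ending in a and b
```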

6.11.8 Deep Comparison Algorithm

The deep comparison algorithm is used to compare the difference between two nodes, alpha and beta. A deep comparison takes the incoming and outgoing node edges in a graph into account if the number of properties and value of those properties are identical. The algorithm is helpful when sorting a list of nodes and will return whichever node should be placed first in a list if the two nodes are not truly equivalent.

When performing the steps required by the deep comparison algorithm, it is helpful to track state information about mappings. The information contained in a mapping state is described below.

mapping state
mapping counter
Keeps track of the number of nodes that have been mapped to serialization labels. It is initialized to 1.
processed labels map
Keeps track of the labels of nodes that have already been assigned serialization labels. It is initialized to an empty map.
serialized labels map
Maps a node label to its associated serialization label. It is initialized to an empty map.
adjacent info map
Maps a serialization label to the node label associated with it, the list of sorted serialization labels for adjacent nodes, and the map of adjacent node serialization labels to their associated node labels. It is initialized to an empty map.
key stack
A stack where each element contains an array of adjacent serialization labels and an index into that array. It is initialized to a stack containing a single element where its array contains a single string element s1 and its index is set to 0.
serialized keys
Keeps track of which serialization labels have already been written at least once to the serialization string. It is initialized to an empty map.
serialization string
A string that is incrementally updated as a serialization is built. It is initialized to an empty string.

The deep comparison algorithm is as follows:

  1. Perform a comparison between alpha and beta according to the Shallow Comparison Algorithm. If the result does not show that the two nodes are equivalent, return the result.
  2. Compare incoming and outgoing edges for each node, updating their associated node state as each node is processed:
    1. If the outgoing serialization map for alpha is empty, generate the serialization according to the Node Serialization Algorithm. Provide alpha's node state, a new mapping state, and outgoing direction to the algorithm as inputs.
    2. If the outgoing serialization map for beta is empty, generate the serialization according to the Node Serialization Algorithm. Provide beta's node state, a new mapping state, and outgoing direction to the algorithm as inputs.
    3. If alpha's outgoing serialization is lexicographically less than beta's, then alpha is first. If it is greater, then beta is first.
    4. If the incoming serialization map for alpha is empty, generate the serialization according to the Node Serialization Algorithm. Provide alpha's node state, a new mapping state with its serialized labels map set to a copy of alpha's outgoing serialization map, and incoming direction to the algorithm as inputs.
    5. If the incoming serialization map for beta is empty, generate the serialization according to the Node Serialization Algorithm. Provide beta's node state, a new mapping state with its serialized labels map set to a copy of beta's outgoing serialization map, and incoming direction to the algorithm as inputs.
    6. If alpha's incoming serialization is lexicographically less than beta's, then alpha is first. If it is greater, then beta is first.

6.11.9 Node Serialization Algorithm

The node serialization algorithm takes a node state, a mapping state, and a direction (either outgoing direction or incoming direction) as inputs and generates a deterministic serialization for the node reference.

  1. If the label exists in the processed labels map, terminate the algorithm as the serialization label has already been created.
  2. Set the value associated with the label in the processed labels map to true.
  3. Generate the next serialization label for the label according to the Serialization Label Generation Algorithm.
  4. Create an empty map called the adjacent serialized labels map that will store mappings from serialized labels to adjacent node labels.
  5. Create an empty array called the adjacent unserialized labels list that will store labels of adjacent nodes that haven't been assigned serialization labels yet.
  6. For every label in a list, where the list is the outgoing list if the direction is outgoing direction and the incoming list otherwise, if the label starts with _:, it is the target node label:
    1. Look up the target node label in the processed labels map and if a mapping exists, update the adjacent serialized labels map where the key is the value in the serialization map and the value is the target node label.
    2. Otherwise, add the target node label to the adjacent unserialized labels list.
  7. Set the maximum serialization combinations to 1 or the length of the adjacent unserialized labels list, whichever is greater.
  8. While the maximum serialization combinations is greater than 0, perform the Combinatorial Serialization Algorithm passing the node state, the mapping state for the first iteration and a copy of it for each subsequent iteration, the generated serialization label, the direction, the adjacent serialized labels map, and the adjacent unserialized labels list. Decrement the maximum serialization combinations by 1 for each iteration.

6.11.10 Serialization Label Generation Algorithm

The algorithm generates a serialization label given a label and a mapping state and returns the serialization label.

  1. If the label is already in the serialized labels map, return its associated value.
  2. If the label starts with the string _:c14n, the serialization label is the letter c followed by the number that follows _:c14n in the label.
  3. Otherwise, the serialization label is the letter s followed by the string value of the mapping counter. Increment the mapping counter by 1.
  4. Create a new key-value pair in the serialized labels map where the key is the label and the value is the generated serialization label.
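
A minimal JavaScript sketch of these steps follows. The function name generateSerializationLabel and the property names mappingCounter and serializedLabelsMap are illustrative assumptions; the mapping state is assumed to be a plain object holding the mapping counter and the serialized labels map described in the Deep Comparison Algorithm.

```javascript
function generateSerializationLabel(label, mappingState) {
  const map = mappingState.serializedLabelsMap;

  // Step 1: reuse a serialization label that was generated earlier.
  if (Object.prototype.hasOwnProperty.call(map, label)) {
    return map[label];
  }

  let serializationLabel;
  if (label.indexOf('_:c14n') === 0) {
    // Step 2: "c" followed by the number that follows _:c14n.
    serializationLabel = 'c' + label.substring('_:c14n'.length);
  } else {
    // Step 3: "s" followed by the mapping counter, which is then incremented.
    serializationLabel = 's' + String(mappingState.mappingCounter);
    mappingState.mappingCounter += 1;
  }

  // Step 4: remember the result.
  map[label] = serializationLabel;
  return serializationLabel;
}

const mappingState = { mappingCounter: 1, serializedLabelsMap: {} };
const first = generateSerializationLabel('_:c14n7', mappingState);   // "c7"
const second = generateSerializationLabel('_:j8r3k2', mappingState); // "s1"
const again = generateSerializationLabel('_:j8r3k2', mappingState);  // "s1"
const third = generateSerializationLabel('_:j8r3k5', mappingState);  // "s2"
```

Note that only labels without the _:c14n prefix consume a value of the mapping counter.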

6.11.11 Combinatorial Serialization Algorithm

The combinatorial serialization algorithm takes a node state, a mapping state, a serialization label, a direction, an adjacent serialized labels map, and an adjacent unserialized labels list as inputs and generates the lexicographically least serialization of nodes relating to the node reference.

  1. If the adjacent unserialized labels list is not empty:
    1. Copy the adjacent serialized labels map to the adjacent serialized labels map copy.
    2. Remove the first unserialized label from the adjacent unserialized labels list and create a new serialization label according to the Serialization Label Generation Algorithm.
    3. Create a new key-value mapping in the adjacent serialized labels map copy where the key is the new serialization label and the value is the unserialized label.
    4. Set the maximum serialization rotations to 1 or the length of the adjacent unserialized labels list, whichever is greater.
    5. While the maximum serialization rotations is greater than 0:
      1. Recursively perform the Combinatorial Serialization Algorithm passing the mapping state for the first iteration of the loop, and a copy of it for each subsequent iteration.
      2. Rotate the elements in the adjacent unserialized labels list by shifting each of them once to the right, moving the element at the end of the list to the beginning of the list.
      3. Decrement the maximum serialization rotations by 1 for each iteration.
  2. If the adjacent unserialized labels list is empty:
    1. Create a list of keys from the keys in the adjacent serialized labels map and sort it lexicographically.
    2. Add a key-value pair to the adjacent info map where the key is the serialization label and the value is an object containing the node reference's label, the list of keys and the adjacent serialized labels map.
    3. Update the serialization string according to the Mapping Serialization Algorithm.
    4. If the direction is outgoing direction then directed serialization refers to the outgoing serialization and the directed serialization map refers to the outgoing serialization map, otherwise it refers to the incoming serialization and the directed serialization map refers to the incoming serialization map. Compare the serialization string to the directed serialization according to the Serialization Comparison Algorithm. If the serialization string is less than or equal to the directed serialization:
      1. For each value in the list of keys, run the Node Serialization Algorithm.
      2. Update the serialization string according to the Mapping Serialization Algorithm.
      3. Compare the serialization string to the directed serialization again and if it is less than or equal and the length of the serialization string is greater than or equal to the length of the directed serialization, then set the directed serialization to the serialization string and set the directed serialization map to the serialized labels map.
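
The rotation used in step 1.5.2 is what lets each unserialized label take a turn at the front of the list. A small JavaScript sketch of that single operation is shown below; the function name rotateRight and the sample labels are hypothetical, and the sketch does not implement the rest of the algorithm.

```javascript
// Shifts every element one position to the right and moves the last
// element to the beginning of the list. The list is modified in place.
function rotateRight(list) {
  if (list.length > 1) {
    list.unshift(list.pop());
  }
  return list;
}

// Rotating a list of n labels n times visits n different orderings and
// leaves the list in its original order.
const labels = ['_:a', '_:b', '_:c'];
const orders = [];
for (let i = 0; i < labels.length; i++) {
  orders.push(labels.slice());
  rotateRight(labels);
}
// orders: [_:a, _:b, _:c], [_:c, _:a, _:b], [_:b, _:c, _:a]
```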

6.11.12 Serialization Comparison Algorithm

The serialization comparison algorithm takes two serializations, alpha and beta, and returns either which of the two is less than the other or that they are equal.

  1. Whichever serialization is an empty string is greater. If they are both empty strings, they are equal.
  2. Return the result of a lexicographical comparison of alpha and beta up to the number of characters in the shorter of the two serializations.
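
The two steps above can be sketched in JavaScript as follows. The function name compareSerializations is hypothetical; the sketch returns -1 if alpha is less, 1 if beta is less, and 0 if the two are equal under this comparison.

```javascript
function compareSerializations(alpha, beta) {
  // Step 1: an empty serialization is greater than a non-empty one.
  if (alpha === '' || beta === '') {
    if (alpha === beta) {
      return 0;
    }
    return alpha === '' ? 1 : -1;
  }

  // Step 2: compare only as many characters as the shorter string has.
  const length = Math.min(alpha.length, beta.length);
  const a = alpha.substring(0, length);
  const b = beta.substring(0, length);
  return a < b ? -1 : (a > b ? 1 : 0);
}

const emptyIsGreater = compareSerializations('', 's1');   // 1
const prefixIsEqual = compareSerializations('s1[a]', 's1'); // 0
const lesserFirst = compareSerializations('s1', 's2');    // -1
```

Because only the common prefix is compared, a partially built serialization compares as equal to a longer one that starts with the same characters.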

6.11.13 Mapping Serialization Algorithm

The mapping serialization algorithm incrementally updates the serialization string in a mapping state.

  1. If the key stack is not empty:
    1. Pop the serialization key info off of the key stack.
    2. For each serialization key in the serialization key info array, starting at the serialization key index from the serialization key info:
      1. If the serialization key is not in the adjacent info map, push the serialization key info onto the key stack and exit from this loop.
      2. If the serialization key is a key in serialized keys, a cycle has been detected. Append the concatenation of the _ character and the serialization key to the serialization string.
      3. Otherwise, serialize all outgoing and incoming edges in the related node by performing the following steps:
        1. Mark the serialization key as having been processed by adding a new key-value pair to serialized keys where the key is the serialization key and the value is true.
        2. Set the serialization fragment to the value of the serialization key.
        3. Set the adjacent info to the value of the serialization key in the adjacent info map.
        4. Set the adjacent node label to the node label from the adjacent info.
        5. If a mapping for the adjacent node label exists in the map of all labels:
          1. Append the result of the Label Serialization Algorithm to the serialization fragment.
        6. Append all of the keys in the adjacent info to the serialization fragment.
        7. Append the serialization fragment to the serialization string.
        8. Push a new key info object containing the keys from the adjacent info and an index of 0 onto the key stack.
        9. Recursively update the serialization string according to the Mapping Serialization Algorithm.

6.11.14 Label Serialization Algorithm

The label serialization algorithm serializes information about a node that has been assigned a particular serialization label.

  1. Initialize the label serialization to an empty string.
  2. Append the [ character to the label serialization.
  3. Append all properties to the label serialization by processing each key-value pair in the node reference, excluding the @subject property. The keys should be processed in lexicographical order and their associated values should be processed in the order produced by the Object Comparison Algorithm:
    1. Build a string using the pattern <KEY> where KEY is the current key. Append the string to the label serialization.
    2. The value may be a single object or an array of objects. Process all of the objects that are associated with the key, building an object string for each item:
      1. If the object contains an @iri key with a value that starts with _:, set the object string to the value _:. If the value does not start with _:, build the object string using the pattern <IRI> where IRI is the value associated with the @iri key.
      2. If the object contains a @literal key and a @datatype key, build the object string using the pattern "LITERAL"^^<DATATYPE> where LITERAL is the value associated with the @literal key and DATATYPE is the value associated with the @datatype key.
      3. If the object contains a @literal key and a @language key, build the object string using the pattern "LITERAL"@LANGUAGE where LITERAL is the value associated with the @literal key and LANGUAGE is the value associated with the @language key.
      4. Otherwise, the value is a string. Build the object string using the pattern "LITERAL" where LITERAL is the value associated with the current key.
      5. If this is the second iteration of the loop, append a | separator character to the label serialization.
      6. Append the object string to the label serialization.
  4. Append the ] character to the label serialization.
  5. Append the [ character to the label serialization.
  6. Append all incoming references for the current label to the label serialization by processing all of the items associated with the incoming list:
    1. Build a reference string using the pattern <PROPERTY><REFERER> where PROPERTY is the property associated with the incoming reference and REFERER is either the subject of the node referring to the label in the incoming reference or _: if REFERER begins with _:.
    2. If this is the second iteration of the loop, append a | separator character to the label serialization.
    3. Append the reference string to the label serialization.
  7. Append the ] character to the label serialization.
  8. Append all adjacent node labels to the label serialization by concatenating the string value for all of them, one after the other, to the label serialization.
  9. Push the adjacent node labels onto the key stack and append the result of the Mapping Serialization Algorithm to the label serialization.
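
Steps 1 through 4 of the algorithm above, which serialize the properties of a node, can be sketched in JavaScript as shown below. This is a partial illustration under several assumptions: the function names buildObjectString and serializeProperties are hypothetical, the values of each property are assumed to already be in the order produced by the Object Comparison Algorithm, and the | separator is assumed to be written before every value after the first one for a key.

```javascript
// Builds the object string for one property value (steps 3.2.1 to 3.2.4).
function buildObjectString(value) {
  const isObject = value !== null && typeof value === 'object';
  if (isObject && '@iri' in value) {
    // Labels that start with _: are reduced to "_:" so that temporary
    // names do not influence the result.
    return value['@iri'].indexOf('_:') === 0 ? '_:' : '<' + value['@iri'] + '>';
  }
  if (isObject && '@literal' in value && '@datatype' in value) {
    return '"' + value['@literal'] + '"^^<' + value['@datatype'] + '>';
  }
  if (isObject && '@literal' in value && '@language' in value) {
    return '"' + value['@literal'] + '"@' + value['@language'];
  }
  // Otherwise the value is treated as a plain string.
  return '"' + String(value) + '"';
}

// Serializes all properties of a node except @subject (steps 1 to 4).
function serializeProperties(node) {
  let result = '[';
  Object.keys(node).sort().forEach(function (key) {
    if (key === '@subject') {
      return;
    }
    result += '<' + key + '>';
    const values = Array.isArray(node[key]) ? node[key] : [node[key]];
    values.forEach(function (value, index) {
      if (index > 0) {
        result += '|';
      }
      result += buildObjectString(value);
    });
  });
  return result + ']';
}

const node = {
  '@subject': '_:j8r3k1',
  'http://example.com/name': { '@literal': 'Alice', '@language': 'en' },
  'http://example.com/knows': [
    { '@iri': '_:j8r3k2' },
    { '@iri': 'http://example.com/bob' }
  ]
};
const serialized = serializeProperties(node);
// [<http://example.com/knows>_:|<http://example.com/bob><http://example.com/name>"Alice"@en]
```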

6.12 Data Round Tripping

When normalizing xsd:double values, implementers must ensure that the normalized value is a string. In order to generate the string from a double value, output equivalent to the printf("%1.6e", value) function in C must be used where "%1.6e" is the string formatter and value is the value to be converted.

To convert a double value in JavaScript, implementers can use the following snippet of code:

// the variable 'value' below is the JavaScript native double value that is to be converted
(value).toExponential(6).replace(/(e(?:\+|-))([0-9])$/, '$10$2')

When data needs to be normalized, JSON-LD authors should not use values that are going to undergo automatic conversion. This is due to the lossy nature of xsd:double values.
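
The following sketch wraps the conversion in a function and shows why the result is lossy. The function name formatDouble is hypothetical. It uses a replacement callback instead of a replacement string, which is intended to be equivalent to the snippet above: a single-digit exponent is padded to two digits, as the C printf function does.

```javascript
// Formats a JavaScript number like printf("%1.6e", value) in C.
function formatDouble(value) {
  return value.toExponential(6).replace(/(e[+-])([0-9])$/, function (match, sign, digit) {
    return sign + '0' + digit;
  });
}

const sum = 0.1 + 0.2;              // 0.30000000000000004 in IEEE 754 arithmetic
const text = formatDouble(sum);     // "3.000000e-01"
const restored = parseFloat(text);  // 0.3, which is not equal to sum

const padded = formatDouble(1.1);   // "1.100000e+00"
const untouched = formatDouble(1e21); // "1.000000e+21"
```

Only six digits after the decimal point survive the conversion, so a value that needs more precision does not round-trip through its normalized string form.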

Some JSON serializers, such as PHP's native implementation, backslash-escape the forward slash character. For example, the value http://example.com/ would be serialized as http:\/\/example.com\/ in some versions of PHP. This is problematic when generating a byte stream for processes such as normalization. There is no need to backslash-escape forward slashes in JSON-LD. To aid interoperability between JSON-LD processors, a JSON-LD serializer must not backslash-escape forward slashes.
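
The effect can be seen with the JSON functions built into JavaScript. In this short sketch the variable names are illustrative; the first string stands for JSON text as an escaping serializer would emit it.

```javascript
// JSON text with backslash-escaped forward slashes: "http:\/\/example.com\/"
const escaped = '"http:\\/\\/example.com\\/"';

// JSON.stringify does not escape forward slashes: "http://example.com/"
const unescaped = JSON.stringify('http://example.com/');

// Both texts decode to the same value...
const sameValue = JSON.parse(escaped) === JSON.parse(unescaped); // true

// ...but they are different character sequences, so any process that
// works on the serialized bytes would treat them as different.
const sameText = escaped === unescaped; // false

// Parsing and re-serializing yields the unescaped form.
const reserialized = JSON.stringify(JSON.parse(escaped));
```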

Round-tripping data can be problematic if we mix and match @coerce rules with JSON-native datatypes, like integers. Consider the following code example:

var myObj = { "@context" : {
                "number" : "http://example.com/vocab#number",
                "@coerce": {
                   "xsd:nonNegativeInteger": "number"
                }
              },
              "number" : 42 };

// Map the language-native object to JSON-LD
var jsonldText = jsonld.normalize(myObj);

// Convert the normalized object back to a JavaScript object
var myObj2 = jsonld.parse(jsonldText);

At this point, myObj2 and myObj will have different values for the "number" key. In myObj the value is the number 42, while in myObj2 it is the string "42". This type of data round-tripping error can bite developers. We are currently wondering if having a "coerce validation" phase in the parsing/normalization phases would be a good idea. It would prevent data round-tripping issues like the one mentioned above.

6.13 RDF Conversion

A JSON-LD document may be converted to any other RDF-compatible document format using the algorithm specified in this section.

The JSON-LD Processing Model describes processing rules for extracting RDF from a JSON-LD document. Note that many uses of JSON-LD may not require generation of RDF.

The processing algorithm described in this section is provided in order to demonstrate how one might implement a JSON-LD to RDF processor. Conformant implementations are only required to produce the same type and number of triples during the output process and are not required to implement the algorithm exactly as described.

The RDF Conversion Algorithm is a work in progress.

6.13.1 Overview

This section is non-normative.

JSON-LD is intended to have an easy-to-parse grammar that closely models existing practice in using JSON for describing object representations. This allows existing JSON libraries to parse a document in a document-oriented fashion, and also allows stream-based parsing similar to SAX.

As with other grammars used for describing Linked Data, a key concept is that of a resource. Resources may be of three basic types: IRIs, for describing externally named entities, BNodes, resources for which an external name does not exist, or is not known, and Literals, which describe terminal entities such as strings, dates and other representations having a lexical representation possibly including an explicit language or datatype.

Data described with JSON-LD may be considered to be the representation of a graph made up of subject and object resources related via a property resource. However, specific implementations may choose to operate on the document as a normal JSON description of objects having attributes.

6.13.2 RDF Conversion Algorithm Terms

default graph
the destination graph for all triples generated by JSON-LD markup.

6.13.3 RDF Conversion Algorithm

The algorithm below is designed for in-memory implementations with random access to JSON object elements.

A conforming JSON-LD processor implementing RDF conversion must implement a processing algorithm that results in the same default graph that the following algorithm generates:

  1. Create a new processor state with the active context set to the initial context and active subject and active property initialized to NULL.
  2. If a JSON object is detected, perform the following steps:
    1. If the JSON object has a @context key, process the local context as described in Context.
    2. Create a new JSON object by mapping the keys from the current JSON object using the active context to new keys using the associated value from the current JSON object. Repeat the mapping until no entry is found within the active context for the key. Use the new JSON object in subsequent steps.
    3. If the JSON object has an @iri key, set the active object by performing IRI Expansion on the associated value. Generate a triple representing the active subject, the active property and the active object. Return the active object to the calling location.

      @iri really just behaves the same as @subject, consider consolidating them.

    4. If the JSON object has a @literal key, set the active object to a literal value as follows:
      1. as a typed literal if the JSON object contains a @datatype key after performing IRI Expansion on the specified @datatype.
      2. otherwise, as a plain literal. If the JSON object contains a @language key, use its value to set the language of the plain literal.
      3. Generate a triple representing the active subject, the active property and the active object. Return the active object to the calling location.
    5. If the JSON object has a @subject key:
      1. If the value is a string, set the active object to the result of performing IRI Expansion. Generate a triple representing the active subject, the active property and the active object. Set the active subject to the active object.
      2. Create a new processor state using copies of the active context, active subject and active property and process the value starting at Step 2, set the active subject to the result and proceed using the previous processor state.
    6. If the JSON object does not have a @subject key, set the active object to a newly generated blank node identifier. Generate a triple representing the active subject, the active property and the active object. Set the active subject to the active object.
    7. For each key in the JSON object that has not already been processed, perform the following steps:
      1. If the key is @type, set the active property to rdf:type.
      2. Otherwise, set the active property to the result of performing IRI Expansion on the key.
      3. Create a new processor state using copies of the active context, active subject and active property and process the value starting at Step 2 and proceed using the previous processor state.
    8. Return the active object to the calling location.
  3. If a regular array is detected, process each value in the array by doing the following, returning the result of processing the last value in the array:
    1. Create a new processor state using copies of the active context, active subject and active property and process the value starting at Step 2 then proceed using the previous processor state.
  4. If a string is detected:
    1. If the active property is the target of a @iri coercion, set the active object by performing IRI Expansion on the string.
    2. Otherwise, if the active property is the target of coercion, set the active object by creating a typed literal using the string and the coercion key as the datatype IRI.
    3. Otherwise, set the active object to a plain literal value created from the string.
    Generate a triple representing the active subject, the active property and the active object.
  5. If a number is detected, generate a typed literal using a string representation of the value with datatype set to either xsd:integer or xsd:double, depending on whether the value contains a fractional and/or an exponential component. Generate a triple using the active subject, active property and the generated typed literal.
  6. Otherwise, if true or false is detected, generate a triple using the active subject, active property and a typed literal value created from the string representation of the value with datatype set to xsd:boolean.
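
Steps 4 through 6 can be sketched as follows. This is a minimal, non-normative Python illustration rather than the reference algorithm: the function name scalar_to_object, the returned tuple layout, and the coerce_type parameter (a stand-in for whatever coercion the active context declares for the active property) are assumptions made for this example, and IRI Expansion is omitted.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def scalar_to_object(value, coerce_type=None):
    """Map a parsed JSON scalar to a (kind, lexical form, datatype) tuple.

    coerce_type is hypothetical: "@iri", a datatype IRI, or None when the
    active property is not the target of any coercion.
    """
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return ("literal", "true" if value else "false", XSD + "boolean")
    if isinstance(value, int):
        # No fractional or exponential component: xsd:integer.
        return ("literal", str(value), XSD + "integer")
    if isinstance(value, float):
        # Fractional and/or exponential component: xsd:double.
        return ("literal", repr(value), XSD + "double")
    if isinstance(value, str):
        if coerce_type == "@iri":
            # Step 4.1 (a real processor would perform IRI Expansion here).
            return ("iri", value, None)
        if coerce_type is not None:
            # Step 4.2: typed literal using the coercion key as datatype.
            return ("literal", value, coerce_type)
        # Step 4.3: plain literal.
        return ("literal", value, None)
    raise TypeError("not a JSON scalar handled by steps 4 to 6")


print(scalar_to_object(42))
print(scalar_to_object("http://tantek.com/", "@iri"))
```

Python's json module parses a number without a fractional or exponential part as int and any other number as float, so the isinstance checks line up with the rule in step 5. The lexical form produced for doubles here is Python's own representation and is not guaranteed to be the canonical xsd:double form.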

A. Experimental Concepts

There are a few advanced concepts where it is not clear whether or not the JSON-LD specification is going to support the complexity necessary to support each concept. This entire section should be considered as discussion points; it is merely a list of possibilities where all of the benefits and drawbacks have not been explored.

A.1 Disjoint Graphs

When serializing an RDF graph that contains two or more sections of the graph which are entirely disjoint, one must use an array to express the graph as two graphs. This may not be acceptable to some authors, who would rather express the information as one graph. Since, by definition, disjoint graphs require there to be two top-level objects, JSON-LD utilizes a mechanism that allows disjoint graphs to be expressed using a single graph.

Assume the following RDF graph:

<http://example.org/people#john>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/people#jane>
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person> .

Since the two subjects are entirely disjoint with one another, it is impossible to express the RDF graph above using a single JSON object.

In JSON-LD, one can use the subject to express disjoint graphs as a single graph:

{
  "@context": {
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@subject":
  [
    {
      "@subject": "http://example.org/people#john",
      "@type": "Person"
    },
    {
      "@subject": "http://example.org/people#jane",
      "@type": "Person"
    }
  ]
}

A disjoint graph could also be expressed like so:

[
  {
    "@subject": "http://example.org/people#john",
    "@type": "http://xmlns.com/foaf/0.1/Person"
  },
  {
    "@subject": "http://example.org/people#jane",
    "@type": "http://xmlns.com/foaf/0.1/Person"
  }
]

Warning: Using this serialisation format it is impossible to include @context given that the document's data structure is an array and not an object.
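
The relationship between the two serialisations can be sketched in Python. This is only an illustration: the helper name wrap_disjoint is hypothetical and not part of any JSON-LD API. It wraps an array of top-level subjects in a single object under @subject, which also gives the document a place to carry an @context.

```python
import json


def wrap_disjoint(subjects, context=None):
    """Wrap a list of top-level subject objects in a single JSON object."""
    document = {}
    if context is not None:
        document["@context"] = context
    document["@subject"] = list(subjects)
    return document


people = [
    {"@subject": "http://example.org/people#john", "@type": "Person"},
    {"@subject": "http://example.org/people#jane", "@type": "Person"},
]
document = wrap_disjoint(people, {"Person": "http://xmlns.com/foaf/0.1/Person"})
print(json.dumps(document, indent=2))
```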

A.2 Lists

Because graphs do not describe ordering for links between nodes, in contrast to plain JSON, multi-valued properties in JSON-LD do not provide an ordering of the listed objects. For example, consider the following simple document:

{
...
  "@subject": "http://example.org/people#joebob",
  "nick": ["joe", "bob", "jaybee"],
...
}

This results in three triples being generated, each relating the subject to an individual object, with no inherent order.
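
As a rough Python illustration of that behaviour (the function name triples_for_values is hypothetical, and the key "nick" is left unexpanded because the example elides its context), the triples can be collected in a set, which carries no ordering:

```python
def triples_for_values(subject, prop, values):
    """Return an unordered set of (subject, property, object) triples."""
    return {(subject, prop, value) for value in values}


joebob = "http://example.org/people#joebob"
triples = triples_for_values(joebob, "nick", ["joe", "bob", "jaybee"])

# The same graph results regardless of the order of the array.
same = triples_for_values(joebob, "nick", ["jaybee", "joe", "bob"])
print(len(triples), triples == same)
```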

To preserve the order of the objects, RDF-based languages such as [TURTLE] use the concept of an rdf:List (as described in [RDF-SCHEMA]). This uses a sequence of unlabeled nodes, each with one property describing a value and a next property pointing to the rest of the list; the final next property refers to a nil terminator. Without specific syntactical support, this could be represented in JSON-LD as follows:

{
...
  "@subject": "http://example.org/people#joebob",
  "nick": {
    "@first": "joe",
    "@rest": {
      "@first": "bob",
      "@rest": {
        "@first": "jaybee",
        "@rest": "@nil"
      }
    }
  },
...
}

As this notation is rather unwieldy and the notion of ordered collections is rather important in data modeling, it is useful to have specific language support. In JSON-LD, a list may be represented using the @list keyword as follows:

{
...
  "@subject": "http://example.org/people#joebob",
  "foaf:nick": {"@list": ["joe", "bob", "jaybee"]},
...
}

This describes the use of this array as being ordered, and order is maintained through normalization and RDF conversion. If every use of a given multi-valued property is a list, this may be abbreviated by adding an @coerce term:

{
  "@context": {
    ...
    "@coerce": {
      "@list": ["foaf:nick"]
    }
  },
...
  "@subject": "http://example.org/people#joebob",
  "foaf:nick": ["joe", "bob", "jaybee"],
...
}

There is an ongoing discussion about this issue. One of the proposed solutions is allowing to change the default behaviour so that arrays are considered as ordered lists by default.

A.2.1 Expansion

TBD.

A.2.2 Normalization

TBD.

A.2.3 RDF Conversion

To support RDF Conversion of lists, the RDF Conversion Algorithm is updated as follows:

  1. 2.4a. If the JSON object has a @list key and the value is an array process the value as a list starting at Step 3a.
  2. 2.7.3. Create a new processor state using copies of the active context, active subject and active property.
    1. If the active property is the target of a @list coercion, and the value is an array, process the value as a list starting at Step 3a.
    2. Otherwise, process the value starting at Step 2.
    3. Proceed using the previous processor state.
  3. 3a. Generate an RDF List by linking each element of the list using rdf:first and rdf:rest, terminating the list with rdf:nil using the following sequence:
    1. If the list has no element, generate a triple using the active subject, active property and rdf:nil.
    2. Otherwise, generate a triple using the active subject, active property and a newly generated BNode identified as first blank node identifier.
    3. For each element in the list:
      1. Create a processor state using the active context, first blank node identifier as the active subject, and rdf:first as the active property, and process the element starting at Step 2.
      2. Unless this is the last element in the list, generate a new BNode identified as rest blank node identifier, otherwise use rdf:nil.
      3. Generate a new triple using first blank node identifier, rdf:rest and rest blank node identifier.
      4. Set first blank node identifier to rest blank node identifier.
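
Step 3a can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name list_to_triples and the _:bN blank node labels are hypothetical, list elements are emitted as-is instead of being processed recursively from Step 2, and triples are plain tuples.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(subject, prop, items):
    """Link the items with rdf:first / rdf:rest, terminated by rdf:nil."""
    bnode_numbers = count(1)

    def new_bnode():
        return "_:b%d" % next(bnode_numbers)

    # Step 3a.1: an empty list is just rdf:nil.
    if not items:
        return [(subject, prop, RDF + "nil")]

    # Step 3a.2: link the subject to the head of the list.
    first = new_bnode()
    triples = [(subject, prop, first)]

    # Step 3a.3: one rdf:first and one rdf:rest triple per element.
    for index, item in enumerate(items):
        triples.append((first, RDF + "first", item))
        is_last = index == len(items) - 1
        rest = RDF + "nil" if is_last else new_bnode()
        triples.append((first, RDF + "rest", rest))
        first = rest
    return triples


for triple in list_to_triples(
    "http://example.org/people#joebob",
    "http://xmlns.com/foaf/0.1/nick",
    ["joe", "bob", "jaybee"],
):
    print(triple)
```

A list of n elements yields 2n + 1 triples: one linking the subject to the head node, plus an rdf:first and an rdf:rest triple for each element.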

B. Markup Examples

The JSON-LD markup examples below demonstrate how JSON-LD can be used to express semantic data marked up in other languages such as RDFa, Microformats, and Microdata. These sections are merely provided as proof that JSON-LD is very flexible in what it can express across different Linked Data approaches.

B.1 RDFa

The following example describes three people with their respective names and homepages.

<div prefix="foaf: http://xmlns.com/foaf/0.1/">
   <ul>
      <li typeof="foaf:Person">
        <a rel="foaf:homepage" href="http://example.com/bob/" property="foaf:name" >Bob</a>
      </li>
      <li typeof="foaf:Person">
        <a rel="foaf:homepage" href="http://example.com/eve/" property="foaf:name" >Eve</a>
      </li>
      <li typeof="foaf:Person">
        <a rel="foaf:homepage" href="http://example.com/manu/" property="foaf:name" >Manu</a>
      </li>
   </ul>
</div>

An example JSON-LD implementation is described below; however, there are other ways to mark up this information such that the context is not repeated.

{
  "@context": { "foaf": "http://xmlns.com/foaf/0.1/"},
  "@subject": [
   {
     "@subject": "_:bnode1",
     "@type": "foaf:Person",
     "foaf:homepage": "http://example.com/bob/",
     "foaf:name": "Bob"
   },
   {
     "@subject": "_:bnode2",
     "@type": "foaf:Person",
     "foaf:homepage": "http://example.com/eve/",
     "foaf:name": "Eve"
   },
   {
     "@subject": "_:bnode3",
     "@type": "foaf:Person",
     "foaf:homepage": "http://example.com/manu/",
     "foaf:name": "Manu"
   }
  ]
}

B.2 Microformats

The following example uses a simple Microformats hCard example to express how the Microformat is represented in JSON-LD.

<div class="vcard">
 <a class="url fn" href="http://tantek.com/">Tantek Çelik</a>
</div>

The representation of the hCard expresses the Microformat terms in the context and uses them directly for the url and fn properties. Also note that the Microformat to JSON-LD processor has generated the proper URL type for http://tantek.com.

{
  "@context":
  {
    "vcard": "http://microformats.org/profile/hcard#vcard",
    "url": "http://microformats.org/profile/hcard#url",
    "fn": "http://microformats.org/profile/hcard#fn",
    "@coerce": { "@iri": "url" }
  },
  "@subject": "_:bnode1",
  "@type": "vcard",
  "url": "http://tantek.com/",
  "fn": "Tantek Çelik"
}

B.3 Microdata

The Microdata example below expresses book information as a Microdata Work item.

<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://purl.oreilly.com/works/45U8QJGZSQKDH8N">
 <dt>Title</dt>
 <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
 <dt>By</dt>
 <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
 <dt>Format</dt>
 <dd itemprop="http://purl.org/vocab/frbr/core#realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://purl.oreilly.com/products/9780596007683.BOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/BOOK">
  Print
 </dd>
 <dd itemprop="http://purl.org/vocab/frbr/core#realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://purl.oreilly.com/products/9780596802189.EBOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/EBOOK">
  Ebook
 </dd>
</dl>

Note that the JSON-LD representation of the Microdata information stays true to the desires of the Microdata community to avoid contexts and instead refer to items by their full IRI.

[
  {
    "@subject": "http://purl.oreilly.com/works/45U8QJGZSQKDH8N",
    "@type": "http://purl.org/vocab/frbr/core#Work",
    "http://purl.org/dc/terms/title": "Just a Geek",
    "http://purl.org/dc/terms/creator": "Wil Wheaton",
    "http://purl.org/vocab/frbr/core#realization":
      ["http://purl.oreilly.com/products/9780596007683.BOOK", "http://purl.oreilly.com/products/9780596802189.EBOOK"]
  },
  {
    "@subject": "http://purl.oreilly.com/products/9780596007683.BOOK",
    "@type": "http://purl.org/vocab/frbr/core#Expression",
    "http://purl.org/dc/terms/type": "http://purl.oreilly.com/product-types/BOOK"
  },
  {
    "@subject": "http://purl.oreilly.com/products/9780596802189.EBOOK",
    "@type": "http://purl.org/vocab/frbr/core#Expression",
    "http://purl.org/dc/terms/type": "http://purl.oreilly.com/product-types/EBOOK"
  }
]

C. Mashing Up Vocabularies

Developers would also benefit by allowing other vocabularies to be used automatically with their JSON API. There are over 200 Web Vocabulary Documents that are available for use on the Web today. Some of these vocabularies are:

You can use these vocabularies in combination, like so:

{
  "@type": "foaf:Person",
  "foaf:name": "Manu Sporny",
  "foaf:homepage": "http://manu.sporny.org/",
  "sioc:avatar": "http://twitter.com/account/profile_image/manusporny"
}

Developers can also specify their own Vocabulary documents by modifying the active context in-line using the @context keyword, like so:

{
  "@context": { "myvocab": "http://example.org/myvocab#" },
  "@type": "foaf:Person",
  "foaf:name": "Manu Sporny",
  "foaf:homepage": "http://manu.sporny.org/",
  "sioc:avatar": "http://twitter.com/account/profile_image/manusporny",
  "myvocab:personality": "friendly"
}

The @context keyword is used to change how the JSON-LD processor evaluates key-value pairs. In this case, it was used to map one string ('myvocab') to another string, which is interpreted as an IRI. In the example above, the myvocab string is replaced with "http://example.org/myvocab#" when it is detected, so "myvocab:personality" would expand to "http://example.org/myvocab#personality".

This mechanism is a short-hand, called a Web Vocabulary prefix, and provides developers an unambiguous way to map any JSON value to RDF.
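
The prefix replacement described above can be sketched in Python. This is a simplified illustration and not the IRI Expansion algorithm defined elsewhere in this specification: the function name expand_term is hypothetical, and the foaf entry is added to the context here only so that the example is self-contained.

```python
def expand_term(term, context):
    """Expand 'prefix:suffix' using the context; leave other values alone."""
    prefix, separator, suffix = term.partition(":")
    if separator and prefix in context:
        # Known prefix: replace it with the mapped IRI.
        return context[prefix] + suffix
    # Bare term mapped in the context, or a value that is left untouched.
    return context.get(term, term)


context = {
    "myvocab": "http://example.org/myvocab#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
print(expand_term("myvocab:personality", context))
print(expand_term("foaf:name", context))
```

Values whose text before the first colon is not a known prefix, such as "http://manu.sporny.org/", pass through unchanged in this sketch.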

D. IANA Considerations

This section is included merely for standards community review and will be submitted to the Internet Engineering Steering Group if this specification becomes a W3C Recommendation.

Type name:
application
Subtype name:
ld+json
Required parameters:
None
Optional parameters:
form
Determines the serialization form for the JSON-LD document. Valid values include: compacted, expanded, framed, and normalized. Other values are allowed, but must be prefixed with an x- string until they are clearly defined by a stable specification. If no form is specified in an HTTP request header to a responding application, such as a Web server, the application may choose any form. If no form is specified for a receiving application, the document must not be assumed to take any particular form.
It is currently being discussed to remove form=framed from this specification as there are several issues with it.
Encoding considerations:
The same as the application/json MIME media type.
Security considerations:
Since JSON-LD is intended to be a pure data exchange format for directed graphs, the serialization should not be passed through a code execution mechanism such as JavaScript's eval() function. It is recommended that a conforming parser does not attempt to directly evaluate the JSON-LD serialization and instead purely parse the input into a language-native data structure.
Interoperability considerations:
Not Applicable
Published specification:
The JSON-LD specification.
Applications that use this media type:
Any programming environment that requires the exchange of directed graphs. Implementations of JSON-LD have been created for JavaScript, Python, Ruby, PHP and C++.
Additional information:
Magic number(s):
Not Applicable
File extension(s):
.jsonld
Macintosh file type code(s):
TEXT
Person & email address to contact for further information:
Manu Sporny
Intended usage:
Common
Restrictions on usage:
None
Author(s):
Manu Sporny, Gregg Kellogg, Dave Longley
Change controller:
W3C

E. Acknowledgements

The editors would like to thank Mark Birbeck, who provided a great deal of the initial push behind the JSON-LD work via his work on RDFj, Dave Longley, Dave Lehn and Mike Johnson who reviewed, provided feedback, and performed several implementations of the specification, and Ian Davis, who created RDF/JSON. Thanks also to Nathan Rixham, Bradley P. Allen, Kingsley Idehen, Glenn McDonald, Alexandre Passant, Danny Ayers, Ted Thibodeau Jr., Olivier Grisel, Niklas Lindström, Markus Lanthaler, and Richard Cyganiak for their input on the specification. Another huge thank you goes out to Dave Longley who designed many of the algorithms used in this specification, including the normalization algorithm which was a monumentally difficult design challenge.

F. References

F.1 Normative references

[BCP47]
A. Phillips; M. Davis. Tags for Identifying Languages. September 2009. IETF Best Current Practice. URL: http://tools.ietf.org/rfc/bcp/bcp47.txt
[RDF-CONCEPTS]
Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt
[RFC3987]
M. Dürst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[RFC4627]
D. Crockford. The application/json Media Type for JavaScript Object Notation (JSON). July 2006. Internet RFC 4627. URL: http://www.ietf.org/rfc/rfc4627.txt
[WEBIDL]
Cameron McCormack. Web IDL. 19 December 2008. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2008/WD-WebIDL-20081219

F.2 Informative references

[ECMA-262]
ECMAScript Language Specification, Third Edition. December 1999. URL: http://www.ecma-international.org/publications/standards/Ecma-262.htm
[MICRODATA]
Ian Hickson; et al. Microdata. 04 March 2010. W3C Working Draft. URL: http://www.w3.org/TR/microdata/
[MICROFORMATS]
Microformats. URL: http://microformats.org
[RDF-SCHEMA]
Dan Brickley; Ramanathan V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210
[RDFA-CORE]
Shane McCarron; et al. RDFa Core 1.1: Syntax and processing rules for embedding RDF through attributes. 31 March 2011. W3C Working Draft. URL: http://www.w3.org/TR/2011/WD-rdfa-core-20110331
[TURTLE]
David Beckett, Tim Berners-Lee. Turtle: Terse RDF Triple Language. January 2008. W3C Team Submission. URL: http://www.w3.org/TeamSubmission/turtle/