Conflict-Free Replicated Data Types (CRDT)
NextGraph supports several CRDTs natively, and is open to integrating more of them in the future.
For now, we offer:
- Graph CRDT: the Semantic Web / Linked Data / RDF format, made available as a local-first CRDT model thanks to an OR-set (Observed-Remove Set) logic formalized in the SU-Set (SPARQL Update set) paper. This allows the programmer to link data across documents, globally and privately. Every NextGraph document features a Graph, which connects it with other documents and represents its data following any ontology/vocabulary of the RDF world. Links in RDF are known as predicates; they establish relationships between Documents while qualifying those relationships. Learn more below about RDF and how it allows each individual key/value to point to another Document/Resource, similar to foreign keys in the SQL world. With the SPARQL language you can then traverse the Graph and navigate between Documents.
- Automerge CRDT: a versatile and compact format that offers sequence and set operations in an integrated way, so that all types can be combined in one document. It is based on the formal research of Martin Kleppmann, Geoffrey Litt et al. and the Ink & Switch lab, implemented by Alex Good, and follows the RGA algorithm (Replicated Growable Array). Their work brought the formalization of Byzantine Eventual Consistency, upon which NextGraph builds its own design. Automerge offers a rich text API (Peritext), but we do not expose it in NextGraph for now, preferring that of Yjs for all rich text purposes.
- Yjs CRDT: a well-known and widely used library based on the YATA model. It offers an efficient and robust format for sequences and sets. On top of those primitives, the library offers 4 formats: XML (used by all rich-text applications), plain text, maps (like JSON objects), and arrays (aka lists). We use the XML format for both the “Rich text” and “Markdown” documents, and offer the 4 formats as separate classes of discrete documents for the programmer to use. We all thank Kevin Jahns for bringing this foundational work to the CRDT world. His paper (et al.) is available here.
In the future, we might integrate more CRDTs into NextGraph, especially the promising Loro or json-joy, or older CRDTs like Diamond Types.
If you are not familiar with CRDTs, there are some nice introductions you can read about that subject here and here.
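To make the OR-set idea behind the Graph CRDT more concrete, here is a minimal sketch of a generic Observed-Remove Set holding triples. It is not NextGraph's implementation nor the exact SU-Set construction; the class name `ORSet` and its methods are assumptions made for illustration only.

```python
import uuid


class ORSet:
    """Generic Observed-Remove Set (illustrative sketch, hypothetical API)."""

    def __init__(self):
        self.adds = {}        # element -> set of unique tags, one per add
        self.removed = set()  # tags that have been removed (tombstones)

    def add(self, element):
        # Every add gets a fresh tag, so a later add is never cancelled
        # by a remove that did not observe it.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the tags this replica has observed so far.
        self.removed |= self.adds.get(element, set())

    def merge(self, other):
        # Merging is a union of adds and a union of tombstones.
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def elements(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}


# Two replicas: b removes a triple while a concurrently adds it again.
a, b = ORSet(), ORSet()
triple = ("Bob", "owns", "Bicycle")
a.add(triple)
b.merge(a)
b.remove(triple)
a.add(triple)
a.merge(b)
b.merge(a)
```

After both merges the two replicas hold the same set, and the triple is still present because the concurrent add was not observed by the remove.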
Semantic Web, Linked Data, RDF
If the Semantic Web and RDF are still unknown to you, we have selected some good introductory materials here.
The essential thing to understand about RDF is that it encodes data in the form of triples, each composed of 3 elements.
The 3 elements are called: Subject -> Predicate -> Object. That’s one triple. The semantic database is just a set of triples.
The subject represents the Resource we are establishing facts about. The Predicate indicates the “key” or “property” we want to specify about the Resource. And the Object represents the “value”.
Hence, if we want to say that “Bob owns a bicycle”, we write it this way: Bob -> Owns -> Bicycle. We can also say that Bob -> color_of_eyes -> Blue, and so on.
In addition, a value (the Object part of the triple) can also be a reference to another Resource. This lets us link Resources (Subjects) together.
If we have a triple saying Alice -> lives_in -> Wonderland, then we can also say that Bob -> is_friend_with -> Alice. Here we have linked the 2 Resources together, and the Predicate is_friend_with is not just a property: it is in fact a relationship. If Alice also considers Bob a friend, then we could state the inverse relationship Alice -> is_friend_with -> Bob.
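The examples above can be written down as a plain set of 3-tuples. The following is only a sketch of the data model, not the NextGraph API or SPARQL; the helper `objects` is a hypothetical name introduced for illustration.

```python
# The semantic database is just a set of (subject, predicate, object) triples.
triples = {
    ("Bob", "Owns", "Bicycle"),
    ("Bob", "color_of_eyes", "Blue"),
    ("Alice", "lives_in", "Wonderland"),
    ("Bob", "is_friend_with", "Alice"),
    ("Alice", "is_friend_with", "Bob"),
}


def objects(subject, predicate):
    """Return every value stored for (subject, predicate)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Follow a link: where do Bob's friends live?
homes = {
    home
    for friend in objects("Bob", "is_friend_with")
    for home in objects(friend, "lives_in")
}
```

Because the Object of the is_friend_with triple is itself a Subject of other triples, the query can hop from Bob to Alice and then read Alice's own properties.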
The Predicates of the RDF world thus correspond to the keys and properties of the JS/JSON world. Values in the JS world are equivalent to Objects in RDF, although in JS you cannot encode a link to another (possibly remote) Resource as a value.
Finally, the Subject of a resource is its unique identifier. NextGraph assigns a unique ID to each Document, and that’s the Subject of the triples it contains. As an analogy, the same could be done in JSON by giving a unique name to a JSON file, or to a JS variable (POJO) holding some map of key/values.
3 parts of a Document
A Document in NextGraph is composed of 3 parts:
- the Graph part (some RDF triples). This part is mandatory and always available in all documents, even if left empty
- the Discrete part (a Yjs or Automerge based document). This part is optional
- some optional binary files attached to the document.
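As a rough mental model of the three parts listed above, a Document could be pictured as the following record. This is a sketch only: the class and field names (`Document`, `graph`, `discrete`, `files`) and the `doc:` identifiers are hypothetical and do not reflect NextGraph's actual types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    doc_id: str
    # Graph part: always present, possibly empty.
    graph: set = field(default_factory=set)
    # Discrete part: optional; a plain dict stands in here for a
    # Yjs or Automerge document.
    discrete: Optional[dict] = None
    # Optional binary files attached to the document.
    files: list = field(default_factory=list)


# A document with only the (empty) Graph part.
empty = Document("doc:empty")

# A document using all three parts.
note = Document(
    "doc:note",
    graph={("doc:note", "title", "Hello")},
    discrete={"body": "some text"},
    files=[b"\x00\x01"],
)
```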
Unified data model
Graph • Automerge • Yjs
3 CRDT models in 1
NextGraph combines those 3 CRDT models in order to benefit from the advantages of each one. As you will see below, they are complementary, and the developer of an app using our framework can decide which model to use depending on the data requirements of the app. We tightly integrated the 3 models in order to normalize the APIs and to offer an interoperable, unified and feature-rich data layer to the developer.
As you will see in the Sync Protocol chapter below, the programmer decides which CRDT model to use, not at the Document level, but at the Branch level. As Documents can be composed of many branches (that act like “blocks”), you are not restricted to one model or another. Your application can use the 3 CRDT models combined inside one Document, hence benefiting from all the features offered by each model. Even more, you can divide your data into several sections and use a different block for each one, specifying which Discrete model to use block by block. Remember that in any case the Graph model is included by default in every block. So the question to ask yourself is which of the Automerge or Yjs models you want to use for the Discrete part of the block. In the end, all the data blocks of your document are available as one to your application, and can be queried, mutated, and read seamlessly. More on that below in the section about the APIs.
Now let’s have a look at what those CRDTs have in common and how they differ. We have marked with 🔥 the features that are unique to each model and that we find very cool.
Feature | Graph (RDF) | Yjs | Automerge | Notes
---|---|---|---|---
key/value | ✅ | ✅ | ✅ | Also called property/value; in RDF it is called predicate/object. This is the basic feature that the 3 models offer: you have a Document, and you can add key/value pairs to it. Also known as a Map or Object.
property names | ✅ 🔥 predicate | string | string | Thanks to the Ontology/Schema mechanism of RDF (OWL), the schema information is embedded inside the data (with what we call a Predicate), thus avoiding any need for schema migration. It also brings full data interoperability, as many common ontologies already exist and can be reused, and new ones can be published or shared.
nested | ✅ blank nodes | ✅ | ✅ | Key/value pairs can be nested (like in JSON).
sequence | ❌ * | ✅ | ✅ | As in JSON or JavaScript, some keys can hold Arrays (aka lists), which preserve the ordering of the elements. (*) In RDF, storing arrays is trickier: Collections can encode order, but they are neither CRDT-based nor concurrency-safe.
sets | 🔺 multiset | ✅ | ✅ | RDF predicates (the equivalent of properties or keys in discrete documents) are not unique: they can have multiple values. That is the main difference between the discrete and graph models. We will offer the option to enforce rules on RDF data (with SHACL/ShEx) that could force uniqueness of keys, but that would require the use of Synchronous Transactions. In JS/JSON, sets are usually represented with a map whose values are unused (set to null or false): because keys are unique, the keys represent the set. In RDF, keys are not unique, but a set can be represented by choosing a single key (a predicate) whose many values form the set, since a key/value pair (aka a predicate/object pair) is unique.
unique key | ❌ | ✅ | ✅ | Related to the sets row above.
conflict resolution | ✅ | Lamport clock ? | higher actor ID | Because RDF has no unique keys, it cannot conflict.
CRDT strings in property values | ❌ | ❌ | ✅ 🔥 | Allows concurrent edits on the value of a property that is a string. This feature is only offered by Automerge. Very useful for collaborative forms or tables/spreadsheets, for example!
multi-lingual strings | ✅ 🔥 | ❌ | ❌ | Store the value of a string property in several languages / translations.
Counter CRDT | ❌ | ❌ | ✅ 🔥 | Counters are a safe way to manage integers in a concurrent system. Automerge is the only one offering counters. Please note that CRDT types in general are only eventually consistent (BASE model). If you need stronger guarantees like the ones provided by ACID systems (especially guaranteeing the sequencing of operations, which is very useful for preventing double-spending), then you have to use a Synchronous Transaction in NextGraph.
link/ref values (foreign key) | ✅ 🔥 | ❌ * | ❌ * | (*) Discrete data cannot link to external documents. This is the reason why all Documents in NextGraph have a Graph part: to enable inter-linking of data and documents across the Giant Global Graph of Linked Data / Semantic Web.
Float values | ✅ | 🟧 | ✅ | Yjs doesn’t enforce strong typing on values. They can be any valid JSON (and Floats are just Numbers).
Date values | ✅ | ❌ | ✅ | JSON doesn’t support the JS Date datatype. So, for the same reason as above, Yjs doesn’t support Dates.
Binary buffer values | ✅ * | ✅ | ✅ | (*) As base64 or hex encoded. Please note that for all purposes of storing binary data, you should use the binary files facility of each Document instead, which is much more efficient.
boolean, integer values | ✅ | ✅ | ✅ |
NULL values | ❌ | ✅ | ✅ |
strongly typed decimal values | ✅ | ❌ | ❌ | Signed, unsigned, and different sizes of integers.
revisions, diff, revert | 🟧 | 🟧 | 🟧 | 🔥 Implemented at the NextGraph level; work in progress. You will be able to see the diffs and access all the historical data of the Document, and also revert to previous versions.
compact | ✅ | ✅ | ❓ | Compacting is always available as a feature at the NextGraph level (and will compact Graph and Discrete parts alike). Yjs tends to garbage collect deleted content; not sure if Automerge does it. Compact will remove all the historical data and deleted content: you won’t be able to see diffs nor revert for all the causal past that happened before the compact operation, but normal CRDT behaviour can resume afterwards. This is a synchronous operation.
snapshot | ✅ | ✅ | ✅ | Take a snapshot of the data at given HEADs, and store it in a non-CRDT way so it can be opened quickly. Also removes all the historical and deleted data. But a snapshot cannot be used to continue collaborating on the document. See it as something similar to “export as a static file”.
isolated transactions | ✅ | ✅ | ✅ | A NextGraph transaction can atomically mutate both the Graph and the Discrete data in a single isolated transaction. This can be useful to enforce consistency and keep information stored in the discrete and graph parts of the same document in sync. But transactions cannot span multiple documents (for that matter, see smart contracts). When a SPARQL Update spans several Documents, the Transaction is split into several ones (one for each target Document) and each one is applied separately, meaning not atomically. Also, keep in mind, as explained above in the Counter CRDT row, that CRDTs are eventually consistent. If you need ACID guarantees, use a synchronous transaction instead.
Svelte5 reactive Store (Runes) | 🟧 | 🟧 | 🟧 | 🔥 This is planned and will be available shortly. The store will be writable and will allow a bidirectional binding of the data to some JavaScript reactive variables in Svelte (the same could be done for React). We are also considering the use of Valtio for a generic reactive store that would also work on Node.js and Deno.
queries across documents | ✅ 🔥 SPARQL | 🟧 * | 🟧 * | (*) Support is planned at the NextGraph level, to be able to query discrete data too in SPARQL. (GraphQL support could then be added.)
export/import JSON | ✅ JSON-LD | ✅ | ✅ |
Rich Text | N/A | attributes on XMLElement | Marks and Block Markers and here | Yjs integration for ProseMirror and Milkdown is quite stable. Peritext is newer and only offers ProseMirror integration. For this reason we use Yjs for Rich Text. Performance considerations should be evaluated too.
Referencing rich text from outside | N/A | ✅ 🔥 Relative Position | ✅ get_cursor | Useful for anchoring comments, quotes and annotations (as you shouldn’t have to modify a document in order to add a comment or annotation to it).
shared cursor | N/A | 🟧 * | 🟧 * | (*) Available in lib but not yet integrated in NextGraph.
undo/redo | N/A | 🟧 * | 🟧 * |
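The difference between non-unique predicates in the Graph model and unique keys in the discrete models can be sketched as follows. This is a deliberately simplified illustration, not the internals of NextGraph, Yjs or Automerge: it only handles two concurrent writes, the `doc:bob` identifier and actor names are made up, and the tie-break by higher actor ID simply mirrors what the table lists for Automerge.

```python
# Graph model: predicates are not unique, so merging is a set union
# and two concurrent values for the same predicate are both kept.
replica_a = {("doc:bob", "color_of_eyes", "Blue")}
replica_b = {("doc:bob", "color_of_eyes", "Green")}
graph_merged = replica_a | replica_b


# Discrete model: keys are unique, so two concurrent writes to the same
# key must be resolved. Each value is stored with the actor that wrote it,
# and the write from the higher actor ID wins on every replica.
def merge_map(a, b):
    merged = dict(a)
    for key, (actor, value) in b.items():
        if key not in merged or actor > merged[key][0]:
            merged[key] = (actor, value)
    return merged


map_a = {"color_of_eyes": ("actor-1", "Blue")}
map_b = {"color_of_eyes": ("actor-2", "Green")}
map_merged = merge_map(map_a, map_b)
```

In the graph case nothing is lost and no rule is needed; in the map case the rule is deterministic, so both replicas converge on the same single value whichever way the merge is applied.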
Keep on reading about how to handle the schema of your data, and what the Semantic Web is all about.