AWS DBS Reference Architectures - Graph Databases

Data Models and Query Languages

A graph data model connects items or values using elements variously called edges, links or relationships. Many application domains can be modelled as graphs: social, follower and business relationship networks, IT and physical network infrastructures, organizational structures, entitlements and access control networks, logistics and delivery networks, supply chains, etc.

Neptune supports two different graph data models: the property graph data model and the Resource Description Framework (RDF). Each data model has its own query language for creating and querying graph data. For a property graph, you create and query data using Apache TinkerPop Gremlin, an open source query language supported by several other graph databases. For an RDF graph, you create and query data using SPARQL, a graph pattern matching language standardized by the W3C.

Property Graph and Gremlin

Vertices and Edges

The property graph data model represents graph data as vertices and edges (sometimes called nodes and relationships). You typically use vertices to represent entities in your domain, and edges to represent the relationships between these entities. Every edge must have a name, or label, and a direction – that is, a start vertex and an end vertex. Neptune’s property graph model doesn’t allow dangling edges: every edge must be attached to a vertex at both ends.

Properties

You can attach one or more properties to each of the vertices and edges in your graph. Typically, you use vertex properties to represent the attributes of entities in your domain, and edge properties to represent the strength, weight or quality of a relationship. You can also use properties to represent metadata – timestamps, access control lists, etc.

IDs

Every vertex and every edge in the graph must have a unique ID. Because every edge has its own identity, you can create multiple edges connecting the same pair of vertices.

Some graph databases allow you to assign your own IDs to vertices and edges. Others automatically create IDs for you. Neptune allows you to supply your own IDs when creating vertices and edges: if you don’t assign your own ID to an element, Neptune will create a string-based UUID for you. All vertex IDs must be unique, and all edge IDs must be unique. However, Neptune does allow a vertex and an edge to have the same ID.
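The ID rules above can be illustrated with a toy in-memory model. This is only a sketch in plain Python: the `TinyPropertyGraph` class and the edge IDs `e-1` and `e-2` are hypothetical names invented for illustration, and nothing here reflects how Neptune actually stores or validates data.

```python
import uuid


class TinyPropertyGraph:
    """Toy in-memory model of the ID rules above; not how Neptune stores data."""

    def __init__(self):
        self.vertices = {}  # vertex ID -> properties
        self.edges = {}     # edge ID -> (start vertex ID, label, end vertex ID)

    def add_vertex(self, vertex_id=None, **properties):
        # Use the caller's ID if one is given, otherwise generate a string UUID.
        vertex_id = str(uuid.uuid4()) if vertex_id is None else vertex_id
        if vertex_id in self.vertices:
            raise ValueError(f"vertex ID already in use: {vertex_id}")
        self.vertices[vertex_id] = properties
        return vertex_id

    def add_edge(self, start_id, label, end_id, edge_id=None):
        # No dangling edges: both endpoints must already exist.
        if start_id not in self.vertices or end_id not in self.vertices:
            raise ValueError("both ends of an edge must be existing vertices")
        edge_id = str(uuid.uuid4()) if edge_id is None else edge_id
        if edge_id in self.edges:
            raise ValueError(f"edge ID already in use: {edge_id}")
        self.edges[edge_id] = (start_id, label, end_id)
        return edge_id


graph = TinyPropertyGraph()
graph.add_vertex("p-1", firstName="Bob")
graph.add_vertex("p-2", firstName="Alice")
generated_id = graph.add_vertex()  # no ID supplied, so a UUID string is generated

# Two distinct edges connecting the same pair of vertices.
graph.add_edge("p-1", "FOLLOWS", "p-2", edge_id="e-1")
graph.add_edge("p-1", "FOLLOWS", "p-2", edge_id="e-2")

# A vertex and an edge may share an ID because the two ID spaces are separate.
graph.add_edge("p-2", "FOLLOWS", "p-1", edge_id="p-1")
```

Keeping vertices and edges in separate dictionaries is what lets the last call succeed: uniqueness is enforced per element type, not across the whole graph.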

Labels

As well as adding properties to the elements in your graph, you can also attach labels to both the vertices and edges. Edge labels are mandatory: you must attach exactly one label to each edge in your graph. An edge’s label expresses the semantics of the relationship represented by the edge. Vertex labels are optional: you can attach zero, one or many labels to each vertex in your graph. Vertex labels allow you to tag, type and group vertices.
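As a rough illustration of these label rules, the following Python sketch models a vertex with a set of labels (possibly empty) and an edge with a single mandatory label. The `Vertex` and `Edge` classes, the `vertices_with_label` helper and the `Admin` label are all hypothetical; they are not part of Neptune or Gremlin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    id: str
    labels: frozenset = frozenset()  # zero, one or many labels


@dataclass(frozen=True)
class Edge:
    id: str
    label: str  # exactly one label, and it is mandatory
    start_id: str
    end_id: str

    def __post_init__(self):
        if not self.label:
            raise ValueError("an edge must have a label")


def vertices_with_label(vertices, label):
    """Group vertices by tag: return those that carry the given label."""
    return [v for v in vertices if label in v.labels]


vertices = [
    Vertex("p-1", frozenset({"User"})),
    Vertex("p-2", frozenset({"User", "Admin"})),  # 'Admin' is a made-up second label
    Vertex("p-3"),                                # a vertex with no labels at all
]
follows = Edge("e-1", "FOLLOWS", "p-1", "p-2")

users = vertices_with_label(vertices, "User")
```

Here `users` contains the vertices `p-1` and `p-2`, while `p-3` is still a valid vertex despite having no label.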

Example

In the following diagram we see three vertices. Each vertex is labelled User, and has an id, and firstName and lastName properties. The vertices are connected by edges labelled FOLLOWS.

[Figure: Property Graph]

To query a property graph in Neptune you use the Gremlin query language. The following Gremlin query finds the names of the users whom Bob follows:

g.V('p-1').out('FOLLOWS').valueMap('firstName', 'lastName')
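To make the steps of this traversal concrete, here is a plain-Python sketch that mimics them over a small in-memory dataset: start at vertex `p-1`, follow outgoing `FOLLOWS` edges, then project two properties. The last names, the third user and the choice of who follows whom are placeholder assumptions, since the diagram's data is not reproduced here. The helper functions `out` and `value_map` are hypothetical stand-ins, not Gremlin APIs.

```python
# Hypothetical sample data; only the IDs p-1 (Bob) and p-2 (Alice) come from the text.
vertices = {
    "p-1": {"firstName": "Bob", "lastName": "Jones"},
    "p-2": {"firstName": "Alice", "lastName": "Smith"},
    "p-3": {"firstName": "Carol", "lastName": "Lee"},
}
edges = [
    ("p-1", "FOLLOWS", "p-2"),
    ("p-1", "FOLLOWS", "p-3"),
    ("p-2", "FOLLOWS", "p-1"),
]


def out(start_id, label):
    """Follow outgoing edges with the given label, like out('FOLLOWS')."""
    return [end for start, edge_label, end in edges
            if start == start_id and edge_label == label]


def value_map(vertex_id, *keys):
    """Project selected properties; values are wrapped in lists, as valueMap does by default."""
    properties = vertices[vertex_id]
    return {key: [properties[key]] for key in keys if key in properties}


followed = [value_map(v, "firstName", "lastName") for v in out("p-1", "FOLLOWS")]
```

With this sample data, `followed` holds one map each for Alice and Carol. The edge from `p-2` back to `p-1` is ignored because the traversal only follows edges leaving `p-1`.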


RDF Graph and SPARQL

RDF encodes resource descriptions in the form of subject-predicate-object triples. In contrast to the property graph model, which ‘chunks’ data into record-like vertices and edges with attached properties, RDF creates a more fine-grained representation of your domain.

The following diagram shows the same information as the property graph above, but this time encoded as RDF.

[Figure: RDF]

Subjects and predicates in RDF are always URIs. Object values can be either URIs or literals. In the example shown above, the triple contacts:p-2 contacts:firstName “Alice” comprises a URI subject and predicate, and a string literal object. Relationships between resources use URI-based object values.
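The shape of a triple described above can be sketched in a few lines of Python. This is a simplified illustration only: the `URI` and `Literal` classes and the `is_valid_triple` check are hypothetical, the namespace is borrowed from the example SPARQL query in this section, and the last name is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


NS = "http://www.example.com/social#"  # assumed namespace for this sketch

# One record-like vertex in the property graph becomes several fine-grained triples.
triples = [
    (URI(NS + "p-2"), URI(NS + "firstName"), Literal("Alice")),
    (URI(NS + "p-2"), URI(NS + "lastName"), Literal("Smith")),  # placeholder last name
    # A relationship between two resources uses a URI as the object value.
    (URI(NS + "p-1"), URI(NS + "follows"), URI(NS + "p-2")),
]


def is_valid_triple(triple):
    """Subject and predicate must be URIs; the object may be a URI or a literal."""
    subject, predicate, obj = triple
    return (isinstance(subject, URI)
            and isinstance(predicate, URI)
            and isinstance(obj, (URI, Literal)))
```

Note that the two properties of `p-2` are separate statements here, which is the finer granularity the section refers to.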

To query an RDF graph you use SPARQL. The following SPARQL query finds the names of the users whom Bob follows:

PREFIX s: <http://www.example.com/social#>

SELECT ?firstName ?lastName WHERE {
    s:p-1 s:follows ?p .
    ?p s:firstName ?firstName .
    ?p s:lastName ?lastName
}
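The three triple patterns in this query share the variable `?p`, so they act as a join. The following Python sketch shows that idea with a naive pattern matcher over an in-memory list of triples. It is not how a SPARQL engine is implemented, the `match` function is hypothetical, and the sample data (last names, the third user, and who follows whom) is assumed for illustration.

```python
# Hypothetical sample triples, written with the s: prefix as plain strings.
triples = [
    ("s:p-1", "s:firstName", "Bob"),
    ("s:p-1", "s:lastName", "Jones"),
    ("s:p-1", "s:follows", "s:p-2"),
    ("s:p-1", "s:follows", "s:p-3"),
    ("s:p-2", "s:firstName", "Alice"),
    ("s:p-2", "s:lastName", "Smith"),
    ("s:p-2", "s:follows", "s:p-1"),
    ("s:p-3", "s:firstName", "Carol"),
    ("s:p-3", "s:lastName", "Lee"),
]


def match(patterns, data, binding=None):
    """Yield every variable binding that satisfies all patterns (terms starting with '?')."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, data, candidate)


query = [
    ("s:p-1", "s:follows", "?p"),
    ("?p", "s:firstName", "?firstName"),
    ("?p", "s:lastName", "?lastName"),
]
rows = [(b["?firstName"], b["?lastName"]) for b in match(query, triples)]
```

Each result row corresponds to one value of `?p` for which all three patterns hold, which is why Bob's own name never appears in `rows`.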

Choosing a Data Model and Query Language for Your Workload

Both graph data models and query languages – property graph and Gremlin, RDF and SPARQL – can be used to implement the majority of graph database workloads. Application developers and those coming from a relational database background often find the property graph model easier to work with, whereas those familiar with Semantic Web technologies may prefer RDF, but there are no hard-and-fast rules.

In choosing a model and query language, bear in mind the following points: