Unraveling Connections: Exploring Graph-Like Data Models


When many-to-many relationships dominate your data, a graph-like data model is often the most natural fit. Graphs suit use cases such as social networks, recommendation engines, and road networks, where the connections between entities matter as much as the entities themselves. This post covers the fundamentals of graph models, their main implementations, and example queries in popular graph query languages.


Understanding the Basics of Graph Data Models

A graph consists of:
Vertices (Nodes): Represent entities (people, locations, web pages).
Edges (Relationships): Represent how entities are connected.

Examples of Graph Use Cases:

  1. Social Networks:
    Vertices = People; Edges = Friendships.

  2. Web Graphs:
    Vertices = Web Pages; Edges = Hyperlinks.

  3. Road Networks:
    Vertices = Intersections; Edges = Roads or Rails.
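
To make the vertex/edge idea concrete, here is a minimal Python sketch of the social-network case as an adjacency list. The names and the friends_of_friends helper are hypothetical, chosen only for illustration; a real graph database would store and index this very differently.

```python
from collections import defaultdict

# Hypothetical friendship data; the names are illustrative only.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

def build_adjacency(pairs):
    """Build an undirected adjacency list: vertex -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)  # friendship is mutual, so store both directions
    return adjacency

def friends_of_friends(adjacency, person):
    """People exactly two hops away who are not already direct friends."""
    direct = adjacency[person]
    two_hops = set()
    for friend in direct:
        two_hops |= adjacency[friend]
    return two_hops - direct - {person}

graph = build_adjacency(friendships)
print(sorted(graph["bob"]))                        # bob's direct friends
print(sorted(friends_of_friends(graph, "alice")))  # alice's friends of friends
```

The same structure works for web graphs or road networks; only the meaning of a vertex and an edge changes.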


The Property Graph Model

Widely implemented in graph databases like Neo4j, Titan, and InfiniteGraph, the property graph model assigns attributes to both vertices and edges.

Features of a Property Graph:

  • Vertices:
    • Unique identifiers.
    • Set of outgoing and incoming edges.
    • A collection of properties stored as key-value pairs.
  • Edges:
    • Unique identifiers.
    • Tail vertex (where the edge starts) and head vertex (where the edge ends).
    • A label describing the relationship.
    • A collection of properties.
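
The feature list above maps fairly directly onto two small record types. The following is a rough in-memory sketch, not how any particular graph database implements it; the class names, field names, and the connect helper are assumptions made for this example.

```python
from dataclasses import dataclass, field

# eq=False keeps comparison identity-based, which avoids recursing through
# the vertex -> edge -> vertex references when two objects are compared.
@dataclass(eq=False)
class Vertex:
    vertex_id: int
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # edges whose tail is this vertex
    incoming: list = field(default_factory=list)  # edges whose head is this vertex

@dataclass(eq=False)
class Edge:
    edge_id: int
    tail: Vertex  # vertex where the edge starts
    head: Vertex  # vertex where the edge ends
    label: str
    properties: dict = field(default_factory=dict)

def connect(edge_id, tail, head, label, **properties):
    """Create an edge and register it on both of its endpoints."""
    edge = Edge(edge_id, tail, head, label, properties)
    tail.outgoing.append(edge)
    head.incoming.append(edge)
    return edge

lucy = Vertex(1, {"name": "Lucy"})
idaho = Vertex(2, {"name": "Idaho", "type": "state"})
born = connect(10, lucy, idaho, "BORN_IN")

print([e.label for e in lucy.outgoing])
print(born.head.properties["name"])
```

Keeping both outgoing and incoming edge lists on each vertex is what lets a traversal move in either direction along an edge.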

SQL Schema Representation:

CREATE TABLE vertices (  
    vertex_id   integer PRIMARY KEY,  
    properties  json  
);  
   
CREATE TABLE edges (  
    edge_id     integer PRIMARY KEY,  
    tail_vertex integer REFERENCES vertices (vertex_id),  
    head_vertex integer REFERENCES vertices (vertex_id),  
    label       text,  
    properties  json  
);  
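
As a quick way to try this schema, the sketch below loads it into an in-memory SQLite database from Python and follows the edges leaving one vertex. It assumes SQLite, where the json column simply holds JSON text here; the sample rows and the outgoing helper are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (
    vertex_id   integer PRIMARY KEY,
    properties  json
);
CREATE TABLE edges (
    edge_id     integer PRIMARY KEY,
    tail_vertex integer REFERENCES vertices (vertex_id),
    head_vertex integer REFERENCES vertices (vertex_id),
    label       text,
    properties  json
);
""")

# Hypothetical sample data mirroring the Lucy/Idaho example.
conn.executemany("INSERT INTO vertices VALUES (?, ?)", [
    (1, json.dumps({"name": "Lucy"})),
    (2, json.dumps({"name": "Idaho", "type": "state"})),
    (3, json.dumps({"name": "United States", "type": "country"})),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "BORN_IN", json.dumps({})),
    (2, 2, 3, "WITHIN", json.dumps({})),
])

def outgoing(conn, vertex_id):
    """Return (label, head vertex properties) for each edge leaving vertex_id."""
    rows = conn.execute(
        """SELECT e.label, v.properties
           FROM edges AS e
           JOIN vertices AS v ON v.vertex_id = e.head_vertex
           WHERE e.tail_vertex = ?
           ORDER BY e.edge_id""",
        (vertex_id,),
    )
    return [(label, json.loads(props)) for label, props in rows]

print(outgoing(conn, 1))
```

Each hop costs one join on the edges table, which is why indexes on tail_vertex and head_vertex are usually worth adding for anything beyond toy data.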

Querying Graphs: Cypher Language

Cypher, the primary query language for Neo4j, provides a declarative syntax for traversing graphs.

Example: Insert Vertices and Edges

CREATE  
  (NAmerica:Location {name:'North America', type:'continent'}),  
  (USA:Location      {name:'United States', type:'country'}),  
  (Idaho:Location    {name:'Idaho', type:'state'}),  
  (Lucy:Person       {name:'Lucy'}),  
  (Idaho) -[:WITHIN]->  (USA)  -[:WITHIN]-> (NAmerica),  
  (Lucy)  -[:BORN_IN]-> (Idaho)  

Representation:

    North America
          ↑ WITHIN
    United States
          ↑ WITHIN
        Idaho
          ↑ BORN_IN
        Lucy

Example: Complex Query in Cypher

Find all people born in the US and living in Europe:

MATCH  
  (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),  
  (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})  
RETURN person.name  

Explanation:

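  1. Find every vertex with a BORN_IN edge to a location that either is the United States or reaches it through a chain of WITHIN edges ([:WITHIN*0..] means "follow zero or more WITHIN edges").
  2. Require that the same vertex also has a LIVES_IN edge to a location that is, or lies within, Europe.
  3. Return the name property of each matching person.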
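
The steps above can be sketched as a plain traversal in Python. This is only an illustration of what the query asks for, not how Neo4j executes it; the Europe, United Kingdom, London, and Alain entries are hypothetical additions so the query has something to match and something to reject.

```python
# Edges as (tail, label, head); vertex names stand in for vertex IDs.
EDGES = [
    ("Idaho", "WITHIN", "United States"),
    ("United States", "WITHIN", "North America"),
    ("London", "WITHIN", "United Kingdom"),
    ("United Kingdom", "WITHIN", "Europe"),
    ("Lucy", "BORN_IN", "Idaho"),
    ("Lucy", "LIVES_IN", "London"),
    ("Alain", "BORN_IN", "London"),
    ("Alain", "LIVES_IN", "London"),
]

def follow(vertex, label):
    """Heads of all edges with the given label that start at vertex."""
    return [head for tail, lbl, head in EDGES if tail == vertex and lbl == label]

def within_closure(start):
    """start plus everything reachable via zero or more WITHIN edges."""
    seen, stack = {start}, [start]
    while stack:
        for parent in follow(stack.pop(), "WITHIN"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def reaches(person, label, target):
    """True if person has a `label` edge to target or to a place within it."""
    return any(target in within_closure(loc) for loc in follow(person, label))

def emigrated(born_in, lives_in):
    people = {tail for tail, lbl, _ in EDGES if lbl == "BORN_IN"}
    return sorted(p for p in people
                  if reaches(p, "BORN_IN", born_in)
                  and reaches(p, "LIVES_IN", lives_in))

print(emigrated("United States", "Europe"))
```

Because the closure includes the starting vertex itself, someone recorded as born directly in the United States would match too, which is exactly what the zero in *0.. allows.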

Triple-Stores and SPARQL

The triple-store model, common in RDF databases, stores all information as three-part statements of the form (subject, predicate, object).
Example triple: (Jim, likes, bananas)

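In SPARQL, you can query triples declaratively. The property path :within* follows zero or more :within edges, much like [:WITHIN*0..] in Cypher:

PREFIX : <urn:example:>  # placeholder namespace; substitute your own

SELECT ?person WHERE {
  ?person :bornIn / :within* :usa .
  ?person :livesIn / :within* :europe .
}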
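
Conceptually, a query like this is a list of triple patterns whose shared variables must bind to the same value everywhere. The Python sketch below shows that matching for simple patterns only; it ignores prefixes and property paths, and the lucy triples are hypothetical data added for the example.

```python
# Triples as (subject, predicate, object).
TRIPLES = {
    ("jim", "likes", "bananas"),
    ("jim", "bornIn", "usa"),
    ("lucy", "bornIn", "usa"),
    ("lucy", "livesIn", "europe"),
}

def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return result

def query(patterns):
    """Join all patterns, returning every consistent set of variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in TRIPLES:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

people = sorted(s["?person"] for s in query([
    ("?person", "bornIn", "usa"),
    ("?person", "livesIn", "europe"),
]))
print(people)
```

Jim satisfies the first pattern but not the second, so only bindings that survive every pattern end up in the result.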

Datalog Approach - Example Query

Datalog uses simple facts and logical rules:
Facts:

name(usa, 'United States').    
type(usa, country).    
within(idaho, usa).    

Rules:

within_recursive(Location, Name) :- name(Location, Name).
within_recursive(Location, Name) :- within(Location, Via), within_recursive(Via, Name).
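
One way to see how these two rules interact is to evaluate them bottom-up: start from the facts and keep applying the rules until nothing new can be derived. The Python sketch below does this naively. It is an illustration rather than how a Datalog engine is actually implemented, and the idaho and namerica name facts are hypothetical additions so the recursion has more than one step to take.

```python
# Facts as sets of tuples.
name_facts = {
    ("usa", "United States"),
    ("idaho", "Idaho"),              # hypothetical extra fact
    ("namerica", "North America"),   # hypothetical extra fact
}
within_facts = {
    ("idaho", "usa"),
    ("usa", "namerica"),             # hypothetical extra fact
}

def within_recursive(name_facts, within_facts):
    """Naive bottom-up evaluation: apply both rules until a fixed point."""
    # Rule 1: within_recursive(L, N) :- name(L, N).
    derived = set(name_facts)
    while True:
        # Rule 2: join within(L, X) with within_recursive(X, N) on X.
        new = {
            (location, n)
            for location, enclosing in within_facts
            for known_location, n in derived
            if enclosing == known_location
        } - derived
        if not new:
            return derived
        derived |= new

result = within_recursive(name_facts, within_facts)
print(("idaho", "United States") in result)
print(("idaho", "North America") in result)
```

The second membership check only succeeds after two rounds of rule 2, which is the recursion doing its work: Idaho is within the USA, and the USA is within North America.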

Query:
To find people who migrated from the US to Europe:

emigrated(Person,

Series Designing Data-Intensive Applications Part 6 of 41
