What are Triples in graph theory, AI and ML?


Triples are fundamental structures in graph theory and knowledge representation that consist of three connected elements: a subject, a predicate (or relationship), and an object. This three-part structure forms the basic building block for representing facts, relationships, and data in various graph-based systems, particularly in semantic web technologies and knowledge graphs.

Triples have other names, but always have three parts.

You probably already understand triples intuitively from elementary school grammar: a simple sentence such as "The dog chased the ball" has a subject, a verb, and an object.

Sentence structure, like we learned in school, can be recognised as a triple.

The concept of triples emerged from the intersection of mathematical logic, linguistics, and computer science in the mid-20th century. While ordered triples had long been used in mathematics, their application to knowledge representation gained prominence with the development of semantic networks and database theory. The standardization of the Resource Description Framework (RDF) in the late 1990s established triples as a cornerstone of modern graph-based data representation.

In practice, triples function like simple sentences that express a single fact or relationship. Each component plays a specific role: the subject is the entity about which a statement is made, the predicate defines the type of relationship or property being described, and the object is the target entity or value. This structure allows complex networks of information to be built from simple, atomic statements that can be easily stored, queried, and analyzed.
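
To make the three roles concrete, here is a minimal sketch in plain Python, with no graph library involved. The `Triple` type, the `match` helper, and the sample facts are all hypothetical names chosen for illustration; they are not part of any standard. Passing `None` for a position treats it as a wildcard, which is one simple way to "query" a collection of atomic statements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the entity the statement is about
    predicate: str  # the relationship or property
    obj: str        # the target entity or value ("obj" avoids shadowing the builtin)


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical sample facts, each one a single atomic statement.
facts = [
    Triple("Alice", "knows", "Bob"),
    Triple("Alice", "worksAt", "Acme"),
    Triple("Bob", "worksAt", "Acme"),
]

# "Who works at Acme?" leaves the subject position open.
colleagues = match(facts, predicate="worksAt", obj="Acme")
print([t.subject for t in colleagues])
```

A real triple store indexes the three positions so that such lookups do not scan every fact, but the pattern-with-wildcards idea is the same.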

The power of triples lies in their simplicity and flexibility. When combined, they can represent intricate webs of relationships and knowledge. This makes them particularly valuable in applications ranging from semantic web databases and knowledge graphs to artificial intelligence systems and natural language processing. Their standardized format also facilitates data integration and interchange between different systems.
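
The sketch below illustrates both points under simple assumptions: triples are plain Python tuples, and the two "sources" and the `reachable` helper are hypothetical. Because every fact has the same shape, integrating two datasets reduces to a set union, and following triples from subject to object reveals the web of connections around an entity.

```python
from collections import defaultdict, deque

# Two hypothetical sources describing overlapping entities.
source_a = {
    ("Paris", "capitalOf", "France"),
    ("France", "locatedIn", "Europe"),
}
source_b = {
    ("France", "locatedIn", "Europe"),  # same fact as in source_a
    ("Paris", "hasLandmark", "Eiffel Tower"),
}

# Integration is a set union; the duplicate statement collapses automatically.
merged = source_a | source_b


def reachable(triples, start):
    """Return every node reachable from `start` by following triples
    from subject to object, regardless of predicate."""
    edges = defaultdict(set)
    for subj, _pred, obj in triples:
        edges[subj].add(obj)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


connected = reachable(merged, "Paris")
print(sorted(connected))
```

Note that "Europe" is never mentioned in a triple about Paris; it is reached only by chaining two statements, which is the sense in which simple triples combine into richer knowledge.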

Example:

Consider a knowledge graph about literature. The following triples might represent facts about Shakespeare:

  • (Shakespeare, wrote, "Hamlet")
  • (Shakespeare, bornIn, "Stratford-upon-Avon")
  • ("Hamlet", hasGenre, "Tragedy")
  • ("Hamlet", publishedIn, 1603)

Below is a Python example of how to represent these triples in a graph:

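from rdflib import Graph, Literal, URIRef

# Create a new graph
g = Graph()

# Define the resources and predicates as URIs
shakespeare = URIRef("http://duohub.ai/Shakespeare")
hamlet = URIRef("http://duohub.ai/Hamlet")
wrote = URIRef("http://duohub.ai/wrote")
born_in = URIRef("http://duohub.ai/bornIn")
has_genre = URIRef("http://duohub.ai/hasGenre")
published_in = URIRef("http://duohub.ai/publishedIn")

# Add the four triples; plain values are stored as literals
g.add((shakespeare, wrote, hamlet))
g.add((shakespeare, born_in, Literal("Stratford-upon-Avon")))
g.add((hamlet, has_genre, Literal("Tragedy")))
g.add((hamlet, published_in, Literal(1603)))

# Iterate over every triple in the graph (order is not guaranteed)
for subj, pred, obj in g:
    print(f"Subject: {subj}\nPredicate: {pred}\nObject: {obj}\n")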