Pyirk User Documentation Overview
Keys in Pyirk
In Pyirk there are the following kinds of keys:
a) short_key like
"R1234"
b) name-labeled key like
"R1234__my_relation"
(consisting of a short_key, a delimiter (__
) and a label)c) prefixed short_key like
"bi__R1234"
(here the prefixbi
refers to the modulebuiltin_entities
)d) prefixed name-labeled key like
"bi__R1234__my_relation"
e) index-labeled key like
"R1234['my relation']"
Note: prefixed and name-labeled keys can optionally have a language indicator. Examples: "bi__R1__de"
or "R1__has_label__fr"
.
Also, the leading character indicates the entity type (called EType
in the code): I
→ item, R
→ relation.
The usage of these syntax variants depends on the context.
For more information see See also Pyirk Modules and Packages.
Visualization
Currently there is some basic visualization support via the command line. To visualize your a module (including its relations to the builtin_entities) you can use a command like
pyirk --load-mod demo-module.py demo -vis __all__
Command Line Interface
For an overview of available command line options, see the CLI page or the command:
pyirk -h
Interactive Usage
To open an IPython shell with a loaded module run e.g.
pyirk -i -l control_theory1.py ct
Then, you have ct
as variable in your namespace and can e.g. run print(ct.I5167.R1)
.
(The above command assumes that the file control_theory1.py
is in your current working directory.)
Update Test Data
pyirk --update-test-data
For details see devdoc#test_data
Multilinguality
Pyirk aims to support an arbitrary number of languages by so called language specified strings. Currently support for English and German is preconfigured in pyirk.settings
. These language specified strings are instances of the class rdflib.Literal
where the .language
-attribute is set to one of the values from pyirk.setting.SUPPORTED_LANGUAGES
and can be created like:
from pyirk import de, en
# ...
lss1 = "some english words"@en
lss2 = "ein paar deutsche Wörter"@de
where en, de
are instances of pyirk.core.LanguageCode
.
The usage inside Pyirk is best demonstrated by the unittest test_c02__multilingual_relations
, see test_core.py (maybe change branch).
Patterns for Knowledge Representation in Pyirk
In Pyirk knowledge is represented via entities and statements (inspired by Wikidata). There are two types of entities: items (mostly associated with nouns) and relations (mostly associated with verbs). Statements consist of subject-predicate-object-triples.
subject: can be any entity (item or a relation),
predicate: is always a relation,
object: can be any entity or literal.
Literals are “atomic” values like strings, numbers or boolean values.
Every entity has short_key (entity.short_key
, see also Keys in Pyirk.) and an uri (entity.uri
). It is recommended but not required that every entity has a label (by means of relation R1["has label"]
) and a description (by means of R2["has description"]
).
Items (Python Subclass of core.Entity
)
The short_key
of any items starts with “I
” and ends with a sequence of number characters (maximum sequence length not yet specified). Optionally the second character is “a
” which indicates that this item was generated automatically (see below).
(Almost) All items are part of a taxonomy, i.e. a hierarchy of “is-a”-relations). This is expressed by the relations R3["is_subclass_of"]
and R4["is instance of"]
.
Note
Unlike in OWL (but like in Wikidata) an item can be an instance and a class at the same time. This allows to treat classes as “ordinary” items if necessary, e.g. use them directly in statements.
Automatically Generated Items
One consequence of expressing knowledge as a collection of triples is the necessity of auxiliary items. E.g. consider the equation \(y = \sin(x)\) where x, y, sin
can be assumed to be well defined items. Because the predicate must be a relation, it is not possible to relate these three items in one triple. The usual approach to deal with such situations is to introduce auxiliary items and more triples (see also wikipedia on “reification”). One possible (fictional) triple representation of the above equation is
auxiliary_expr is_function_call_of_type sin
auxiliary_expr has_arg x
y is_equal_to expr
One of the main goals of Pyirk is to simplify the creation of triples which involves creating auxiliary items (such as evaluated expressions). This can be achieved by calling functions such as pyirk.instance_of(...)
. A more sophisticated way is to overload the __call__
method of entities.
The __call__
-Method
The class pyirk.Entity
implements the __call__
method which formally makes all items and relations callable Python objects. However, by default no method _custom_call
is implemented which results in an exception. Associating a _custom_call
method and thus truly make an item callable can be achieved by
explicitly adding the method, like e.g. in
I4895["mathematical operator"].add_method(p.create_evaluated_mapping, "_custom_call")
creating an item which is a subclass (
R3
) or instance (R4
) of a method which already has a_custom_call
method, seecore.Entity._perform_inheritance
andcore.Entity._perform_instantiation
for details.
Adding Convenience Methods
The method core.Entity.add_method(...)
can be used to add arbitrary methods to items (which then can be inherited by other items). Example: see how the function builtin_entities.get_arguments
is attached to every result of builtin_entities.create_evaluated_mapping
(which itself is used as _custom_call
method).
Relations (core.Relation
, subclass of core.Entity
)
The .short_key
of any relation starts with R
. The predicate part of a semantic triple must always be a (python) instance of Core.Relation
. In general they can occur as subject or object as well.
From a graph perspective the relation defines the type of the edge between two nodes. The nodes are typically Item
-instances.
Literals (core.Literal
imported from rdflib
)
Instances of the class model string values (including a .language
attribute).
Statements (core.Statement
)
Instances of this class model semantic triples (subject, predicate, object) and corresponding qualifiers. Every edge in the knowledge graph corresponds to a statement instance.
Note: For technical reasons for every Statement
instance there exits a dual Statement
instance. For most situations this does not matter, though.
The whole knowledge graph is a collection of Entities (Items, Relation, Literals) and Statements. Roughly speaking, the collection of Entities defines what exists (in the respective universe of discourse) while the collection of Statements defines how these things are related. Because flat subject-predicate-object triples have very limited expressivity it is possible to “make statements about statements”, i.e. use a Statement
instance as subject another triple. This Wikidata-inspired mechanism is called qualifiers (see below).
Qualifiers
Basic statements in Pyirk are modeled as subject
-predicate
-object
-triples.
E.g. to express that R. Kalman works at Stanford University one could use:
# example from ocse0.2 (adapted)
I2746["Rudolf Kalman"].set_relation(R1833["has employer"], I9942["Stanford University"])
#.
This results in the triple: (I2746, R1833, I9942)
. In Pyirk such triples are modeled as instances of class Statement
; each such instance represents an edge in the knowledge graph, where the subject and object are the corresponding nodes and each such edge has a lable (the relation type) and optionally other information attached to it.
However, in many cases more complexity is needed. To express that Kalman worked at Stanford between 1964 and 1971, we can exploit that Statement
-instances can themselves be use as subject of other triples, by means of so called qualifiers:
start_time = p.QualifierFactory(R4156["has start time"])
end_time = p.QualifierFactory(R4698["has end time"])
I2746["Rudolf Kalman"].set_relation(
R1833["has employer"], I9942["Stanford University"], qualifiers=[start_time("1964"), end_time("1971")]
)
#.
Here start_time
and end_time
are instances of the class QualifierFactory
. If such an instance is called, it returns an instance of class RawQualifier
which is basically a yet incomplete triple where only the predicate and the object is fixed. The subject of this triple will be formed by the main statement itself (modeled by an instance of Statement
).
Thus the above code creates three Statement
instances here simplified:
S(2746, R1833, I9942) # the main statement, now referenced as stm1
S(stm1, R4156, "1964")
S(stm1, R4698, "1971")
#.
Note
The concept of qualifiers is borrowed from Wikidata, see e.g the WD-SPARQL-tutorial
Summary: Qualifiers are a flexible possibility to model “information about information” in Pyirk. They are used, e.g. to model the universal quantification.
Scopes
Basics
Many knowledge artifacts (such as theorems or definitions) consists of multiple simpler statements which are in a specific semantic relation to each other. Consider the example theorem:
Let \((a, b, c)\) be the sides of a triangle, ordered from shortest to longest, and \((l_a, l_b, l_c)\) the respective lengths. If the angle between a and b is a right angle then the equation \(l_c^2 = l_a^2 + l_b^2\) holds.
Such a theorem consists of several “semantic parts”, which in the context of Pyirk are called scopes. In particular we have the three following scopes:
setting: “Let \((a, b, c)\) be the sides of a triangle, ordered from shortest to longest, and (la, lb, lc) the respective lengths.”
premise: “If the angle between a and b is a rect angle”
assertion: “then the equation \(l_c^2 = l_a^2 + l_b^2\) holds.”
The concepts “premise” and “assertion” are usually used to refer to parts of theorems (etc). Additionally PyIRK uses the “setting”-scope to refer to those statements which do “set the stage” to properly formulate the premise and the assertion (e.g. by introducing and specifying the relevant objects).
Scopes in Pyirk
Scopes are represented by Items (instances (R4
) of I16["scope"]
). A scope item is specified by R64__has_scope_type
. It is associated with a parent item (e.g. a theorem) via R21__is_scope_of
. A statement which semantically belongs to a specific scope is associated to the respective scope item via the qualifier R20__has_defining_scope
.
Note
R21__is_scope_of
and R20__has_defining_scope
are not inverse (R68__is_inverse_of
) to each other.
Notation of Scopes via Context Managers (with ... as cm
)
To simplify the creation of the auxiliary scope-items python context managers (i.e. with
-statements) are used. This is illustrated by the following example:
I5000 = p.create_item(
R1__has_label="simplified Pythagorean theorem",
R4__is_instance_of=p.I15["implication proposition"],
)
# because I5000 is an instance of I15 it has a `.scope` method:
with I5000["simplified Pythagorean theorem"].scope("setting") as cm:
# the theorem should hold for every planar triangle,
# thus a universally quantified instance is created
cm.new_var(ta=p.uq_instance_of(I1000["planar triangle"]))
cm.new_var(sides=I1001["get polygon sides ordered by length"](cm.ta))
a, b, c = p.unpack_tuple_item(cm.sides)
la, lb, lc = a.R2000__has_length, b.R2000, c.R2000
with I5000["simplified Pythagorean theorem"].scope("premise") as cm:
cm.new_equation(lhs=I1002["angle"](a, b), rhs=I1003["right angle"])
with I5000["simplified Pythagorean theorem"].scope("assertion") as cm:
# convert a pyirk items into sympy.Symbol instances to conveniently
# denote formulas (see documentation below)
La, Lb, Lc = p.items_to_symbols(la, lb, lc)
cm.new_equation( La**2 + Lb**2, "==", Lc**2 )
Operators
Example from math.py
(OCSE):
I4895 = p.create_item(
R1__has_label="mathematical operator",
R2__has_description="general (unspecified) mathematical operator",
R3__is_subclass_of=p.I12["mathematical object"],
)
I4895["mathematical operator"].add_method(p.create_evaluated_mapping, "_custom_call")
I5177 = p.create_item(
R1__has_label="matmul",
R2__has_description=("matrix multiplication operator"),
R4__is_instance_of=I4895["mathematical operator"],
R8__has_domain_of_argument_1=I9904["matrix"],
R9__has_domain_of_argument_2=I9904["matrix"],
R11__has_range_of_result=I9904["matrix"],
)
# representing the product of two matrices:
A = p.instance_of(I9904["matrix"])
B = p.instance_of(I9904["matrix"]])
# this call creates and returns a new item
# (instance of `I32["evaluated mapping"]`)
C = I5177["matmul"](A, B)
# equivalent but more readable:
mul = I5177["matmul"]
C = mul(A, B)
Representing Formulas
In the module math1.py
of OCSE there is an implementation for a convenient formula notation (write x + y + z
instead of add_item(x, add_item(y, z))
). See this example from the OCSE unittests:
ma = p.irkloader.load_mod_from_path(pjoin(OCSE_PATH, "math1.py"), prefix="ma")
t = p.instance_of(ma.I2917["planar triangle"])
sides = ma.I9148["get polygon sides ordered by length"](t)
a, b, c = sides.R39__has_element
la, lb, lc = ma.items_to_symbols(a, b, c, relation=ma.R2495["has length"])
symbolic_sum = la + lb + lc
sum_item = ma.symbolic_expression_to_graph_expression(symbolic_sum)
Convenience-Expressions
Warning
This is not yet implemented. However, see formula representation.
While the operator approach is suitable to create the appropriate notes and edges in the knowledge graph it is not very convenient to write more complex formulas in that way. Thus pyirk offers a convenience mechanism based on the computer algebra package Sympy. The function builtin_entities.items_to_symbols()
creates a sympy symbol for every passed item (and keeps track of the associations). Then, a formula can be denoted using “usual” python syntax with operator signs +
, -
, *
, /
, and **
which results in an instance of sympy.core.expr.Expr
. These expressions can be passed, e.g., to cm.new_equation
where they are converted back to pyirk-items. In other words the following two snippets are equivalent:
# approach 1: using intermediate symbolic expressions
La, Lb, Lc = p.items_to_symbols(la, lb, lc)
cm.new_equation( La**2 + Lb**2, "==", Lc**2 )
# approach 0: without using intermediate symbolic expressions
sq = I1010["squared"]
plus = I1011["plus"]
cm.new_equation( plus(sq(La), sq(Lb)), "==", sq(Lc) )
Stubs (I50["Stub"]
, I000["some arbitrary label"]
and R000["also"]
)
One challenge in formal knowledge representation is Where to begin?. Suppose you want to formalize some knowledge about triangles. It seems natural that you introduce the class triangle as a subclass of polygon. However, the class polygon should also be a subclass of something and so on.
As modelling all knowledge is unfeasible at some points it is necessary to model incomplete entities (Ideally, theses are some relation-steps away from the relevant entities of the domain). To facilitate this there exists I50["stub"]
. This item can be used as (base) class for any item which at the moment no further (taxonomic) information should be modeled. The name “stub” is inspired by Wikipedia’s (stub-pages](https://en.wikipedia.org/wiki/Wikipedia:Stub). Example:
I1234 = p.create_item(
R1__has_label="polygon",
R2__has_description="",
R3__is_subclass_of=p.I50["stub"],
)
In some situations it is desireable to use items and relations which do not yet exist. This can be done by I000["dummy item]
and R000["dummy relation"]
. Both entities can be used with arbitrary labels and can thus be used regarded as a special kind of comment. Example:
I1234 = p.create_item(
R1__has_label="polygon",
R2__has_description="",
R3__is_subclass_of=p.I000["general geometric figure"],
R000__has_dimension=2,
)
This allows to focus a modeling session on the important items and relations and prevents to get distracted by introducing entities of subordinate relevance.
It is quite probable that even mature irk-ontologies contain relations involving I50
. Such items can be considered to constitute the “border of the domain of discourse”. On the other hand, I000
and R000
should be used only temporarily and be replaced soon, e.g., by new instances/subclasses of I50
.
Universal and Existential Quantification
Background, see https://en.wikipedia.org/wiki/Quantifier_(logic).
commonly used quantifiers are ∀ (
$\forall$
) and ∃ ($\exists$
).
They are also called universal quantifier and existential quantifier. In Pyirk they can be expressed via
Qualifiers. In particular (defined in module
builtin_entities
):univ_quant = QualifierFactory(R44["is universally quantified"])
usage (in OCSE):
cm.new_rel(cm.z, p.R15["is element of"], cm.HP, qualifiers=p.univ_quant(True))
exis_quant = QualifierFactory(R66["is existentially quantified"])
usage (in OCSE):
cm.new_var(y=p.instance_of(p.I37["integer number"], qualifiers=[p.exis_quant(True)]))
(Sub)scopes:
# excerpt from test_core.py with I7324["definition of something"].scope("premise") as cm: with cm.universally_quantified() as cm2: cm2.add_condition_statement(cm.x, p.R15["is element of"], my_set) # ... with I7324["definition of something"].scope("assertion") as cm: # also pointless direct meaning, only to test contexts with cm.existentially_quantified() as cm2: z = cm2.new_condition_var(z=p.instance_of(p.I39["positive integer"]))
Warning
Despite having similar phonetics (and spelling) quantifiers (logic operators) and qualifiers (knowledge modeling technique, in triple-based knowledge graphs) are totally different concepts. However, qualifiers can (among many other use cases) be used to model universal or existential quantification of an statement.
Pyirk Modules and Packages
Pyirk entities and statements are organized in Pyirk modules (python files). Each module has to specify its own URI via the variable __URI__
. The uri of an entity from that module is formed by <module URI>#<entity short_key>
. Modules can be bundled together to form pyirk packages. A Pyirk package consists of a directory containing a file irkpackage.toml
and at least one Pyirk module.
Modules can depend on other modules. A usual pattern is the following:
# in module control_theory1.py
import pyirk as p
mod = p.irkloader.load_mod_from_path("./math1.py", prefix="ma")
Here the variable mod
is the module object (like from ordinary python import) and allows to access to the complete namespace of that module:
# ...
A = p.instance_of(mod.I9904["matrix"])
The prefix "ma"
can also be used to refer to that module like here
# ...
res = A.ma__R8736__depends_polynomially_on
Rationale: The attribute name ma__R8736__depends_polynomially_on
is handled as a string by Python (in the method __getattr__
). While mod.R8736
is the relation object we cannot use this syntax as attribute name.
See also Keys in Pyirk.