-
Notifications
You must be signed in to change notification settings - Fork 4
Description
For some applications it is desirable to to declare primary keys for tables. For example, sqlalchemy requires all mapped tables to have a PK if the ORM layer is to be used.
If we were to declare a primary key for statements we would need to either
- Add an additional statement id column and declare it PK
- Use a unique combination such as s,p,o,v as a composite PK
The first has disadvantages, e.g. overhead of adding SEQUENCEs, divergence from RDF model
The second seems reasonable. However, many RDBMs do not allow nulls in a PK. sqlite is more lax here though. However, even though sqlite allows it, certain tools such as sqlalchemy ORM will not allow nulls in PK fields. By design, with statements, EITHER object OR value MUST be NULL.
e.g. given:
class Statements(Base):
"""
Represents an RDF triple
"""
__tablename__ = 'statements'
stanza = Column(Text, primary_key=True)
subject = Column(Text, primary_key=True)
predicate = Column(Text, primary_key=True)
object = Column(Text, primary_key=True)
value = Column(Text, primary_key=True)
datatype = Column(Text, primary_key=True)
language = Column(Text, primary_key=True)if we try and do an insert with an rdfs:label (which has object NULL) we get:
sqlalchemy.orm.exc.FlushError: Can't update table statements using NULL for primary key value on column statements.object
This is OK if one doesn't want to use the sqlalchemy ORM layer (the lower-level base layer should work just fine), but it might be good to be aware of this limitation for future evolution
Aside: I used to be more of a computer science purist and decried ORMs as evil, impedance mismatch yadda yadda. But the sqlalchemy ORM is quite nice and I see the advantages of ORMs in solving more pedestrian problems...
I don't think we should change anything as I think the existing structure is nice and simple and good for certain kinds of queries.
I think the solution here is: if you want to do CRUD with an ORM, then define a separate schema in which object and value are combined into one field (e..g a basic "triples" table), and define a trivial bridge layer between rdftab.statements and your.triple (e.g populate one from the other via INSERT INTO .. AS SELECT ...). This is a separate concern from rdftab.
This issue can be immediately closed, just wanted to flag this if useful for planning future evolution.