Skip to content

Latest commit

 

History

History
253 lines (201 loc) · 8.42 KB

File metadata and controls

253 lines (201 loc) · 8.42 KB

Representing multiverses with Datomic

Related articles

Project code
https://github.com/semiosis/pen.el/

Summary

After reading |:ϝ∷¦ϝ’s blog article LMs are multiverse generators :: Moire, I decided to take a closer look into Datomic as a possible store for LM generations.

Something which caught my attention which Rich Hickey mentioned was the need for immutability and to not throw away data, in order to make progress.

|:ϝ∷¦ϝ also mentioned this when we talked about keeping our notes around and one day fine-tuning LMs on them.

Since we are now using LM’s to facilitate our creativity, we need a rock solid and distributed foundation for retaining memories. Datomic is a great candidate for such a database.

./datomic-memory-engine.png

  • Datomic can harness storage from many different storage and database services
    • Even relational databases
  • Has pluggable comparators
    • Two sorts that are always maintained
      • Primary by entity
      • Primary by attribute
  • Has both the characteristics of a document store and a column store
  • assert and retract are insufficient
    • So we have tx-fn (transaction functions)
  • stable basis
    • same query (including time), same result
  • transactions well defined
    • A transaction is a function to a database
    • functional accretion
  • Embedded Datalog
    • Datomic with Rich Hickey - YouTube
    • Subset of Prolog.
    • guaranteed termination.
    • Set oriented. Has set orientation, unlike prolog which is tuple at a time.
    • A datalog program contains
      • db(s) + rules + queries
        • one or more databases,
        • a set of rules which you can map to views
        • a set of queries
    • Completely data-driven interface to datalog
    • has the power of relational algebra
    • It’s embedded in clojure
      • syntactically subset of Clojure
      • semantically subset of prolog

Example datalog code

Can read this as a map where each key maps to a list.

{:find [?customer ?product]
 :where [[?customer :shipAddress ?addr]
         [?addr :zip ?zip]
         [?product :product/weight ?weight]

         [?product :product/price ?price]
         [(Shipping/estimate ?zip ?weight) ?shipCost]
         [(<= ?price ?shipCost)]]}

No joins – all joins are implicit.

Unification unifies all the values of customer to be the same customer as the product ~~…

unification
    [#prolog]
    [operator]

    Essentially a combination of assignment
    and equality.

    vimlinks +/"?- 2 = 3." "https://learnxinyminutes.com/docs/prolog/"

    If both sides are bound (ie, defined),
    check equality.
    2 = 3

    IMPORTANT (the essence of unification):
    If one side is free (ie, undefined),
    assign to match the other side.
    X = 3

    If both sides are free, the assignment is
    remembered. With some luck, one of the two
    sides will eventually be bound, but this
    isn't necessary.
    X = Y are both free, so Prolog remembers
    it. Therefore assigning X will also assign
    Y. IS is another assigment operator in prolog.
    X = Y, X = 2, Z is Y + 3.

    See:
        vim +/"modus ponens" "$NOTES/ws/logic/glossary.txt"

Database is distributed.

Brings power from database (where traditionally it would be found), to the application logic.

  • The application gets its own brain (datomic database copy)
    • The app becomes a peer
    • The db is effectively local

datalog

  • The big advantage over core.logic or prolog
    • The semantics of those are tuple at a time
    • The semantics of datalog is set at a time
      • This means that underneath the hood, entire sets are being merge-joined

Peer implementation

Writing Datomic in Clojure - Rich Hickey - YouTube

Consistency and Scale

Testing

  • Functional tests
  • Simulation-based testing
    • I like.

Querying Datomic

Learning with Datomic

In the markdown file linked to above and in his videos about Datomic Rich Hickey tells us that tells us that mastery comes from always working slightly beyond your comfort/ability zone, pushing it ever forward.

He also designed Datomic to efficiently accrete and distribute knowledge, whilst remaining queryable. It allows you to both offload logic to the database and run logic locally which traditionally would’ve been conducted remotely on the database.

Additional reading

./datomic-stream.png

Complete notes
https://github.com/semiosis/code-org-tidbits/blob/master/datomic/domain-modeling-with-datalog.org

Example: github. New users. There are 3 users here in our system. 3 ids means they are different things. Each row is a datum.

entityattributevalue
11:user/namerichhickey
22:user/nametonsky
33:user/namepithyless

We use pattern-matching for querying (Datomic).

A pattern in this case will be a vector of 3 elements that represents the EAV structure in our database.

Underscore means I don’t care (wildcard).

[11 _ _]

Variables start with a question mark and some name i.e. ?entity.

Annex

I want to mention this excerpt about the origins of surreal numbers because I think there is some allegory here.

Surreal numbers

https://ianopolous.peergos.me/maths/surreal

In 1972, Conway described his system of
numbers to computer scientist Donald Knuth at
Stanford.

Knuth (creator of the $\TeX$ typesetting
system) then went away and wrote a short
novelette introducing these numbers.

It was the first time for a major mathematical
discovery to be published in a work of fiction
first.

Knuth coined the term surreal numbers; taking
“Sur” from the French for “above”.

The surreal numbers satisfy the axioms for a
field (but the question of whether or not they
constitute a field is complicated by the fact
that, collectively, they are too large to form
a set).