TEF DEVLOG

i.e. brainstorming about how to formalize stuff in the TEF format without, hopefully, turning it into an ad-hoc mess like YAML.

2025-10-07

How to specify value encodings?

I have "-ref" suffix to indicate that the TEF attribute value is a URI reference rather than the literal value, but that has always felt like a hack.

'$' has always felt appropriate for indicating...encoding information.

cool-pages/togos-music$ref: https://www.nuke24.net/music/
nice-quotes/and-i-am$base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u

Could use ';' to separate more info about the value.

nice-quotes/and-i-am$length=36;lang=en;base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u
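To make the `;`-separated idea concrete, here is a minimal Python sketch that splits such a line into key, parameters, encodings, and raw value, then checks the declared length against the decoded bytes. The function name `parse_attribute`, the rule that the key ends at the first `$`, and the reading of `length` as the decoded byte count are all assumptions made for illustration, not anything TEF defines.

```python
import base64

def parse_attribute(line, marker="$"):
    """Split 'key$meta: value' into (key, params, encodings, raw_value).

    Assumes the key/value separator is the first ': ' and that the
    metadata starts at the first marker character in the key part.
    """
    head, _, raw_value = line.partition(": ")
    key, _, meta = head.partition(marker)
    params, encodings = {}, []
    for item in filter(None, meta.split(";")):
        name, eq, val = item.partition("=")
        if eq:
            params[name] = val      # e.g. length=36, lang=en
        else:
            encodings.append(name)  # e.g. base64
    return key, params, encodings, raw_value

line = ("nice-quotes/and-i-am$length=36;lang=en;base64: "
        "UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u")
key, params, encodings, raw = parse_attribute(line)
decoded = base64.b64decode(raw)

# The declared length matches the decoded byte count for this example.
assert len(decoded) == int(params["length"])
print(key, params, encodings, decoded.decode("utf-8"))
```

Passing `marker="^"` would handle the `^` variant the same way.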

Alternate ideas:

Funkify the colon, e.g. double it, or :* or something.

foo:* base64

Different character than $, like maybe ^

nice-quotes/and-i-am^length=36;lang=en;base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u

That might be easier to spot than the $, actually.

RDF's Turtle and N-Triples syntaxes write typed literals as "encodedstring"^^<datatypeuri>. See https://www.w3.org/TR/swbp-xsch-datatypes/#sec-recs-dt.

Let's go with ^ for now, indicating that what follows is ;-separated metadata about the value. Entries without = name encodings that were applied to the value, and they should be unapplied in reverse order.

e.g., the following are all equivalent, assuming a particular definition of rot13 and base64:

foo/bar/baz^rot13;base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
foo/bar/baz^rot13: Crbcyr ner fghcvq naq V nz n crefba.
foo/bar/baz: People are stupid and I am a person.
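The "unapply in reverse order" rule can be sketched in a few lines of Python. The `DECODERS` table and the `decode_attribute` name are hypothetical, and "base64" and "rot13" are taken to mean standard Base64 over UTF-8 text and the usual 13-letter rotation, which is only one possible "particular definition".

```python
import base64
import codecs

# Hypothetical decoder table: each entry maps encoded text back to decoded text.
DECODERS = {
    "base64": lambda s: base64.b64decode(s).decode("utf-8"),
    "rot13": lambda s: codecs.decode(s, "rot_13"),
}

def decode_attribute(line):
    """Return (key, decoded_value) for a 'key^enc1;enc2: value' line."""
    head, _, value = line.partition(": ")
    key, _, meta = head.partition("^")
    # Entries without '=' are encodings, listed in the order they were applied.
    encodings = [m for m in meta.split(";") if m and "=" not in m]
    for name in reversed(encodings):  # undo the last-applied encoding first
        value = DECODERS[name](value)
    return key, value

lines = [
    "foo/bar/baz^rot13;base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==",
    "foo/bar/baz^rot13: Crbcyr ner fghcvq naq V nz n crefba.",
    "foo/bar/baz: People are stupid and I am a person.",
]
for line in lines:
    print(repr(decode_attribute(line)))
```

One thing this surfaces: the Base64 payload in the first line decodes to the rot13 text plus a trailing newline, so the three lines only come out identical if a trailing newline is not considered part of the value.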

More alternatives

Pipe?

foo/bar/baz|rot13|base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==

Pro: Very literal interpretation; the value of foo/bar/baz, when piped through rot13 and base64, gives the following string.

Pro: Matches how Smarty (PHP templating library) indicates transformations, which is one of the few things I like about Smarty.

Con: ...maybe I'd like to use pipe for something else?

Meaning of the encoding string

An encoding name should be a URI, or translatable to one. A full URI must itself be percent-encoded to become part of the attribute string. ':' does not need to be encoded, so xmlns-style shorthands can be used as-is, e.g.

# Assuming `xsd:` has been defined as shorthand for `http://www.w3.org/2001/XMLSchema#`:
foo/bar|xsd:base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
# Or, written the long way:
foo/bar|http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==

When written with |, having to encode the datatype URI bothers me for some reason that it didn't when using $ or ^. Maybe because |...isn't a valid URI character, and is therefore already 'outside' of the URI, and therefore the things it separates should be URIs. Well in that case we could say:

foo/bar|http://www.w3.org/2001/XMLSchema#base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
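As a sanity check that the shorthand, percent-encoded, and raw spellings can all name the same encoding, here is a rough Python sketch. The `PREFIXES` table, the function names, and the choice to accept both raw and percent-encoded URIs in the same parser are assumptions; a real TEF rule would presumably pick one.

```python
from urllib.parse import unquote

# Assumed prefix table; in TEF this would come from some declaration.
PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}

def resolve_encoding(token, prefixes=PREFIXES):
    """Turn an encoding token into a full URI string."""
    name = unquote(token)  # no-op for tokens without %XX escapes
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local  # expand xmlns-style shorthand
    return name

def split_piped(line):
    """Split 'key|enc1|enc2: value' into (key, [encoding URIs], value).

    Relies on ': ' (colon plus space) separating key from value, so the
    'http://' inside a raw URI is not mistaken for the separator.
    """
    head, _, value = line.partition(": ")
    key, *tokens = head.split("|")
    return key, [resolve_encoding(t) for t in tokens], value

payload = "Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg=="
forms = [
    "foo/bar|xsd:base64Binary: " + payload,
    "foo/bar|http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23base64Binary: " + payload,
    "foo/bar|http://www.w3.org/2001/XMLSchema#base64Binary: " + payload,
]
parsed = [split_piped(f) for f in forms]
assert all(p == parsed[0] for p in parsed)
print(parsed[0][1])
```

Accepting both raw and encoded forms is ambiguous for any URI that legitimately contains a `%XX` escape, which is one argument for settling on a single spelling.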

How to indicate specific dialect/version within a TEF file?

TEF needs:

  • a pseudo-header to indicate parsing rules for all that follows so that multiple files can be concatenated together and still handled appropriately.
    • Maybe =tef:file? That is what I have always thought of, anyway.
    • =tef:begin and =tef:end would allow nested sub-regions. The explicit =tef:end could also be used to signal that the following line (which might be something like #format ...) is not part of the previous entry's content.
  • a way to indicate dialect.
    • Maybe this is what #format is for, though that doesn't seem to play nicely with concatenated-together TEF files.

I guess if I want to be able to differentiate different versions of the format within one big file that is made by concatenating other files, each of which could have a different format... I need to come up with a meta-format that describes how that's supposed to work.

Maybe the meta-format could look like TEF, which would make sense as TEF is already trying to solve the text-files-within-a-text-file problem, and it would be silly to solve the next level of that same problem in a completely different way.

The point of this train of thought, I think, is to 'flatten' that next level by including metadata about the format inside each TEF file, so that they can be 'self-describing'. Sort of like an XML <!DOCTYPE ...> declaration.

=tef:sourced
# Provide metadata about the TEF document
# that starts *after* this entry, or possibly
# where this entry's content would be.
# tef:version applies to this entry, but will serve as the default for later ones.
# Version '555' is chosen as an example 'not actually defined' version,
# for brainstorming purposes.
tef:version: 555.1.1
source-document-name: foo.tef
source-line: 101
# sourced-sections logically have no content.
# If there is 'content', it is actually the beginning of the included document.

=tef:file
# tef:file provides metadata about the stream that it appears in,
# and it serves the same purpose as bare 'file-level headers' would,
# but can appear more than once.
tef:version: 555.1.0
tefns:xsd: http://www.w3.org/2001/XMLSchema#
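Here is a rough Python sketch of how a reader might track which version is in effect when several files, each starting with its own =tef:file entry, are concatenated. It assumes entries start with a line beginning with `=`, that `key: value` headers run until the first blank line, and that `#` lines are comments; the `=note` entry type and version `555.2.0` are made up for the example.

```python
def versions_in_effect(lines, default=None):
    """Yield (entry_header, version) for each entry other than =tef:file.

    A =tef:file entry's tef:version header becomes the version for every
    entry that follows, until the next =tef:file entry overrides it.
    """
    version = default
    in_file_header = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("="):
            in_file_header = line.split()[0] == "=tef:file"
            if not in_file_header:
                yield line[1:], version
        elif in_file_header:
            if line == "":
                in_file_header = False  # headers end at the first blank line
            elif line.startswith("tef:version: "):
                version = line[len("tef:version: "):]

stream = """\
=tef:file
tef:version: 555.1.0

=note first
title: From the first file

=tef:file
# second concatenated file
tef:version: 555.2.0

=note second
title: From the second file
""".splitlines()

for header, version in versions_in_effect(stream):
    print(header, "->", version)
```

The sketch ignores the case where entry content itself contains a line starting with `=`, which is exactly the kind of thing the real format rules would have to pin down.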

Maybe I could be more explicit about the subject.

=tef:parsing-directives
# The tef:parsing-directives entry's content
# is a bunch of parsing directives, to be
# applied as they are encountered in the source stream.

# Indicate the TEF version of the following content.
# This should also reset any options.
#tef:version 555.1.0

#tefns:dcterms http://purl.org/dc/terms/

# The '#line' directive has the exact same meaning as it would
# to the C preprocessor; i.e. the following line should be considered
# line 123 from the indicated source file.
#line 123 "mystuff.tef"
=tef:about this-document
# This entry describes this TEF document
# Actual attributes are made-up examples.
title: Very cool TEF document with metadata and parsing directives
processor/source-to-turtle: http://example.com/my-tef-to-turtle-converter

I kind of like that. tef:parsing-directives explicitly gives a series of instructions to the parser without shoehorning them into attributes. tef:about this-document, meanwhile, carries document metadata that does take the form of attributes, and the same mechanism can describe other subjects too, serving the same role as an rdf:about attribute in an RDF/XML document.

See hash-line-directive about #line.
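The C-style `#line` behaviour can be sketched as a small line-numbering filter in Python. The regular expression, the `logical_lines` name, and the `<stream>` placeholder for an unnamed source are assumptions; the only rule borrowed from C is that the line after `#line N "file"` is reported as line N of that file.

```python
import re

# Matches: #line 123 "mystuff.tef"   (the filename part is optional)
LINE_DIRECTIVE = re.compile(r'^#line\s+(\d+)(?:\s+"([^"]*)")?\s*$')

def logical_lines(lines, source_name="<stream>"):
    """Yield (source_name, line_number, text), honoring #line directives."""
    number = 1
    for text in lines:
        match = LINE_DIRECTIVE.match(text)
        if match:
            number = int(match.group(1))      # number of the *next* line
            if match.group(2) is not None:
                source_name = match.group(2)
            continue                          # the directive itself is not reported
        yield source_name, number, text
        number += 1

stream = [
    "=tef:parsing-directives",
    '#line 123 "mystuff.tef"',
    "=tef:about this-document",
    "title: Very cool TEF document with metadata and parsing directives",
]
for name, number, text in logical_lines(stream):
    print(f"{name}:{number}: {text}")
```

Ordinary comments such as `# This entry describes...` pass through untouched, because only lines starting with `#line` followed by a number match the pattern.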