I.e., brainstorming about how to formalize stuff in the TEF format without, hopefully, turning it into an ad-hoc mess like YAML.
I have been using a "-ref" suffix to indicate that a TEF attribute value is a URI reference rather than the literal value, but that has always felt like a hack.
'$' has always felt appropriate for indicating...encoding information.
cool-pages/togos-music$ref: https://www.nuke24.net/music/
nice-quotes/and-i-am$base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u
Could use ';' to separate more info about the value.
nice-quotes/and-i-am$length=36;lang=en;base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u
Alternate ideas:
Funkify the colon, e.g. double it, or :* or something.
foo:* base64
Different character than $, like maybe ^
nice-quotes/and-i-am^length=36;lang=en;base64: UGVvcGxlIGFyZSBzdHVwaWQgYW5kIEkgYW0gYSBwZXJzb24u
That might be easier to spot than the $, actually.
RDF's Turtle and N-Triples syntaxes write typed literals as "encodedstring"^^<datatypeuri>. See https://www.w3.org/TR/swbp-xsch-datatypes/#sec-recs-dt.
Let's go with ^ for now, indicating that what follows
is ;-separated metadata about the value. Entries without =
indicate encodings that were applied, and should be unapplied in reverse order.
E.g., the following are all equivalent, assuming a particular
definition of rot13 and base64:
foo/bar/baz^rot13;base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEu
foo/bar/baz^rot13: Crbcyr ner fghcvq naq V nz n crefba.
foo/bar/baz: People are stupid and I am a person.
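A rough Python sketch of how a reader might handle the ^ suffix. It assumes (my assumption, not settled TEF syntax) that the attribute name ends at the first ': ', that '^' splits the name from its metadata, and that `rot13` and `base64` mean Python's `codecs` ROT13 and standard base64 over UTF-8 text. The `DECODERS` table and the function names are hypothetical.

```python
import base64
import codecs

# Hypothetical decoders for the encoding names used in the examples.
# Each one undoes a single encoding step (str -> str).
DECODERS = {
    "base64": lambda s: base64.b64decode(s).decode("utf-8"),
    "rot13": lambda s: codecs.decode(s, "rot13"),
}


def parse_attribute(line):
    """Split 'name^meta;meta: value' into (name, metadata, encodings, value)."""
    key, value = line.split(": ", 1)            # assumes ': ' ends the key
    name, _, suffix = key.partition("^")
    metadata, encodings = {}, []
    for entry in filter(None, suffix.split(";")):
        if "=" in entry:                         # key=value metadata
            k, v = entry.split("=", 1)
            metadata[k] = v
        else:                                    # bare name: an applied encoding
            encodings.append(entry)
    return name, metadata, encodings, value


def decode_attribute(line):
    """Return (name, metadata, literal value) with all encodings undone."""
    name, metadata, encodings, value = parse_attribute(line)
    # Encodings are listed in the order they were applied,
    # so they are unapplied last-to-first.
    for enc in reversed(encodings):
        value = DECODERS[enc](value)
    return name, metadata, value
```

Under those assumptions, `decode_attribute("foo/bar/baz^rot13;base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEu")`, the `^rot13:` line, and the plain line all come out as `("foo/bar/baz", {}, "People are stupid and I am a person.")`. An unknown encoding name raises `KeyError`, which seems like the right default for a reader that can't safely guess.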
Pipe?
foo/bar/baz|rot13|base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEu
Pro: Very literal interpretation; the value of foo/bar/baz, when piped through rot13 and base64, gives the following string.
Pro: Matches how Smarty (PHP templating library) indicates transformations, which is one of the few things I like about Smarty.
Con: ...maybe I'd like to use pipe for something else?
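The "very literal" reading of the pipe form can be sketched in Python as a left-to-right pipeline for writing and a right-to-left one for reading. The transform table and function names are hypothetical, and the same ROT13 and base64-over-UTF-8 assumptions apply.

```python
import base64
import codecs
from functools import reduce

# Hypothetical transform table: name -> (encode, decode), both str -> str.
TRANSFORMS = {
    "rot13": (lambda s: codecs.encode(s, "rot13"),
              lambda s: codecs.decode(s, "rot13")),
    "base64": (lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
               lambda s: base64.b64decode(s).decode("utf-8")),
}


def pipe_encode(value, names):
    """Pipe the literal value through each named transform, left to right."""
    return reduce(lambda v, n: TRANSFORMS[n][0](v), names, value)


def pipe_decode(encoded, names):
    """Undo the pipeline: apply each inverse transform, right to left."""
    return reduce(lambda v, n: TRANSFORMS[n][1](v), reversed(names), encoded)


def format_attribute(name, value, names):
    """Write 'name|t1|t2: encoded-value'."""
    return f"{name}|{'|'.join(names)}: {pipe_encode(value, names)}"
```

For example, `format_attribute("foo/bar/baz", "People are stupid and I am a person.", ["rot13", "base64"])` produces `foo/bar/baz|rot13|base64: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEu`.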
Each name between the pipes should be a URI, or translatable to a URI.
A full URI must itself be URI-encoded to become part of the attribute string.
':' does not need to be encoded, though, so xmlns-style prefixed names can be used without encoding.
e.g.
# Assuming `xsd:` has been defined as shorthand for `http://www.w3.org/2001/XMLSchema#`:
foo/bar|xsd:base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
# Or, written the long way:
foo/bar|http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
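Both spellings should resolve to the same datatype URI. A minimal sketch, assuming prefixes are declared in a dict like the hypothetical `PREFIXES` below, and that a name only counts as prefixed when it contains an unescaped ':' whose prefix is declared; anything else is percent-decoded with `urllib.parse.unquote`.

```python
from urllib.parse import unquote

# Assumed prefix declarations, mirroring the `xsd:` example.
PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}


def resolve_name(name, prefixes=PREFIXES):
    """Turn one '|'-separated name into a full URI (hypothetical rules)."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        # An unescaped ':' after a declared prefix: expand xmlns-style.
        return prefixes[prefix] + unquote(local)
    # Otherwise treat the whole name as a percent-encoded URI.
    return unquote(name)
```

Checking for the raw ':' before decoding means an escaped `%3A` is never mistaken for a prefix separator, so a full URI whose scheme happens to match a declared prefix still comes through intact when written the long way.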
When written with |, having to encode the datatype URI bothers me for some reason
that it didn't when using $ or ^. Maybe because | isn't a valid URI character,
so it is already 'outside' of the URI, and therefore the things it separates
should be URIs. Well, in that case we could say:
foo/bar|http://www.w3.org/2001/XMLSchema#base64Binary: Q3JiY3lyIG5lciBmZ2hjdnEgbmFxIFYgbnogbiBjcmVmYmEuCg==
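If raw URIs are allowed between pipes, the reader just has to avoid splitting on the first bare ':'. A sketch, assuming the key/value separator is ': ' (colon plus space), which a URI cannot contain since spaces aren't legal URI characters. The function name is hypothetical.

```python
def split_piped_attribute(line):
    """Split 'name|uri|uri: value' into (name, [uri, ...], value).

    Assumes the key ends at the first ': ' (colon followed by a space), so
    the bare ':' in 'http://...' is left alone, and that '|' never occurs
    inside a URI (RFC 3986 does not allow it unescaped).
    """
    key, value = line.split(": ", 1)
    name, *transforms = key.split("|")
    return name, transforms, value
```

Splitting on the first bare ':' instead would cut the key off at `foo/bar|http`.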
TEF needs:
- a pseudo-header to indicate parsing rules for all that follows,
  so that multiple files can be concatenated together and
  still be handled appropriately.
  - Maybe `=tef:file`? That is what I have always thought of, anyway. `=tef:begin` and `=tef:end` would allow nested sub-regions. The explicit `=tef:end` could also be used to signal that the following line (which might be something like `#format ...`) is not part of the previous entry's content.
- a way to indicate dialect.
  - Maybe this is what `#format` is for, though that doesn't seem to play nicely with concatenated-together TEF files.
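One way `=tef:begin` / `=tef:end` nesting could be tracked, sketched in Python with a simple depth counter. These marker semantics are my assumption; nothing here is defined TEF behavior yet, and the function name is hypothetical.

```python
def region_depths(lines):
    """Yield (depth, line) for each line, tracking nested regions.

    Hypothetical rules: a line whose first token is '=tef:begin' opens a
    region and '=tef:end' closes the innermost open one. Marker lines are
    reported at the depth of the region that contains them.
    """
    depth = 0
    for line in lines:
        tokens = line.split()
        head = tokens[0] if tokens else ""
        if head == "=tef:begin":
            yield depth, line
            depth += 1
        elif head == "=tef:end":
            if depth == 0:
                raise ValueError("=tef:end without a matching =tef:begin")
            depth -= 1
            yield depth, line
        else:
            yield depth, line
    if depth:
        raise ValueError(f"{depth} unclosed =tef:begin region(s)")
```

A reader could treat the depth change at `=tef:end` as the point where the previous entry's content stops, so a following `#format ...` line isn't swallowed into it.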
I guess if I want to be able to differentiate different versions of the format within one big file that is made by concatenating other files, each of which could have a different format... I need to come up with a meta-format that describes how that's supposed to work.
Maybe the meta-format could look like TEF, which would make sense as TEF is already trying to solve the text-files-within-a-text-file problem, and it would be silly to solve the next level of that same problem in a completely different way.
The point of this train of thought, I think, is to 'flatten' that next
level by including metadata about the format inside each TEF file,
so that files can be 'self-describing'. Sort of like an XML `<!DOCTYPE ...>` declaration.
=tef:sourced
# Provide metadata about the TEF document
# that starts *after* this entry, or possibly
# where this entry's content would be.
# tef:version applies to this entry, but will serve as the default for later ones.
# Version '555' is chosen as an example 'not actually defined' version,
# for brainstorming purposes.
tef:version: 555.1.1
source-document-name: foo.tef
source-line: 101
# tef:sourced entries logically have no content.
# If there is 'content', it is actually the beginning of the included document.
=tef:file
# tef:file provides metadata about the stream that it appears in,
# and it serves the same purpose as bare 'file-level headers' would,
# but can appear more than once.
tef:version: 555.1.0
tefns:xsd: http://www.w3.org/2001/XMLSchema#
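The "applies to this entry, but serves as the default for later ones" rule for `tef:version`, plus `tefns:` prefix declarations, could be carried through a stream roughly like this. Entries are represented as hypothetical `(kind, attrs)` pairs, since actually parsing TEF is out of scope here.

```python
def effective_versions(entries):
    """Track tef:version and tefns: declarations across entries.

    `entries` is a list of (kind, attrs) pairs, e.g. ("tef:sourced", {...}).
    Returns [(kind, version in effect)] and the final prefix map.
    """
    version, prefixes, result = None, {}, []
    for kind, attrs in entries:
        # A tef:version on an entry applies to that entry and becomes the
        # default for every later entry that doesn't declare its own.
        version = attrs.get("tef:version", version)
        for key, uri in attrs.items():
            if key.startswith("tefns:"):
                prefixes[key[len("tefns:"):]] = uri
        result.append((kind, version))
    return result, prefixes
```

Applied to the two entries above plus one later hypothetical entry with no `tef:version`, the later entry inherits `555.1.0` from `=tef:file`.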
Maybe I could be more explicit about the subject.
=tef:parsing-directives
# The tef:parsing-directives entry's content
# is a bunch of parsing directives, to be
# applied as they are encountered in the source stream.
# Indicate the TEF version of the following content.
# This should also reset any options.
#tef:version 555.1.0
#tefns:dcterms http://purl.org/dc/terms/
# The '#line' directive has the exact same meaning as it would
# to the C preprocessor; i.e. the following line should be considered
# line 123 from the indicated source file.
#line 123 "mystuff.tef"
=tef:about this-document
# This entry describes this TEF document
# Actual attributes are made-up examples.
title: Very cool TEF document with metadata and parsing directives
processor/source-to-turtle: http://example.com/my-tef-to-turtle-converter
I kind of like that. tef:parsing-directives explicitly gives the parser
a series of instructions without shoehorning them into attributes,
while tef:about this-document provides a way to give metadata about the
document in the form of attributes. It also provides a way to talk about
other things, serving the same role as an rdf:about attribute in an
RDF/XML document.
See hash-line-directive about #line.
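The directives in the `=tef:parsing-directives` example could be interpreted by something like the following Python sketch. It assumes that '#' followed directly by a name is a directive while '# ' starts a comment, that `#tef:version` clears any `tefns:` prefixes ("reset any options"), and that `#line` behaves as in the C preprocessor. The function name and return shape are hypothetical.

```python
import shlex


def run_directives(lines, filename="<stream>"):
    """Apply '#name args' directives; return located content lines.

    Returns ([(filename, line_no, text), ...], version, prefixes).
    """
    version, prefixes, located = None, {}, []
    next_no = 1  # line number the next physical line will be given
    for text in lines:
        this_no, next_no = next_no, next_no + 1
        if text == "#" or text.startswith("# "):
            continue                               # plain comment
        if text.startswith("#"):
            name, *args = shlex.split(text[1:])    # handles "quoted names"
            if name == "tef:version":
                version, prefixes = args[0], {}    # new version resets options
            elif name.startswith("tefns:"):
                prefixes[name[len("tefns:"):]] = args[0]
            elif name == "line":
                # As in the C preprocessor: the *next* line is line N
                # (of the named file, if one is given).
                next_no = int(args[0])
                if len(args) > 1:
                    filename = args[1]
            continue
        located.append((filename, this_no, text))
    return located, version, prefixes
```

Fed the directive lines from the example followed by `=tef:about this-document`, it would report that entry as line 123 of `mystuff.tef`, with `dcterms` declared and version `555.1.0` in effect.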