
Directory Serialization

TOGoS edited this page Sep 13, 2010 · 5 revisions

Directories are converted to blobs by serializing them as RDF/XML in a specific, canonical way.

Here is an example of a serialized directory:

<Directory xmlns="http://ns.nuke24.net/ContentCouch/" xmlns:dc="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<entries rdf:parseType="Collection">
		<DirectoryEntry>
			<name>build.sh</name>
			<size>121</size>
			<target rdf:resource="urn:sha1:RDPOYKPYC4SADHPWTCSBI5GUMTCCMDBV"/>
			<targetType>Blob</targetType>
			<dc:modified>2009-02-20 09:54:17</dc:modified>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>ext-lib</name>
			<target rdf:resource="x-parse-rdf:urn:sha1:CBKXM6DYVMYNYF6YOVQQIRX5LAGWXRBL"/>
			<targetType>Directory</targetType>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>somethin.jpg</name>
			<size>889793</size>
			<target rdf:resource="urn:sha1:ZZZM7CV24GRYUNZFONLZP3RFVZ2MFJTX"/>
			<targetType>Blob</targetType>
			<dc:modified>2008-08-03 21:46:13</dc:modified>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>src</name>
			<target rdf:resource="x-parse-rdf:urn:sha1:PTW4FTUBJEJQ2W6ASH3OXEJ5MU6AXWMW"/>
			<targetType>Directory</targetType>
		</DirectoryEntry>
	</entries>
</Directory>

Some of the specifics of this RDF-ification are:

  • Directory entries are sorted by name
  • RDF properties (the child elements of each entry) are sorted by their full name (e.g. dc:modified expands to http://purl.org/dc/terms/modified, and therefore comes after name, which expands to http://ns.nuke24.net/ContentCouch/name)
  • XML namespace declarations are sorted by abbreviation – the default xmlns comes first, as in the example above, and xmlns:dc comes before xmlns:rdf
  • Each level of sub-elements is indented one tab
  • Certain namespaces always map to a standard set of abbreviations (but these should only be listed if the namespaces are actually used):
    • xmlns="http://ns.nuke24.net/ContentCouch/" – the default namespace
    • xmlns:dc="http://purl.org/dc/terms/"
    • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  • Modification times of subdirectories are not stored: the mtime of a directory is dubiously meaningful, and including it only increases the chance that otherwise identical directories will be stored as separate objects.
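
The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not ContentCouch's actual code: the function names, the dict-based entry format, the code-point ordering of names, and the trailing newline are all assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr

CC_NS = "http://ns.nuke24.net/ContentCouch/"
DC_NS = "http://purl.org/dc/terms/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Standard abbreviation for each namespace; "" is the default namespace.
PREFIXES = {CC_NS: "", DC_NS: "dc", RDF_NS: "rdf"}


def entry_properties(entry):
    """Return (full property URI, XML text) pairs, sorted by full URI."""
    is_dir = entry["type"] == "Directory"
    target = ("x-parse-rdf:" if is_dir else "") + entry["target"]
    props = [
        (CC_NS + "name", "<name>%s</name>" % escape(entry["name"])),
        (CC_NS + "target", "<target rdf:resource=%s/>" % quoteattr(target)),
        (CC_NS + "targetType", "<targetType>%s</targetType>" % entry["type"]),
    ]
    if not is_dir:
        # Size and mtime are only written for blobs, never for subdirectories.
        props.append((CC_NS + "size", "<size>%d</size>" % entry["size"]))
        props.append((DC_NS + "modified",
                      "<dc:modified>%s</dc:modified>" % entry["modified"]))
    return sorted(props)


def serialize_directory(entries):
    """Serialize a list of entry dicts (hypothetical format) to RDF/XML."""
    # Only declare namespaces that are actually used.
    used = {CC_NS, RDF_NS}
    if any(e["type"] != "Directory" for e in entries):
        used.add(DC_NS)
    # Sort declarations by abbreviation; the empty default prefix sorts first.
    decls = sorted((PREFIXES[ns], ns) for ns in used)
    attrs = " ".join(
        'xmlns%s="%s"' % (":" + prefix if prefix else "", ns)
        for prefix, ns in decls)
    lines = ["<Directory %s>" % attrs,
             '\t<entries rdf:parseType="Collection">']
    # Entries are sorted by name; each nesting level adds one tab.
    for entry in sorted(entries, key=lambda e: e["name"]):
        lines.append("\t\t<DirectoryEntry>")
        lines.extend("\t\t\t" + xml for _, xml in entry_properties(entry))
        lines.append("\t\t</DirectoryEntry>")
    lines += ["\t</entries>", "</Directory>"]
    return "\n".join(lines) + "\n"
```

Calling serialize_directory with entries in any input order yields the same text, which is the point of the ordering rules.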

ContentCouch should be able to interpret RDF that is not encoded exactly according to these rules, but by following them we can ensure that identical directories are always encoded the same way, and thereby reduce the number of different representations of the same object that have to be stored.
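
To see why byte-identical encoding matters, here is a small sketch that hashes two serializations of the same information. It assumes that a urn:sha1 identifier is the Base32-encoded SHA-1 digest of the blob, which is consistent with the 32-character identifiers in the example above; sha1_urn is a hypothetical helper, not part of ContentCouch.

```python
import base64
import hashlib


def sha1_urn(blob: bytes) -> str:
    """Hypothetical helper: SHA-1 digest of the blob, Base32-encoded."""
    digest = hashlib.sha1(blob).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


# The same two properties of one entry, written in two different orders.
canonical = b"<name>src</name>\n<targetType>Directory</targetType>\n"
reordered = b"<targetType>Directory</targetType>\n<name>src</name>\n"

# Equivalent RDF, but different bytes, so the two blobs get different URNs.
assert sha1_urn(canonical) != sha1_urn(reordered)
# Identical bytes always map to the same URN.
assert sha1_urn(canonical) == sha1_urn(bytes(canonical))
```

Under that assumption, any deviation in ordering or whitespace would cause the same directory to be stored under a second identifier.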
