Directory Serialization
TOGoS edited this page Sep 13, 2010 · 5 revisions
Directories are stored as blobs by serializing them to RDF/XML in a specific, canonical way.
Here is an example of a serialized directory:
<Directory xmlns="http://ns.nuke24.net/ContentCouch/" xmlns:dc="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<entries rdf:parseType="Collection">
		<DirectoryEntry>
			<name>build.sh</name>
			<size>121</size>
			<target rdf:resource="urn:sha1:RDPOYKPYC4SADHPWTCSBI5GUMTCCMDBV"/>
			<targetType>Blob</targetType>
			<dc:modified>2009-02-20 09:54:17</dc:modified>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>ext-lib</name>
			<target rdf:resource="x-parse-rdf:urn:sha1:CBKXM6DYVMYNYF6YOVQQIRX5LAGWXRBL"/>
			<targetType>Directory</targetType>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>somethin.jpg</name>
			<size>889793</size>
			<target rdf:resource="urn:sha1:ZZZM7CV24GRYUNZFONLZP3RFVZ2MFJTX"/>
			<targetType>Blob</targetType>
			<dc:modified>2008-08-03 21:46:13</dc:modified>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>src</name>
			<target rdf:resource="x-parse-rdf:urn:sha1:PTW4FTUBJEJQ2W6ASH3OXEJ5MU6AXWMW"/>
			<targetType>Directory</targetType>
		</DirectoryEntry>
	</entries>
</Directory>
Some of the specifics of this rdf-ification are:
- Directory entries are sorted by name
- RDF attributes are sorted by their full name (e.g. dc:modified expands to http://purl.org/dc/terms/modified, and therefore comes after name, which expands to http://ns.nuke24.net/ContentCouch/name)
- XML namespace declarations are sorted by abbreviation – xmlns:dc comes before xmlns:rdf
- Each level of sub-elements is indented one tab
- Certain namespaces always map to a standard set of abbreviations (but these should only be listed if the namespaces are actually used):
  - xmlns="http://ns.nuke24.net/ContentCouch/" – the default namespace
  - xmlns:dc="http://purl.org/dc/terms/"
  - xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
- Modification times of subdirectories are not stored, since the mtime of a directory is of dubious meaning, and including it only increases the chance that otherwise identical directories will be stored as separate objects.
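The rules above can be sketched as a small serializer. This is an illustrative Python sketch, not ContentCouch's actual implementation: the function names and the dictionary keys (name, target, target_type, size, modified) are hypothetical, and it assumes plain string comparison for sorting, newline-separated lines, and no XML declaration.

```python
from xml.sax.saxutils import escape, quoteattr

CC = "http://ns.nuke24.net/ContentCouch/"
DC = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Standard abbreviations; "" stands for the default namespace.
PREFIXES = {CC: "", DC: "dc", RDF: "rdf"}


def entry_properties(entry):
    """Return (namespace, local name, kind, value) tuples sorted by full name."""
    props = [
        (CC, "name", "text", entry["name"]),
        (CC, "target", "resource", entry["target"]),
        (CC, "targetType", "text", entry["target_type"]),
    ]
    if "size" in entry:
        props.append((CC, "size", "text", str(entry["size"])))
    # Modification times of subdirectories are deliberately not stored.
    if "modified" in entry and entry["target_type"] != "Directory":
        props.append((DC, "modified", "text", entry["modified"]))
    return sorted(props, key=lambda p: p[0] + p[1])


def serialize_directory(entries):
    """Serialize a list of entry dicts (hypothetical input shape) to XML text."""
    used = {CC, RDF}  # element names and rdf:parseType / rdf:resource
    body = []
    for entry in sorted(entries, key=lambda e: e["name"]):
        body.append("\t\t<DirectoryEntry>")
        for ns, local, kind, value in entry_properties(entry):
            used.add(ns)
            tag = f"{PREFIXES[ns]}:{local}" if PREFIXES[ns] else local
            if kind == "resource":
                body.append(f"\t\t\t<{tag} rdf:resource={quoteattr(value)}/>")
            else:
                body.append(f"\t\t\t<{tag}>{escape(value)}</{tag}>")
        body.append("\t\t</DirectoryEntry>")
    # Declare only the namespaces in use, sorted by abbreviation.
    decls = []
    for ns in sorted(used, key=lambda n: PREFIXES[n]):
        attr = f"xmlns:{PREFIXES[ns]}" if PREFIXES[ns] else "xmlns"
        decls.append(f"{attr}={quoteattr(ns)}")
    lines = [f"<Directory {' '.join(decls)}>", '\t<entries rdf:parseType="Collection">']
    lines += body
    lines += ["\t</entries>", "</Directory>"]
    return "\n".join(lines)


example = [
    {"name": "src", "target_type": "Directory", "modified": "2009-01-01 00:00:00",
     "target": "x-parse-rdf:urn:sha1:PTW4FTUBJEJQ2W6ASH3OXEJ5MU6AXWMW"},
    {"name": "build.sh", "target_type": "Blob", "size": 121,
     "modified": "2009-02-20 09:54:17",
     "target": "urn:sha1:RDPOYKPYC4SADHPWTCSBI5GUMTCCMDBV"},
]
print(serialize_directory(example))
```

Sorting on the concatenated namespace and local name is what places dc:modified after the ContentCouch properties, and building the set of used namespaces while emitting entries is one simple way to avoid declaring xmlns:dc when no entry carries a modification time.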
ContentCouch should be able to interpret RDF that is not encoded exactly by these standards, but by following them we can ensure that identical directories are encoded the same way, and thereby reduce the number of different representations of the same object that have to be stored.
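To illustrate why a single encoding matters, here is a minimal Python sketch. It assumes that the urn:sha1 identifiers are Base32-encoded SHA-1 digests of the blob's bytes, as their 32-character form suggests; toy_serialize is a hypothetical stand-in for the real serializer and only writes out entry names.

```python
import base64
import hashlib


def sha1_urn(blob: bytes) -> str:
    """Name a blob by the Base32-encoded SHA-1 of its bytes (assumed scheme)."""
    digest = hashlib.sha1(blob).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def toy_serialize(names, canonical):
    """Stand-in serializer: one <name> element per line, optionally sorted."""
    ordered = sorted(names) if canonical else list(names)
    return "\n".join(f"<name>{n}</name>" for n in ordered).encode("utf-8")


listing_a = ["src", "build.sh", "ext-lib"]
listing_b = ["build.sh", "ext-lib", "src"]  # same directory, different listing order

# Without a fixed order, one directory yields two distinct blobs.
loose_a = sha1_urn(toy_serialize(listing_a, canonical=False))
loose_b = sha1_urn(toy_serialize(listing_b, canonical=False))
print(loose_a == loose_b)

# With a fixed order, both listings collapse to a single blob.
strict_a = sha1_urn(toy_serialize(listing_a, canonical=True))
strict_b = sha1_urn(toy_serialize(listing_b, canonical=True))
print(strict_a == strict_b)
```

Because a blob is named by a hash of its bytes, any difference in ordering or whitespace would produce a new identifier and a second stored copy of what is logically the same directory.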