
The Mediachain testnet

Mediachain Labs has deployed a publicly accessible test network.

The primary purpose of the testnet is to give developers an opportunity to get a feel for the broader architecture of Mediachain as a network and begin experimenting with a shared metadata set.

We encourage all contributions, especially questions! Because the components of the Mediachain network are under constant development, it can be hard to know where to begin. Please reach out to us on Slack if you have questions about how the pieces fit together, or if you have trouble interacting with the testnet.

Component Breakdown

As was detailed in RFCs 1 and 2, as well as the Mediachain Labs Blog, Mediachain is broken into a few core services:

  • Transactor: Responsible for maintaining consensus on the journal
  • Indexer: Responsible for making data within Mediachain explorable via text and media-based search
  • CLI: Used to ingest and retrieve data

Mediachain Labs will be administering its own instances of the Transactor and Indexer for public use, so developers needn't worry about running them on their own.

For those interested in running their own testnet, please see the self-hosting documentation.

Known Limitations

  • Because IPLD isn't quite production ready yet, we're using Amazon DynamoDB (with multiaddress, forward-compatible content-addressed keys) as a stand-in. Unstructured data (raw metadata and thumbnails) is still published to IPFS, unless the CLI is passed the --disable-ipfs flag.

  • Maximum write rate for the network is currently limited to ~20 concurrent writers (approx 150 artefact insertions/sec)

  • Because Raft is not Byzantine fault-tolerant, we're not accepting third-party transactors into the quorum at this time (blog post about this soon)

  • Only creation and reference cells are supported in the client, so you can create new entities but can't update existing ones (this will be addressed very soon)

  • Translators are currently "black boxes" in Python, i.e. there's no DSL
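The content-addressed keys mentioned above can be sketched in a few lines of Python. This is an illustration of content addressing in general, not the actual Mediachain key scheme: we derive a sha2-256 multihash from the record bytes and use its base58 encoding as the storage key, with a plain dict standing in for DynamoDB.

```python
import hashlib

# Base58btc alphabet, as used by IPFS-style multihashes.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = ALPHABET[r] + out
    # Leading zero bytes encode as leading '1' characters.
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out

def content_key(record: bytes) -> str:
    # sha2-256 multihash: 0x12 (hash function code) + 0x20 (digest length) + digest.
    digest = hashlib.sha256(record).digest()
    return b58encode(b"\x12\x20" + digest)

store = {}  # stand-in for the real datastore
record = b'{"title": "Alfred Hitchcock"}'
key = content_key(record)
store[key] = record
print(key)  # deterministic; sha2-256 multihashes always base58-encode to "Qm..."
```

Because the key is derived from the content, the same record always maps to the same key, which is what makes the scheme forward-compatible with IPLD-style addressing.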

Client

You probably want to start here! The client is installable with pip, preferably into a virtualenv:

$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install mediachain-client

Make sure you have a recent pip, at least 8.x. If pip --version reports a lower version, you can update pip with itself: pip install -U pip.

You will also need to install IPFS and run ipfs daemon.

# OS X
$ wget https://dist.ipfs.io/go-ipfs/v0.4.2/go-ipfs_v0.4.2_darwin-amd64.tar.gz
# Linux
$ wget https://dist.ipfs.io/go-ipfs/v0.4.2/go-ipfs_v0.4.2_linux-amd64.tar.gz

$ tar xvfz go-ipfs_v0.4.2_*-amd64.tar.gz
$ mv go-ipfs/ipfs /usr/local/bin/ipfs
$ ipfs daemon
Initializing daemon...
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/192.168.0.2/tcp/38463
Swarm listening on /ip4/192.168.1.248/tcp/4001
Swarm listening on /ip6/::1/tcp/4001
API server listening on /ip4/127.0.0.1/tcp/5001
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready

Reading

You should then be able to immediately pull down the statement chain for an artefact:

$ mediachain get QmVwwiMVDH7umVSd3vdbwGS3WmFX2axhJCTpbfU4LMPcE8

{
  "metaSource": {
    "@link": "Qmcbo67Ycv6rCREhQYoYeGJzgAJiCZDfyEdtHqdbmTsv6T"
  },
  "meta": {
    "translator": {
      "link": {
        "@link": "QmbDLhgDTYUM88e34P8PgycwwrtSdeZewkv2gLCME3o3mT"
      },
      "id": "getty@QmbDLhgDTYUM88e34P8PgycwwrtSdeZewkv2gLCME3o3mT"
    },
    "raw_ref": {
      "@link": "QmPbhJZazM6vLiNHhjfDGUZnUQ6TzxuYydg8F2mrQHcvxE"
    },
    "data": {
      "artist": "Michael Ochs Archives",
      "collection_name": "Moviepix",
      "title": "Alfred Hitchcock",
      "caption": "LOS ANGELES -  MAY 22, 1956: Director Alfred Hitchcock with actor Jimmy Stewart and actress Doris Day at the premier of 'The Man Who Knew Too Much' in Los Angeles, California. (Photo by Earl Leaf/Michael Ochs Archives/Getty Images)",
      "editorial_source": "Moviepix",
      "keywords": [
        "Vertical",
        "Black And White",
        ...
      ],
      "date_created": "1956-05-22T00:00:00-07:00",
      "_id": "getty_451356503",
      "thumbnail": {
        "binary_asset": true,
        "link": {
          "@link": "QmRWxQ7aHGZX8396ARGwt4GXtNaZyyKR6CziwWBSLryiBy"
        },
        "uri": "http://cache1.asset-cache.net/gc/451356503-director-alfred-hitchcock-with-actor-jimmy-gettyimages.jpg?v=1&c=IWSAsset&k=2&d=GkZZ8bf5zL1ZiijUmxa7QUFN5yfBDADlXbpJ1E1eyY2Njoqu%2f2BqqxBjwB509RuIt8%2b8q1YrWpXJ7oa5%2fEsS%2bA%3d%3d&b=Ng=="
      }
    }
  },
  "type": "artefact",
  "entity": {
    "meta": {
      "translator": {
        "link": {
          "@link": "QmbDLhgDTYUM88e34P8PgycwwrtSdeZewkv2gLCME3o3mT"
        },
        "id": "getty@QmbDLhgDTYUM88e34P8PgycwwrtSdeZewkv2gLCME3o3mT"
      },
      "data": {
        "name": "Michael Ochs Archives"
      }
    },
    "type": "entity"
  }
}

This resolves the chain head pointer, retrieves the parent cells, and folds over them to produce a complete metadata representation as nicely formatted JSON. Straightforward.
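The fold itself can be sketched as follows. The cell types and field names here are hypothetical (the actual cell schema is defined by the transactor), but the shape of the operation is the same: start from the creation cell and apply each later update on top.

```python
# Illustrative fold over a cell chain: later cells override earlier fields.
# The cell layout below is hypothetical, not the actual Mediachain wire format.
def fold_chain(cells):
    """Merge a chain of cells (oldest first) into one metadata dict."""
    record = {}
    for cell in cells:
        record.update(cell.get("meta", {}).get("data", {}))
    return record

chain = [
    {"type": "creation", "meta": {"data": {"title": "Alfred Hitchcock", "artist": "unknown"}}},
    {"type": "update", "meta": {"data": {"artist": "Michael Ochs Archives"}}},
]
print(fold_chain(chain))
# → {'title': 'Alfred Hitchcock', 'artist': 'Michael Ochs Archives'}
```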

A set of sample ids to query is available on IPFS. It can be retrieved with ipfs get QmXPdt9wmy8ZvpxqgTXoHXcWd7KpxvWGhag8ZunxkRPzEL, or via the HTTP gateway at https://ipfs.io/ipfs/QmXPdt9wmy8ZvpxqgTXoHXcWd7KpxvWGhag8ZunxkRPzEL.

For a much larger set, use ipfs get QmWusc71Q4M4dB1UJBzLoRFyFikQ46JkA1xbMX3SxDUibh. This file contains roughly 1 million record ids; due to its size, it's much more efficient to retrieve it with the ipfs tool than via the gateway.

Both files contain one id per line, each resolvable with the mediachain get <id> command.
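A simple way to resolve a batch is to loop over the file. The sketch below prints each command via echo so it runs without the testnet; drop the echo to actually fetch (the sample_ids.txt name is illustrative).

```shell
# Resolve every id in a downloaded sample file, one `mediachain get` per line.
# `echo` prints the command instead of running it; remove it to fetch for real.
printf '%s\n' QmVwwiMVDH7umVSd3vdbwGS3WmFX2axhJCTpbfU4LMPcE8 > sample_ids.txt
while read -r id; do
  echo mediachain get "$id"
done < sample_ids.txt
```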

Writing

A core concept in Mediachain is the notion of versioned, lightweight, nondestructive schema translators (see this blog post for a somewhat outdated background treatment). This means that you can import data in arbitrary formats right away without extensive markup or transformation, and then re-process it later with a new version of the translator that "knows" more about the underlying data.

The translators are versioned by the IPFS multihash of the working tree, similar to git trees. This means translators can also be published and retrieved through IPFS.
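A minimal sketch of this tree-style versioning, under simplifying assumptions (plain sha2-256 over sorted entries, rather than the actual IPFS tree hashing):

```python
import hashlib
import os

def tree_hash(path: str) -> str:
    """Hash a directory by hashing its (name, content-hash) entries, git-tree style.

    Simplified illustration: any change to any file changes the tree hash,
    which is what makes the hash usable as a translator version id."""
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            entries.append((name, tree_hash(full)))
        else:
            with open(full, "rb") as f:
                entries.append((name, hashlib.sha256(f.read()).hexdigest()))
    blob = "\n".join("%s %s" % (h, name) for name, h in entries).encode()
    return hashlib.sha256(blob).hexdigest()
```

Editing any file under the tree yields a new hash, and therefore a new translator version, while the old version remains addressable by its old hash.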

WARNING: right now, translators are simply Python code that's imported and executed directly in the main process, without sandboxing. Never execute translators you don't trust! The long-term vision for translators is a data-oriented DSL that can be executed safely even when untrusted, but we're not quite there yet.

$ mediachain ingest translator_name@Qm... target_directory

Please see this page for more on writing and using a translator.

Indexer

The testnet includes a special client known as the Indexer, which ingests Mediachain records as they're written to the blockchain and creates a query index accessible via a web API. A web-based UI is in progress, but in the meantime you can issue queries directly by sending JSON data to http://indexer.mediachain.io/search:

$ curl indexer.mediachain.io/search -d '{"q": "film"}'

As you write new records, they should appear in the search results when you search for keywords contained in their metadata.
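The same query can be issued from Python with the standard library. The endpoint is taken from the doc above; the request is built but not sent here, so the sketch runs offline (uncomment the last line on a machine with network access):

```python
import json
import urllib.request

# Build a POST to the public indexer's search endpoint.
query = {"q": "film"}
req = urllib.request.Request(
    "http://indexer.mediachain.io/search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_method())  # a Request with a body defaults to POST
# results = json.load(urllib.request.urlopen(req))  # requires network access
```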

If you're interested in running the indexer locally, please see the self-hosting instructions. Note that you don't have to run your own testnet to have a local indexer. The default configuration will connect to the public testnet and create a local index that you can query.

Transactor

The truly adventurous can start their own testnet. This requires spinning up at least 3 transactors, a facade server (Copycat POJO -> gRPC translation), and a datastore. Check out the overview for compilation and launch instructions. You can also get a good feel for the process by looking at the provisioning scripts. Come to #tech in our Slack if you want to try, and we'll be happy to help!