- Generate a list of unique article_ids
- Aggregate attributes associated with each article_id
Run the script with python3, since the base python2.7 will fail with errors:
$ python3 unique_article.py
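The two steps listed above can be sketched roughly as follows. This is not the contents of unique_article.py. It is a minimal illustration that assumes each record is one whitespace-separated line whose first field is the article_id, which may not match the real dump format. The names aggregate_by_article_id and read_gzip_lines are hypothetical.

```python
import gzip
from collections import defaultdict


def aggregate_by_article_id(lines):
    """Group the remaining fields of each line under its article_id.

    Assumption: the first whitespace-separated field is the article_id.
    """
    attributes = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        article_id, rest = fields[0], fields[1:]
        attributes[article_id].append(rest)
    # A plain dict keeps the ids in first-seen order.
    return dict(attributes)


def read_gzip_lines(path):
    """Yield text lines from a gzip file one at a time (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line


# Small in-memory example; for a real file, pass read_gzip_lines(path) instead.
sample = ["12 a b", "7 c", "12 d"]
grouped = aggregate_by_article_id(sample)
unique_ids = list(grouped)  # the keys are the unique article_ids
```

Streaming the gzip file line by line avoids decompressing the whole dump into memory, although the grouped dictionary itself still grows with the number of records.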
$ gzip -cd enwiki-20080103.good.gz | head -n 1
$ gzip -cd unique_all_articleids_3.gz | head -n 10