Wikipedia-Edits-Distributed-Computing

  • Generate a list of unique article_ids
  • Aggregate attributes associated with each article_id
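The two steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes each revision record begins with a whitespace-separated line of the form `REVISION article_id rev_id title timestamp user user_id`, which this README does not confirm. The helper names `parse_revision`, `unique_article_ids`, and `aggregate_by_article`, and the choice of aggregated attributes, are hypothetical.

```python
from collections import defaultdict


def parse_revision(line):
    """Return (article_id, title, timestamp, user) for a REVISION line, else None.

    Assumed layout (an assumption, not taken from this README):
    REVISION article_id rev_id title timestamp user user_id
    """
    fields = line.split()
    if len(fields) < 7 or fields[0] != "REVISION":
        return None
    return fields[1], fields[3], fields[4], fields[5]


def unique_article_ids(lines):
    """Collect the distinct article_ids seen in REVISION lines."""
    ids = set()
    for line in lines:
        record = parse_revision(line)
        if record is not None:
            ids.add(record[0])
    return ids


def aggregate_by_article(lines):
    """Group a few illustrative attributes under each article_id."""
    stats = defaultdict(lambda: {"title": None, "revisions": 0, "users": set()})
    for line in lines:
        record = parse_revision(line)
        if record is None:
            continue  # skip non-REVISION lines
        article_id, title, _timestamp, user = record
        entry = stats[article_id]
        entry["title"] = title
        entry["revisions"] += 1
        entry["users"].add(user)
    return dict(stats)
```

Both functions accept any iterable of text lines, so they can be fed from an open file handle without loading the whole dump into memory.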

Remember

Run the script with python3:

$ python3 unique_article.py

Otherwise the default python2.7 interpreter will fail with errors.
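One way to make this mistake fail fast is a version guard at the top of the script. The sketch below is hypothetical and not part of `unique_article.py` as far as this README shows; the helper name `is_python3` is an assumption. It also only helps if the file still parses under Python 2, since a syntax error would be raised before the guard runs.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or newer."""
    return version_info[0] >= 3


# Stop with a readable message instead of an obscure traceback.
if not is_python3():
    sys.exit("This script requires Python 3; run it with python3.")
```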

How to read the head of the gzipped files

$ gzip -cd enwiki-20080103.good.gz | head -n 1
$ gzip -cd unique_all_articleids_3.gz | head -n 10
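The same peek can be done from Python without decompressing the whole file. The following is a small sketch using only the standard library; the function name `gz_head` is hypothetical, and it assumes the files are UTF-8 text (undecodable bytes are replaced rather than raising).

```python
import gzip
from itertools import islice


def gz_head(path, n=10):
    """Return the first n lines of a gzip-compressed text file."""
    # "rt" opens the stream in text mode; islice stops reading after n lines.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `gz_head("enwiki-20080103.good.gz", 1)` mirrors the first command above, and `gz_head("unique_all_articleids_3.gz", 10)` mirrors the second.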
