Releases · anwala/bloc

For the `sim` and `top_ngrams` subcommands: patched a key error raised when an empty BLOC string is encountered
Implemented a default `--ngram` value for when `--token-pattern` is set to bigram or word

31 Oct 16:20

anwala

v1.1.0

7192e87

bloc-v1.1.0

Major updates: implemented `sim` and `top_ngrams` subcommands:

`sim` (compare the similarity across multiple users):

The following command generates BLOC strings for seven accounts: @FoxNews, @CNN, @POTUS, @SpeakerPelosi, @GOPLeader, @GenerateACat, and @storygraphbot. It then tokenizes each string using pauses ([^□⚀⚁⚂⚃⚄⚅. |()*]+|[□⚀⚁⚂⚃⚄⚅.]), generates TF-IDF vectors for all accounts using the BLOC words as features, computes the cosine similarity of every pair of accounts along with the average over all pairs, and writes the output to accounts_sim.jsonl:

$ bloc sim -o accounts_sim.jsonl --token-pattern=word --bloc-alphabets action content_syntactic change -m 4 --bearer-token="foo" FoxNews CNN POTUS SpeakerPelosi GOPLeader GenerateACat storygraphbot
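The tokenization step can be tried in isolation. The following is a minimal sketch that applies the pause pattern quoted above with Python's `re.findall`; the function name `tokenize_bloc` and the sample BLOC string are made up for illustration and are not taken from the bloc source.

```python
import re

# Pause-based token pattern quoted in the release notes.
PAUSE_PATTERN = r"[^□⚀⚁⚂⚃⚄⚅. |()*]+|[□⚀⚁⚂⚃⚄⚅.]"


def tokenize_bloc(bloc_string):
    """Split a BLOC string into words.

    The first alternative matches a run of characters outside the excluded
    set; the second matches a single symbol from the second character class,
    so each such symbol becomes its own token. Characters matched by neither
    alternative (space, |, (, ), *) are skipped.
    """
    return re.findall(PAUSE_PATTERN, bloc_string)


# Hypothetical BLOC string, invented for illustration only.
sample = "T⚀Ut□TT⚁Emt"
tokens = tokenize_bloc(sample)
print(tokens)
```

With this pattern, separators such as spaces and `|` never appear in the token list, while each symbol from the second character class is kept as a one-character word.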

Partial output of cosine similarity values across all pairs of accounts in descending order:

  ...
  Cosine sim,
  0.9325: FoxNews vs. CNN
  0.8841: POTUS vs. SpeakerPelosi
  0.6516: SpeakerPelosi vs. GOPLeader
  0.5752: CNN vs. POTUS
  0.5680: POTUS vs. GOPLeader
  0.5023: FoxNews vs. POTUS
  0.3969: CNN vs. SpeakerPelosi
  0.3862: CNN vs. GOPLeader
  0.3483: FoxNews vs. SpeakerPelosi
  0.2945: FoxNews vs. GOPLeader
  0.2590: POTUS vs. GenerateACat
  0.2123: GOPLeader vs. GenerateACat
  0.2041: SpeakerPelosi vs. GenerateACat
  0.1587: CNN vs. GenerateACat
  0.1540: SpeakerPelosi vs. storygraphbot
  0.1403: FoxNews vs. GenerateACat
  0.1386: POTUS vs. storygraphbot
  0.1303: GOPLeader vs. storygraphbot
  0.0724: GenerateACat vs. storygraphbot
  0.0480: CNN vs. storygraphbot
  0.0386: FoxNews vs. storygraphbot
  ------
  0.3379: Average cosine sim
  
write_output(): wrote: accounts_sim.jsonl
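To make the TF-IDF and cosine similarity steps concrete, here is a small standard-library sketch. It is not bloc's implementation: the smoothed IDF formula is one common variant chosen as an assumption, and the function names and the tokenized accounts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Map account name -> L2-normalized {word: weight} TF-IDF vector."""
    n_docs = len(docs)
    # Document frequency: number of accounts whose word list contains the word.
    df = Counter(word for words in docs.values() for word in set(words))
    vectors = {}
    for name, words in docs.items():
        tf = Counter(words)
        # Smoothed IDF (an assumed variant, not necessarily what bloc uses).
        vec = {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[name] = {w: v / norm for w, v in vec.items()} if norm else {}
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (their dot product)."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def pairwise_similarity(docs):
    """Return ({(a, b): similarity}, average similarity) over all account pairs."""
    vectors = tfidf_vectors(docs)
    sims = {(a, b): cosine(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)}
    average = sum(sims.values()) / len(sims) if sims else 0.0
    return sims, average


# Hypothetical tokenized accounts, invented for illustration only.
docs = {
    "account_a": ["T", "Ut", "⚀", "T"],
    "account_b": ["T", "Ut", "⚀", "Ut"],
    "account_c": ["π", "⚁", "π"],
}
sims, average = pairwise_similarity(docs)
for (a, b), score in sorted(sims.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.4f}: {a} vs. {b}")
print(f"{average:.4f}: Average cosine sim")
```

In the output above, seven accounts give 7 × 6 / 2 = 21 pairs, and the reported 0.3379 is the mean of the 21 listed pairwise values.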

Full output, which includes a ranking of the features that contributed the most toward the similarity of each account pair:

  
Features importance,
  FoxNews vs. CNN, (score, feature):
    1. 0.3356 T
    2. 0.3317 Ut
    3. 0.2496 ⚀
    4. 0.0139 □
    5. 0.0006 TT
    6. 0.0005 mUt
    7. 0.0004 EUt
    8. 0.0000 Emφt
    9. 0.0000 EmUt
   10. 0.0000 Emt

  POTUS vs. SpeakerPelosi, (score, feature):
    1. 0.3500 t
    2. 0.2129 T
    3. 0.1910 ⚁
    4. 0.0795 s
    5. 0.0188 ⚀
    6. 0.0149 Ut
    7. 0.0104 Et
    8. 0.0016 Tπ
    9. 0.0012 mt
   10. 0.0011 Eφt

  SpeakerPelosi vs. GOPLeader, (score, feature):
    1. 0.1869 s
    2. 0.1754 ⚁
    3. 0.1070 t
    4. 0.0698 T
    5. 0.0257 Ht
    6. 0.0190 ⚀
    7. 0.0147 r
    8. 0.0121 Ut
    9. 0.0104 Hmt
   10. 0.0080 Et

  CNN vs. POTUS, (score, feature):
    1. 0.3535 T
    2. 0.1377 ⚀
    3. 0.0452 Ut
    4. 0.0251 s
    5. 0.0060 Eφt
    6. 0.0045 ⚁
    7. 0.0027 EUφt
    8. 0.0003 Et
    9. 0.0001 EUt
   10. 0.0001 Emφt

  POTUS vs. GOPLeader, (score, feature):
    1. 0.1440 ⚁
    2. 0.1110 T
    3. 0.1108 t
    4. 0.1010 s
    5. 0.0655 ⚀
    6. 0.0129 Et
    7. 0.0081 Eφt
    8. 0.0070 Ut
    9. 0.0056 r
   10. 0.0006 Emφt

  FoxNews vs. POTUS, (score, feature):
    1. 0.3212 T
    2. 0.1176 ⚀
    3. 0.0633 Ut
    4. 0.0002 EUt
    5. 0.0000 Emφt
    6. 0.0000 Emt
    7. 0.0000 WWW+
    8. 0.0000 www+
    9. 0.0000 E
   10. 0.0000 EEE+Hmmt

  CNN vs. SpeakerPelosi, (score, feature):
    1. 0.2224 T
    2. 0.0782 Ut
    3. 0.0464 s
    4. 0.0399 ⚀
    5. 0.0055 ⚁
    6. 0.0016 □
    7. 0.0009 Eφt
    8. 0.0008 mUt
    9. 0.0004 mmUt
   10. 0.0002 Emt

  CNN vs. GOPLeader, (score, feature):
    1. 0.1390 ⚀
    2. 0.1160 T
    3. 0.0589 s
    4. 0.0365 Ut
    5. 0.0161 □
    6. 0.0065 Eφt
    7. 0.0045 EUφt
    8. 0.0042 ⚁
    9. 0.0012 mUt
   10. 0.0011 Emφt

  FoxNews vs. SpeakerPelosi, (score, feature):
    1. 0.2021 T
    2. 0.1096 Ut
    3. 0.0341 ⚀
    4. 0.0018 □
    5. 0.0003 λ
    6. 0.0003 mUt
    7. 0.0001 EUt
    8. 0.0001 Emt
    9. 0.0000 Emφt
   10. 0.0000 w

  FoxNews vs. GOPLeader, (score, feature):
    1. 0.1187 ⚀
    2. 0.1054 T
    3. 0.0512 Ut
    4. 0.0178 □
    5. 0.0004 mUt
    6. 0.0004 Emφt
    7. 0.0004 EUt
    8. 0.0002 λ
    9. 0.0001 EmUt
   10. 0.0001 Emt

  POTUS vs. GenerateACat, (score, feature):
    1. 0.0862 ⚁
    2. 0.0746 Et
    3. 0.0653 T
    4. 0.0329 ⚀
    5. 0.0000 Emt
    6. 0.0000 E
    7. 0.0000 EEE+Hmmt
    8. 0.0000 EEE+Ut
    9. 0.0000 EEE+mmmt
   10. 0.0000 EEE+mmt

  GOPLeader vs. GenerateACat, (score, feature):
    1. 0.0791 ⚁
    2. 0.0568 Et
    3. 0.0332 ⚀
    4. 0.0216 □
    5. 0.0214 T
    6. 0.0001 Emt
    7. 0.0000 E
    8. 0.0000 EEE+Hmmt
    9. 0.0000 EEE+Ut
   10. 0.0000 EEE+mmmt

  SpeakerPelosi vs. GenerateACat, (score, feature):
    1. 0.1050 ⚁
    2. 0.0461 Et
    3. 0.0411 T
    4. 0.0095 ⚀
    5. 0.0022 □
    6. 0.0001 Emt
    7. 0.0000 E
    8. 0.0000 EEE+Hmmt
    9. 0.0000 EEE+Ut
   10. 0.0000 EEE+mmmt

  CNN vs. GenerateACat, (score, feature):
    1. 0.0698 ⚀
    2. 0.0682 T
    3. 0.0169 □
    4. 0.0025 ⚁
    5. 0.0012 Et
    6. 0.0000 Emt
    7. 0.0000 E
    8. 0.0000 EEE+Hmmt
    9. 0.0000 EEE+Ut
   10. 0.0000 EEE+mmmt

  SpeakerPelosi vs. storygraphbot, (score, feature):
    1. 0.1379 ⚁
    2. 0.0050 ⚀
    3. 0.0049 T
    4. 0.0023 π
    5. 0.0020 Tπ
    6. 0.0007 ⚂
    7. 0.0006 Tπππ+
    8. 0.0003 Tππ
    9. 0.0003 Tπππ
   10. 0.0000 E

  FoxNews vs. GenerateACat, (score, feature):
    1. 0.0620 T
    2. 0.0596 ⚀
    3. 0.0186 □
    4. 0.0000 Emt
    5. 0.0000 E
    6. 0.0000 EEE+Hmmt
    7. 0.0000 EEE+Ut
    8. 0.0000 EEE+mmmt
    9. 0.0000 EEE+mmt
   10. 0.0000 EEE+mt

  POTUS vs. storygraphbot, (score, feature):
    1. 0.1132 ⚁
    2. 0.0172 ⚀
    3. 0.0078 T
    4. 0.0003 Tπ
    5. 0.0001 Tπππ
    6. 0.0000 Tππ
    7. 0.0000 Tπππ+
    8. 0.0000 E
    9. 0.0000 EEE+Hmmt
   10. 0.0000 EEE+Ut

  GOPLeader vs. storygraphbot, (score, feature):
    1. 0.1040 ⚁
    2. 0.0173 ⚀
    3. 0.0064 π
    4. 0.0026 T
    5. 0.0001 ππ
    6. 0.0000 ⚂
    7. 0.0000 E
    8. 0.0000 EEE+Hmmt
    9. 0.0000 EEE+Ut
   10. 0.0000 EEE+mmmt

  GenerateACat vs. storygraphbot, (score, feature):
    1. 0.0622 ⚁
    2. 0.0087 ⚀
    3. 0.0015 T
    4. 0.0000 E
    5. 0.0000 EEE+Hmmt
    6. 0.0000 EEE+Ut
    7. 0.0000 EEE+mmmt
    8. 0.0000 EEE+mmt
    9. 0.0000 EEE+mt
   10. 0.0000 EEE+t

  CNN vs. storygraphbot, (score, feature):
    1. 0.0364 ⚀
    2. 0.0082 T
    3. 0.0033 ⚁
    4. 0.0001 TT
    5. 0.0000 E
    6. 0.0000 EEE+Hmmt
    7. 0.0000 EEE+Ut
    8. 0.0000 EEE+mmmt
    9. 0.0000 EEE+mmt
   10. 0.0000 EEE+mt

  FoxNews vs. storygraphbot, (score, feature):
    1. 0.0311 ⚀
    2. 0.0074 T
    3. 0.0000 TT
    4. 0.0000 E
    5. 0.0000 EEE+Hmmt
    6. 0.0000 EEE+Ut
    7. 0.0000 EEE+mmmt
    8. 0.0000 EEE+mmt
    9. 0.0000 EEE+mt
   10. 0.0000 EEE+t

Cosine sim,
  0.9325: FoxNews vs. CNN
  0.8841: POTUS vs. SpeakerPelosi
  0.6516: SpeakerPelosi vs. GOPLeader
  0.5752: CNN vs. POTUS
  0.5680: POTUS vs. GOPLeader
  0.5023: FoxNews vs. POTUS
  0.3969: CNN vs. SpeakerPelosi
  0.3862: CNN vs. GOPLeader
  0.3483: FoxNews vs. SpeakerPelosi
  0.2945: FoxNews vs. GOPLeader
  0.2590: POTUS vs. GenerateACat
  0.2123: GOPLeader vs. GenerateACat
  0.2041: SpeakerPelosi vs. GenerateACat
  0.1587: CNN vs. GenerateACat
  0.1540: SpeakerPelosi vs. storygraphbot
  0.1403: FoxNews vs. GenerateACat
  0.1386: POTUS vs. storygraphbot
  0.1303: GOPLeader vs. storygraphbot
  0.0724: GenerateACat vs. storygraphbot
  0.0480: CNN vs. storygraphbot
  0.0386: FoxNews vs. storygraphbot
  ------
  0.3379: Average cosine sim

write_output(): wrote: accounts_sim.jsonl
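In the listing above, the feature scores for a pair add up to roughly that pair's cosine similarity. For FoxNews vs. CNN, the ten listed scores sum to 0.9323, against a reported 0.9325. This is consistent with each score being the product of a feature's weights in the two L2-normalized vectors, although that is an inference from the output rather than something the release notes state. A minimal sketch under that assumption, with hypothetical function names and weights:

```python
import math


def l2_normalize(vec):
    """Scale a sparse {feature: weight} vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def feature_contributions(u, v, top_k=10):
    """Return (score, feature) pairs for shared features, highest score first.

    Each score is the product of the feature's weights in the two normalized
    vectors, so the scores over all shared features sum to the cosine similarity.
    """
    u, v = l2_normalize(u), l2_normalize(v)
    scores = [(u[f] * v[f], f) for f in u.keys() & v.keys()]
    return sorted(scores, reverse=True)[:top_k]


# Hypothetical feature weights for two accounts, invented for illustration.
account_a = {"T": 3.0, "Ut": 2.0, "⚀": 1.0}
account_b = {"T": 2.0, "Ut": 2.0, "λ": 1.0}

ranked = feature_contributions(account_a, account_b)
for rank, (score, feature) in enumerate(ranked, start=1):
    print(f"{rank:>5}. {score:.4f} {feature}")
print(f"cosine similarity: {sum(score for score, _ in ranked):.4f}")
```

Unlike the bloc output, this sketch lists only features present in both accounts, so it never prints rows with a score of 0.0000.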

`top_ngrams` (generate list of most frequent BLOC words):

The following command generates the top BLOC words for the same accounts as the `sim` example above (Example 3). As in that example, after generating BLOC strings, it tokenizes them using pauses. It then prints the top BLOC words for individual accounts and across all accounts, and writes the output to top_bloc_words.json:

$ bloc top_ngrams -o top_bloc_words.json --token-pattern=word --bloc-alphabets action content_syntactic change -m 4 --bearer-token="foo" FoxNews CNN POTUS SpeakerPelosi GOPLeader GenerateACat storygraphbot

Partial output of top BLOC words across all accounts ranked with their document frequencies (fraction of accounts that used a word):

  ...
  Top 10 ngrams across all users, (document freq. DF, word):
    1.   1.0000 T (action)
    2.   0.8571 Emt (content_syntactic)
    3.   0.7143 Ut (content_syntactic)
    4.   0.7143 EUt (content_syntactic)
    5.   0.7143 Emφt (content_syntactic)
    6.   0.7143 Et (content_syntactic)
    7.   0.5714 mUt (content_syntactic)
    dumpJsonToFile(), wrote: top_bloc_words.json
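Document frequency as described above (the fraction of accounts that used a word) can be sketched in a few lines. With seven accounts, the values in the output correspond to 7/7 = 1.0000, 6/7 ≈ 0.8571, 5/7 ≈ 0.7143, and 4/7 ≈ 0.5714. The function name and the tokenized accounts below are hypothetical, and breaking ties alphabetically is an assumption.

```python
from collections import Counter


def document_frequencies(docs):
    """Return (df, word) pairs, where df is the fraction of accounts using the word.

    Sorted by descending df; ties are broken alphabetically (an assumption).
    """
    n_docs = len(docs)
    # Count each word once per account, no matter how often the account used it.
    counts = Counter(word for words in docs.values() for word in set(words))
    pairs = [(count / n_docs, word) for word, count in counts.items()]
    return sorted(pairs, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical tokenized accounts, invented for illustration only.
docs = {
    "account_a": ["T", "Ut", "T"],
    "account_b": ["T", "Emt"],
    "account_c": ["T", "Ut", "Emt"],
    "account_d": ["T"],
}

top_words = document_frequencies(docs)
for rank, (df, word) in enumerate(top_words, start=1):
    print(f"{rank:>5}.   {df:.4f} {word}")
```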

Full output of top BLOC words for individual accounts (ranked by term frequency) and across all accounts (ranked by document frequency):

  print_top_ngrams():

  Top 10 ngrams for user FoxNews, (term freq. TF, word):
    1.   0.3239 T (action)
    2.   0.3130 Ut (content_syntactic)
    3.   0.0125 EUt (content_syntactic)
    4.   0.0067 λ (change)
    5.   0.0050 TT (action)
    6.   0.0050 mUt (c...

28 Oct 17:42

anwala

v1.0.0

1c852e7

bloc-v1.0.0

Official release of bloc Python tool

Releases: anwala/bloc

bloc-v1.2.1
bloc-v1.2.0
bloc-v1.1.1
bloc-v1.1.0
bloc-v1.0.0