
correct values for 'label' field and total for 'N'-labels #3

Open
dmitris wants to merge 1 commit into setu1421:main from dmitris:patch-1


dmitris commented Jun 13, 2024

After loading secretbench.csv in R (sb <- read_csv("secretbench.csv")), I see the following counts for the label column:

> table(sb$label)

    N     Y 
82393 15086 

The total 82393 + 15086 = 97479 matches the README, but the number of "fake/dummy" secrets (the N rows) is, I believe, off by 2. Also, the README says the values of the label column are True and False, whereas the dataset contains Y and N.
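The two discrepancies can be expressed as automated checks. This is a minimal sketch, assuming the README total of 97479 and the Y/N label values observed above; `check_labels`, `README_TOTAL` and `EXPECTED_VALUES` are hypothetical names, not part of the repository.

```python
# Hypothetical helper: checks label counts against the README's stated total
# and against the label values actually present in the dataset (Y/N).

README_TOTAL = 97479          # total row count stated in the README
EXPECTED_VALUES = {"Y", "N"}  # values observed in the 'label' column

def check_labels(counts):
    """Return a list of human-readable problems found in `counts`."""
    problems = []
    # Any label value other than Y/N (e.g. True/False) is flagged.
    unexpected = set(counts) - EXPECTED_VALUES
    if unexpected:
        problems.append(f"unexpected label values: {sorted(unexpected)}")
    # The per-label counts must add up to the README total.
    total = sum(counts.values())
    if total != README_TOTAL:
        problems.append(f"total {total} != README total {README_TOTAL}")
    return problems

if __name__ == "__main__":
    counts = {"N": 82393, "Y": 15086}  # counts reported above
    print(check_labels(counts) or "no problems")
```

With the counts from the R and Python runs, the sketch reports no problems; feeding it True/False keys instead would flag them as unexpected values.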

Double-checking with a short Python program:

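#!/usr/bin/env python3

import csv

# Count how many rows carry each value of the 'label' column.
cnts = {}
# newline='' lets the csv module handle quoted fields that contain line breaks.
with open('secretbench.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        cnts[row['label']] = cnts.get(row['label'], 0) + 1
# Print each label value with its row count.
for k, v in cnts.items():
    print(k, v)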

which prints:

N 82393
Y 15086

These are the checksum and wc counts for the secretbench.csv file I'm using:

$ sha256sum secretbench.csv && wc secretbench.csv
70e4f3faa30df37cfcf002cbcdc1c56c55917f2326854e807312dd531f360486  secretbench.csv
  167067  449224 45990860 secretbench.csv
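For comparing a copy of the file without GNU coreutils, here is a minimal Python sketch that reproduces the `sha256sum` digest and the `wc` line count, and also counts parsed CSV records. The function names are hypothetical. A line count well above the record count (167067 lines vs. 97479 data rows plus a header here) would be consistent with quoted fields containing embedded newlines; that is an assumption about this dataset, not something verified here.

```python
import csv
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes, which is what `wc -l` reports."""
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(chunk_size), b""))

def count_records(path):
    """Count CSV data rows (header excluded); quoted fields may span several lines."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.DictReader(f))
```

For example, `sha256_of("secretbench.csv")` should return the digest shown above if the two copies are identical, and `count_lines("secretbench.csv")` should match the first number printed by `wc`.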
