cff-version: 1.2.0
title: "ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"
message: "If you use this software, please cite it as below."
authors:
- family-names: Karger
given-names: Ezra
- family-names: Bastani
given-names: Houtan
- family-names: Yueh-Han
given-names: Chen
- family-names: Jacobs
given-names: Zachary
- family-names: Halawi
given-names: Danny
- family-names: Zhang
given-names: Fred
- family-names: Tetlock
given-names: Philip E.
repository-code: https://github.com/forecastingresearch/forecastbench
url: https://www.forecastbench.org/
repository: https://github.com/forecastingresearch/forecastbench-datasets
abstract: >-
Forecasts of future events are essential inputs into
informed decision-making. Machine learning (ML) systems
have the potential to deliver forecasts at scale, but
there is no framework for evaluating the accuracy of ML
systems on a standardized set of forecasting questions. To
address this gap, we introduce ForecastBench: a dynamic
benchmark that evaluates the accuracy of ML systems on an
automatically generated and regularly updated set of 1,000
forecasting questions. To avoid any possibility of data
  leakage, ForecastBench comprises only questions
about future events that have no known answer at the time
of submission. We quantify the capabilities of current ML
systems by collecting forecasts from expert (human)
forecasters, the general public, and LLMs on a random
subset of questions from the benchmark (N=200). While LLMs
  have achieved superhuman performance on many benchmarks,
they perform less well here: expert forecasters outperform
the top-performing LLM (p-value<0.001). We display system
and human scores in a public leaderboard at
https://www.forecastbench.org.
keywords:
- machine learning
- forecasting
- benchmarking
- artificial intelligence
- decision-making
license: MIT
version: 1.0.0
preferred-citation:
type: conference-paper
title: "ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"
authors:
- family-names: Karger
given-names: Ezra
- family-names: Bastani
given-names: Houtan
- family-names: Yueh-Han
given-names: Chen
- family-names: Jacobs
given-names: Zachary
- family-names: Halawi
given-names: Danny
- family-names: Zhang
given-names: Fred
- family-names: Tetlock
given-names: Philip E.
year: 2025
event: "International Conference on Learning Representations (ICLR)"
url: https://iclr.cc/virtual/2025/poster/28507