cff-version: 1.2.0
title: "ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"
message: "If you use this software, please cite it as below."
authors:
- family-names: Karger
given-names: Ezra
- family-names: Bastani
given-names: Houtan
- family-names: Yueh-Han
given-names: Chen
- family-names: Jacobs
given-names: Zachary
- family-names: Halawi
given-names: Danny
- family-names: Zhang
given-names: Fred
- family-names: Tetlock
given-names: Philip E.
repository-code: https://github.com/forecastingresearch/forecastbench
url: https://www.forecastbench.org/
repository: https://github.com/forecastingresearch/forecastbench-datasets
abstract: >-
Forecasts of future events are essential inputs into
informed decision-making. Machine learning (ML) systems
have the potential to deliver forecasts at scale, but
there is no framework for evaluating the accuracy of ML
systems on a standardized set of forecasting questions. To
address this gap, we introduce ForecastBench: a dynamic
benchmark that evaluates the accuracy of ML systems on an
automatically generated and regularly updated set of 1,000
forecasting questions. To avoid any possibility of data
  leakage, ForecastBench comprises only questions
about future events that have no known answer at the time
of submission. We quantify the capabilities of current ML
systems by collecting forecasts from expert (human)
forecasters, the general public, and LLMs on a random
subset of questions from the benchmark (N=200). While LLMs
  have achieved superhuman performance on many benchmarks,
they perform less well here: expert forecasters outperform
the top-performing LLM (p-value<0.001). We display system
and human scores in a public leaderboard at
https://www.forecastbench.org.
keywords:
- machine learning
- forecasting
- benchmarking
- artificial intelligence
- decision-making
license: MIT
version: 1.0.0
preferred-citation:
type: conference-paper
title: "ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"
authors:
- family-names: Karger
given-names: Ezra
- family-names: Bastani
given-names: Houtan
- family-names: Yueh-Han
given-names: Chen
- family-names: Jacobs
given-names: Zachary
- family-names: Halawi
given-names: Danny
- family-names: Zhang
given-names: Fred
- family-names: Tetlock
given-names: Philip E.
year: 2025
event: "International Conference on Learning Representations (ICLR)"
url: https://iclr.cc/virtual/2025/poster/28507