Skip to content

Commit 6517dfb

Browse files
khushit-shah-nxsean-jc
authored andcommitted
KVM: x86: Add x2APIC "features" to control EOI broadcast suppression
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated by userspace), which KVM completely mishandles. When x2APIC support was first added, KVM incorrectly advertised and "enabled" Suppress EOI Broadcast, without fully supporting the I/O APIC side of the equation, i.e. without adding directed EOI to KVM's in-kernel I/O APIC. That flaw was carried over to split IRQCHIP support, i.e. KVM advertised support for Suppress EOI Broadcasts irrespective of whether or not the userspace I/O APIC implementation supported directed EOIs. Even worse, KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without support for directed EOI came to rely on the "spurious" broadcasts. KVM "fixed" the in-kernel I/O APIC implementation by completely disabling support for Suppress EOI Broadcasts in commit 0bcc3fb ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but didn't do anything to remedy userspace I/O APIC implementations. KVM's bogus handling of Suppress EOI Broadcast is problematic when the guest relies on interrupts being masked in the I/O APIC until well after the initial local APIC EOI. E.g. Windows with Credential Guard enabled handles interrupts in the following order: 1. Interrupt for L2 arrives. 2. L1 APIC EOIs the interrupt. 3. L1 resumes L2 and injects the interrupt. 4. L2 EOIs after servicing. 5. L1 performs the I/O APIC EOI. Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt storm, e.g. if the IRQ line is still asserted and userspace reacts to the EOI by re-injecting the IRQ, because the guest doesn't de-assert the line until step #4, and doesn't expect the interrupt to be re-enabled until step #5. Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way of knowing if the userspace I/O APIC supports directed EOIs, i.e. suppressing EOI broadcasts would result in interrupts being stuck masked in the userspace I/O APIC due to step #5 being ignored by userspace. And fully disabling support for Suppress EOI Broadcast is also undesirable, as picking up the fix would require a guest reboot, *and* more importantly would change the virtual CPU model exposed to the guest without any buy-in from userspace. Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to explicitly enable or disable support for Suppress EOI Broadcasts. This gives userspace control over the virtual CPU model exposed to the guest, as KVM should never have enabled support for Suppress EOI Broadcast without userspace opt-in. Not setting either flag will result in legacy quirky behavior for backward compatibility. Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear that commit c806a6a ("KVM: x86: call irq notifiers with directed EOI") was entirely correct, i.e. it may have simply papered over the lack of Directed EOI emulation in the I/O APIC. Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI is to support Directed EOI (KVM's name) irrespective of guest CPU vendor. Fixes: 7543a63 ("KVM: x86: Add KVM exit for IOAPIC EOIs") Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com Cc: stable@vger.kernel.org Suggested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com> Link: https://patch.msgid.link/20260123125657.3384063-1-khushit.shah@nutanix.com [sean: clean up minor formatting goofs and fix a comment typo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
1 parent 3f2757d commit 6517dfb

File tree

7 files changed

+127
-15
lines changed

7 files changed

+127
-15
lines changed

Documentation/virt/kvm/api.rst

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7835,8 +7835,10 @@ Will return -EBUSY if a VCPU has already been created.
78357835

78367836
Valid feature flags in args[0] are::
78377837

7838-
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
7839-
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
7838+
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
7839+
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
7840+
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
7841+
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
78407842

78417843
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
78427844
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7849,6 +7851,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
78497851
without interrupt remapping. This is undesirable in logical mode,
78507852
where 0xff represents CPUs 0-7 in cluster 0.
78517853

7854+
Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
7855+
Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
7856+
Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
7857+
sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
7858+
supported only when using a split IRQCHIP.
7859+
7860+
Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
7861+
Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
7862+
support to the guest.
7863+
7864+
Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
7865+
or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
7866+
behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
7867+
support for Suppress EOI Broadcasts but not actually suppress EOI
7868+
broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
7869+
Suppress EOI Broadcasts.
7870+
7871+
Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
7872+
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
7873+
as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
7874+
IRCHIP.
7875+
78527876
7.8 KVM_CAP_S390_USER_INSTR0
78537877
----------------------------
78547878

arch/x86/include/asm/kvm_host.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1228,6 +1228,12 @@ enum kvm_irqchip_mode {
12281228
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
12291229
};
12301230

1231+
enum kvm_suppress_eoi_broadcast_mode {
1232+
KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
1233+
KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
1234+
KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
1235+
};
1236+
12311237
struct kvm_x86_msr_filter {
12321238
u8 count;
12331239
bool default_allow:1;
@@ -1477,6 +1483,7 @@ struct kvm_arch {
14771483

14781484
bool x2apic_format;
14791485
bool x2apic_broadcast_quirk_disabled;
1486+
enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
14801487

14811488
bool has_mapped_host_mmio;
14821489
bool guest_can_read_msr_platform_info;

arch/x86/include/uapi/asm/kvm.h

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish {
914914
__u64 pad1[4];
915915
};
916916

917-
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
918-
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
917+
#define KVM_X2APIC_API_USE_32BIT_IDS _BITULL(0)
918+
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK _BITULL(1)
919+
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST _BITULL(2)
920+
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST _BITULL(3)
919921

920922
struct kvm_hyperv_eventfd {
921923
__u32 conn_id;

arch/x86/kvm/ioapic.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -561,7 +561,7 @@ static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
561561
spin_lock(&ioapic->lock);
562562

563563
if (trigger_mode != IOAPIC_LEVEL_TRIG ||
564-
kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
564+
kvm_lapic_suppress_eoi_broadcast(apic))
565565
return;
566566

567567
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);

arch/x86/kvm/lapic.c

Lines changed: 68 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,63 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
105105
apic_test_vector(vector, apic->regs + APIC_IRR);
106106
}
107107

108+
static bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
109+
{
110+
switch (kvm->arch.suppress_eoi_broadcast_mode) {
111+
case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
112+
return true;
113+
case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
114+
return false;
115+
case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
116+
/*
117+
* The default in-kernel I/O APIC emulates the 82093AA and does not
118+
* implement an EOI register. Some guests (e.g. Windows with the
119+
* Hyper-V role enabled) disable LAPIC EOI broadcast without
120+
* checking the I/O APIC version, which can cause level-triggered
121+
* interrupts to never be EOI'd.
122+
*
123+
* To avoid this, KVM doesn't advertise Suppress EOI Broadcast
124+
* support when using the default in-kernel I/O APIC.
125+
*
126+
* Historically, in split IRQCHIP mode, KVM always advertised
127+
* Suppress EOI Broadcast support but did not actually suppress
128+
* EOIs, resulting in quirky behavior.
129+
*/
130+
return !ioapic_in_kernel(kvm);
131+
default:
132+
WARN_ON_ONCE(1);
133+
return false;
134+
}
135+
}
136+
137+
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic)
138+
{
139+
struct kvm *kvm = apic->vcpu->kvm;
140+
141+
if (!(kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
142+
return false;
143+
144+
switch (kvm->arch.suppress_eoi_broadcast_mode) {
145+
case KVM_SUPPRESS_EOI_BROADCAST_ENABLED:
146+
return true;
147+
case KVM_SUPPRESS_EOI_BROADCAST_DISABLED:
148+
return false;
149+
case KVM_SUPPRESS_EOI_BROADCAST_QUIRKED:
150+
/*
151+
* Historically, in split IRQCHIP mode, KVM ignored the suppress
152+
* EOI broadcast bit set by the guest and broadcasts EOIs to the
153+
* userspace I/O APIC. For In-kernel I/O APIC, the support itself
154+
* is not advertised, can only be enabled via KVM_SET_APIC_STATE,
155+
* and KVM's I/O APIC doesn't emulate Directed EOIs; but if the
156+
* feature is enabled, it is respected (with odd behavior).
157+
*/
158+
return ioapic_in_kernel(kvm);
159+
default:
160+
WARN_ON_ONCE(1);
161+
return false;
162+
}
163+
}
164+
108165
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
109166
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
110167

@@ -554,15 +611,9 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
554611

555612
v = APIC_VERSION | ((apic->nr_lvt_entries - 1) << 16);
556613

557-
/*
558-
* KVM emulates 82093AA datasheet (with in-kernel IOAPIC implementation)
559-
* which doesn't have EOI register; Some buggy OSes (e.g. Windows with
560-
* Hyper-V role) disable EOI broadcast in lapic not checking for IOAPIC
561-
* version first and level-triggered interrupts never get EOIed in
562-
* IOAPIC.
563-
*/
614+
564615
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
565-
!ioapic_in_kernel(vcpu->kvm))
616+
kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm))
566617
v |= APIC_LVR_DIRECTED_EOI;
567618
kvm_lapic_set_reg(apic, APIC_LVR, v);
568619
}
@@ -1517,6 +1568,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
15171568

15181569
/* Request a KVM exit to inform the userspace IOAPIC. */
15191570
if (irqchip_split(apic->vcpu->kvm)) {
1571+
/*
1572+
* Don't exit to userspace if the guest has enabled Directed
1573+
* EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
1574+
* APIC doesn't broadcast EOIs (the guest must EOI the target
1575+
* I/O APIC(s) directly).
1576+
*/
1577+
if (kvm_lapic_suppress_eoi_broadcast(apic))
1578+
return;
1579+
15201580
apic->vcpu->arch.pending_ioapic_eoi = vector;
15211581
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
15221582
return;

arch/x86/kvm/lapic.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
231231

232232
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
233233

234+
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
235+
234236
void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
235237

236238
void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,

arch/x86/kvm/x86.c

Lines changed: 19 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
121121

122122
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
123123

124-
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
125-
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
124+
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
125+
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
126+
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
127+
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
126128

127129
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
128130
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -4932,6 +4934,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
49324934
break;
49334935
case KVM_CAP_X2APIC_API:
49344936
r = KVM_X2APIC_API_VALID_FLAGS;
4937+
if (kvm && !irqchip_split(kvm))
4938+
r &= ~KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST;
49354939
break;
49364940
case KVM_CAP_NESTED_STATE:
49374941
r = kvm_x86_ops.nested_ops->get_state ?
@@ -6740,11 +6744,24 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
67406744
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
67416745
break;
67426746

6747+
if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
6748+
(cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
6749+
break;
6750+
6751+
if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
6752+
!irqchip_split(kvm))
6753+
break;
6754+
67436755
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
67446756
kvm->arch.x2apic_format = true;
67456757
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
67466758
kvm->arch.x2apic_broadcast_quirk_disabled = true;
67476759

6760+
if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
6761+
kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
6762+
if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
6763+
kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
6764+
67486765
r = 0;
67496766
break;
67506767
case KVM_CAP_X86_DISABLE_EXITS:

0 commit comments

Comments
 (0)