Commit 93173b5

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "Small release, the most interesting stuff is x86 nested virt
  improvements.

  x86:
   - userspace can now hide nested VMX features from guests
   - nested VMX can now run Hyper-V in a guest
   - support for AVX512_4VNNIW and AVX512_FMAPS in KVM
   - infrastructure support for virtual Intel GPUs

  PPC:
   - support for KVM guests on POWER9
   - improved support for interrupt polling
   - optimizations and cleanups

  s390:
   - two small optimizations, more stuff is in flight and will be in 4.11

  ARM:
   - support for the GICv3 ITS on 32bit platforms"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest
  KVM: arm/arm64: timer: Check for properly initialized timer on init
  KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs
  KVM: x86: Handle the kthread worker using the new API
  KVM: nVMX: invvpid handling improvements
  KVM: nVMX: check host CR3 on vmentry and vmexit
  KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry
  KVM: nVMX: propagate errors from prepare_vmcs02
  KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT
  KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry
  KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID
  KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
  KVM: nVMX: support restore of VMX capability MSRs
  KVM: nVMX: generate non-true VMX MSRs based on true versions
  KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
  KVM: x86: Add kvm_skip_emulated_instruction and use it.
  KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
  KVM: VMX: Reorder some skip_emulated_instruction calls
  KVM: x86: Add a return value to kvm_emulate_cpuid
  KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
  ...
Parents: 1c59e1e + f673b5b


64 files changed, 2304 insertions(+), 916 deletions(-)

Documentation/virtual/kvm/00-INDEX

Lines changed: 2 additions & 0 deletions

@@ -6,6 +6,8 @@ cpuid.txt
 	- KVM-specific cpuid leaves (x86).
 devices/
 	- KVM_CAP_DEVICE_CTRL userspace API.
+halt-polling.txt
+	- notes on halt-polling
 hypercalls.txt
 	- KVM hypercalls.
 locking.txt

Documentation/virtual/kvm/api.txt

Lines changed: 4 additions & 1 deletion

@@ -2034,6 +2034,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_WORT      | 64
   PPC   | KVM_REG_PPC_SPRG9     | 64
   PPC   | KVM_REG_PPC_DBSR      | 32
+  PPC   | KVM_REG_PPC_TIDR      | 64
+  PPC   | KVM_REG_PPC_PSSCR     | 64
   PPC   | KVM_REG_PPC_TM_GPR0   | 64
   ...
   PPC   | KVM_REG_PPC_TM_GPR31  | 64
@@ -2050,6 +2052,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TM_VSCR   | 32
   PPC   | KVM_REG_PPC_TM_DSCR   | 64
   PPC   | KVM_REG_PPC_TM_TAR    | 64
+  PPC   | KVM_REG_PPC_TM_XER    | 64
         |                       |
   MIPS  | KVM_REG_MIPS_R0       | 64
   ...
@@ -2209,7 +2212,7 @@ after pausing the vcpu, but before it is resumed.
 4.71 KVM_SIGNAL_MSI

 Capability: KVM_CAP_SIGNAL_MSI
-Architectures: x86 arm64
+Architectures: x86 arm arm64
 Type: vm ioctl
 Parameters: struct kvm_msi (in)
 Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
Documentation/virtual/kvm/halt-polling.txt (new file)

Lines changed: 127 additions & 0 deletions

The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving us a trip through the scheduler, normally on
the order of a few microseconds, although performance benefits are workload
dependent. In the event that no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Thus halt polling is especially useful on workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
savings of not invoking the scheduler are distinguishable.

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or, in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

During polling, if a wakeup source is received within the halt polling interval,
the interval is left unchanged. In the event that a wakeup source isn't
received during the polling interval (and thus schedule is invoked) there are
two options: either the polling interval and total block time[0] were less than
the global max polling interval (see module params below), or the total block
time was greater than the global max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval, the polling interval can be increased in the
hope that next time, during the longer polling interval, the wakeup source will
be received while the host is polling and the latency benefits will be
realised. The polling interval is grown in the function grow_halt_poll_ns() and
is multiplied by the module parameter halt_poll_ns_grow.

In the event that the total block time was greater than the global max polling
interval, the host will never poll for long enough (limited by the global max)
to wake up during the polling interval, so the interval may as well be shrunk
in order to avoid pointless polling. The polling interval is shrunk in the
function shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval, but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time: the time between when the halt polling function is
		      invoked and a wakeup source received (irrespective of
		      whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 3 tuneable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

Module Parameter    | Description                     | Default Value
--------------------------------------------------------------------------------
halt_poll_ns        | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
                    | which defines the ceiling value |
                    | of the polling interval for     | (per arch value)
                    | each vcpu.                      |
--------------------------------------------------------------------------------
halt_poll_ns_grow   | The value by which the halt     | 2
                    | polling interval is multiplied  |
                    | in the grow_halt_poll_ns()      |
                    | function.                       |
--------------------------------------------------------------------------------
halt_poll_ns_shrink | The value by which the halt     | 0
                    | polling interval is divided in  |
                    | the shrink_halt_poll_ns()       |
                    | function.                       |
--------------------------------------------------------------------------------

These module parameters can be set from the debugfs files in:

	/sys/module/kvm/parameters/

Note that these module parameters are system-wide values and cannot be tuned
on a per vm basis.

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a
large value has the potential to drive the cpu usage to 100% on a machine which
would otherwise be almost entirely idle. This is because even if a guest has
wakeups during which very little work is done and which are quite far apart, if
the period is shorter than the global max polling interval (halt_poll_ns) then
the host will always poll for the entire block time and thus cpu utilisation
will go to 100%.

- Halt polling essentially presents a trade-off between power usage and
latency, and the module parameters should be used to tune the affinity for
this. Idle cpu time is essentially converted to host kernel time with the aim
of decreasing latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
runnable on that cpu; otherwise the polling will cease immediately and schedule
will be invoked to allow that other task to run. Thus halt polling does not
allow a guest to mount a denial of service of the cpu.

arch/arm/include/uapi/asm/kvm.h

Lines changed: 2 additions & 0 deletions

@@ -87,9 +87,11 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST	2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST	3
+#define KVM_VGIC_ITS_ADDR_TYPE		4

 #define KVM_VGIC_V3_DIST_SIZE		SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)
+#define KVM_VGIC_V3_ITS_SIZE		(2 * SZ_64K)

 #define KVM_ARM_VCPU_POWER_OFF		0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_PSCI_0_2		1 /* CPU uses PSCI v0.2 */

arch/arm/kvm/Kconfig

Lines changed: 1 addition & 0 deletions

@@ -34,6 +34,7 @@ config KVM
 	select HAVE_KVM_IRQFD
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
+	select HAVE_KVM_MSI
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
 	---help---
 	  Support hosting virtualized guest machines.

arch/arm/kvm/Makefile

Lines changed: 1 addition & 0 deletions

@@ -32,5 +32,6 @@ obj-y += $(KVM)/arm/vgic/vgic-mmio.o
 obj-y += $(KVM)/arm/vgic/vgic-mmio-v2.o
 obj-y += $(KVM)/arm/vgic/vgic-mmio-v3.o
 obj-y += $(KVM)/arm/vgic/vgic-kvm-device.o
+obj-y += $(KVM)/arm/vgic/vgic-its.o
 obj-y += $(KVM)/irqchip.o
 obj-y += $(KVM)/arm/arch_timer.o

arch/arm/kvm/arm.c

Lines changed: 6 additions & 0 deletions

@@ -221,6 +221,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
 		break;
+	case KVM_CAP_MSI_DEVID:
+		if (!kvm)
+			r = -EINVAL;
+		else
+			r = kvm->arch.vgic.msis_require_devid;
+		break;
 	default:
 		r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
 		break;

arch/arm64/kvm/Kconfig

Lines changed: 0 additions & 4 deletions

@@ -16,9 +16,6 @@ menuconfig VIRTUALIZATION

 if VIRTUALIZATION

-config KVM_ARM_VGIC_V3_ITS
-	bool
-
 config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
 	depends on OF
@@ -34,7 +31,6 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
-	select KVM_ARM_VGIC_V3_ITS
 	select KVM_ARM_PMU if HW_PERF_EVENTS
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP

arch/arm64/kvm/hyp/switch.c

Lines changed: 7 additions & 1 deletion

@@ -85,7 +85,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(val, hcr_el2);
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
-	/* Make sure we trap PMU access from EL0 to EL2 */
+	/*
+	 * Make sure we trap PMU access from EL0 to EL2. Also sanitize
+	 * PMSELR_EL0 to make sure it never contains the cycle
+	 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
+	 * EL1 instead of being trapped to EL2.
+	 */
+	write_sysreg(0, pmselr_el0);
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 	__activate_traps_arch()();

arch/arm64/kvm/reset.c

Lines changed: 0 additions & 6 deletions

@@ -86,12 +86,6 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
 		break;
-	case KVM_CAP_MSI_DEVID:
-		if (!kvm)
-			r = -EINVAL;
-		else
-			r = kvm->arch.vgic.msis_require_devid;
-		break;
 	default:
 		r = 0;
 	}
