Commit 4215ee0

Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.20

 - Drop a user-triggerable WARN on nested_svm_load_cr3() failure.

 - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
   relies on an upcoming, publicly announced change in the APM to reduce the
   set of conditions where hardware (i.e. KVM) *must* flush the RAP.

 - Ignore nSVM intercepts for instructions that are not supported according to
   L1's virtual CPU model.

 - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
   for EPT Misconfig.

 - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
   and allow userspace to restore nested state with GIF=0.

 - Treat exit_code as an unsigned 64-bit value through all of KVM.

 - Add support for fetching SNP certificates from userspace.

 - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
   or VMSAVE on behalf of L2.

 - Misc fixes and cleanups.
2 parents 687603f + 20c3c41 commit 4215ee0

22 files changed

+559
-156
lines changed

Documentation/virt/kvm/api.rst

Lines changed: 44 additions & 0 deletions
@@ -7382,6 +7382,50 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct kvm_exit_snp_req_certs {
+			__u64 gpa;
+			__u64 npages;
+			__u64 ret;
+		};
+
+KVM_EXIT_SNP_REQ_CERTS indicates an SEV-SNP guest with certificate-fetching
+enabled (see KVM_SEV_SNP_ENABLE_REQ_CERTS) has generated an Extended Guest
+Request NAE #VMGEXIT (SNP_GUEST_REQUEST) with message type MSG_REPORT_REQ,
+i.e. has requested an attestation report from firmware, and would like the
+certificate data corresponding to the attestation report signature to be
+provided by the hypervisor as part of the request.
+
+To allow for userspace to provide the certificate, the 'gpa' and 'npages'
+are forwarded verbatim from the guest request (the RAX and RBX GHCB fields
+respectively). 'ret' is not an "output" from KVM, and is always '0' on
+exit. KVM verifies the 'gpa' is 4KiB aligned prior to exiting to userspace,
+but otherwise the information from the guest isn't validated.
+
+Upon the next KVM_RUN, e.g. after userspace has serviced the request (or not),
+KVM will complete the #VMGEXIT, using the 'ret' field to determine whether to
+signal success or failure to the guest, and on failure, what reason code will
+be communicated via SW_EXITINFO2. If 'ret' is set to an unsupported value (see
+the table below), KVM_RUN will fail with -EINVAL. For a 'ret' of 'ENOSPC', KVM
+also consumes the 'npages' field, i.e. userspace can use the field to inform
+the guest of the number of pages needed to hold all the certificate data.
+
+The supported 'ret' values and their respective SW_EXITINFO2 encodings:
+
+  ====== =============================================================
+  0      0x0, i.e. success. KVM will emit an SNP_GUEST_REQUEST command
+         to SNP firmware.
+  ENOSPC 0x0000000100000000, i.e. not enough guest pages to hold the
+         certificate table and certificate data. KVM will also set the
+         RBX field in the GHBC to 'npages'.
+  EAGAIN 0x0000000200000000, i.e. the host is busy and the guest should
+         retry the request.
+  EIO    0xffffffff00000000, for all other errors (this return code is
+         a KVM-defined hypervisor value, as allowed by the GHCB)
+  ====== =============================================================
+
 
 .. _cap_enable:
 
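The 'ret' table in the hunk above can be read as a small lookup. The following is only an illustrative sketch, not KVM's implementation: the function name `snp_req_certs_exitinfo2` and the `SKETCH_*` macro names are hypothetical, while the four encodings are copied from the table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the SW_EXITINFO2 encodings from the table. */
#define SKETCH_CERTS_SUCCESS	0x0ULL
#define SKETCH_CERTS_NOSPC	0x0000000100000000ULL
#define SKETCH_CERTS_BUSY	0x0000000200000000ULL
#define SKETCH_CERTS_GENERIC	0xffffffff00000000ULL

/*
 * Translate the 'ret' value written by userspace into an SW_EXITINFO2
 * encoding. Returns false for any value not listed in the table, which is
 * the case where the documentation says KVM_RUN fails with -EINVAL.
 */
static bool snp_req_certs_exitinfo2(uint64_t ret, uint64_t *exitinfo2)
{
	switch (ret) {
	case 0:
		*exitinfo2 = SKETCH_CERTS_SUCCESS;
		return true;
	case ENOSPC:
		*exitinfo2 = SKETCH_CERTS_NOSPC;
		return true;
	case EAGAIN:
		*exitinfo2 = SKETCH_CERTS_BUSY;
		return true;
	case EIO:
		*exitinfo2 = SKETCH_CERTS_GENERIC;
		return true;
	default:
		return false;
	}
}
```

A VMM would set 'ret' (and, for ENOSPC, 'npages') in the exit structure before the next KVM_RUN; the sketch only models how the documented values map to what the guest sees.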
Documentation/virt/kvm/x86/amd-memory-encryption.rst

Lines changed: 51 additions & 1 deletion
@@ -572,18 +572,68 @@ Returns: 0 on success, -negative on error
 See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
 details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
 
+21. KVM_SEV_SNP_ENABLE_REQ_CERTS
+--------------------------------
+
+The KVM_SEV_SNP_ENABLE_REQ_CERTS command will configure KVM to exit to
+userspace with a ``KVM_EXIT_SNP_REQ_CERTS`` exit type as part of handling
+a guest attestation report, which will to allow userspace to provide a
+certificate corresponding to the endorsement key used by firmware to sign
+that attestation report.
+
+Returns: 0 on success, -negative on error
+
+NOTE: The endorsement key used by firmware may change as a result of
+management activities like updating SEV-SNP firmware or loading new
+endorsement keys, so some care should be taken to keep the returned
+certificate data in sync with the actual endorsement key in use by
+firmware at the time the attestation request is sent to SNP firmware. The
+recommended scheme to do this is to use file locking (e.g. via fcntl()'s
+F_OFD_SETLK) in the following manner:
+
+- Prior to obtaining/providing certificate data as part of servicing an
+  exit type of ``KVM_EXIT_SNP_REQ_CERTS``, the VMM should obtain a
+  shared/read or exclusive/write lock on the certificate blob file before
+  reading it and returning it to KVM, and continue to hold the lock until
+  the attestation request is actually sent to firmware. To facilitate
+  this, the VMM can set the ``immediate_exit`` flag of kvm_run just after
+  supplying the certificate data, and just before resuming the vCPU.
+  This will ensure the vCPU will exit again to userspace with ``-EINTR``
+  after it finishes fetching the attestation request from firmware, at
+  which point the VMM can safely drop the file lock.
+
+- Tools/libraries that perform updates to SNP firmware TCB values or
+  endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
+  ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
+  Documentation/virt/coco/sev-guest.rst for more details) in such a way
+  that the certificate blob needs to be updated, should similarly take an
+  exclusive lock on the certificate blob for the duration of any updates
+  to endorsement keys or the certificate blob contents to ensure that
+  VMMs using the above scheme will not return certificate blob data that
+  is out of sync with the endorsement key used by firmware at the time
+  the attestation request is actually issued.
+
+This scheme is recommended so that tools can use a fairly generic/natural
+approach to synchronizing firmware/certificate updates via file-locking,
+which should make it easier to maintain interoperability across
+tools/VMMs/vendors.
+
 Device attribute API
 ====================
 
 Attributes of the SEV implementation can be retrieved through the
 ``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
 device node, using group ``KVM_X86_GRP_SEV``.
 
-Currently only one attribute is implemented:
+The following attributes are currently implemented:
 
 * ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
   are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
 
+* ``KVM_X86_SEV_SNP_REQ_CERTS``: return a value of 1 if the kernel supports the
+  ``KVM_EXIT_SNP_REQ_CERTS`` exit, which allows for fetching endorsement key
+  certificates from userspace for each SNP attestation request the guest issues.
+
 Firmware Management
 ===================
 
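The file-locking scheme recommended in the hunk above relies on open file description (OFD) locks. As a rough sketch of the locking primitive only, assuming Linux and glibc: the helper name `cert_blob_lock` is hypothetical, and the fallback definition of `F_OFD_SETLK` is an assumption for builds where `_GNU_SOURCE` takes effect too late (37 is the value in Linux's generic fcntl header).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Assumption: fallback for toolchains that did not expose the GNU-only macro. */
#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37
#endif

/*
 * Hypothetical helper: apply a non-blocking OFD lock operation to the whole
 * certificate blob file. 'type' is F_RDLCK (shared), F_WRLCK (exclusive) or
 * F_UNLCK (release). Returns 0 on success and -1 with errno set on failure,
 * e.g. when a conflicting lock is held through another open of the file.
 */
static int cert_blob_lock(int fd, short type)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "through end of file" */
		.l_pid = 0,	/* must be 0 for OFD lock commands */
	};

	return fcntl(fd, F_OFD_SETLK, &fl);
}
```

Under the documented scheme, a VMM would take a shared lock before reading the blob, supply the data, set ``immediate_exit``, resume the vCPU, and release the lock once KVM_RUN returns with ``-EINTR``; an update tool would hold the exclusive lock for the duration of the key or blob update.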

arch/x86/include/asm/cpufeatures.h

Lines changed: 1 addition & 0 deletions
@@ -472,6 +472,7 @@
 #define X86_FEATURE_GP_ON_USER_CPUID (20*32+17) /* User CPUID faulting */
 
 #define X86_FEATURE_PREFETCHI (20*32+20) /* Prefetch Data/Instruction to Cache Level */
+#define X86_FEATURE_ERAPS (20*32+24) /* Enhanced Return Address Predictor Security */
 #define X86_FEATURE_SBPB (20*32+27) /* Selective Branch Prediction Barrier */
 #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
 #define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */

arch/x86/include/asm/kvm_host.h

Lines changed: 8 additions & 0 deletions
@@ -195,7 +195,15 @@ enum kvm_reg {
 
 	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
 	VCPU_EXREG_CR0,
+	/*
+	 * Alias AMD's ERAPS (not a real register) to CR3 so that common code
+	 * can trigger emulation of the RAP (Return Address Predictor) with
+	 * minimal support required in common code. Piggyback CR3 as the RAP
+	 * is cleared on writes to CR3, i.e. marking CR3 dirty will naturally
+	 * mark ERAPS dirty as well.
+	 */
 	VCPU_EXREG_CR3,
+	VCPU_EXREG_ERAPS = VCPU_EXREG_CR3,
 	VCPU_EXREG_CR4,
 	VCPU_EXREG_RFLAGS,
 	VCPU_EXREG_SEGMENTS,
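
The aliasing trick in this hunk can be shown in isolation. This is a minimal sketch with made-up names (`sketch_reg`, `sketch_mark_dirty`, `sketch_is_dirty`), not KVM's register-cache code: because the alias shares CR3's enumerator value, it also shares CR3's bit in any bitmap indexed by the enum, and the enumerator that follows continues counting from CR3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the enum in the hunk above. */
enum sketch_reg {
	SKETCH_EXREG_CR0,
	SKETCH_EXREG_CR3,
	SKETCH_EXREG_ERAPS = SKETCH_EXREG_CR3,	/* alias, not a new slot */
	SKETCH_EXREG_CR4,
};

/* Set the dirty bit for 'reg' in a bitmap indexed by enumerator value. */
static inline void sketch_mark_dirty(uint32_t *dirty, enum sketch_reg reg)
{
	*dirty |= UINT32_C(1) << reg;
}

/* Test the dirty bit for 'reg'. */
static inline bool sketch_is_dirty(uint32_t dirty, enum sketch_reg reg)
{
	return (dirty & (UINT32_C(1) << reg)) != 0;
}
```

Marking `SKETCH_EXREG_CR3` dirty makes `sketch_is_dirty(..., SKETCH_EXREG_ERAPS)` true without any ERAPS-specific code in the caller, which is the property the comment in the hunk describes.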

arch/x86/include/asm/svm.h

Lines changed: 6 additions & 3 deletions
@@ -131,13 +131,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 tsc_offset;
 	u32 asid;
 	u8 tlb_ctl;
-	u8 reserved_2[3];
+	u8 erap_ctl;
+	u8 reserved_2[2];
 	u32 int_ctl;
 	u32 int_vector;
 	u32 int_state;
 	u8 reserved_3[4];
-	u32 exit_code;
-	u32 exit_code_hi;
+	u64 exit_code;
 	u64 exit_info_1;
 	u64 exit_info_2;
 	u32 exit_int_info;
@@ -182,6 +182,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define TLB_CONTROL_FLUSH_ASID 3
 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7
 
+#define ERAP_CONTROL_ALLOW_LARGER_RAP BIT(0)
+#define ERAP_CONTROL_CLEAR_RAP BIT(1)
+
 #define V_TPR_MASK 0x0f
 
 #define V_IRQ_SHIFT 8
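
Both field changes in the first hunk keep the layout of the packed structure intact: `erap_ctl` takes over one formerly reserved byte, and the single `u64 exit_code` occupies the same eight bytes as the old `exit_code`/`exit_code_hi` pair. The sketch below checks this on cut-down copies of the fields shown in the hunk; the struct names `sketch_ctl_old` and `sketch_ctl_new` are hypothetical, and the fragments start at `tlb_ctl` rather than at the real start of the control area.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields as they appeared before the change (fragment only). */
struct __attribute__((__packed__)) sketch_ctl_old {
	uint8_t tlb_ctl;
	uint8_t reserved_2[3];
	uint32_t int_ctl;
	uint32_t int_vector;
	uint32_t int_state;
	uint8_t reserved_3[4];
	uint32_t exit_code;
	uint32_t exit_code_hi;
	uint64_t exit_info_1;
};

/* The same fragment after the change. */
struct __attribute__((__packed__)) sketch_ctl_new {
	uint8_t tlb_ctl;
	uint8_t erap_ctl;
	uint8_t reserved_2[2];
	uint32_t int_ctl;
	uint32_t int_vector;
	uint32_t int_state;
	uint8_t reserved_3[4];
	uint64_t exit_code;
	uint64_t exit_info_1;
};

/* Compile-time checks that no field moved and the size is unchanged. */
_Static_assert(sizeof(struct sketch_ctl_old) == sizeof(struct sketch_ctl_new),
	       "fragment size must not change");
_Static_assert(offsetof(struct sketch_ctl_old, exit_code) ==
	       offsetof(struct sketch_ctl_new, exit_code),
	       "exit_code must stay at the same offset");
_Static_assert(offsetof(struct sketch_ctl_old, exit_info_1) ==
	       offsetof(struct sketch_ctl_new, exit_info_1),
	       "fields after exit_code must not move");
```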

arch/x86/include/uapi/asm/kvm.h

Lines changed: 2 additions & 0 deletions
@@ -503,6 +503,7 @@ struct kvm_sync_regs {
 #define KVM_X86_GRP_SEV 1
 # define KVM_X86_SEV_VMSA_FEATURES 0
 # define KVM_X86_SNP_POLICY_BITS 1
+# define KVM_X86_SEV_SNP_REQ_CERTS 2
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -743,6 +744,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_LAUNCH_START = 100,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
 	KVM_SEV_SNP_LAUNCH_FINISH,
+	KVM_SEV_SNP_ENABLE_REQ_CERTS,
 
 	KVM_SEV_NR_MAX,
 };

arch/x86/include/uapi/asm/svm.h

Lines changed: 16 additions & 16 deletions
@@ -103,38 +103,38 @@
 #define SVM_EXIT_VMGEXIT 0x403
 
 /* SEV-ES software-defined VMGEXIT events */
-#define SVM_VMGEXIT_MMIO_READ 0x80000001
-#define SVM_VMGEXIT_MMIO_WRITE 0x80000002
-#define SVM_VMGEXIT_NMI_COMPLETE 0x80000003
-#define SVM_VMGEXIT_AP_HLT_LOOP 0x80000004
-#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
+#define SVM_VMGEXIT_MMIO_READ 0x80000001ull
+#define SVM_VMGEXIT_MMIO_WRITE 0x80000002ull
+#define SVM_VMGEXIT_NMI_COMPLETE 0x80000003ull
+#define SVM_VMGEXIT_AP_HLT_LOOP 0x80000004ull
+#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005ull
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
-#define SVM_VMGEXIT_PSC 0x80000010
-#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
-#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012
-#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_PSC 0x80000010ull
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011ull
+#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012ull
+#define SVM_VMGEXIT_AP_CREATION 0x80000013ull
 #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
 #define SVM_VMGEXIT_AP_CREATE 1
 #define SVM_VMGEXIT_AP_DESTROY 2
-#define SVM_VMGEXIT_SNP_RUN_VMPL 0x80000018
-#define SVM_VMGEXIT_SAVIC 0x8000001a
+#define SVM_VMGEXIT_SNP_RUN_VMPL 0x80000018ull
+#define SVM_VMGEXIT_SAVIC 0x8000001aull
 #define SVM_VMGEXIT_SAVIC_REGISTER_GPA 0
 #define SVM_VMGEXIT_SAVIC_UNREGISTER_GPA 1
 #define SVM_VMGEXIT_SAVIC_SELF_GPA ~0ULL
-#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
-#define SVM_VMGEXIT_TERM_REQUEST 0x8000fffe
+#define SVM_VMGEXIT_HV_FEATURES 0x8000fffdull
+#define SVM_VMGEXIT_TERM_REQUEST 0x8000fffeull
 #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code) \
 	/* SW_EXITINFO1[3:0] */ \
 	(((((u64)reason_set) & 0xf)) | \
 	/* SW_EXITINFO1[11:4] */ \
 	((((u64)reason_code) & 0xff) << 4))
-#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
+#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffffull
 
 /* Exit code reserved for hypervisor/software use */
-#define SVM_EXIT_SW 0xf0000000
+#define SVM_EXIT_SW 0xf0000000ull
 
-#define SVM_EXIT_ERR -1
+#define SVM_EXIT_ERR -1ull
 
 #define SVM_EXIT_REASONS \
 	{ SVM_EXIT_READ_CR0, "read_cr0" }, \
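
One consequence of treating the exit code as a full 64-bit quantity can be illustrated with the error code. This is a sketch under assumed names (`SKETCH_EXIT_ERR`, `sketch_is_exit_err`, `sketch_truncate_exit_code`), not code from the commit: once the constant is typed `unsigned long long`, an all-ones 64-bit exit code matches it, while a value that only had its low 32 bits preserved does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same spelling as the new SVM_EXIT_ERR in the hunk, under a sketch-local name. */
#define SKETCH_EXIT_ERR	(-1ull)

/* Compare a 64-bit exit code against the 64-bit error constant. */
static inline bool sketch_is_exit_err(uint64_t exit_code)
{
	return exit_code == SKETCH_EXIT_ERR;
}

/* Model code that kept only the low 32 bits of the exit code. */
static inline uint64_t sketch_truncate_exit_code(uint64_t exit_code)
{
	return (uint32_t)exit_code;
}
```

The suffix also changes the constant's type: with a 32-bit `int`, an unsuffixed `0xf0000000` is a 4-byte `unsigned int`, whereas `0xf0000000ull` is 8 bytes wide.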

arch/x86/kvm/cpuid.c

Lines changed: 8 additions & 1 deletion
@@ -1223,6 +1223,7 @@ void kvm_set_cpu_caps(void)
 		/* PrefetchCtlMsr */
 		/* GpOnUserCpuid */
 		/* EPSF */
+		F(ERAPS),
 		SYNTHESIZED_F(SBPB),
 		SYNTHESIZED_F(IBPB_BRTYPE),
 		SYNTHESIZED_F(SRSO_NO),
@@ -1803,8 +1804,14 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x80000021:
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+
+		if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+			entry->ebx &= GENMASK(23, 16);
+		else
+			entry->ebx = 0;
+
 		cpuid_entry_override(entry, CPUID_8000_0021_ECX);
 		break;
 	/* AMD Extended Performance Monitoring and Debug */
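
The EBX handling in the second hunk keeps only bits 23:16 of the host value when ERAPS is supported and clears the register otherwise. A standalone sketch of that masking follows; `SKETCH_GENMASK32` is a local stand-in for the kernel's `GENMASK()` restricted to 32-bit values, and `sketch_filter_ebx` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GENMASK(h, l) on 32-bit values: bits h down to l set. */
#define SKETCH_GENMASK32(h, l) \
	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

/*
 * Mirror the hunk's logic: pass through bits 23:16 of the host's EBX when
 * the feature is supported, otherwise report zero.
 */
static inline uint32_t sketch_filter_ebx(uint32_t host_ebx, bool eraps_supported)
{
	if (eraps_supported)
		return host_ebx & SKETCH_GENMASK32(23, 16);

	return 0;
}
```

For example, a host value of `0xdeadbeef` is reduced to `0x00ad0000` when the feature is supported.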

arch/x86/kvm/svm/avic.c

Lines changed: 2 additions & 2 deletions
@@ -1224,13 +1224,13 @@ static bool __init avic_want_avic_enabled(void)
 	 * In "auto" mode, enable AVIC by default for Zen4+ if x2AVIC is
 	 * supported (to avoid enabling partial support by default, and because
 	 * x2AVIC should be supported by all Zen4+ CPUs). Explicitly check for
-	 * family 0x19 and later (Zen5+), as the kernel's synthetic ZenX flags
+	 * family 0x1A and later (Zen5+), as the kernel's synthetic ZenX flags
 	 * aren't inclusive of previous generations, i.e. the kernel will set
 	 * at most one ZenX feature flag.
 	 */
 	if (avic == AVIC_AUTO_MODE)
 		avic = boot_cpu_has(X86_FEATURE_X2AVIC) &&
-		       (boot_cpu_data.x86 > 0x19 || cpu_feature_enabled(X86_FEATURE_ZEN4));
+		       (cpu_feature_enabled(X86_FEATURE_ZEN4) || boot_cpu_data.x86 >= 0x1A);
 
 	if (!avic || !npt_enabled)
 		return false;
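
For integer family numbers, the old test `x86 > 0x19` and the new test `x86 >= 0x1A` select the same CPUs, so the rewritten condition reads as matching the corrected comment rather than changing which families qualify. The sketch below models both forms with hypothetical helper names and plain parameters in place of the kernel's CPU feature queries.

```c
#include <stdbool.h>

/* Old form of the auto-mode condition, with inputs passed in explicitly. */
static inline bool sketch_avic_auto_old(bool x2avic, bool zen4, unsigned int family)
{
	return x2avic && (family > 0x19 || zen4);
}

/* New form of the auto-mode condition from the hunk above. */
static inline bool sketch_avic_auto_new(bool x2avic, bool zen4, unsigned int family)
{
	return x2avic && (zen4 || family >= 0x1A);
}
```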

arch/x86/kvm/svm/hyperv.c

Lines changed: 6 additions & 1 deletion
@@ -10,8 +10,13 @@ void svm_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/*
+	 * The exit code used by Hyper-V for software-defined exits is reserved
+	 * by AMD specifically for such use cases.
+	 */
+	BUILD_BUG_ON(HV_SVM_EXITCODE_ENL != SVM_EXIT_SW);
+
 	svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
-	svm->vmcb->control.exit_code_hi = 0;
 	svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
 	svm->vmcb->control.exit_info_2 = 0;
 	nested_svm_vmexit(svm);
