* [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-v15

and is based on top of the series:
  
  "Add SEV-ES hypervisor support for GHCB protocol version 2"
  https://lore.kernel.org/kvm/20240501071048.2208265-1-michael.roth@amd.com/
  https://github.com/amdese/linux/commits/sev-init2-ghcb-v1

which in turn is based on commit 20cc50a0410f (just before v14 SNP patches):

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue


Patch Layout
------------

01-02: These patches revert+replace the existing .gmem_validate_fault hook
       with a similar .private_max_mapping_level hook, as suggested by
       Sean[1].

03-04: These patches add some basic infrastructure and introduce a new
       KVM_X86_SNP_VM vm_type to handle differences versus the existing
       KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.

05-07: These implement the KVM APIs to create a cryptographic launch
       context, encrypt/measure the initial image into guest memory, and
       finalize the measurement before launching the guest.

08-12: These implement handling for various guest-generated events such
       as page state changes, onlining of additional vCPUs, etc.

13-16: These implement the gmem/mmu hooks needed to prepare gmem-allocated
       pages before mapping them into guest private memory ranges, as
       well as to clean them up prior to returning them to the host for
       use as normal memory. Because this supplants certain activities
       like issuing WBINVDs during KVM MMU invalidations, there's also
       a patch to avoid duplicating that work and incurring unnecessary
       overhead.

17:    With all the core support in place, this patch adds a kvm_amd module
       parameter to enable SNP support.

18-20: These patches all deal with the servicing of guest requests to handle
       things like attestation, as well as some related host-management
       interfaces.

[1] https://lore.kernel.org/kvm/ZimnngU7hn7sKoSc@google.com/#t


Testing
-------

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/snp-v4-wip3c

A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
ranges that are mapped as private. It is recommended that you build the
AmdSevX64 variant, as it provides the kernel-hashing support used in this
series:

  https://github.com/amdese/ovmf/commits/apic-mmio-fix1d

A basic command-line invocation for SNP would be:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd

With kernel-hashing and certificate data supplied:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd \
  -kernel /boot/vmlinuz-$ver \
  -initrd /boot/initrd.img-$ver \
  -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"

With standard X64 OVMF package with separate image for persistent NVRAM:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd \
  -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off


Known issues / TODOs
--------------------

 * Base tree in some cases reports "Unpatched return thunk in use. This should
   not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
   upstream regression, unrelated to this series:

     https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/

 * 2MB hugepage support has been dropped pending discussion on how we plan to
   re-enable it in gmem.

 * Host kexec should work, but there is a known issue with host kdump support
   while SNP guests are running that will be addressed as a follow-up.

 * SNP kselftests are currently a WIP and will be included as part of SNP
   upstreaming efforts in the near-term.


SEV-SNP Overview
----------------

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. This series builds upon
SEV-SNP guest support, which is now in mainline, and SEV-SNP host
initialization support, which is now in linux-next.

While this series provides the basic building blocks to support booting
SEV-SNP VMs, it does not cover all the security enhancements introduced by
SEV-SNP, such as interrupt protection, which will be added in the future.

With SNP, when pages are marked as guest-owned in the RMP table, they are
assigned to a specific guest/ASID, as well as a specific GFN within the
guest. Any attempt to remap the page in the RMP table to a different
guest/ASID, or to a different GFN within the guest/ASID, will result in an
RMP nested page fault.
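
For reference, host-side code views each RMP entry roughly as the following
C bitfield. This is an illustrative sketch only; the authoritative layout
is specified in the AMD APM, and the kernel's definition lives in the SNP
host initialization support mentioned above:

  struct rmpentry {
          u64 assigned  : 1,  /* page is assigned to a guest or the firmware */
              pagesize  : 1,  /* 0 = 4K, 1 = 2MB */
              immutable : 1,  /* entry cannot be modified via RMPUPDATE */
              rsvd1     : 9,
              gpa       : 39, /* guest physical address bits 50:12 */
              asid      : 10, /* ASID of the guest the page is assigned to */
              vmsa      : 1,  /* page is a VM save area */
              validated : 1,  /* set by the guest via PVALIDATE */
              rsvd2     : 1;
          u64 rsvd3;
  } __packed;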

Prior to accessing a guest-owned page, the guest must validate it with the
PVALIDATE instruction, which sets the Validated bit in the page's RMP
entry. This is the only way to set the Validated bit outside of the initial
pre-encrypted guest payload/image; any attempt outside the guest to modify
the RMP entry from that point forward will result in the Validated bit
being cleared, at which point the guest will trigger an exception if it
attempts to access that page, so that it can be made aware of possible
tampering.
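
On the guest side (already in mainline), issuing PVALIDATE boils down to
something like the following sketch, modeled on the pvalidate() helper in
arch/x86/include/asm/sev.h; the instruction returns a status code in rax
and sets the carry flag if the RMP entry was already in the requested
state:

  #define PVALIDATE_FAIL_NOUPDATE 255  /* per arch/x86/include/asm/sev.h */

  static int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* PVALIDATE opcode; the mnemonic needs binutils 2.36 or newer */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF"
                       : "=@ccc"(no_rmpupdate), "=a"(rc)
                       : "a"(vaddr), "c"(rmp_psize), "d"(validate)
                       : "memory", "cc");

          if (no_rmpupdate)
                  return PVALIDATE_FAIL_NOUPDATE; /* already in requested state */

          return rc;
  }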

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launch. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image, so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.
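
As a concrete example, the MSR-based form of a Page State Change request
(serviced by patch 08 of this series) packs the operation and GFN into the
GHCB MSR. A sketch based on the definitions in
arch/x86/include/asm/sev-common.h:

  #define GHCB_MSR_PSC_REQ        0x014
  #define SNP_PAGE_STATE_PRIVATE  1
  #define SNP_PAGE_STATE_SHARED   2

  /* Encode a request to flip one GFN between private and shared. */
  static u64 ghcb_msr_psc_req(u64 gfn, u64 op)
  {
          return GHCB_MSR_PSC_REQ | (gfn << 12) | (op << 52);
  }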

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via the vendor-agnostic
KVM_HC_MAP_GPA_RANGE hypercall exit, and userspace then issues the
corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the private/shared
state in the KVM MMU.
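
From the VMM's perspective, servicing one of these forwarded requests looks
roughly like the sketch below (field and flag names per
include/uapi/linux/kvm.h and kvm_para.h; vm_fd is assumed to be the VM file
descriptor, and error handling is omitted):

  if (run->exit_reason == KVM_EXIT_HYPERCALL &&
      run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
          struct kvm_memory_attributes attrs = {
                  .address    = run->hypercall.args[0],
                  .size       = run->hypercall.args[1] * 4096,
                  .attributes = (run->hypercall.args[2] & KVM_MAP_GPA_RANGE_ENCRYPTED) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* Update the KVM MMU's private/shared tracking for the range. */
          ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
          run->hypercall.ret = 0;
  }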

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
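
Concretely, the SEV-specific implementations in this series plug into the
following kvm_x86_ops hooks (signatures as they appear in the patches
below):

  int  (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
  void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
  int  (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);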

Feedback/review is very much appreciated!

-Mike


Changes since v14:

 * switch to vendor-agnostic KVM_HC_MAP_GPA_RANGE exit for forwarding
   page-state change requests to userspace instead of an SNP-specific exit
   (Sean)
 * drop SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION interfaces, instead
   add handling in KVM_EXIT_VMGEXIT so that VMMs can implement their own
   mechanisms for keeping userspace-supplied certificates in-sync with
   firmware's TCB/endorsement key (Sean)
 * carve out SEV-ES-specific handling for GHCB protocol 2, add control of 
   the protocol version, and post as a separate prereq patchset (Sean)
 * use more consistent error-handling in snp_launch_{start,update,finish},
   simplify logic based on review comments (Sean)
 * rename .gmem_validate_fault to .private_max_mapping_level and rework
   logic based on review suggestions (Sean)
 * reduce number of pr_debug()'s in series, avoid multiple WARN's in
   succession (Sean)
 * improve documentation and comments throughout

Changes since v13:

 * rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
 * handle setting kvm->arch.has_private_mem in same location as
   kvm->arch.has_protected_state (Paolo)
 * add flags and additional padding fields to
   snp_launch{start,update,finish} APIs to address alignment and
   expandability (Paolo)
 * update snp_launch_update() to update input struct values to reflect
   current progress of the command in situations where multiple calls are
   needed (Paolo)
 * update snp_launch_update() to avoid copying/accessing 'src' parameter
   when dealing with zero pages. (Paolo)
 * update snp_launch_update() to use u64 as length input parameter instead
   of u32 and adjust padding accordingly
 * modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
   bit order of corresponding flags
 * let firmware handle enforcement of policy bits corresponding to
   user-specified minimum API version
 * add missing "0x" prefixes in pr_debug()'s for snp_launch_start()
 * fix handling of VMSAs during in-place migration (Paolo)

Changes since v12:

 * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
 * add more input validation for SNP_LAUNCH_START, especially for handling
   things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
 * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
 * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
   userspace deal with the ordering just like with SEV-ES (Paolo)
 * fix up docs for SNP_LAUNCH_FINISH (Paolo)
 * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
   handling for guest-mapped vs non-guest-mapped VMSAs, rename
   'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
 * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
   as it is no longer needed due to above VMSA rework
 * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
   event
 * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
   switch to WARN_ON*()'s to indicate remaining error cases are not expected
   and should not be seen in practice. (Paolo)
 * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
   cleaning up large guest memory ranges.
 * rename VLEK_REQUIRED to VCEK_DISABLE, which will be more applicable if
   another key type ever gets added.
 * don't allow attestation to be paused while an attestation request is
   being processed by firmware (Tom)
 * add missing Documentation entry for SNP_VLEK_LOAD
 * collect Reviewed-by's from Paolo and Tom


----------------------------------------------------------------
Ashish Kalra (1):
      KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (8):
      KVM: SEV: Add initial SEV-SNP support
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
      KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
      KVM: SEV: Add support to handle RMP nested page faults
      KVM: SVM: Add module parameter to enable SEV-SNP
      KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (10):
      Revert "KVM: x86: Add gmem hook for determining max NPT mapping level"
      KVM: x86: Add hook for determining max NPT mapping level
      KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
      KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
      KVM: SEV: Add support to handle Page State Change VMGEXIT
      KVM: SEV: Implement gmem hook for initializing private pages
      KVM: SEV: Implement gmem hook for invalidating private pages
      KVM: x86: Implement hook for determining max NPT mapping level
      KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
      crypto: ccp: Add the SNP_VLEK_LOAD command

Tom Lendacky (1):
      KVM: SEV: Support SEV-SNP AP Creation NAE event

 Documentation/virt/coco/sev-guest.rst              |   19 +
 Documentation/virt/kvm/api.rst                     |   87 ++
 .../virt/kvm/x86/amd-memory-encryption.rst         |  110 +-
 arch/x86/include/asm/kvm-x86-ops.h                 |    2 +-
 arch/x86/include/asm/kvm_host.h                    |    5 +-
 arch/x86/include/asm/sev-common.h                  |   25 +
 arch/x86/include/asm/sev.h                         |    3 +
 arch/x86/include/asm/svm.h                         |    9 +-
 arch/x86/include/uapi/asm/kvm.h                    |   48 +
 arch/x86/kvm/Kconfig                               |    3 +
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |   27 +-
 arch/x86/kvm/svm/sev.c                             | 1538 +++++++++++++++++++-
 arch/x86/kvm/svm/svm.c                             |   44 +-
 arch/x86/kvm/svm/svm.h                             |   52 +
 arch/x86/kvm/trace.h                               |   31 +
 arch/x86/kvm/x86.c                                 |   17 +
 drivers/crypto/ccp/sev-dev.c                       |   36 +
 include/linux/psp-sev.h                            |    4 +-
 include/uapi/linux/kvm.h                           |   23 +
 include/uapi/linux/psp-sev.h                       |   27 +
 include/uapi/linux/sev-guest.h                     |    9 +
 virt/kvm/guest_memfd.c                             |    4 +-
 23 files changed, 2081 insertions(+), 44 deletions(-)



* [PATCH v15 01/20] Revert "KVM: x86: Add gmem hook for determining max NPT mapping level"
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

This reverts commit 20cc50a0410f338657e23e77fcc21fee2bc291e6.

As pointed out here[1], this patch has a few issues:
  - the error response could theoretically kill a guest in cases where
    retrying based on mmu_invalidate_seq might have been sufficient, so
    it should purely be a means to find the max mapping level and should
    never return an error
  - the gpa/private arguments are not currently needed for anything
  - it's not really a "gmem" hook, but it uses the same naming convention
    as actual gmem hooks

Revert it so it can be replaced with a fully-intact replacement patch that
addresses the above.

[1] https://lore.kernel.org/kvm/ZimnngU7hn7sKoSc@google.com/

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 -
 arch/x86/include/asm/kvm_host.h    | 2 --
 arch/x86/kvm/mmu/mmu.c             | 8 --------
 3 files changed, 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2db87a6fd52a..c81990937ab4 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,7 +140,6 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
-KVM_X86_OP_OPTIONAL_RET0(gmem_validate_fault)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4c9d8a22840a..c6c5018376be 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1816,8 +1816,6 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
-	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
-				   u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eebb1562c5bc..510eb1117012 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4292,14 +4292,6 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 			       fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 
-	r = static_call(kvm_x86_gmem_validate_fault)(vcpu->kvm, fault->pfn,
-						     fault->gfn, fault->is_private,
-						     &fault->max_level);
-	if (r) {
-		kvm_release_pfn_clean(fault->pfn);
-		return r;
-	}
-
 	return RET_PF_CONTINUE;
 }
 
-- 
2.25.1



* [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again. However, this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.

Add a hook to determine the max NPT mapping size in situations like
this.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu/mmu.c             | 18 ++++++++++++++++--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c81990937ab4..566d19b02483 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c6c5018376be..87265b73906a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1816,6 +1816,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+	int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 510eb1117012..0d556da052f6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4271,6 +4271,20 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
+static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
+					u8 max_level, int gmem_order)
+{
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	return min(max_level,
+		   static_call(kvm_x86_private_max_mapping_level)(kvm, pfn));
+}
+
 static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
@@ -4288,9 +4302,9 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 		return r;
 	}
 
-	fault->max_level = min(kvm_max_level_for_order(max_order),
-			       fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault->pfn,
+							 fault->max_level, max_order);
 
 	return RET_PF_CONTINUE;
 }
-- 
2.25.1



* [PATCH v15 03/20] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

SEV-SNP relies on private memory support to run guests, so make sure to
enable that support via the CONFIG_KVM_GENERIC_PRIVATE_MEM config
option.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d64fb2b3eb69..5e72faca4e8f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -136,6 +136,7 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
+	select KVM_GENERIC_PRIVATE_MEM
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
-- 
2.25.1



* [PATCH v15 04/20] KVM: SEV: Add initial SEV-SNP support
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protections. SEV-SNP adds strong memory encryption
and integrity protection to help prevent malicious hypervisor-based attacks
such as data replay and memory re-mapping, in order to create an isolated
execution environment.

Define a new KVM_X86_SNP_VM type which makes use of these capabilities
and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
helper to check whether SNP is enabled and set PFERR_PRIVATE_ACCESS for
private #NPFs so they are handled appropriately by KVM MMU.
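
A rough sketch of how userspace is expected to use the new type (assuming
the KVM_SEV_INIT2 uapi from the prereq series; error handling omitted):

  int kvm_fd = open("/dev/kvm", O_RDWR);
  int vm_fd  = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM);

  struct kvm_sev_init init = {};  /* defaults for VMSA features/GHCB version */
  struct kvm_sev_cmd cmd = {
          .id     = KVM_SEV_INIT2,
          .data   = (__u64)(unsigned long)&init,
          .sev_fd = open("/dev/sev", O_RDWR),
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);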

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/svm.h      |  3 ++-
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/svm/sev.c          | 21 ++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c          |  8 +++++++-
 arch/x86/kvm/svm/svm.h          | 12 ++++++++++++
 5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..544a43c1cf11 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -285,7 +285,8 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 
-#define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE				BIT(0)
+#define SVM_SEV_FEAT_DEBUG_SWAP				BIT(5)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 9fae1b73b529..d2ae5fcc0275 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -874,5 +874,6 @@ struct kvm_hyperv_eventfd {
 #define KVM_X86_SW_PROTECTED_VM	1
 #define KVM_X86_SEV_VM		2
 #define KVM_X86_SEV_ES_VM	3
+#define KVM_X86_SNP_VM		4
 
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4bde1193b92..be831e2c06eb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -47,6 +47,9 @@ module_param_named(sev, sev_enabled, bool, 0444);
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
 
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
 /* enable/disable SEV-ES DebugSwap support */
 static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
@@ -288,6 +291,9 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 	if (sev->es_active && !sev->ghcb_version)
 		sev->ghcb_version = GHCB_VERSION_DEFAULT;
 
+	if (vm_type == KVM_X86_SNP_VM)
+		sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
 	ret = sev_asid_new(sev);
 	if (ret)
 		goto e_no_asid;
@@ -348,7 +354,8 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return -EINVAL;
 
 	if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
-	    kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
+	    kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
+	    kvm->arch.vm_type != KVM_X86_SNP_VM)
 		return -EINVAL;
 
 	if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
@@ -2328,11 +2335,16 @@ void __init sev_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
 	}
+	if (sev_snp_enabled) {
+		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
+		kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
+	}
 }
 
 void __init sev_hardware_setup(void)
 {
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -2413,6 +2425,7 @@ void __init sev_hardware_setup(void)
 	sev_es_asid_count = min_sev_asid - 1;
 	WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
 	sev_es_supported = true;
+	sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
 
 out:
 	if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2425,9 +2438,15 @@ void __init sev_hardware_setup(void)
 		pr_info("SEV-ES %s (ASIDs %u - %u)\n",
 			sev_es_supported ? "enabled" : "disabled",
 			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+	if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+		pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+			sev_snp_supported ? "enabled" : "disabled",
+			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
 
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	sev_snp_enabled = sev_snp_supported;
+
 	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
 	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_es_debug_swap_enabled = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 535018f152a3..422b452fbc3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2056,6 +2056,9 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
 		error_code &= ~PFERR_SYNTHETIC_MASK;
 
+	if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
 	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
 			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
@@ -4899,8 +4902,11 @@ static int svm_vm_init(struct kvm *kvm)
 
 	if (type != KVM_X86_DEFAULT_VM &&
 	    type != KVM_X86_SW_PROTECTED_VM) {
-		kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
+		kvm->arch.has_protected_state =
+			(type == KVM_X86_SEV_ES_VM || type == KVM_X86_SNP_VM);
 		to_kvm_sev_info(kvm)->need_init = true;
+
+		kvm->arch.has_private_mem = (type == KVM_X86_SNP_VM);
 	}
 
 	if (!pause_filter_count || !pause_filter_thresh)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9ae0c57c7d20..1407acf45a23 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -349,6 +349,18 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return (sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE) &&
+	       !WARN_ON_ONCE(!sev_es_guest(kvm));
+#else
+	return false;
+#endif
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1



* [PATCH v15 05/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then be used to
load/encrypt data into the guest's initial launch image.

For more information see the SEV-SNP specification.
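
A rough usage sketch (struct layout and policy bits as defined later in
this patch; the API version values are just an example, and sev_fd/vm_fd
are assumed to be open descriptors, with error handling omitted):

  struct kvm_sev_snp_launch_start start = {
          /* RSVD_MBO (bit 17) and SMT (bit 16) set, minimum API version 1.55 */
          .policy = BIT_ULL(17) | BIT_ULL(16) | (1ULL << 8) | 55,
  };
  struct kvm_sev_cmd cmd = {
          .id     = KVM_SEV_SNP_LAUNCH_START,
          .data   = (__u64)(unsigned long)&start,
          .sev_fd = sev_fd,
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);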

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  28 ++-
 arch/x86/include/uapi/asm/kvm.h               |  11 ++
 arch/x86/kvm/svm/sev.c                        | 176 +++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 4 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 9677a0714a39..dd179e162a87 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -466,6 +466,30 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+18. KVM_SEV_SNP_LAUNCH_START
+----------------------------
+
+The KVM_SEV_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. It must be called prior to issuing
+KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH.
+
+Parameters (in): struct  kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_start {
+                __u64 policy;           /* Guest policy to use. */
+                __u8 gosvw[16];         /* Guest OS visible workarounds. */
+                __u16 flags;            /* Must be zero. */
+                __u8 pad0[6];
+                __u64 pad1[4];
+        };
+
+See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_start``.
+
 Device attribute API
 ====================
 
@@ -497,9 +521,11 @@ References
 ==========
 
 
-See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_
+for more info.
 
 .. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
 .. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
 .. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34)
 .. [kvm-forum]  https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d2ae5fcc0275..693a80ffe40a 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -697,6 +697,9 @@ enum sev_cmd_id {
 	/* Second time is the charm; improved versions of the above ioctls.  */
 	KVM_SEV_INIT2,
 
+	/* SNP-specific commands */
+	KVM_SEV_SNP_LAUNCH_START = 100,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -824,6 +827,14 @@ struct kvm_sev_receive_update_data {
 	__u32 pad2;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u8 gosvw[16];
+	__u16 flags;
+	__u8 pad0[6];
+	__u64 pad1[4];
+};
+
 #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index be831e2c06eb..4676ce171aaa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -25,6 +25,7 @@
 #include <asm/fpu/xcr.h>
 #include <asm/fpu/xstate.h>
 #include <asm/debugreg.h>
+#include <asm/sev.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -59,6 +60,21 @@ static u64 sev_supported_vmsa_features;
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
 
+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_API_MINOR	GENMASK_ULL(7, 0)
+#define SNP_POLICY_MASK_API_MAJOR	GENMASK_ULL(15, 8)
+#define SNP_POLICY_MASK_SMT		BIT_ULL(16)
+#define SNP_POLICY_MASK_RSVD_MBO	BIT_ULL(17)
+#define SNP_POLICY_MASK_DEBUG		BIT_ULL(19)
+#define SNP_POLICY_MASK_SINGLE_SOCKET	BIT_ULL(20)
+
+#define SNP_POLICY_MASK_VALID		(SNP_POLICY_MASK_API_MINOR	| \
+					 SNP_POLICY_MASK_API_MAJOR	| \
+					 SNP_POLICY_MASK_SMT		| \
+					 SNP_POLICY_MASK_RSVD_MBO	| \
+					 SNP_POLICY_MASK_DEBUG		| \
+					 SNP_POLICY_MASK_SINGLE_SOCKET)
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -69,6 +85,8 @@ static unsigned int nr_asids;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -95,12 +113,17 @@ static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
 	down_write(&sev_deactivate_lock);
 
 	wbinvd_on_all_cpus();
-	ret = sev_guest_df_flush(&error);
+
+	if (sev_snp_enabled)
+		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+	else
+		ret = sev_guest_df_flush(&error);
 
 	up_write(&sev_deactivate_lock);
 
 	if (ret)
-		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+		       sev_snp_enabled ? "-SNP" : "", ret, error);
 
 	return ret;
 }
@@ -1998,6 +2021,106 @@ int sev_dev_get_attr(u32 group, u64 attr, u64 *val)
 	}
 }
 
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_addr data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.address = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		pr_warn("Failed to create SEV-SNP context, rc %d fw_error %d",
+			rc, argp->error);
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {0};
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid = sev_get_asid(kvm);
+	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {0};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+		return -EFAULT;
+
+	/* Don't allow userspace to allocate memory for more than 1 SNP context. */
+	if (sev->snp_context)
+		return -EINVAL;
+
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	if (params.flags)
+		return -EINVAL;
+
+	if (params.policy & ~SNP_POLICY_MASK_VALID)
+		return -EINVAL;
+
+	/* Check for policy bits that must be set */
+	if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO) ||
+	    !(params.policy & SNP_POLICY_MASK_SMT))
+		return -EINVAL;
+
+	if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET)
+		return -EINVAL;
+
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc) {
+		pr_debug("%s: SEV_CMD_SNP_LAUNCH_START firmware command failed, rc %d\n",
+			 __func__, rc);
+		goto e_free_context;
+	}
+
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc) {
+		pr_debug("%s: Failed to bind ASID to SEV-SNP context, rc %d\n",
+			 __func__, rc);
+		goto e_free_context;
+	}
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2021,6 +2144,15 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 		goto out;
 	}
 
+	/*
+	 * Once KVM_SEV_INIT2 initializes a KVM instance as an SNP guest, only
+	 * allow the use of SNP-specific commands.
+	 */
+	if (sev_snp_guest(kvm) && sev_cmd.id < KVM_SEV_SNP_LAUNCH_START) {
+		r = -EPERM;
+		goto out;
+	}
+
 	switch (sev_cmd.id) {
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
@@ -2085,6 +2217,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2280,6 +2415,31 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_addr data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	/* Do the decommission, which will unbind the ASID from the SNP context */
+	data.address = __sme_pa(sev->snp_context);
+	down_write(&sev_deactivate_lock);
+	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+	up_write(&sev_deactivate_lock);
+
+	if (WARN_ONCE(ret, "Failed to release guest context, ret %d", ret))
+		return ret;
+
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2321,7 +2481,17 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		/*
+		 * Decommission handles unbinding of the ASID. If it fails for
+		 * some unexpected reason, just leak the ASID.
+		 */
+		if (snp_decommission_context(kvm))
+			return;
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1407acf45a23..d175059fa7c8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -93,6 +93,7 @@ struct kvm_sev_info {
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
-- 
2.25.1



* [PATCH v15 06/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
From: Michael Roth @ 2024-05-01  8:51 UTC
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

A key aspect of launching an SNP guest is initializing it with a
known/measured payload which is then encrypted into guest memory as
pre-validated private pages and then measured into the cryptographic
launch context created with KVM_SEV_SNP_LAUNCH_START so that the guest
can attest itself after booting.

Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.

For more information see the SEV-SNP specification.
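
The command may make partial progress and is expected to be retried until
the whole range is processed (see the documentation added below), so
userspace is expected to drive it in a loop along these lines (a sketch;
gpa/len/src_buf/sev_fd/vm_fd are assumptions and error handling is
omitted):

  struct kvm_sev_snp_launch_update update = {
          .gfn_start = gpa >> 12,
          .uaddr     = (__u64)(unsigned long)src_buf,
          .len       = len,  /* must be 4K-aligned */
          .type      = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
  };
  struct kvm_sev_cmd cmd = {
          .id     = KVM_SEV_SNP_LAUNCH_UPDATE,
          .data   = (__u64)(unsigned long)&update,
          .sev_fd = sev_fd,
  };
  int ret;

  /* The kernel advances gfn_start/uaddr/len to reflect progress. */
  do {
          ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  } while (update.len && (ret == 0 || errno == EAGAIN));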

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  54 ++++
 arch/x86/include/uapi/asm/kvm.h               |  19 ++
 arch/x86/kvm/svm/sev.c                        | 230 ++++++++++++++++++
 3 files changed, 303 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index dd179e162a87..cc16a7426d18 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -490,6 +490,60 @@ Returns: 0 on success, -negative on error
 See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further
 details on the input parameters in ``struct kvm_sev_snp_launch_start``.
 
+19. KVM_SEV_SNP_LAUNCH_UPDATE
+-----------------------------
+
+The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided
+data into a guest GPA range, measuring the contents into the SNP guest context
+created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA
+range so that it will be immediately readable using the encryption key
+associated with the guest context once it is booted, after which point it can
+attest the measurement associated with its context before unlocking any
+secrets.
+
+It is required that the GPA ranges initialized by this command have had the
+KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation
+for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect.
+
+Upon success, this command is not guaranteed to have processed the entire
+range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of
+``struct kvm_sev_snp_launch_update`` will be updated to correspond to the
+remaining range that has yet to be processed. The caller should continue
+calling this command until those fields indicate the entire range has been
+processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the
+range plus 1, and ``uaddr`` is the last byte of the userspace-provided source
+buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO,
+``uaddr`` will be ignored completely.
+
+Parameters (in): struct  kvm_sev_snp_launch_update
+
+Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
+
+::
+
+        struct kvm_sev_snp_launch_update {
+                __u64 gfn_start;        /* Guest page number to load/encrypt data into. */
+                __u64 uaddr;            /* Userspace address of data to be loaded/encrypted. */
+                __u64 len;              /* 4k-aligned length in bytes to copy into guest memory.*/
+                __u8 type;              /* The type of the guest pages being initialized. */
+                __u8 pad0;
+                __u16 flags;            /* Must be zero. */
+                __u32 pad1;
+                __u64 pad2[4];
+
+        };
+
+where the allowed values for ``type`` are #define'd as::
+
+        KVM_SEV_SNP_PAGE_TYPE_NORMAL
+        KVM_SEV_SNP_PAGE_TYPE_ZERO
+        KVM_SEV_SNP_PAGE_TYPE_UNMEASURED
+        KVM_SEV_SNP_PAGE_TYPE_SECRETS
+        KVM_SEV_SNP_PAGE_TYPE_CPUID
+
+See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
+used/measured.
+
 Device attribute API
 ====================
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 693a80ffe40a..5935dc8a7e02 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -699,6 +699,7 @@ enum sev_cmd_id {
 
 	/* SNP-specific commands */
 	KVM_SEV_SNP_LAUNCH_START = 100,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -835,6 +836,24 @@ struct kvm_sev_snp_launch_start {
 	__u64 pad1[4];
 };
 
+/* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 gfn_start;
+	__u64 uaddr;
+	__u64 len;
+	__u8 type;
+	__u8 pad0;
+	__u16 flags;
+	__u32 pad1;
+	__u64 pad2[4];
+};
+
 #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4676ce171aaa..f31f87655a67 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -259,6 +259,45 @@ static void sev_decommission(unsigned int handle)
 	sev_guest_decommission(&decommission, NULL);
 }
 
+/*
+ * Certain page-states, such as Pre-Guest and Firmware pages (as documented
+ * in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
+ * directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
+ * unless they are reclaimed first.
+ *
+ * Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
+ * might not be usable by the host due to being set as immutable or still
+ * being associated with a guest ASID.
+ */
+static int snp_page_reclaim(u64 pfn)
+{
+	struct sev_data_snp_page_reclaim data = {0};
+	int err, rc;
+
+	data.paddr = __sme_set(pfn << PAGE_SHIFT);
+	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+	if (WARN_ONCE(rc, "Failed to reclaim PFN %llx", pfn))
+		snp_leak_pages(pfn, 1);
+
+	return rc;
+}
+
+/*
+ * Transition a page to hypervisor-owned/shared state in the RMP table. This
+ * should not fail under normal conditions, but leak the page should that
+ * happen since it will no longer be usable by the host due to RMP protections.
+ */
+static int host_rmp_make_shared(u64 pfn, enum pg_level level)
+{
+	int rc;
+
+	rc = rmp_make_shared(pfn, level);
+	if (WARN_ON_ONCE(rc))
+		snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
+
+	return rc;
+}
+
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 {
 	struct sev_data_deactivate deactivate;
@@ -2121,6 +2160,194 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+struct sev_gmem_populate_args {
+	__u8 type;
+	int sev_fd;
+	int fw_error;
+};
+
+static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pfn,
+				  void __user *src, int order, void *opaque)
+{
+	struct sev_gmem_populate_args *sev_populate_args = opaque;
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	int n_private = 0, ret, i;
+	int npages = (1 << order);
+	gfn_t gfn;
+
+	if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src))
+		return -EINVAL;
+
+	for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
+		struct sev_data_snp_launch_update fw_args = {0};
+		bool assigned;
+		int level;
+
+		if (!kvm_mem_is_private(kvm, gfn)) {
+			pr_debug("%s: Failed to ensure GFN 0x%llx has private memory attribute set\n",
+				 __func__, gfn);
+			ret = -EINVAL;
+			goto err;
+		}
+
+		ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
+		if (ret || assigned) {
+			pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
+				 __func__, gfn, ret, assigned);
+			ret = -EINVAL;
+			goto err;
+		}
+
+		if (src) {
+			void *vaddr = kmap_local_pfn(pfn + i);
+
+			ret = copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE);
+			if (ret)
+				goto err;
+			kunmap_local(vaddr);
+		}
+
+		ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
+				       sev_get_asid(kvm), true);
+		if (ret)
+			goto err;
+
+		n_private++;
+
+		fw_args.gctx_paddr = __psp_pa(sev->snp_context);
+		fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
+		fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
+		fw_args.page_type = sev_populate_args->type;
+
+		ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &fw_args, &sev_populate_args->fw_error);
+		if (ret)
+			goto fw_err;
+	}
+
+	return 0;
+
+fw_err:
+	/*
+	 * If the firmware command failed, handle the reclaim and cleanup of
+	 * that PFN specially vs. prior pages, which can be cleaned up below
+	 * without needing to reclaim in advance.
+	 *
+	 * Additionally, when invalid CPUID function entries are detected,
+	 * firmware writes the expected values into the page and leaves it
+	 * unencrypted so it can be used for debugging and error-reporting.
+	 *
+	 * Copy this page back into the source buffer so userspace can use
+	 * this information to determine which CPUID leaves/fields failed
+	 * CPUID validation.
+	 */
+	if (!snp_page_reclaim(pfn + i) && !host_rmp_make_shared(pfn + i, PG_LEVEL_4K) &&
+	    sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
+	    sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
+		void *vaddr = kmap_local_pfn(pfn + i);
+
+		if (copy_to_user(src + i * PAGE_SIZE, vaddr, PAGE_SIZE))
+			pr_debug("Failed to write CPUID page back to userspace\n");
+
+		kunmap_local(vaddr);
+	}
+
+	/* pfn + i is hypervisor-owned now, so skip below cleanup for it. */
+	n_private--;
+
+err:
+	pr_debug("%s: exiting with error ret %d (fw_error %d), restoring %d gmem PFNs to shared.\n",
+		 __func__, ret, sev_populate_args->fw_error, n_private);
+	for (i = 0; i < n_private; i++)
+		host_rmp_make_shared(pfn + i, PG_LEVEL_4K);
+
+	return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_gmem_populate_args sev_populate_args = {0};
+	struct kvm_sev_snp_launch_update params;
+	struct kvm_memory_slot *memslot;
+	long npages, count;
+	void __user *src;
+	int ret = 0;
+
+	if (!sev_snp_guest(kvm) || !sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+		return -EFAULT;
+
+	pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__,
+		 params.gfn_start, params.len, params.type, params.flags);
+
+	if (!PAGE_ALIGNED(params.len) || params.flags ||
+	    (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
+	     params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
+	     params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
+	     params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
+	     params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
+		return -EINVAL;
+
+	npages = params.len / PAGE_SIZE;
+
+	/*
+	 * For each GFN that's being prepared as part of the initial guest
+	 * state, the following pre-conditions are verified:
+	 *
+	 *   1) The backing memslot is a valid private memslot.
+	 *   2) The GFN has been set to private via KVM_SET_MEMORY_ATTRIBUTES
+	 *      beforehand.
+	 *   3) The PFN of the guest_memfd has not already been set to private
+	 *      in the RMP table.
+	 *
+	 * The KVM MMU relies on kvm->mmu_invalidate_seq to retry nested page
+	 * faults if there's a race between a fault and an attribute update via
+	 * KVM_SET_MEMORY_ATTRIBUTES, and a similar approach could be utilized
+	 * here. However, kvm->slots_lock guards against both this as well as
+	 * concurrent memslot updates occurring while these checks are being
+	 * performed, so use that here to make it easier to reason about the
+	 * initial expected state and better guard against unexpected
+	 * situations.
+	 */
+	mutex_lock(&kvm->slots_lock);
+
+	memslot = gfn_to_memslot(kvm, params.gfn_start);
+	if (!kvm_slot_can_be_private(memslot)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	sev_populate_args.sev_fd = argp->sev_fd;
+	sev_populate_args.type = params.type;
+	src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
+
+	count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
+				  sev_gmem_post_populate, &sev_populate_args);
+	if (count < 0) {
+		argp->error = sev_populate_args.fw_error;
+		pr_debug("%s: kvm_gmem_populate failed, ret %ld (fw_error %d)\n",
+			 __func__, count, argp->error);
+		ret = -EIO;
+	} else {
+		params.gfn_start += count;
+		params.len -= count * PAGE_SIZE;
+		if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
+			params.uaddr += count * PAGE_SIZE;
+
+		ret = 0;
+		if (copy_to_user(u64_to_user_ptr(argp->data), &params, sizeof(params)))
+			ret = -EFAULT;
+	}
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2220,6 +2447,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
-- 
2.25.1


* [PATCH v15 07/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (5 preceding siblings ...)
  2024-05-01  8:51 ` [PATCH v15 06/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2024-05-01  8:51 ` Michael Roth
  2024-05-01  8:51 ` [PATCH v15 08/20] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:51 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, Harald Hoyer

From: Brijesh Singh <brijesh.singh@amd.com>

Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest which stores the measurement of the guest at launch time.
Also extend the existing SNP firmware data structures to support
disabling the use of Versioned Chip Endorsement Keys (VCEK) by guests as
part of this command.

While finalizing the launch flow, the code also issues the LAUNCH_UPDATE
SNP firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU, which requires setting the RMP entries for those pages
to private. Also add handling to clean up the RMP entries for these
pages when freeing vCPUs during shutdown.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
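
For context, a minimal userspace sketch of issuing this command via the
KVM_MEMORY_ENCRYPT_OP ioctl. The vm_fd/sev_fd plumbing and the helper
name are illustrative, not part of this series:

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical helper: finalize an SNP launch without an ID block. */
  static int do_snp_launch_finish(int vm_fd, int sev_fd,
                                  const uint8_t host_data[32])
  {
          struct kvm_sev_snp_launch_finish finish = { 0 };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_FINISH,
                  .data = (uintptr_t)&finish,
                  .sev_fd = (uint32_t)sev_fd,
          };

          /* flags must be zero; id_block_en/auth_key_en are left unset. */
          memcpy(finish.host_data, host_data, sizeof(finish.host_data));

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }
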
 .../virt/kvm/x86/amd-memory-encryption.rst    |  28 ++++
 arch/x86/include/uapi/asm/kvm.h               |  17 +++
 arch/x86/kvm/svm/sev.c                        | 127 ++++++++++++++++++
 include/linux/psp-sev.h                       |   4 +-
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index cc16a7426d18..1ddb6a86ce7f 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -544,6 +544,34 @@ where the allowed values for page_type are #define'd as::
 See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is
 used/measured.
 
+20. KVM_SEV_SNP_LAUNCH_FINISH
+-----------------------------
+
+After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 vcek_disabled;
+                __u8 host_data[32];
+                __u8 pad0[3];
+                __u16 flags;                    /* Must be zero */
+                __u64 pad1[4];
+        };
+
+
+See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
+details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
+
 Device attribute API
 ====================
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5935dc8a7e02..988b5204d636 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -700,6 +700,7 @@ enum sev_cmd_id {
 	/* SNP-specific commands */
 	KVM_SEV_SNP_LAUNCH_START = 100,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -854,6 +855,22 @@ struct kvm_sev_snp_launch_update {
 	__u64 pad2[4];
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 vcek_disabled;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad0[3];
+	__u16 flags;
+	__u64 pad1[4];
+};
+
 #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f31f87655a67..797230f810f8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -75,6 +75,8 @@ static u64 sev_supported_vmsa_features;
 					 SNP_POLICY_MASK_DEBUG		| \
 					 SNP_POLICY_MASK_SINGLE_SOCKET)
 
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2348,6 +2350,115 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+	int ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct vcpu_svm *svm = to_svm(vcpu);
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->sev_es.vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			if (!snp_page_reclaim(pfn))
+				host_rmp_make_shared(pfn, PG_LEVEL_4K);
+
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_finish params;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags)
+		return -EINVAL;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->id_auth_paddr = __sme_pa(id_auth);
+
+		if (params.auth_key_en)
+			data->auth_key_en = 1;
+	}
+
+	data->vcek_disabled = params.vcek_disabled;
+
+	memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2450,6 +2561,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2940,11 +3054,24 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	svm = to_svm(vcpu);
 
+	/*
+	 * If it's an SNP guest, then the VMSA was marked in the RMP table as
+	 * a guest-owned page. Transition the page to hypervisor state before
+	 * releasing it back to the system.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		if (host_rmp_make_shared(pfn, PG_LEVEL_4K))
+			goto skip_vmsa_free;
+	}
+
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
 
 	__free_page(virt_to_page(svm->sev_es.vmsa));
 
+skip_vmsa_free:
 	if (svm->sev_es.ghcb_sa_free)
 		kvfree(svm->sev_es.ghcb_sa);
 }
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3705c2044fc0..903ddfea8585 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -658,6 +658,7 @@ struct sev_data_snp_launch_update {
  * @id_auth_paddr: system physical address of ID block authentication structure
  * @id_block_en: indicates whether ID block is present
  * @auth_key_en: indicates whether author key is present in authentication structure
+ * @vcek_disabled: indicates whether use of VCEK is allowed for attestation reports
  * @rsvd: reserved
  * @host_data: host-supplied data for guest, not interpreted by firmware
  */
@@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish {
 	u64 id_auth_paddr;
 	u8 id_block_en:1;
 	u8 auth_key_en:1;
-	u64 rsvd:62;
+	u8 vcek_disabled:1;
+	u64 rsvd:61;
 	u8 host_data[32];
 } __packed;
 
-- 
2.25.1


* [PATCH v15 08/20] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (6 preceding siblings ...)
  2024-05-01  8:51 ` [PATCH v15 07/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2024-05-01  8:51 ` Michael Roth
  2024-05-01  8:51 ` [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:51 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register
the vCPU GHCB GPA. If the hypervisor can work with the guest-requested
GPA, it must respond back with the same GPA; otherwise it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
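
For reference, a sketch of the guest-side encoding of this MSR protocol
using the mask/position definitions added here (helper names are
illustrative; the GHCB specification is authoritative):

  /* Build the registration request: GFN in GHCBData[63:12], ID 0x012. */
  static inline u64 snp_register_ghcb_req(u64 ghcb_gpa)
  {
          return (((ghcb_gpa >> PAGE_SHIFT) & GHCB_MSR_GPA_VALUE_MASK) <<
                  GHCB_MSR_GPA_VALUE_POS) | GHCB_MSR_REG_GPA_REQ;
  }

  /* The hypervisor must echo the same GFN back with response ID 0x013. */
  static inline bool snp_register_ghcb_resp_ok(u64 msr, u64 ghcb_gpa)
  {
          return (msr & GHCB_MSR_INFO_MASK) == GHCB_MSR_REG_GPA_RESP &&
                 ((msr >> GHCB_MSR_GPA_VALUE_POS) & GHCB_MSR_GPA_VALUE_MASK) ==
                 (ghcb_gpa >> PAGE_SHIFT);
  }
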
 arch/x86/include/asm/sev-common.h |  8 ++++++
 arch/x86/kvm/svm/sev.c            | 48 +++++++++++++++++++++++++++----
 arch/x86/kvm/svm/svm.h            |  7 +++++
 3 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ		0x010
+#define GHCB_MSR_GPA_VALUE_POS		12
+#define GHCB_MSR_GPA_VALUE_MASK		GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP		0x011
+#define GHCB_MSR_PREF_GPA_NONE		0xfffffffffffff
+
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
 #define GHCB_MSR_REG_GPA_REQ_VAL(v)			\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 797230f810f8..e1ac5af4cb74 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3540,6 +3540,32 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_PREF_GPA_REQ:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	case GHCB_MSR_REG_GPA_REQ: {
+		u64 gfn;
+
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+					GHCB_MSR_GPA_VALUE_POS);
+
+		svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3552,12 +3578,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
 			reason_set, reason_code);
 
-		vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
-		vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
-		vcpu->run->system_event.ndata = 1;
-		vcpu->run->system_event.data[0] = control->ghcb_gpa;
-
-		return 0;
+		goto out_terminate;
 	}
 	default:
 		/* Error, keep GHCB MSR value as-is */
@@ -3568,6 +3589,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 					    control->ghcb_gpa, ret);
 
 	return ret;
+
+out_terminate:
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+	vcpu->run->system_event.ndata = 1;
+	vcpu->run->system_event.data[0] = control->ghcb_gpa;
+
+	return 0;
 }
 
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
@@ -3603,6 +3632,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 	trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
 
 	sev_es_sync_from_ghcb(svm);
+
+	/* SEV-SNP guest requires that the GHCB GPA must be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	ret = sev_es_validate_vmgexit(svm);
 	if (ret)
 		return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d175059fa7c8..bbfbeed4c676 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -209,6 +209,8 @@ struct vcpu_sev_es_state {
 	u32 ghcb_sa_len;
 	bool ghcb_sa_sync;
 	bool ghcb_sa_free;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct vcpu_svm {
@@ -362,6 +364,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->sev_es.ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


* [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (7 preceding siblings ...)
  2024-05-01  8:51 ` [PATCH v15 08/20] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2024-05-01  8:51 ` Michael Roth
  2024-05-16  8:28   ` Binbin Wu
  2024-05-01  8:52 ` [PATCH v15 10/20] KVM: SEV: Add support to handle " Michael Roth
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:51 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether a particular GFN should be backed
by private memory.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.

Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
requests to userspace via KVM_EXIT_HYPERCALL.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
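
For context, an illustrative userspace-side handler for the resulting
exit. This assumes KVM_CAP_EXIT_HYPERCALL has been enabled for
KVM_HC_MAP_GPA_RANGE; vm_fd and run are the usual VM fd and mapped
kvm_run struct:

  if (run->exit_reason == KVM_EXIT_HYPERCALL &&
      run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
          struct kvm_memory_attributes attrs = {
                  .address = run->hypercall.args[0],
                  .size = run->hypercall.args[1] * 4096,
                  .attributes = (run->hypercall.args[2] & KVM_MAP_GPA_RANGE_ENCRYPTED)
                                ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* A non-zero ret is reported back to the guest as a PSC error. */
          run->hypercall.ret = ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) ? 1 : 0;
  }
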
 arch/x86/include/asm/sev-common.h |  6 ++++
 arch/x86/kvm/svm/sev.c            | 48 +++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
 	/* GHCBData[11:0] */				\
 	GHCB_MSR_PSC_REQ)
 
+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
 #define GHCB_MSR_PSC_RESP		0x015
 #define GHCB_MSR_PSC_RESP_VAL(val)			\
 	/* GHCBData[63:32] */				\
 	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
 
+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e1ac5af4cb74..720775c9d0b8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (vcpu->run->hypercall.ret)
+		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+	else
+		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
+
+	return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
+{
+	u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+	u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+
+	if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+		return 1; /* resume guest */
+	}
+
+	if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
+		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+		return 1; /* resume guest */
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+	vcpu->run->hypercall.args[0] = gpa;
+	vcpu->run->hypercall.args[1] = 1;
+	vcpu->run->hypercall.args[2] = (op == SNP_PAGE_STATE_PRIVATE)
+				       ? KVM_MAP_GPA_RANGE_ENCRYPTED
+				       : KVM_MAP_GPA_RANGE_DECRYPTED;
+	vcpu->run->hypercall.args[2] |= KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
+
+	vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+	return 0; /* forward request to userspace */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3566,6 +3608,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
+		ret = snp_begin_psc_msr(svm, control->ghcb_gpa);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
-- 
2.25.1


* [PATCH v15 10/20] KVM: SEV: Add support to handle Page State Change VMGEXIT
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (8 preceding siblings ...)
  2024-05-01  8:51 ` [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 11/20] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace for processing, similar to how it
is done for requests that don't use a GHCB page.

As with the MSR-based page-state changes, use the existing
KVM_HC_MAP_GPA_RANGE hypercall format to deliver these requests to
userspace via KVM_EXIT_HYPERCALL.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
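
For reference, the guest/host-shared PSC buffer layout from
arch/x86/include/asm/sev-common.h that the psc_buffer wrapper added
below is built around (pre-existing definitions, reproduced only for
context):

  struct psc_hdr {
          u16 cur_entry;
          u16 end_entry;
          u32 reserved;
  } __packed;

  struct psc_entry {
          u64 cur_page    : 12,
              gfn         : 40,
              operation   : 4,
              pagesize    : 1,
              reserved    : 7;
  } __packed;
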
 arch/x86/include/asm/sev-common.h |  11 ++
 arch/x86/kvm/svm/sev.c            | 180 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h            |   5 +
 3 files changed, 196 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 6d68db812de1..8647cc05e2f4 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -129,8 +129,19 @@ enum psc_op {
  *   The VMGEXIT_PSC_MAX_ENTRY determines the size of the PSC structure, which
  *   is a local stack variable in set_pages_state(). Do not increase this value
  *   without evaluating the impact to stack usage.
+ *
+ *   Use VMGEXIT_PSC_MAX_COUNT in cases where the actual GHCB-defined max value
+ *   is needed, such as when processing GHCB requests on the hypervisor side.
  */
 #define VMGEXIT_PSC_MAX_ENTRY		64
+#define VMGEXIT_PSC_MAX_COUNT		253
+
+#define VMGEXIT_PSC_ERROR_GENERIC	(0x100UL << 32)
+#define VMGEXIT_PSC_ERROR_INVALID_HDR	((1UL << 32) | 1)
+#define VMGEXIT_PSC_ERROR_INVALID_ENTRY	((1UL << 32) | 2)
+
+#define VMGEXIT_PSC_OP_PRIVATE		1
+#define VMGEXIT_PSC_OP_SHARED		2
 
 struct psc_hdr {
 	u16 cur_entry;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 720775c9d0b8..70b8f4cd1b03 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3274,6 +3274,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_TERM_REQUEST:
 		break;
+	case SVM_VMGEXIT_PSC:
+		if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_sw_scratch_is_valid(svm))
+			goto vmgexit_err;
+		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
 		goto vmgexit_err;
@@ -3503,6 +3507,175 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
 	return 0; /* forward request to userspace */
 }
 
+struct psc_buffer {
+	struct psc_hdr hdr;
+	struct psc_entry entries[];
+} __packed;
+
+static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
+
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
+	struct psc_entry *entries = psc->entries;
+	struct psc_hdr *hdr = &psc->hdr;
+	__u64 psc_ret;
+	__u16 idx;
+
+	if (vcpu->run->hypercall.ret) {
+		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
+		goto out_resume;
+	}
+
+	/*
+	 * Everything in-flight has been processed successfully. Update the
+	 * corresponding entries in the guest's PSC buffer.
+	 */
+	for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
+	     svm->sev_es.psc_inflight--, idx++) {
+		struct psc_entry *entry = &entries[idx];
+
+		entry->cur_page = svm->sev_es.psc_2m ? 512 : 1;
+	}
+
+	hdr->cur_entry = idx;
+
+	/* Handle the next range (if any). */
+	return snp_begin_psc(svm, psc);
+
+out_resume:
+	svm->sev_es.psc_idx = 0;
+	svm->sev_es.psc_inflight = 0;
+	svm->sev_es.psc_2m = false;
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
+
+	return 1; /* resume guest */
+}
+
+static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
+{
+	struct psc_entry *entries = psc->entries;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct psc_hdr *hdr = &psc->hdr;
+	struct psc_entry entry_start;
+	u16 idx, idx_start, idx_end;
+	__u64 psc_ret, gpa;
+	int npages;
+
+	/* There should be no other PSCs in-flight at this point. */
+	if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
+		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
+		goto out_resume;
+	}
+
+	if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
+		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
+		goto out_resume;
+	}
+
+	/*
+	 * The PSC descriptor buffer can be modified by a misbehaved guest after
+	 * validation, so take care to only use validated copies of values used
+	 * for things like array indexing.
+	 */
+	idx_start = hdr->cur_entry;
+	idx_end = hdr->end_entry;
+
+	if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {
+		psc_ret = VMGEXIT_PSC_ERROR_INVALID_HDR;
+		goto out_resume;
+	}
+
+	/* Nothing more to process. */
+	if (idx_start > idx_end) {
+		psc_ret = 0;
+		goto out_resume;
+	}
+
+	/* Find the start of the next range which needs processing. */
+	for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
+		__u16 cur_page;
+		gfn_t gfn;
+		bool huge;
+
+		entry_start = entries[idx];
+
+		/* Only private/shared conversions are currently supported. */
+		if (entry_start.operation != VMGEXIT_PSC_OP_PRIVATE &&
+		    entry_start.operation != VMGEXIT_PSC_OP_SHARED)
+			continue;
+
+		gfn = entry_start.gfn;
+		cur_page = entry_start.cur_page;
+		huge = entry_start.pagesize;
+
+		if ((huge && (cur_page > 512 || !IS_ALIGNED(gfn, 512))) ||
+		    (!huge && cur_page > 1)) {
+			psc_ret = VMGEXIT_PSC_ERROR_INVALID_ENTRY;
+			goto out_resume;
+		}
+
+		/* All sub-pages already processed. */
+		if ((huge && cur_page == 512) || (!huge && cur_page == 1))
+			continue;
+
+		/*
+		 * If this is a partially-completed 2M range, force 4K handling
+		 * for the remaining pages since they're effectively split at
+		 * this point. Subsequent code should ensure this doesn't get
+		 * combined with adjacent PSC entries where 2M handling is still
+		 * possible.
+		 */
+		svm->sev_es.psc_2m = cur_page ? false : huge;
+		svm->sev_es.psc_idx = idx;
+		svm->sev_es.psc_inflight = 1;
+
+		gpa = gfn_to_gpa(gfn + cur_page);
+		npages = huge ? 512 - cur_page : 1;
+		break;
+	}
+
+	/*
+	 * Find all subsequent PSC entries that contain adjacent GPA
+	 * ranges/operations and can be combined into a single
+	 * KVM_HC_MAP_GPA_RANGE exit.
+	 */
+	for (idx = svm->sev_es.psc_idx + 1; idx <= idx_end; idx++) {
+		struct psc_entry entry = entries[idx];
+
+		if (entry.operation != entry_start.operation ||
+		    entry.gfn != entry_start.gfn + npages ||
+		    !!entry.pagesize != svm->sev_es.psc_2m)
+			break;
+
+		svm->sev_es.psc_inflight++;
+		npages += entry_start.pagesize ? 512 : 1;
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+	vcpu->run->hypercall.args[0] = gpa;
+	vcpu->run->hypercall.args[1] = npages;
+	vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
+				       ? KVM_MAP_GPA_RANGE_ENCRYPTED
+				       : KVM_MAP_GPA_RANGE_DECRYPTED;
+	vcpu->run->hypercall.args[2] |= entry_start.pagesize
+					? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
+					: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
+	vcpu->arch.complete_userspace_io = snp_complete_psc;
+
+	return 0; /* forward request to userspace */
+
+out_resume:
+	svm->sev_es.psc_idx = 0;
+	svm->sev_es.psc_inflight = 0;
+	svm->sev_es.psc_2m = false;
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
+
+	return 1; /* resume guest */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3761,6 +3934,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		vcpu->run->system_event.ndata = 1;
 		vcpu->run->system_event.data[0] = control->ghcb_gpa;
 		break;
+	case SVM_VMGEXIT_PSC:
+		ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
+		if (ret)
+			break;
+
+		ret = snp_begin_psc(svm, svm->sev_es.ghcb_sa);
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bbfbeed4c676..438cad6c9421 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -210,6 +210,11 @@ struct vcpu_sev_es_state {
 	bool ghcb_sa_sync;
 	bool ghcb_sa_free;
 
+	/* SNP Page-State-Change buffer entries currently being processed */
+	u16 psc_idx;
+	u16 psc_inflight;
+	bool psc_2m;
+
 	u64 ghcb_registered_gpa;
 };
 
-- 
2.25.1


* [PATCH v15 11/20] KVM: SEV: Add support to handle RMP nested page faults
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (9 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 10/20] KVM: SEV: Add support to handle " Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 12/20] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional information
on the access type. See the APM volume 2 for additional information.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
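
For context, the handling added below boils down to the following flow
once the fault is confirmed to be for an assigned, private PFN
(condensed for illustration, not a literal excerpt):

  /* On an #NPF with PFERR_GUEST_RMP_MASK set for a private GPA: */
  if (rmp_level == PG_LEVEL_4K)
          return;  /* already split; likely a spurious/racing fault */

  /* Split the 2MB RMP entry covering this PFN into 512 4K entries. */
  snp_rmptable_psmash(pfn);

  /* Zap the NPT range so it is re-faulted with 4K mappings. */
  kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
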
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/sev.h      |   3 +
 arch/x86/kvm/mmu.h              |   2 -
 arch/x86/kvm/mmu/mmu.c          |   1 +
 arch/x86/kvm/svm/sev.c          | 109 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  21 ++++--
 arch/x86/kvm/svm/svm.h          |   3 +
 arch/x86/kvm/trace.h            |  31 +++++++++
 arch/x86/kvm/x86.c              |   1 +
 9 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 87265b73906a..8a414fc972f8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1944,6 +1944,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f57382afee4..3a06f06b847a 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -91,6 +91,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 /* RMUPDATE detected 4K page and 2MB page overlap. */
 #define RMPUPDATE_FAIL_OVERLAP		4
 
+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE		3
+
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
 #define RMP_PG_SIZE_2M			1
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2343c9f00e31..e3cb35b9396d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -251,8 +251,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
 	return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d556da052f6..de35dee25bf6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6758,6 +6758,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 					   const struct kvm_memory_slot *slot)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 70b8f4cd1b03..7f5ddd92113e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3465,6 +3465,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+	int ret;
+
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	/*
+	 * PSMASH_FAIL_INUSE indicates another processor is modifying the
+	 * entry, so retry until that's no longer the case.
+	 */
+	do {
+		ret = psmash(pfn);
+	} while (ret == PSMASH_FAIL_INUSE);
+
+	return ret;
+}
+
 static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4221,3 +4238,95 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return p;
 }
+
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
+	int order, rmp_level, ret;
+	bool assigned;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+
+	gfn = gpa >> PAGE_SHIFT;
+
+	/*
+	 * The only time RMP faults occur for shared pages is when the guest is
+	 * triggering an RMP fault for an implicit page-state change from
+	 * shared->private. Implicit page-state changes are forwarded to
+	 * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+	 * for shared pages should not end up here.
+	 */
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault for non-private GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+	if (ret) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+	if (ret || !assigned) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+				    gpa, pfn, ret);
+		goto out_no_trace;
+	}
+
+	/*
+	 * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+	 * with PFERR_GUEST_RMP_BIT set:
+	 *
+	 * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+	 *    bit set if the guest issues them with a smaller granularity than
+	 *    what is indicated by the page-size bit in the 2MB RMP entry for
+	 *    the PFN that backs the GPA.
+	 *
+	 * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+	 *    smaller than what is indicated by the 2MB RMP entry for the PFN
+	 *    that backs the GPA.
+	 *
+	 * In both these cases, the corresponding 2M RMP entry needs to
+	 * be PSMASH'd to 512 4K RMP entries.  If the RMP entry is already
+	 * split into 4K RMP entries, then this is likely a spurious case which
+	 * can occur when there are concurrent accesses by the guest to a 2MB
+	 * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+	 * the process of being PSMASH'd into 4K entries. These cases should
+	 * resolve automatically on subsequent accesses, so just ignore them
+	 * here.
+	 */
+	if (rmp_level == PG_LEVEL_4K)
+		goto out;
+
+	ret = snp_rmptable_psmash(pfn);
+	if (ret) {
+		/*
+		 * Look it up again. If it's 4K now then the PSMASH may have
+		 * raced with another process and the issue has already resolved
+		 * itself.
+		 */
+		if (!snp_lookup_rmpentry(pfn, &assigned, &rmp_level) &&
+		    assigned && rmp_level == PG_LEVEL_4K)
+			goto out;
+
+		pr_warn_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+				    gpa, pfn, ret);
+	}
+
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+	trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
+out_no_trace:
+	put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 422b452fbc3b..7c9807fdafc3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2043,6 +2043,7 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
@@ -2060,10 +2061,22 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 		error_code |= PFERR_PRIVATE_ACCESS;
 
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	/*
+	 * rc == 0 indicates a userspace exit is needed to handle page
+	 * transitions, so do that first before updating the RMP table.
+	 */
+	if (error_code & PFERR_GUEST_RMP_MASK) {
+		if (rc == 0)
+			return rc;
+		sev_handle_rmp_fault(vcpu, fault_address, error_code);
+	}
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 438cad6c9421..d779f1f431af 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -728,6 +728,7 @@ void sev_hardware_unsetup(void);
 int sev_cpu_init(struct svm_cpu_data *sd);
 int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
+void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -741,6 +742,8 @@ static inline void sev_hardware_unsetup(void) {}
 static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
 static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
+static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+
 #endif
 
 /* vmenter.S */
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index c6b4b1728006..3531a187d5d9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1834,6 +1834,37 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
 		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
 );
 
+/*
+ * Tracepoint for #NPFs due to RMP faults.
+ */
+TRACE_EVENT(kvm_rmp_fault,
+	TP_PROTO(struct kvm_vcpu *vcpu, u64 gpa, u64 pfn, u64 error_code,
+		 int rmp_level, int psmash_ret),
+	TP_ARGS(vcpu, gpa, pfn, error_code, rmp_level, psmash_ret),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(u64, gpa)
+		__field(u64, pfn)
+		__field(u64, error_code)
+		__field(int, rmp_level)
+		__field(int, psmash_ret)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu->vcpu_id;
+		__entry->gpa		= gpa;
+		__entry->pfn		= pfn;
+		__entry->error_code	= error_code;
+		__entry->rmp_level	= rmp_level;
+		__entry->psmash_ret	= psmash_ret;
+	),
+
+	TP_printk("vcpu %u gpa %016llx pfn 0x%llx error_code 0x%llx rmp_level %d psmash_ret %d",
+		  __entry->vcpu_id, __entry->gpa, __entry->pfn,
+		  __entry->error_code, __entry->rmp_level, __entry->psmash_ret)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83b8260443a3..14693effec6b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13996,6 +13996,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
 
 static int __init kvm_x86_init(void)
 {
-- 
2.25.1


* [PATCH v15 12/20] KVM: SEV: Support SEV-SNP AP Creation NAE event
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (10 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 11/20] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way to simulate INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID. The GPA is saved in the svm struct of the
  target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID the next time an INIT is performed. The GPA is
  saved in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared
  from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
  added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
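
For reference, the GHCB encoding of this NAE event as consumed by
sev_snp_ap_creation() below (a sketch; the GHCB specification is
authoritative):

  /*
   * SW_EXITCODE  = SVM_VMGEXIT_AP_CREATION
   * SW_EXITINFO1 = request in bits 31:0 (CREATE_ON_INIT/CREATE/DESTROY),
   *                APIC ID of the target vCPU in bits 63:32
   * SW_EXITINFO2 = GPA of the new VMSA (CREATE/CREATE_ON_INIT only)
   * RAX          = SEV features the VMSA was created with
   */
  static inline u64 snp_ap_create_exitinfo1(u32 apic_id, u32 request)
  {
          return ((u64)apic_id << 32) | request;
  }
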
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/svm.h      |   6 +
 arch/x86/kvm/svm/sev.c          | 231 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c          |  11 +-
 arch/x86/kvm/svm/svm.h          |   9 ++
 arch/x86/kvm/x86.c              |  11 ++
 6 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8a414fc972f8..cd5f99a8891e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 544a43c1cf11..f0dea3750ca9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,8 +286,14 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 
 #define SVM_SEV_FEAT_SNP_ACTIVE				BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION		BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION		BIT(4)
 #define SVM_SEV_FEAT_DEBUG_SWAP				BIT(5)
 
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
+
 struct vmcb_seg {
 	u16 selector;
 	u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7f5ddd92113e..69ec8f577763 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -38,7 +38,7 @@
 #define GHCB_VERSION_DEFAULT	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
 /* enable/disable SEV support */
 static bool sev_enabled = true;
@@ -3267,6 +3267,13 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!kvm_ghcb_sw_scratch_is_valid(svm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto vmgexit_err;
+		if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+			if (!kvm_ghcb_rax_is_valid(svm))
+				goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3693,6 +3700,205 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 	return 1; /* resume guest */
 }
 
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+	/* Clear use of the VMSA */
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+		gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+		struct kvm_memory_slot *slot;
+		kvm_pfn_t pfn;
+
+		slot = gfn_to_memslot(vcpu->kvm, gfn);
+		if (!slot)
+			return -EINVAL;
+
+		/*
+		 * The new VMSA will be private memory guest memory, so
+		 * retrieve the PFN from the gmem backend.
+		 */
+		if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+			return -EINVAL;
+
+		/*
+		 * From this point forward, the VMSA will always be a
+		 * guest-mapped page rather than the initial one allocated
+		 * by KVM in svm->sev_es.vmsa. In theory, svm->sev_es.vmsa
+		 * could be free'd and cleaned up here, but that involves
+		 * cleanups like wbinvd_on_all_cpus() which would ideally
+		 * be handled during teardown rather than guest boot.
+		 * Deferring that also allows the existing logic for SEV-ES
+		 * VMSAs to be re-used with minimal SNP-specific changes.
+		 */
+		svm->sev_es.snp_has_guest_vmsa = true;
+
+		/* Use the new VMSA */
+		svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+
+		/*
+		 * gmem pages aren't currently migratable, but if this ever
+		 * changes then care should be taken to ensure
+		 * svm->sev_es.vmsa is pinned through some other means.
+		 */
+		kvm_release_pfn_clean(pfn);
+	}
+
+	/*
+	 * When replacing the VMSA during SEV-SNP AP creation,
+	 * mark the VMCB dirty so that full state is always reloaded.
+	 */
+	vmcb_mark_all_dirty(svm->vmcb);
+
+	return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_waiting_for_reset)
+		goto unlock;
+
+	svm->sev_es.snp_ap_waiting_for_reset = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * The target vCPU is valid, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in a non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->sev_es.snp_ap_waiting_for_reset = true;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->vmsa_features;
+
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/*
+		 * A malicious guest can RMPADJUST a large page into a VMSA,
+		 * which will hit the SNP erratum where the CPU incorrectly
+		 * signals an RMP violation #PF if a hugepage collides with
+		 * the RMP entry of the VMSA page. Reject the AP CREATE
+		 * request if the VMSA address from the guest is 2M-aligned.
+		 */
+		if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+			vcpu_unimpl(vcpu,
+				    "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	if (kick) {
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+
+		if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+			kvm_make_request(KVM_REQ_UNBLOCK, target_vcpu);
+
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3958,6 +4164,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 		ret = snp_begin_psc(svm, svm->sev_es.ghcb_sa);
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+			ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -4052,7 +4267,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	 * the VMSA will be NULL if this vCPU is the destination for intrahost
 	 * migration, and will be copied later.
 	 */
-	if (svm->sev_es.vmsa)
+	if (svm->sev_es.vmsa && !svm->sev_es.snp_has_guest_vmsa)
 		svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
@@ -4128,6 +4343,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->sev_es.snp_vmsa_mutex);
 }
 
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
@@ -4239,6 +4456,16 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 	return p;
 }
 
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu) &&
+	    vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+}
+
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 {
 	struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7c9807fdafc3..b70556608e8d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
+	if (init_event)
+		sev_snp_init_protected_guest_state(vcpu);
+
 	init_vmcb(vcpu);
 
 	if (!init_event)
@@ -4944,6 +4947,12 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
 	return page_address(page);
 }
 
+static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	sev_vcpu_unblocking(vcpu);
+	avic_vcpu_unblocking(vcpu);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -4966,7 +4975,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_load = svm_vcpu_load,
 	.vcpu_put = svm_vcpu_put,
 	.vcpu_blocking = avic_vcpu_blocking,
-	.vcpu_unblocking = avic_vcpu_unblocking,
+	.vcpu_unblocking = svm_vcpu_unblocking,
 
 	.update_exception_bitmap = svm_update_exception_bitmap,
 	.get_msr_feature = svm_get_msr_feature,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d779f1f431af..858e74a26fab 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -216,6 +216,11 @@ struct vcpu_sev_es_state {
 	bool psc_2m;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+	gpa_t snp_vmsa_gpa;
+	bool snp_ap_waiting_for_reset;
+	bool snp_has_guest_vmsa;
 };
 
 struct vcpu_svm {
@@ -729,6 +734,8 @@ int sev_cpu_init(struct svm_cpu_data *sd);
 int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -743,6 +750,8 @@ static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
 static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
 static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
+static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
 
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14693effec6b..b20f6c1b8214 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10938,6 +10938,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			kvm_vcpu_reset(vcpu, true);
+			if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+				r = 1;
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13145,6 +13153,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (kvm_test_request(KVM_REQ_PMI, vcpu))
 		return true;
 
+	if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.25.1



* [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing private pages
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (11 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 12/20] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 14/20] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

This will handle the RMP table updates needed to put a page into a
private state before mapping it into an SEV-SNP guest.
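
As a rough sketch of the flow based on the hunks below: when guest_memfd
hands out a folio to back a private GPA range, kvm_gmem_prepare_folio()
calls kvm_arch_gmem_prepare(), which dispatches to sev_gmem_prepare() via
the .gmem_prepare hook. If the folio is 2MB-backed and the entire
2MB-aligned PFN range is still shared in the RMP table, a single 2MB RMP
entry is created; e.g. for PFN 0x12345, the range [0x12200, 0x12400) is
checked and, if fully shared, rmp_make_private() is issued once for the
aligned PFN at PG_LEVEL_2M. Otherwise only the single 4K PFN is converted.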

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig   |  1 +
 arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  2 +
 arch/x86/kvm/svm/svm.h |  5 +++
 arch/x86/kvm/x86.c     |  5 +++
 virt/kvm/guest_memfd.c |  4 +-
 6 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 5e72faca4e8f..10768f13b240 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -137,6 +137,7 @@ config KVM_AMD_SEV
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select ARCH_HAS_CC_PLATFORM
 	select KVM_GENERIC_PRIVATE_MEM
+	select HAVE_KVM_GMEM_PREPARE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69ec8f577763..0439ec12fa90 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4557,3 +4557,101 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 out_no_trace:
 	put_page(pfn_to_page(pfn));
 }
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn = start;
+
+	while (pfn < end) {
+		int ret, rmp_level;
+		bool assigned;
+
+		ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (ret) {
+			pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+					    pfn, start, end, rmp_level, ret);
+			return false;
+		}
+
+		if (assigned) {
+			pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+				 __func__, pfn, start, end, rmp_level);
+			return false;
+		}
+
+		pfn++;
+	}
+
+	return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+	kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+	/*
+	 * If this is a large folio, and the entire 2M range containing the
+	 * PFN is currently shared, then the entire 2M-aligned range can be
+	 * set to private via a single 2M RMP entry.
+	 */
+	if (max_level_for_order(order) > PG_LEVEL_4K &&
+	    is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+		return true;
+
+	return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t pfn_aligned;
+	gfn_t gfn_aligned;
+	int level, rc;
+	bool assigned;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (rc) {
+		pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+				   gfn, pfn, rc);
+		return -ENOENT;
+	}
+
+	if (assigned) {
+		pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+			 __func__, gfn, pfn, max_order, level);
+		return 0;
+	}
+
+	if (is_large_rmp_possible(kvm, pfn, max_order)) {
+		level = PG_LEVEL_2M;
+		pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+		gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+	} else {
+		level = PG_LEVEL_4K;
+		pfn_aligned = pfn;
+		gfn_aligned = gfn;
+	}
+
+	rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+	if (rc) {
+		pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+				   gfn, pfn, level, rc);
+		return -EINVAL;
+	}
+
+	pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+		 __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b70556608e8d..60783e9f2ae8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5085,6 +5085,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+	.gmem_prepare = sev_gmem_prepare,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 858e74a26fab..ff1aca7e10e9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -736,6 +736,7 @@ extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -752,6 +753,10 @@ static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXI
 static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
 static inline void sev_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
+static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+	return 0;
+}
 
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b20f6c1b8214..0fb76ef9b7e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13610,6 +13610,11 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
 #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+bool kvm_arch_gmem_prepare_needed(struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_SNP_VM;
+}
+
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
 {
 	return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index a44f983eb673..7d3932e5a689 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -46,8 +46,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
 		gfn = slot->base_gfn + index - slot->gmem.pgoff;
 		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
 		if (rc) {
-			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
-					    index, rc);
+			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+					    index, gfn, pfn, rc);
 			return rc;
 		}
 	}
-- 
2.25.1



* [PATCH v15 14/20] KVM: SEV: Implement gmem hook for invalidating private pages
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (12 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 15/20] KVM: x86: Implement hook for determining max NPT mapping level Michael Roth
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
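
For example, when userspace punches a hole in a guest_memfd file via
fallocate(FALLOC_FL_PUNCH_HOLE), the freed folios are passed to
kvm_arch_gmem_invalidate() as a PFN range, which lands in the
sev_gmem_invalidate() hook added below so the pages can be transitioned
back to shared (PSMASHing any 2MB RMP entries that only partially overlap
the range) and their cache lines flushed before the host reuses them.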

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig   |  1 +
 arch/x86/kvm/svm/sev.c | 64 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  2 ++
 4 files changed, 68 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 10768f13b240..2a7f69abcac3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@ config KVM_AMD_SEV
 	select ARCH_HAS_CC_PLATFORM
 	select KVM_GENERIC_PRIVATE_MEM
 	select HAVE_KVM_GMEM_PREPARE
+	select HAVE_KVM_GMEM_INVALIDATE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0439ec12fa90..cb89f6eba6ea 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4655,3 +4655,67 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+	for (pfn = start; pfn < end;) {
+		bool use_2m_update = false;
+		int rc, rmp_level;
+		bool assigned;
+
+		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		if (!assigned)
+			goto next_pfn;
+
+		use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+				end >= (pfn + PTRS_PER_PMD) &&
+				rmp_level > PG_LEVEL_4K;
+
+		/*
+		 * If an unaligned PFN corresponds to a 2M region assigned as a
+		 * large page in the RMP table, PSMASH the region into individual
+		 * 4K RMP entries before attempting to convert a 4K sub-page.
+		 */
+		if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+			/*
+			 * This shouldn't fail, but if it does, report it, but
+			 * still try to update RMP entry to shared and pray this
+			 * was a spurious error that can be addressed later.
+			 */
+			rc = snp_rmptable_psmash(pfn);
+			WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+				  pfn, rc);
+		}
+
+		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		/*
+		 * SEV-ES avoids host/guest cache coherency issues through
+		 * WBINVD hooks issued via MMU notifiers during run-time, and
+		 * KVM's VM destroy path at shutdown. Those MMU notifier events
+		 * don't cover gmem since there is no requirement to map pages
+		 * to a HVA in order to use them for a running guest. While the
+		 * shutdown path would still likely cover things for SNP guests,
+		 * userspace may also free gmem pages during run-time via
+		 * hole-punching operations on the guest_memfd, so flush the
+		 * cache entries for these pages before freeing them back to
+		 * the host.
+		 */
+		clflush_cache_range(__va(pfn_to_hpa(pfn)),
+				    use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+		pfn += use_2m_update ? PTRS_PER_PMD : 1;
+		cond_resched();
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 60783e9f2ae8..29dc5fa28d97 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5087,6 +5087,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate = sev_gmem_invalidate,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ff1aca7e10e9..f91096722e29 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -737,6 +737,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -757,6 +758,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 {
 	return 0;
 }
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
 
 #endif
 
-- 
2.25.1



* [PATCH v15 15/20] KVM: x86: Implement hook for determining max NPT mapping level
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (13 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 14/20] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 16/20] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:

  - gmem allocates 2MB page
  - guest issues PVALIDATE on 2MB page
  - guest later converts a subpage to shared
  - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
  - KVM MMU splits NPT mapping to 4K
  - guest later converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again, but this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.

Implement a kvm_x86_ops.private_max_mapping_level() hook for SEV that
checks for this condition and adjusts the mapping level accordingly.
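
For example: if gmem backs a fault with a 2MB folio, the gmem allocation
order alone would permit a PG_LEVEL_2M NPT mapping, but if
snp_lookup_rmpentry() reports that the faulting PFN is assigned at 4K
granularity (as in the scenario above), the hook returns PG_LEVEL_4K and
the NPT mapping is clamped to 4K regardless of the gmem order.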

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 15 +++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  5 +++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cb89f6eba6ea..224fdab32950 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4719,3 +4719,18 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 		cond_resched();
 	}
 }
+
+int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	int level, rc;
+	bool assigned;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (rc || !assigned)
+		return PG_LEVEL_4K;
+
+	return level;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 29dc5fa28d97..426ad49325d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5088,6 +5088,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.gmem_prepare = sev_gmem_prepare,
 	.gmem_invalidate = sev_gmem_invalidate,
+	.private_max_mapping_level = sev_private_max_mapping_level,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f91096722e29..e325ede0f463 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -738,6 +738,7 @@ void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -759,6 +760,10 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 	return 0;
 }
 static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
+static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	return 0;
+}
 
 #endif
 
-- 
2.25.1



* [PATCH v15 16/20] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (14 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 15/20] KVM: x86: Implement hook for determining max NPT mapping level Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 17/20] KVM: SVM: Add module parameter to enable SEV-SNP Michael Roth
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

From: Ashish Kalra <ashish.kalra@amd.com>

With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply
to SNP guests, and the flush can be skipped for them.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 224fdab32950..e94e3aa4d932 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3039,7 +3039,13 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
 
 void sev_guest_memory_reclaimed(struct kvm *kvm)
 {
-	if (!sev_guest(kvm))
+	/*
+	 * With SNP+gmem, private/encrypted memory is unreachable via the
+	 * hva-based mmu notifiers, so these events only pertain to
+	 * shared pages, for which there is no need to perform the
+	 * WBINVD to flush associated caches.
+	 */
+	if (!sev_guest(kvm) || sev_snp_guest(kvm))
 		return;
 
 	wbinvd_on_all_cpus();
-- 
2.25.1



* [PATCH v15 17/20] KVM: SVM: Add module parameter to enable SEV-SNP
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (15 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 16/20] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 18/20] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
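
Since the parameter defaults to true, SNP support is enabled whenever the
other SEV-SNP prerequisites are met; it can still be disabled at module
load time, e.g. via kvm_amd.sev_snp=0 on the kernel command line or
"modprobe kvm_amd sev_snp=0".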

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e94e3aa4d932..112041ee55e9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -49,7 +49,8 @@ static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
 
 /* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 
 /* enable/disable SEV-ES DebugSwap support */
 static bool sev_es_debug_swap_enabled = true;
-- 
2.25.1



* [PATCH v15 18/20] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (16 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 17/20] KVM: SVM: Add module parameter to enable SEV-SNP Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-01  8:52 ` [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST " Michael Roth
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, Alexey Kardashevskiy

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the
specifics of what guest requests are being made are opaque to the
hypervisor, which only serves as a proxy for the guest requests and
firmware responses.

Implement handling for these events.
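
For reference, the 64-bit value returned to the guest in sw_exit_info_2
packs the VMM error code in the upper 32 bits and the firmware error code
in the lower 32 bits, per the SNP_GUEST_ERR() macro added to
include/uapi/linux/sev-guest.h below. A quick illustration:

	/* VMM-side "busy" error, no firmware error: */
	u64 err = SNP_GUEST_ERR(SNP_GUEST_VMM_ERR_BUSY, 0);
	/* err == 0x0000000200000000 */

	/* no VMM error, firmware status 0x16 propagated as-is: */
	err = SNP_GUEST_ERR(0, 0x16);
	/* err == 0x0000000000000016 */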

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
 request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c         | 86 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/sev-guest.h |  9 ++++
 2 files changed, 95 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 112041ee55e9..5c6262f3232f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>
 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
@@ -3292,6 +3293,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_sw_scratch_is_valid(svm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_GUEST_REQUEST:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto vmgexit_err;
+		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
 		goto vmgexit_err;
@@ -3906,6 +3911,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 	return ret;
 }
 
+static int snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+			       gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t req_pfn, resp_pfn;
+
+	if (!PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
+		return -EINVAL;
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn))
+		return -EINVAL;
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn))
+		return -EINVAL;
+
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+		return -EINVAL;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+	return 0;
+}
+
+static int snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+	if (snp_page_reclaim(pfn) || rmp_make_shared(pfn, PG_LEVEL_4K))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+				  sev_ret_code *fw_err)
+{
+	struct sev_data_snp_guest_request data = {0};
+	struct kvm_sev_info *sev;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -EINVAL;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	ret = snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa);
+	if (ret)
+		return ret;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
+	if (ret)
+		return ret;
+
+	ret = snp_cleanup_guest_buf(&data);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	sev_ret_code fw_err = 0;
+	int vmm_ret = 0;
+
+	if (__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4178,6 +4260,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 			ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
 		}
 
+		ret = 1;
+		break;
+	case SVM_VMGEXIT_GUEST_REQUEST:
+		snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
 		ret = 1;
 		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
 #define SNP_GUEST_FW_ERR_MASK		GENMASK_ULL(31, 0)
 #define SNP_GUEST_VMM_ERR_SHIFT		32
 #define SNP_GUEST_VMM_ERR(x)		(((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x)		((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err)	(SNP_GUEST_VMM_ERR(vmm_err) | \
+					 SNP_GUEST_FW_ERR(fw_err))
 
+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
+ * a GENERIC error code such that it won't ever conflict with GHCB-defined
+ * errors if any get added in the future.
+ */
 #define SNP_GUEST_VMM_ERR_INVALID_LEN	1
 #define SNP_GUEST_VMM_ERR_BUSY		2
+#define SNP_GUEST_VMM_ERR_GENERIC	BIT(31)
 
 #endif /* __UAPI_LINUX_SEV_GUEST_H_ */
-- 
2.25.1



* [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (17 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 18/20] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-13 23:48   ` Sean Christopherson
  2024-05-01  8:52 ` [PATCH v15 20/20] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

Version 2 of GHCB specification added support for the SNP Extended Guest
Request Message NAE event. This event serves a nearly identical purpose
to the previously-added SNP_GUEST_REQUEST event, but allows for
additional certificate data to be supplied via a separate
guest-supplied buffer, used mainly for verifying the signature of
an attestation report as returned by firmware.

This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
the firmware request is made after the certificate data has been fetched
from userspace.

Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB or VLEK that firmware will use when signing attestation
reports, a hook is also provided so that userspace can be informed once
the attestation request is actually completed. See the updates to
Documentation/ for more details on these aspects.
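
As a rough sketch of the expected userspace flow (hypothetical VMM code;
gpa_to_hva(), load_certs_locked(), and unlock_certs() are stand-ins for
VMM-specific logic, not interfaces added by this series):

	/* gpa_to_hva()/load_certs_locked()/unlock_certs() are VMM stand-ins */
	struct kvm_run *run = vcpu_run;	/* mmap'd vcpu run structure */

	if (run->exit_reason == KVM_EXIT_VMGEXIT &&
	    run->vmgexit.type == KVM_USER_VMGEXIT_REQ_CERTS) {
		if (run->vmgexit.req_certs.status ==
		    KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE) {
			/* Request hit firmware; safe to drop the cert lock. */
			unlock_certs();
		} else {
			__u64 npages = run->vmgexit.req_certs.data_npages;
			void *buf = gpa_to_hva(run->vmgexit.req_certs.data_gpa);

			if (load_certs_locked(buf, &npages)) {
				/* Blob copied in; ask KVM for a DONE callback. */
				run->vmgexit.req_certs.ret = 0;
				run->vmgexit.req_certs.flags |=
					KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE;
			} else {
				/* Too small: report how many pages are needed. */
				run->vmgexit.req_certs.data_npages = npages;
				run->vmgexit.req_certs.ret =
					KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN;
				unlock_certs();
			}
		}
	}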

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h         |  3 ++
 include/uapi/linux/kvm.h       | 23 +++++++++
 4 files changed, 199 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..f3780ac98d56 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit {
+  #define KVM_USER_VMGEXIT_REQ_CERTS		1
+			__u32 type; /* KVM_USER_VMGEXIT_* type */
+			union {
+				struct {
+					__u64 data_gpa;
+					__u64 data_npages;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+					__u32 ret;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
+					__u8 flags;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
+  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
+					__u8 status;
+				} req_certs;
+			};
+		};
+
+
+If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. These are generally handled by the host kernel, but in some
+cases some aspects of handling a VMGEXIT are done in userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+KVM_USER_VMGEXIT_REQ_CERTS
+--------------------------
+
+When an SEV-SNP guest issues a guest request for an attestation report, it has
+the option of issuing it in the form of an *extended* guest request, in which
+case a certificate blob is returned alongside the attestation report so the guest
+can validate the endorsement key used by SNP firmware to sign the report.
+These certificates are managed by userspace and are requested via
+KVM_EXIT_VMGEXITs using the KVM_USER_VMGEXIT_REQ_CERTS type.
+
+For the KVM_USER_VMGEXIT_REQ_CERTS type, the req_certs union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issuing extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+  - If the supplied number of pages is sufficient, userspace should write
+    the certificate data blob (in the format defined by the GHCB spec) in
+    the address indicated by 'data_gpa' and set 'ret' to 0.
+
+  - If the number of pages supplied is not sufficient, userspace must write
+    the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+  - If userspace is temporarily unable to handle the request, 'ret' should
+    be set to 2 to inform the guest to retry later.
+
+  - If some other error occurred, userspace should set 'ret' to a non-zero
+    value that is distinct from the specific return values mentioned above.
+
+Generally some care needs to be taken to keep the returned certificate data in
+sync with the actual endorsement key in use by firmware at the time the
+attestation request is sent to SNP firmware. The recommended scheme to do
+this is for the VMM to obtain a shared or exclusive lock on the path the
+certificate blob file resides at before reading it and returning it to KVM,
+and that it continues to hold the lock until the attestation request is
+actually sent to firmware. To facilitate this, the VMM can set the
+KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag before returning the
+certificate blob, in which case another KVM_EXIT_VMGEXIT of type
+KVM_USER_VMGEXIT_REQ_CERTS will be sent to userspace with
+KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE being set in the status field to
+indicate the request is fully-completed and that any associated locks can be
+released.
+
+Tools/libraries that perform updates to SNP firmware TCB values or endorsement
+keys (e.g. firmware interfaces such as SNP_COMMIT, SNP_SET_CONFIG, or
+SNP_VLEK_LOAD, see Documentation/virt/coco/sev-guest.rst for more details) in
+such a way that the certificate blob needs to be updated, should similarly
+take an exclusive lock on the certificate blob for the duration of any updates
+to firmware or the certificate blob contents to ensure that VMMs using the
+above scheme will not return certificate blob data that is out of sync with
+firmware.
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5c6262f3232f..35f0bd91f92e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!sev_snp_guest(vcpu->kvm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_rax_is_valid(svm) ||
+		    !kvm_ghcb_rbx_is_valid(svm))
+			goto vmgexit_err;
+		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
 		goto vmgexit_err;
@@ -3988,6 +3993,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
 }
 
+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control;
+	struct kvm *kvm = vcpu->kvm;
+	sev_ret_code fw_err = 0;
+	int vmm_ret;
+
+	vmm_ret = vcpu->run->vmgexit.req_certs.ret;
+	if (vmm_ret) {
+		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+			vcpu->arch.regs[VCPU_REGS_RBX] =
+				vcpu->run->vmgexit.req_certs.data_npages;
+		goto out;
+	}
+
+	/*
+	 * The request was completed on the previous completion callback and
+	 * this completion is only for the STATUS_DONE userspace notification.
+	 */
+	if (vcpu->run->vmgexit.req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE)
+		goto out_resume;
+
+	control = &svm->vmcb->control;
+
+	if (__snp_handle_guest_req(kvm, control->exit_info_1,
+				   control->exit_info_2, &fw_err))
+		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+out:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+	if (vcpu->run->vmgexit.req_certs.flags & KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE) {
+		vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE;
+		vcpu->run->vmgexit.req_certs.flags = 0;
+		return 0; /* notify userspace of completion */
+	}
+
+out_resume:
+	return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long data_npages;
+	sev_ret_code fw_err = 0;
+	gpa_t data_gpa;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		goto abort_request;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+		goto abort_request;
+
+	/*
+	 * Grab the certificates from userspace so that can be bundled with
+	 * attestation/guest requests.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
+	vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
+	vcpu->run->vmgexit.req_certs.data_npages = data_npages;
+	vcpu->run->vmgexit.req_certs.flags = 0;
+	vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
+	vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+	return 0; /* forward request to userspace */
+
+abort_request:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+	return 1; /* resume guest */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4266,6 +4349,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
 		ret = 1;
 		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		ret = snp_begin_ext_guest_req(vcpu);
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e325ede0f463..83c562b4712a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -309,6 +309,9 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	/* Transaction ID associated with SNP config updates */
+	u64 snp_transaction_id;
 };
 
 struct svm_cpu_data {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..106367d87189 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,26 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_REQ_CERTS		1
+	__u32 type; /* KVM_USER_VMGEXIT_* type */
+	union {
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+#define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
+			__u8 flags;
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
+			__u8 status;
+		} req_certs;
+	};
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -178,6 +198,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_VMGEXIT          40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -433,6 +454,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1



* [PATCH v15 20/20] crypto: ccp: Add the SNP_VLEK_LOAD command
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (18 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST " Michael Roth
@ 2024-05-01  8:52 ` Michael Roth
  2024-05-07 18:04 ` [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
  2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
  21 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-01  8:52 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

When requesting an attestation report, a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).

For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface so to allow the loading of VLEK keys
into the system.

See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
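
As a minimal illustration, assuming a wrapped VLEK hashstick has already
been obtained from the KDS, the ioctl could be issued from userspace
roughly as follows (a sketch only, with most error handling elided; it
reuses the existing SEV_ISSUE_CMD interface from linux/psp-sev.h plus the
usual fcntl.h/sys/ioctl.h/stdio.h/stdint.h userspace headers):

	struct sev_user_data_snp_wrapped_vlek_hashstick wrapped;
	struct sev_user_data_snp_vlek_load load = {
		.len = sizeof(load),
		.vlek_wrapped_version = 0,
		.vlek_wrapped_address = (__u64)(uintptr_t)&wrapped,
	};
	struct sev_issue_cmd cmd = {
		.cmd = SNP_VLEK_LOAD,
		.data = (__u64)(uintptr_t)&load,
	};
	int fd = open("/dev/sev", O_RDWR);

	/* wrapped.data[] previously filled with the 432-byte KDS blob */
	if (fd < 0 || ioctl(fd, SEV_ISSUE_CMD, &cmd))
		fprintf(stderr, "SNP_VLEK_LOAD failed, fw error %u\n", cmd.error);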

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 19 ++++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 36 +++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h          | 27 ++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e1eaf6a830ce..de68d3a4b540 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -176,6 +176,25 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
 the firmware parameters affected by this command can be queried via
 SNP_PLATFORM_STATUS.
 
+2.7 SNP_VLEK_LOAD
+-----------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_snp_vlek_load
+:Returns (out): 0 on success, -negative on error
+
+When requesting an attestation report, a guest is able to specify whether
+it wants SNP firmware to sign the report using either a Versioned Chip
+Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
+Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
+Key Derivation Service (KDS) and derived from seeds allocated to
+enrolled cloud service providers.
+
+In the case of VLEK keys, the SNP_VLEK_LOAD SNP command is used to load
+them into the system after obtaining them from the KDS, and corresponds
+closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP
+spec.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 2102377f727b..97a7959406ee 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2027,6 +2027,39 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
 	return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
 }
 
+static int sev_ioctl_do_snp_vlek_load(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_snp_vlek_load input;
+	void *blob;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	if (copy_from_user(&input, u64_to_user_ptr(argp->data), sizeof(input)))
+		return -EFAULT;
+
+	if (input.len != sizeof(input) || input.vlek_wrapped_version != 0)
+		return -EINVAL;
+
+	blob = psp_copy_user_blob(input.vlek_wrapped_address,
+				  sizeof(struct sev_user_data_snp_wrapped_vlek_hashstick));
+	if (IS_ERR(blob))
+		return PTR_ERR(blob);
+
+	input.vlek_wrapped_address = __psp_pa(blob);
+
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_VLEK_LOAD, &input, &argp->error);
+
+	kfree(blob);
+
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -2087,6 +2120,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_SET_CONFIG:
 		ret = sev_ioctl_do_snp_set_config(&input, writable);
 		break;
+	case SNP_VLEK_LOAD:
+		ret = sev_ioctl_do_snp_vlek_load(&input, writable);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b7a2c2ee35b7..2289b7c76c59 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,7 @@ enum {
 	SNP_PLATFORM_STATUS,
 	SNP_COMMIT,
 	SNP_SET_CONFIG,
+	SNP_VLEK_LOAD,
 
 	SEV_MAX,
 };
@@ -214,6 +215,32 @@ struct sev_user_data_snp_config {
 	__u8 rsvd1[52];
 } __packed;
 
+/**
+ * struct sev_user_data_snp_vlek_load - SNP_VLEK_LOAD structure
+ *
+ * @len: length of the command buffer read by the PSP
+ * @vlek_wrapped_version: version of wrapped VLEK hashstick (Must be 0h)
+ * @rsvd: reserved
+ * @vlek_wrapped_address: address of a wrapped VLEK hashstick
+ *                        (struct sev_user_data_snp_wrapped_vlek_hashstick)
+ */
+struct sev_user_data_snp_vlek_load {
+	__u32 len;				/* In */
+	__u8 vlek_wrapped_version;		/* In */
+	__u8 rsvd[3];				/* In */
+	__u64 vlek_wrapped_address;		/* In */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_wrapped_vlek_hashstick - Wrapped VLEK data
+ *
+ * @data: Opaque data provided by AMD KDS (as described in SEV-SNP Firmware ABI
+ *        1.54, SNP_VLEK_LOAD)
+ */
+struct sev_user_data_snp_wrapped_vlek_hashstick {
+	__u8 data[432];				/* In */
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1



* Re: [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level
  2024-05-01  8:51 ` [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level Michael Roth
@ 2024-05-02 23:11   ` Isaku Yamahata
  2024-05-07 17:48   ` Paolo Bonzini
  1 sibling, 0 replies; 49+ messages in thread
From: Isaku Yamahata @ 2024-05-02 23:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, isaku.yamahata, isaku.yamahata, rick.p.edgecombe

On Wed, May 01, 2024 at 03:51:52AM -0500,
Michael Roth <michael.roth@amd.com> wrote:

...

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c6c5018376be..87265b73906a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1816,6 +1816,7 @@ struct kvm_x86_ops {
>  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>  	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> +	int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);

Explicit private prefix is nice.


>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 510eb1117012..0d556da052f6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4271,6 +4271,20 @@ static inline u8 kvm_max_level_for_order(int order)
>  	return PG_LEVEL_4K;
>  }
>  
> +static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> +					u8 max_level, int gmem_order)
> +{
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
> +
> +	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
> +
> +	return min(max_level,
> +		   static_call(kvm_x86_private_max_mapping_level)(kvm, pfn));

If we don't implement this hook, OPTIONAL_RET0 causes always PG_LEVEL_NONE.
Anyway when TDX implements the hook, we can remove OPTIONAL_RET0.

This hook works for TDX by "return PG_LEVEL_4K;".
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>

-- 
Isaku Yamahata <isaku.yamahata@intel.com>


* Re: [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level
  2024-05-01  8:51 ` [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level Michael Roth
  2024-05-02 23:11   ` Isaku Yamahata
@ 2024-05-07 17:48   ` Paolo Bonzini
  1 sibling, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-07 17:48 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On 5/1/24 10:51, Michael Roth wrote:
> In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> 2MB mapping in the guest's nested page table depends on whether or not
> any subpages within the range have already been initialized as private
> in the RMP table. The existing mixed-attribute tracking in KVM is
> insufficient here, for instance:
> 
> - gmem allocates 2MB page
> - guest issues PVALIDATE on 2MB page
> - guest later converts a subpage to shared
> - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
> - KVM MMU splits NPT mapping to 4K
> - guest later converts that shared page back to private
> 
> At this point there are no mixed attributes, and KVM would normally
> allow for 2MB NPT mappings again, but this is actually not allowed
> because the RMP table mappings are 4K and cannot be promoted on the
> hypervisor side, so the NPT mappings must still be limited to 4K to
> match this.
> 
> Add a hook to determine the max NPT mapping size in situations like
> this.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   arch/x86/include/asm/kvm-x86-ops.h |  1 +
>   arch/x86/include/asm/kvm_host.h    |  1 +
>   arch/x86/kvm/mmu/mmu.c             | 18 ++++++++++++++++--
>   3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index c81990937ab4..566d19b02483 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>   KVM_X86_OP_OPTIONAL(get_untagged_addr)
>   KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>   KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> +KVM_X86_OP_OPTIONAL_RET0(private_max_mapping_level)
>   KVM_X86_OP_OPTIONAL(gmem_invalidate)
>   
>   #undef KVM_X86_OP
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c6c5018376be..87265b73906a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1816,6 +1816,7 @@ struct kvm_x86_ops {
>   	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>   	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>   	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> +	int (*private_max_mapping_level)(struct kvm *kvm, kvm_pfn_t pfn);
>   };
>   
>   struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 510eb1117012..0d556da052f6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4271,6 +4271,20 @@ static inline u8 kvm_max_level_for_order(int order)
>   	return PG_LEVEL_4K;
>   }
>   
> +static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
> +					u8 max_level, int gmem_order)
> +{
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
> +
> +	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> +	if (max_level == PG_LEVEL_4K)
> +		return PG_LEVEL_4K;
> +
> +	return min(max_level,
> +		   static_call(kvm_x86_private_max_mapping_level)(kvm, pfn));
> +}

Since you're returning 0 both as a default and, later in the series, for non-SNP guests, you need to treat 0 as "don't care":

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index de35dee25bf6..62ad38b2a8c9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4274,6 +4274,8 @@ static inline u8 kvm_max_level_for_order(int order)
  static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
  					u8 max_level, int gmem_order)
  {
+	u8 req_max_level;
+
  	if (max_level == PG_LEVEL_4K)
  		return PG_LEVEL_4K;
  
@@ -4281,8 +4283,11 @@ static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
  	if (max_level == PG_LEVEL_4K)
  		return PG_LEVEL_4K;
  
-	return min(max_level,
-		   static_call(kvm_x86_private_max_mapping_level)(kvm, pfn));
+	req_max_level = static_call(kvm_x86_private_max_mapping_level)(kvm, pfn);
+	if (req_max_level)
+		max_level = min(max_level, req_max_level);
+
+	return max_level;
  }
  
  static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,


Not beautiful but it does the job I guess.
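
For reference, here is the full helper as it would read with the diff
applied (reconstructed from the hunks above, not a separate
implementation):

	static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn,
						u8 max_level, int gmem_order)
	{
		u8 req_max_level;

		if (max_level == PG_LEVEL_4K)
			return PG_LEVEL_4K;

		max_level = min(kvm_max_level_for_order(gmem_order), max_level);
		if (max_level == PG_LEVEL_4K)
			return PG_LEVEL_4K;

		/* 0 from the vendor hook now means "no additional limit". */
		req_max_level = static_call(kvm_x86_private_max_mapping_level)(kvm, pfn);
		if (req_max_level)
			max_level = min(max_level, req_max_level);

		return max_level;
	}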

Paolo


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (19 preceding siblings ...)
  2024-05-01  8:52 ` [PATCH v15 20/20] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
@ 2024-05-07 18:04 ` Paolo Bonzini
  2024-05-07 18:14   ` Michael Roth
  2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
  21 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-07 18:04 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Wed, May 1, 2024 at 11:03 AM Michael Roth <michael.roth@amd.com> wrote:
>
> This patchset is also available at:
>
>   https://github.com/amdese/linux/commits/snp-host-v15
>
> and is based on top of the series:
>
>   "Add SEV-ES hypervisor support for GHCB protocol version 2"
>   https://lore.kernel.org/kvm/20240501071048.2208265-1-michael.roth@amd.com/
>   https://github.com/amdese/linux/commits/sev-init2-ghcb-v1
>
> which in turn is based on commit 20cc50a0410f (just before v14 SNP patches):
>
>   https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue

I have mostly reviewed this, with the exception of the
snp_begin/complete_psc parts.

I am not sure about removing all the pr_debug() - I am sure it will be
a bit more painful for userspace developers to figure out what exactly
has gone wrong in some cases. But we can add them later, if needed -
I'm certainly not going to make a fuss about it.

Paolo


> Patch Layout
> ------------
>
> 01-02: These patches revert+replace the existing .gmem_validate_fault hook
>        with a similar .private_max_mapping_level as suggested by Sean[1]
>
> 03-04: These patches add some basic infrastructure and introduces a new
>        KVM_X86_SNP_VM vm_type to handle differences verses the existing
>        KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
>
> 05-07: These implement the KVM API to handle the creation of a
>        cryptographic launch context, encrypt/measure the initial image
>        into guest memory, and finalize it before launching it.
>
> 08-12: These implement handling for various guest-generated events such
>        as page state changes, onlining of additional vCPUs, etc.
>
> 13-16: These implement the gmem/mmu hooks needed to prepare gmem-allocated
>        pages before mapping them into guest private memory ranges as
>        well as cleaning them up prior to returning them to the host for
>        use as normal memory. Because this supplants certain activities
>        like issued WBINVDs during KVM MMU invalidations, there's also
>        a patch to avoid duplicating that work to avoid unecessary
>        overhead.
>
> 17:    With all the core support in place, the patch adds a kvm_amd module
>        parameter to enable SNP support.
>
> 18-20: These patches all deal with the servicing of guest requests to handle
>        things like attestation, as well as some related host-management
>        interfaces.
>
> [1] https://lore.kernel.org/kvm/ZimnngU7hn7sKoSc@google.com/#t
>
>
> Testing
> -------
>
> For testing this via QEMU, use the following tree:
>
>   https://github.com/amdese/qemu/commits/snp-v4-wip3c
>
> A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
> ranges that are mapped as private. It is recommended you build the AmdSevX64
> variant as it provides the kernel-hashing support present in this series:
>
>   https://github.com/amdese/ovmf/commits/apic-mmio-fix1d
>
> A basic command-line invocation for SNP would be:
>
>  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
>   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
>   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
>   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
>
> With kernel-hashing and certificate data supplied:
>
>  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
>   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
>   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
>   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
>   -kernel /boot/vmlinuz-$ver
>   -initrd /boot/initrd.img-$ver
>   -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"
>
> With standard X64 OVMF package with separate image for persistent NVRAM:
>
>  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
>   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
>   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
>   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
>   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd
>   -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off
>
>
> Known issues / TODOs
> --------------------
>
>  * Base tree in some cases reports "Unpatched return thunk in use. This should
>    not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
>    regression upstream and unrelated to this series:
>
>      https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/
>
>  * 2MB hugepage support has been dropped pending discussion on how we plan to
>    re-enable it in gmem.
>
>  * Host kexec should work, but there is a known issue with host kdump support
>    while SNP guests are running that will be addressed as a follow-up.
>
>  * SNP kselftests are currently a WIP and will be included as part of SNP
>    upstreaming efforts in the near-term.
>
>
> SEV-SNP Overview
> ----------------
>
> This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> changes required to add KVM support for SEV-SNP. This series builds upon
> SEV-SNP guest support, which is now in mainline, and SEV-SNP host
> initialization support, which is now in linux-next.
>
> While this series provides the basic building blocks to support booting
> SEV-SNP VMs, it does not cover all of the security enhancements introduced by
> SEV-SNP, such as interrupt protection, which will be added in the future.
>
> With SNP, when pages are marked as guest-owned in the RMP table, they are
> assigned to a specific guest/ASID, as well as a specific GFN within the
> guest. Any attempt to map such a page in the RMP table to a different guest/ASID,
> or a different GFN within a guest/ASID, will result in an RMP nested page
> fault.
>
> Prior to accessing a guest-owned page, the guest must validate it with a
> special PVALIDATE instruction which will set a special bit in the RMP table
> for the guest. This is the only way to set the validated bit outside of the
> initial pre-encrypted guest payload/image; any attempts outside the guest to
> modify the RMP entry from that point forward will result in the validated
> bit being cleared, at which point the guest will trigger an exception if it
> attempts to access that page so it can be made aware of possible tampering.
>
> One exception to this is the initial guest payload, which is pre-validated
> by the firmware prior to launching. The guest can use Guest Message requests
> to fetch an attestation report which will include the measurement of the
> initial image so that the guest can verify it was booted with the expected
> image/environment.
>
> After boot, guests can use Page State Change requests to switch pages
> between shared/hypervisor-owned and private/guest-owned to share data for
> things like DMA, virtio buffers, and other GHCB requests.
>
> In this implementation of SEV-SNP, private guest memory is managed by a new
> kernel framework called guest_memfd (gmem). With gmem, a new
> KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
> MMU whether a particular GFN should be backed by shared (normal) memory or
> private (gmem-allocated) memory. To tie into this, Page State Change
> requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
> then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
> private/shared state in the KVM MMU.
>
> The gmem / KVM MMU hooks implemented in this series will then update the RMP
> table entries for the backing PFNs to set them to guest-owned/private when
> mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
> handling in the case of shared pages where the corresponding RMP table
> entries are left in the default shared/hypervisor-owned state.
>
> Feedback/review is very much appreciated!
>
> -Mike
>
>
> Changes since v14:
>
>  * switch to vendor-agnostic KVM_HC_MAP_GPA_RANGE exit for forwarding
>    page-state change requests to userspace instead of an SNP-specific exit
>    (Sean)
>  * drop SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION interfaces, instead
>    add handling in KVM_EXIT_VMGEXIT so that VMMs can implement their own
>    mechanisms for keeping userspace-supplied certificates in-sync with
>    firmware's TCB/endorsement key (Sean)
>  * carve out SEV-ES-specific handling for GHCB protocol 2, add control of
>    the protocol version, and post as a separate prereq patchset (Sean)
>  * use more consistent error-handling in snp_launch_{start,update,finish},
>    simplify logic based on review comments (Sean)
>  * rename .gmem_validate_fault to .private_max_mapping_level and rework
>    logic based on review suggestions (Sean)
>  * reduce number of pr_debug()'s in series, avoid multiple WARN's in
>    succession (Sean)
>  * improve documentation and comments throughout
>
> Changes since v13:
>
>  * rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
>  * handle setting kvm->arch.has_private_mem in same location as
>    kvm->arch.has_protected_state (Paolo)
>  * add flags and additional padding fields to
>    snp_launch{start,update,finish} APIs to address alignment and
>    expandability (Paolo)
>  * update snp_launch_update() to update input struct values to reflect
>    current progress of the command in situations where multiple calls are
>    needed (Paolo)
>  * update snp_launch_update() to avoid copying/accessing 'src' parameter
>    when dealing with zero pages. (Paolo)
>  * update snp_launch_update() to use u64 as length input parameter instead
>    of u32 and adjust padding accordingly
>  * modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
>    bit order of corresponding flags
>  * let firmware handle enforcement of policy bits corresponding to
>    user-specified minimum API version
>  * add missing "0x" prefixs in pr_debug()'s for snp_launch_start()
>  * fix handling of VMSAs during in-place migration (Paolo)
>
> Changes since v12:
>
>  * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
>  * add more input validation for SNP_LAUNCH_START, especially for handling
>    things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
>  * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
>  * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
>    userspace deal with the ordering just like with SEV-ES (Paolo)
>  * fix up docs for SNP_LAUNCH_FINISH (Paolo)
>  * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
>    handling for guest-mapped vs non-guest-mapped VMSAs, rename
>    'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
>  * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
>    as it is no longer needed due to above VMSA rework
>  * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
>    event
>  * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
>    switch to WARN_ON*()'s to indicate remaining error cases are not expected
>    and should not be seen in practice. (Paolo)
>  * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
>    cleaning up large guest memory ranges.
>  * rename VLEK_REQUIRED to VCEK_DISABLE. It'd be more applicable if another
>    key type ever gets added.
>  * don't allow attestation to be paused while an attestation request is
>    being processed by firmware (Tom)
>  * add missing Documentation entry for SNP_VLEK_LOAD
>  * collect Reviewed-by's from Paolo and Tom
>
>
> ----------------------------------------------------------------
> Ashish Kalra (1):
>       KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
>
> Brijesh Singh (8):
>       KVM: SEV: Add initial SEV-SNP support
>       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
>       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
>       KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
>       KVM: SEV: Add support to handle RMP nested page faults
>       KVM: SVM: Add module parameter to enable SEV-SNP
>       KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
>
> Michael Roth (10):
>       Revert "KVM: x86: Add gmem hook for determining max NPT mapping level"
>       KVM: x86: Add hook for determining max NPT mapping level
>       KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
>       KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
>       KVM: SEV: Add support to handle Page State Change VMGEXIT
>       KVM: SEV: Implement gmem hook for initializing private pages
>       KVM: SEV: Implement gmem hook for invalidating private pages
>       KVM: x86: Implement hook for determining max NPT mapping level
>       KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
>       crypto: ccp: Add the SNP_VLEK_LOAD command
>
> Tom Lendacky (1):
>       KVM: SEV: Support SEV-SNP AP Creation NAE event
>
>  Documentation/virt/coco/sev-guest.rst              |   19 +
>  Documentation/virt/kvm/api.rst                     |   87 ++
>  .../virt/kvm/x86/amd-memory-encryption.rst         |  110 +-
>  arch/x86/include/asm/kvm-x86-ops.h                 |    2 +-
>  arch/x86/include/asm/kvm_host.h                    |    5 +-
>  arch/x86/include/asm/sev-common.h                  |   25 +
>  arch/x86/include/asm/sev.h                         |    3 +
>  arch/x86/include/asm/svm.h                         |    9 +-
>  arch/x86/include/uapi/asm/kvm.h                    |   48 +
>  arch/x86/kvm/Kconfig                               |    3 +
>  arch/x86/kvm/mmu.h                                 |    2 -
>  arch/x86/kvm/mmu/mmu.c                             |   27 +-
>  arch/x86/kvm/svm/sev.c                             | 1538 +++++++++++++++++++-
>  arch/x86/kvm/svm/svm.c                             |   44 +-
>  arch/x86/kvm/svm/svm.h                             |   52 +
>  arch/x86/kvm/trace.h                               |   31 +
>  arch/x86/kvm/x86.c                                 |   17 +
>  drivers/crypto/ccp/sev-dev.c                       |   36 +
>  include/linux/psp-sev.h                            |    4 +-
>  include/uapi/linux/kvm.h                           |   23 +
>  include/uapi/linux/psp-sev.h                       |   27 +
>  include/uapi/linux/sev-guest.h                     |    9 +
>  virt/kvm/guest_memfd.c                             |    4 +-
>  23 files changed, 2081 insertions(+), 44 deletions(-)
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2024-05-07 18:04 ` [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
@ 2024-05-07 18:14   ` Michael Roth
  2024-05-10  2:34     ` Michael Roth
  0 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-07 18:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Tue, May 07, 2024 at 08:04:50PM +0200, Paolo Bonzini wrote:
> On Wed, May 1, 2024 at 11:03 AM Michael Roth <michael.roth@amd.com> wrote:
> >
> > This patchset is also available at:
> >
> >   https://github.com/amdese/linux/commits/snp-host-v15
> >
> > and is based on top of the series:
> >
> >   "Add SEV-ES hypervisor support for GHCB protocol version 2"
> >   https://lore.kernel.org/kvm/20240501071048.2208265-1-michael.roth@amd.com/
> >   https://github.com/amdese/linux/commits/sev-init2-ghcb-v1
> >
> > which in turn is based on commit 20cc50a0410f (just before v14 SNP patches):
> >
> >   https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
> 
> I have mostly reviewed this, with the exception of the
> snp_begin/complete_psc parts.

Thanks Paolo. We actually recently uncovered some issues with
snp_begin/complete_psc using some internal kvm-unit-tests that exercise
some edge cases, so I would hold off on reviewing that. Will send a
fix-up patch today after a bit more testing.

> 
> I am not sure about removing all the pr_debug() - I am sure it will be
> a bit more painful for userspace developers to figure out what exactly
> has gone wrong in some cases. But we can add them later, if needed -
> I'm certainly not going to make a fuss about it.

Yah, they do tend to be useful for that purpose. I think if we do add
them back we can consolidate the information a little better versus what
I had previously.
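
As a purely illustrative sketch (the variables and format string here
are hypothetical, not taken from the dropped patches), a consolidated
version could report a whole PSC operation in a single line rather than
one message per step:

	pr_debug("%s: PSC gfn 0x%llx npages 0x%llx op %u level %u rc %d\n",
		 __func__, gfn, npages, op, level, rc);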

-Mike

> 
> Paolo
> 
> 
> > Patch Layout
> > ------------
> >
> > 01-02: These patches revert+replace the existing .gmem_validate_fault hook
> >        with a similar .private_max_mapping_level as suggested by Sean[1]
> >
> > 03-04: These patches add some basic infrastructure and introduces a new
> >        KVM_X86_SNP_VM vm_type to handle differences verses the existing
> >        KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
> >
> > 05-07: These implement the KVM API to handle the creation of a
> >        cryptographic launch context, encrypt/measure the initial image
> >        into guest memory, and finalize it before launching it.
> >
> > 08-12: These implement handling for various guest-generated events such
> >        as page state changes, onlining of additional vCPUs, etc.
> >
> > 13-16: These implement the gmem/mmu hooks needed to prepare gmem-allocated
> >        pages before mapping them into guest private memory ranges as
> >        well as cleaning them up prior to returning them to the host for
> >        use as normal memory. Because this supplants certain activities
> >        like issued WBINVDs during KVM MMU invalidations, there's also
> >        a patch to avoid duplicating that work to avoid unecessary
> >        overhead.
> >
> > 17:    With all the core support in place, the patch adds a kvm_amd module
> >        parameter to enable SNP support.
> >
> > 18-20: These patches all deal with the servicing of guest requests to handle
> >        things like attestation, as well as some related host-management
> >        interfaces.
> >
> > [1] https://lore.kernel.org/kvm/ZimnngU7hn7sKoSc@google.com/#t
> >
> >
> > Testing
> > -------
> >
> > For testing this via QEMU, use the following tree:
> >
> >   https://github.com/amdese/qemu/commits/snp-v4-wip3c
> >
> > A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
> > ranges that are mapped as private. It is recommended you build the AmdSevX64
> > variant as it provides the kernel-hashing support present in this series:
> >
> >   https://github.com/amdese/ovmf/commits/apic-mmio-fix1d
> >
> > A basic command-line invocation for SNP would be:
> >
> >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> >
> > With kernel-hashing and certificate data supplied:
> >
> >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
> >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> >   -kernel /boot/vmlinuz-$ver
> >   -initrd /boot/initrd.img-$ver
> >   -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"
> >
> > With standard X64 OVMF package with separate image for persistent NVRAM:
> >
> >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd
> >   -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off
> >
> >
> > Known issues / TODOs
> > --------------------
> >
> >  * Base tree in some cases reports "Unpatched return thunk in use. This should
> >    not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
> >    regression upstream and unrelated to this series:
> >
> >      https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/
> >
> >  * 2MB hugepage support has been dropped pending discussion on how we plan to
> >    re-enable it in gmem.
> >
> >  * Host kexec should work, but there is a known issue with host kdump support
> >    while SNP guests are running that will be addressed as a follow-up.
> >
> >  * SNP kselftests are currently a WIP and will be included as part of SNP
> >    upstreaming efforts in the near-term.
> >
> >
> > SEV-SNP Overview
> > ----------------
> >
> > This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> > changes required to add KVM support for SEV-SNP. This series builds upon
> > SEV-SNP guest support, which is now in mainline, and SEV-SNP host
> > initialization support, which is now in linux-next.
> >
> > While this series provides the basic building blocks to support booting
> > SEV-SNP VMs, it does not cover all of the security enhancements introduced by
> > SEV-SNP, such as interrupt protection, which will be added in the future.
> >
> > With SNP, when pages are marked as guest-owned in the RMP table, they are
> > assigned to a specific guest/ASID, as well as a specific GFN within the
> > guest. Any attempt to map such a page in the RMP table to a different guest/ASID,
> > or a different GFN within a guest/ASID, will result in an RMP nested page
> > fault.
> >
> > Prior to accessing a guest-owned page, the guest must validate it with a
> > special PVALIDATE instruction which will set a special bit in the RMP table
> > for the guest. This is the only way to set the validated bit outside of the
> > initial pre-encrypted guest payload/image; any attempts outside the guest to
> > modify the RMP entry from that point forward will result in the validated
> > bit being cleared, at which point the guest will trigger an exception if it
> > attempts to access that page so it can be made aware of possible tampering.
> >
> > One exception to this is the initial guest payload, which is pre-validated
> > by the firmware prior to launching. The guest can use Guest Message requests
> > to fetch an attestation report which will include the measurement of the
> > initial image so that the guest can verify it was booted with the expected
> > image/environment.
> >
> > After boot, guests can use Page State Change requests to switch pages
> > between shared/hypervisor-owned and private/guest-owned to share data for
> > things like DMA, virtio buffers, and other GHCB requests.
> >
> > In this implementation of SEV-SNP, private guest memory is managed by a new
> > kernel framework called guest_memfd (gmem). With gmem, a new
> > KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
> > MMU whether a particular GFN should be backed by shared (normal) memory or
> > private (gmem-allocated) memory. To tie into this, Page State Change
> > requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
> > then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
> > private/shared state in the KVM MMU.
> >
> > The gmem / KVM MMU hooks implemented in this series will then update the RMP
> > table entries for the backing PFNs to set them to guest-owned/private when
> > mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
> > handling in the case of shared pages where the corresponding RMP table
> > entries are left in the default shared/hypervisor-owned state.
> >
> > Feedback/review is very much appreciated!
> >
> > -Mike
> >
> >
> > Changes since v14:
> >
> >  * switch to vendor-agnostic KVM_HC_MAP_GPA_RANGE exit for forwarding
> >    page-state change requests to userspace instead of an SNP-specific exit
> >    (Sean)
> >  * drop SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION interfaces, instead
> >    add handling in KVM_EXIT_VMGEXIT so that VMMs can implement their own
> >    mechanisms for keeping userspace-supplied certificates in-sync with
> >    firmware's TCB/endorsement key (Sean)
> >  * carve out SEV-ES-specific handling for GHCB protocol 2, add control of
> >    the protocol version, and post as a separate prereq patchset (Sean)
> >  * use more consistent error-handling in snp_launch_{start,update,finish},
> >    simplify logic based on review comments (Sean)
> >  * rename .gmem_validate_fault to .private_max_mapping_level and rework
> >    logic based on review suggestions (Sean)
> >  * reduce number of pr_debug()'s in series, avoid multiple WARN's in
> >    succession (Sean)
> >  * improve documentation and comments throughout
> >
> > Changes since v13:
> >
> >  * rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
> >  * handle setting kvm->arch.has_private_mem in same location as
> >    kvm->arch.has_protected_state (Paolo)
> >  * add flags and additional padding fields to
> >    snp_launch{start,update,finish} APIs to address alignment and
> >    expandability (Paolo)
> >  * update snp_launch_update() to update input struct values to reflect
> >    current progress of the command in situations where multiple calls are
> >    needed (Paolo)
> >  * update snp_launch_update() to avoid copying/accessing 'src' parameter
> >    when dealing with zero pages. (Paolo)
> >  * update snp_launch_update() to use u64 as length input parameter instead
> >    of u32 and adjust padding accordingly
> >  * modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
> >    bit order of corresponding flags
> >  * let firmware handle enforcement of policy bits corresponding to
> >    user-specified minimum API version
> >  * add missing "0x" prefixs in pr_debug()'s for snp_launch_start()
> >  * fix handling of VMSAs during in-place migration (Paolo)
> >
> > Changes since v12:
> >
> >  * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
> >  * add more input validation for SNP_LAUNCH_START, especially for handling
> >    things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
> >  * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
> >  * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
> >    userspace deal with the ordering just like with SEV-ES (Paolo)
> >  * fix up docs for SNP_LAUNCH_FINISH (Paolo)
> >  * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
> >    handling for guest-mapped vs non-guest-mapped VMSAs, rename
> >    'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
> >  * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
> >    as it is no longer needed due to above VMSA rework
> >  * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
> >    event
> >  * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
> >    switch to WARN_ON*()'s to indicate remaining error cases are not expected
> >    and should not be seen in practice. (Paolo)
> >  * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
> >    cleaning up large guest memory ranges.
> >  * rename VLEK_REQUIRED to VCEK_DISABLE. It'd be more applicable if another
> >    key type ever gets added.
> >  * don't allow attestation to be paused while an attestation request is
> >    being processed by firmware (Tom)
> >  * add missing Documentation entry for SNP_VLEK_LOAD
> >  * collect Reviewed-by's from Paolo and Tom
> >
> >
> > ----------------------------------------------------------------
> > Ashish Kalra (1):
> >       KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
> >
> > Brijesh Singh (8):
> >       KVM: SEV: Add initial SEV-SNP support
> >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
> >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
> >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
> >       KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
> >       KVM: SEV: Add support to handle RMP nested page faults
> >       KVM: SVM: Add module parameter to enable SEV-SNP
> >       KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
> >
> > Michael Roth (10):
> >       Revert "KVM: x86: Add gmem hook for determining max NPT mapping level"
> >       KVM: x86: Add hook for determining max NPT mapping level
> >       KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
> >       KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
> >       KVM: SEV: Add support to handle Page State Change VMGEXIT
> >       KVM: SEV: Implement gmem hook for initializing private pages
> >       KVM: SEV: Implement gmem hook for invalidating private pages
> >       KVM: x86: Implement hook for determining max NPT mapping level
> >       KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
> >       crypto: ccp: Add the SNP_VLEK_LOAD command
> >
> > Tom Lendacky (1):
> >       KVM: SEV: Support SEV-SNP AP Creation NAE event
> >
> >  Documentation/virt/coco/sev-guest.rst              |   19 +
> >  Documentation/virt/kvm/api.rst                     |   87 ++
> >  .../virt/kvm/x86/amd-memory-encryption.rst         |  110 +-
> >  arch/x86/include/asm/kvm-x86-ops.h                 |    2 +-
> >  arch/x86/include/asm/kvm_host.h                    |    5 +-
> >  arch/x86/include/asm/sev-common.h                  |   25 +
> >  arch/x86/include/asm/sev.h                         |    3 +
> >  arch/x86/include/asm/svm.h                         |    9 +-
> >  arch/x86/include/uapi/asm/kvm.h                    |   48 +
> >  arch/x86/kvm/Kconfig                               |    3 +
> >  arch/x86/kvm/mmu.h                                 |    2 -
> >  arch/x86/kvm/mmu/mmu.c                             |   27 +-
> >  arch/x86/kvm/svm/sev.c                             | 1538 +++++++++++++++++++-
> >  arch/x86/kvm/svm/svm.c                             |   44 +-
> >  arch/x86/kvm/svm/svm.h                             |   52 +
> >  arch/x86/kvm/trace.h                               |   31 +
> >  arch/x86/kvm/x86.c                                 |   17 +
> >  drivers/crypto/ccp/sev-dev.c                       |   36 +
> >  include/linux/psp-sev.h                            |    4 +-
> >  include/uapi/linux/kvm.h                           |   23 +
> >  include/uapi/linux/psp-sev.h                       |   27 +
> >  include/uapi/linux/sev-guest.h                     |    9 +
> >  virt/kvm/guest_memfd.c                             |    4 +-
> >  23 files changed, 2081 insertions(+), 44 deletions(-)
> >
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (20 preceding siblings ...)
  2024-05-07 18:04 ` [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
@ 2024-05-10  1:58 ` Michael Roth
  2024-05-10  1:58   ` [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults Michael Roth
                     ` (2 more replies)
  21 siblings, 3 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-10  1:58 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri, Isaku Yamahata

For hardware-protected VMs like SEV-SNP guests, certain conditions like
attempting to perform a write to a page which is not in the state that
the guest expects it to be in can result in a nested/extended #PF which
can only be satisfied by the host performing an implicit page state
change to transition the page into the expected shared/private state.
This is generally handled by generating a KVM_EXIT_MEMORY_FAULT event
that gets forwarded to userspace to handle via
KVM_SET_MEMORY_ATTRIBUTES.

However, the fast_page_fault() code might misconstrue this situation as
being the result of a write-protected access, and treat it as a spurious
case when it sees that writes are already allowed for the sPTE. This
results in the KVM MMU trying to resume the guest rather than taking any
action to satisfy the real source of the #PF such as generating a
KVM_EXIT_MEMORY_FAULT, resulting in the guest spinning on nested #PFs.

For now, just skip the fast path for hardware-protected VMs since they
don't currently utilize any of this access-tracking machinery anyway. In
the future, these considerations will need to be taken into account if
there's any need/desire to re-enable the fast path for
hardware-protected VMs.

Since software-protected VMs don't have a notion of a shared vs. private
that's separate from what KVM is tracking, the above
KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the special
handling for that case for now.

Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 62ad38b2a8c9..cecd8360378f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3296,7 +3296,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 	return RET_PF_CONTINUE;
 }
 
-static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
+static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	/*
 	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
@@ -3307,6 +3307,32 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	if (fault->rsvd)
 		return false;
 
+	/*
+	 * For hardware-protected VMs, certain conditions like attempting to
+	 * perform a write to a page which is not in the state that the guest
+	 * expects it to be in can result in a nested/extended #PF. In this
+	 * case, the below code might misconstrue this situation as being the
+	 * result of a write-protected access, and treat it as a spurious case
+	 * rather than taking any action to satisfy the real source of the #PF
+	 * such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
+	 * guest spinning on a #PF indefinitely.
+	 *
+	 * For now, just skip the fast path for hardware-protected VMs since
+	 * they don't currently utilize any of this machinery anyway. In the
+	 * future, these considerations will need to be taken into account if
+	 * there's any need/desire to re-enable the fast path for
+	 * hardware-protected VMs.
+	 *
+	 * Since software-protected VMs don't have a notion of a shared vs.
+	 * private that's separate from what KVM is tracking, the above
+	 * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
+	 * special handling for that case for now.
+	 */
+	if (kvm_slot_can_be_private(fault->slot) &&
+	    !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+	      vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
+		return false;
+
 	/*
 	 * #PF can be fast if:
 	 *
@@ -3407,7 +3433,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	u64 *sptep;
 	uint retry_count = 0;
 
-	if (!page_fault_can_be_fast(fault))
+	if (!page_fault_can_be_fast(vcpu, fault))
 		return ret;
 
 	walk_shadow_page_lockless_begin(vcpu);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
@ 2024-05-10  1:58   ` Michael Roth
  2024-05-10 13:58     ` Sean Christopherson
  2024-05-10  1:58   ` [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops Michael Roth
  2024-05-10 13:47   ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Sean Christopherson
  2 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-10  1:58 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

The intended logic when handling #NPFs with the RMP bit set (31) is to
first check to see if the #NPF requires a shared<->private transition
and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
get forwarded on to userspace before proceeding with any handling of
other potential RMP fault conditions like needing to PSMASH the RMP
entry/etc (which will be done later if the guest still re-faults after
the KVM_EXIT_MEMORY_FAULT is processed by userspace).

The determination of whether any userspace handling of
KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
of kvm_mmu_page_fault(). However, the current code misinterprets the
return code, expecting 0 to indicate a userspace exit rather than less
than 0 (-EFAULT). This leads to the following unexpected behavior:

  - for KVM_EXIT_MEMORY_FAULTs resulting from implicit shared->private
    conversions, warnings get printed from sev_handle_rmp_fault()
    because it does not expect to be called for GPAs where
    KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard Linux guests don't
    generally do this, but it is allowed and should be handled
    similarly to private->shared conversions rather than triggering any
    sort of warnings

  - if gmem support for 2MB folios is enabled (via currently out-of-tree
    code), implicit shared<->private conversions will always result in
    a PSMASH being attempted, even if it's not actually needed to
    resolve the RMP fault. This doesn't cause any harm, but results in a
    needless PSMASH and zapping of the sPTE

Resolve these issues by calling sev_handle_rmp_fault() only when
kvm_mmu_page_fault()'s return code is greater than or equal to 0,
indicating a KVM_EXIT_MEMORY_FAULT/-EFAULT isn't needed. While here,
simplify the code slightly and fix up the associated comments for better
clarity.
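
For context, the kvm_mmu_page_fault() return-code convention being
relied on here is roughly:

	/*
	 * rc > 0  -> fault handled, resume the guest
	 * rc == 0 -> exit to userspace to complete handling
	 * rc < 0  -> error, e.g. -EFAULT paired with a
	 *            KVM_EXIT_MEMORY_FAULT userspace exit
	 */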

Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/svm.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 426ad49325d7..9431ce74c7d4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 				svm->vmcb->control.insn_len);
 
 	/*
-	 * rc == 0 indicates a userspace exit is needed to handle page
-	 * transitions, so do that first before updating the RMP table.
+	 * rc < 0 indicates a userspace exit may be needed to handle page
+	 * attribute updates, so deal with that first before handling other
+	 * potential RMP fault conditions.
 	 */
-	if (error_code & PFERR_GUEST_RMP_MASK) {
-		if (rc == 0)
-			return rc;
+	if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
 		sev_handle_rmp_fault(vcpu, fault_address, error_code);
-	}
 
 	return rc;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops
  2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
  2024-05-10  1:58   ` [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults Michael Roth
@ 2024-05-10  1:58   ` Michael Roth
  2024-05-10 17:09     ` Paolo Bonzini
  2024-05-10 13:47   ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Sean Christopherson
  2 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-10  1:58 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

There are a few edge-cases that the current processing for GHCB PSC
requests doesn't handle properly:

 - KVM properly ignores SMASH/UNSMASH ops when they are embedded in a
   PSC request buffer which contains other PSC operations, but
   inadvertently forwards them to userspace as private->shared PSC
   requests if they appear at the end of the buffer. Make sure these are
   ignored instead, just like cases where they are not at the end of the
   request buffer.

 - Current code handles non-zero 'cur_page' fields when they are at the
   beginning of a new GPA range, but does not handle them properly when
   iterating through subsequent entries which are otherwise part of a
   contiguous range. Fix up the handling so that these entries are not
   combined into a larger contiguous range that include unintended GPA
   ranges and are instead processed later as the start of a new
   contiguous range.

 - The page size variable used to track 2M entries in KVM for inflight PSCs
   might be artificially set to a different value, which can lead to
   unexpected values in the entry's final 'cur_page' update. Use the
   entry's 'pagesize' field instead to determine what the value of
   'cur_page' should be upon completion of processing.

While here, also add a small helper for clearing the in-flight PSC
variables and fix up comments for better readability.

Fixes: 266205d810d2 ("KVM: SEV: Add support to handle Page State Change VMGEXIT")
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 73 +++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 35f0bd91f92e..ab23329e2bd0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3555,43 +3555,50 @@ struct psc_buffer {
 
 static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
 
-static int snp_complete_psc(struct kvm_vcpu *vcpu)
+static void snp_reset_inflight_psc(struct vcpu_svm *svm)
+{
+	svm->sev_es.psc_idx = 0;
+	svm->sev_es.psc_inflight = 0;
+	svm->sev_es.psc_2m = false;
+}
+
+static void __snp_complete_psc(struct vcpu_svm *svm)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
 	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
 	struct psc_entry *entries = psc->entries;
 	struct psc_hdr *hdr = &psc->hdr;
-	__u64 psc_ret;
 	__u16 idx;
 
-	if (vcpu->run->hypercall.ret) {
-		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
-		goto out_resume;
-	}
-
 	/*
 	 * Everything in-flight has been processed successfully. Update the
-	 * corresponding entries in the guest's PSC buffer.
+	 * corresponding entries in the guest's PSC buffer and zero out the
+	 * count of in-flight PSC entries.
 	 */
 	for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
 	     svm->sev_es.psc_inflight--, idx++) {
 		struct psc_entry *entry = &entries[idx];
 
-		entry->cur_page = svm->sev_es.psc_2m ? 512 : 1;
+		entry->cur_page = entry->pagesize ? 512 : 1;
 	}
 
 	hdr->cur_entry = idx;
+}
 
-	/* Handle the next range (if any). */
-	return snp_begin_psc(svm, psc);
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
 
-out_resume:
-	svm->sev_es.psc_idx = 0;
-	svm->sev_es.psc_inflight = 0;
-	svm->sev_es.psc_2m = false;
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
+	if (vcpu->run->hypercall.ret) {
+		snp_reset_inflight_psc(svm);
+		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, VMGEXIT_PSC_ERROR_GENERIC);
+		return 1; /* resume guest */
+	}
 
-	return 1; /* resume guest */
+	__snp_complete_psc(svm);
+
+	/* Handle the next range (if any). */
+	return snp_begin_psc(svm, psc);
 }
 
 static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
@@ -3634,6 +3641,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 		goto out_resume;
 	}
 
+next_range:
 	/* Find the start of the next range which needs processing. */
 	for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
 		__u16 cur_page;
@@ -3642,11 +3650,6 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 
 		entry_start = entries[idx];
 
-		/* Only private/shared conversions are currently supported. */
-		if (entry_start.operation != VMGEXIT_PSC_OP_PRIVATE &&
-		    entry_start.operation != VMGEXIT_PSC_OP_SHARED)
-			continue;
-
 		gfn = entry_start.gfn;
 		cur_page = entry_start.cur_page;
 		huge = entry_start.pagesize;
@@ -3687,6 +3690,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 
 		if (entry.operation != entry_start.operation ||
 		    entry.gfn != entry_start.gfn + npages ||
+		    entry.cur_page != 0 ||
 		    !!entry.pagesize != svm->sev_es.psc_2m)
 			break;
 
@@ -3694,6 +3698,25 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 		npages += entry_start.pagesize ? 512 : 1;
 	}
 
+	/*
+	 * Only shared/private PSC operations are currently supported, so if the
+	 * entire range consists of unsupported operations (e.g. SMASH/UNSMASH),
+	 * then consider the entire range completed and avoid exiting to
+	 * userspace. In theory snp_complete_psc() can always be called directly
+	 * at this point to complete the current range and start the next one,
+	 * but that could lead to unexpected levels of recursion, so only do
+	 * that if there are no more entries to process and the entire request
+	 * has been completed.
+	 */
+	if (entry_start.operation != VMGEXIT_PSC_OP_PRIVATE &&
+	    entry_start.operation != VMGEXIT_PSC_OP_SHARED) {
+		if (idx > idx_end)
+			return snp_complete_psc(vcpu);
+
+		__snp_complete_psc(svm);
+		goto next_range;
+	}
+
 	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
 	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
 	vcpu->run->hypercall.args[0] = gpa;
@@ -3709,9 +3732,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
 	return 0; /* forward request to userspace */
 
 out_resume:
-	svm->sev_es.psc_idx = 0;
-	svm->sev_es.psc_inflight = 0;
-	svm->sev_es.psc_2m = false;
+	snp_reset_inflight_psc(svm);
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
 
 	return 1; /* resume guest */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2024-05-07 18:14   ` Michael Roth
@ 2024-05-10  2:34     ` Michael Roth
  0 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-10  2:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Tue, May 07, 2024 at 01:14:24PM -0500, Michael Roth wrote:
> On Tue, May 07, 2024 at 08:04:50PM +0200, Paolo Bonzini wrote:
> > On Wed, May 1, 2024 at 11:03 AM Michael Roth <michael.roth@amd.com> wrote:
> > >
> > > This patchset is also available at:
> > >
> > >   https://github.com/amdese/linux/commits/snp-host-v15
> > >
> > > and is based on top of the series:
> > >
> > >   "Add SEV-ES hypervisor support for GHCB protocol version 2"
> > >   https://lore.kernel.org/kvm/20240501071048.2208265-1-michael.roth@amd.com/
> > >   https://github.com/amdese/linux/commits/sev-init2-ghcb-v1
> > >
> > > which in turn is based on commit 20cc50a0410f (just before v14 SNP patches):
> > >
> > >   https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
> > 
> > I have mostly reviewed this, with the exception of the
> > snp_begin/complete_psc parts.
> 
> Thanks Paolo. We actually recently uncovered some issues with
> snp_begin/complete_psc using some internal kvm-unit-tests that exercise
> some edge cases, so I would hold off on reviewing that. Will send a
> fix-up patch today after a bit more testing.

In the process of adding some additional unit tests, we uncovered a
couple of other issues in addition to the fixups for PSC:

  [PATCH 21/20] KVM: MMU: Disable fast path for private memslots

    addresses an issue with fast_page_fault() handling that can lead to
    KVM_EXIT_MEMORY_FAULT cases being treated as spurious #NPFs which
    results in the guest spinning forever. This seems like it could be
    generally needed for both SNP/TDX, and would likely replace the need
    for this patch from the TDX series:
    
      KVM: x86/mmu: Disallow fast page fault on private GPA
      https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@intel.com/

    This is a standalone patch and not really a fixup for anything

  [PATCH 22/20] KVM: SEV: Fix return code interpretation for RMP nested page faults

    addresses an issue where the return code of kvm_mmu_page_fault() was
    being misinterpreted, leading to sev_handle_rmp_fault() being called
    unnecessarily in some cases. Interestingly, because
    sev_handle_rmp_fault() results in zapping sPTEs after PSMASH'ing them,
    this bug was hiding the issue addressed in the above PATCH 21 by
    forcing the fast path to get skipped. This can be squashed into:

      KVM: SEV: Add support to handle RMP nested page faults

  [PATCH 23/20] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops 

    fixes up the GHCB PSC handling code to address a number of situations
    that aren't triggered by normal SNP guests, but are allowed by the
    GHCB spec and could become issues with future/other guest
    implementations. This can be squashed into:

      KVM: SEV: Add support to handle Page State Change VMGEXIT

I've sent them all as a response to this series, but have them available
here applied on top of your current kvm/queue (commit 15889fca49df):

  https://github.com/mdroth/linux/commits/snp-host-v15c2-unsquashed
  (the patch at the top can be ignored, it's only for testing 2MB gmem
   backing pages)

I've also put together a branch with the patches already squashed in
(except for "KVM: MMU: Disable fast path for private memslots" which is
a standalone patch that is likely applicable to both TDX and SNP, so
I've simply moved it to the beginning of the SNP series)

  https://github.com/mdroth/linux/commits/snp-host-v15c2

Sorry for the late fixes. Let me know if you want me to submit any of
these by some other means.

-Mike


> 
> 
> -Mike
> 
> > 
> > Paolo
> > 
> > 
> > > Patch Layout
> > > ------------
> > >
> > > 01-02: These patches revert+replace the existing .gmem_validate_fault hook
> > >        with a similar .private_max_mapping_level as suggested by Sean[1]
> > >
> > > 03-04: These patches add some basic infrastructure and introduce a new
> > >        KVM_X86_SNP_VM vm_type to handle differences versus the existing
> > >        KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
> > >
> > > 05-07: These implement the KVM API to handle the creation of a
> > >        cryptographic launch context, encrypt/measure the initial image
> > >        into guest memory, and finalize it before launching it.
> > >
> > > 08-12: These implement handling for various guest-generated events such
> > >        as page state changes, onlining of additional vCPUs, etc.
> > >
> > > 13-16: These implement the gmem/mmu hooks needed to prepare gmem-allocated
> > >        pages before mapping them into guest private memory ranges as
> > >        well as cleaning them up prior to returning them to the host for
> > >        use as normal memory. Because this supplants certain activities
> > >        like issuing WBINVDs during KVM MMU invalidations, there's also
> > >        a patch that avoids duplicating that work and the unnecessary
> > >        overhead.
> > >
> > > 17:    With all the core support in place, the patch adds a kvm_amd module
> > >        parameter to enable SNP support.
> > >
> > > 18-20: These patches all deal with the servicing of guest requests to handle
> > >        things like attestation, as well as some related host-management
> > >        interfaces.
> > >
> > > [1] https://lore.kernel.org/kvm/ZimnngU7hn7sKoSc@google.com/#t
> > >
> > >
> > > Testing
> > > -------
> > >
> > > For testing this via QEMU, use the following tree:
> > >
> > >   https://github.com/amdese/qemu/commits/snp-v4-wip3c
> > >
> > > A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
> > > ranges that are mapped as private. It is recommended you build the AmdSevX64
> > > variant as it provides the kernel-hashing support present in this series:
> > >
> > >   https://github.com/amdese/ovmf/commits/apic-mmio-fix1d
> > >
> > > A basic command-line invocation for SNP would be:
> > >
> > >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> > >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> > >
> > > With kernel-hashing and certificate data supplied:
> > >
> > >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
> > >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
> > >   -kernel /boot/vmlinuz-$ver
> > >   -initrd /boot/initrd.img-$ver
> > >   -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"
> > >
> > > With standard X64 OVMF package with separate image for persistent NVRAM:
> > >
> > >  qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
> > >   -machine q35,confidential-guest-support=sev0,memory-backend=ram1
> > >   -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
> > >   -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
> > >   -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd
> > >   -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off
> > >
> > >
> > > Known issues / TODOs
> > > --------------------
> > >
> > >  * Base tree in some cases reports "Unpatched return thunk in use. This should
> > >    not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
> > >    regression upstream and unrelated to this series:
> > >
> > >      https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/
> > >
> > >  * 2MB hugepage support has been dropped pending discussion on how we plan to
> > >    re-enable it in gmem.
> > >
> > >  * Host kexec should work, but there is a known issue with host kdump support
> > >    while SNP guests are running that will be addressed as a follow-up.
> > >
> > >  * SNP kselftests are currently a WIP and will be included as part of SNP
> > >    upstreaming efforts in the near-term.
> > >
> > >
> > > SEV-SNP Overview
> > > ----------------
> > >
> > > This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> > > changes required to add KVM support for SEV-SNP. This series builds upon
> > > SEV-SNP guest support, which is now in mainline, and SEV-SNP host
> > > initialization support, which is now in linux-next.
> > >
> > > While this series provides the basic building blocks to support booting
> > > SEV-SNP VMs, it does not cover all the security enhancements introduced by
> > > SEV-SNP, such as interrupt protection, which will be added in the future.
> > >
> > > With SNP, when pages are marked as guest-owned in the RMP table, they are
> > > assigned to a specific guest/ASID, as well as a specific GFN within the
> > > guest. Any attempt to map the page in the RMP table to a different guest/ASID,
> > > or a different GFN within a guest/ASID, will result in an RMP nested page
> > > fault.
> > >
> > > Prior to accessing a guest-owned page, the guest must validate it with a
> > > special PVALIDATE instruction which will set a special bit in the RMP table
> > > for the guest. This is the only way to set the validated bit outside of the
> > > initial pre-encrypted guest payload/image; any attempts outside the guest to
> > > modify the RMP entry from that point forward will result in the validated
> > > bit being cleared, at which point the guest will trigger an exception if it
> > > attempts to access that page so it can be made aware of possible tampering.
> > >
> > > One exception to this is the initial guest payload, which is pre-validated
> > > by the firmware prior to launching. The guest can use Guest Message requests
> > > to fetch an attestation report which will include the measurement of the
> > > initial image so that the guest can verify it was booted with the expected
> > > image/environment.
> > >
> > > After boot, guests can use Page State Change requests to switch pages
> > > between shared/hypervisor-owned and private/guest-owned to share data for
> > > things like DMA, virtio buffers, and other GHCB requests.
> > >
> > > In this implementation of SEV-SNP, private guest memory is managed by a new
> > > kernel framework called guest_memfd (gmem). With gmem, a new
> > > KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
> > > MMU whether a particular GFN should be backed by shared (normal) memory or
> > > private (gmem-allocated) memory. To tie into this, Page State Change
> > > requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
> > > then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
> > > private/shared state in the KVM MMU.
> > >
> > > The gmem / KVM MMU hooks implemented in this series will then update the RMP
> > > table entries for the backing PFNs to set them to guest-owned/private when
> > > mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
> > > handling in the case of shared pages where the corresponding RMP table
> > > entries are left in the default shared/hypervisor-owned state.
> > >
> > > Feedback/review is very much appreciated!
> > >
> > > -Mike
> > >
> > >
> > > Changes since v14:
> > >
> > >  * switch to vendor-agnostic KVM_HC_MAP_GPA_RANGE exit for forwarding
> > >    page-state change requests to userspace instead of an SNP-specific exit
> > >    (Sean)
> > >  * drop SNP_PAUSE_ATTESTATION/SNP_RESUME_ATTESTATION interfaces, instead
> > >    add handling in KVM_EXIT_VMGEXIT so that VMMs can implement their own
> > >    mechanisms for keeping userspace-supplied certificates in-sync with
> > >    firmware's TCB/endorsement key (Sean)
> > >  * carve out SEV-ES-specific handling for GHCB protocol 2, add control of
> > >    the protocol version, and post as a separate prereq patchset (Sean)
> > >  * use more consistent error-handling in snp_launch_{start,update,finish},
> > >    simplify logic based on review comments (Sean)
> > >  * rename .gmem_validate_fault to .private_max_mapping_level and rework
> > >    logic based on review suggestions (Sean)
> > >  * reduce number of pr_debug()'s in series, avoid multiple WARN's in
> > >    succession (Sean)
> > >  * improve documentation and comments throughout
> > >
> > > Changes since v13:
> > >
> > >  * rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
> > >  * handle setting kvm->arch.has_private_mem in same location as
> > >    kvm->arch.has_protected_state (Paolo)
> > >  * add flags and additional padding fields to
> > >    snp_launch{start,update,finish} APIs to address alignment and
> > >    expandability (Paolo)
> > >  * update snp_launch_update() to update input struct values to reflect
> > >    current progress of the command in situations where multiple calls are
> > >    needed (Paolo)
> > >  * update snp_launch_update() to avoid copying/accessing 'src' parameter
> > >    when dealing with zero pages. (Paolo)
> > >  * update snp_launch_update() to use u64 as length input parameter instead
> > >    of u32 and adjust padding accordingly
> > >  * modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
> > >    bit order of corresponding flags
> > >  * let firmware handle enforcement of policy bits corresponding to
> > >    user-specified minimum API version
> > >  * add missing "0x" prefixes in pr_debug()'s for snp_launch_start()
> > >  * fix handling of VMSAs during in-place migration (Paolo)
> > >
> > > Changes since v12:
> > >
> > >  * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
> > >  * add more input validation for SNP_LAUNCH_START, especially for handling
> > >    things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
> > >  * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
> > >  * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
> > >    userspace deal with the ordering just like with SEV-ES (Paolo)
> > >  * fix up docs for SNP_LAUNCH_FINISH (Paolo)
> > >  * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
> > >    handling for guest-mapped vs non-guest-mapped VMSAs, rename
> > >    'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
> > >  * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
> > >    as it is no longer needed due to above VMSA rework
> > >  * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
> > >    event
> > >  * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
> > >    switch to WARN_ON*()'s to indicate remaining error cases are not expected
> > >    and should not be seen in practice. (Paolo)
> > >  * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
> > >    cleaning up large guest memory ranges.
> > >  * rename VLEK_REQUIRED to VCEK_DISABLE, as it'd be more applicable if another
> > >    key type ever gets added.
> > >  * don't allow attestation to be paused while an attestation request is
> > >    being processed by firmware (Tom)
> > >  * add missing Documentation entry for SNP_VLEK_LOAD
> > >  * collect Reviewed-by's from Paolo and Tom
> > >
> > >
> > > ----------------------------------------------------------------
> > > Ashish Kalra (1):
> > >       KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
> > >
> > > Brijesh Singh (8):
> > >       KVM: SEV: Add initial SEV-SNP support
> > >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
> > >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
> > >       KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
> > >       KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
> > >       KVM: SEV: Add support to handle RMP nested page faults
> > >       KVM: SVM: Add module parameter to enable SEV-SNP
> > >       KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
> > >
> > > Michael Roth (10):
> > >       Revert "KVM: x86: Add gmem hook for determining max NPT mapping level"
> > >       KVM: x86: Add hook for determining max NPT mapping level
> > >       KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
> > >       KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
> > >       KVM: SEV: Add support to handle Page State Change VMGEXIT
> > >       KVM: SEV: Implement gmem hook for initializing private pages
> > >       KVM: SEV: Implement gmem hook for invalidating private pages
> > >       KVM: x86: Implement hook for determining max NPT mapping level
> > >       KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
> > >       crypto: ccp: Add the SNP_VLEK_LOAD command
> > >
> > > Tom Lendacky (1):
> > >       KVM: SEV: Support SEV-SNP AP Creation NAE event
> > >
> > >  Documentation/virt/coco/sev-guest.rst              |   19 +
> > >  Documentation/virt/kvm/api.rst                     |   87 ++
> > >  .../virt/kvm/x86/amd-memory-encryption.rst         |  110 +-
> > >  arch/x86/include/asm/kvm-x86-ops.h                 |    2 +-
> > >  arch/x86/include/asm/kvm_host.h                    |    5 +-
> > >  arch/x86/include/asm/sev-common.h                  |   25 +
> > >  arch/x86/include/asm/sev.h                         |    3 +
> > >  arch/x86/include/asm/svm.h                         |    9 +-
> > >  arch/x86/include/uapi/asm/kvm.h                    |   48 +
> > >  arch/x86/kvm/Kconfig                               |    3 +
> > >  arch/x86/kvm/mmu.h                                 |    2 -
> > >  arch/x86/kvm/mmu/mmu.c                             |   27 +-
> > >  arch/x86/kvm/svm/sev.c                             | 1538 +++++++++++++++++++-
> > >  arch/x86/kvm/svm/svm.c                             |   44 +-
> > >  arch/x86/kvm/svm/svm.h                             |   52 +
> > >  arch/x86/kvm/trace.h                               |   31 +
> > >  arch/x86/kvm/x86.c                                 |   17 +
> > >  drivers/crypto/ccp/sev-dev.c                       |   36 +
> > >  include/linux/psp-sev.h                            |    4 +-
> > >  include/uapi/linux/kvm.h                           |   23 +
> > >  include/uapi/linux/psp-sev.h                       |   27 +
> > >  include/uapi/linux/sev-guest.h                     |    9 +
> > >  virt/kvm/guest_memfd.c                             |    4 +-
> > >  23 files changed, 2081 insertions(+), 44 deletions(-)
> > >
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
  2024-05-10  1:58   ` [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults Michael Roth
  2024-05-10  1:58   ` [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops Michael Roth
@ 2024-05-10 13:47   ` Sean Christopherson
  2024-05-10 13:50     ` Paolo Bonzini
  2 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2024-05-10 13:47 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri, Isaku Yamahata

On Thu, May 09, 2024, Michael Roth wrote:
> ---
>  arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 62ad38b2a8c9..cecd8360378f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3296,7 +3296,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>  	return RET_PF_CONTINUE;
>  }
>  
> -static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
> +static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	/*
>  	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
> @@ -3307,6 +3307,32 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>  	if (fault->rsvd)
>  		return false;
>  
> +	/*
> +	 * For hardware-protected VMs, certain conditions like attempting to
> +	 * perform a write to a page which is not in the state that the guest
> +	 * expects it to be in can result in a nested/extended #PF. In this
> +	 * case, the below code might misconstrue this situation as being the
> +	 * result of a write-protected access, and treat it as a spurious case
> +	 * rather than taking any action to satisfy the real source of the #PF
> +	 * such as generating a KVM_EXIT_MEMORY_FAULT. This can lead to the
> +	 * guest spinning on a #PF indefinitely.
> +	 *
> +	 * For now, just skip the fast path for hardware-protected VMs since
> +	 * they don't currently utilize any of this machinery anyway. In the
> +	 * future, these considerations will need to be taken into account if
> +	 * there's any need/desire to re-enable the fast path for
> +	 * hardware-protected VMs.
> +	 *
> +	 * Since software-protected VMs don't have a notion of a shared vs.
> +	 * private that's separate from what KVM is tracking, the above
> +	 * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> +	 * special handling for that case for now.

Very technically, it can occur if userspace _just_ modified the attributes.  And
as I've said multiple times, at least for now, I want to avoid special casing
SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
is to allow testing flows that are impossible to exercise without SNP/TDX hardware.

> +	 */
> +	if (kvm_slot_can_be_private(fault->slot) &&
> +	    !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	      vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))

Heh, !(x && y) kills me, I misread this like 4 times.

Anyways, I don't like the heuristic.  It doesn't tie the restriction back to the
cause in any reasonable way.  Can't this simply be?

	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
		return false;

Which is much, much more self-explanatory.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-10 13:47   ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Sean Christopherson
@ 2024-05-10 13:50     ` Paolo Bonzini
  2024-05-10 15:27       ` Michael Roth
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-10 13:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri, Isaku Yamahata

On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@google.com> wrote:
>
> > +      * Since software-protected VMs don't have a notion of a shared vs.
> > +      * private that's separate from what KVM is tracking, the above
> > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > +      * special handling for that case for now.
>
> Very technically, it can occur if userspace _just_ modified the attributes.  And
> as I've said multiple times, at least for now, I want to avoid special casing
> SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > is to allow testing flows that are impossible to exercise without SNP/TDX hardware.

Yep, it is not like they have to be optimized.

> > +      */
> > +     if (kvm_slot_can_be_private(fault->slot) &&
> > +         !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> > +           vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
>
> Heh, !(x && y) kills me, I misread this like 4 times.
>
> Anyways, I don't like the heuristic.  It doesn't tie the restriction back to the
> cause in any reasonable way.  Can't this simply be?
>
>         if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
>                 return false;

You beat me to it by seconds. And it can also be guarded by a check on
kvm->arch.has_private_mem to avoid the attributes lookup.
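
Something along these lines, combined (just a sketch of the suggestion,
not a tested patch):

	if (vcpu->kvm->arch.has_private_mem &&
	    fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
		return false;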

> Which is much, much more self-explanatory.

Both more self-explanatory and more correct.

Paolo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10  1:58   ` [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults Michael Roth
@ 2024-05-10 13:58     ` Sean Christopherson
  2024-05-10 15:36       ` Michael Roth
  2024-05-10 16:01       ` Paolo Bonzini
  0 siblings, 2 replies; 49+ messages in thread
From: Sean Christopherson @ 2024-05-10 13:58 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Thu, May 09, 2024, Michael Roth wrote:
> The intended logic when handling #NPFs with the RMP bit set (31) is to
> first check to see if the #NPF requires a shared<->private transition
> and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
> get forwarded on to userspace before proceeding with any handling of
> other potential RMP fault conditions like needing to PSMASH the RMP
> entry/etc (which will be done later if the guest still re-faults after
> the KVM_EXIT_MEMORY_FAULT is processed by userspace).
> 
> The determination of whether any userspace handling of
> KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
> of kvm_mmu_page_fault(). However, the current code misinterprets the
> return code, expecting 0 to indicate a userspace exit rather than less
> than 0 (-EFAULT). This leads to the following unexpected behavior:
> 
> >   - for KVM_EXIT_MEMORY_FAULTs resulting from implicit shared->private
>     conversions, warnings get printed from sev_handle_rmp_fault()
>     because it does not expect to be called for GPAs where
>     KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
>     generally do this, but it is allowed and should be handled
>     similarly to private->shared conversions rather than triggering any
>     sort of warnings
> 
>   - if gmem support for 2MB folios is enabled (via currently out-of-tree
>     code), implicit shared<->private conversions will always result in
>     a PSMASH being attempted, even if it's not actually needed to
>     resolve the RMP fault. This doesn't cause any harm, but results in a
>     needless PSMASH and zapping of the sPTE
> 
> Resolve these issues by calling sev_handle_rmp_fault() only when
> kvm_mmu_page_fault()'s return code is greater than or equal to 0,
> > indicating a KVM_EXIT_MEMORY_FAULT/-EFAULT isn't needed. While here,
> simplify the code slightly and fix up the associated comments for better
> clarity.
> 
> Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/svm.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 426ad49325d7..9431ce74c7d4 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>  				svm->vmcb->control.insn_len);
>  
>  	/*
> -	 * rc == 0 indicates a userspace exit is needed to handle page
> -	 * transitions, so do that first before updating the RMP table.
> +	 * rc < 0 indicates a userspace exit may be needed to handle page
> +	 * attribute updates, so deal with that first before handling other
> +	 * potential RMP fault conditions.
>  	 */
> -	if (error_code & PFERR_GUEST_RMP_MASK) {
> -		if (rc == 0)
> -			return rc;
> +	if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)

This isn't correct either.  A return of '0' also indicates "exit to userspace",
it just doesn't happen with SNP because '0' is returned only when KVM attempts
emulation, and that too gets short-circuited by svm_check_emulate_instruction().

And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
values overload is ubiquitous enough that it should be relatively self-explanatory.
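
For anyone who hasn't fully internalized that convention yet, it amounts
to roughly:

  rc  > 0: fault fixed/handled, resume the guest
  rc == 0: exit to userspace, e.g. emulation is needed
  rc  < 0: -errno, also an exit to userspace, e.g. -EFAULT paired with
           a KVM_EXIT_MEMORY_FAULT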

Or if you prefer to keep a comment, drop the part that specifically calls out
attributes updates, because that incorrectly implies that's the _only_ reason
why KVM checks the return.  But my vote is to drop the comment, because it
essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
makes the reader go "well, yeah".

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-10 13:50     ` Paolo Bonzini
@ 2024-05-10 15:27       ` Michael Roth
  2024-05-10 15:59         ` Sean Christopherson
  0 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-10 15:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri, Isaku Yamahata

On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > > +      * Since software-protected VMs don't have a notion of a shared vs.
> > > +      * private that's separate from what KVM is tracking, the above
> > > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > +      * special handling for that case for now.
> >
> > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > as I've said multiple times, at least for now, I want to avoid special casing
> > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > is to allow testing flows that are impossible to exercise without SNP/TDX hardware.
> 
> Yep, it is not like they have to be optimized.

Ok, I thought there were maybe some future plans to use sw-protected VMs
to get some added protections from userspace. But even then there'd
probably still be extra considerations for how to handle access tracking
so white-listing them probably isn't right anyway.

I was also partly tempted to take this route because it would cover this
TDX patch as well:

  https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@intel.com/

and avoid any weirdness about checking kvm_mem_is_private() without
checking mmu_invalidate_seq, but I think those cases all end up
resolving themselves eventually, and I added some comments around that.

> 
> > > +      */
> > > +     if (kvm_slot_can_be_private(fault->slot) &&
> > > +         !(IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> > > +           vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM))
> >
> > Heh, !(x && y) kills me, I misread this like 4 times.
> >
> > Anyways, I don't like the heuristic.  It doesn't tie the restriction back to the
> > cause in any reasonable way.  Can't this simply be?
> >
> > >         if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
> >                 return false;
> 
> You beat me to it by seconds. And it can also be guarded by a check on
> kvm->arch.has_private_mem to avoid the attributes lookup.

I re-tested with things implemented this way and everything seems to
look good. It's not clear to me whether this would cover the cases the
above-mentioned TDX patch handles, but no biggie if that's still needed.

The new version of the patch is here:

  https://github.com/mdroth/linux/commit/39643f9f6da6265d39d633a703c53997985c1208

And I've updated my branches to replace the old patch and also
incorporate Sean's suggestions for patch 22:

  https://github.com/mdroth/linux/commits/snp-host-v15c3-unsquashed

and have them here with things already squashed in/relocated:

  https://github.com/mdroth/linux/commits/snp-host-v15c3

Thanks for the feedback Sean, Paolo.

-Mike
  
> 
> > Which is much, much more self-explanatory.
> 
> Both more self-explanatory and more correct.
> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 13:58     ` Sean Christopherson
@ 2024-05-10 15:36       ` Michael Roth
  2024-05-10 16:01       ` Paolo Bonzini
  1 sibling, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-10 15:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Fri, May 10, 2024 at 06:58:45AM -0700, Sean Christopherson wrote:
> On Thu, May 09, 2024, Michael Roth wrote:
> > The intended logic when handling #NPFs with the RMP bit set (31) is to
> > first check to see if the #NPF requires a shared<->private transition
> > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
> > get forwarded on to userspace before proceeding with any handling of
> > other potential RMP fault conditions like needing to PSMASH the RMP
> > entry/etc (which will be done later if the guest still re-faults after
> > the KVM_EXIT_MEMORY_FAULT is processed by userspace).
> > 
> > The determination of whether any userspace handling of
> > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
> > of kvm_mmu_page_fault(). However, the current code misinterprets the
> > return code, expecting 0 to indicate a userspace exit rather than less
> > than 0 (-EFAULT). This leads to the following unexpected behavior:
> > 
> > >   - for KVM_EXIT_MEMORY_FAULTs resulting from implicit shared->private
> >     conversions, warnings get printed from sev_handle_rmp_fault()
> >     because it does not expect to be called for GPAs where
> >     KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
> >     generally do this, but it is allowed and should be handled
> >     similarly to private->shared conversions rather than triggering any
> >     sort of warnings
> > 
> >   - if gmem support for 2MB folios is enabled (via currently out-of-tree
> >     code), implicit shared<->private conversions will always result in
> >     a PSMASH being attempted, even if it's not actually needed to
> >     resolve the RMP fault. This doesn't cause any harm, but results in a
> >     needless PSMASH and zapping of the sPTE
> > 
> > Resolve these issues by calling sev_handle_rmp_fault() only when
> > kvm_mmu_page_fault()'s return code is greater than or equal to 0,
> > > indicating a KVM_EXIT_MEMORY_FAULT/-EFAULT isn't needed. While here,
> > simplify the code slightly and fix up the associated comments for better
> > clarity.
> > 
> > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 426ad49325d7..9431ce74c7d4 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
> >  				svm->vmcb->control.insn_len);
> >  
> >  	/*
> > -	 * rc == 0 indicates a userspace exit is needed to handle page
> > -	 * transitions, so do that first before updating the RMP table.
> > +	 * rc < 0 indicates a userspace exit may be needed to handle page
> > +	 * attribute updates, so deal with that first before handling other
> > +	 * potential RMP fault conditions.
> >  	 */
> > -	if (error_code & PFERR_GUEST_RMP_MASK) {
> > -		if (rc == 0)
> > -			return rc;
> > +	if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
> 
> This isn't correct either.  A return of '0' also indicates "exit to userspace",
> it just doesn't happen with SNP because '0' is returned only when KVM attempts
> emulation, and that too gets short-circuited by svm_check_emulate_instruction().
> 
> And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
> values overload is ubiquitous enough that it should be relatively self-explanatory.
> 
> Or if you prefer to keep a comment, drop the part that specifically calls out
> attributes updates, because that incorrectly implies that's the _only_ reason
> why KVM checks the return.  But my vote is to drop the comment, because it
> essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
> makes the reader go "well, yeah".

Ok, I think I was just paranoid after missing this. I've gone ahead and
dropped the comment, and hopefully it's now drilled into my head enough
that it's obvious to me now as well :) I've also changed the logic to
skip the extra RMP handling for rc==0 as well (should that ever arise
for any future reason):

  https://github.com/mdroth/linux/commit/0a0ba0d7f7571a31f0abc68acc51f24c2a14a8cf
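
i.e. the tail end of npf_interception() now looks roughly like this
(a sketch of the commit above, not the verbatim diff):

	if (rc <= 0)
		return rc;

	if (error_code & PFERR_GUEST_RMP_MASK)
		sev_handle_rmp_fault(vcpu, fault_address, error_code);

	return rc;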

Thanks!

-Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-10 15:27       ` Michael Roth
@ 2024-05-10 15:59         ` Sean Christopherson
  2024-05-10 17:47           ` Isaku Yamahata
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2024-05-10 15:59 UTC (permalink / raw)
  To: Michael Roth
  Cc: Paolo Bonzini, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri, Isaku Yamahata

On Fri, May 10, 2024, Michael Roth wrote:
> On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> > On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > > +      * Since software-protected VMs don't have a notion of a shared vs.
> > > > +      * private that's separate from what KVM is tracking, the above
> > > > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > > +      * special handling for that case for now.
> > >
> > > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > > as I've said multiple times, at least for now, I want to avoid special casing
> > > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > is to allow testing flows that are impossible to exercise without SNP/TDX hardware.
> > 
> > Yep, it is not like they have to be optimized.
> 
> Ok, I thought there were maybe some future plans to use sw-protected VMs
> to get some added protections from userspace. But even then there'd
> probably still be extra considerations for how to handle access tracking
> so white-listing them probably isn't right anyway.
> 
> I was also partly tempted to take this route because it would cover this
> TDX patch as well:
> 
>   https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@intel.com/

Hmm, I'm pretty sure that patch is trying to fix the exact same issue you are
fixing, just in a less precise way.  S-EPT entries only support RWX=0 and RWX=111b,
i.e. it should be impossible to have a write-fault to a present S-EPT entry.

And if TDX is running afoul of this code:

	if (!fault->present)
		return !kvm_ad_enabled();

then KVM should do the sane thing and require A/D support be enabled for TDX.

And if it's something else entirely, that changelog has some explaining to do.

> and avoid any weirdness about checking kvm_mem_is_private() without
> checking mmu_invalidate_seq, but I think those cases all end up
> resolving themselves eventually and added some comments around that.

Yep, checking state that is protected by mmu_invalidate_seq outside of mmu_lock
is definitely allowed, e.g. the entire fast page fault path operates outside of
mmu_lock and thus outside of mmu_invalidate_seq's purview.

It's a-ok because the SPTE updates are done with an atomic CMPXCHG, and so KVM only needs
to ensure its page tables aren't outright _freed_.  If the zap triggered by the
attributes change "wins", then the fast #PF path will fail the CMPXCHG and be an
expensive NOP.  If the fast #PF wins, the zap will pave over the fast #PF fix,
and the IPI+flush that is needed for all zaps, to ensure vCPUs don't have stale
references, does the rest.
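
Concretely, the fast-path fixup boils down to a single atomic update
along these lines (simplified from fast_pf_fix_direct_spte()):

	/*
	 * Succeeds only if the SPTE wasn't zapped or otherwise modified
	 * underneath us; if it was, bail and let the vCPU re-fault.
	 */
	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
		return false;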

And if there's an attributes race that causes the fast #PF to bail early, the vCPU
will see the correct state on the next page fault.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 13:58     ` Sean Christopherson
  2024-05-10 15:36       ` Michael Roth
@ 2024-05-10 16:01       ` Paolo Bonzini
  2024-05-10 16:37         ` Michael Roth
  1 sibling, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-10 16:01 UTC (permalink / raw)
  To: Sean Christopherson, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, papaluri

On 5/10/24 15:58, Sean Christopherson wrote:
> On Thu, May 09, 2024, Michael Roth wrote:
>> The intended logic when handling #NPFs with the RMP bit set (31) is to
>> first check to see if the #NPF requires a shared<->private transition
>> and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
>> get forwarded on to userspace before proceeding with any handling of
>> other potential RMP fault conditions like needing to PSMASH the RMP
>> entry/etc (which will be done later if the guest still re-faults after
>> the KVM_EXIT_MEMORY_FAULT is processed by userspace).
>>
>> The determination of whether any userspace handling of
>> KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
>> of kvm_mmu_page_fault(). However, the current code misinterprets the
>> return code, expecting 0 to indicate a userspace exit rather than less
>> than 0 (-EFAULT). This leads to the following unexpected behavior:
>>
>>    - for KVM_EXIT_MEMORY_FAULTs resulting from implicit shared->private
>>      conversions, warnings get printed from sev_handle_rmp_fault()
>>      because it does not expect to be called for GPAs where
>>      KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
>>      generally do this, but it is allowed and should be handled
>>      similarly to private->shared conversions rather than triggering any
>>      sort of warnings
>>
>>    - if gmem support for 2MB folios is enabled (via currently out-of-tree
>>      code), implicit shared<->private conversions will always result in
>>      a PSMASH being attempted, even if it's not actually needed to
>>      resolve the RMP fault. This doesn't cause any harm, but results in a
>>      needless PSMASH and zapping of the sPTE
>>
>> Resolve these issues by calling sev_handle_rmp_fault() only when
>> kvm_mmu_page_fault()'s return code is greater than or equal to 0,
>> indicating a KVM_EXIT_MEMORY_FAULT/-EFAULT isn't needed. While here,
>> simplify the code slightly and fix up the associated comments for better
>> clarity.
>>
>> Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
>>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/kvm/svm/svm.c | 10 ++++------
>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 426ad49325d7..9431ce74c7d4 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>>   				svm->vmcb->control.insn_len);
>>   
>>   	/*
>> -	 * rc == 0 indicates a userspace exit is needed to handle page
>> -	 * transitions, so do that first before updating the RMP table.
>> +	 * rc < 0 indicates a userspace exit may be needed to handle page
>> +	 * attribute updates, so deal with that first before handling other
>> +	 * potential RMP fault conditions.
>>   	 */
>> -	if (error_code & PFERR_GUEST_RMP_MASK) {
>> -		if (rc == 0)
>> -			return rc;
>> +	if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
> 
> This isn't correct either.  A return of '0' also indicates "exit to userspace",
> it just doesn't happen with SNP because '0' is returned only when KVM attempts
> emulation, and that too gets short-circuited by svm_check_emulate_instruction().
> 
> And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
> values overload is ubiquitous enough that it should be relatively self-explanatory.
> 
> Or if you prefer to keep a comment, drop the part that specifically calls out
> attributes updates, because that incorrectly implies that's the _only_ reason
> why KVM checks the return.  But my vote is to drop the comment, because it
> essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
> makes the reader go "well, yeah".

So IIUC you're suggesting

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 426ad49325d7..c39eaeb21981 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2068,16 +2068,11 @@ static int npf_interception(struct kvm_vcpu *vcpu)
  				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
  				svm->vmcb->control.insn_bytes : NULL,
  				svm->vmcb->control.insn_len);
+	if (rc <= 0)
+		return rc;
  
-	/*
-	 * rc == 0 indicates a userspace exit is needed to handle page
-	 * transitions, so do that first before updating the RMP table.
-	 */
-	if (error_code & PFERR_GUEST_RMP_MASK) {
-		if (rc == 0)
-			return rc;
+	if (error_code & PFERR_GUEST_RMP_MASK)
  		sev_handle_rmp_fault(vcpu, fault_address, error_code);
-	}
  
  	return rc;
  }

?

So, we're... a bit tight for 6.10 to include SNP and that is an
understatement.  My plan is to merge it for 6.11, but do so
immediately after the merge window ends.  In other words, it
is a delay in terms of release but not in terms of time.  I
don't want QEMU and kvm-unit-tests work to be delayed any
further, in particular.

Once we sort out the loose ends of patches 21-23, you could send
it as a pull request.

Paolo


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 16:01       ` Paolo Bonzini
@ 2024-05-10 16:37         ` Michael Roth
  2024-05-10 16:59           ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-10 16:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Fri, May 10, 2024 at 06:01:52PM +0200, Paolo Bonzini wrote:
> On 5/10/24 15:58, Sean Christopherson wrote:
> > On Thu, May 09, 2024, Michael Roth wrote:
> > > The intended logic when handling #NPFs with the RMP bit set (31) is to
> > > first check to see if the #NPF requires a shared<->private transition
> > > and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
> > > get forwarded on to userspace before proceeding with any handling of
> > > other potential RMP fault conditions like needing to PSMASH the RMP
> > > entry/etc (which will be done later if the guest still re-faults after
> > > the KVM_EXIT_MEMORY_FAULT is processed by userspace).
> > > 
> > > The determination of whether any userspace handling of
> > > KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
> > > of kvm_mmu_page_fault(). However, the current code misinterprets the
> > > return code, expecting 0 to indicate a userspace exit rather than less
> > > than 0 (-EFAULT). This leads to the following unexpected behavior:
> > > 
> > >    - for KVM_EXIT_MEMORY_FAULTs resulting from implicit shared->private
> > >      conversions, warnings get printed from sev_handle_rmp_fault()
> > >      because it does not expect to be called for GPAs where
> > >      KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
> > >      generally do this, but it is allowed and should be handled
> > >      similarly to private->shared conversions rather than triggering any
> > >      sort of warnings
> > > 
> > >    - if gmem support for 2MB folios is enabled (via currently out-of-tree
> > >      code), implicit shared<->private conversions will always result in
> > >      a PSMASH being attempted, even if it's not actually needed to
> > >      resolve the RMP fault. This doesn't cause any harm, but results in a
> > >      needless PSMASH and zapping of the sPTE
> > > 
> > > Resolve these issues by calling sev_handle_rmp_fault() only when
> > > kvm_mmu_page_fault()'s return code is greater than or equal to 0,
> > > indicating a KVM_EXIT_MEMORY_FAULT/-EFAULT isn't needed. While here,
> > > simplify the code slightly and fix up the associated comments for better
> > > clarity.
> > > 
> > > Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >   arch/x86/kvm/svm/svm.c | 10 ++++------
> > >   1 file changed, 4 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 426ad49325d7..9431ce74c7d4 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
> > >   				svm->vmcb->control.insn_len);
> > >   	/*
> > > -	 * rc == 0 indicates a userspace exit is needed to handle page
> > > -	 * transitions, so do that first before updating the RMP table.
> > > +	 * rc < 0 indicates a userspace exit may be needed to handle page
> > > +	 * attribute updates, so deal with that first before handling other
> > > +	 * potential RMP fault conditions.
> > >   	 */
> > > -	if (error_code & PFERR_GUEST_RMP_MASK) {
> > > -		if (rc == 0)
> > > -			return rc;
> > > +	if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
> > 
> > This isn't correct either.  A return of '0' also indicates "exit to userspace",
> > it just doesn't happen with SNP because '0' is returned only when KVM attempts
> > emulation, and that too gets short-circuited by svm_check_emulate_instruction().
> > 
> > And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
> > values overload is ubiquitous enough that it should be relatively self-explanatory.
> > 
> > Or if you prefer to keep a comment, drop the part that specifically calls out
> > attributes updates, because that incorrectly implies that's the _only_ reason
> > why KVM checks the return.  But my vote is to drop the comment, because it
> > essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
> > makes the reader go "well, yeah".
> 
> So IIUC you're suggesting
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 426ad49325d7..c39eaeb21981 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2068,16 +2068,11 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>  				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
>  				svm->vmcb->control.insn_bytes : NULL,
>  				svm->vmcb->control.insn_len);
> +	if (rc <= 0)
> +		return rc;
> -	/*
> -	 * rc == 0 indicates a userspace exit is needed to handle page
> -	 * transitions, so do that first before updating the RMP table.
> -	 */
> -	if (error_code & PFERR_GUEST_RMP_MASK) {
> -		if (rc == 0)
> -			return rc;
> +	if (error_code & PFERR_GUEST_RMP_MASK)
>  		sev_handle_rmp_fault(vcpu, fault_address, error_code);
> -	}
>  	return rc;
>  }
> 
> ?
> 
> So, we're... a bit tight for 6.10 to include SNP and that is an
> understatement.  My plan is to merge it for 6.11, but do so
> immediately after the merge window ends.  In other words, it
> is a delay in terms of release but not in terms of time.  I
> don't want QEMU and kvm-unit-tests work to be delayed any
> further, in particular.

That's unfortunate, I'd thought from the PUCK call that we still had
some time to stabilize things before the merge window. But whatever you
think is best.

> 
> Once we sort out the loose ends of patches 21-23, you could send
> it as a pull request.

Ok, as a pull request against kvm/next, or kvm/queue?

Thanks,

Mike

> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 16:37         ` Michael Roth
@ 2024-05-10 16:59           ` Paolo Bonzini
  2024-05-10 17:25             ` Paolo Bonzini
  2024-05-14  8:10             ` Borislav Petkov
  0 siblings, 2 replies; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-10 16:59 UTC (permalink / raw)
  To: Michael Roth
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On 5/10/24 18:37, Michael Roth wrote:
>> So, we're... a bit tight for 6.10 to include SNP and that is an
>> understatement.  My plan is to merge it for 6.11, but do so
>> immediately after the merge window ends.  In other words, it
>> is a delay in terms of release but not in terms of time.  I
>> don't want QEMU and kvm-unit-tests work to be delayed any
>> further, in particular.
>
> That's unfortunate, I'd thought from the PUCK call that we still had
> some time to stabilize things before the merge window. But whatever you
> think is best.

Well, the merge window starts next Sunday, doesn't it?  If there's an
-rc8 I agree there's some leeway, but that is not too likely.

>> Once we sort out the loose ends of patches 21-23, you could send
>> it as a pull request.
> Ok, as a pull request against kvm/next, or kvm/queue?

Against kvm/next.

Paolo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops
  2024-05-10  1:58   ` [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops Michael Roth
@ 2024-05-10 17:09     ` Paolo Bonzini
  2024-05-10 19:08       ` Michael Roth
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-10 17:09 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On 5/10/24 03:58, Michael Roth wrote:
> There are a few edge-cases that the current processing for GHCB PSC
> requests doesn't handle properly:
> 
>   - KVM properly ignores SMASH/UNSMASH ops when they are embedded in a
>     PSC request buffer which contains other PSC operations, but
> >     inadvertently forwards them to userspace as private->shared PSC
>     requests if they appear at the end of the buffer. Make sure these are
>     ignored instead, just like cases where they are not at the end of the
>     request buffer.
> 
>   - Current code handles non-zero 'cur_page' fields when they are at the
> >     beginning of a new GPA range, but does not handle them properly when
>     iterating through subsequent entries which are otherwise part of a
>     contiguous range. Fix up the handling so that these entries are not
>     combined into a larger contiguous range that include unintended GPA
>     ranges and are instead processed later as the start of a new
>     contiguous range.
> 
>   - The page size variable used to track 2M entries in KVM for inflight PSCs
> >     might be artificially set to a different value, which can lead to
>     unexpected values in the entry's final 'cur_page' update. Use the
>     entry's 'pagesize' field instead to determine what the value of
>     'cur_page' should be upon completion of processing.
> 
> While here, also add a small helper for clearing in-flight PSCs
> variables and fix up comments for better readability.
> 
> Fixes: 266205d810d2 ("KVM: SEV: Add support to handle Page State Change VMGEXIT")
> Signed-off-by: Michael Roth <michael.roth@amd.com>

There are some more improvements that can be made to the readability of
the code... this one is already better than the code it is fixing up, but I
don't like having code in the loop that is unconditionally
followed by "break".

Here's my attempt at replacing this patch, which is really more of a
rewrite of the whole function...  Untested beyond compilation.

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 35f0bd91f92e..6e612789c35f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3555,23 +3555,25 @@ struct psc_buffer {
  
  static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
  
-static int snp_complete_psc(struct kvm_vcpu *vcpu)
+static void snp_complete_psc(struct vcpu_svm *svm, u64 psc_ret)
+{
+	svm->sev_es.psc_inflight = 0;
+	svm->sev_es.psc_idx = 0;
+	svm->sev_es.psc_2m = 0;
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
+}
+
+static void __snp_complete_one_psc(struct vcpu_svm *svm)
  {
-	struct vcpu_svm *svm = to_svm(vcpu);
  	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
  	struct psc_entry *entries = psc->entries;
  	struct psc_hdr *hdr = &psc->hdr;
-	__u64 psc_ret;
  	__u16 idx;
  
-	if (vcpu->run->hypercall.ret) {
-		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
-		goto out_resume;
-	}
-
  	/*
  	 * Everything in-flight has been processed successfully. Update the
-	 * corresponding entries in the guest's PSC buffer.
+	 * corresponding entries in the guest's PSC buffer and zero out the
+	 * count of in-flight PSC entries.
  	 */
  	for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
  	     svm->sev_es.psc_inflight--, idx++) {
@@ -3581,17 +3583,22 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
  	}
  
  	hdr->cur_entry = idx;
+}
+
+static int snp_complete_one_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
+
+	if (vcpu->run->hypercall.ret) {
+		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
+		return 1; /* resume guest */
+	}
+
+	__snp_complete_one_psc(svm);
  
  	/* Handle the next range (if any). */
  	return snp_begin_psc(svm, psc);
-
-out_resume:
-	svm->sev_es.psc_idx = 0;
-	svm->sev_es.psc_inflight = 0;
-	svm->sev_es.psc_2m = false;
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
-
-	return 1; /* resume guest */
  }
  
  static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
@@ -3601,18 +3608,20 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
  	struct psc_hdr *hdr = &psc->hdr;
  	struct psc_entry entry_start;
  	u16 idx, idx_start, idx_end;
-	__u64 psc_ret, gpa;
+	u64 gfn;
  	int npages;
-
-	/* There should be no other PSCs in-flight at this point. */
-	if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
-		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
-		goto out_resume;
-	}
+	bool huge;
  
  	if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
-		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
-		goto out_resume;
+		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
+		return 1;
+	}
+
+next_range:
+	/* There should be no other PSCs in-flight at this point. */
+	if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
+		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
+		return 1;
  	}
  
  	/*
@@ -3624,97 +3633,99 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
  	idx_end = hdr->end_entry;
  
  	if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {
-		psc_ret = VMGEXIT_PSC_ERROR_INVALID_HDR;
-		goto out_resume;
-	}
-
-	/* Nothing more to process. */
-	if (idx_start > idx_end) {
-		psc_ret = 0;
-		goto out_resume;
+		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_HDR);
+		return 1;
  	}
  
  	/* Find the start of the next range which needs processing. */
  	for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
-		__u16 cur_page;
-		gfn_t gfn;
-		bool huge;
-
  		entry_start = entries[idx];
-
-		/* Only private/shared conversions are currently supported. */
-		if (entry_start.operation != VMGEXIT_PSC_OP_PRIVATE &&
-		    entry_start.operation != VMGEXIT_PSC_OP_SHARED)
-			continue;
-
  		gfn = entry_start.gfn;
-		cur_page = entry_start.cur_page;
  		huge = entry_start.pagesize;
+		npages = huge ? 512 : 1;
  
-		if ((huge && (cur_page > 512 || !IS_ALIGNED(gfn, 512))) ||
-		    (!huge && cur_page > 1)) {
-			psc_ret = VMGEXIT_PSC_ERROR_INVALID_ENTRY;
-			goto out_resume;
+		if (entry_start.cur_page > npages || !IS_ALIGNED(gfn, npages)) {
+			snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_ENTRY);
+			return 1;
  		}
  
+		if (entry_start.cur_page) {
+			/*
+			 * If this is a partially-completed 2M range, force 4K
+			 * handling for the remaining pages since they're effectively
+			 * split at this point. Subsequent code should ensure this
+			 * doesn't get combined with adjacent PSC entries where 2M
+			 * handling is still possible.
+			 */
+			npages -= entry_start.cur_page;
+			gfn += entry_start.cur_page;
+			huge = false;
+		}
+		if (npages)
+			break;
+
  		/* All sub-pages already processed. */
-		if ((huge && cur_page == 512) || (!huge && cur_page == 1))
-			continue;
-
-		/*
-		 * If this is a partially-completed 2M range, force 4K handling
-		 * for the remaining pages since they're effectively split at
-		 * this point. Subsequent code should ensure this doesn't get
-		 * combined with adjacent PSC entries where 2M handling is still
-		 * possible.
-		 */
-		svm->sev_es.psc_2m = cur_page ? false : huge;
-		svm->sev_es.psc_idx = idx;
-		svm->sev_es.psc_inflight = 1;
-
-		gpa = gfn_to_gpa(gfn + cur_page);
-		npages = huge ? 512 - cur_page : 1;
-		break;
  	}
  
+	if (idx > idx_end) {
+		/* Nothing more to process. */
+		snp_complete_psc(svm, 0);
+		return 1;
+	}
+
+	svm->sev_es.psc_2m = huge;
+	svm->sev_es.psc_idx = idx;
+	svm->sev_es.psc_inflight = 1;
+
  	/*
  	 * Find all subsequent PSC entries that contain adjacent GPA
  	 * ranges/operations and can be combined into a single
  	 * KVM_HC_MAP_GPA_RANGE exit.
  	 */
-	for (idx = svm->sev_es.psc_idx + 1; idx <= idx_end; idx++) {
+	while (++idx <= idx_end) {
  		struct psc_entry entry = entries[idx];
  
  		if (entry.operation != entry_start.operation ||
-		    entry.gfn != entry_start.gfn + npages ||
-		    !!entry.pagesize != svm->sev_es.psc_2m)
+		    entry.gfn != gfn + npages ||
+		    entry.cur_page ||
+		    !!entry.pagesize != huge)
  			break;
  
  		svm->sev_es.psc_inflight++;
-		npages += entry_start.pagesize ? 512 : 1;
+		npages += huge ? 512 : 1;
  	}
  
-	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
-	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
-	vcpu->run->hypercall.args[0] = gpa;
-	vcpu->run->hypercall.args[1] = npages;
-	vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
-				       ? KVM_MAP_GPA_RANGE_ENCRYPTED
-				       : KVM_MAP_GPA_RANGE_DECRYPTED;
-	vcpu->run->hypercall.args[2] |= entry_start.pagesize
-					? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
-					: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
-	vcpu->arch.complete_userspace_io = snp_complete_psc;
+	switch (entry_start.operation) {
+	case VMGEXIT_PSC_OP_PRIVATE:
+	case VMGEXIT_PSC_OP_SHARED:
+		vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+		vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
+		vcpu->run->hypercall.args[1] = npages;
+		vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
+			? KVM_MAP_GPA_RANGE_ENCRYPTED
+			: KVM_MAP_GPA_RANGE_DECRYPTED;
+		vcpu->run->hypercall.args[2] |= huge
+			? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
+			: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
+		vcpu->arch.complete_userspace_io = snp_complete_one_psc;
  
-	return 0; /* forward request to userspace */
+		return 0; /* forward request to userspace */
  
-out_resume:
-	svm->sev_es.psc_idx = 0;
-	svm->sev_es.psc_inflight = 0;
-	svm->sev_es.psc_2m = false;
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
+	default:
+		/*
+		 * Only shared/private PSC operations are currently supported, so if the
+		 * entire range consists of unsupported operations (e.g. SMASH/UNSMASH),
+		 * then consider the entire range completed and avoid exiting to
+		 * userspace. In theory snp_complete_psc() can be called directly
+		 * at this point to complete the current range and start the next one,
+		 * but that could lead to unexpected levels of recursion.
+		 */
+		__snp_complete_one_psc(svm);
+		goto next_range;
+	}
  
-	return 1; /* resume guest */
+	unreachable();
  }
  
  static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
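
For completeness, the userspace side of the resulting KVM_HC_MAP_GPA_RANGE
exit would be along these lines (rough sketch, not lifted from QEMU; the
dispatch and error handling around it are illustrative only):

  case KVM_EXIT_HYPERCALL:
  	if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) {
  		struct kvm_memory_attributes attrs = {
  			.address    = run->hypercall.args[0],
  			/* args[1] counts 4K pages even when the 2M hint is set */
  			.size       = run->hypercall.args[1] * 4096,
  			.attributes = (run->hypercall.args[2] & KVM_MAP_GPA_RANGE_ENCRYPTED)
  				      ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
  		};

  		/* report success/failure back to KVM via hypercall.ret */
  		run->hypercall.ret = ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  	}
  	break;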


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 16:59           ` Paolo Bonzini
@ 2024-05-10 17:25             ` Paolo Bonzini
  2024-05-14  8:10             ` Borislav Petkov
  1 sibling, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-10 17:25 UTC (permalink / raw)
  To: Michael Roth
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Fri, May 10, 2024 at 6:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> Well, the merge window starts next sunday, doesn't it?  If there's an
> -rc8 I agree there's some leeway, but that is not too likely.
>
> >> Once we sort out the loose ends of patches 21-23, you could send
> >> it as a pull request.
> > Ok, as a pull request against kvm/next, or kvm/queue?
>
> Against kvm/next.

Ah no, only kvm/queue has the preparatory hooks - they make no sense
without something that uses them.  kvm/queue is ready now.

Also, please send the pull request "QEMU style", i.e. with patches
as replies.

If there's an -rc8, I'll probably pull it on Thursday morning.

Paolo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots
  2024-05-10 15:59         ` Sean Christopherson
@ 2024-05-10 17:47           ` Isaku Yamahata
  0 siblings, 0 replies; 49+ messages in thread
From: Isaku Yamahata @ 2024-05-10 17:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, Paolo Bonzini, kvm, linux-coco, linux-mm,
	linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
	thomas.lendacky, hpa, ardb, vkuznets, jmattson, luto,
	dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
	dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, papaluri,
	Isaku Yamahata, isaku.yamahata, rick.p.edgecombe

On Fri, May 10, 2024 at 08:59:09AM -0700,
Sean Christopherson <seanjc@google.com> wrote:

> On Fri, May 10, 2024, Michael Roth wrote:
> > On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> > > On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > > +      * Since software-protected VMs don't have a notion of a shared vs.
> > > > > +      * private that's separate from what KVM is tracking, the above
> > > > > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > > > +      * special handling for that case for now.
> > > >
> > > > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > > > as I've said multiple times, at least for now, I want to avoid special casing
> > > > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > > is to allow testing flows that are impossible to exercise without SNP/TDX hardware.
> > > 
> > > Yep, it is not like they have to be optimized.
> > 
> > Ok, I thought there were maybe some future plans to use sw-protected VMs
> > to get some added protections from userspace. But even then there'd
> > probably still be extra considerations for how to handle access tracking
> > so white-listing them probably isn't right anyway.
> > 
> > I was also partly tempted to take this route because it would cover this
> > TDX patch as well:
> > 
> >   https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@intel.com/
> 
> Hmm, I'm pretty sure that patch is trying to fix the exact same issue you are
> fixing, just in a less precise way.  S-EPT entries only support RWX=0 and RWX=111b,
> i.e. it should be impossible to have a write-fault to a present S-EPT entry.
> 
> And if TDX is running afoul of this code:
> 
> 	if (!fault->present)
> 		return !kvm_ad_enabled();
> 
> then KVM should do the sane thing and require A/D support be enabled for TDX.
> 
> And if it's something else entirely, that changelog has some explaining to do.

Yes, it's for the KVM_EXIT_MEMORY_FAULT case.  Because Secure-EPT entries are
either non-present or have all of RWX allowed, the fast page fault path always
returns RET_PF_INVALID via the is_shadow_present_pte() check.
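
I.e. roughly this check in fast_page_fault() (paraphrasing the upstream code
from memory):

  	/*
  	 * An S-EPT leaf is either non-present or RWX=111b, so any fault
  	 * that reaches the fast path finds a non-present SPTE here, the
  	 * walk stops, and ret is left as RET_PF_INVALID.
  	 */
  	if (!is_shadow_present_pte(spte))
  		break;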

I lightly tested the patch at [1] and it works for TDX KVM.

[1] https://github.com/mdroth/linux/commit/39643f9f6da6265d39d633a703c53997985c1208

Just in case for that patch,
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@intel.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops
  2024-05-10 17:09     ` Paolo Bonzini
@ 2024-05-10 19:08       ` Michael Roth
  0 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-10 19:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

On Fri, May 10, 2024 at 07:09:07PM +0200, Paolo Bonzini wrote:
> On 5/10/24 03:58, Michael Roth wrote:
> > There are a few edge-cases that the current processing for GHCB PSC
> > requests doesn't handle properly:
> > 
> >   - KVM properly ignores SMASH/UNSMASH ops when they are embedded in a
> >     PSC request buffer which contains other PSC operations, but
> >     inadvertently forwards them to userspace as private->shared PSC
> >     requests if they appear at the end of the buffer. Make sure these are
> >     ignored instead, just like cases where they are not at the end of the
> >     request buffer.
> > 
> >   - Current code handles non-zero 'cur_page' fields when they are at the
> >     beginning of a new GPA range, but does not handle them properly when
> >     iterating through subsequent entries which are otherwise part of a
> >     contiguous range. Fix up the handling so that these entries are not
> >     combined into a larger contiguous range that includes unintended GPA
> >     ranges and are instead processed later as the start of a new
> >     contiguous range.
> > 
> >   - The page size variable used to track 2M entries in KVM for inflight PSCs
> >     might be artificially set to a different value, which can lead to
> >     unexpected values in the entry's final 'cur_page' update. Use the
> >     entry's 'pagesize' field instead to determine what the value of
> >     'cur_page' should be upon completion of processing.
> > 
> > While here, also add a small helper for clearing in-flight PSCs
> > variables and fix up comments for better readability.
> > 
> > Fixes: 266205d810d2 ("KVM: SEV: Add support to handle Page State Change VMGEXIT")
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> There are some more improvements that can be made to the readability of
> the code... this version is already better than the code the patch is fixing
> up, but I don't like having code in the loop that is unconditionally
> followed by "break".
> 
> Here's my attempt at replacing this patch, which is really more of a
> rewrite of the whole function...  Untested beyond compilation.

Thanks for the suggested rework. I tested with/without 2MB pages and
everything worked as-written. This is the full/squashed patch I plan to
include in the pull request:

  https://github.com/mdroth/linux/commit/91f6d31c4dfc88dd1ac378e2db6117b0c982e63c

-Mike

> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 35f0bd91f92e..6e612789c35f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3555,23 +3555,25 @@ struct psc_buffer {
>  static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
> -static int snp_complete_psc(struct kvm_vcpu *vcpu)
> +static void snp_complete_psc(struct vcpu_svm *svm, u64 psc_ret)
> +{
> +	svm->sev_es.psc_inflight = 0;
> +	svm->sev_es.psc_idx = 0;
> +	svm->sev_es.psc_2m = 0;
> +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
> +}
> +
> +static void __snp_complete_one_psc(struct vcpu_svm *svm)
>  {
> -	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
>  	struct psc_entry *entries = psc->entries;
>  	struct psc_hdr *hdr = &psc->hdr;
> -	__u64 psc_ret;
>  	__u16 idx;
> -	if (vcpu->run->hypercall.ret) {
> -		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
> -		goto out_resume;
> -	}
> -
>  	/*
>  	 * Everything in-flight has been processed successfully. Update the
> -	 * corresponding entries in the guest's PSC buffer.
> +	 * corresponding entries in the guest's PSC buffer and zero out the
> +	 * count of in-flight PSC entries.
>  	 */
>  	for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
>  	     svm->sev_es.psc_inflight--, idx++) {
> @@ -3581,17 +3583,22 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
>  	}
>  	hdr->cur_entry = idx;
> +}
> +
> +static int snp_complete_one_psc(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct psc_buffer *psc = svm->sev_es.ghcb_sa;
> +
> +	if (vcpu->run->hypercall.ret) {
> +		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
> +		return 1; /* resume guest */
> +	}
> +
> +	__snp_complete_one_psc(svm);
>  	/* Handle the next range (if any). */
>  	return snp_begin_psc(svm, psc);
> -
> -out_resume:
> -	svm->sev_es.psc_idx = 0;
> -	svm->sev_es.psc_inflight = 0;
> -	svm->sev_es.psc_2m = false;
> -	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
> -
> -	return 1; /* resume guest */
>  }
>  static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
> @@ -3601,18 +3608,20 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
>  	struct psc_hdr *hdr = &psc->hdr;
>  	struct psc_entry entry_start;
>  	u16 idx, idx_start, idx_end;
> -	__u64 psc_ret, gpa;
> +	u64 gfn;
>  	int npages;
> -
> -	/* There should be no other PSCs in-flight at this point. */
> -	if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
> -		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
> -		goto out_resume;
> -	}
> +	bool huge;
>  	if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE))) {
> -		psc_ret = VMGEXIT_PSC_ERROR_GENERIC;
> -		goto out_resume;
> +		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
> +		return 1;
> +	}
> +
> +next_range:
> +	/* There should be no other PSCs in-flight at this point. */
> +	if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
> +		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
> +		return 1;
>  	}
>  	/*
> @@ -3624,97 +3633,99 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
>  	idx_end = hdr->end_entry;
>  	if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {
> -		psc_ret = VMGEXIT_PSC_ERROR_INVALID_HDR;
> -		goto out_resume;
> -	}
> -
> -	/* Nothing more to process. */
> -	if (idx_start > idx_end) {
> -		psc_ret = 0;
> -		goto out_resume;
> +		snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_HDR);
> +		return 1;
>  	}
>  	/* Find the start of the next range which needs processing. */
>  	for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
> -		__u16 cur_page;
> -		gfn_t gfn;
> -		bool huge;
> -
>  		entry_start = entries[idx];
> -
> -		/* Only private/shared conversions are currently supported. */
> -		if (entry_start.operation != VMGEXIT_PSC_OP_PRIVATE &&
> -		    entry_start.operation != VMGEXIT_PSC_OP_SHARED)
> -			continue;
> -
>  		gfn = entry_start.gfn;
> -		cur_page = entry_start.cur_page;
>  		huge = entry_start.pagesize;
> +		npages = huge ? 512 : 1;
> -		if ((huge && (cur_page > 512 || !IS_ALIGNED(gfn, 512))) ||
> -		    (!huge && cur_page > 1)) {
> -			psc_ret = VMGEXIT_PSC_ERROR_INVALID_ENTRY;
> -			goto out_resume;
> +		if (entry_start.cur_page > npages || !IS_ALIGNED(gfn, npages)) {
> +			snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_ENTRY);
> +			return 1;
>  		}
> +		if (entry_start.cur_page) {
> +			/*
> +			 * If this is a partially-completed 2M range, force 4K
> +			 * handling for the remaining pages since they're effectively
> +			 * split at this point. Subsequent code should ensure this
> +			 * doesn't get combined with adjacent PSC entries where 2M
> +			 * handling is still possible.
> +			 */
> +			npages -= entry_start.cur_page;
> +			gfn += entry_start.cur_page;
> +			huge = false;
> +		}
> +		if (npages)
> +			break;
> +
>  		/* All sub-pages already processed. */
> -		if ((huge && cur_page == 512) || (!huge && cur_page == 1))
> -			continue;
> -
> -		/*
> -		 * If this is a partially-completed 2M range, force 4K handling
> -		 * for the remaining pages since they're effectively split at
> -		 * this point. Subsequent code should ensure this doesn't get
> -		 * combined with adjacent PSC entries where 2M handling is still
> -		 * possible.
> -		 */
> -		svm->sev_es.psc_2m = cur_page ? false : huge;
> -		svm->sev_es.psc_idx = idx;
> -		svm->sev_es.psc_inflight = 1;
> -
> -		gpa = gfn_to_gpa(gfn + cur_page);
> -		npages = huge ? 512 - cur_page : 1;
> -		break;
>  	}
> +	if (idx > idx_end) {
> +		/* Nothing more to process. */
> +		snp_complete_psc(svm, 0);
> +		return 1;
> +	}
> +
> +	svm->sev_es.psc_2m = huge;
> +	svm->sev_es.psc_idx = idx;
> +	svm->sev_es.psc_inflight = 1;
> +
>  	/*
>  	 * Find all subsequent PSC entries that contain adjacent GPA
>  	 * ranges/operations and can be combined into a single
>  	 * KVM_HC_MAP_GPA_RANGE exit.
>  	 */
> -	for (idx = svm->sev_es.psc_idx + 1; idx <= idx_end; idx++) {
> +	while (++idx <= idx_end) {
>  		struct psc_entry entry = entries[idx];
>  		if (entry.operation != entry_start.operation ||
> -		    entry.gfn != entry_start.gfn + npages ||
> -		    !!entry.pagesize != svm->sev_es.psc_2m)
> +		    entry.gfn != gfn + npages ||
> +		    entry.cur_page ||
> +		    !!entry.pagesize != huge)
>  			break;
>  		svm->sev_es.psc_inflight++;
> -		npages += entry_start.pagesize ? 512 : 1;
> +		npages += huge ? 512 : 1;
>  	}
> -	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> -	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> -	vcpu->run->hypercall.args[0] = gpa;
> -	vcpu->run->hypercall.args[1] = npages;
> -	vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
> -				       ? KVM_MAP_GPA_RANGE_ENCRYPTED
> -				       : KVM_MAP_GPA_RANGE_DECRYPTED;
> -	vcpu->run->hypercall.args[2] |= entry_start.pagesize
> -					? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
> -					: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
> -	vcpu->arch.complete_userspace_io = snp_complete_psc;
> +	switch (entry_start.operation) {
> +	case VMGEXIT_PSC_OP_PRIVATE:
> +	case VMGEXIT_PSC_OP_SHARED:
> +		vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> +		vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> +		vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
> +		vcpu->run->hypercall.args[1] = npages;
> +		vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
> +			? KVM_MAP_GPA_RANGE_ENCRYPTED
> +			: KVM_MAP_GPA_RANGE_DECRYPTED;
> +		vcpu->run->hypercall.args[2] |= huge
> +			? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
> +			: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
> +		vcpu->arch.complete_userspace_io = snp_complete_one_psc;
> -	return 0; /* forward request to userspace */
> +		return 0; /* forward request to userspace */
> -out_resume:
> -	svm->sev_es.psc_idx = 0;
> -	svm->sev_es.psc_inflight = 0;
> -	svm->sev_es.psc_2m = false;
> -	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, psc_ret);
> +	default:
> +		/*
> +		 * Only shared/private PSC operations are currently supported, so if the
> +		 * entire range consists of unsupported operations (e.g. SMASH/UNSMASH),
> +		 * then consider the entire range completed and avoid exiting to
> +		 * userspace. In theory snp_complete_psc() can be called directly
> +		 * at this point to complete the current range and start the next one,
> +		 * but that could lead to unexpected levels of recursion.
> +		 */
> +		__snp_complete_one_psc(svm);
> +		goto next_range;
> +	}
> -	return 1; /* resume guest */
> +	unreachable();
>  }
>  static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  2024-05-01  8:52 ` [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST " Michael Roth
@ 2024-05-13 23:48   ` Sean Christopherson
  2024-05-14  2:51     ` Michael Roth
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2024-05-13 23:48 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Wed, May 01, 2024, Michael Roth wrote:
> Version 2 of GHCB specification added support for the SNP Extended Guest
> Request Message NAE event. This event serves a nearly identical purpose
> to the previously-added SNP_GUEST_REQUEST event, but allows for
> additional certificate data to be supplied via an additional
> guest-supplied buffer to be used mainly for verifying the signature of
> an attestation report as returned by firmware.
> 
> This certificate data is supplied by userspace, so unlike with
> SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
> the firmware request is made after the certificate data has been fetched
> from userspace.
> 
> Since there is a potential for race conditions where the
> userspace-supplied certificate data may be out-of-sync relative to the
> reported TCB or VLEK that firmware will use when signing attestation
> reports, a hook is also provided so that userspace can be informed once
> the attestation request is actually completed. See the updates to
> Documentation/ for more details on these aspects.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.h         |  3 ++
>  include/uapi/linux/kvm.h       | 23 +++++++++
>  4 files changed, 199 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f3780ac98d56 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
>  primary storage for certain register types. Therefore, the kernel may use the
>  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>  
> +::
> +
> +		/* KVM_EXIT_VMGEXIT */
> +		struct kvm_user_vmgexit {

LOL, it looks dumb, but maybe kvm_vmgexit_exit to avoid confusion about whether
the struct refers to host userspace vs. guest userspace?

Actually, I vote to punt on naming until more exits need to be kicked to userspace,
and just do (see below for details on how I got here):

		/* KVM_EXIT_VMGEXIT */
		struct {
			__u64 exit_code;
			union {
				struct {
					__u64 data_gpa;
					__u64 data_npages;
					__u64 ret;
				} req_certs;
			};
		} vmgexit;

> +  #define KVM_USER_VMGEXIT_REQ_CERTS		1
> +			__u32 type; /* KVM_USER_VMGEXIT_* type */

Regardless of whether or not requesting a certificate is vendor specific enough
to justify its own exit reason, I don't think KVM should have a #VMGEXIT that
adds its own layer.  Structuring the user exit this way will make it weird and/or
difficult to handle #VMGEXITs that _do_ fit a generic pattern, e.g. a user might
wonder why PSC #VMGEXITs don't show up here.

And defining an exit reason that is, for all intents and purposes, a regurgitation
of the raw #VMGEXIT reason, but with a different value, is also confusing.  E.g.
it wouldn't be unreasonable for a reader to expect that "type" matches the value
defined in the GHCB (or wherever the values are defined).

Ah, you copied what KVM does for Hyper-V and Xen emulation.  Hrm.  But only
partially.

Assuming it's impractical to have a generic user exit for this, and we think
there is a high likelihood of needing to punt more #VMGEXITs to userspace, then
we should more closely (perhaps even exactly) follow the Hyper-V and Xen models.
I.e. for all values and whanot that are controlled/defined by a third party
(Hyper-V, Xen, the GHCB, etc.) #define those values in a header that is clearly
"owned" by the third party.

E.g. IIRC, include/xen/interface/xen.h is copied verbatim from Xen documentation
(source?).  And include/asm-generic/hyperv-tlfs.h is the kernel's copy of the
TLFS, which dictates all of the Hyper-V hypercalls.

If we do that, then my concerns/objections largely go away, e.g. KVM isn't
defining magic values, there's less chance for confusion about what "type" holds,
etc.

Oh, and if we go that route, the sizes for all fields should follow the GHCB,
e.g. I believe the "type" should be a __u64.

> +			union {
> +				struct {
> +					__u64 data_gpa;
> +					__u64 data_npages;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)

Hopefully it won't matter, but are BUSY and GENERIC actually defined somewhere?
I don't see them in GHCB 2.0.

In a perfect world, it would be nice for KVM to not have to care about the error
codes.  But KVM disallows KVM_{G,S}ET_REGS for guest with protected state, which
means it's not feasible for userspace to set registers, at least not in any sane
way.

Heh, we could abuse KVM_SYNC_X86_REGS to let userspace specify RBX, but (a) that's
gross, and (b) KVM_SYNC_X86_REGS and KVM_SYNC_X86_SREGS really ought to be rejected
if guest state is protected.

> +					__u32 ret;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)

This has no business being buried in a VMGEXIT_REQ_CERTS flags.  Notifying
userspace that KVM completed its portion of a userspace exit is completely generic.

And aside from where the notification flag lives, _if_ we add a notification
mechanism, it belongs in a separate patch, because it's purely a performance
optimization.  Userspace can use immediate_exit to force KVM to re-exit after
completing an exit.

Actually, I take that back, this isn't even an optimization, it's literally a
non-generic implementation of kvm_run.immediate_exit.

If this were an optimization, i.e. KVM truly notified userspace without exiting,
then it would need to be a lot more robust, e.g. to ensure userspace actually
received the notification before KVM moved on.

> +					__u8 flags;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1

This is also a weird reimplementation of generic functionality.  KVM nullifies
vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
again.  In other words, literally doing nothing will get you what you want :-)
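
E.g. (sketch):

	static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
	{
		/* ... issue the firmware request ... */

		/*
		 * KVM cleared complete_userspace_io before invoking this
		 * callback, so setting it again here simply queues another
		 * userspace exit on the next KVM_RUN.
		 */
		vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
		return 0;
	}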

> +					__u8 status;
> +				} req_certs;
> +			};
> +		};

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  2024-05-13 23:48   ` Sean Christopherson
@ 2024-05-14  2:51     ` Michael Roth
  2024-05-14 14:36       ` Sean Christopherson
  0 siblings, 1 reply; 49+ messages in thread
From: Michael Roth @ 2024-05-14  2:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Mon, May 13, 2024 at 04:48:25PM -0700, Sean Christopherson wrote:
> On Wed, May 01, 2024, Michael Roth wrote:
> > Version 2 of GHCB specification added support for the SNP Extended Guest
> > Request Message NAE event. This event serves a nearly identical purpose
> > to the previously-added SNP_GUEST_REQUEST event, but allows for
> > additional certificate data to be supplied via an additional
> > guest-supplied buffer to be used mainly for verifying the signature of
> > an attestation report as returned by firmware.
> > 
> > This certificate data is supplied by userspace, so unlike with
> > SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> > forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
> > the firmware request is made after the certificate data has been fetched
> > from userspace.
> > 
> > Since there is a potential for race conditions where the
> > userspace-supplied certificate data may be out-of-sync relative to the
> > reported TCB or VLEK that firmware will use when signing attestation
> > reports, a hook is also provided so that userspace can be informed once
> > the attestation request is actually completed. See the updates to
> > Documentation/ for more details on these aspects.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.h         |  3 ++
> >  include/uapi/linux/kvm.h       | 23 +++++++++
> >  4 files changed, 199 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f0b76ff5030d..f3780ac98d56 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> >  primary storage for certain register types. Therefore, the kernel may use the
> >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >  
> > +::
> > +
> > +		/* KVM_EXIT_VMGEXIT */
> > +		struct kvm_user_vmgexit {
> 
> LOL, it looks dumb, but maybe kvm_vmgexit_exit to avoid confusion about whether
> the struct refers to host userspace vs. guest userspace?
> 
> Actually, I vote to punt on naming until more exits need to be kicked to userspace,
> and just do (see below for details on how I got here):
> 
> 		/* KVM_EXIT_VMGEXIT */
> 		struct {
> 			__u64 exit_code;
> 			union {
> 				struct {
> 					__u64 data_gpa;
> 					__u64 data_npages;
> 					__u64 ret;
> 				} req_certs;
> 			};
> 		} vmgexit;
> 
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS		1
> > +			__u32 type; /* KVM_USER_VMGEXIT_* type */
> 
> Regardless of whether or not requesting a certificate is vendor specific enough
> to justify its own exit reason, I don't think KVM should have a #VMGEXIT that
> adds its own layer.  Structuring the user exit this way will make it weird and/or
> difficult to handle #VMGEXITs that _do_ fit a generic pattern, e.g. a user might
> wonder why PSC #VMGEXITs don't show up here.
> 
> And defining an exit reason that is, for all intents and purposes, a regurgitation
> of the raw #VMGEXIT reason, but with a different value, is also confusing.  E.g.
> it wouldn't be unreasonable for a reader to expect that "type" matches the value
> defined in the GHCB (or wherever the values are defined).

The type in this case is actually "extended guest request". You'd rightly
pointed out that that is miles away from describing what KVM wants
userspace to do, so I named it "request certificate". And now with PSC being
handled as a separate KVM_HC_MAP_GPA_RANGE event with no exposure of GHCB/etc
to userspace, it made further sense to not lean too heavily on the GHCB for
defining the types.

But continuing to name it KVM_EXIT_VMGEXIT sort of goes against that
decoupling, so I can see some potential for confusion there. KVM_EXIT_SNP is
probably a better generic name for what this exit is meant to cover. But I'm
not aware of anything specific that would require extending this in the
near term, though maybe there's some potential with live migration. So either
renaming to something more generic and less specific to VMGEXIT/GHCB, like
KVM_EXIT_SNP, or to something more specific like KVM_EXIT_SNP_REQ_CERTS,
seems warranted, but I don't think moving to something more coupled to
VMGEXIT/GHCB would provide much benefit long-term.

> 
> Ah, you copied what KVM does for Hyper-V and Xen emulation.  Hrm.  But only
> partially.
> 
> Assuming it's impractical to have a generic user exit for this, and we think
> there is a high likelihood of needing to punt more #VMGEXITs to userspace, then
> we should more closely (perhaps even exactly) follow the Hyper-V and Xen models.
> I.e. for all values and whatnot that are controlled/defined by a third party
> (Hyper-V, Xen, the GHCB, etc.) #define those values in a header that is clearly
> "owned" by the third party.
> 
> E.g. IIRC, include/xen/interface/xen.h is copied verbatim from Xen documentation
> (source?).  And include/asm-generic/hyperv-tlfs.h is the kernel's copy of the
> TLFS, which dictates all of the Hyper-V hypercalls.
> 
> If we do that, then my concerns/objections largely go away, e.g. KVM isn't
> defining magic values, there's less chance for confusion about what "type" holds,
> etc.
> 
> Oh, and if we go that route, the sizes for all fields should follow the GHCB,
> e.g. I believe the "type" should be a __u64.
> 
> > +			union {
> > +				struct {
> > +					__u64 data_gpa;
> > +					__u64 data_npages;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
> 
> Hopefully it won't matter, but are BUSY and GENERIC actually defined somewhere?
> I don't see them in GHCB 2.0.

BUSY is defined in 4.1.7:

  It is not expected that a guest would issue many Guest Request NAE
  events. However, access to the SNP firmware is a sequential and
  synchronous operation. To avoid the possibility of a guest creating a
  denial-of-service attack against the SNP firmware, it is recommended
  that some form of rate limiting be implemented should it be detected
  that a high number of Guest Request NAE events are being issued. To
  allow for this, the hypervisor may set the SW_EXITINFO2 field to
  0x0000000200000000, which will inform the guest to retry the request

INVALID_LEN in 4.1.8.1:

  The hypervisor must validate that the guest has supplied enough pages
  to hold the certificates that will be returned before performing the SNP
  guest request. If there are not enough guest pages to hold the certificate
  table and certificate data, the hypervisor will return the required number
  of pages needed to hold the certificate table and certificate data in the
  RBX register and set the SW_EXITINFO2 field to 0x0000000100000000.

and GENERIC was chosen to provide a non-zero error code that doesn't
conflict with the above (or future) GHCB-defined values. But KVM isn't
trying to expose the actual GHCB details, like how these values are to be
placed in the upper 32 bits of SW_EXITINFO2; it just re-uses the values to
avoid purposefully obfuscating the GHCB return codes they relate to.
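
FWIW, the resulting SW_EXITINFO2 encoding is just (sketch based on the
sev-common.h definitions in this series, so the exact names may still
shift):

  /* VMM error code in bits 63:32, firmware error code in bits 31:0 */
  #define SNP_GUEST_VMM_ERR_SHIFT		32
  #define SNP_GUEST_VMM_ERR(x)		(((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
  #define SNP_GUEST_ERR(vmm_err, fw_err)	(SNP_GUEST_VMM_ERR(vmm_err) | (fw_err))

  #define SNP_GUEST_VMM_ERR_INVALID_LEN	1	/* GHCB 4.1.8.1 */
  #define SNP_GUEST_VMM_ERR_BUSY		2	/* GHCB 4.1.7 */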

> 
> In a perfect world, it would be nice for KVM to not have to care about the error
> codes.  But KVM disallows KVM_{G,S}ET_REGS for guest with protected state, which
> means it's not feasible for userspace to set registers, at least not in any sane
> way.
> 
> Heh, we could abuse KVM_SYNC_X86_REGS to let userspace specify RBX, but (a) that's
> gross, and (b) KVM_SYNC_X86_REGS and KVM_SYNC_X86_SREGS really ought to be rejected
> if guest state is protected.
> 
> > +					__u32 ret;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
> 
> This has no business being buried in a VMGEXIT_REQ_CERTS flags.  Notifying
> userspace that KVM completed its portion of a userspace exit is completely generic.
> 
> And aside from where the notification flag lives, _if_ we add a notification
> mechanism, it belongs in a separate patch, because it's purely a performance
> optimization.  Userspace can use immediate_exit to force KVM to re-exit after
> completing an exit.
> 
> Actually, I take that back, this isn't even an optimization, it's literally a
> non-generic implementation of kvm_run.immediate_exit.

Relying on a generic -EINTR response resulting from kvm_run.immediate_exit
doesn't seem like a very robust way to ensure the attestation request
was made to firmware. It seems fully possible that future code changes
could result in -EINTR being returned for other reasons. So how do you
reliably detect that the -EINTR is a result of immediate_exit being honored
after the attestation request has been made to firmware? We could squirrel something
away in struct kvm_run to probe for, but delivering another
KVM_EXIT_SNP_REQ_CERT with an extra flag set seems to be reasonably
userspace-friendly.

> 
> If this were an optimization, i.e. KVM truly notified userspace without exiting,
> then it would need to be a lot more robust, e.g. to ensure userspace actually
> received the notification before KVM moved on.

Right, this does rely on an actual exit to userspace, not userspace polling
for flags or anything along those lines.

> 
> > +					__u8 flags;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
> 
> This is also a weird reimplementation of generic functionality.  KVM nullifies
> vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
> needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
> again.  In other words, literally doing nothing will get you what you want :-)

We could just have the completion callback set complete_userspace_io
again, but then you'd always get 2 userspace exit events per attestation
request. There could be some userspaces that don't implement the
file-locking scheme, in which case they wouldn't need the 2nd notification.
That's why the KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag is provided
as an opt-in.

The pending/done status bits are so userspace can distinguish between the
start of a certificate request and the completion side of it, after it has
been bound to a completed attestation request and the file lock can be
released.
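
For reference, the VMM-side flow I have in mind looks roughly like this
(hypothetical sketch; read_certs_into_guest_buffer() is a made-up helper
and flock() is just one way to do the synchronization):

  /* KVM_EXIT_VMGEXIT, req_certs.status == STATUS_PENDING */
  int fd = open(certs_path, O_RDONLY);

  flock(fd, LOCK_SH);			/* block out TCB/cert updaters */
  read_certs_into_guest_buffer(fd, run);
  run->vmgexit.req_certs.flags |= KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE;
  ioctl(vcpu_fd, KVM_RUN, 0);		/* KVM issues the firmware request */

  /* next exit: req_certs.status == STATUS_DONE, certs are bound */
  flock(fd, LOCK_UN);
  close(fd);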

Thanks,

Mike

> 
> > +					__u8 status;
> > +				} req_certs;
> > +			};
> > +		};

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults
  2024-05-10 16:59           ` Paolo Bonzini
  2024-05-10 17:25             ` Paolo Bonzini
@ 2024-05-14  8:10             ` Borislav Petkov
  1 sibling, 0 replies; 49+ messages in thread
From: Borislav Petkov @ 2024-05-14  8:10 UTC (permalink / raw)
  To: Paolo Bonzini, Michael Roth
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	papaluri

On May 10, 2024 6:59:37 PM GMT+02:00, Paolo Bonzini <pbonzini@redhat.com> wrote:
>Well, the merge window starts next sunday, doesn't it?  If there's an -rc8 I agree there's some leeway, but that is not too likely.

Nah, the merge window just opened yesterday. 

-- 
Sent from a small device: formatting sucks and brevity is inevitable.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  2024-05-14  2:51     ` Michael Roth
@ 2024-05-14 14:36       ` Sean Christopherson
  2024-05-15  1:25         ` [PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS Michael Roth
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2024-05-14 14:36 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Mon, May 13, 2024, Michael Roth wrote:
> On Mon, May 13, 2024 at 04:48:25PM -0700, Sean Christopherson wrote:
> > Actually, I take that back, this isn't even an optimization, it's literally a
> > non-generic implementation of kvm_run.immediate_exit.
> 
> Relying on a generic -EINTR response resulting from kvm_run.immediate_exit
> doesn't seem like a very robust way to ensure the attestation request
> was made to firmware. It seems fully possible that future code changes
> could result in -EINTR being returned for other reasons. So how do you
> reliably detect that the -EINTR is a result of immediate_exit being honored
> after the attestation request has been made to firmware? We could squirrel something
> away in struct kvm_run to probe for, but delivering another
> KVM_EXIT_SNP_REQ_CERT with an extra flag set seems to be reasonably
> userspace-friendly.

And unnecessarily specific to a single exit.  But it's a non-issue (except
possibly on ARM).

I doubt it's formally documented anywhere, but userspace absolutely relies on
kvm_run.immediate_exit to be processed _after_ complete_userspace_io().  If KVM
exits with -EINTR before invoking cui(), live migration will break due to taking
a snapshot of vCPU state in the middle of an instruction.

Given that userspace has likely built up rigid expectations for immediate_exit,
I don't see any problem formally documenting KVM's behavior, i.e. signing a
contract guaranteeing that KVM will complete the "back half" of emulation if
immediate_exit is set and KVM_RUN returns -EINTR.
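
I.e. userspace can simply do (completely untested sketch; fill_certs() is a
placeholder):

	flock(certs_fd, LOCK_SH);
	fill_certs(run);
	run->immediate_exit = 1;

	ret = ioctl(vcpu_fd, KVM_RUN, 0);	/* cui() runs, then -1/EINTR */

	run->immediate_exit = 0;
	flock(certs_fd, LOCK_UN);		/* firmware request has been issued */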

ARM is the only arch that is at all suspect, due to its rather massive
kvm_arch_vcpu_run_pid_change() hook.  At a quick glance, it seems to be ok, too.
And if it's not, we need to fix that asap, because it's like a bug waiting to
happen.

> > If this were an optimization, i.e. KVM truly notified userspace without exiting,
> > then it would need to be a lot more robust, e.g. to ensure userspace actually
> > received the notification before KVM moved on.
> 
> Right, this does rely on an actual exit to userspace, not userspace polling
> for flags or anything along those lines.
> 
> > 
> > > +					__u8 flags;
> > > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> > > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
> > 
> > This is also a weird reimplementation of generic functionality.  KVM nullifies
> > vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
> > needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
> > again.  In other words, literally doing nothing will get you what you want :-)
> 
> We could just have the completion callback set complete_userspace_io
> again, but then you'd always get 2 userspace exit events per attestation
> request. There could be some userspaces that don't implement the
> file-locking scheme, in which case they wouldn't need the 2nd notification.

Then they don't set immediate_exit.

> That's why the KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag is provided
> as an opt-in.
> 
> The pending/done status bits are so userspace can distinguish between the
> start of a certificate request and the completion side of it, after it has
> been bound to a completed attestation request and the file lock can be
> released.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS
  2024-05-14 14:36       ` Sean Christopherson
@ 2024-05-15  1:25         ` Michael Roth
  0 siblings, 0 replies; 49+ messages in thread
From: Michael Roth @ 2024-05-15  1:25 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, papaluri

It's not clear whether SNP guests will need any other SNP-specific userspace
exit types in the future, and if they do, it's not clear whether they would
be related to VMGEXIT events or to something else entirely.

So, rather than trying to anticipate future use-cases and have a single
union structure to manage the associated parameters, just use a common
KVM_EXIT_SNP_* prefix, but otherwise treat these as separate events, and
go ahead and convert the only VMGEXIT type currently defined,
KVM_USER_VMGEXIT_REQ_CERTS, over to KVM_EXIT_SNP_REQ_CERTS.

Also, formally document that KVM is guaranteed to process pending userspace
IO completion callbacks before returning to userspace with -EINTR when
kvm_run->immediate_exit is set, and allow userspace to use this as a means
to know when an attestation request is complete and the KVM_EXIT_SNP_REQ_CERTS
event is fully finished, at which point it can handle any certificate
synchronization cleanup like releasing file locks/etc.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Hi Sean,

Here's an attempt to address your concerns regarding
KVM_EXIT_VMGEXIT/KVM_USER_VMGEXIT_REQ_CERTS. Please let me know what
you think.

The main gist of it is that it leverages kvm_run->immediate_exit rather than
a flag in the kvm_run exit struct to receive notification when the
attestation has been completed and the certificate is no longer needed.

Additionally it drops the confusing KVM_EXIT_VMGEXIT naming in favor of
the KVM_EXIT_SNP_* prefix, and rather than introduce infrastructure and
a union to handle other SNP-specific types in the future, it simply
defines KVM_EXIT_SNP_REQ_CERTS as a one-off event so we are free to
handle future SNP-specific/SNP-related userspace exits however makes
sense for future cases.

Thanks,

Mike

 Documentation/virt/kvm/api.rst | 64 +++++++++++-----------------------
 arch/x86/kvm/svm/sev.c         | 28 +++------------
 include/uapi/linux/kvm.h       | 33 ++++++----------------
 3 files changed, 37 insertions(+), 88 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6ab8b5b7c64e..ea05c16f3438 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6381,6 +6381,11 @@ to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
 Rather than blocking the signal outside KVM_RUN, userspace can set up
 a signal handler that sets run->immediate_exit to a non-zero value.
 
+Also note that any KVM_EXIT_* events that have associated completion
+callbacks that KVM needs to process when KVM_RUN is called will be
+processed *before* exiting again to userspace with -EINTR as a result
+of run->immediate_exit.
+
 This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
 
 ::
@@ -7069,50 +7074,24 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
 ::
 
-		/* KVM_EXIT_VMGEXIT */
-		struct kvm_user_vmgexit {
-  #define KVM_USER_VMGEXIT_REQ_CERTS		1
-			__u32 type; /* KVM_USER_VMGEXIT_* type */
-			union {
-				struct {
-					__u64 data_gpa;
-					__u64 data_npages;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
-					__u32 ret;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
-					__u8 flags;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
-  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
-					__u8 status;
-				} req_certs;
-			};
-		};
-
-
-If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest
-has issued a VMGEXIT instruction (as documented by the AMD Architecture
-Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
-userspace. These are generally handled by the host kernel, but in some
-cases some aspects of handling a VMGEXIT are done in userspace.
-
-A kvm_user_vmgexit structure is defined to encapsulate the data to be
-sent to or returned by userspace. The type field defines the specific type
-of exit that needs to be serviced, and that type is used as a discriminator
-to determine which union type should be used for input/output.
-
-KVM_USER_VMGEXIT_REQ_CERTS
---------------------------
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+		} snp_req_certs;
 
 When an SEV-SNP issues a guest request for an attestation report, it has the
 option of issuing it in the form an *extended* guest request when a
 certificate blob is returned alongside the attestation report so the guest
 can validate the endorsement key used by SNP firmware to sign the report.
 These certificates are managed by userspace and are requested via
-KVM_EXIT_VMGEXITs using the KVM_USER_VMGEXIT_REQ_CERTS type.
+KVM_EXIT_SNP_REQ_CERTS exits.
 
-For the KVM_USER_VMGEXIT_REQ_CERTS type, the req_certs union type
+For KVM_EXIT_SNP_REQ_CERTS exits, the snp_req_certs struct
 is used. The kernel will supply in 'data_gpa' the value the guest supplies
 via the RAX field of the GHCB when issuing extended guest requests.
 'data_npages' will similarly contain the value the guest supplies in RBX
@@ -7139,12 +7118,11 @@ this is for the VMM to obtain a shared or exclusive lock on the path the
 certificate blob file resides at before reading it and returning it to KVM,
 and that it continues to hold the lock until the attestation request is
 actually sent to firmware. To facilitate this, the VMM can set the
-KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag before returning the
-certificate blob, in which case another KVM_EXIT_VMGEXIT of type
-KVM_USER_VMGEXIT_REQ_CERTS will be sent to userspace with
-KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE being set in the status field to
-indicate the request is fully-completed and that any associated locks can be
-released.
+run->immediate_exit flag before returning the certificate blob, in which
+case KVM is guaranteed to process all pending IO completion callbacks
+before exiting to userspace with -EINTR. At this point userspace
+can release any locks it may have taken when the KVM_EXIT_SNP_REQ_CERTS exit
+was originally received.
 
 Tools/libraries that perform updates to SNP firmware TCB values or endorsement
 keys (e.g. firmware interfaces such as SNP_COMMIT, SNP_SET_CONFIG, or
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6cf665c410b2..e6318bbd8a6a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4006,21 +4006,14 @@ static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
 	sev_ret_code fw_err = 0;
 	int vmm_ret;
 
-	vmm_ret = vcpu->run->vmgexit.req_certs.ret;
+	vmm_ret = vcpu->run->snp_req_certs.ret;
 	if (vmm_ret) {
 		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
 			vcpu->arch.regs[VCPU_REGS_RBX] =
-				vcpu->run->vmgexit.req_certs.data_npages;
+				vcpu->run->snp_req_certs.data_npages;
 		goto out;
 	}
 
-	/*
-	 * The request was completed on the previous completion callback and
-	 * this completion is only for the STATUS_DONE userspace notification.
-	 */
-	if (vcpu->run->vmgexit.req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE)
-		goto out_resume;
-
 	control = &svm->vmcb->control;
 
 	if (__snp_handle_guest_req(kvm, control->exit_info_1,
@@ -4029,14 +4022,6 @@ static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
 
 out:
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
-
-	if (vcpu->run->vmgexit.req_certs.flags & KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE) {
-		vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE;
-		vcpu->run->vmgexit.req_certs.flags = 0;
-		return 0; /* notify userspace of completion */
-	}
-
-out_resume:
 	return 1; /* resume guest */
 }
 
@@ -4060,12 +4045,9 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
 	 * Grab the certificates from userspace so that can be bundled with
 	 * attestation/guest requests.
 	 */
-	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
-	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
-	vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
-	vcpu->run->vmgexit.req_certs.data_npages = data_npages;
-	vcpu->run->vmgexit.req_certs.flags = 0;
-	vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
+	vcpu->run->exit_reason = KVM_EXIT_SNP_REQ_CERTS;
+	vcpu->run->snp_req_certs.data_gpa = data_gpa;
+	vcpu->run->snp_req_certs.data_npages = data_npages;
 	vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
 
 	return 0; /* forward request to userspace */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 106367d87189..8ebfc91dc967 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,22 +135,3 @@ struct kvm_xen_exit {
 	} u;
 };
 
-struct kvm_user_vmgexit {
-#define KVM_USER_VMGEXIT_REQ_CERTS		1
-	__u32 type; /* KVM_USER_VMGEXIT_* type */
-	union {
-		struct {
-			__u64 data_gpa;
-			__u64 data_npages;
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
-			__u32 ret;
-#define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
-			__u8 flags;
-#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
-#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
-			__u8 status;
-		} req_certs;
-	};
-};
@@ -198,7 +193,7 @@ struct kvm_user_vmgexit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
-#define KVM_EXIT_VMGEXIT          40
+#define KVM_EXIT_SNP_REQ_CERTS    40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -454,8 +434,16 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
-		/* KVM_EXIT_VMGEXIT */
-		struct kvm_user_vmgexit vmgexit;
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+		} snp_req_certs;
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1



* Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
  2024-05-01  8:51 ` [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2024-05-16  8:28   ` Binbin Wu
  2024-05-16 17:23     ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Binbin Wu @ 2024-05-16  8:28 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, Yamahata, Isaku



On 5/1/2024 4:51 PM, Michael Roth wrote:
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change MSR protocol
> as defined in the GHCB specification.
>
> When using gmem, private/shared memory is allocated through separate
> pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> KVM ioctl to tell the KVM MMU whether a particular GFN should be
> backed by private memory or not.
>
> Forward these page state change requests to userspace so that it can
> issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> entries when it is ready to map a private page into a guest.
>
> Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> requests to userspace via KVM_EXIT_HYPERCALL.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   arch/x86/include/asm/sev-common.h |  6 ++++
>   arch/x86/kvm/svm/sev.c            | 48 +++++++++++++++++++++++++++++++
>   2 files changed, 54 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 1006bfffe07a..6d68db812de1 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -101,11 +101,17 @@ enum psc_op {
>   	/* GHCBData[11:0] */				\
>   	GHCB_MSR_PSC_REQ)
>   
> +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> +
>   #define GHCB_MSR_PSC_RESP		0x015
>   #define GHCB_MSR_PSC_RESP_VAL(val)			\
>   	/* GHCBData[63:32] */				\
>   	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
>   
> +/* Set highest bit as a generic error response */
> +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> +
>   /* GHCB Hypervisor Feature Request/Response */
>   #define GHCB_MSR_HV_FT_REQ		0x080
>   #define GHCB_MSR_HV_FT_RESP		0x081
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index e1ac5af4cb74..720775c9d0b8 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>   	svm->vmcb->control.ghcb_gpa = value;
>   }
>   
> +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (vcpu->run->hypercall.ret)

Do we have a definition of ret? I didn't find clear documentation for it.
According to the code, 0 means success. Are there any other error codes
that need to be, or can be, interpreted?

For TDX, we may also want to use the KVM_HC_MAP_GPA_RANGE hypercall to
exit to userspace via KVM_EXIT_HYPERCALL.


> +		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
> +	else
> +		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
> +
> +	return 1; /* resume guest */
> +}
>
[...]


* Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
  2024-05-16  8:28   ` Binbin Wu
@ 2024-05-16 17:23     ` Paolo Bonzini
  0 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2024-05-16 17:23 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
	peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
	vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
	alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, Yamahata, Isaku

On Thu, May 16, 2024 at 10:29 AM Binbin Wu <binbin.wu@linux.intel.com> wrote:
>
>
>
> On 5/1/2024 4:51 PM, Michael Roth wrote:
> > SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> > table to be private or shared using the Page State Change MSR protocol
> > as defined in the GHCB specification.
> >
> > When using gmem, private/shared memory is allocated through separate
> > pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> > KVM ioctl to tell the KVM MMU whether a particular GFN should be
> > backed by private memory or not.
> >
> > Forward these page state change requests to userspace so that it can
> > issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> > entries when it is ready to map a private page into a guest.
> >
> > Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> > requests to userspace via KVM_EXIT_HYPERCALL.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   arch/x86/include/asm/sev-common.h |  6 ++++
> >   arch/x86/kvm/svm/sev.c            | 48 +++++++++++++++++++++++++++++++
> >   2 files changed, 54 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index 1006bfffe07a..6d68db812de1 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -101,11 +101,17 @@ enum psc_op {
> >       /* GHCBData[11:0] */                            \
> >       GHCB_MSR_PSC_REQ)
> >
> > +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> > +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> > +
> >   #define GHCB_MSR_PSC_RESP           0x015
> >   #define GHCB_MSR_PSC_RESP_VAL(val)                  \
> >       /* GHCBData[63:32] */                           \
> >       (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> >
> > +/* Set highest bit as a generic error response */
> > +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> > +
> >   /* GHCB Hypervisor Feature Request/Response */
> >   #define GHCB_MSR_HV_FT_REQ          0x080
> >   #define GHCB_MSR_HV_FT_RESP         0x081
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index e1ac5af4cb74..720775c9d0b8 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> >       svm->vmcb->control.ghcb_gpa = value;
> >   }
> >
> > +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> > +{
> > +     struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > +     if (vcpu->run->hypercall.ret)
>
> Do we have a definition of ret? I didn't find clear documentation for it.
> According to the code, 0 means success. Are there any other error codes
> that need to be, or can be, interpreted?

They are defined in include/uapi/linux/kvm_para.h

#define KVM_ENOSYS        1000
#define KVM_EFAULT        EFAULT /* 14 */
#define KVM_EINVAL        EINVAL /* 22 */
#define KVM_E2BIG         E2BIG  /* 7 */
#define KVM_EPERM         EPERM  /* 1 */
#define KVM_EOPNOTSUPP    95

Linux, however, does not expect the hypercall to fail for SEV/SEV-ES, and
it will terminate the guest if the PSC operation fails for SEV-SNP. So
it's best for userspace if the hypercall always succeeds. :)
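
In VMM terms a handler along these lines should be sufficient (a sketch
only, using the existing KVM_MAP_GPA_RANGE_* flags from
include/uapi/linux/kvm_para.h and the KVM_SET_MEMORY_ATTRIBUTES ioctl;
handle_map_gpa_range() is a hypothetical name and error handling is
elided):

#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>

static void handle_map_gpa_range(int vm_fd, struct kvm_run *run)
{
	struct kvm_memory_attributes attrs = {
		/*
		 * args[0] = GPA, args[1] = number of pages; 4KiB pages
		 * assumed here, though args[2] also carries 2M/1G bits.
		 */
		.address    = run->hypercall.args[0],
		.size       = run->hypercall.args[1] * 4096,
		.attributes = (run->hypercall.args[2] & KVM_MAP_GPA_RANGE_ENCRYPTED) ?
			      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
	};

	/*
	 * Flip the range private<->shared; the KVM MMU updates the RMP
	 * entries when it next maps the pages into the guest.
	 */
	ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);

	/* Per the above, always report success back to the guest. */
	run->hypercall.ret = 0;
}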

> For TDX, we may also want to use the KVM_HC_MAP_GPA_RANGE hypercall to
> exit to userspace via KVM_EXIT_HYPERCALL.

Yes, definitely.

Paolo



end of thread, other threads:[~2024-05-16 17:23 UTC | newest]

Thread overview: 49+ messages
2024-05-01  8:51 [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2024-05-01  8:51 ` [PATCH v15 01/20] Revert "KVM: x86: Add gmem hook for determining max NPT mapping level" Michael Roth
2024-05-01  8:51 ` [PATCH v15 02/20] KVM: x86: Add hook for determining max NPT mapping level Michael Roth
2024-05-02 23:11   ` Isaku Yamahata
2024-05-07 17:48   ` Paolo Bonzini
2024-05-01  8:51 ` [PATCH v15 03/20] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y Michael Roth
2024-05-01  8:51 ` [PATCH v15 04/20] KVM: SEV: Add initial SEV-SNP support Michael Roth
2024-05-01  8:51 ` [PATCH v15 05/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
2024-05-01  8:51 ` [PATCH v15 06/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
2024-05-01  8:51 ` [PATCH v15 07/20] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
2024-05-01  8:51 ` [PATCH v15 08/20] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
2024-05-01  8:51 ` [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
2024-05-16  8:28   ` Binbin Wu
2024-05-16 17:23     ` Paolo Bonzini
2024-05-01  8:52 ` [PATCH v15 10/20] KVM: SEV: Add support to handle " Michael Roth
2024-05-01  8:52 ` [PATCH v15 11/20] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
2024-05-01  8:52 ` [PATCH v15 12/20] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
2024-05-01  8:52 ` [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
2024-05-01  8:52 ` [PATCH v15 14/20] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
2024-05-01  8:52 ` [PATCH v15 15/20] KVM: x86: Implement hook for determining max NPT mapping level Michael Roth
2024-05-01  8:52 ` [PATCH v15 16/20] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
2024-05-01  8:52 ` [PATCH v15 17/20] KVM: SVM: Add module parameter to enable SEV-SNP Michael Roth
2024-05-01  8:52 ` [PATCH v15 18/20] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
2024-05-01  8:52 ` [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST " Michael Roth
2024-05-13 23:48   ` Sean Christopherson
2024-05-14  2:51     ` Michael Roth
2024-05-14 14:36       ` Sean Christopherson
2024-05-15  1:25         ` [PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS Michael Roth
2024-05-01  8:52 ` [PATCH v15 20/20] crypto: ccp: Add the SNP_VLEK_LOAD command Michael Roth
2024-05-07 18:04 ` [PATCH v15 00/20] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Paolo Bonzini
2024-05-07 18:14   ` Michael Roth
2024-05-10  2:34     ` Michael Roth
2024-05-10  1:58 ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Michael Roth
2024-05-10  1:58   ` [PATCH v15 22/23] KVM: SEV: Fix return code interpretation for RMP nested page faults Michael Roth
2024-05-10 13:58     ` Sean Christopherson
2024-05-10 15:36       ` Michael Roth
2024-05-10 16:01       ` Paolo Bonzini
2024-05-10 16:37         ` Michael Roth
2024-05-10 16:59           ` Paolo Bonzini
2024-05-10 17:25             ` Paolo Bonzini
2024-05-14  8:10             ` Borislav Petkov
2024-05-10  1:58   ` [PATCH v15 23/23] KVM: SEV: Fix PSC handling for SMASH/UNSMASH and partial update ops Michael Roth
2024-05-10 17:09     ` Paolo Bonzini
2024-05-10 19:08       ` Michael Roth
2024-05-10 13:47   ` [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots Sean Christopherson
2024-05-10 13:50     ` Paolo Bonzini
2024-05-10 15:27       ` Michael Roth
2024-05-10 15:59         ` Sean Christopherson
2024-05-10 17:47           ` Isaku Yamahata
