* [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
@ 2024-04-08 18:39 Paul M Stillwell Jr
  2024-05-02 22:08 ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Paul M Stillwell Jr @ 2024-04-08 18:39 UTC (permalink / raw)
  To: linux-pci; +Cc: Paul M Stillwell Jr, Nirmal Patel

Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
code to copy the _OSC flags from the root bridge to the host bridge for
each VMD device because the AER bits were not being set correctly, which
caused an AER interrupt storm with certain NVMe devices.

This works fine in bare metal environments, but causes problems when the
vmd driver runs in a hypervisor environment. In a hypervisor, all the
_OSC bits are 0 regardless of what the underlying hardware indicates.
This is a problem for VMD users because if VMD is enabled, the user
*always* wants hotplug support enabled. To solve this issue, the vmd
driver now always enables the hotplug bits in the host bridge structure
for each VMD device.

Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
---
 drivers/pci/controller/vmd.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..583b10bd5eb7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
 static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
 				       struct pci_host_bridge *vmd_bridge)
 {
-	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
-	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
+	/*
+	 * There is an issue when the vmd driver is running within a hypervisor
+	 * because all of the _OSC bits are 0 in that case. This disables
+	 * hotplug support, but users who enable VMD in their BIOS always want
+	 * hotplug support, so always enable it.
+	 */
+	vmd_bridge->native_pcie_hotplug = 1;
+	vmd_bridge->native_shpc_hotplug = 1;
 	vmd_bridge->native_aer = root_bridge->native_aer;
 	vmd_bridge->native_pme = root_bridge->native_pme;
 	vmd_bridge->native_ltr = root_bridge->native_ltr;
-- 
2.39.1



* Re: [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
  2024-04-08 18:39 [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags Paul M Stillwell Jr
@ 2024-05-02 22:08 ` Bjorn Helgaas
  2024-05-02 22:38   ` Paul M Stillwell Jr
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2024-05-02 22:08 UTC (permalink / raw)
  To: Paul M Stillwell Jr; +Cc: linux-pci, Nirmal Patel

On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> code to copy the _OSC flags from the root bridge to the host bridge for
> each VMD device because the AER bits were not being set correctly, which
> caused an AER interrupt storm with certain NVMe devices.
> 
> This works fine in bare metal environments, but causes problems when the
> vmd driver runs in a hypervisor environment. In a hypervisor, all the
> _OSC bits are 0 regardless of what the underlying hardware indicates.
> This is a problem for VMD users because if VMD is enabled, the user
> *always* wants hotplug support enabled. To solve this issue, the vmd
> driver now always enables the hotplug bits in the host bridge structure
> for each VMD device.
> 
> Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> ---
>  drivers/pci/controller/vmd.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 87b7856f375a..583b10bd5eb7 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  				       struct pci_host_bridge *vmd_bridge)
>  {
> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> +	/*
> +	 * There is an issue when the vmd driver is running within a hypervisor
> +	 * because all of the _OSC bits are 0 in that case. This disables
> +	 * hotplug support, but users who enable VMD in their BIOS always want
> +	 * hotplug support, so always enable it.
> +	 */
> +	vmd_bridge->native_pcie_hotplug = 1;
> +	vmd_bridge->native_shpc_hotplug = 1;

Deferred for now because I think we need to figure out how to set all
these bits the same, or at least with a better algorithm than "here's
what we want in this environment."

Extended discussion about this at
https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com

>  	vmd_bridge->native_aer = root_bridge->native_aer;
>  	vmd_bridge->native_pme = root_bridge->native_pme;
>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> -- 
> 2.39.1
> 


* Re: [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
  2024-05-02 22:08 ` Bjorn Helgaas
@ 2024-05-02 22:38   ` Paul M Stillwell Jr
  2024-05-02 22:56     ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Paul M Stillwell Jr @ 2024-05-02 22:38 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Nirmal Patel

On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
>> Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
>> code to copy the _OSC flags from the root bridge to the host bridge for
>> each VMD device because the AER bits were not being set correctly, which
>> caused an AER interrupt storm with certain NVMe devices.
>>
>> This works fine in bare metal environments, but causes problems when the
>> vmd driver runs in a hypervisor environment. In a hypervisor, all the
>> _OSC bits are 0 regardless of what the underlying hardware indicates.
>> This is a problem for VMD users because if VMD is enabled, the user
>> *always* wants hotplug support enabled. To solve this issue, the vmd
>> driver now always enables the hotplug bits in the host bridge structure
>> for each VMD device.
>>
>> Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
>> ---
>>   drivers/pci/controller/vmd.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 87b7856f375a..583b10bd5eb7 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>   				       struct pci_host_bridge *vmd_bridge)
>>   {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>> +	/*
>> +	 * There is an issue when the vmd driver is running within a hypervisor
>> +	 * because all of the _OSC bits are 0 in that case. This disables
>> +	 * hotplug support, but users who enable VMD in their BIOS always want
>> +	 * hotplug support, so always enable it.
>> +	 */
>> +	vmd_bridge->native_pcie_hotplug = 1;
>> +	vmd_bridge->native_shpc_hotplug = 1;
> 
> Deferred for now because I think we need to figure out how to set all
> these bits the same, or at least with a better algorithm than "here's
> what we want in this environment."
> 
> Extended discussion about this at
> https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> 

That's OK by me. I thought where we left it was that if we could find a 
solution to the Correctable Errors from the original issue, maybe we 
could revert 04b12ef163d1.

I'm not sure I would know if a patch that fixes the Correctable Errors 
comes in... We have a test case from before 04b12ef163d1 that we would 
like to run (BIOS has AER disabled and we hotplug a disk, which results 
in AER interrupts), so we would be curious whether the issues we saw 
before go away with a new patch for Correctable Errors.

Paul
>>   	vmd_bridge->native_aer = root_bridge->native_aer;
>>   	vmd_bridge->native_pme = root_bridge->native_pme;
>>   	vmd_bridge->native_ltr = root_bridge->native_ltr;
>> -- 
>> 2.39.1
>>
> 



* Re: [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
  2024-05-02 22:38   ` Paul M Stillwell Jr
@ 2024-05-02 22:56     ` Bjorn Helgaas
  0 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2024-05-02 22:56 UTC (permalink / raw)
  To: Paul M Stillwell Jr; +Cc: linux-pci, Nirmal Patel

On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> > > code to copy the _OSC flags from the root bridge to the host bridge for
> > > each VMD device because the AER bits were not being set correctly, which
> > > caused an AER interrupt storm with certain NVMe devices.
> > > 
> > > This works fine in bare metal environments, but causes problems when the
> > > vmd driver runs in a hypervisor environment. In a hypervisor, all the
> > > _OSC bits are 0 regardless of what the underlying hardware indicates.
> > > This is a problem for VMD users because if VMD is enabled, the user
> > > *always* wants hotplug support enabled. To solve this issue, the vmd
> > > driver now always enables the hotplug bits in the host bridge structure
> > > for each VMD device.
> > > 
> > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > > ---
> > >   drivers/pci/controller/vmd.c | 10 ++++++++--
> > >   1 file changed, 8 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index 87b7856f375a..583b10bd5eb7 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > >   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > >   				       struct pci_host_bridge *vmd_bridge)
> > >   {
> > > -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > > -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > > +	/*
> > > +	 * There is an issue when the vmd driver is running within a hypervisor
> > > +	 * because all of the _OSC bits are 0 in that case. This disables
> > > +	 * hotplug support, but users who enable VMD in their BIOS always want
> > > +	 * hotplug support, so always enable it.
> > > +	 */
> > > +	vmd_bridge->native_pcie_hotplug = 1;
> > > +	vmd_bridge->native_shpc_hotplug = 1;
> > 
> > Deferred for now because I think we need to figure out how to set all
> > these bits the same, or at least with a better algorithm than "here's
> > what we want in this environment."
> > 
> > Extended discussion about this at
> > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> 
> That's OK by me. I thought where we left it was that if we could find a
> solution to the Correctable Errors from the original issue, maybe we
> could revert 04b12ef163d1.
> 
> I'm not sure I would know if a patch that fixes the Correctable Errors
> comes in... We have a test case from before 04b12ef163d1 that we would
> like to run (BIOS has AER disabled and we hotplug a disk, which results
> in AER interrupts), so we would be curious whether the issues we saw
> before go away with a new patch for Correctable Errors.

My current theory is that there's some issue with that particular
Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
says they happen even with VMD disabled.

And there are other reports that don't seem to involve VMD but do
involve this NVMe device ([144d:a80a]):

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
  https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
  https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
  https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
  https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/

NVMe has weird power management stuff, so it's always possible we're
doing something wrong in a driver.

But I think we really need to handle Correctable Errors better by:

  - Possibly having drivers mask errors if they know about defects
  - Making the log messages less alarming, e.g., a single-line report
  - Rate-limiting them so they're never overwhelming
  - Maybe automatically masking them in the PCI core to avoid interrupts

> > >   	vmd_bridge->native_aer = root_bridge->native_aer;
> > >   	vmd_bridge->native_pme = root_bridge->native_pme;
> > >   	vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > -- 
> > > 2.39.1
> > > 
> > 
> 

