Message ID: 20180517022030.GB21959@localhost.localdomain (mailing list archive)
State: New, archived
On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > All simulation in block/011 may happen in reality.
>
> If this test actually simulates reality, then the following one-line
> patch (plus explanation for why) would be a real "fix", as it is very
> successful in passing block/011. :)
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1faa32cd07da..dcc5746304c4 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
>  	if (pci_enable_device_mem(pdev))
>  		return result;
> +	/*
> +	 * blktests block/011 disables the device without the driver knowing.
> +	 * We'll just enable the device twice to get the enable_cnt > 1
> +	 * so that the test's disabling does absolutely nothing.
> +	 */
> +	pci_enable_device_mem(pdev);

Heh. But yes, this test and the PCI "enable" interface in sysfs look
horribly wrong. pci_disable_device/pci_enable_device aren't something we
can just do underneath the driver. Even if injecting the low-level
config space manipulations done by them might be useful and a good test,
the Linux state changes are just killing the driver.

The enable attribute appears to have been added by Arjan for the
Xorg driver. I think if we have a driver bound to the device, we
should not allow it.
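For context, the part of block/011 under discussion amounts to toggling
the PCI sysfs "enable" attribute underneath the driver while I/O is
running. A minimal sketch of that pattern (not the test's exact code;
the device address and fio options are illustrative):

    #!/bin/sh
    # Approximation of block/011's fault loop: flip the sysfs "enable"
    # attribute of the NVMe device's PCI function while fio drives I/O.
    dev=/sys/bus/pci/devices/0000:01:00.0   # hypothetical B:D.f

    fio --name=surprise --filename=/dev/nvme0n1 --rw=randread \
        --direct=1 --time_based --runtime=60 &

    for i in $(seq 30); do
    	echo 0 > "$dev/enable"   # calls pci_disable_device() under the driver
    	sleep 1
    	echo 1 > "$dev/enable"   # calls pci_enable_device() again
    	sleep 1
    done
    wait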
On Thu, May 17, 2018 at 10:41:29AM +0200, Christoph Hellwig wrote:
> On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > > All simulation in block/011 may happen in reality.
> >
> > If this test actually simulates reality, then the following one-line
> > patch (plus explanation for why) would be a real "fix", as it is very
> > successful in passing block/011. :)
> >
> > [...]
>
> Heh. But yes, this test and the PCI "enable" interface in sysfs look
> horribly wrong. pci_disable_device/pci_enable_device aren't something we
> can just do underneath the driver. Even if injecting the low-level
> config space manipulations done by them might be useful and a good test,
> the Linux state changes are just killing the driver.

Yes, I'm totally fine with injecting errors into the config space, but
for goodness sakes, don't fuck with the internal kernel structures out
from under the drivers using them.

My suggestion is to use 'setpci' to disable the device if you want to
inject this scenario. That way you get the broken-device scenario you
want to test, but struct pci_dev hasn't been altered.

> The enable attribute appears to have been added by Arjan for the
> Xorg driver. I think if we have a driver bound to the device, we
> should not allow it.

Agreed. Alternatively possibly call the driver's reset_prepare/done
callbacks.
On Thu, May 17, 2018 at 08:20:51AM -0600, Keith Busch wrote:
> > Heh. But yes, this test and the PCI "enable" interface in sysfs look
> > horribly wrong. pci_disable_device/pci_enable_device aren't something we
> > can just do underneath the driver. Even if injecting the low-level
> > config space manipulations done by them might be useful and a good test,
> > the Linux state changes are just killing the driver.
>
> Yes, I'm totally fine with injecting errors into the config space, but
> for goodness sakes, don't fuck with the internal kernel structures out
> from under the drivers using them.
>
> My suggestion is to use 'setpci' to disable the device if you want to
> inject this scenario. That way you get the broken-device scenario you
> want to test, but struct pci_dev hasn't been altered.
>
> > The enable attribute appears to have been added by Arjan for the
> > Xorg driver. I think if we have a driver bound to the device, we
> > should not allow it.
>
> Agreed. Alternatively possibly call the driver's reset_prepare/done
> callbacks.

Exactly, but as long as we can issue the reset via sysfs the test case
is still valid.
On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> Hi Ming,
>
> I'm developing the answers in code to the issues you raised. It will
> just take a moment to finish fleshing those out. In the meantime I just
> want to point out why I think block/011 isn't a real test.
>
> On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > All simulation in block/011 may happen in reality.
>
> If this test actually simulates reality, then the following one-line
> patch (plus explanation for why) would be a real "fix", as it is very
> successful in passing block/011. :)
>
> [...]

What I think is helpful about block/011 is that it can trigger an I/O
timeout during reset, which can be triggered in reality too.

That is one big problem of the NVMe driver, IMO.

And this patch does fix that issue, together with other timeout-related
races.

Thanks,
Ming
On Fri, May 18, 2018 at 08:19:58AM +0800, Ming Lei wrote:
> On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> > Hi Ming,
> >
> > I'm developing the answers in code to the issues you raised. It will
> > just take a moment to finish fleshing those out. In the meantime I just
> > want to point out why I think block/011 isn't a real test.
> >
> > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > > All simulation in block/011 may happen in reality.
> >
> > If this test actually simulates reality, then the following one-line
> > patch (plus explanation for why) would be a real "fix", as it is very
> > successful in passing block/011. :)
> >
> > [...]
>
> What I think is helpful about block/011 is that it can trigger an I/O
> timeout during reset, which can be triggered in reality too.
>
> That is one big problem of the NVMe driver, IMO.
>
> And this patch does fix that issue, together with other timeout-related
> races.

BTW, the above patch may not be enough to 'fix' this NVMe issue; I guess
you may have to figure out another one to remove the fault injection, or
at least disable io-timeout-fail on NVMe.

Thanks,
Ming
On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote:
> What I think is helpful about block/011 is that it can trigger an I/O
> timeout during reset, which can be triggered in reality too.

As I mentioned earlier, there is nothing wrong with the spirit of
the test. What's wrong with it is the misguided implementation.

Do you understand why it ever passes? The success happens when the
enabling part of the loop happens to coincide with the driver's enabling,
creating pci_dev->enable_cnt > 1 and making subsequent disable parts of
the loop do absolutely nothing; the exact same effect as the one-liner
(non-serious) patch I sent to defeat the test.

A better way to induce the timeout is:

  # setpci -s <B:D.f> 4.w=0:6

This will halt the device without messing with the kernel structures,
just like a real device failure would.
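To expand on that suggestion a little (a sketch; the B:D.f address is a
placeholder, and re-enabling afterwards is an assumption about handing
the device back to the driver): config offset 0x04 is the PCI Command
register, and setpci's value:mask syntax changes only the bits covered
by the mask, so mask 0x6 clears Memory Space Enable (bit 1) and Bus
Master Enable (bit 2), making the device stop answering MMIO and DMA:

    # Halt the device by clearing Memory Space Enable and Bus Master
    # Enable in the Command register; only the masked bits change.
    setpci -s 0000:01:00.0 4.w=0:6

    # ... in-flight I/O now times out as if the device had died ...

    # Set the two bits again to bring the device back for recovery.
    setpci -s 0000:01:00.0 4.w=6:6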
On Thu, May 17, 2018 at 04:23:45PM +0200, Johannes Thumshirn wrote:
> > Agreed. Alternatively possibly call the driver's reset_prepare/done
> > callbacks.
>
> Exactly, but as long as we can issue the reset via sysfs the test case
> is still valid.

I disagree that the test case is valid. The test writes '0' to the
pci-sysfs 'enable' attribute, but the driver also disables the pci device
as part of resetting, which is a perfectly reasonable thing for a driver
to do.

If the timing of the test's loop happens to write '0' right after the
driver disabled the device that it owns, a 'write error' occurs on that
sysfs write, and blktests then incorrectly claims the test failed.
On 5/18/18 7:57 AM, Keith Busch wrote:
> On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote:
>> What I think is helpful about block/011 is that it can trigger an I/O
>> timeout during reset, which can be triggered in reality too.
>
> As I mentioned earlier, there is nothing wrong with the spirit of
> the test. What's wrong with it is the misguided implementation.
>
> Do you understand why it ever passes? The success happens when the
> enabling part of the loop happens to coincide with the driver's enabling,
> creating pci_dev->enable_cnt > 1 and making subsequent disable parts of
> the loop do absolutely nothing; the exact same effect as the one-liner
> (non-serious) patch I sent to defeat the test.
>
> A better way to induce the timeout is:
>
>   # setpci -s <B:D.f> 4.w=0:6
>
> This will halt the device without messing with the kernel structures,
> just like a real device failure would.

Let's just improve/fix the test case. It sounds like the 'enable' sysfs
attribute should never have been exported, and hence the test should
never have used it. blktests is not necessarily the source of truth,
and it would be silly to work around cases in the kernel if it's a clear
case of "doctor, it hurts when I shoot myself in the foot".
On Fri, May 18, 2018 at 07:57:51AM -0600, Keith Busch wrote:
> On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote:
> > What I think is helpful about block/011 is that it can trigger an I/O
> > timeout during reset, which can be triggered in reality too.
>
> As I mentioned earlier, there is nothing wrong with the spirit of
> the test. What's wrong with it is the misguided implementation.
>
> Do you understand why it ever passes? The success happens when the
> enabling part of the loop happens to coincide with the driver's enabling,
> creating pci_dev->enable_cnt > 1 and making subsequent disable parts of
> the loop do absolutely nothing; the exact same effect as the one-liner
> (non-serious) patch I sent to defeat the test.
>
> A better way to induce the timeout is:
>
>   # setpci -s <B:D.f> 4.w=0:6
>
> This will halt the device without messing with the kernel structures,
> just like a real device failure would.

Frankly speaking, I don't care how the test case is implemented at all.

The big problem is that the NVMe driver can't handle an I/O timeout in
the reset context: in the end either the controller becomes DEAD or the
reset context hangs forever, and nothing can move on.

The issue can be reproduced more easily via io-timeout-fail fault
injection.

So could we please face the real issue instead of working around the
test case?

Thanks,
Ming
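For anyone wanting to reproduce it that way, a sketch of the
io-timeout-fail fault injection being referred to (it assumes a kernel
built with CONFIG_FAIL_IO_TIMEOUT, a mounted debugfs, and nvme0n1 as a
stand-in device name):

    # Arm the io-timeout fault-injection point: fire on every request,
    # with no limit on the number of injections.
    echo 100 > /sys/kernel/debug/fail_io_timeout/probability
    echo -1  > /sys/kernel/debug/fail_io_timeout/times

    # Opt this disk in, so its completions are treated as timeouts.
    echo 1 > /sys/block/nvme0n1/io-timeout-fail

    # Any I/O now drives the block timeout handler and, through it, the
    # NVMe driver's reset path.
    dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1 iflag=direct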
On Sat, May 19, 2018 at 06:26:28AM +0800, Ming Lei wrote:
> So could we please face the real issue instead of working around the
> test case?

Yes, that's why I want you to stop referencing the broken test.
On Fri, May 18, 2018 at 05:45:00PM -0600, Keith Busch wrote:
> On Sat, May 19, 2018 at 06:26:28AM +0800, Ming Lei wrote:
> > So could we please face the real issue instead of working around the
> > test case?
>
> Yes, that's why I want you to stop referencing the broken test.

Unfortunately I don't care if it is broken or not; I think the test case
is great because it can expose the real problem. That is its value!

The reason I referenced this test is that everyone can reproduce the
real issue easily with it, that is all.

Thanks,
Ming
On Fri, May 18, 2018 at 10:28:04AM -0600, Keith Busch wrote:
> On Thu, May 17, 2018 at 04:23:45PM +0200, Johannes Thumshirn wrote:
> > > Agreed. Alternatively possibly call the driver's reset_prepare/done
> > > callbacks.
> >
> > Exactly, but as long as we can issue the reset via sysfs the test case
> > is still valid.
>
> I disagree that the test case is valid. The test writes '0' to the
> pci-sysfs 'enable' attribute, but the driver also disables the pci device
> as part of resetting, which is a perfectly reasonable thing for a driver
> to do.
>
> If the timing of the test's loop happens to write '0' right after the
> driver disabled the device that it owns, a 'write error' occurs on that
> sysfs write, and blktests then incorrectly claims the test failed.

Hmm, OK, that's a valid point. But seeing you have sent a patch for
blktests anyway, I think it's gone now.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1faa32cd07da..dcc5746304c4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2118,6 +2118,12 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 
 	if (pci_enable_device_mem(pdev))
 		return result;
+	/*
+	 * blktests block/011 disables the device without the driver knowing.
+	 * We'll just enable the device twice to get the enable_cnt > 1
+	 * so that the test's disabling does absolutely nothing.
+	 */
+	pci_enable_device_mem(pdev);
 
 	pci_set_master(pdev);
 
--
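One way to see why the double enable defeats the test (a sketch; it
assumes the sysfs 'enable' attribute reads back pci_dev->enable_cnt,
and the B:D.f address is a placeholder):

    # With the patch above applied, the enable refcount sits at 2.
    cat /sys/bus/pci/devices/0000:01:00.0/enable
    # -> 2

    # block/011's write of '0' only drops the count to 1, so the device
    # is never actually disabled and the test no longer tests anything.
    echo 0 > /sys/bus/pci/devices/0000:01:00.0/enable
    cat /sys/bus/pci/devices/0000:01:00.0/enable
    # -> 1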
Hi Ming,

I'm developing the answers in code to the issues you raised. It will
just take a moment to finish fleshing those out. In the meantime I just
want to point out why I think block/011 isn't a real test.

On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> All simulation in block/011 may happen in reality.

If this test actually simulates reality, then the following one-line
patch (plus explanation for why) would be a real "fix", as it is very
successful in passing block/011. :)

---