mbox series

[0/2] Reset timeout for paused hardware

Message ID 20190522174812.5597-1-keith.busch@intel.com (mailing list archive)
Headers show
Series Reset timeout for paused hardware | expand

Message

Keith Busch May 22, 2019, 5:48 p.m. UTC
Hardware may temporarily stop processing commands that have
been dispatched to it while activating new firmware. Some target
implementation's paused state time exceeds the default request expiry,
so any request dispatched before the driver could quiesce for the
hardware's paused state will time out, and handling this may interrupt
the firmware activation.

This two-part series provides a way for drivers to reset dispatched
requests' timeout deadline, then uses this new mechanism from the nvme
driver's fw activation work.

Keith Busch (2):
  blk-mq: provide way to reset rq timeouts
  nvme: reset request timeouts during fw activation

 block/blk-mq.c           | 30 ++++++++++++++++++++++++++++++
 drivers/nvme/host/core.c | 20 ++++++++++++++++++++
 include/linux/blk-mq.h   |  1 +
 3 files changed, 51 insertions(+)

Comments

Bart Van Assche May 22, 2019, 8:20 p.m. UTC | #1
On 5/22/19 7:48 PM, Keith Busch wrote:
> Hardware may temporarily stop processing commands that have
> been dispatched to it while activating new firmware. Some target
> implementation's paused state time exceeds the default request expiry,
> so any request dispatched before the driver could quiesce for the
> hardware's paused state will time out, and handling this may interrupt
> the firmware activation.
> 
> This two-part series provides a way for drivers to reset dispatched
> requests' timeout deadline, then uses this new mechanism from the nvme
> driver's fw activation work.

Hi Keith,

Is it essential to modify the block layer to implement this behavior
change? Would it be possible to implement this behavior change by
modifying the NVMe driver only, e.g. by modifying the nvme_timeout()
function and by making that function return BLK_EH_RESET_TIMER while new
firmware is being activated?

Thanks,

Bart.
Keith Busch May 22, 2019, 8:28 p.m. UTC | #2
On Wed, May 22, 2019 at 10:20:45PM +0200, Bart Van Assche wrote:
> On 5/22/19 7:48 PM, Keith Busch wrote:
> > Hardware may temporarily stop processing commands that have
> > been dispatched to it while activating new firmware. Some target
> > implementation's paused state time exceeds the default request expiry,
> > so any request dispatched before the driver could quiesce for the
> > hardware's paused state will time out, and handling this may interrupt
> > the firmware activation.
> > 
> > This two-part series provides a way for drivers to reset dispatched
> > requests' timeout deadline, then uses this new mechanism from the nvme
> > driver's fw activation work.
> 
> Hi Keith,
> 
> Is it essential to modify the block layer to implement this behavior
> change? Would it be possible to implement this behavior change by
> modifying the NVMe driver only, e.g. by modifying the nvme_timeout()
> function and by making that function return BLK_EH_RESET_TIMER while new
> firmware is being activated?

Good question.

We can't just do this from nvme_timeout(), though. That introduces races
between timeout_work and fw_act_work if that fw work clears the
condition that timeout needs to observe to return RESET_TIMER.

Even if we avoid that race, the rq->deadline needs to be adjusted to
the current time after the h/w unpause because the time accumulated while
h/w halted itself should not be counted against the request.
Ming Lei May 23, 2019, 3:29 a.m. UTC | #3
On Wed, May 22, 2019 at 11:48:10AM -0600, Keith Busch wrote:
> Hardware may temporarily stop processing commands that have
> been dispatched to it while activating new firmware. Some target
> implementation's paused state time exceeds the default request expiry,
> so any request dispatched before the driver could quiesce for the
> hardware's paused state will time out, and handling this may interrupt
> the firmware activation.
> 
> This two-part series provides a way for drivers to reset dispatched
> requests' timeout deadline, then uses this new mechanism from the nvme
> driver's fw activation work.

Just wondering why not freeze IO queues before updating FW?


Thanks,
Ming
Keith Busch May 23, 2019, 3:48 a.m. UTC | #4
On Wed, May 22, 2019, 9:29 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, May 22, 2019 at 11:48:10AM -0600, Keith Busch wrote:
> > Hardware may temporarily stop processing commands that have
> > been dispatched to it while activating new firmware. Some target
> > implementation's paused state time exceeds the default request expiry,
> > so any request dispatched before the driver could quiesce for the
> > hardware's paused state will time out, and handling this may interrupt
> > the firmware activation.
> >
> > This two-part series provides a way for drivers to reset dispatched
> > requests' timeout deadline, then uses this new mechanism from the nvme
> > driver's fw activation work.
>
> Just wondering why not freeze IO queues before updating FW?


Yeah, that's a good question. A FW update may have been initiated out
of band or from another host entirely. The driver can't count on
preparing for hardware pausing command processing before it's
happened, but we'll always find out asynchronously after it's too late
to freeze.
Bart Van Assche May 23, 2019, 10:04 a.m. UTC | #5
On 5/22/19 10:28 PM, Keith Busch wrote:
> On Wed, May 22, 2019 at 10:20:45PM +0200, Bart Van Assche wrote:
>> On 5/22/19 7:48 PM, Keith Busch wrote:
>>> Hardware may temporarily stop processing commands that have
>>> been dispatched to it while activating new firmware. Some target
>>> implementation's paused state time exceeds the default request expiry,
>>> so any request dispatched before the driver could quiesce for the
>>> hardware's paused state will time out, and handling this may interrupt
>>> the firmware activation.
>>>
>>> This two-part series provides a way for drivers to reset dispatched
>>> requests' timeout deadline, then uses this new mechanism from the nvme
>>> driver's fw activation work.
>>
>> Hi Keith,
>>
>> Is it essential to modify the block layer to implement this behavior
>> change? Would it be possible to implement this behavior change by
>> modifying the NVMe driver only, e.g. by modifying the nvme_timeout()
>> function and by making that function return BLK_EH_RESET_TIMER while new
>> firmware is being activated?
> 
> Good question.
> 
> We can't just do this from nvme_timeout(), though. That introduces races
> between timeout_work and fw_act_work if that fw work clears the
> condition that timeout needs to observe to return RESET_TIMER.
> 
> Even if we avoid that race, the rq->deadline needs to be adjusted to
> the current time after the h/w unpause because the time accumulated while
> h/w halted itself should not be counted against the request.

Hi Keith,

How about recording the time at which the firmware upgrade finished and
making nvme_timeout() return RESET_TIMER if either a firmware upgrade is
in progress or if it has finished less than (request timeout) seconds
ago? The reason I'm asking this is because I'm concerned that the
patches you posted would need a careful review to see whether or not
these trigger new race conditions.

Thanks,

Bart.
Christoph Hellwig May 23, 2019, 10:13 a.m. UTC | #6
On Wed, May 22, 2019 at 09:48:10PM -0600, Keith Busch wrote:
> Yeah, that's a good question. A FW update may have been initiated out
> of band or from another host entirely. The driver can't count on
> preparing for hardware pausing command processing before it's
> happened, but we'll always find out asynchronously after it's too late
> to freeze.

I don't think that is the case at least for spec compliant devices.

From NVMe 1.3:

Figure 49: Asynchronous Event Information - Notice

1h		Firmware Activation Starting: The controller is starting
		a firmware activation process during which command
		processing is paused. Host software may use CSTS.PP to
		determine when command processing has resumed. To clear
		this event, host software reads the Firmware Slot
		Information log page.

So we are supposed to get an AEN before the device stops processing
commands.
Keith Busch May 23, 2019, 1:23 p.m. UTC | #7
On Thu, May 23, 2019 at 03:13:11AM -0700, Christoph Hellwig wrote:
> On Wed, May 22, 2019 at 09:48:10PM -0600, Keith Busch wrote:
> > Yeah, that's a good question. A FW update may have been initiated out
> > of band or from another host entirely. The driver can't count on
> > preparing for hardware pausing command processing before it's
> > happened, but we'll always find out asynchronously after it's too late
> > to freeze.
> 
> I don't think that is the case at least for spec compliant devices.
> 
> From NVMe 1.3:
> 
> Figure 49: Asynchronous Event Information - Notice
> 
> 1h		Firmware Activation Starting: The controller is starting
> 		a firmware activation process during which command
> 		processing is paused. Host software may use CSTS.PP to
> 		determine when command processing has resumed. To clear
> 		this event, host software reads the Firmware Slot
> 		Information log page.
> 
> So we are supposed to get an AEN before the device stops processing
> commands.

Hm, I read the same section, but conclude differently (and at least some
vendors did too). A spec compliant controller activating new firmware
without reset would stop processing commands and set CSTS.PP first,
then send the AEN. When the host is aware to poll Processing Paused,
the controller hasn't been processing new commands for some time.

Could you give some more detail on your interpretation?
Christoph Hellwig May 23, 2019, 2:10 p.m. UTC | #8
On Thu, May 23, 2019 at 07:23:04AM -0600, Keith Busch wrote:
> > Figure 49: Asynchronous Event Information - Notice
> > 
> > 1h		Firmware Activation Starting: The controller is starting
> > 		a firmware activation process during which command
> > 		processing is paused. Host software may use CSTS.PP to
> > 		determine when command processing has resumed. To clear
> > 		this event, host software reads the Firmware Slot
> > 		Information log page.
> > 
> > So we are supposed to get an AEN before the device stops processing
> > commands.
> 
> Hm, I read the same section, but conclude differently (and at least some
> vendors did too). A spec compliant controller activating new firmware
> without reset would stop processing commands and set CSTS.PP first,
> then send the AEN. When the host is aware to poll Processing Paused,
> the controller hasn't been processing new commands for some time.
> 
> Could you give some more detail on your interpretation?

What would be the point of the AEN if it wasn't sent at exactly
the point when the controller stops acceppting command?

The wording is of course NVMe-like, but "The controller is starting a.."
pretty clear implies this is sent at the beginning of the paused
state, not the end.
Keith Busch May 23, 2019, 2:19 p.m. UTC | #9
On Thu, May 23, 2019 at 04:10:54PM +0200, Christoph Hellwig wrote:
> On Thu, May 23, 2019 at 07:23:04AM -0600, Keith Busch wrote:
> > > Figure 49: Asynchronous Event Information - Notice
> > > 
> > > 1h		Firmware Activation Starting: The controller is starting
> > > 		a firmware activation process during which command
> > > 		processing is paused. Host software may use CSTS.PP to
> > > 		determine when command processing has resumed. To clear
> > > 		this event, host software reads the Firmware Slot
> > > 		Information log page.
> > > 
> > > So we are supposed to get an AEN before the device stops processing
> > > commands.
> > 
> > Hm, I read the same section, but conclude differently (and at least some
> > vendors did too). A spec compliant controller activating new firmware
> > without reset would stop processing commands and set CSTS.PP first,
> > then send the AEN. When the host is aware to poll Processing Paused,
> > the controller hasn't been processing new commands for some time.
> > 
> > Could you give some more detail on your interpretation?
> 
> What would be the point of the AEN if it wasn't sent at exactly
> the point when the controller stops acceppting command?
> 
> The wording is of course NVMe-like, but "The controller is starting a.."
> pretty clear implies this is sent at the beginning of the paused
> state, not the end.

Right, the controller starts the pause at time T1, and sends the AEN at
the same time. No new commands will be processed after T1 until the
firmware activate completes.

The host sees the AEN a short time later, time T2.

The time between T1 - T2, we may have sent many commands that are not
going to be processed, and we couldn't have known that when they were
submitted. The controller still owns those commands, and we just need
to adjust their deadlines once processing resumes.