[v2,0/4] Write-hint for FS journal

Message ID 1547047861-7271-1-git-send-email-joshi.k@samsung.com (mailing list archive)

Message

Kanchan Joshi Jan. 9, 2019, 3:30 p.m. UTC
Towards supporting write-hints/streams for filesystem journal.
                                                                                
Here is the v1 patch for background -                                           
https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
                                                                                
Changes since v1:
- Introduce four more hints for in-kernel use, as recommended by Dave Chinner
  & Jens Axboe. This isolates kernel-mode hints from user-mode ones.
- Remove the mount option to specify write-hint, as recommended by Jan Kara &
  Dave Chinner. Rather, the FS always sets a write-hint for the journal. This
  gets ignored if the device does not support streams.
- Remove code redundancy for write_dirty_buffer (Jan Kara's review comment).


Kanchan Joshi (4):
  block: Increase count of supported write-hints
  fs: introduce four macros for in-kernel hints
  fs: introduce APIs to enable sending write-hint with buffer-head
  fs/ext4,jbd2: add support for passing write-hint with journal.

 fs/buffer.c                 | 18 ++++++++++++++++--
 fs/ext4/super.c             |  2 ++
 fs/jbd2/commit.c            | 11 +++++++----
 fs/jbd2/journal.c           |  3 ++-
 fs/jbd2/revoke.c            |  3 ++-
 include/linux/blkdev.h      |  5 ++++-
 include/linux/buffer_head.h |  3 +++
 include/linux/fs.h          |  5 +++++
 include/linux/jbd2.h        |  8 ++++++++
 9 files changed, 49 insertions(+), 9 deletions(-)

Comments

Javier González Jan. 23, 2019, 6:35 p.m. UTC | #1
> On 9 Jan 2019, at 16.30, Kanchan Joshi <joshi.k@samsung.com> wrote:
> 
> Towards supporting write-hints/streams for filesystem journal.
> 
> Here is the v1 patch for background -
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> 
> Changes since v1:
> - introduce four more hints for in-kernel use, as recommended by Dave chinner
>  & Jens axboe. This isolates kernel-mode hints from user-mode ones.
> - remove mount-option to specify write-hint, as recommended by Jan kara &
>  Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
>  if device does not support stream.
> - Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)
> 
> 
> Kanchan Joshi (4):
>  block: Increase count of supported write-hints
>  fs: introduce four macros for in-kernel hints
>  fs: introduce APIs to enable sending write-hint with buffer-head
>  fs/ext4,jbd2: add support for passing write-hint with journal.
> 
> fs/buffer.c                 | 18 ++++++++++++++++--
> fs/ext4/super.c             |  2 ++
> fs/jbd2/commit.c            | 11 +++++++----
> fs/jbd2/journal.c           |  3 ++-
> fs/jbd2/revoke.c            |  3 ++-
> include/linux/blkdev.h      |  5 ++++-
> include/linux/buffer_head.h |  3 +++
> include/linux/fs.h          |  5 +++++
> include/linux/jbd2.h        |  8 ++++++++
> 9 files changed, 49 insertions(+), 9 deletions(-)
> 
> --
> 2.7.4

Worth sharing the paper where you describe the design and the numbers
you collected [1]. Also, addressing Dave's comment on stream support, it
points to a Samsung drive supporting streams (PM963). In this context,
you should verify in V3 that we are at least passing xfstests and
blktests with these changes.

[1] https://www.usenix.org/system/files/conference/fast18/fast18-rho.pdf

Javier
Jan Kara Jan. 24, 2019, 8:29 a.m. UTC | #2
Hello,

On Wed 09-01-19 21:00:57, Kanchan Joshi wrote:
> Towards supporting write-hints/streams for filesystem journal.
>                                                                                 
> Here is the v1 patch for background -                                           
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
>                                                                                 
> Changes since v1:                                                               
> - introduce four more hints for in-kernel use, as recommended by Dave chinner   
>   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> - remove mount-option to specify write-hint, as recommended by Jan kara &       
>   Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
>   if device does not support stream.                                            
> - Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)

I guess the series should also go to Jens since he was the original author
of write-hint support so he has the best idea about the architecture.

								Honza

Kanchan Joshi Jan. 25, 2019, 2:20 p.m. UTC | #3
Hi Jens,
Could you please take a look at this patch series, given that it attempts
to extend the original architecture of write-hints?

Thanks,

On Thursday 24 January 2019 01:59 PM, Jan Kara wrote:
> Hello,
> 
> I guess the series should also go to Jens since he was the original author
> of write-hint support so he has the best idea about the architecture.
> 
> 								Honza
> 
Keith Busch Jan. 25, 2019, 4:23 p.m. UTC | #4
On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> Towards supporting write-hints/streams for filesystem journal.
>                                                                                 
> Here is the v1 patch for background -                                           
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
>                                                                                 
> Changes since v1:                                                               
> - introduce four more hints for in-kernel use, as recommended by Dave chinner   
>   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            

The nvme driver disables streams if the controller doesn't support
BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
for controllers that only support up to 4.
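
The check described above can be modelled roughly as follows. This is a
simplified userspace sketch, not the nvme driver code: configure_streams() and
its parameters are hypothetical names, and the rule "one stream per non-zero
hint" as well as the hint counts 5 (before the series) and 9 (after it) are
assumptions made for illustration.

```c
#include <assert.h>

/*
 * Simplified model (hypothetical helper, not the actual nvme code):
 * streams are only enabled when the controller offers at least one
 * stream per non-zero write hint.
 *
 * nssa            - streams the controller reports as available
 * max_write_hints - number of hints, including slot 0 ("no hint")
 *
 * Returns the number of streams that would be used, or 0 when the
 * feature ends up disabled.
 */
static unsigned int configure_streams(unsigned int nssa,
				      unsigned int max_write_hints)
{
	unsigned int needed;

	assert(max_write_hints >= 1);
	needed = max_write_hints - 1;	/* slot 0 needs no stream */

	if (nssa < needed)
		return 0;		/* too few streams: feature disabled */

	return needed;
}
```

Under these assumptions, a controller with 4 streams keeps working with 5
hints, but loses streams entirely once the hint count grows to 9.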
Jan Kara Jan. 28, 2019, 12:47 p.m. UTC | #5
On Fri 25-01-19 09:23:53, Keith Busch wrote:
> The nvme driver disables streams if the controller doesn't support
> BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> for controllers that only support up to 4.

Right. Do you know if there are such controllers? Or are you just afraid
that there could be?

									Honza
Keith Busch Jan. 28, 2019, 11:24 p.m. UTC | #6
On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> Right. Do you know if there are such controllers? Or are you just afraid
> that there could be?

I've asked around, and the consensus I received is that all currently support
at least 8, but they couldn't say if that would be true for potential
lower-budget products. Can we implement a reasonable fallback to use
what's available?
Jan Kara Jan. 29, 2019, 10:07 a.m. UTC | #7
On Mon 28-01-19 16:24:24, Keith Busch wrote:
> I've asked around, and the consensus I received is that all currently support
> at least 8, but they couldn't say if that would be true for potential
> lower-budget products. Can we implement a reasonable fallback to use
> what's available?

OK, thanks for input. So probably we should just map kernel stream IDs to 0
if the device doesn't support them. But that probably means we need to
propagate number of available streams up from NVME into the block layer so
that this can be handled reasonably seamlessly. Jens, Kanchan?

								Honza
Dave Chinner Jan. 30, 2019, 12:13 a.m. UTC | #8
On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> OK, thanks for input. So probably we should just map kernel stream IDs to 0
> if the device doesn't support them. But that probably means we need to
> propagate number of available streams up from NVME into the block layer so
> that this can be handled reasonably seamlessly. Jens, Kanchan?

Yeah, that's basically what I said we needed to do when this was
last discussed, i.e. that the block layer needed to know how many
streams the hardware had and map the 4 "kernel internal" hints
appropriately to what the device supports.

e.g. if the device only supports 4 hints, then it needs to map the
kernel hints to zero. If it supports fewer than 8 streams,
then they need to be mapped into the hints above index 5. If there
are N streams, then they need to be mapped to the hints {N-3,N}.

And, to top it all off, there needs to be guards so that if we want
to grow the userspace hints to more than 4 hints, they don't crash
into ranges the kernel is already reserving because of limited
device range support.
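
One possible reading of the mapping and the guard described above, as a
userspace sketch. Everything here is an assumption for illustration:
map_kernel_hint() and the two constants are hypothetical names, streams are
taken to be numbered 1..N, and kernel hints are handed out from the top of
that range downwards.

```c
#include <assert.h>

#define NR_USER_HINTS	4	/* assumed: streams 1..4 belong to userspace */
#define NR_KERNEL_HINTS	4	/* assumed: four kernel-internal hints */

/*
 * Map kernel hint number k (0 .. NR_KERNEL_HINTS - 1) to a device stream,
 * given nr_streams streams numbered 1..nr_streams. Kernel hints occupy
 * the top of the range; a return value of 0 means "no stream".
 */
static unsigned int map_kernel_hint(unsigned int k, unsigned int nr_streams)
{
	unsigned int stream;

	assert(k < NR_KERNEL_HINTS);

	if (nr_streams <= k)
		return 0;		/* not enough streams at all */
	stream = nr_streams - k;

	/* Guard: never hand out a stream that user hints may use. */
	if (stream <= NR_USER_HINTS)
		return 0;

	return stream;
}
```

With 4 streams every kernel hint maps to 0; with 6 streams only the first two
kernel hints get a stream (6 and 5); with 16 streams the four kernel hints land
on 16 down to 13. If the user range ever grows, raising NR_USER_HINTS makes the
guard shrink the kernel range instead of letting the two overlap.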

Nothing is ever simple....

Cheers,

Dave.
Kanchan Joshi Jan. 30, 2019, 1:54 p.m. UTC | #9
On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> Yeah, that's basically what I said we needed to do when this was
> last discussed, i.e. that the block layer needed to know how many
> streams the hardware had and map the 4 "kernel internal" hints
> appropriately to what the device supports.
> 
> e.g. if the device only supports 4 hints, then it needs to map the
> kernel hints to zero. If it supports fewer than 8 streams,
> then they need to be mapped into the hints above index 5. If there
> are N streams, then they need to be mapped to the hints {N-3,N}.
> 
> And, to top it all off, there needs to be guards so that if we want
> to grow the userspace hints to more than 4 hints, they don't crash
> into ranges the kernel is already reserving because of limited
> device range support.
> 
> Nothing is ever simple....
> 
Thanks all for the feedback.
User hints, when they reach the kernel via the fcntl path, are sanity-checked
(rw_hint_valid function).
Currently streams are enabled when the nvme driver is loaded with the
"streams=1" option, while stream users always pass some write-hint
without worrying about whether streams (and how many of them) are
operational. This keeps configuration simple for stream users.
Second, the block layer does not translate a write-hint to a stream number;
that is done inside the nvme driver. I suppose I should keep both these
properties intact.
Considering all the suggestions, this is the plan for V3 -

[In block layer]
1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value 
"user_hint_cnt + 1".
FS code will use this value (onwards) to define their own streams.

2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be 
set to 4 for now.

[In nvme driver]
1. Continue working as before if device supports just 4 streams. All 
these streams are used by user-hints, and kernel-hints are translated to 0.

2. If the device supports more than 4 streams, the extra streams will be
mapped to serve kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
For example, if the device has 6 streams, four streams (numbers = 1,2,3,4)
will be used to serve user-hints and two streams (numbers = 65535,
65534) will be used to serve the first two kernel-hints. Other kernel-hints
get mapped to 0. OTOH, if the device has 10 streams, the first four
kernel-hints will be mapped to non-zero values (65535 to 65532) and anything
else would get turned to 0.
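
A rough userspace sketch of the translation in this plan, to make the two
examples concrete. hint_to_stream(), USER_HINT_CNT and KERN_STREAM_TOP are
hypothetical names; the sketch also assumes user hints are numbered 1..4
directly, which simplifies how the existing hints are numbered.

```c
#include <assert.h>

#define USER_HINT_CNT			4	/* assumed: user hints 1..4 */
#define KERN_WRITE_HINT_MIN		(USER_HINT_CNT + 1)
#define BLK_MAX_KERNEL_WRITE_HINTS	4
#define KERN_STREAM_TOP			65535u	/* first kernel stream number */

/*
 * Translate a write hint to a stream number for a device with nr_streams
 * streams. Hint 0 and anything the device cannot serve become stream 0.
 */
static unsigned int hint_to_stream(unsigned int hint, unsigned int nr_streams)
{
	unsigned int idx;

	/* Stream numbers are treated as 16-bit values in this model. */
	assert(nr_streams <= KERN_STREAM_TOP);

	if (hint == 0 || nr_streams < USER_HINT_CNT)
		return 0;
	if (hint <= USER_HINT_CNT)
		return hint;		/* user hints keep streams 1..4 */

	idx = hint - KERN_WRITE_HINT_MIN;	/* 0-based kernel hint */
	if (idx >= BLK_MAX_KERNEL_WRITE_HINTS)
		return 0;
	if (nr_streams < USER_HINT_CNT + idx + 1)
		return 0;		/* no spare stream for this hint */

	return KERN_STREAM_TOP - idx;	/* 65535, 65534, ... */
}
```

With 6 streams, hints 5 and 6 map to 65535 and 65534 while hints 7 and 8 fall
back to 0; with 10 streams, hints 5 to 8 map to 65535 down to 65532.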


Let me know if this sounds fine?


Thanks,
Kanchan
Jan Kara Feb. 5, 2019, 11:50 a.m. UTC | #10
On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> 
> Thanks all for feedback.
> user-hints, when they reach to kernel via fcntl path, are sanity-checked
> (rw_hint_valid function).
> Currently streams are enabled when nvme driver is made to run with "streams
> =1" option, while stream users always pass some write-hint, without
> bothering whether streams (and how many of those) are operational or not.
> This keeps configuration simple for stream users. Second, block layer does
> not translate write-hint to stream-number, rather it is done inside nvme
> driver. I suppose I should keep both these properties intact.
> And considering all the suggestions, this is the plan for V3 -
> 
> [In block layer]
> 1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value
> "user_hint_cnt + 1".
> FS code will use this value (onwards) to define their own streams.
> 
> 2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be set to
> 4 for now.
> 
> [In nvme driver]
> 1. Continue working as before if device supports just 4 streams. All these
> streams are used by user-hints, and kernel-hints are translated to 0.
> 
> 2. If device supports any more than 4 streams, those will be mapped to serve
> kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
> For example, if device has 6 streams, four streams (numbers = 1,2,3,4) will
> be used to serve user-hints and two streams ( numbers = 65535, 65534) will
> be used to serve first two kernel hints. Other kernel-hints get mapped to 0.
> OTOH, if device has 10 streams, first four kernel-hints will be mapped to
> non-zero values (65535 to 65532) and anything else would get turned to 0.

Well, I'm not sure if the mapping should happen in the NVME driver. In
future, there will be potentially more drivers supporting write hints and
we probably don't want each of them to replicate the mapping behavior. So
IMO the mapping should rather belong to the block layer...

								Honza
Dave Chinner Feb. 5, 2019, 10:53 p.m. UTC | #11
On Tue, Feb 05, 2019 at 12:50:48PM +0100, Jan Kara wrote:
> Well, I'm not sure if the mapping should happen in the NVME driver. In
> future, there will be potentially more drivers supporting write hints and
> we probably don't want each of them to replicate the mapping behavior. So
> IMO the mapping should rather belong to the block layer...

*nod*

That's what I was suggesting. All the driver does is supply the
block layer with the number of hints it supports, and the block
layer does the rest. After all, this has to work with DM, MD, etc
so it really does need to bubble up from the driver to the block
layer so it can be handled appropriately by multi-device block
drivers. e.g. md raid might want to reserve a kernel channel for
itself (e.g. internal metadata) and so only present 7 channels to
the next layer up (4 user and 3 kernel)....
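
As a rough illustration of the bubbling-up idea, here is a userspace sketch.
The struct and both helpers are hypothetical names, and the 4 user / 4 kernel
split is an assumption carried over from the discussion; nothing here is an
existing block-layer interface.

```c
#include <assert.h>

#define NR_USER_STREAMS		4	/* assumed user-visible hints */
#define MAX_KERNEL_STREAMS	4	/* assumed kernel-internal hints */

/* What one layer advertises to the layer above it. */
struct stream_limits {
	unsigned int nr_user;
	unsigned int nr_kernel;
};

/* The driver only reports a count; the split is decided centrally. */
static struct stream_limits limits_from_driver(unsigned int nr_streams)
{
	struct stream_limits lim = { 0, 0 };
	unsigned int spare;

	if (nr_streams < NR_USER_STREAMS)
		return lim;	/* too few streams: hints are ignored */

	lim.nr_user = NR_USER_STREAMS;
	spare = nr_streams - NR_USER_STREAMS;
	lim.nr_kernel = spare < MAX_KERNEL_STREAMS ? spare : MAX_KERNEL_STREAMS;
	return lim;
}

/* A stacking driver keeps 'reserved' kernel streams for its own use. */
static struct stream_limits stack_limits(struct stream_limits lower,
					 unsigned int reserved)
{
	struct stream_limits upper = lower;

	assert(reserved <= MAX_KERNEL_STREAMS);
	upper.nr_kernel = lower.nr_kernel > reserved ?
			  lower.nr_kernel - reserved : 0;
	return upper;
}
```

For a device reporting 8 streams, limits_from_driver(8) gives 4 user and 4
kernel streams, and stack_limits() with one reserved stream presents 4 user and
3 kernel streams to the next layer up, matching the md example.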

Cheers,

Dave.