fuse: avoid unnecessary spinlock bump

Message ID 20220402103250.68027-1-jefflexu@linux.alibaba.com (mailing list archive)
State New, archived

Commit Message

Jingbo Xu April 2, 2022, 10:32 a.m. UTC
Move dmap free worker kicker inside the critical region, so that extra
spinlock lock/unlock could be avoided.

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
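
The rationale: kick_dmap_free_worker() is a locking wrapper around
__kick_dmap_free_worker(), so calling it right after alloc_dax_mapping()
has dropped fcd->lock pays for a second lock/unlock round trip. A rough
sketch of the two helpers (abridged from fs/fuse/dax.c; the exact
upstream code may differ in detail):

static void __kick_dmap_free_worker(struct fuse_conn_dax *fcd,
				    unsigned long delay_ms)
{
	unsigned long free_threshold;

	/* If the number of free ranges drops below a threshold, start reclaim */
	free_threshold = max_t(unsigned long,
			       fcd->nr_ranges * FUSE_DAX_RECLAIM_THRESHOLD / 100,
			       1);
	if (fcd->nr_free_ranges < free_threshold)
		queue_delayed_work(system_long_wq, &fcd->free_work,
				   msecs_to_jiffies(delay_ms));
}

static void kick_dmap_free_worker(struct fuse_conn_dax *fcd,
				  unsigned long delay_ms)
{
	spin_lock(&fcd->lock);
	__kick_dmap_free_worker(fcd, delay_ms);
	spin_unlock(&fcd->lock);
}

With the patch, alloc_dax_mapping() calls the __locked variant while
fcd->lock is still held, and the wrapper call after spin_unlock() goes away.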

Comments

Stefan Hajnoczi April 4, 2022, 8:49 a.m. UTC | #1
On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
> 
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Vivek Goyal April 7, 2022, 2:10 p.m. UTC | #2
On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
> 
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Looks good to me. Have you done any testing to make sure nothing is
broken?

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index d7d3a7f06862..b9f8795d52c4 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -138,9 +138,9 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn_dax *fcd)
>  		WARN_ON(fcd->nr_free_ranges <= 0);
>  		fcd->nr_free_ranges--;
>  	}
> +	__kick_dmap_free_worker(fcd, 0);
>  	spin_unlock(&fcd->lock);
>  
> -	kick_dmap_free_worker(fcd, 0);
>  	return dmap;
>  }
>  
> -- 
> 2.27.0
>
Jingbo Xu April 8, 2022, 2:36 a.m. UTC | #3
On 4/7/22 10:10 PM, Vivek Goyal wrote:
> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>> Move dmap free worker kicker inside the critical region, so that extra
>> spinlock lock/unlock could be avoided.
>>
>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> 
> Looks good to me. Have you done any testing to make sure nothing is
> broken.

xfstests -g quick shows no regression. The tested virtiofs is mounted
with "dax=always".
Vivek Goyal April 8, 2022, 11:25 a.m. UTC | #4
On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
> 
> 
> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> > On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >> Move dmap free worker kicker inside the critical region, so that extra
> >> spinlock lock/unlock could be avoided.
> >>
> >> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > 
> > Looks good to me. Have you done any testing to make sure nothing is
> > broken.
> 
> xfstests -g quick shows no regression. The tested virtiofs is mounted
> with "dax=always".

I think xfstests might not trigger reclaim. You probably will have to
run something like blogbench with a small dax window like 1G so that
heavy reclaim happens.

For fun, I sometimes used to run it with a window of just, say, 16 dax
ranges so that reclaim was so heavy that if there was a bug, it would
show up.

Thanks
Vivek
Jingbo Xu April 8, 2022, 11:50 a.m. UTC | #5
On 4/8/22 7:25 PM, Vivek Goyal wrote:
> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>
>>
>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>> spinlock lock/unlock could be avoided.
>>>>
>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>
>>> Looks good to me. Have you done any testing to make sure nothing is
>>> broken.
>>
>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>> with "dax=always".
> 
> I think xfstests might not trigger reclaim. You probably will have to
> run something like blogbench with a small dax window like 1G so that
> heavy reclaim happens.


Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
xfstests. Thus I think the reclaim path was most likely triggered.
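
The "4 slots" follows from the default per-range size; roughly, assuming
the constants in fs/fuse/dax.c (names quoted from memory):

#define FUSE_DAX_SHIFT	21
#define FUSE_DAX_SZ	(1 << FUSE_DAX_SHIFT)	/* 2 MiB per range */

/* 8 MiB window / 2 MiB per range == 4 free ranges ("slots") */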


> 
> For fun, I sometimes used to run it with a window of just say 16 dax
> ranges so that reclaim was so heavy that if there was a bug, it will
> show up.
> 

Yeah, my colleague once reported that a DAX window of 4KB would cause a
hang in our internal OS (which is 4.19; we backported virtiofs to
4.19). But then I found that this issue doesn't exist in the latest
upstream. The reason seems to be that in the upstream kernel,
devm_memremap_pages() called in virtio_fs_setup_dax() fails directly
since the dax window (4KB) is not aligned with the sparse memory section.
Vivek Goyal April 8, 2022, 12:06 p.m. UTC | #6
On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
> 
> 
> On 4/8/22 7:25 PM, Vivek Goyal wrote:
> > On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> >>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >>>> Move dmap free worker kicker inside the critical region, so that extra
> >>>> spinlock lock/unlock could be avoided.
> >>>>
> >>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>
> >>> Looks good to me. Have you done any testing to make sure nothing is
> >>> broken.
> >>
> >> xfstests -g quick shows no regression. The tested virtiofs is mounted
> >> with "dax=always".
> > 
> > I think xfstests might not trigger reclaim. You probably will have to
> > run something like blogbench with a small dax window like 1G so that
> > heavy reclaim happens.
> 
> 
> Actually, I configured the DAX window to 8MB, i.e. 4 slots when running
> xfstests. Thus I think the reclaim path is most likely triggered.
> 
> 
> > 
> > For fun, I sometimes used to run it with a window of just say 16 dax
> > ranges so that reclaim was so heavy that if there was a bug, it will
> > show up.
> > 
> 
> Yeah, my colleague had ever reported that a DAX window of 4KB will cause
> hang in our internal OS (which is 4.19, we back ported virtiofs to
> 4.19). But then I found that this issue doesn't exist in the latest
> upstream. The reason seems that in the upstream kernel,
> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
> since the dax window (4KB) is not aligned with the sparse memory section.

Given our default chunk size is 2MB (FUSE_DAX_SHIFT), maybe it is not
a bad idea to enforce some minimum cache window size. IIRC, even one
range is not enough. A minimum of 2 is required for reclaim to not deadlock.

Hence, I guess it is not a bad idea to check the cache window size and,
if it is too small, reject it and disable dax.
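
A hypothetical sketch of such a check (not existing code; the function
name, the minimum-range constant and the exact hook point are all made
up for illustration):

/* Hypothetical sketch only -- not existing kernel code. Refuse DAX when
 * the advertised cache window cannot hold at least two default-sized
 * (2 MiB) ranges, since reclaim needs some slack to make progress.
 */
#define FUSE_DAX_SZ		(1 << 21)	/* default range size: 2 MiB */
#define FUSE_DAX_MIN_RANGES	2		/* assumed minimum for reclaim */

static bool fuse_dax_window_usable(u64 cache_size)
{
	return cache_size >= (u64)FUSE_DAX_MIN_RANGES * FUSE_DAX_SZ;
}

/* A caller doing DAX setup would then do something like:
 *
 *	if (!fuse_dax_window_usable(cache_size)) {
 *		pr_warn("virtiofs: DAX window too small, disabling DAX\n");
 *		return 0;	// continue without DAX
 *	}
 */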

Thanks
Vivek
Jingbo Xu April 11, 2022, 2:10 a.m. UTC | #7
On 4/8/22 8:06 PM, Vivek Goyal wrote:
> On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
>>
>>
>> On 4/8/22 7:25 PM, Vivek Goyal wrote:
>>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>>>> spinlock lock/unlock could be avoided.
>>>>>>
>>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>
>>>>> Looks good to me. Have you done any testing to make sure nothing is
>>>>> broken.
>>>>
>>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>>>> with "dax=always".
>>>
>>> I think xfstests might not trigger reclaim. You probably will have to
>>> run something like blogbench with a small dax window like 1G so that
>>> heavy reclaim happens.
>>
>>
>> Actually, I configured the DAX window to 8MB, i.e. 4 slots when running
>> xfstests. Thus I think the reclaim path is most likely triggered.
>>
>>
>>>
>>> For fun, I sometimes used to run it with a window of just say 16 dax
>>> ranges so that reclaim was so heavy that if there was a bug, it will
>>> show up.
>>>
>>
>> Yeah, my colleague had ever reported that a DAX window of 4KB will cause
>> hang in our internal OS (which is 4.19, we back ported virtiofs to
>> 4.19). But then I found that this issue doesn't exist in the latest
>> upstream. The reason seems that in the upstream kernel,
>> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
>> since the dax window (4KB) is not aligned with the sparse memory section.
> 
> Given our default chunk size is 2MB (FUSE_DAX_SHIFT), may be it is not
> a bad idea to enforce some minimum cache window size. IIRC, even one
> range is not enough. Minimum 2 are required for reclaim to not deadlock.

Curiously, why is a minimum of 1 range not adequate? In which case is a
minimum of 2 required?
Vivek Goyal April 11, 2022, 11:52 a.m. UTC | #8
On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
> 
> 
> On 4/8/22 8:06 PM, Vivek Goyal wrote:
> > On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/8/22 7:25 PM, Vivek Goyal wrote:
> >>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
> >>>>
> >>>>
> >>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> >>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >>>>>> Move dmap free worker kicker inside the critical region, so that extra
> >>>>>> spinlock lock/unlock could be avoided.
> >>>>>>
> >>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>>>
> >>>>> Looks good to me. Have you done any testing to make sure nothing is
> >>>>> broken.
> >>>>
> >>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
> >>>> with "dax=always".
> >>>
> >>> I think xfstests might not trigger reclaim. You probably will have to
> >>> run something like blogbench with a small dax window like 1G so that
> >>> heavy reclaim happens.
> >>
> >>
> >> Actually, I configured the DAX window to 8MB, i.e. 4 slots when running
> >> xfstests. Thus I think the reclaim path is most likely triggered.
> >>
> >>
> >>>
> >>> For fun, I sometimes used to run it with a window of just say 16 dax
> >>> ranges so that reclaim was so heavy that if there was a bug, it will
> >>> show up.
> >>>
> >>
> >> Yeah, my colleague had ever reported that a DAX window of 4KB will cause
> >> hang in our internal OS (which is 4.19, we back ported virtiofs to
> >> 4.19). But then I found that this issue doesn't exist in the latest
> >> upstream. The reason seems that in the upstream kernel,
> >> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
> >> since the dax window (4KB) is not aligned with the sparse memory section.
> > 
> > Given our default chunk size is 2MB (FUSE_DAX_SHIFT), may be it is not
> > a bad idea to enforce some minimum cache window size. IIRC, even one
> > range is not enough. Minimum 2 are required for reclaim to not deadlock.
> 
> Curiously, why minimum 1 range is not adequate? In which case minimum 2
> are required?

Frankly speaking, right now I don't remember. I have vague memories
of concluding in the past that 1 range is not sufficient. But if you
like, dive deeper and try with one range to see if you can introduce a
deadlock.

Thanks
Vivek
Jingbo Xu April 11, 2022, 11:54 a.m. UTC | #9
On 4/11/22 7:52 PM, Vivek Goyal wrote:
> On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
>>
>>
>> On 4/8/22 8:06 PM, Vivek Goyal wrote:
>>> On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/8/22 7:25 PM, Vivek Goyal wrote:
>>>>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>>>>>
>>>>>>
>>>>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>>>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>>>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>>>>>> spinlock lock/unlock could be avoided.
>>>>>>>>
>>>>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>>>
>>>>>>> Looks good to me. Have you done any testing to make sure nothing is
>>>>>>> broken.
>>>>>>
>>>>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>>>>>> with "dax=always".
>>>>>
>>>>> I think xfstests might not trigger reclaim. You probably will have to
>>>>> run something like blogbench with a small dax window like 1G so that
>>>>> heavy reclaim happens.
>>>>
>>>>
>>>> Actually, I configured the DAX window to 8MB, i.e. 4 slots when running
>>>> xfstests. Thus I think the reclaim path is most likely triggered.
>>>>
>>>>
>>>>>
>>>>> For fun, I sometimes used to run it with a window of just say 16 dax
>>>>> ranges so that reclaim was so heavy that if there was a bug, it will
>>>>> show up.
>>>>>
>>>>
>>>> Yeah, my colleague had ever reported that a DAX window of 4KB will cause
>>>> hang in our internal OS (which is 4.19, we back ported virtiofs to
>>>> 4.19). But then I found that this issue doesn't exist in the latest
>>>> upstream. The reason seems that in the upstream kernel,
>>>> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
>>>> since the dax window (4KB) is not aligned with the sparse memory section.
>>>
>>> Given our default chunk size is 2MB (FUSE_DAX_SHIFT), may be it is not
>>> a bad idea to enforce some minimum cache window size. IIRC, even one
>>> range is not enough. Minimum 2 are required for reclaim to not deadlock.
>>
>> Curiously, why minimum 1 range is not adequate? In which case minimum 2
>> are required?
> 
> Frankly speaking, right now I don't remember. I have vague memories
> of concluding in the past that 1 range is not sufficient. But if you
> like dive deeper, and try with one range and see if you can introduce
> deadlock.
> 

Alright, thanks.
Bernd Schubert April 11, 2022, 1:20 p.m. UTC | #10
On 4/11/22 13:54, JeffleXu wrote:
> 
> 
> On 4/11/22 7:52 PM, Vivek Goyal wrote:
>> On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
>>>
>>>
>>> On 4/8/22 8:06 PM, Vivek Goyal wrote:
>>> Curiously, why minimum 1 range is not adequate? In which case minimum 2
>>> are required?
>>
>> Frankly speaking, right now I don't remember. I have vague memories
>> of concluding in the past that 1 range is not sufficient. But if you
>> like dive deeper, and try with one range and see if you can introduce
>> deadlock.
>>
> 
> Alright, thanks.
> 


Out of interest, how are you testing this at all? A patch from
Dharmendra was merged into libfuse last week to let it know about
flags2, as we need that for our patches. But we haven't updated the FLAGS
yet to add DAX on the libfuse side.

Is this used by virtio fs? Or is there another libfuse out there that
should know about these flags (I think glusterfs has its own, but
probably not using dax?).

Also, testing is always good, although I don't see how Jeffle's patch
would be able to break anything here.



Thanks,
Bernd
Vivek Goyal April 11, 2022, 2 p.m. UTC | #11
On Mon, Apr 11, 2022 at 03:20:05PM +0200, Bernd Schubert wrote:
> 
> 
> On 4/11/22 13:54, JeffleXu wrote:
> > 
> > 
> > On 4/11/22 7:52 PM, Vivek Goyal wrote:
> > > On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
> > > > 
> > > > 
> > > > On 4/8/22 8:06 PM, Vivek Goyal wrote:
> > > > Curiously, why minimum 1 range is not adequate? In which case minimum 2
> > > > are required?
> > > 
> > > Frankly speaking, right now I don't remember. I have vague memories
> > > of concluding in the past that 1 range is not sufficient. But if you
> > > like dive deeper, and try with one range and see if you can introduce
> > > deadlock.
> > > 
> > 
> > Alright, thanks.
> > 
> 
> 
> Out of interest, how are you testing this at all? A patch from Dharmendra
> had been merged last week into libfuse to let it know about flags2, as we
> need that for our patches. But we didn't update the FLAGS yet to add in DAX
> on the libfuse side.
> 
> Is this used by virtio fs?

Yes, the idea is that this is used by virtiofs. It now looks like there are
multiple implementations of the virtiofs daemon, and they are either not
using libfuse, have forked off libfuse, or have created a new libfuse
equivalent in rust, etc. So as the fuse kernel gets updated, people
update their corresponding code as needed.

For example, we have the C version of virtiofsd in qemu. That has taken
code from libfuse and built on top of it. BTW, the C version of virtiofsd
is deprecated now and not a lot of new development is expected to take
place there.

Then there is the rust version of virtiofsd, where new development is
taking place and which is the replacement for the C virtiofsd.

https://gitlab.com/virtio-fs/virtiofsd

This does not use libfuse at all.

And I think other folks (like developers from Alibaba) have probably
written their own implementations of virtiofsd. I am not sure what
exactly they are using.

I see there is a rust crate for fuse.

https://crates.io/crates/fuse

And there is one in the cloud-hypervisor project.

https://github.com/cloud-hypervisor/fuse-backend-rs


> Or is there another libfuse out there that should
> know about these flags (I think glusterfs has its own, but probably not
> using dax?).
> 

So the server side of fuse seems all fragmented to me. People have
written their own implementations based on their needs.

> Also, testing is always good, although I don't see how Jeffs patch would be
> able break anything here.

Agreed. I worry about testing constantly as well. The qemu version of
virtiofsd does not have DAX support yet. The Rust version's DAX support is
also minimal.

So for testing DAX, I have to rely on the out-of-tree qemu patches
here whenever any changes in the virtiofs client happen.

https://gitlab.com/virtio-fs/qemu/-/tree/virtio-fs-dev

Jeffle is probably relying on their own virtiofsd implementation for DAX
testing.

Thanks
Vivek
Jingbo Xu April 13, 2022, 3:09 a.m. UTC | #12
On 4/11/22 10:00 PM, Vivek Goyal wrote:
> On Mon, Apr 11, 2022 at 03:20:05PM +0200, Bernd Schubert wrote:
> 
> So for testing DAX, I have to rely on out of tree patches from qemu
> here if any changes in virtiofs client happen.
> 
> https://gitlab.com/virtio-fs/qemu/-/tree/virtio-fs-dev
> 
> Jeffle is probably relying on their own virtiofsd implementation for DAX
> testing.
> 

Actually, I also use the C version of virtiofsd from the repository
described above for testing :)
Miklos Szeredi April 22, 2022, 1:36 p.m. UTC | #13
On Sat, 2 Apr 2022 at 12:32, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
>
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Thanks, applied.

Miklos

Patch

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index d7d3a7f06862..b9f8795d52c4 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -138,9 +138,9 @@  static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn_dax *fcd)
 		WARN_ON(fcd->nr_free_ranges <= 0);
 		fcd->nr_free_ranges--;
 	}
+	__kick_dmap_free_worker(fcd, 0);
 	spin_unlock(&fcd->lock);
 
-	kick_dmap_free_worker(fcd, 0);
 	return dmap;
 }