Message ID | 20220402103250.68027-1-jefflexu@linux.alibaba.com (mailing list archive)
---|---
State | New, archived
Series | fuse: avoid unnecessary spinlock bump
On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
>
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
>
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Looks good to me. Have you done any testing to make sure nothing is
broken?

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index d7d3a7f06862..b9f8795d52c4 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -138,9 +138,9 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn_dax *fcd)
>  		WARN_ON(fcd->nr_free_ranges <= 0);
>  		fcd->nr_free_ranges--;
>  	}
> +	__kick_dmap_free_worker(fcd, 0);
>  	spin_unlock(&fcd->lock);
>
> -	kick_dmap_free_worker(fcd, 0);
>  	return dmap;
>  }
>
> --
> 2.27.0
>
On 4/7/22 10:10 PM, Vivek Goyal wrote:
> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>> Move dmap free worker kicker inside the critical region, so that extra
>> spinlock lock/unlock could be avoided.
>>
>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>
> Looks good to me. Have you done any testing to make sure nothing is
> broken?

xfstests -g quick shows no regression. The tested virtiofs is mounted
with "dax=always".
On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>
>
> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> > On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >> Move dmap free worker kicker inside the critical region, so that extra
> >> spinlock lock/unlock could be avoided.
> >>
> >> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >
> > Looks good to me. Have you done any testing to make sure nothing is
> > broken?
>
> xfstests -g quick shows no regression. The tested virtiofs is mounted
> with "dax=always".

I think xfstests might not trigger reclaim. You probably will have to
run something like blogbench with a small dax window like 1G so that
heavy reclaim happens.

For fun, I sometimes used to run it with a window of just say 16 dax
ranges so that reclaim was so heavy that if there was a bug, it would
show up.

Thanks
Vivek
On 4/8/22 7:25 PM, Vivek Goyal wrote:
> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>
>>
>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>> spinlock lock/unlock could be avoided.
>>>>
>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>
>>> Looks good to me. Have you done any testing to make sure nothing is
>>> broken?
>>
>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>> with "dax=always".
>
> I think xfstests might not trigger reclaim. You probably will have to
> run something like blogbench with a small dax window like 1G so that
> heavy reclaim happens.

Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
xfstests. Thus I think the reclaim path is most likely triggered.

> For fun, I sometimes used to run it with a window of just say 16 dax
> ranges so that reclaim was so heavy that if there was a bug, it would
> show up.

Yeah, my colleague once reported that a DAX window of 4KB would cause a
hang in our internal OS (which is 4.19; we backported virtiofs to 4.19).
But then I found that this issue doesn't exist in the latest upstream.
The reason seems to be that in the upstream kernel,
devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
since the DAX window (4KB) is not aligned with the sparse memory
section.
On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
>
>
> On 4/8/22 7:25 PM, Vivek Goyal wrote:
> > On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> >>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >>>> Move dmap free worker kicker inside the critical region, so that extra
> >>>> spinlock lock/unlock could be avoided.
> >>>>
> >>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>
> >>> Looks good to me. Have you done any testing to make sure nothing is
> >>> broken?
> >>
> >> xfstests -g quick shows no regression. The tested virtiofs is mounted
> >> with "dax=always".
> >
> > I think xfstests might not trigger reclaim. You probably will have to
> > run something like blogbench with a small dax window like 1G so that
> > heavy reclaim happens.
>
> Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
> xfstests. Thus I think the reclaim path is most likely triggered.
>
> > For fun, I sometimes used to run it with a window of just say 16 dax
> > ranges so that reclaim was so heavy that if there was a bug, it would
> > show up.
>
> Yeah, my colleague once reported that a DAX window of 4KB would cause a
> hang in our internal OS (which is 4.19; we backported virtiofs to 4.19).
> But then I found that this issue doesn't exist in the latest upstream.
> The reason seems to be that in the upstream kernel,
> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
> since the DAX window (4KB) is not aligned with the sparse memory
> section.

Given our default chunk size is 2MB (FUSE_DAX_SHIFT), maybe it is not a
bad idea to enforce some minimum cache window size. IIRC, even one range
is not enough. A minimum of 2 is required for reclaim to not deadlock.
Hence, I guess it is not a bad idea to check the cache window size and,
if it is too small, reject it and disable DAX.

Thanks
Vivek
On 4/8/22 8:06 PM, Vivek Goyal wrote:
> On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
>>
>>
>> On 4/8/22 7:25 PM, Vivek Goyal wrote:
>>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>>>> spinlock lock/unlock could be avoided.
>>>>>>
>>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>
>>>>> Looks good to me. Have you done any testing to make sure nothing is
>>>>> broken?
>>>>
>>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>>>> with "dax=always".
>>>
>>> I think xfstests might not trigger reclaim. You probably will have to
>>> run something like blogbench with a small dax window like 1G so that
>>> heavy reclaim happens.
>>
>> Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
>> xfstests. Thus I think the reclaim path is most likely triggered.
>>
>>> For fun, I sometimes used to run it with a window of just say 16 dax
>>> ranges so that reclaim was so heavy that if there was a bug, it would
>>> show up.
>>
>> Yeah, my colleague once reported that a DAX window of 4KB would cause a
>> hang in our internal OS (which is 4.19; we backported virtiofs to 4.19).
>> But then I found that this issue doesn't exist in the latest upstream.
>> The reason seems to be that in the upstream kernel,
>> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
>> since the DAX window (4KB) is not aligned with the sparse memory
>> section.
>
> Given our default chunk size is 2MB (FUSE_DAX_SHIFT), maybe it is not
> a bad idea to enforce some minimum cache window size. IIRC, even one
> range is not enough. Minimum 2 are required for reclaim to not deadlock.

Curiously, why is a minimum of 1 range not adequate? And in which cases
are a minimum of 2 required?
On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
>
>
> On 4/8/22 8:06 PM, Vivek Goyal wrote:
> > On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
> >>
> >>
> >> On 4/8/22 7:25 PM, Vivek Goyal wrote:
> >>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
> >>>>
> >>>>
> >>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
> >>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
> >>>>>> Move dmap free worker kicker inside the critical region, so that extra
> >>>>>> spinlock lock/unlock could be avoided.
> >>>>>>
> >>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> >>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>>>
> >>>>> Looks good to me. Have you done any testing to make sure nothing is
> >>>>> broken?
> >>>>
> >>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
> >>>> with "dax=always".
> >>>
> >>> I think xfstests might not trigger reclaim. You probably will have to
> >>> run something like blogbench with a small dax window like 1G so that
> >>> heavy reclaim happens.
> >>
> >> Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
> >> xfstests. Thus I think the reclaim path is most likely triggered.
> >>
> >>> For fun, I sometimes used to run it with a window of just say 16 dax
> >>> ranges so that reclaim was so heavy that if there was a bug, it would
> >>> show up.
> >>
> >> Yeah, my colleague once reported that a DAX window of 4KB would cause a
> >> hang in our internal OS (which is 4.19; we backported virtiofs to 4.19).
> >> But then I found that this issue doesn't exist in the latest upstream.
> >> The reason seems to be that in the upstream kernel,
> >> devm_memremap_pages() called in virtio_fs_setup_dax() will fail directly
> >> since the DAX window (4KB) is not aligned with the sparse memory
> >> section.
> >
> > Given our default chunk size is 2MB (FUSE_DAX_SHIFT), maybe it is not
> > a bad idea to enforce some minimum cache window size. IIRC, even one
> > range is not enough. Minimum 2 are required for reclaim to not deadlock.
>
> Curiously, why is a minimum of 1 range not adequate? And in which cases
> are a minimum of 2 required?

Frankly speaking, right now I don't remember. I have vague memories of
concluding in the past that 1 range is not sufficient. But if you like,
dive deeper, try with one range and see if you can introduce a deadlock.

Thanks
Vivek
On 4/11/22 7:52 PM, Vivek Goyal wrote:
> On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
>>
>>
>> On 4/8/22 8:06 PM, Vivek Goyal wrote:
>>> On Fri, Apr 08, 2022 at 07:50:55PM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 4/8/22 7:25 PM, Vivek Goyal wrote:
>>>>> On Fri, Apr 08, 2022 at 10:36:40AM +0800, JeffleXu wrote:
>>>>>>
>>>>>>
>>>>>> On 4/7/22 10:10 PM, Vivek Goyal wrote:
>>>>>>> On Sat, Apr 02, 2022 at 06:32:50PM +0800, Jeffle Xu wrote:
>>>>>>>> Move dmap free worker kicker inside the critical region, so that extra
>>>>>>>> spinlock lock/unlock could be avoided.
>>>>>>>>
>>>>>>>> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
>>>>>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>>>>>
>>>>>>> Looks good to me. Have you done any testing to make sure nothing is
>>>>>>> broken?
>>>>>>
>>>>>> xfstests -g quick shows no regression. The tested virtiofs is mounted
>>>>>> with "dax=always".
>>>>>
>>>>> I think xfstests might not trigger reclaim. You probably will have to
>>>>> run something like blogbench with a small dax window like 1G so that
>>>>> heavy reclaim happens.
>>>>
>>>> Actually, I configured the DAX window to 8MB, i.e. 4 slots, when running
>>>> xfstests. Thus I think the reclaim path is most likely triggered.
>>>>
>>>>> For fun, I sometimes used to run it with a window of just say 16 dax
>>>>> ranges so that reclaim was so heavy that if there was a bug, it would
>>>>> show up.
>>>>
>>>> Yeah, my colleague once reported that a DAX window of 4KB would cause a
>>>> hang in our internal OS (which is 4.19; we backported virtiofs to 4.19).
>>>> But then I found that this issue doesn't exist in the latest upstream.
>>>> The reason seems to be that in the upstream kernel,
>>>> devm_memremap_pages() called in virtio_fs_setup_dax() will fail
>>>> directly since the DAX window (4KB) is not aligned with the sparse
>>>> memory section.
>>>
>>> Given our default chunk size is 2MB (FUSE_DAX_SHIFT), maybe it is not
>>> a bad idea to enforce some minimum cache window size. IIRC, even one
>>> range is not enough. Minimum 2 are required for reclaim to not deadlock.
>>
>> Curiously, why is a minimum of 1 range not adequate? And in which cases
>> are a minimum of 2 required?
>
> Frankly speaking, right now I don't remember. I have vague memories
> of concluding in the past that 1 range is not sufficient. But if you
> like, dive deeper, try with one range and see if you can introduce a
> deadlock.
>

Alright, thanks.
On 4/11/22 13:54, JeffleXu wrote:
>
>
> On 4/11/22 7:52 PM, Vivek Goyal wrote:
>> On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
>>>
>>>
>>> On 4/8/22 8:06 PM, Vivek Goyal wrote:
>>> Curiously, why is a minimum of 1 range not adequate? And in which
>>> cases are a minimum of 2 required?
>>
>> Frankly speaking, right now I don't remember. I have vague memories
>> of concluding in the past that 1 range is not sufficient. But if you
>> like, dive deeper, try with one range and see if you can introduce a
>> deadlock.
>>
>
> Alright, thanks.
>

Out of interest, how are you testing this at all? A patch from
Dharmendra had been merged last week into libfuse to let it know about
flags2, as we need that for our patches. But we didn't update the FLAGS
yet to add in DAX on the libfuse side.

Is this used by virtio fs? Or is there another libfuse out there that
should know about these flags (I think glusterfs has its own, but
probably not using dax?).

Also, testing is always good, although I don't see how Jeff's patch
would be able to break anything here.

Thanks,
Bernd
On Mon, Apr 11, 2022 at 03:20:05PM +0200, Bernd Schubert wrote:
>
>
> On 4/11/22 13:54, JeffleXu wrote:
> >
> >
> > On 4/11/22 7:52 PM, Vivek Goyal wrote:
> > > On Mon, Apr 11, 2022 at 10:10:23AM +0800, JeffleXu wrote:
> > > >
> > > >
> > > > On 4/8/22 8:06 PM, Vivek Goyal wrote:
> > > > Curiously, why is a minimum of 1 range not adequate? And in which
> > > > cases are a minimum of 2 required?
> > >
> > > Frankly speaking, right now I don't remember. I have vague memories
> > > of concluding in the past that 1 range is not sufficient. But if you
> > > like, dive deeper, try with one range and see if you can introduce a
> > > deadlock.
> > >
> >
> > Alright, thanks.
> >
>
> Out of interest, how are you testing this at all? A patch from Dharmendra
> had been merged last week into libfuse to let it know about flags2, as we
> need that for our patches. But we didn't update the FLAGS yet to add in DAX
> on the libfuse side.
>
> Is this used by virtio fs?

Yes, the idea is that this is used by virtiofs. Now it looks like there
are multiple implementations of the virtiofs daemon, and they either do
not use libfuse, have forked off libfuse, or have created a new libfuse
equivalent in rust etc. So as the fuse kernel gets updated, people are
updating their corresponding code as need be.

For example, we have the C version of virtiofsd in qemu. That has taken
code from libfuse and built on top of it. BTW, the C version of
virtiofsd is deprecated now and not a lot of new development is expected
to take place there.

Then there is the rust version of virtiofsd, where new development is
taking place and which is the replacement for the C virtiofsd.

https://gitlab.com/virtio-fs/virtiofsd

This does not use libfuse at all.

And I think other folks (like developers from Alibaba) have probably
written their own implementation of virtiofsd. I am not sure what
exactly they are using. I see there is a rust crate for fuse.

https://crates.io/crates/fuse

And there is one in the cloud-hypervisor project.

https://github.com/cloud-hypervisor/fuse-backend-rs

> Or is there another libfuse out there that should
> know about these flags (I think glusterfs has its own, but probably not
> using dax?).

So the server side of fuse seems all fragmented to me. People have
written their own implementations based on their needs.

> Also, testing is always good, although I don't see how Jeff's patch
> would be able to break anything here.

Agreed. I worry about testing constantly as well. The qemu version of
virtiofsd does not have DAX support yet. The rust version's DAX support
is also minimal.

So for testing DAX, I have to rely on out-of-tree patches from qemu here
if any changes in the virtiofs client happen.

https://gitlab.com/virtio-fs/qemu/-/tree/virtio-fs-dev

Jeffle is probably relying on their own virtiofsd implementation for DAX
testing.

Thanks
Vivek
On 4/11/22 10:00 PM, Vivek Goyal wrote:
> On Mon, Apr 11, 2022 at 03:20:05PM +0200, Bernd Schubert wrote:
>
> So for testing DAX, I have to rely on out-of-tree patches from qemu here
> if any changes in the virtiofs client happen.
>
> https://gitlab.com/virtio-fs/qemu/-/tree/virtio-fs-dev
>
> Jeffle is probably relying on their own virtiofsd implementation for DAX
> testing.

Actually, I also use the C version virtiofsd in the above described
repository for testing :)
On Sat, 2 Apr 2022 at 12:32, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
> Move dmap free worker kicker inside the critical region, so that extra
> spinlock lock/unlock could be avoided.
>
> Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Thanks, applied.

Miklos
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index d7d3a7f06862..b9f8795d52c4 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -138,9 +138,9 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn_dax *fcd)
 		WARN_ON(fcd->nr_free_ranges <= 0);
 		fcd->nr_free_ranges--;
 	}
+	__kick_dmap_free_worker(fcd, 0);
 	spin_unlock(&fcd->lock);
 
-	kick_dmap_free_worker(fcd, 0);
 	return dmap;
 }
Move dmap free worker kicker inside the critical region, so that extra
spinlock lock/unlock could be avoided.

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)