Message ID | 56C02E4F.6030303@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Jevon Qiao <scaleqiao@gmail.com> writes:

> The following patch is to fix alignment issue when host filesystem block size
> is larger than client msize.
>
> Thanks,
> Jevon

That is not the right format to send patch. You can send them as a
series using git-send-email.

>
> From: Jevon Qiao <scaleqiao@gmail.com>
> Date: Sun, 14 Feb 2016 15:11:08 +0800
> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size
>  is larger than client msize.
>
> Per the previous implementation, iounit will be assigned to be 0 after the
> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
> host filesystem block size is larger than msize. Finally, iounit will be equal
> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>
> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
> ---
>  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
> index f972731..005d3a8 100644
> --- a/hw/9pfs/virtio-9p.c
> +++ b/hw/9pfs/virtio-9p.c
> @@ -1326,7 +1326,7 @@ out_nofid:
>  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>  {
>      struct statfs stbuf;
> -    int32_t iounit = 0;
> +    int32_t iounit = 0, unit = 0;
>      V9fsState *s = pdu->s;
>
>      /*
> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>       * and as well as less than (client msize - P9_IOHDRSZ))
>       */
>      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
> -        iounit = stbuf.f_bsize;
> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
> +        /*
> +         * If host filesystem block size is larger than client msize,
> +         * we will use PAGESIZE as the unit. The reason why we choose
> +         * PAGESIZE is because the data will be splitted in terms of
> +         * PAGESIZE in the virtio layer. In this case, the final
> +         * iounit is equal to the value of ((msize/unit) - 1) * unit.
> +         */
> +        if (stbuf.f_bsize > s->msize) {
> +            iounit = 4096;
> +            unit = 4096;

What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
64K page size.

> +        } else {
> +            iounit = stbuf.f_bsize;
> +            unit = stbuf.f_bsize;
> +        }
> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>      }
>      if (!iounit) {
>          iounit = s->msize - P9_IOHDRSZ;
> --

-aneesh
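To make the failure mode described in the commit message concrete, here is a minimal standalone sketch of the pre-patch calculation. It is not the QEMU code: `iounit_before_patch` is a hypothetical helper that takes `msize` and `f_bsize` as plain integers instead of reading them from `V9fsState` and `struct statfs`, the zero-size guard is an addition for the standalone model, and `P9_IOHDRSZ` is assumed to be 24, the value quoted later in this thread.

```c
#include <stdint.h>

/* Assumed value; the thread states that P9_IOHDRSZ is equal to 24. */
#define P9_IOHDRSZ 24

/*
 * Hypothetical model of the pre-patch get_iounit() arithmetic.
 * A successful statfs is modelled by f_bsize > 0 (guard added here
 * only to avoid dividing by zero in the standalone version).
 */
static int32_t iounit_before_patch(int32_t msize, int64_t f_bsize)
{
    int32_t iounit = 0;

    if (f_bsize > 0) {
        /* Largest multiple of f_bsize that fits in msize - P9_IOHDRSZ. */
        iounit = (int32_t)(f_bsize * ((msize - P9_IOHDRSZ) / f_bsize));
    }
    if (!iounit) {
        /* Block size does not fit at all: fall back to the raw payload size. */
        iounit = msize - P9_IOHDRSZ;
    }
    return iounit;
}
```

With `msize = 8192`, a 4096-byte block size gives 4096, while a 4 MiB block size makes the quotient zero and the fallback returns 8168, which is not a multiple of 4096.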
Hi Aneesh,

Thank you for reviewing my code, please see my reply in-line.

On 14/2/16 21:38, Aneesh Kumar K.V wrote:
> Jevon Qiao <scaleqiao@gmail.com> writes:
>
>> The following patch is to fix alignment issue when host filesystem block size
>> is larger than client msize.
>>
>> Thanks,
>> Jevon
> That is not the right format to send patch. You can send them as a
> series using git-send-email.
Yes, you're correct. I will send the patches later after I address all
the technical comments.
>> From: Jevon Qiao <scaleqiao@gmail.com>
>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size
>>  is larger than client msize.
>>
>> Per the previous implementation, iounit will be assigned to be 0 after the
>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>> host filesystem block size is larger than msize. Finally, iounit will be equal
>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>
>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>> ---
>>  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>> index f972731..005d3a8 100644
>> --- a/hw/9pfs/virtio-9p.c
>> +++ b/hw/9pfs/virtio-9p.c
>> @@ -1326,7 +1326,7 @@ out_nofid:
>>  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>  {
>>      struct statfs stbuf;
>> -    int32_t iounit = 0;
>> +    int32_t iounit = 0, unit = 0;
>>      V9fsState *s = pdu->s;
>>
>>      /*
>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>       * and as well as less than (client msize - P9_IOHDRSZ))
>>       */
>>      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>> -        iounit = stbuf.f_bsize;
>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>> +        /*
>> +         * If host filesystem block size is larger than client msize,
>> +         * we will use PAGESIZE as the unit. The reason why we choose
>> +         * PAGESIZE is because the data will be splitted in terms of
>> +         * PAGESIZE in the virtio layer. In this case, the final
>> +         * iounit is equal to the value of ((msize/unit) - 1) * unit.
>> +         */
>> +        if (stbuf.f_bsize > s->msize) {
>> +            iounit = 4096;
>> +            unit = 4096;
> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
> 64K page size.
The data to be read or written will be divided into pieces according to the
size of iounit and msize firstly, and then mapped to pages before being added
into virtqueue. Since all these operations happen in the guest side, so the
page size should be guest. Please correct me if I'm wrong.

As for the number 4096, It's the default value in Linux OS. I did not take
other platforms into account, it's my fault. To make it suitable for all
platforms, shall I use the function getpagesize() here?

Thanks,
Jevon
>> +        } else {
>> +            iounit = stbuf.f_bsize;
>> +            unit = stbuf.f_bsize;
>> +        }
>> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>>      }
>>      if (!iounit) {
>>          iounit = s->msize - P9_IOHDRSZ;
>> --
> -aneesh
On Wed, 17 Feb 2016 15:14:48 +0800
Jevon Qiao <scaleqiao@gmail.com> wrote:

> Hi Aneesh,
>
> Thank you for reviewing my code, please see my reply in-line.

Jevon,

Please read comments below.

> On 14/2/16 21:38, Aneesh Kumar K.V wrote:
> > Jevon Qiao <scaleqiao@gmail.com> writes:
> >
> >> The following patch is to fix alignment issue when host filesystem block size
> >> is larger than client msize.
> >>
> >> Thanks,
> >> Jevon
> > That is not the right format to send patch. You can send them as a
> > series using git-send-email.
> Yes, you're correct. I will send the patches later after I address all
> the technical comments.
> >> From: Jevon Qiao <scaleqiao@gmail.com>
> >> Date: Sun, 14 Feb 2016 15:11:08 +0800
> >> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size
> >>  is larger than client msize.
> >>
> >> Per the previous implementation, iounit will be assigned to be 0 after the
> >> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
> >> host filesystem block size is larger than msize. Finally, iounit will be equal
> >> to s->msize - P9_IOHDRSZ, which is usually not aligned.
> >>
> >> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
> >> ---
> >>  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---

Hmmm I just realize your tree is not up-to-date since hw/9pfs/virtio-9p.c got
renamed with this recent commit:

commit 60ce86c7140d5ca33d5fd87ce821681165d06b2a
Author: Wei Liu <wei.liu2@citrix.com>
Date:   Thu Jan 7 18:42:20 2016 +0000

    9pfs: rename virtio-9p.c to 9p.c

Also 9p.c only contains generic code now, not related to virtio... see below.

> >>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
> >> index f972731..005d3a8 100644
> >> --- a/hw/9pfs/virtio-9p.c
> >> +++ b/hw/9pfs/virtio-9p.c
> >> @@ -1326,7 +1326,7 @@ out_nofid:
> >>  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
> >>  {
> >>      struct statfs stbuf;
> >> -    int32_t iounit = 0;
> >> +    int32_t iounit = 0, unit = 0;
> >>      V9fsState *s = pdu->s;
> >>
> >>      /*
> >> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
> >>       * and as well as less than (client msize - P9_IOHDRSZ))
> >>       */
> >>      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
> >> -        iounit = stbuf.f_bsize;
> >> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
> >> +        /*
> >> +         * If host filesystem block size is larger than client msize,
> >> +         * we will use PAGESIZE as the unit. The reason why we choose
> >> +         * PAGESIZE is because the data will be splitted in terms of
> >> +         * PAGESIZE in the virtio layer. In this case, the final

... and here you mention virtio. Does this code really belong here ?

> >> +         * iounit is equal to the value of ((msize/unit) - 1) * unit.
> >> +         */
> >> +        if (stbuf.f_bsize > s->msize) {
> >> +            iounit = 4096;
> >> +            unit = 4096;

This looks weird when reading the initial comment in get_iounit()... is
iounit a multiple of stbuf.f_bsize in this case ?

> > What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
> > 64K page size.
> The data to be read or written will be divided into pieces according to the
> size of iounit and msize firstly, and then mapped to pages before being added
> into virtqueue. Since all these operations happen in the guest side, so the
> page size should be guest. Please correct me if I'm wrong.
>
> As for the number 4096, It's the default value in Linux OS. I did not take
> other platforms into account, it's my fault. To make it suitable for all
> platforms, shall I use the function getpagesize() here?
>

getpagesize() will return the host page size. If you need the guest page size,
you should use TARGET_PAGE_SIZE.
And then you will hit another problem: the 9p.c file is in common-obj and
cannot contain target specific code...

Along with the other remark, I'm beginning to think you may need to move this
to virtio-9p-device.c.

> Thanks,
> Jevon
> >> +        } else {
> >> +            iounit = stbuf.f_bsize;
> >> +            unit = stbuf.f_bsize;
> >> +        }
> >> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
> >>      }
> >>      if (!iounit) {
> >>          iounit = s->msize - P9_IOHDRSZ;
> >> --
> > -aneesh
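The host/guest distinction raised above can be illustrated with a small sketch. `host_page_size` is a hypothetical helper name; it uses the POSIX `sysconf(_SC_PAGESIZE)` query, which reports the page size of the machine the process runs on, i.e. the host from QEMU's point of view. It cannot report the guest's page size, which is why a run-time query of this kind would not answer the question asked in the review.

```c
#include <unistd.h>

/*
 * Hypothetical helper: page size of the machine running this process.
 * Inside QEMU this would be the host page size, never the guest's.
 */
static long host_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

On a typical x86_64 Linux host this is expected to be 4096, and on hosts configured with 64K pages (as mentioned for ppc64) it would be 65536; the value depends on the host only.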
Jevon Qiao <scaleqiao@gmail.com> writes:

> Hi Aneesh,
>
> Thank you for reviewing my code, please see my reply in-line.
> On 14/2/16 21:38, Aneesh Kumar K.V wrote:
>> Jevon Qiao <scaleqiao@gmail.com> writes:
>>
>>> The following patch is to fix alignment issue when host filesystem block size
>>> is larger than client msize.
>>>
>>> Thanks,
>>> Jevon
>> That is not the right format to send patch. You can send them as a
>> series using git-send-email.
> Yes, you're correct. I will send the patches later after I address all
> the technical comments.
>>> From: Jevon Qiao <scaleqiao@gmail.com>
>>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size
>>>  is larger than client msize.
>>>
>>> Per the previous implementation, iounit will be assigned to be 0 after the
>>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>>> host filesystem block size is larger than msize. Finally, iounit will be equal
>>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>>
>>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>>> ---
>>>  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>>> index f972731..005d3a8 100644
>>> --- a/hw/9pfs/virtio-9p.c
>>> +++ b/hw/9pfs/virtio-9p.c
>>> @@ -1326,7 +1326,7 @@ out_nofid:
>>>  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>  {
>>>      struct statfs stbuf;
>>> -    int32_t iounit = 0;
>>> +    int32_t iounit = 0, unit = 0;
>>>      V9fsState *s = pdu->s;
>>>
>>>      /*
>>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>       * and as well as less than (client msize - P9_IOHDRSZ))
>>>       */
>>>      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>>> -        iounit = stbuf.f_bsize;
>>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>>> +        /*
>>> +         * If host filesystem block size is larger than client msize,
>>> +         * we will use PAGESIZE as the unit. The reason why we choose
>>> +         * PAGESIZE is because the data will be splitted in terms of
>>> +         * PAGESIZE in the virtio layer. In this case, the final
>>> +         * iounit is equal to the value of ((msize/unit) - 1) * unit.
>>> +         */
>>> +        if (stbuf.f_bsize > s->msize) {
>>> +            iounit = 4096;
>>> +            unit = 4096;
>> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
>> 64K page size.
> The data to be read or written will be divided into pieces according to the
> size of iounit and msize firstly, and then mapped to pages before being added
> into virtqueue. Since all these operations happen in the guest side, so the
> page size should be guest. Please correct me if I'm wrong.

I am not sure I understand the details correctly. iounit is the size
that we use in client_read to determine the size in which
we should request I/O from the client. But we still can't do I/O in size
larger than s->msize. If you look at the client side (kernel 9p fs), you
will find

    rsize = fid->iounit;
    if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
        rsize = clnt->msize - P9_IOHDRSZ;

if your iounit calculation ends up zero, that should be handled
correctly by

    if (!iounit) {
        iounit = s->msize - P9_IOHDRSZ;
    }
    return iounit;


So what is the issue here. ?

-aneesh
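The client-side clamp quoted above can be modelled in isolation. This is a sketch only: `client_rsize` is a hypothetical function that takes the fid's iounit and the client msize as plain integers instead of the kernel's `fid` and `clnt` structures, and `P9_IOHDRSZ` is assumed to be 24 as stated elsewhere in the thread.

```c
#include <stdint.h>

/* Assumed value; the thread states that P9_IOHDRSZ is equal to 24. */
#define P9_IOHDRSZ 24

/*
 * Hypothetical model of the read-size selection quoted from the kernel
 * 9p client: use the server-advertised iounit unless it is zero or
 * larger than the payload that fits in one message.
 */
static int32_t client_rsize(int32_t fid_iounit, int32_t clnt_msize)
{
    int32_t rsize = fid_iounit;

    if (!rsize || rsize > clnt_msize - P9_IOHDRSZ) {
        rsize = clnt_msize - P9_IOHDRSZ;
    }
    return rsize;
}
```

In this model an iounit of 0 and an oversized iounit both collapse to `msize - P9_IOHDRSZ`, so the request size stays valid in every case; what the clamp does not do is round the result to any particular alignment.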
Hi Aneesh,
> I am not sure I understand the details correctly. iounit is the size
> that we use in client_read to determine the size in which
> we should request I/O from the client. But we still can't do I/O in size
> larger than s->msize. If you look at the client side (kernel 9p fs), you
> will find
>
>     rsize = fid->iounit;
>     if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
>         rsize = clnt->msize - P9_IOHDRSZ;
Yes, I know this.
> if your iounit calculation ends up zero, that should be handled
> correctly by
>
>     if (!iounit) {
>         iounit = s->msize - P9_IOHDRSZ;
>     }
>     return iounit;
>
>
> So what is the issue here. ?
This will result in an alignment issue while mapping the I/O requested by
client into pages in the function of p9_nr_pages().

int p9_nr_pages(char *data, int len)
{
        unsigned long start_page, end_page;
        start_page = (unsigned long)data >> PAGE_SHIFT;
        end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
        return end_page - start_page;
}

Please see the following experiment I did without the fix.

1) Start qemu with cephfs,

$ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 \
    -smp 4 -m 4096 \
    -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ \
    -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs \
    --enable-kvm -nographic \
    -net nic -net tap,ifname=tap0,script=no,downscript=no

2) Mount the fs in the guest.

[root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
[root@localhost ~]# ls -lah /mnt/8kfile
-rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile

In this case, I used the default msize which is 8192(in Byte). Since cephfs
is using 4M as the f_bsize, the iounit will be 8168 as P9_IOHDRSZ is
equal to 24.

3) Run the following systemtap script to trace the paging result,

[root@localhost ~]# cat p9_read.stp
probe kernel.function("p9_virtio_zc_request").call
{
    printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
}

probe kernel.function("p9_nr_pages").call
{
    printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
    printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 + 4096 -1) >> 12);
}

4) The output I got when I copied out the file /mnt/8kfile to /tmp/
directory,

p9_virtio_zc_request: inlen size is 8168
p9_nr_pages: start_page = 34293757815
p9_nr_pages: end_age = 34293757818

Per the text in red(start_page = 34293757815, end_page = 34293757818),
it turns out 8k data will be mapped into three pages. This could hurt the
performance.

Actually, I enabled the cephfs debug functionality added by me to see
how the data is distributed in this case, the result is as follows,

CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4072
CEPHFS_DEBUG: cephfs_preadv iov_len=24

This patch aims to fix this. And the result turns out it works quite
well, all the data is well aligned.

p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171814
p9_nr_pages: end_age = 34203171815
p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171815
p9_nr_pages: end_age = 34203171816

CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4096

Thanks,
Jevon
> -aneesh
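The page-count arithmetic behind this experiment can be reproduced in user space. The sketch below mirrors the `p9_nr_pages()` formula quoted in the message but is only a model: `nr_pages` is a hypothetical name, it takes the buffer address as an integer, and it hard-codes a 4096-byte page (`PAGE_SHIFT` of 12), matching the `>> 12` used in the systemtap script. The addresses used in the usage note are made up for illustration.

```c
/* Assumed 4 KiB pages, matching the ">> 12" in the systemtap script. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical user-space model of p9_nr_pages(): number of pages
 * touched by a buffer of len bytes starting at address data.
 */
static unsigned long nr_pages(unsigned long data, unsigned long len)
{
    unsigned long start_page = data >> PAGE_SHIFT;
    unsigned long end_page = (data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;

    return end_page - start_page;
}
```

For a page-aligned address such as `0x10000`, a 4096-byte request touches one page and an 8168-byte request touches two. Starting 100 bytes into a page, the same 8168-byte request touches three pages, while a 4096-byte request touches two, so the page count depends on the start offset as well as on the length.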
Hi Greg,
>>>> From: Jevon Qiao <scaleqiao@gmail.com>
>>>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>>>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size
>>>>  is larger than client msize.
>>>>
>>>> Per the previous implementation, iounit will be assigned to be 0 after the
>>>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>>>> host filesystem block size is larger than msize. Finally, iounit will be equal
>>>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>>>
>>>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>>>> ---
>>>>  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
> Hmmm I just realize your tree is not up-to-date since hw/9pfs/virtio-9p.c got
> renamed with this recent commit:
>
> commit 60ce86c7140d5ca33d5fd87ce821681165d06b2a
> Author: Wei Liu <wei.liu2@citrix.com>
> Date:   Thu Jan 7 18:42:20 2016 +0000
>
>     9pfs: rename virtio-9p.c to 9p.c
>
> Also 9p.c only contains generic code now, not related to virtio... see below.
The feature was finished before it happened, I'm sorry I did not sync my
tree up with master.
>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>>>> index f972731..005d3a8 100644
>>>> --- a/hw/9pfs/virtio-9p.c
>>>> +++ b/hw/9pfs/virtio-9p.c
>>>> @@ -1326,7 +1326,7 @@ out_nofid:
>>>>  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>>  {
>>>>      struct statfs stbuf;
>>>> -    int32_t iounit = 0;
>>>> +    int32_t iounit = 0, unit = 0;
>>>>      V9fsState *s = pdu->s;
>>>>
>>>>      /*
>>>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>>       * and as well as less than (client msize - P9_IOHDRSZ))
>>>>       */
>>>>      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>>>> -        iounit = stbuf.f_bsize;
>>>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>>>> +        /*
>>>> +         * If host filesystem block size is larger than client msize,
>>>> +         * we will use PAGESIZE as the unit. The reason why we choose
>>>> +         * PAGESIZE is because the data will be splitted in terms of
>>>> +         * PAGESIZE in the virtio layer. In this case, the final
> ... and here you mention virtio. Does this code really belong here ?
Sorry for confusing you, the comment might not be very clear. Here I mean
the issue I mentioned in another thread with Aneesh. It's not related to virtio.
>>>> +         * iounit is equal to the value of ((msize/unit) - 1) * unit.
>>>> +         */
>>>> +        if (stbuf.f_bsize > s->msize) {
>>>> +            iounit = 4096;
>>>> +            unit = 4096;
> This looks weird when reading the initial comment in get_iounit()... is
> iounit a multiple of stbuf.f_bsize in this case ?
Yes, I think so. The stbuf.f_bsize refers to the iounit of backend
filesystem, and to comply with the backend is a right way to go always.
>>> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
>>> 64K page size.
>> The data to be read or written will be divided into pieces according to the
>> size of iounit and msize firstly, and then mapped to pages before being added
>> into virtqueue. Since all these operations happen in the guest side, so the
>> page size should be guest. Please correct me if I'm wrong.
>>
>> As for the number 4096, It's the default value in Linux OS. I did not take
>> other platforms into account, it's my fault. To make it suitable for all
>> platforms, shall I use the function getpagesize() here?
>>
> getpagesize() will return the host page size. If you need the guest page size,
> you should use TARGET_PAGE_SIZE.
> And then you will hit another problem: the 9p.c file is in common-obj and
> cannot contain target specific code...
Well, good to know this, thank you for sharing this.
> Along with the other remark, I'm beginning to think you may need to move this
> to virtio-9p-device.c.
I'll think of this, thank you for the option.

Thanks,
Jevon
>> Thanks,
>> Jevon
>>>> +        } else {
>>>> +            iounit = stbuf.f_bsize;
>>>> +            unit = stbuf.f_bsize;
>>>> +        }
>>>> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>>>>      }
>>>>      if (!iounit) {
>>>>          iounit = s->msize - P9_IOHDRSZ;
>>>> --
>>> -aneesh
[Removing ceph-devel alias]

Hi Aneesh,

Any further comment on my reply below?

Thanks,
Jevon
On 19/2/16 16:56, Jevon Qiao wrote:
> Hi Aneesh,
>> I am not sure I understand the details correctly. iounit is the size
>> that we use in client_read to determine the size in which
>> we should request I/O from the client. But we still can't do I/O in size
>> larger than s->msize. If you look at the client side (kernel 9p fs), you
>> will find
>>
>>     rsize = fid->iounit;
>>     if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
>>         rsize = clnt->msize - P9_IOHDRSZ;
> Yes, I know this.
>> if your iounit calculation ends up zero, that should be handled
>> correctly by
>>
>>     if (!iounit) {
>>         iounit = s->msize - P9_IOHDRSZ;
>>     }
>>     return iounit;
>>
>>
>> So what is the issue here. ?
> This will result in an alignment issue while mapping the I/O requested by
> client into pages in the function of p9_nr_pages().
>
> int p9_nr_pages(char *data, int len)
> {
>         unsigned long start_page, end_page;
>         start_page = (unsigned long)data >> PAGE_SHIFT;
>         end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
>         return end_page - start_page;
> }
>
> Please see the following experiment I did without the fix.
>
> 1) Start qemu with cephfs,
>
> $ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 \
>     -smp 4 -m 4096 \
>     -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ \
>     -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs \
>     --enable-kvm -nographic \
>     -net nic -net tap,ifname=tap0,script=no,downscript=no
>
> 2) Mount the fs in the guest.
>
> [root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
> [root@localhost ~]# ls -lah /mnt/8kfile
> -rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile
>
> In this case, I used the default msize which is 8192(in Byte). Since cephfs
> is using 4M as the f_bsize, the iounit will be 8168 as P9_IOHDRSZ is
> equal to 24.
>
> 3) Run the following systemtap script to trace the paging result,
>
> [root@localhost ~]# cat p9_read.stp
> probe kernel.function("p9_virtio_zc_request").call
> {
>     printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
> }
>
> probe kernel.function("p9_nr_pages").call
> {
>     printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
>     printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 + 4096 -1) >> 12);
> }
>
> 4) The output I got when I copied out the file /mnt/8kfile to /tmp/
> directory,
>
> p9_virtio_zc_request: inlen size is 8168
> p9_nr_pages: start_page = 34293757815
> p9_nr_pages: end_age = 34293757818
>
> Per the text in red(start_page = 34293757815, end_page = 34293757818),
> it turns out 8k data will be mapped into three pages. This could hurt the
> performance.
>
> Actually, I enabled the cephfs debug functionality added by me to see
> how the data is distributed in this case, the result is as follows,
>
> CEPHFS_DEBUG: cephfs_preadv iov_len=4096
> CEPHFS_DEBUG: cephfs_preadv iov_len=4072
> CEPHFS_DEBUG: cephfs_preadv iov_len=24
>
> This patch aims to fix this. And the result turns out it works quite
> well, all the data is well aligned.
>
> p9_virtio_zc_request: inlen size is 4096
> p9_nr_pages: start_page = 34203171814
> p9_nr_pages: end_age = 34203171815
> p9_virtio_zc_request: inlen size is 4096
> p9_nr_pages: start_page = 34203171815
> p9_nr_pages: end_age = 34203171816
>
> CEPHFS_DEBUG: cephfs_preadv iov_len=4096
> CEPHFS_DEBUG: cephfs_preadv iov_len=4096
>
> Thanks,
> Jevon
>> -aneesh
Any further question/comment on this patch?

Thanks,
Jevon
On 24/2/16 15:04, Jevon Qiao wrote:
> [Removing ceph-devel alias]
>
> Hi Aneesh,
>
> Any further comment on my reply below?
>
> Thanks,
> Jevon
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index f972731..005d3a8 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -1326,7 +1326,7 @@ out_nofid:
 static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
 {
     struct statfs stbuf;
-    int32_t iounit = 0;
+    int32_t iounit = 0, unit = 0;
     V9fsState *s = pdu->s;
 
     /*
@@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
      * and as well as less than (client msize - P9_IOHDRSZ))
      */
     if (!v9fs_co_statfs(pdu, path, &stbuf)) {
-        iounit = stbuf.f_bsize;
-        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
+        /*
+         * If host filesystem block size is larger than client msize,
+         * we will use PAGESIZE as the unit. The reason why we choose
+         * PAGESIZE is because the data will be splitted in terms of
+         * PAGESIZE in the virtio layer. In this case, the final
+         * iounit is equal to the value of ((msize/unit) - 1) * unit.
+         */
+        if (stbuf.f_bsize > s->msize) {
+            iounit = 4096;
+            unit = 4096;
+        } else {
+            iounit = stbuf.f_bsize;
+            unit = stbuf.f_bsize;
+        }
+        iounit *= (s->msize - P9_IOHDRSZ)/unit;
     }
     if (!iounit) {
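For readers who want to check the arithmetic of the patched branch, here is a standalone sketch of it. It is a model under assumptions rather than the QEMU function: `iounit_after_patch` is a hypothetical name, `msize` and `f_bsize` are passed as plain integers, `P9_IOHDRSZ` is assumed to be 24 as stated in the thread, the 4096 constant is kept exactly as in the patch (the review questions it), and the zero-size guard is an addition for the standalone version.

```c
#include <stdint.h>

/* Assumed value; the thread states that P9_IOHDRSZ is equal to 24. */
#define P9_IOHDRSZ 24

/*
 * Hypothetical model of get_iounit() with the patch applied.
 * A successful statfs is modelled by f_bsize > 0.
 */
static int32_t iounit_after_patch(int32_t msize, int64_t f_bsize)
{
    int32_t iounit = 0, unit = 0;

    if (f_bsize > 0) {
        if (f_bsize > msize) {
            /* Block size larger than msize: the patch uses 4096 as the unit. */
            iounit = 4096;
            unit = 4096;
        } else {
            iounit = (int32_t)f_bsize;
            unit = (int32_t)f_bsize;
        }
        /* Scale to the largest multiple of unit within msize - P9_IOHDRSZ. */
        iounit *= (msize - P9_IOHDRSZ) / unit;
    }
    if (!iounit) {
        /* Same fallback as before the patch. */
        iounit = msize - P9_IOHDRSZ;
    }
    return iounit;
}
```

In this model, `msize = 8192` with a 4 MiB block size yields 4096 instead of 8168. Two cases still reach the unaligned fallback: an `msize` of 4096 (the quotient is zero, giving 4072) and a block size equal to `msize` (8192 gives 8168). The formula in the patch comment, `((msize/unit) - 1) * unit`, matches the code when `msize` is an exact multiple of the unit; for `msize = 10000` the code gives 8192 while the formula would give 4096.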