diff mbox

[8/9] pnfs/blocklayout: return layouts on setattr

Message ID 1410362617-28018-9-git-send-email-hch@lst.de (mailing list archive)
State New, archived
Headers show

Commit Message

Christoph Hellwig Sept. 10, 2014, 3:23 p.m. UTC
This speads up truncate-heavy workloads like fsx by multiple orders of
magnitude.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfs/blocklayout/blocklayout.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Peng Tao Sept. 11, 2014, 2:24 p.m. UTC | #1
On Wed, Sep 10, 2014 at 11:23 PM, Christoph Hellwig <hch@lst.de> wrote:
> This speads up truncate-heavy workloads like fsx by multiple orders of
> magnitude.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/nfs/blocklayout/blocklayout.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index a7524c4..d5a2b87 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -799,7 +799,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
>         .id                             = LAYOUT_BLOCK_VOLUME,
>         .name                           = "LAYOUT_BLOCK_VOLUME",
>         .owner                          = THIS_MODULE,
> -       .flags                          = PNFS_READ_WHOLE_PAGE,
> +       .flags                          = PNFS_LAYOUTRET_ON_SETATTR |
> +                                         PNFS_READ_WHOLE_PAGE,
The reason I didn't add it was because PNFS_LAYOUTRET_ON_SETATTR is
too much for blocks layout. What we really want is to return layouts
on truncate and chown, instead of _all_ setattr requests.

Boaz, does object layout require return on setattr for other reasons?
If not, I'd suggest we change PNFS_LAYOUTRET_ON_SETATTR to return only
on chown/truncate events.

Thanks,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Sept. 11, 2014, 2:42 p.m. UTC | #2
On 09/11/2014 05:24 PM, Peng Tao wrote:
> On Wed, Sep 10, 2014 at 11:23 PM, Christoph Hellwig <hch@lst.de> wrote:
>> This speads up truncate-heavy workloads like fsx by multiple orders of
>> magnitude.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  fs/nfs/blocklayout/blocklayout.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index a7524c4..d5a2b87 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -799,7 +799,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
>>         .id                             = LAYOUT_BLOCK_VOLUME,
>>         .name                           = "LAYOUT_BLOCK_VOLUME",
>>         .owner                          = THIS_MODULE,
>> -       .flags                          = PNFS_READ_WHOLE_PAGE,
>> +       .flags                          = PNFS_LAYOUTRET_ON_SETATTR |
>> +                                         PNFS_READ_WHOLE_PAGE,
> The reason I didn't add it was because PNFS_LAYOUTRET_ON_SETATTR is
> too much for blocks layout. What we really want is to return layouts
> on truncate and chown, instead of _all_ setattr requests.
> 
> Boaz, does object layout require return on setattr for other reasons?
> If not, I'd suggest we change PNFS_LAYOUTRET_ON_SETATTR to return only
> on chown/truncate events.
> 

ACK. chown/truncate

> Thanks,
> Tao
> 

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 11, 2014, 3:25 p.m. UTC | #3
On Thu, Sep 11, 2014 at 10:24:04PM +0800, Peng Tao wrote:
> The reason I didn't add it was because PNFS_LAYOUTRET_ON_SETATTR is
> too much for blocks layout. What we really want is to return layouts
> on truncate and chown, instead of _all_ setattr requests.
> 
> Boaz, does object layout require return on setattr for other reasons?
> If not, I'd suggest we change PNFS_LAYOUTRET_ON_SETATTR to return only
> on chown/truncate events.

I was actually going to ask the same question, I can't see a point
why the object layout driver would want it on any other setattr.

In fact it could probably be narrowed down to chown or truncate to a smaller
size.

I'd also love to know why we don't want to do this for the filelayout driver.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 11, 2014, 3:38 p.m. UTC | #4
On Thu, Sep 11, 2014 at 11:25 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Sep 11, 2014 at 10:24:04PM +0800, Peng Tao wrote:
>> The reason I didn't add it was because PNFS_LAYOUTRET_ON_SETATTR is
>> too much for blocks layout. What we really want is to return layouts
>> on truncate and chown, instead of _all_ setattr requests.
>>
>> Boaz, does object layout require return on setattr for other reasons?
>> If not, I'd suggest we change PNFS_LAYOUTRET_ON_SETATTR to return only
>> on chown/truncate events.
>
> I was actually going to ask the same question, I can't see a point
> why the object layout driver would want it on any other setattr.
>
> In fact it could probably be narrowed down to chown or truncate to a smaller
> size.
>
> I'd also love to know why we don't want to do this for the filelayout driver.
>

Why would it be needed? The layout isn't expected to change. If the
chown affects permissions then it is up to the DS to enforce that
(although POSIX does not require it to do that).
Christoph Hellwig Sept. 11, 2014, 3:48 p.m. UTC | #5
On Thu, Sep 11, 2014 at 11:38:24AM -0400, Trond Myklebust wrote:
> Why would it be needed? The layout isn't expected to change. If the
> chown affects permissions then it is up to the DS to enforce that
> (although POSIX does not require it to do that).

I was wondering about the truncate case.  Even if the DS needs to be
able to enforce the new size it seems pointless to keep a layout beyond the
size around.

I don't really see a need to drop on a chown for the blocklayout or
objlayout drivers either, given that these semantics are enforced at a higher
level.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 11, 2014, 4:14 p.m. UTC | #6
On Thu, Sep 11, 2014 at 11:48 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Sep 11, 2014 at 11:38:24AM -0400, Trond Myklebust wrote:
>> Why would it be needed? The layout isn't expected to change. If the
>> chown affects permissions then it is up to the DS to enforce that
>> (although POSIX does not require it to do that).
>
> I was wondering about the truncate case.  Even if the DS needs to be
> able to enforce the new size it seems pointless to keep a layout beyond the
> size around.

In the files layout case, it is actually quite common for the server
to hand out an "infinite" sized layout in response to a LAYOUTGET. It
means that the client doesn't need to ask for a new layout in order to
append to the file.

> I don't really see a need to drop on a chown for the blocklayout or
> objlayout drivers either, given that these semantics are enforced at a higher
> level.
Boaz Harrosh Sept. 14, 2014, 10:55 a.m. UTC | #7
On 09/11/2014 07:14 PM, Trond Myklebust wrote:
<>
> In the files layout case, it is actually quite common for the server
> to hand out an "infinite" sized layout in response to a LAYOUTGET. It
> means that the client doesn't need to ask for a new layout in order to
> append to the file.
> 

You mean to say that: "The Linux files-layout driver only supports infinite
 sized layout, so it is pointless to return a layout which will be returned
 again, exactly the same"

Because I agree that "an infinite sized layout" need not be returned but
an none-infinite should, only we do not have any.

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 14, 2014, 1:24 p.m. UTC | #8
On Sun, Sep 14, 2014 at 6:55 AM, Boaz Harrosh <openosd@gmail.com> wrote:
> On 09/11/2014 07:14 PM, Trond Myklebust wrote:
> <>
>> In the files layout case, it is actually quite common for the server
>> to hand out an "infinite" sized layout in response to a LAYOUTGET. It
>> means that the client doesn't need to ask for a new layout in order to
>> append to the file.
>>
>
> You mean to say that: "The Linux files-layout driver only supports infinite
>  sized layout, so it is pointless to return a layout which will be returned
>  again, exactly the same"
>
> Because I agree that "an infinite sized layout" need not be returned but
> an none-infinite should, only we do not have any.
>

No Boaz. I mean that it is utterly pointless and stupid to return a
layout that doesn't preallocate any resources when it isn't necessary
to do so.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Sept. 14, 2014, 1:51 p.m. UTC | #9
On 09/14/2014 04:24 PM, Trond Myklebust wrote:
> On Sun, Sep 14, 2014 at 6:55 AM, Boaz Harrosh <openosd@gmail.com> wrote:
<>
> 
> No Boaz. I mean that it is utterly pointless and stupid to return a
> layout that doesn't preallocate any resources when it isn't necessary
> to do so.
> 

"preallocate any resources" where? at the client it might not but the server
might have allocated resources per lo-segment.
For example a CEPH server allocates a device_id per file-lo-segment. If you
lo_return a truncated segment it might be able to free that device_id.

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Sept. 14, 2014, 2:16 p.m. UTC | #10
On Sun, Sep 14, 2014 at 9:51 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 09/14/2014 04:24 PM, Trond Myklebust wrote:
>> On Sun, Sep 14, 2014 at 6:55 AM, Boaz Harrosh <openosd@gmail.com> wrote:
> <>
>>
>> No Boaz. I mean that it is utterly pointless and stupid to return a
>> layout that doesn't preallocate any resources when it isn't necessary
>> to do so.
>>
>
> "preallocate any resources" where? at the client it might not but the server
> might have allocated resources per lo-segment.
> For example a CEPH server allocates a device_id per file-lo-segment. If you
> lo_return a truncated segment it might be able to free that device_id.
>

There is no requirement anywhere in RFC5661 to state that a truncate
must be prefixed by a layout return. Not in section 12 (Parallel NFS)
nor in section 13 (NFSv4.1 as a Storage Protocol in pNFS: the File
Layout Type), nor in the ERRATA. Existing pNFS files servers (i.e.
NetApp) don't need it either.

If a future CEPH server wants to add its own requirements, then it is
free to issue a layoutrecall. However my understanding from
conversations with Matt (the Ganesha release notes appear to confirm)
was that the CRUSH algorithm is pretty much impossible to implement as
a files layout type, so I'm confused as to why it would be a problem
in the first place.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Sept. 14, 2014, 2:28 p.m. UTC | #11
On 09/14/2014 05:16 PM, Trond Myklebust wrote:
<>
> 
> There is no requirement anywhere in RFC5661 to state that a truncate
> must be prefixed by a layout return. Not in section 12 (Parallel NFS)
> nor in section 13 (NFSv4.1 as a Storage Protocol in pNFS: the File
> Layout Type), nor in the ERRATA. Existing pNFS files servers (i.e.
> NetApp) don't need it either.
> 

Yes! which is why it sucks ;-)

> If a future CEPH server wants to add its own requirements, then it is
> free to issue a layoutrecall. However my understanding from
> conversations with Matt (the Ganesha release notes appear to confirm)
> was that the CRUSH algorithm is pretty much impossible to implement as
> a files layout type, so I'm confused as to why it would be a problem
> in the first place.
> 

"impossible to implement" without lo-segments yes. *With* lo-segments it
is my opinion that it is possible. With the contraption of device_id per
segment.

But this argument is mute we both agree that in current code it is not
needed. 

[ If at future time someone will want to implement lo-segments for
 Linux files layout, I think it could be a nice optimization, just as it
 is with objects and blocks.

 Imagine a most simple files MDS that wants to not create filehandles(files)
 on DSs that will not be reached. (Create on demand). In such case deleting
 the DS-files on truncate might make sense no?
]

Xheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index a7524c4..d5a2b87 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -799,7 +799,8 @@  static struct pnfs_layoutdriver_type blocklayout_type = {
 	.id				= LAYOUT_BLOCK_VOLUME,
 	.name				= "LAYOUT_BLOCK_VOLUME",
 	.owner				= THIS_MODULE,
-	.flags				= PNFS_READ_WHOLE_PAGE,
+	.flags				= PNFS_LAYOUTRET_ON_SETATTR |
+					  PNFS_READ_WHOLE_PAGE,
 	.read_pagelist			= bl_read_pagelist,
 	.write_pagelist			= bl_write_pagelist,
 	.alloc_layout_hdr		= bl_alloc_layout_hdr,